Thursday, 1 December 2011

why I avoid QRegExp in Qt 4 and so should you

A few times, when I've been working on something performance-critical, people have suggested using QRegExp (or asked me to review code that uses it). I usually tell them that "this is a bad idea, QRegExp is slow/unmaintained", but I never actually sat down to do benchmarks. Well, for Qt 5, replacing QRegExp has been discussed a bit, and we have a volunteer: Giuseppe D'Angelo, a Qt hacker and Italian student living in the UK (looking for work, by the way!).

One of his first steps has been to quantify the performance of our two leading candidates for a replacement: PCRE and ICU's regex engine. He also took the liberty of benchmarking Qt's existing QRegExp. Here are the results:

(Caveat: Giuseppe has asked for someone more familiar with ICU to look over the ICU benchmark code, to make sure its results aren't being unfairly penalized.)

RESULT : REBenchmark::PCREBenchmark():"URI":
     1,123 msecs per iteration (total: 1,123, iterations: 1)
RESULT : REBenchmark::PCREBenchmark():"Email":
     1,798 msecs per iteration (total: 1,798, iterations: 1)
RESULT : REBenchmark::PCREBenchmark():"Date":
     99 msecs per iteration (total: 99, iterations: 1)
RESULT : REBenchmark::PCREBenchmark():"URI|Email":
     2,650 msecs per iteration (total: 2,650, iterations: 1)

RESULT : REBenchmark::ICUBenchmark():"URI":
     11,674 msecs per iteration (total: 11,674, iterations: 1)
RESULT : REBenchmark::ICUBenchmark():"Email":
     17,056 msecs per iteration (total: 17,056, iterations: 1)
RESULT : REBenchmark::ICUBenchmark():"Date":
     392 msecs per iteration (total: 392, iterations: 1)
RESULT : REBenchmark::ICUBenchmark():"URI|Email":
     30,552 msecs per iteration (total: 30,552, iterations: 1)

RESULT : REBenchmark::QRegExpBenchmark():"URI":
     21,579 msecs per iteration (total: 21,579, iterations: 1)
RESULT : REBenchmark::QRegExpBenchmark():"Email":
     55,426 msecs per iteration (total: 55,426, iterations: 1)
RESULT : REBenchmark::QRegExpBenchmark():"Date":
     1,357 msecs per iteration (total: 1,357, iterations: 1)
RESULT : REBenchmark::QRegExpBenchmark():"URI|Email":
     77,224 msecs per iteration (total: 77,224, iterations: 1)

(These tests were run on: Ubuntu 10.04 LTS 32 bit, Core(TM)2 Duo CPU P9600  @ 2.66GHz, 4GB RAM, GCC 4.4.3, Qt 4.7.4, PCRE 8.20, ICU

As these results show, PCRE is the clear winner, probably thanks to its use of JIT compilation when evaluating the expressions. ICU comes in a respectable second (though see the caveat above): roughly 4x to 11x slower than PCRE, but still about 2x to 3.5x faster than QRegExp. Bringing up the rear is QRegExp, roughly 14x to 31x slower than PCRE, and these are hardly complicated regular expressions.
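To make the comparison concrete, here is a small sketch that recomputes each engine's slowdown relative to PCRE from the milliseconds reported above. It is plain C++ with no Qt dependency, and the struct and function names are made up for this illustration; they are not part of Giuseppe's benchmark code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Milliseconds per iteration, copied from the benchmark output above.
struct BenchmarkRow {
    std::string pattern;
    double pcreMs;
    double icuMs;
    double qregexpMs;
};

const std::vector<BenchmarkRow> kResults = {
    {"URI",        1123.0, 11674.0, 21579.0},
    {"Email",      1798.0, 17056.0, 55426.0},
    {"Date",         99.0,   392.0,  1357.0},
    {"URI|Email",  2650.0, 30552.0, 77224.0},
};

// How many times slower an engine is than PCRE on the same pattern.
double slowdownVsPcre(double engineMs, double pcreMs) {
    return engineMs / pcreMs;
}

// Prints one line per pattern with the ICU and QRegExp slowdown factors.
void printSlowdowns() {
    for (const BenchmarkRow &row : kResults) {
        std::printf("%-10s ICU %5.1fx  QRegExp %5.1fx\n",
                    row.pattern.c_str(),
                    slowdownVsPcre(row.icuMs, row.pcreMs),
                    slowdownVsPcre(row.qregexpMs, row.pcreMs));
    }
}
```

Calling printSlowdowns() from a main() shows QRegExp between about 13.7x (Date) and 30.8x (Email) slower than PCRE, and ICU between about 4.0x (Date) and 11.5x (URI|Email).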

In conclusion: for now, if you want performance, you should probably steer clear of QRegExp and find another way to do what you want. For a simple job, use QString directly; otherwise, consider a light shim around another regex engine until Qt 5, when hopefully a shiny new class will make your problems go away.
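The two suggestions above could look roughly like this. It is a minimal sketch in standard C++11, assuming a hypothetical Matcher interface as the "light shim": one backend handles a simple job with plain string operations, the other wraps a regex engine. std::regex stands in for PCRE or ICU purely so the sketch compiles without extra libraries (it is not chosen for speed), and the date pattern is illustrative, not the one used in the benchmark.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical thin interface ("shim") that application code depends on,
// so the engine behind it can be swapped later without touching call sites.
class Matcher {
public:
    virtual ~Matcher() = default;
    virtual bool matches(const std::string &text) const = 0;
};

// Stand-in regex backend. A real shim would wrap PCRE or ICU here;
// std::regex is used only so this sketch compiles anywhere.
class StdRegexMatcher : public Matcher {
public:
    explicit StdRegexMatcher(const std::string &pattern) : re_(pattern) {}
    bool matches(const std::string &text) const override {
        return std::regex_match(text, re_);  // whole string must match
    }
private:
    std::regex re_;
};

// For a simple job, no regex engine at all: check a "YYYY-MM-DD" shape
// with plain string operations. With Qt, the same idea would use
// QString::length() and QChar::isDigit().
bool looksLikeIsoDate(const std::string &text) {
    if (text.size() != 10 || text[4] != '-' || text[7] != '-')
        return false;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (i == 4 || i == 7)
            continue;  // separators already checked above
        if (!std::isdigit(static_cast<unsigned char>(text[i])))
            return false;
    }
    return true;
}

// Hand-written backend behind the same interface.
class IsoDateMatcher : public Matcher {
public:
    bool matches(const std::string &text) const override {
        return looksLikeIsoDate(text);
    }
};
```

Because call sites only see Matcher, switching from the hand-written check to a regex backend, or later to whatever Qt 5 ships, is a change in one place.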