Sunday, 22 January 2012

QFileSystemWatcher internals in Qt 5

Just thought I'd share some details on the changes I pushed to Qt 5 a few weeks ago. (Yes, this post is rather overdue; I've been a bit slack with writing it.) If you were in Tampere when I gave a short, completely underprepared Q&A on Qt 5 a few days ago, this won't be news to you, but I will go into a bit more detail.

tl;dr, all in all, a lot of code was deleted, and things still function more or less the same, except a bit better. That's quite a common story for Qt 5, I hope... :)

First of all, platform support: as with Qt 5 itself, Symbian support is no longer a goal. Since I wanted to make some changes to internals, and wasn't able to even remotely come close to building the Symbian code, it was removed.

On Linux, the (ancient, and no longer used by default) dnotify backend also met its maker. Since inotify has been around for some 6-7 years, it was about time, especially as the dnotify backend had some interesting bugs in behaviour.

The OS X FSEvents backend (also unused for quite some time, due to bugs and apparently not being a recommended approach) joined them, making a trinity of dead implementations. OS X's watching is survived by kqueue, which it shares with the BSD platforms.

The currently supported backends are:
  • inotify (on Linux)
  • kqueue (on BSD and OS X)
  • WaitForMultipleObjects (on Windows), which I need to become more familiar with. Not having a Windows machine has meant that I'm not really able to do much here...

Aside from backend support, there were some more 'fun' changes which went in. First, some detail on implementation. Each QFileSystemWatcher has an 'engine' associated with it, which is backend-specific, and does the actual monitoring. The backend is responsible for communicating changes to the 'frontend' QFileSystemWatcher, which then sends the notifications to the API user.

In the past, each QFileSystemWatcher engine ran in its own thread. I'm not sure why this was done originally, but it never made much sense: monitoring file changes is not a particularly intensive operation, so this was just a waste of resources (thread stack, time to start the thread, etc.). This was compounded by there being a thread per engine, meaning that if you had a few different libraries monitoring files, they'd each start their own thread.

Another nasty side effect of this thread was resource consumption caused by monitoring. If you monitored a large number of paths, but couldn't consume events faster than the OS was throwing them at you, then that engine thread would happily sit there and keep on reading them and turning them into Qt signals for the QFileSystemWatcher/user code. But because that code was on a different thread, and unable to keep up, you'd just keep getting more, and more, and more signals, and memory usage would keep growing and growing.

This thread has now been removed, so change notifications are implicitly rate-limited by the thread the QFileSystemWatcher lives in, meaning that none of this is a problem any more. Kudos should also go to Bradley Hughes for fixing a few issues which I missed on platforms other than Linux after it was integrated.
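To put rough numbers on why the extra thread hurt, here's a toy model in plain C++. It is not the Qt code: there is no Qt and there are no real threads in it, and the function name and the tick-based model are made up purely for illustration. Each tick the OS produces some events; a dedicated reader drains all of them into the application's queue, while the thread-less design only reads what the consumer can handle right now and leaves the rest with the OS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical model: returns the peak number of notifications queued
// inside the application over `ticks` time steps.
std::size_t peakQueuedNotifications(bool dedicatedReaderThread, int ticks,
                                    std::size_t eventsPerTick,
                                    std::size_t handledPerTick)
{
    assert(ticks >= 0);
    std::size_t osBacklog = 0; // events still waiting in the OS
    std::size_t appQueue = 0;  // notifications queued for user code
    std::size_t peak = 0;

    for (int t = 0; t < ticks; ++t) {
        osBacklog += eventsPerTick;

        // A dedicated reader drains everything it can see; without it,
        // we only read as much as the consumer can handle this tick.
        const std::size_t read = dedicatedReaderThread
            ? osBacklog
            : std::min(osBacklog, handledPerTick);
        osBacklog -= read;
        appQueue += read;
        peak = std::max(peak, appQueue);

        // The consumer (the thread the watcher lives in) does its work.
        appQueue -= std::min(appQueue, handledPerTick);
    }
    return peak;
}
```

With 10 events per tick and a consumer that manages 4, the dedicated-reader variant keeps growing for as long as the model runs, while the other variant never queues more than the consumer can handle in one tick. What the OS does with the events left behind is outside this sketch.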

Brad also took this work a step further. QFileSystemWatcher has never been documented as being thread-safe, but the engines happened to be more or less thread-safe anyway: they lived on a different thread to the QFileSystemWatcher, so they were protected by mutexes. One part of Qt itself actually relied on this for its autotests to function correctly: QFileSystemModel. He fixed this requirement, and was thus able to remove the mutexes from the engines. Thanks!

I'd also like to thank Brad, João Abecasis, and anyone I've forgotten for helping to review these changes and get them integrated.

(One thing I neglected to mention above: the thread story is a little more complicated on Windows. Windows still has threads inside the engine, although the engine itself is no longer a thread, so there's still one less. This is necessary because WaitForMultipleObjects can only wait on up to MAXIMUM_WAIT_OBJECTS handles at a time, unless you use multiple threads to do the monitoring, so that's exactly what it does. It spawns threads on demand, as soon as it can't find an existing thread with a spare slot. But this is nothing new.)
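The slot-filling strategy described above can be sketched roughly like this. It's plain C++ rather than the actual Qt Windows engine: WatcherPool and addHandle are hypothetical names, the "threads" are just counters, and the limit of 64 is the value MAXIMUM_WAIT_OBJECTS has in the Windows headers. The real engine may well reserve slots for its own bookkeeping, which this ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Value of MAXIMUM_WAIT_OBJECTS in the Windows SDK headers.
constexpr std::size_t kMaxWaitObjects = 64;

// Hypothetical bookkeeping: one entry per watcher thread, holding the
// number of handles that thread is currently waiting on.
struct WatcherPool {
    std::vector<std::size_t> handlesPerThread;

    // Puts a new handle on the first thread with a spare slot, "spawning"
    // a new thread only when every existing one is full.
    // Returns the index of the thread that took the handle.
    std::size_t addHandle()
    {
        for (std::size_t i = 0; i < handlesPerThread.size(); ++i) {
            if (handlesPerThread[i] < kMaxWaitObjects) {
                ++handlesPerThread[i];
                return i;
            }
        }
        handlesPerThread.push_back(1); // no spare slot anywhere: new thread
        assert(!handlesPerThread.empty());
        return handlesPerThread.size() - 1;
    }
};
```

Under these assumptions, watching 130 paths ends up with three threads: two full ones and a third holding the remaining two handles.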

Thursday, 1 December 2011

why I avoid QRegExp in Qt 4 and so should you

A few times, when I've been working on something performance-critical, I've had people suggest using QRegExp (or ask me to review code that uses it). I usually tell them that "this is a bad idea, QRegExp is slow/unmaintained", but I never actually sat down to do benchmarks. Well, in Qt 5, the subject of replacing QRegExp has been discussed a bit, and we have a volunteer: Giuseppe D'Angelo, a Qt hacker and Italian student living in the UK (looking for work, by the way!).

One of the first steps he has taken has been to quantify the performance of two of our leading candidates to replace it: PCRE, and ICU's regex engine. He also took the liberty of benchmarking Qt's existing QRegExp. Here are the results:

(Caveat: Giuseppe has asked for someone more familiar with ICU to look over the code there to make sure the results aren't negatively impacted.)


RESULT : REBenchmark::PCREBenchmark():"URI":
     1,123 msecs per iteration (total: 1,123, iterations: 1)
RESULT : REBenchmark::PCREBenchmark():"Email":
     1,798 msecs per iteration (total: 1,798, iterations: 1)
RESULT : REBenchmark::PCREBenchmark():"Date":
     99 msecs per iteration (total: 99, iterations: 1)
RESULT : REBenchmark::PCREBenchmark():"URI|Email":
     2,650 msecs per iteration (total: 2,650, iterations: 1)

RESULT : REBenchmark::ICUBenchmark():"URI":
     11,674 msecs per iteration (total: 11,674, iterations: 1)
RESULT : REBenchmark::ICUBenchmark():"Email":
     17,056 msecs per iteration (total: 17,056, iterations: 1)
RESULT : REBenchmark::ICUBenchmark():"Date":
     392 msecs per iteration (total: 392, iterations: 1)
RESULT : REBenchmark::ICUBenchmark():"URI|Email":
     30,552 msecs per iteration (total: 30,552, iterations: 1)

RESULT : REBenchmark::QRegExpBenchmark():"URI":
     21,579 msecs per iteration (total: 21,579, iterations: 1)
RESULT : REBenchmark::QRegExpBenchmark():"Email":
     55,426 msecs per iteration (total: 55,426, iterations: 1)
RESULT : REBenchmark::QRegExpBenchmark():"Date":
     1,357 msecs per iteration (total: 1,357, iterations: 1)
RESULT : REBenchmark::QRegExpBenchmark():"URI|Email":
     77,224 msecs per iteration (total: 77,224, iterations: 1)

(These tests were run on: Ubuntu 10.04 LTS 32 bit, Core(TM)2 Duo CPU P9600  @ 2.66GHz, 4GB RAM, GCC 4.4.3, Qt 4.7.4, PCRE 8.20, ICU 4.8.1.1)

As we see from these results, PCRE is the clear winner, probably thanks to its use of JIT in evaluating the expressions. ICU shows decent results and takes second place (see the caveat above). Bringing up the rear is QRegExp, roughly 14x-31x slower than PCRE, and these are hardly complicated regular expressions.
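If you'd like to check the ratios yourself, here's a small plain C++ sketch that divides each engine's time by PCRE's. The numbers are copied from the output above; the struct and function names are just made up for this example.

```cpp
#include <cassert>
#include <cstdio>

// Timings in milliseconds, copied from the benchmark output above.
struct Timing {
    const char *pattern;
    double pcre;
    double icu;
    double qregexp;
};

const Timing kTimings[] = {
    {"URI",       1123.0, 11674.0, 21579.0},
    {"Email",     1798.0, 17056.0, 55426.0},
    {"Date",        99.0,   392.0,  1357.0},
    {"URI|Email", 2650.0, 30552.0, 77224.0},
};

// How many times slower an engine is than the PCRE baseline.
double slowdown(double engineMs, double pcreMs)
{
    assert(pcreMs > 0.0);
    return engineMs / pcreMs;
}

// Prints one line per pattern with the ICU and QRegExp slowdown factors.
void printSlowdowns()
{
    for (const Timing &t : kTimings) {
        std::printf("%-10s ICU %5.1fx  QRegExp %5.1fx\n", t.pattern,
                    slowdown(t.icu, t.pcre), slowdown(t.qregexp, t.pcre));
    }
}
```

Bear in mind that each figure comes from a single iteration, so treat the ratios as ballpark numbers rather than precise measurements.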

In conclusion, for now, if you want performance, you should probably steer clear of QRegExp, and find other ways to do what you want, either using QString directly if it's a simple job, or looking into a light shim around another regex engine until Qt 5, when hopefully, your problems will go away with a shiny new class.
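As an illustration of the "simple job" case, here's a minimal sketch of checking a fixed-format date by hand instead of reaching for a regex. It uses std::string so that it stands alone without Qt; the same idea should carry over to QString. The function name and the YYYY-MM-DD format are assumptions made for this example, and this is not the pattern used in the benchmark above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if `s` has the shape DDDD-DD-DD (digits and
// dashes in the right places). It does not check that the date is valid.
bool looksLikeIsoDate(const std::string &s)
{
    if (s.size() != 10)
        return false;

    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (i == 4 || i == 7) {
            if (c != '-')
                return false; // separators must be dashes
        } else if (!std::isdigit(c)) {
            return false;     // everything else must be a digit
        }
    }
    return true;
}
```

It's more typing than a one-line pattern, but it's a single pass over ten characters with no pattern compilation involved.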

Wednesday, 23 November 2011

Avoiding graphics flicker in Qt / QML

It's very common when writing QML applications to write a small stub, something like the following:


#include <QApplication>
#include <QDeclarativeView>
#include <QUrl>

int main(int argc, char **argv)
{
    QApplication application(argc, argv);
    QDeclarativeView view;
    view.setSource(QUrl("qrc:/qml/main.qml"));
    view.showFullScreen();
    return application.exec();
}

What's wrong with this? It's a very subtle problem. I'll give you a moment to think about it, and a video to see if you notice the problem. Make sure you don't cheat.

(demonstrating removal of flicker in QML)

Back already? Have you figured it out? That's right, it flickers. Horrifically.

So what causes this? By default, QWidgets are drawn parent first, with parents drawing children. When a widget is drawn, first, it draws its background, then it draws the actual content. That background proves to be a problem, in this case.

If we add the following lines to the above example, the flicker goes away, and my eyes no longer want to bleed:
    view.setAttribute(Qt::WA_OpaquePaintEvent);
    view.setAttribute(Qt::WA_NoSystemBackground);
    view.viewport()->setAttribute(Qt::WA_OpaquePaintEvent);
    view.viewport()->setAttribute(Qt::WA_NoSystemBackground);

NB: I'm not completely sure that setting these on both the view and the viewport is necessary, but at least it can't hurt. Make sure to set them again if you change viewports.

For completeness, here's the full, fixed example:

#include <QApplication>
#include <QDeclarativeView>
#include <QUrl>

int main(int argc, char **argv)
{
    QApplication application(argc, argv);
    QDeclarativeView view;
    view.setSource(QUrl("qrc:/qml/main.qml"));

    // Don't let Qt paint a background behind the QML scene.
    view.setAttribute(Qt::WA_OpaquePaintEvent);
    view.setAttribute(Qt::WA_NoSystemBackground);
    view.viewport()->setAttribute(Qt::WA_OpaquePaintEvent);
    view.viewport()->setAttribute(Qt::WA_NoSystemBackground);

    view.showFullScreen();
    return application.exec();
}

(If you're curious, Qt::WA_OpaquePaintEvent basically implies that you'll repaint everything as necessary yourself (which QML is well behaved with), and Qt::WA_NoSystemBackground tells Qt to nicely not paint the background.)

NB: on Harmattan (and Nemo Mobile) at least, make sure you always use QWidget::showFullScreen(). The compositor in use there unredirects fullscreen windows (meaning no compositor in the way), so you get faster drawing performance, and every frame counts.

(obligatory thanks to Daniel Stone of X and Collabora fame, for telling me to stop blaming X, and start blaming the crappy toolkits ☺)

Fast UI with Qt 4 on mobile

For device manufacturers, and for those of us targeting device manufacturers in the Mer and Nemo Mobile communities, a performant base is a must, and Qt's default configuration on Linux is... not really that performant. It uses what is known as the 'native' graphics system, which uses X (and XRender) to do a lot of the grunt work. Unfortunately, XRender isn't exactly what you'd call speedy in many cases, and making loads of round trips to ask X to draw things probably doesn't help either.

There's another option in the 'raster' graphics system, which does all rendering client-side in your application using (as you'd expect) Qt's software rasterizer. Its performance is adequate on most desktops, but still not optimal: hardware acceleration is the missing goodie.

Qt as of 4.8 also includes what's known as the MeeGo graphics system - as the name implies, it's used on the Nokia N9. It uses hardware acceleration (plus some additional EGL extensions) to perform absolute magic and make your pixels (especially QML :)) fly, so if you're working on getting a device together using Qt 4, I'd highly recommend you look at it.

Because no blog post is complete without a video, here's one:
(comparing the performance of raster and MeeGo graphics systems on a Lenovo S10-3t)

This sounds great, but there's one caveat. If you're on certain types of graphics hardware (SGX in particular), then you probably won't want to just enable the MeeGo graphics system for everything, because you'll end up with a lot of GL contexts allocated, which is not good for two reasons: they're scarce resources, and they take up a large chunk of RAM (something in the order of 5-10 MB, depending on your PVR configuration file; I never actually looked).

There is good news at hand, though! Qt has *another* graphics system, specifically designed to proxy everything through another system, and allow for runtime switching back to raster. In our case, obviously, we want it to use MeeGo's graphics system by default, but fall back to raster to avoid taking extra resources when not needed.

There's also some bad news: it didn't work out of the box with Qt 4.8 RC1.

But some more good news: I fixed it (and yes, I upstreamed the patches)!
http://qt.gitorious.org/qt/qt/commit/a7c77bd46ef85bae624e829cb2a02110ec60b318
and
http://qt.gitorious.org/qt/qt/commit/0ceab866c76e0d9eb17bc1f3d42af06c0033560b
are the two commits you'll want.

After that, configure Qt with -graphicssystem runtime, and -runtimegraphicssystem meego (or set QT_DEFAULT_RUNTIME_SYSTEM=meego), and it'll work beautifully, provided your system has the required GL extensions.

Alternatively, you can use Mer, which already includes these patches right now, and, as of this week, has this beautiful magic enabled by default on the Nokia N900/N950/N9 hardware ports, where this is really needed (and works well). I'm working on getting it enabled by default on other systems that can support it, too, like the Lenovo S10-3t, as time permits.

Thursday, 10 November 2011

debugging connmand

just so it's out there on the intarwebs, since it cost me a few minutes of head scratching: when you're trying to debug connmand, -d isn't enough. You need to use -n (nofork) too, otherwise you'll be left with a console, wondering why you aren't seeing the magics happening.