Friday, 12 September 2014

profiling is not understanding

When software runs slowly, the first reaction is generally to profile. This might be done through system tools (like Instruments on OS X, perf or valgrind on Linux, VTune, etc). This is fine and good, but having the output of a tool does not necessarily mean you understand what is going on.

This might seem like an obvious distinction, but all too often, efforts at improving performance focus on the small picture ("this thing here is slow") and not the bigger picture ("why is this so slow"). At Jolla, I had the pleasure of running into one such instance of this, together with Gunnar Sletta, my esteemed colleague, and friend.

As those of you who are familiar with Jolla may know, we had been working on upgrading to a newer Qt release. This involved quite a bit of work for us, both in properly upstreaming work we had done in the hurry towards the late-2013 release, and in isolating problems and fixing them properly in newer code (the new scenegraph renderer and the v4 JavaScript engine in particular have been an interesting ride to get both at once!).

As a part of this work, we noted that touch handling was quite slow (something which we had worked around for our initial release, but now wanted to solve properly). This was due to the touch driver on the Jolla delivering touchpoints faster than the display was updating. That is, while the display might be updating at 57 Hz (yes, the Jolla is weird, it doesn't do 60 Hz), we might be getting input events a lot more frequently than that.

This was, in turn, causing QtQuick to run touch processing (involving costly item traversals, as well as the actual processing of touch handling) a lot more frequently than the display was updating. As these took so much time, this in turn slowed rendering down, meaning even more touch handling was going on per frame. A really ugly situation.

Figure 1: Event tracing inside the Sailfish OS Compositor
Figure 1 demonstrates this happening at the compositor level. The bottom slice (titled "QThread") is the event delivery thread, responsible for reading events from evdev. The peaks there are - naturally - when events are being read in. The top slice is the GUI thread, and the high peaks there are touch events being processed and delivered to the right QtQuick item (in this case, a Wayland client, we'll get to that later). The middle slice is the compositor's scenegraph rendering (using QtQuick).

With the explanation out of the way, let's look at the details a bit more. It's obvious that the event thread is regularly delivering events at around-but-not-quite twice the display update rate. Despite the too-frequent event delivery, though, our frame preparation on the GUI thread looks good, and the render thread is coping too.

But this isn't a major surprise - the compositor in this case is dead simple (just showing a fullscreen client). What about the client? Let's take a look at it over the same timeframe...

Figure 2: Event tracing for the client (Silica's component gallery, in this case)
Figure 2 focuses on two threads in the client: the render thread (top), and the GUI thread (bottom). Touch events are delivered on the GUI thread, and QtQuick processes them there while preparing the next frame for the render thread.

Here, it's very clear that touch processing is happening way too often, and worse than that, it's taking a very long time (each touch event's processing is taking ~4ms), not leaving much time for rendering - and this was on a completely unloaded device. In a more complicated client still, this impact would be much, much worse, leading to frame skipping (which we saw, on some other applications).
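To put rough numbers on that, here is a small C++ sketch of the frame budget arithmetic. The 57 Hz refresh rate and the ~4ms per touch event come from the traces above; the figure of two events per frame is an assumption taken from the "around-but-not-quite twice" rate seen at the compositor, and the function names are made up for illustration.

```cpp
#include <cassert>

// Time available to produce one frame, in milliseconds.
double frameBudgetMs(double refreshHz)
{
    assert(refreshHz > 0.0);
    return 1000.0 / refreshHz;
}

// Fraction of the frame budget spent on touch processing alone.
// eventsPerFrame is an assumed average, not a measured value.
double touchShareOfFrame(double refreshHz, double eventsPerFrame, double msPerEvent)
{
    return (eventsPerFrame * msPerEvent) / frameBudgetMs(refreshHz);
}
```

With these inputs, frameBudgetMs(57.0) comes out at roughly 17.5ms, and touchShareOfFrame(57.0, 2.0, 4.0) at a little under half of that, before any rendering work has been done.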

Going back to my original introduction here, if we had used traditional profiling techniques, we'd have seen that touch handling/preparation to render was taking a really long time. And we might have focused on optimizing that. Instead, thanks to some out-of-the-box thinking, we looked at the overall structure of application flow, and were able to see the real problem: doing extra work that wasn't necessary.

As an aside to this, I'm happy to announce that we worked out a neat solution: QtQuick no longer processes touch events immediately. Instead, it waits until it is about to prepare the next frame for display, and "compresses" the pending events so that it only deals with the minimal number of sensible touch updates per frame. This should have no real impact on any hardware where touch delivery was occurring at a sensible rate, but for any hardware where touch was previously delivering too fast, this will no longer be a problem as of Qt 5.4.
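The idea of delaying and compressing can be sketched in a few lines of plain C++. This is not the code that went into QtQuick: TouchUpdate and TouchCompressor are hypothetical names, and the sketch only handles position updates, ignoring press and release, which a real implementation must never merge away.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical, simplified touch update; not a Qt type.
struct TouchUpdate {
    int pointId;  // which finger this update belongs to
    double x;
    double y;
};

// Records touch updates as they arrive, and hands out at most one
// update (the most recent) per touch point when a frame is prepared.
class TouchCompressor {
public:
    // Called on event delivery: only record the update, no costly work.
    void post(const TouchUpdate &update)
    {
        assert(update.pointId >= 0);
        m_latest[update.pointId] = update;  // a newer update replaces the older one
        ++m_received;
    }

    // Called once per frame, just before preparing it: returns the
    // compressed updates (ordered by point id) and clears the pending set.
    std::vector<TouchUpdate> takeForFrame()
    {
        std::vector<TouchUpdate> out;
        out.reserve(m_latest.size());
        for (const auto &entry : m_latest)
            out.push_back(entry.second);
        m_latest.clear();
        return out;
    }

    // Total number of updates posted so far.
    std::size_t received() const { return m_received; }

private:
    std::map<int, TouchUpdate> m_latest;  // keyed by touch point id
    std::size_t m_received = 0;
};
```

If three updates arrive between two frames, two for one finger and one for another, takeForFrame() returns just two entries, so the expensive item traversal runs once per frame rather than once per event.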

(Thanks to Gunnar & myself for the fix, Carsten & Mikko for opening my eyes about performance tooling, and Jolla for sponsoring this work.

P.S. If you're looking for performance experts, Qt/QML/etc expertise or all round awesome, Gunnar and myself are currently interested in hearing from you.)

Wednesday, 13 August 2014

sailing in search of fresh waters

I've had a long, quiet time on this blog over the past few years while I've been frantically helping Jolla to launch their self-named product: the Jolla. I've enjoyed (almost) every day I've been there: they really are a great bunch of people and the work has been plentiful and challenging.

But as the saying goes, "this too shall pass". Nothing lasts forever, and it's time for a change: after this week, I will be taking a break from Jolla to get some fresh perspective.

On the bright side, maybe I'll have some more time for writing now :)

If anyone is interested in getting a hold of a C++/Qt/QML/Linux expert with a focus on performance, expertise on mobile, and a wide range of knowledge across other areas who loves open source, please let me know.

Thursday, 24 October 2013

Every time you use CONFIG+=ordered, a kitten dies.

QMake users: public service announcement. If you use CONFIG+=ordered, please stop right now. If you don't, I'll hunt you down. I promise to god I will.

There is simply no reason to use this, ever. There are two reasons this might be in your project file:
  1. you have no idea what you are doing, and you copied it from somewhere else
  2. you have a target that needs to be built after another target, and you don't know any better
If you fit into category 1, then I hope you're turning red right now, because by using CONFIG+=ordered, you're effectively screwing over multicore builds of your code. See a very nice case of this here.

If you fit into category 2, then you're doing it wrong. You should specify dependencies between your targets properly like this:

TEMPLATE = subdirs
SUBDIRS = src plugins tests docs
plugins.depends = src
tests.depends = src plugins

And then you'll have docs built whenever the build tool feels like it, and the rest built when their dependencies are built.
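To see why this matters for multicore builds, here is a small C++ sketch that counts how many sequential "waves" a build needs for a given set of dependencies. It is only an illustration of the scheduling idea, not anything qmake or make actually runs, and Deps, waveOf and waveCount are made-up names. It assumes the graph has no cycles and that every dependency is itself listed as a target.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Maps each target to the targets it must wait for.
using Deps = std::map<std::string, std::vector<std::string>>;

// Earliest wave in which a target can start; wave 0 has no prerequisites.
// Targets that share a wave could be built in parallel.
int waveOf(const std::string &target, const Deps &deps,
           std::map<std::string, int> &memo)
{
    auto cached = memo.find(target);
    if (cached != memo.end())
        return cached->second;

    auto it = deps.find(target);
    assert(it != deps.end());  // every dependency must be a known target

    int wave = 0;
    for (const std::string &dep : it->second) {
        int candidate = waveOf(dep, deps, memo) + 1;
        if (candidate > wave)
            wave = candidate;
    }
    memo[target] = wave;
    return wave;
}

// Number of sequential waves needed to build every target.
int waveCount(const Deps &deps)
{
    std::map<std::string, int> memo;
    int last = -1;
    for (const auto &entry : deps) {
        int wave = waveOf(entry.first, deps, memo);
        if (wave > last)
            last = wave;
    }
    return last + 1;
}
```

Fed the dependencies from the project file above (plugins on src, tests on src and plugins, docs on nothing), waveCount gives 3, with docs free to build alongside src. Chain every target to the previous one, which is what CONFIG+=ordered amounts to, and it gives 4, with only one target buildable at a time.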

If you have subdirectories involved in this, then you need an extra level of indirection in your project, but it's still not rocket science:

TEMPLATE = subdirs
src_lib.subdir = src/lib
src_lib.target = sub-src-lib

src_plugins.subdir = src/plugins
src_plugins.target = sub-plugins
src_plugins.depends = sub-src-lib

SUBDIRS = src_lib src_plugins

For those of you wondering why I sound frustrated about this, I've fixed so many instances of this by now that it's just getting old and tired, frankly. And I still keep running into more. That's countless minutes of wasted build time, all because of laziness boiling down to a single line. Please fix it.

Wednesday, 18 July 2012

Qt 5 and Android

Astute observers of the Qt 5 repositories may have noticed that for quite a while, patches have been trickling in from me allowing Qt 5 to compile on Android.

The goal in mind was to allow use of Qt on Android primarily in order to work at the system level (not using the regular Android display stack, but using Wayland on Android), tying in with Collabora's other work on Android. This work also doesn't preclude someone from e.g. implementing a platform plugin to allow Android applications to run natively on unhacked devices, similar to Necessitas on Qt 4 - and I'd very much like to see that happen upstream.

In terms of compilation, there is one approach currently upstreamed that involves using the NDK, see this wiki page for more information. You'll note it's quite easy to do a build yourself, something that was quite intentional, since I figure that the only way it's going to improve easily is if it is easy to hack on it. I'm sure the build & installation instructions can be more optimal still (like installing to /system/lib, etc) but it's a start. Contributions welcome. I should also take a moment to thank the Necessitas guys, their mkspecs provided a nice starting point.

I had started an alternative route of integrating Qt with Android image builds (so, check out the Android tree, repo sync, drop Qt in place, run 'make' and have it built & deployed for you), but unfortunately, my sponsored time to work on this ran out, and so I wasn't able to finish it. It's still an interesting area of work, and so I do plan to try to continue it in my spare time.

In terms of actually using it, one area which is still a bit of a pain is that there's a bug in the way bionic's linker handles R_ARM_COPY relocations - instead of looking up the symbol to copy in the shared libraries the binary depends on, it finds the binary's symbol instead, meaning it doesn't really do any actual relocation.

The symptom of this is that your binary will crash on start due to things being zero'd out that really shouldn't be (like QObject::staticMetaObject in my case), depending on how it's been built. Thanks to Thiago for helping me nut that very difficult problem out. There is a patch pending on Android's gerrit instance, but I need to find the time to go rebase the patch and retest it to make sure it still works, although the code changes in the area look quite trivial.

For those of you who are visually oriented: I'm sorry, but there's not much to show here, because - as of yet - I don't have anything graphical running. Though in theory, it might be already possible to easily shoehorn Wayland libraries into the NDK using Pekka's work, and build QtWayland that way. But if anyone wants to talk Qt on Android, or better still, contribute, I'm all ears.

Massive kudos to Collabora for sponsoring my work on this!

Wednesday, 30 May 2012

writing a layout in QML

Sometimes, for whatever reason, the layouts provided "out of the box" in QML just don't cut it. Lately, I've been doing a few rather different things for experimentation and learning purposes that have meant I've run into quite a lot of these cases.

When this happens, the first instinct is to fall into despair - but there's really no need for that. Writing your own layout really isn't that hard. Here's a small, fairly self-contained example doing just that.

MyLayout.qml:
import QtQuick 2.0

Item {
    id: layout

    property bool ready: false

    onChildrenChanged: performLayout()
    onWidthChanged: performLayout()
    onHeightChanged: performLayout()

    /* the meat of the layout */
    function performLayout() {
        /* nothing to layout? don't bother then */
        if (layout.children.length == 0)
            return

        var currentX = 0

        console.log("DOING LAYOUT FOR " + layout.children.length + " ITEMS")

        /* first real step of doing anything: go over all the children */
        for (var i = 0; i < layout.children.length; ++i) {
            var obj = layout.children[i]

            /* in the real world, we'd probably do something a lot more complex,
             * but let's just position our children along a row.
             */
            console.log("Positioning at " + currentX + " to " + (currentX + obj.width))
            obj.x = currentX
            currentX += obj.width
        }

        console.log("LAYOUT DONE")
    }
}

MyItem.qml:
import QtQuick 2.0

Rectangle {
    width: 100
    height: 50
    color: "black"

    Rectangle {
        width: 90
        height: 40
        anchors.centerIn: parent

        color: "red"
    }
}

main.qml:
import QtQuick 2.0

Rectangle {
    width: 1000
    height: 100

    MyLayout {
        MyItem { }
        MyItem { }
        MyItem { }
        MyItem { }
        MyItem { }
        MyItem { }
        MyItem { }
    }
}

This is obviously very simplified for demonstration purposes. To name a few of its shortcomings:
  • it omits things like margins and wrapping
  • it will break if MyItem is ever anchored
  • it doesn't use anchoring (which might make for a more optimal implementation in this particular case - at the least, it wouldn't have to relayout if the width/height of the layout changed)
  • it doesn't relayout if the geometry of the children changes
  • it relayouts every time one of the watched properties changes, which isn't optimal if e.g. the layout is animating a change; instead, it should delay a relayout using a Timer

All of these are left to the reader, but hopefully it's of some help in getting started.