Saturday, 30 October 2010

the Qt contribution ecosystem

After recently writing about the broken contribution process in Qt, I got a little bit inspired to see what the current 'lay of the land' of the Qt contribution ecosystem looks like. So, I did what any self-respecting hacker would do, and wrote a quick script over the course of a few hours to generate the statistics I wanted. Beware, gitstats takes a long while to run over repositories with a big history. If you want to skip the work of running it over Qt yourself, grab a copy of the CSVs I generated.

Before I go any further, I'd like to emphasise the following from gitstats' README:
The information gitstats generates is in two primary facets: which individuals
are doing the work, and which organisations do those individuals come from.

The raw output from gitstats isn't perfect (obviously) as some individuals
change email addresses, etc, but gitstats does make some effort to try keep
track of people.

Usual disclaimer about raw data applies, you should have insight into the real
situation behind the figures before trying to apply any sort of real
interpretation to it
.

There are a few ways in particular that gitstats' information is flawed. The biggest is that the 'organisation' an individual belongs to is generated from the first component of the domain of their email address.

The problems here:
  • With people contributing from email addresses like ritt.ks@gmail.com (who is, by the way, a very prolific external Qt contributor), gitstats lumps them into an organisation like 'gmail'.

    This isn't bad for the *majority* of contributors, but obviously for those from free mail domains, it's not really correct.
  • This also results, in some cases, in an extra 'organisation' being created, when someone who is already contributing to Qt switches to another mail address, as can be seen in the data with e.g. 'abecasis', created by João Abecasis, a Nokian, who has occasionally committed from joao@abecasis.name.

    This can't really be fixed in a satisfactory way automatically, as e.g. you may very well have people who move from one company to another, but keep contributing to Qt.

    The not so obvious problem with this is that contributions go to the organisation that user is in when they made that particular commit, so João's commits under abecasis.name go (incorrectly) to abecasis instead of to Nokia.
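
As a rough illustration of the grouping described above, here is a minimal Python sketch. It is not gitstats' actual code: the helper name organisation_for is made up for this example, and someone@users.sourceforge.net is a placeholder address.

```python
def organisation_for(email):
    """Return the first component of the email's domain, lowercased.

    Hypothetical helper for illustration only; not taken from gitstats.
    """
    # Everything after the last '@' is treated as the domain.
    _, _, domain = email.rpartition("@")
    # The first dot-separated label of the domain becomes the 'organisation'.
    return domain.split(".")[0].lower()


# The first two addresses appear in the post; the third is a placeholder.
for address in (
    "ritt.ks@gmail.com",
    "joao@abecasis.name",
    "someone@users.sourceforge.net",
):
    print(address, "->", organisation_for(address))
```

Both flaws show up immediately: the gmail.com address is filed under a shared 'gmail' bucket, and a personal domain becomes its own one-person organisation.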

The Organisations

Notes when interpreting this graph:

  • I had to cap the LOC added by Nokia, because it was off the scale. Part of this is because of updates to /src/3rdparty/ (think things like WebKit updates) by @nokia.com addresses.
  • For all intents and purposes, Trolltech and Nokia can be lumped together; I didn't do so because of the previous note.
  • 'gmail', as noted above, is actually a group of lots of individual contributors. There are more of these (like 'users', which is people with email addresses from users.sourceforge.net).
  • Some of the large contributions (gmail, archlinux, holodeck1, seznam, ostash) are bulked out because they include translation updates
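
To give an idea of how third-party imports inflate the 'LOC added' figure, here is a small Python sketch that tallies added lines per organisation while skipping everything under src/3rdparty/. It is an assumption-laden illustration, not what gitstats does: it expects text shaped like the output of `git log --numstat --format=%ae` (an author email line followed by tab-separated added/deleted/path lines), and the function name, sample addresses, and sample paths are all invented for the example.

```python
from collections import Counter


def added_lines_by_org(log_text, skip_prefix="src/3rdparty/"):
    """Sum added lines per organisation, ignoring paths under skip_prefix.

    Hypothetical helper; the input is assumed to look like the output of
    `git log --numstat --format=%ae`.
    """
    totals = Counter()
    org = None
    for line in log_text.splitlines():
        parts = line.split("\t")
        if len(parts) == 3:
            # A numstat line: added, deleted, path.
            added, _deleted, path = parts
            # Binary files report '-' instead of a line count.
            if org is None or added == "-" or path.startswith(skip_prefix):
                continue
            totals[org] += int(added)
        elif "@" in line:
            # An author line: organisation = first component of the domain.
            domain = line.strip().rpartition("@")[2]
            org = domain.split(".")[0].lower()
    return totals


# Invented sample data, for illustration only.
sample = "\n".join([
    "dev@nokia.com",
    "",
    "10\t2\tsrc/corelib/tools/qstring.cpp",
    "5000\t0\tsrc/3rdparty/webkit/ChangeLog",
    "someone@gmail.com",
    "",
    "3\t1\ttranslations/qt_sl.ts",
])
print(added_lines_by_org(sample))
```

The same kind of filter could be pointed at the translations directory to see how much of a contributor's total comes from translation updates.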
Of interest to me when looking at this graph:
  • Nokia is the elephant in the room. This is not unexpected, given that they have some 130 folk contributing to Qt. They are undoubtedly the largest single contributing organisation by a very, very long way.
  • It looks like there are companies other than Nokia interested in contributing to Qt at a fairly large scale, for example, accenture, sosco, and digia. I'd presume that these are contractors. I'm informed that contractors generally work from the Nokia offices, which would explain why I hadn't seen much of them in Gitorious. (They still go through a code review process, as do other Nokia employees.)
  • There is a large impact thanks to individual contributors
  • There is a big KDE presence. This is not unexpected. :)
  • There are a number of smaller companies in the Qt ecosystem, such as Codethink, Collabora (my own employer), Blankpage, basyskom, and medical-insight.
The Individuals

Notes when interpreting this graph:
  • I chose to cut out Nokia/Trolltech, because otherwise, it would have been pretty pointless. They contribute a lot, and there are a lot of them. Besides, for me at least, it's more interesting seeing the individual contributors.
  • There are at least two ex-Nokian/TT people here:
    • Anders Bakken, whom I have left in because he still occasionally contributes to Qt for his new employer (I think). But part of his number will of course come from his time at Nokia.
    • Benjamin Meyer is an ex-TT guy, who left a few years ago. Since he never committed from either a TT or Nokia address, there's not much point to removing him, I think.
Of interest to me when looking at this graph:
  • Contractors seem to be doing quite a bit (Shane Kearns, Mikka Heikkinen). Not surprising.
  • Ritt Konstantin is a hero.
  • There are a lot of people working on translations outside of Nokia (Ritt Konstantin, Laszlo Papp, Jure Repinc, Victor Ostashevsky), and just like we saw on the organisations graph, the numbers get a bit skewed as a result
  • The numbers drop off very quickly, especially if you were to disqualify translations from these figures. This is a bit disappointing, but not surprising, given the hurdles to contributing to Qt.

5 comments:

  1. From the KDE point of view it's worth mentioning that in fact _most_ KDE contributors don't have an @kde.org mail address so the numbers there are probably under-represented. Though I guess that's probably the case for a lot of the orgs.

  2. I should probably have emphasized that a bit more, yes, so I'll do it here: mail address is certainly not the best way to determine 'affiliation', but it is the easiest way to do so, particularly in an automated fashion.

    Another example where it's not quite correct is in my own case: I commit with my work address (@collabora.co.uk), but *most* of the time I contribute isn't for work.

  3. I think most of those gmail addresses are KDE contributors

  4. Thanks for this, quite an interesting read, and yeah, given the noise in the data, a pretty good analysis too.

  5. Just FYI, what was sosco is now Accenture so that's really all one org. Most of the work they've done will be on the Symbian port and the mobility stuff. It'll almost entirely be paid for by Nokia as you've guessed, along with the digia stuff.
