The Problem with Collecting Statistics to Measure Developer Productivity

Something I've seen repeatedly in my time as a developer is an attempt to measure developer productivity. While I can understand the desire to gain insight into how a project is coming along, with very few exceptions it turns into an exercise in futility. The problem with most methods used to quantify developer productivity is that they are akin to measuring a portrait artist's productivity by the number of brushstrokes completed.

One example I have seen is to measure productivity based on the number of lines of source code, typically counted in the thousands (KLOC). The basic idea is to take a comparable software project (with a known number of lines of source code), compare it with where the current project stands, and thereby estimate how much effort remains. There are several problems with this approach. The first (and most obvious) is that it assumes two software projects are truly directly comparable. Unless the same 'cookie cutter' projects are being produced repeatedly, it is quite rare for two different software projects to be close enough in complexity, staffing experience/talent, and budget/schedule for lines of code to be a useful measure. Measuring lines of code also fails to account for differences in developers' abilities (two developers may implement the same feature in very different numbers of lines), and it ignores how many lines the same feature would take in different programming languages.
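To make that concrete, here is a minimal sketch (in Python, with invented data) of two implementations of the same feature. Both behave identically, yet a lines-of-code metric would score the verbose version as several times more productive.

```python
# Two equivalent implementations of the same hypothetical feature:
# return the orders whose total exceeds a threshold.

# Developer A: a verbose, explicit style -- seven lines.
def large_orders_verbose(orders, threshold):
    result = []
    for order in orders:
        total = order["quantity"] * order["unit_price"]
        if total > threshold:
            result.append(order)
    return result

# Developer B: identical behavior in a single expression -- two lines.
def large_orders_terse(orders, threshold):
    return [o for o in orders if o["quantity"] * o["unit_price"] > threshold]

orders = [
    {"quantity": 3, "unit_price": 40.0},  # total 120.0
    {"quantity": 1, "unit_price": 25.0},  # total 25.0
]

# Same inputs, same outputs -- yet counting lines of code scores
# Developer A as several times more "productive" than Developer B.
assert large_orders_verbose(orders, 100) == large_orders_terse(orders, 100)
```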

Another example I have seen is counting the number of code checkins to the source code repository. This is a particularly baffling statistic to track, since it simply does not correlate with developer productivity. Even after accounting for checkin standards, there is far too much variability in how software is developed for checkin counts to be meaningful. One feature may require significant changes to a single file, while another may require trivial changes to many files. There is also variability in how developers check in their code, even with standards in place: some will use as few checkins as possible (collecting changes into larger chunks), while others will produce many more, each containing a much smaller set of changes.
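To illustrate, here is a minimal sketch of two hypothetical checkin histories for the exact same feature; every message and line count below is invented purely for illustration.

```python
# Hypothetical checkin histories for the SAME feature, delivered two ways.
# All messages and line counts here are invented for illustration.

# Developer A batches the work into a single checkin.
dev_a_checkins = [
    {"message": "Add CSV export feature", "lines_changed": 240},
]

# Developer B lands the identical work as a series of small checkins.
dev_b_checkins = [
    {"message": "Add CSV writer skeleton", "lines_changed": 60},
    {"message": "Wire exporter into report screen", "lines_changed": 55},
    {"message": "Handle quoting and embedded commas", "lines_changed": 50},
    {"message": "Add export unit tests", "lines_changed": 65},
    {"message": "Fix header row off-by-one", "lines_changed": 10},
]

# The same total work yields a 5x difference in the checkin count.
assert sum(c["lines_changed"] for c in dev_a_checkins) == \
       sum(c["lines_changed"] for c in dev_b_checkins)
print(len(dev_a_checkins), "checkin vs", len(dev_b_checkins), "checkins")
```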

These are just a few of the kinds of measurement follies that exist out in the wild. If these types of measurements (and others like them) are problematic, then what sort of measurements (if any) are useful? 

Quite frankly, the best measure of developer productivity is how useful the software is to its users, or how much enjoyment they get from using it. Everything else is secondary. Certainly, there are considerations of timeliness and staying within budget, but questions about those topics won't be answered by measuring developer productivity as if software were produced on an assembly line. What works for measuring productivity on the factory floor does not work in the context of a development team.

Folks, don't let your managers measure your productivity using methods best suited to other professions.