How long has a line of code existed in a codebase?
It’s easy to get an approximate
answer on a per-line basis usinggit blame
(and who last modified it). Answering this question in aggregate, however, is not easy with git blame
alone, and is where an AskGit query can help.What’s the average age of all lines of code?
We can calculate the average age of all lines of code in a git repository with the following query:
-- average time (in days) since a line of code was last modified | |
SELECT | |
AVG(julianday(datetime('now')) - julianday(author_when)) AS avg_age | |
FROM files, blame('', '', path) | |
JOIN commits ON hash = commit_hash ORDER BY avg_age |
Running the above will yield a single value:
+--------------------+
| AVG_AGE |
+--------------------+
| 162.64079124513728 |
+--------------------+
Which is the average age (in days) of a line of code in a git repository. This alone is interesting, but maybe not that useful. Let’s see if we can break that number out…
Average age by author
This query modifies the original to group results by author (the author_name
column), showing the average age of a line of code added by a particular contributor.
+------------------+--------------------+
| AUTHOR_NAME | AVG_AGE |
+------------------+--------------------+
| Author #1 | 39.76696759229526 |
+------------------+--------------------+
| Author #2 | 44.11296238214041 |
+------------------+--------------------+
.
.
.
| Author #20 | 291.14547453681007 |
+------------------+--------------------+
| Author #21 | 432.53069444419816 |
+------------------+--------------------+
This could be a useful way of looking at how active an author is in a codebase - by looking at how recent their “live”
contributions are. Conversely, it can be used to identify “legacy” authors, whose contributions are now old.Average age by file
-- average time (in days) since a line of code was last modified | |
-- broken out by file path | |
SELECT | |
path, | |
AVG(julianday(datetime('now')) - julianday(author_when)) AS avg_age | |
FROM files, blame('', '', path) | |
JOIN commits ON hash = commit_hash | |
GROUP BY path | |
ORDER BY avg_age |
This query makes a similar change to break results out by file path, and yields results looking like:
+------------------------------+--------------------+
| PATH | AVG_AGE |
+------------------------------+--------------------+
| path/to/file1.go | 14.493564205910815 |
+------------------------------+--------------------+
| path/to/file2.go | 15.012321128171276 |
+------------------------------+--------------------+
| path/to/file3.go | 15.458618565888727 |
+------------------------------+--------------------+
.
.
.
+------------------------------+--------------------+
| path/to/file100.go | 21.454814814962447 |
+------------------------------+--------------------+
Which is the average age of a line of code in a particular file. This could be an interesting way to find “stale” files that haven't been updated in a while (and maybe deserve a revisit?). Conversely, it could be used to find the “youngest” files in a codebase, ones that have recently been changed significantly and may be a current “hotspot.”
What comes next?
The queries here can be modified to fit different use cases and codebases. For instance, maybe individual files is too granular a view and file type (by extension) is more useful. Or maybe certain files should be excluded (generated code, vendored dependencies, etc.)
AskGit gives you a powerful way to explore these questions in single or multiple codebases. Try it out!
Approximate because git blame
will show who lost modified a particular line, not necessarily who authored it. This means trivial changes will be included (adding/removing whitespace, simple formatting, etc) and also changes that may have involved moving that piece of code around.
“Live” as in code that’s currently in the codebase, last modified by this author