Identifying Recently Active Git Contributors 📅 📈
Using a bit of SQL to find the most *recently* active git contributors to an open-source codebase
The other day, I had a need to contact the maintainer(s) of an open-source project. There was no contact info and no project website. Opening an issue felt like a bit much, but could’ve worked as a last resort.
I wanted to reach the “core” set of maintainers, without knowing explicitly who they were (no CONTRIBUTORS file or explicit contact list). Looking at the commit log yielded some recent committers, but it’s hard to differentiate a “core maintainer” from a “passer-by” (someone fixing a typo, patching their specific bug, etc: contributing but not actively maintaining).
I wanted to know who the most recent active contributors were - not necessarily the most active overall, but the most active within the last N days. The original authors may have the most commits + LOC, but could be long gone from the project. This askgit
query helped achieve that:
SELECT
author_name,
count(*)
FROM commits
WHERE
author_when > date('now', '-90 days')
AND
parents < 2
GROUP BY
author_name
ORDER BY
count(*) DESC
This produced a ranked list of contributors in the past 90 days by commit count (excluding merge commits).
The GitHub contributor insights page offers a really nice way to surface the same information visually.
Needless to say, I was able to find my “core maintainers” and reach them via Twitter DM 🙂.
Narrowing the Search
This was a pretty simple use case for AskGit, with a nice visual alternative in the GitHub UI (though the SQL version would be necessary for any repo not on GitHub).
A further step could be looking specifically for contributors to certain sections of a codebase. Maybe I don’t care about overall contribution - just contribution to specific features (by limiting the query to only commits that changed certain files).
For instance:
SELECT
author_name,
count(DISTINCT hash)
FROM
commits, stats('', commits.hash)
WHERE
author_when > date('now', '-120 days')
AND
parents < 2
AND
file_path LIKE 'go.mod'
GROUP BY
author_name
ORDER BY
count(*) DESC
should show me only the contributors who updated the go.mod
file. The LIKE
clause could be modified to filter by certain file types or paths (depending on the structure of the repo). LIKE ‘*.go‘
or LIKE ‘important/feature/*‘
The built-in language-detection functions could also be useful to examine only test or documentation files (who most recently maintains the docs for this project?).
I hope this is useful! If you want to stay up to date with more queries and use cases, go ahead and subscribe if you haven’t already 🙂
As usual, you can try out queries on our demo site.