Identifying Recently Active Git Contributors 📅 📈

Using a bit of SQL to find the most *recently* active git contributors to an open-source codebase

The other day, I had a need to contact the maintainer(s) of an open-source project. There was no contact info and no project website. Opening an issue felt like a bit much, but could’ve worked as a last resort.

I wanted to reach the “core” set of maintainers, without knowing explicitly who they were (no CONTRIBUTORS file or explicit contact list). Looking at the commit log yielded some recent committers, but it’s hard to differentiate a “core maintainer” from a “passer-by” (someone fixing a typo, patching their specific bug, etc: contributing but not actively maintaining).

I wanted to know who the most recent active contributors were - not necessarily the most active overall, but the most active within the last N days. The original authors may have the most commits + LOC, but could be long gone from the project. This askgit query helped achieve that:

SELECT
  author_name,
  count(*)
FROM commits
WHERE
  author_when > date('now', '-90 days')
AND
  parents < 2
GROUP BY
  author_name
ORDER BY
  count(*) DESC

This produced a ranked list of contributors in the past 90 days by commit count (excluding merge commits).

The GitHub contributor insights page offers a really nice way to surface the same information visually.

Needless to say, I was able to find my “core maintainers” and reach them via Twitter DM 🙂.

Narrowing the Search

This was a pretty simple use case for AskGit, with a nice visual alternative in the GitHub UI (though the SQL version would be necessary for any repo not on GitHub).

A further step could be looking specifically for contributors to certain sections of a codebase. Maybe I don’t care about overall contribution - just contribution to specific features (by limiting the query to only commits that changed certain files).

For instance:

SELECT
  author_name,
  count(DISTINCT hash)
FROM
  commits, stats('', commits.hash)
WHERE
  author_when > date('now', '-120 days')
AND
  parents < 2
AND
  file_path LIKE 'go.mod'
GROUP BY
  author_name
ORDER BY
  count(*) DESC

should show me only the contributors who updated the go.mod file. The LIKE clause could be modified to filter by certain file types or paths (depending on the structure of the repo). LIKE ‘*.go‘ or LIKE ‘important/feature/*‘

The built-in language-detection functions could also be useful to examine only test or documentation files (who most recently maintains the docs for this project?).

I hope this is useful! If you want to stay up to date with more queries and use cases, go ahead and subscribe if you haven’t already 🙂

As usual, you can try out queries on our demo site.