Yesterday, I saw this thread on HN (an open-source project includes a dependency with a license that forbids its use). It got me thinking about how the query used in this post could be amended to download the licenses of all declared dependencies in a Go codebase (using go.mod
). Here’s what I came up with.
Identify the dependencies
SELECT | |
json_extract(value, '$.mod.path') | |
FROM files f, json_each(go_mod_to_json(contents), '$.require') j | |
WHERE f.path = 'go.mod' |
This will use the built-in JSON functions and go_mod_to_json
to list all the required dependencies in a go.mod
file. Here are the first 5 from the AskGit codebase:
+-----------------------------------+
| JSON_EXTRACT(VALUE, '$.MOD.PATH') |
+-----------------------------------+
| github.com/BurntSushi/toml |
+-----------------------------------+
| github.com/DATA-DOG/go-sqlmock |
+-----------------------------------+
| github.com/ProtonMail/go-crypto |
+-----------------------------------+
| github.com/asaskevich/govalidator |
+-----------------------------------+
| github.com/augmentable-dev/vtab |
+-----------------------------------+
Fetch each dependency
Next, we’ll want to grab the LICENSE
file from every dependency:
SELECT | |
str_split(mod, '/', 0) || '/' || str_split(mod, '/', 1) || '/' || str_split(mod, '/', 2) AS repo, -- this is a bit whacky, need to get the repo part of the module path | |
contents -- the actual LICENSE file | |
FROM | |
(select json_extract(value, '$.mod.path') AS mod from files f, json_each(go_mod_to_json(contents), '$.require') j WHERE f.path = 'go.mod') AS deps, | |
files('https://' || repo) -- this will clone every every repo to a temp directory | |
WHERE repo IS NOT NULL | |
AND path = 'LICENSE' | |
AND repo LIKE 'github.com%' -- was failing on some non-git dependencies, shortcut for now, but limits to modules only on GitHub |
It looks like a lot 🙂, but we’re extending the previous query to:
Extract the git repo part of a Go mod path
Join with the
files
table to clone every dependency to atmp
directoryPull out the contents of the
LICENSE
file (if it exists)
Currently, this will only work with git-hosted modules (so we add a filter to only select ones hosted on GitHub, for now).
Results
And there you have it, the contents of the LICENSE
file for every dependency required in a go.mod
file (at least the ones on GitHub).
You can see the output for the AskGit project here.
The content of licenses is pretty verbose - a next step would be to identify the type of license and just return that (i.e. MIT vs Apache 2.0) rather than the full text…