I was once tasked with getting a list of the licences for every dependency in all the services my team used. I didn’t document how I did it at the time, so I’m writing it down now: it may be useful if I ever need to do something similar, and since it’s not the most elegant approach, next time I can improve on it. If you end up using this method or something better, do let me know; I always love a good automation.
Gather all repos
We were using pyup at the time, and pyup lists all of a repo’s dependencies on a single page, so I hit Ctrl+S on that page for each of our repos. It’s a bit slow, but not too bad unless you have more than 10 services/packages to collect.
I saved all the collected .html files in the same directory.
Make the CSV
Now that we have a bunch of .html files with the data we need (that page also gives us the licence for each dependency), we need to parse the HTML, collect each dependency and its licence, and print them out as CSV.
Here is the script I wrote at the time:
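from os import listdir

from bs4 import BeautifulSoup


def main():
    licenses_by_package_name = {}
    for filename in listdir('.'):
        # Only parse the saved pages. This also skips the script itself, the
        # licences.csv the shell redirect creates before the script starts,
        # and any asset folders a browser may save alongside the pages
        # (open() on a directory would raise an error).
        if not filename.endswith('.html'):
            continue
        # Opening in binary mode lets Beautiful Soup detect the encoding.
        with open(filename, 'rb') as html_file:
            soup = BeautifulSoup(html_file, 'html.parser')
        # Each dependency is a row holding the package name and its licence.
        for requirement in soup.find_all(class_='requirement-row'):
            package = requirement.find(class_="client-link").string
            licence_part = requirement.find(class_="license")
            # The licence name is stored in the title attribute of this span.
            licence = licence_part.find("span", class_="size-11")["title"]
            licenses_by_package_name[package.strip()] = licence
    # Print one "package, licence" line per dependency.
    for pk, li in licenses_by_package_name.items():
        print("{}, {}".format(pk, li))


if __name__ == "__main__":
    main()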
I’ll leave refactoring the script as an exercise for the reader; I only needed to run it once, so I’m fine with it as is.
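One possible starting point for that refactor, sketched with hypothetical helper names (`merge_licences`, `write_licence_csv`) and assuming the parsing loop is changed to yield `(package, licence)` pairs. The dict in the script keeps only the last licence seen for each package, so a package that appears in several repos with different licences is silently overwritten, and the hand-rolled `"{}, {}"` format produces a broken row if a licence string ever contains a comma. Collecting every licence per package and letting the `csv` module handle quoting avoids both.

```python
import csv
import sys
from typing import Dict, Iterable, Set, TextIO, Tuple


def merge_licences(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group licences by package name, keeping every distinct value seen."""
    merged: Dict[str, Set[str]] = {}
    for package, licence in pairs:
        merged.setdefault(package.strip(), set()).add(licence.strip())
    return merged


def write_licence_csv(merged: Dict[str, Set[str]], out: TextIO = sys.stdout) -> None:
    """Write a header plus one row per package, sorted by name.

    Conflicting licences for the same package share one cell, separated
    by "; ", and csv.writer quotes any cell that contains a comma.
    """
    # "\n" avoids doubled carriage returns when writing to stdout on Windows.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["package", "licence"])
    for package in sorted(merged, key=str.lower):
        writer.writerow([package, "; ".join(sorted(merged[package]))])


if __name__ == "__main__":
    # Hypothetical usage with a few pairs taken from the example output.
    write_licence_csv(merge_licences([
        ("Django", "BSD-2-Clause"),
        ("paramiko", "LGPL-2.0-only"),
        ("Django", "BSD-2-Clause"),
    ]))
```

Unlike the original script, this writes a header row and no space after the comma, which spreadsheet tools and `csv.reader` handle without extra options.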
I ran it this way:
# You need to install beautifulsoup in order to run the script (`pip install beautifulsoup4`)
$ python licences.py > licences.csv
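If installing Beautiful Soup isn’t an option, the same extraction can be approximated with the standard library’s `html.parser.HTMLParser`. This is a swapped-in technique, not what I ran: a minimal sketch that assumes pyup’s markup matches the class names the script looks for (`requirement-row`, `client-link`, `license`, and a `span.size-11` whose `title` holds the licence). `PyupLicenceParser` and `licences_from_html` are hypothetical names. Because it doesn’t build a tree, it never notices where a row ends; it simply starts a new row at each `requirement-row` element.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class PyupLicenceParser(HTMLParser):
    """Collect (package, licence) pairs from a saved pyup page.

    The class names are taken from the Beautiful Soup script, not from
    any pyup documentation, so they are an assumption about the markup.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[Tuple[str, str]] = []
        self._in_row = False
        self._in_licence = False
        self._name: Optional[str] = None
        self._licence: Optional[str] = None
        self._name_tag: Optional[str] = None  # tag whose text is the name
        self._name_parts: List[str] = []

    def _flush(self) -> None:
        # Keep the current row only if both fields were found, then reset.
        if self._name and self._licence:
            self.rows.append((self._name, self._licence))
        self._in_licence = False
        self._name = self._licence = self._name_tag = None
        self._name_parts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "requirement-row" in classes:
            self._flush()  # a new row starts, so the previous one is done
            self._in_row = True
            return
        if not self._in_row:
            return
        if "client-link" in classes and self._name is None:
            self._name_tag = tag
            self._name_parts = []
        elif "license" in classes:
            self._in_licence = True
        elif (self._in_licence and self._licence is None
              and tag == "span" and "size-11" in classes):
            # Like find(), keep only the first matching span in the row.
            self._licence = attributes.get("title")

    def handle_data(self, data):
        if self._name_tag is not None:
            self._name_parts.append(data)

    def handle_endtag(self, tag):
        if tag == self._name_tag:
            self._name = "".join(self._name_parts).strip()
            self._name_tag = None

    def close(self) -> None:
        super().close()
        self._flush()  # the last row has no following row to flush it


def licences_from_html(text: str) -> List[Tuple[str, str]]:
    """Return the (package, licence) pairs found in one page's HTML."""
    parser = PyupLicenceParser()
    parser.feed(text)
    parser.close()
    return parser.rows
```

Given a page’s text, `licences_from_html(...)` would take the place of the `soup.find_all(...)` loop in the script; rows missing either a name or a licence are dropped rather than raising an error.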
You’ll end up with a .csv file containing each dependency’s name and its licence; here’s an example:
django-nested-admin, BSD-2-Clause
Django, BSD-2-Clause
coverage, Apache-2.0
paramiko, LGPL-2.0-only
django-admin-rangefilter, MIT
shapely, BSD-2-Clause
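Once the CSV exists, a quick tally of packages per licence makes it easier to see which licences need a closer look, for instance copyleft ones such as the LGPL-2.0-only entry above. A minimal sketch using only the standard library; `count_by_licence` is a hypothetical helper, and `skipinitialspace=True` is there because the script prints a space after each comma.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def count_by_licence(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (licence, package count) pairs, most common first."""
    counts: Counter = Counter()
    # skipinitialspace drops the space the script prints after each comma.
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) == 2:  # ignore blank or malformed lines
            counts[row[1]] += 1
    return counts.most_common()
```

Called as `count_by_licence(open("licences.csv", newline=""))` on the six example lines above, it puts BSD-2-Clause first with a count of 3, followed by the three licences that appear once.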
I hope this is helpful. Let me know if you find a better way to do this for Python projects.