How to get a list of licences for dependencies of multiple projects

I was once tasked with getting a list of the licences for every dependency of every service my team used, and I forgot to document how I did it at the time. It may be useful if I ever need to do something similar, and since it’s not the most elegant approach, I can improve on it next time. If you end up using this method or find something better, do let me know. I always love a good automation.

Gather all repos

We were using pyup at the time, and since pyup lists all of a repo’s dependencies on a single page, I hit Ctrl+S on that page for each of our repos. It’s a bit slow, but not too bad unless you have more than 10 services/packages to collect.

I saved all the collected .html files in the same directory.
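
If the saved pages end up mixed with other files (the script itself, the browser’s "_files" folders, an output file), a small helper can pick out just the .html files. This is a minimal sketch using only the standard library; the function name find_saved_pages and its directory argument are my own illustration, not something the original script had.

```python
from pathlib import Path


def find_saved_pages(directory="."):
    """Return the saved .html files in `directory`, sorted by name.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return sorted(
        path
        for path in Path(directory).iterdir()
        # Skip folders (even ones named like pages) and non-HTML files.
        if path.is_file() and path.suffix.lower() == ".html"
    )


if __name__ == "__main__":
    for page in find_saved_pages():
        print(page.name)
```

Running it from the directory with the saved pages prints one file name per line, which is a quick way to check that every repo was captured.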

Make the CSV

Now that we have a bunch of .html files with the data we need (that page also gives us the licence for each dependency), we need to parse the HTML, collect each dependency together with its licence, and print the result out as a CSV.

Here is the script I wrote at the time:

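from os import listdir

from bs4 import BeautifulSoup


def main():
    licenses_by_package_name = {}
    for filename in listdir('.'):
        # Only parse the saved pages. This skips the script itself and the
        # licences.csv file the shell creates for the output redirect.
        if not filename.endswith('.html'):
            continue
        with open(filename) as html_file:
            soup = BeautifulSoup(html_file.read(), 'html.parser')
        # Each dependency is rendered as one "requirement-row" element.
        for requirement in soup.find_all(class_='requirement-row'):
            package = requirement.find(class_='client-link').string
            licence_part = requirement.find(class_='license')
            # The licence name is stored in the span's title attribute.
            licence = licence_part.find('span', class_='size-11')['title']
            # A package used by several repos is only kept once.
            licenses_by_package_name[package.strip()] = licence

    for package, licence in licenses_by_package_name.items():
        print("{}, {}".format(package, licence))


if __name__ == "__main__":
    main()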

I’ll leave refactoring the script as an exercise for the reader. I only needed to run it once, so I’m fine with it.

I ran it this way:

# You need to install beautifulsoup in order to run the script (`pip install beautifulsoup4`)
$ python licences.py > licences.csv

You’ll end up with a .csv file with each dependency’s name and its licence. Here’s an example:

django-nested-admin, BSD-2-Clause
Django, BSD-2-Clause
coverage, Apache-2.0
paramiko, LGPL-2.0-only
django-admin-rangefilter, MIT
shapely, BSD-2-Clause
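
One thing to watch with plain print output: a package name or licence string that contains a comma would produce a misaligned row. As a possible improvement, here is a sketch that writes the same kind of data through Python’s csv module, which quotes such fields. The function name write_licences_csv, the header row, and the sample data are all illustrative assumptions on my part.

```python
import csv
import sys


def write_licences_csv(licences_by_package_name, out=sys.stdout):
    """Write (package, licence) rows as CSV, quoting fields when needed."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["package", "licence"])
    # Sort by package name so the output is stable between runs.
    for package, licence in sorted(licences_by_package_name.items()):
        writer.writerow([package, licence])


if __name__ == "__main__":
    # Made-up sample data, only to show the quoting behaviour.
    write_licences_csv({
        "coverage": "Apache-2.0",
        "example-package": "MIT, Apache-2.0",
    })
```

Note that this writes "name,licence" without the space after the comma that the print version adds, so spreadsheet tools won’t see a leading space in the licence column.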

I hope this is helpful, and let me know if you find a better way to do this for Python projects.