feat: environment licenses as proper SPDX #459

Closed
jkowalleck opened this issue Dec 11, 2022 · 10 comments · Fixed by #576 or #605

@jkowalleck
Member

jkowalleck commented Dec 11, 2022

Currently, the licenses are read from package metadata, including trove classifiers.
The values are trove classifiers (see list A and list B).
See also PEP 639: https://peps.python.org/pep-0639
The current implementation yields named licenses such as "MIT License", because that is the value taken from the classifier License :: OSI Approved :: MIT License.

It would be great to get SPDX license IDs instead of named licenses where possible.
To that end, some well-known classifiers could be mapped to SPDX IDs. The library might already provide the factory behavior needed to create SPDX licenses instead of named ones.

All that is missing is a mapping from classifiers to SPDX ids.
See latest general list of SPDX ids: https://spdx.org/licenses/
See list of library-supported SPDX ids: https://github.com/CycloneDX/cyclonedx-python-lib/blob/main/cyclonedx/schema/spdx.schema.json

⚠️ ATTENTION:
Some classifiers do not have a unique mapping to SPDX; they map 1:N.
For example, the trove classifier License :: Eiffel Forum License (EFL) could be either EFL-1.0 or EFL-2.0 in SPDX.
I would suggest skipping unclear relations and mapping only those that are 1:1.
See the thoughts in https://peps.python.org/pep-0639/#mapping-license-classifiers-to-spdx-identifiers
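
A minimal sketch of the 1:1-only mapping suggested above. The table holds just a few illustrative entries, and the names `CLASSIFIER_TO_SPDX` and `spdx_id_for_classifier` are hypothetical; they are not part of cyclonedx-python-lib.

```python
from typing import Optional

# Illustrative subset only; a real table would list every unambiguous classifier.
# Ambiguous (1:N) classifiers such as "License :: Eiffel Forum License (EFL)"
# are deliberately left out.
CLASSIFIER_TO_SPDX = {
    "License :: OSI Approved :: MIT License": "MIT",
    "License :: OSI Approved :: ISC License (ISCL)": "ISC",
    "License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)": "MPL-2.0",
}


def spdx_id_for_classifier(classifier: str) -> Optional[str]:
    """Return the SPDX ID for a 1:1 classifier, or None if unmapped or ambiguous."""
    return CLASSIFIER_TO_SPDX.get(classifier.strip())
```

A caller could fall back to a named license whenever the lookup returns `None`, so that ambiguous classifiers keep their current behavior.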

@jkowalleck jkowalleck added enhancement New feature or request help wanted Extra attention is needed labels Dec 11, 2022
@a1lu
Contributor

a1lu commented Dec 11, 2022

PEP639 would fix this issue.

@a1lu
Contributor

a1lu commented Dec 11, 2022

I prepared a script that merges the pypi.org classifier list with the license list from https://spdx.org.

For the trove classifiers, I removed the parenthesized license tags that sometimes follow the name.

For some SPDX license names, I implemented a simple string replacement to adapt them to the classifier names.
Some licenses cannot be merged directly: they either need manual work or are missing from https://spdx.org

import urllib.request
import json
import re


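# Strips a trailing parenthesized tag, e.g. "(MPL 2.0)", from a classifier name.
# A raw string avoids invalid-escape warnings for "\(" and "\)".
regex = re.compile(r"^(.*) \(.*\)$")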

licenses = set()
with urllib.request.urlopen('https://pypi.org/pypi?%3Aaction=list_classifiers') as f:
    content = f.read().decode('utf-8')
    lines = content.split("\n")
    for line in lines:
        clsf = line.split(" :: ")
        if clsf[0] == "License":
            license = clsf[-1]
            match = regex.match(license)
            if match:
                license = match.group(1)
            licenses.add(license)

spdx_licenses = {}
with urllib.request.urlopen('https://spdx.org/licenses/licenses.json') as f:
    html = f.read().decode('utf-8')
    spdx = json.loads(html)
    for license in spdx["licenses"]:
        name = license["name"]
        if name.startswith("European Union Public License"):
            name = name.replace("License", "Licence")
        if name.startswith("GNU"):
            name = name.replace("v1.0", "v1")
            name = name.replace("v2.0", "v2")
            name = name.replace("v3.0", "v3")
            if name.endswith("only"):
                name = name.replace(" only", "")
            if name.startswith("GNU Library General"):
                name = name.replace("Library", "Lesser")
        if name.startswith("Mulan Permissive"):
            name = name.replace(", Version ", " v")
        if name == "MIT No Attribution":
            name = "MIT No Attribution License"
        if name == "The MirOS Licence":
            name = "MirOS License"
        if name in spdx_licenses:
            continue
        spdx_licenses[name] = license["licenseId"]


matching = licenses.intersection(spdx_licenses.keys())
missing = licenses.difference(spdx_licenses.keys())

print("spdx_license_map = {")
for license in sorted(matching):
    print(f"\"{license}\": \"{spdx_licenses[license]}\",")

for license in sorted(missing):
    print(f"\"{license}\": \"\",")
print("}")

Output:

spdx_license_map = {
"Aladdin Free Public License": "Aladdin",
"Attribution Assurance License": "AAL",
"Boost Software License 1.0": "BSL-1.0",
"CeCILL-B Free Software License Agreement": "CECILL-B",
"CeCILL-C Free Software License Agreement": "CECILL-C",
"Common Development and Distribution License 1.0": "CDDL-1.0",
"Eclipse Public License 1.0": "EPL-1.0",
"Eclipse Public License 2.0": "EPL-2.0",
"European Union Public Licence 1.0": "EUPL-1.0",
"European Union Public Licence 1.1": "EUPL-1.1",
"European Union Public Licence 1.2": "EUPL-1.2",
"GNU Affero General Public License v3": "AGPL-3.0",
"GNU Affero General Public License v3 or later": "AGPL-3.0-or-later",
"GNU General Public License v2": "GPL-2.0",
"GNU General Public License v2 or later": "GPL-2.0+",
"GNU General Public License v3": "GPL-3.0-only",
"GNU General Public License v3 or later": "GPL-3.0-or-later",
"GNU Lesser General Public License v2": "LGPL-2.0-only",
"GNU Lesser General Public License v2 or later": "LGPL-2.0+",
"GNU Lesser General Public License v3": "LGPL-3.0-only",
"GNU Lesser General Public License v3 or later": "LGPL-3.0-or-later",
"Historical Permission Notice and Disclaimer": "HPND",
"ISC License": "ISC",
"Intel Open Source License": "Intel",
"MIT License": "MIT",
"MIT No Attribution License": "MIT-0",
"MirOS License": "MirOS",
"Motosoto License": "Motosoto",
"Mozilla Public License 1.0": "MPL-1.0",
"Mozilla Public License 1.1": "MPL-1.1",
"Mozilla Public License 2.0": "MPL-2.0",
"Mulan Permissive Software License v2": "MulanPSL-2.0",
"Nethack General Public License": "NGPL",
"Nokia Open Source License": "Nokia",
"Open Group Test Suite License": "OGTSL",
"Open Software License 3.0": "OSL-3.0",
"PostgreSQL License": "PostgreSQL",
"Ricoh Source Code Public License": "RSCPL",
"SIL Open Font License 1.1": "OFL-1.1",
"Sleepycat License": "Sleepycat",
"The Unlicense": "Unlicense",
"University of Illinois/NCSA Open Source License": "NCSA",
"X.Net License": "Xnet",
"Academic Free License": "",
"Apache Software License": "",
"Apple Public Source License": "",
"Artistic License": "",
"BSD License": "",
"CC0 1.0 Universal (CC0 1.0) Public Domain Dedication": "",
"CEA CNRS Inria Logiciel Libre License, version 2.1": "",
"Common Public License": "",
"DFSG approved": "",
"Eiffel Forum License": "",
"Free For Educational Use": "",
"Free For Home Use": "",
"Free To Use But Restricted": "",
"Free for non-commercial use": "",
"Freely Distributable": "",
"Freeware": "",
"GNU Free Documentation License": "",
"GNU General Public License": "",
"GNU Library or Lesser General Public License": "",
"GUST Font License 1.0": "",
"GUST Font License 2006-09-30": "",
"IBM Public License": "",
"Jabber Open Source License": "",
"MITRE Collaborative Virtual Workspace License": "",
"Netscape Public License": "",
"OSI Approved": "",
"Other/Proprietary License": "",
"Public Domain": "",
"Python License": "",
"Python Software Foundation License": "",
"Qt Public License": "",
"Repoze Public License": "",
"Sun Industry Standards Source License": "",
"Sun Public License": "",
"Universal Permissive License": "",
"Vovida Software License 1.0": "",
"W3C License": "",
"Zope Public License": "",
"zlib/libpng License": "",
}

Feel free to use and adapt it as you like.
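
As a small follow-up sketch, the generated map could be split into entries that resolved to an SPDX ID and entries that still need a manual decision. The excerpt below is a hypothetical subset of the output above, and `resolved` / `unresolved` are illustrative names.

```python
# Hypothetical excerpt of the generated map shown above.
spdx_license_map = {
    "MIT License": "MIT",
    "ISC License": "ISC",
    "Apache Software License": "",
    "BSD License": "",
}

# Keep only the names that resolved to a non-empty SPDX ID.
resolved = {name: spdx_id for name, spdx_id in spdx_license_map.items() if spdx_id}

# Names without an SPDX ID; these are ambiguous or missing and need manual review.
unresolved = sorted(name for name, spdx_id in spdx_license_map.items() if not spdx_id)
```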

@pombredanne

@a1lu you wrote:

PEP639 would fix this issue.

I initiated this PEP, and even once it is approved it will take years to trickle down, IMHO.

@pombredanne

Note that you should consider using scancode-toolkit for the license detection: it does the work alright!

@andife
Contributor

andife commented Aug 29, 2023

Is there any activity on this topic? I would be very interested in it.

One idea for a better mapping of the licenses: you could also read the license_files metadata field in the wheel, or the license file itself, and check which known license it is most similar to.
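
A rough sketch of the similarity idea, using `difflib.SequenceMatcher` from the standard library. The function names, the `threshold` default, and the shape of `reference_texts` (SPDX ID mapped to reference license text) are all assumptions for illustration; a dedicated scanner is likely to be more robust than this character-level comparison.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not matter."""
    return " ".join(text.lower().split())


def most_similar_license(
    license_text: str,
    reference_texts: Dict[str, str],
    threshold: float = 0.9,
) -> Optional[Tuple[str, float]]:
    """Return (spdx_id, score) of the closest reference text, or None below threshold."""
    candidate = normalize(license_text)
    best_id, best_score = None, 0.0
    for spdx_id, reference in reference_texts.items():
        score = SequenceMatcher(None, candidate, normalize(reference)).ratio()
        if score > best_score:
            best_id, best_score = spdx_id, score
    if best_id is None or best_score < threshold:
        return None
    return best_id, best_score
```

Returning `None` below the threshold keeps the behavior conservative: an uncertain match is treated the same as no match.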

@jkowalleck
Member Author

jkowalleck commented Aug 29, 2023

@jkowalleck jkowalleck removed the help wanted Extra attention is needed label Aug 29, 2023
@jkowalleck
Member Author

Thanks to #571, I learned that the topic is still unsolved.

@jkowalleck jkowalleck added the help wanted Extra attention is needed label Aug 31, 2023
@jkowalleck
Member Author

jkowalleck commented Aug 31, 2023

re #459 (comment)
re #459 (comment)

It looks like the license factory is prepared, but either it is not used or its input is insufficient.
There is still no mapping from license trove classifiers to SPDX values.

In addition, some inputs are not properly detected as SPDX. For example, "MIT" is a known ID, but it is not recognized as such.
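
To illustrate the detection problem, here is a minimal sketch that checks whether a declared license string is already a known SPDX ID. `KNOWN_SPDX_IDS` is a hypothetical subset and `as_spdx_id` is an illustrative name, not the library's API; the sketch also assumes that case-insensitive matching is acceptable.

```python
from typing import Optional

# Hypothetical subset; a full list could be taken from the SPDX license list.
KNOWN_SPDX_IDS = {"MIT", "ISC", "Apache-2.0", "MPL-2.0"}

# Lowercased lookup table that maps back to the canonical spelling.
_BY_LOWERCASE = {spdx_id.lower(): spdx_id for spdx_id in KNOWN_SPDX_IDS}


def as_spdx_id(declared: str) -> Optional[str]:
    """Return the canonical SPDX ID if `declared` is one, else None."""
    return _BY_LOWERCASE.get(declared.strip().lower())
```

With such a check in front of the factory, an input of "MIT" would be treated as an SPDX ID, while "MIT License" would still be handled as a named license.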

@jkowalleck jkowalleck changed the title feat: licenses as proper SPDX feat: environment licenses as proper SPDX Aug 31, 2023
@jkowalleck jkowalleck removed the help wanted Extra attention is needed label Aug 31, 2023
@jkowalleck jkowalleck self-assigned this Aug 31, 2023
@jkowalleck jkowalleck added this to the 4.0.0 milestone Sep 6, 2023
@jkowalleck
Member Author

✔️ SPDX id mapping via #576
✔️ SPDX expressions via #578

@jkowalleck jkowalleck linked a pull request Oct 24, 2023 that will close this issue
@jkowalleck jkowalleck linked a pull request Oct 25, 2023 that will close this issue
@jkowalleck
Member Author

This feature will be part of the next/upcoming major release.
Changelog: see #605
Install via: pip install cyclonedx-bom==4.0.0rc1
