1577: update and clean license list generation to return more SPDXID for more inputs#1691
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Benchmark Test Results
Benchmark results from the latest changes vs base branch
| "apl1": "APL-1.0", | ||
| "apl1.0": "APL-1.0", | ||
| "apl1.0.0": "APL-1.0", | ||
| "apps2.0.0p": "App-s2p", |
The logic now detects versions to expand from the simplified string. We can talk through whether we want to handle cases like s2p, where 2 is not a version. I don't think there will be an App-s3p in the future.
There seem to be a couple of these cases, which has caused the list to grow by 18 lines: potentially 6 strings that now expand into version permutations where they did not before this PR.
| `)) | ||
|
|
||
| var versionMatch = regexp.MustCompile(`-([0-9]+)\.?([0-9]+)?\.?([0-9]+)?\.?`) | ||
| var versionMatch = regexp.MustCompile(`([0-9]+)\.?([0-9]+)?\.?([0-9]+)?\.?`) |
This causes us to match on numbers in license IDs that would otherwise not be considered versions, but I think the trade-off is a good one: we can match more strings out in the wild and return a correct SPDX ID.
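To make the trade-off concrete, here is a minimal sketch that runs both patterns from the diff against a few IDs. It assumes the pattern with the leading `-` is the previous one and the pattern without it is the new one. The variable names and the `firstMatch` helper are made up for illustration and are not part of the generator.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are copied from the diff; the variable names are hypothetical.
var (
	// Assumed to be the previous pattern: a version must follow a "-".
	dashedVersion = regexp.MustCompile(`-([0-9]+)\.?([0-9]+)?\.?([0-9]+)?\.?`)
	// Assumed to be the new pattern: any run of digits can start a version.
	bareVersion = regexp.MustCompile(`([0-9]+)\.?([0-9]+)?\.?([0-9]+)?\.?`)
)

// firstMatch returns the leftmost match of re in id, or "" if there is none.
func firstMatch(re *regexp.Regexp, id string) string {
	return re.FindString(id)
}

func main() {
	for _, id := range []string{"APL-1.0", "GPL3", "App-s2p"} {
		fmt.Printf("%-8s dashed=%q bare=%q\n",
			id, firstMatch(dashedVersion, id), firstMatch(bareVersion, id))
	}
}
```

With the dash-less pattern, `GPL3` yields a version where the dashed pattern finds none, which is the gain. `App-s2p` now also yields `2`, which is the false positive discussed above.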
| // so we need to guarantee the order they are created to avoid mapping them wrongly. So we use a sorted list. | ||
| // To overwrite deprecated licenses during the first pass we would later on rely on map order, | ||
| // [which in go is not consistent by design](https://stackoverflow.com/a/55925880). | ||
| // The order of variations/permutations of a license ID matter. |
Simplified this logic down to a single pass after we sort the list
I commented out the log messages to reduce noise, since this is just a script we run to generate our license list and not part of the larger syft program.
You can turn them back on to see how the sort prevents things like alpm1 from mapping to later versions.
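For readers who want to see why sorting matters, here is a minimal sketch of the sort-then-single-pass idea. It assumes a "first writer wins" rule, which may not match the generator exactly. `buildIndex`, the input shape, and the `ABC-1.0`/`ABC-1.1` IDs are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// buildIndex is a hypothetical helper. perms maps a license ID to the
// permutation keys it can produce. IDs are visited in sorted order and a key
// is only claimed by the first ID that produces it, so the result does not
// depend on Go's randomized map iteration order.
func buildIndex(perms map[string][]string) map[string]string {
	ids := make([]string, 0, len(perms))
	for id := range perms {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	index := make(map[string]string)
	for _, id := range ids {
		for _, key := range perms[id] {
			if _, taken := index[key]; !taken {
				index[key] = id
			}
		}
	}
	return index
}

func main() {
	// Made-up licenses: both versions can collapse to the same short key.
	perms := map[string][]string{
		"ABC-1.1": {"abc1", "abc1.1"},
		"ABC-1.0": {"abc1", "abc1.0"},
	}
	fmt.Println(buildIndex(perms)["abc1"])
}
```

Without the sort, the winner for `abc1` could change between runs of the generator, because map iteration order in Go is not stable.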
|
|
||
| // We want to replace deprecated licenses with non-deprecated counterparts | ||
| // For more information, see: https://github.com/spdx/license-list-XML/issues/1676 | ||
| if other.Deprecated { |
findReplacementLicense already assumes a deprecated input
| return l.ID == other.ID | ||
| } | ||
|
|
||
| func (ll LicenseList) findReplacementLicense(deprecated License) *License { |
moved above the function canReplace for readability
| return "", false | ||
| } | ||
|
|
||
| func cleanLicenseID(id string) string { |
Had to duplicate this between generate and the actual package. If there is a better way to share the code happy to update to that!
| true, | ||
| }, | ||
| // the below few cases are NOT expected, however, seem unavoidable given the current approach | ||
| { |
No longer returning true 🥳
|
Things to look at in the AM:
|
Update the license_list.go to have more permissible inputs for greater SPDXID matching.

EX: GPL3, gpl3, gpl-3, and GPL-3 can all map to GPL-3.0-only.

By moving all strings to lower and removing the "-" we're able to return valid SPDX license ID for a greater diversity of input strings.

---------

Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Summary
Update `license_list.go` to have more permissible inputs for greater SPDXID matching.

The `spdxlicense` package contains the `ID` method, which interacts with the generated file `license_list.go`.

The current implementation contains `-` in the keys. This PR removes these `-` and sanitizes the inputs so that we can match on a wider range of inputs found in the wild.

`spdxlicense.ID` has also been changed to consider only the generated list and to return a value if one exists.

The logic of how this information is encoded has temporarily been moved to the format helpers.
Note this is temporary. #1554 will be used to update our license parsing logic so that license creation is done at the same time as package creation.
In a follow-up PR, encoders and other middle layers of syft should no longer have any concerns about updating or finding the correct SPDXID or Expression, as this will be done when packages are created at the cataloger level.
Example:
`GPL3`, `gpl3`, `gpl-3`, and `GPL-3` can all map to `GPL-3.0-only`.

By moving all strings to lower and removing the `-`, we're able to return a valid SPDX license ID for a greater diversity of input strings.
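A minimal sketch of the lookup described above, assuming the cleaning step is only "lowercase, then drop `-`". The real `cleanLicenseID` and `spdxlicense.ID` may do more than this. `normalizeID`, `lookup`, and `sampleIndex` are hypothetical names, and `sampleIndex` is a hand-written stand-in for the generated map in `license_list.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// sampleIndex is a tiny hand-written stand-in for the generated list.
// The two entries mirror examples from this PR; the real map is much larger.
var sampleIndex = map[string]string{
	"gpl3": "GPL-3.0-only",
	"apl1": "APL-1.0",
}

// normalizeID is a hypothetical cleaner: lowercase the input and remove "-".
func normalizeID(id string) string {
	return strings.ReplaceAll(strings.ToLower(id), "-", "")
}

// lookup consults only the index and reports whether a value exists.
func lookup(id string) (string, bool) {
	spdxID, ok := sampleIndex[normalizeID(id)]
	return spdxID, ok
}

func main() {
	for _, in := range []string{"GPL3", "gpl3", "gpl-3", "GPL-3", "not-a-license"} {
		spdxID, ok := lookup(in)
		fmt.Printf("%-14s -> %q (found=%v)\n", in, spdxID, ok)
	}
}
```

Because the keys and the inputs go through the same normalization, all four GPL spellings hit the same key, and an unknown input returns `false` instead of a guess.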