
feat: add option for including full license content in the SBOM #3856

Closed
spiffcs wants to merge 9 commits into main from 3088-catalog-full-licenses


spiffcs (Contributor) commented May 5, 2025

Description

This PR adds an option that lets users configure syft to include the full text of the licenses it encounters during scanning.

Currently, syft can be configured to include the contents of licenses for which no valid SPDX ID can be determined. This PR adds an additional config option called `include-full-text`.

By default, syft does not include any license contents in the SBOM.

If a user sets `include-unknown-license-content`, syft includes content ONLY when an SPDX license ID cannot be determined.

If a user sets `include-full-text`, syft tries to include the available content for every license object it returns.
NOTE: this option can lead to very large SBOMs.
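A minimal Go sketch of how the three modes described above could be resolved for a single license. `LicenseOptions` and `shouldIncludeContent` are hypothetical names for illustration, not syft's actual configuration types; giving `include-full-text` precedence when both flags are set is also an assumption.

```go
package main

import "fmt"

// LicenseOptions is a hypothetical mirror of the two config flags described in this PR.
type LicenseOptions struct {
	IncludeUnknownLicenseContent bool // include-unknown-license-content
	IncludeFullText              bool // include-full-text
}

// shouldIncludeContent reports whether a license's text content would be kept,
// given whether the scanner resolved a valid SPDX license ID for it.
func shouldIncludeContent(opts LicenseOptions, hasValidSPDXID bool) bool {
	switch {
	case opts.IncludeFullText:
		// include-full-text: try to keep content for every license (assumed to win if both are set).
		return true
	case opts.IncludeUnknownLicenseContent:
		// include-unknown-license-content: keep content only when no SPDX ID was found.
		return !hasValidSPDXID
	default:
		// Default: no license content in the SBOM.
		return false
	}
}

func main() {
	opts := LicenseOptions{IncludeUnknownLicenseContent: true}
	fmt.Println(shouldIncludeContent(opts, true))  // false: SPDX ID resolved, content dropped
	fmt.Println(shouldIncludeContent(opts, false)) // true: no SPDX ID, content kept
}
```

A switch keeps the precedence between the flags explicit, which matters because enabling `include-full-text` makes `include-unknown-license-content` redundant.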

Reviewer Notes

This new option gives the license scanner more nuance, which can change the number of licenses returned for a package. Consider the new test case in which multiple MIT licenses are returned from scanning a single file, where different content offsets mark each place the license was discovered.

Previously, users would see a single MIT entry in the licenses. With this option enabled, licenses with the same ID are no longer deduplicated because their content fields differ. This part of the design should be considered before merging.

Imagine a license file where multiple IDs are extracted from the same text. Each ID leads to a separate license being created. To avoid creating multiple licenses with identical content fields, we use the returned Offset to read the file at the points where the given ID was discovered. This makes our license objects more discerning at the cost of a larger SBOM.
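The offset-based approach can be sketched in Go as follows. `Match`, `License`, and `licensesFromMatches` are hypothetical types and helpers, not syft's scanner API. The sketch assumes each match carries a `[Start, End)` byte range into the scanned file, and that two licenses count as duplicates only when both their ID and their contents are equal.

```go
package main

import "fmt"

// Match is a hypothetical scanner result: a license ID plus the byte range
// [Start, End) in the scanned file where that license text was found.
type Match struct {
	ID    string
	Start int
	End   int
}

// License is a hypothetical SBOM license entry carrying the extracted text.
type License struct {
	ID       string
	Contents string
}

// licensesFromMatches reads the file at each match's offsets and keeps one
// License per distinct (ID, Contents) pair, preserving first-seen order.
func licensesFromMatches(file []byte, matches []Match) []License {
	seen := make(map[License]bool)
	var out []License
	for _, m := range matches {
		if m.Start < 0 || m.End > len(file) || m.Start >= m.End {
			continue // skip offsets that fall outside the file
		}
		l := License{ID: m.ID, Contents: string(file[m.Start:m.End])}
		if seen[l] {
			continue // identical ID and contents: already recorded
		}
		seen[l] = true
		out = append(out, l)
	}
	return out
}

func main() {
	// Three hypothetical MIT matches in one file; two point at identical text.
	file := []byte("AAAA|AAAA|BBBB")
	matches := []Match{
		{ID: "MIT", Start: 0, End: 4},
		{ID: "MIT", Start: 5, End: 9},
		{ID: "MIT", Start: 10, End: 14},
	}
	for _, l := range licensesFromMatches(file, matches) {
		fmt.Printf("%s %q\n", l.ID, l.Contents)
	}
}
```

With the sample input, the three MIT matches collapse to two entries (`MIT "AAAA"` and `MIT "BBBB"`) because two of them point at identical text. Deduplicating by ID alone would have produced a single MIT entry, which is the behavior change described in the reviewer notes.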

Depends on #3857
Fixes #3088

  • New feature (non-breaking change which adds functionality)
  • Documentation (updates the documentation)

Checklist:

  • I have added unit tests that cover changed behavior
  • I have tested my code in common scenarios and confirmed there are no regressions
  • I have added comments to my code, particularly in hard-to-understand sections

Todo

  • [ ] Update individual catalogers to use the new scanner and ensure they make a best effort to return contents when the option is enabled

spiffcs added 9 commits May 5, 2025 12:36
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
* main:
  fix: use "contents" field and remove "fullText" license field  (#3857)
  Add `deep-squashed` scope to annotate all layers where a package exists (#3138)
  fix: propagate unarchive error of file source (#3845)
* main:
  annotate hidden paths in all-layers scope (#3855)
…uction

* main:
  chore: update license sort to be stable with contents field (#3860)
@spiffcs spiffcs marked this pull request as ready for review May 6, 2025 16:08
@spiffcs spiffcs requested a review from wagoodman May 6, 2025 16:08
@spiffcs spiffcs changed the title feat: add options for cataloging full licenses feat: add option for including full license content in the SBOM May 6, 2025
@spiffcs spiffcs marked this pull request as draft May 6, 2025 16:32
@spiffcs spiffcs added this to OSS May 6, 2025
@spiffcs spiffcs moved this to In Progress in OSS May 6, 2025
@spiffcs spiffcs self-assigned this May 6, 2025
spiffcs (Contributor, Author) commented May 12, 2025

Superseded by #3876

@spiffcs spiffcs closed this May 12, 2025
@github-project-automation github-project-automation Bot moved this from In Progress to Done in OSS May 12, 2025
@spiffcs spiffcs deleted the 3088-catalog-full-licenses branch May 13, 2025 03:14


Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Detect whether full license text or a license name has been provided
