Skip to content

feat: Rework amend command, add optional operations#160

Merged
italvi merged 20 commits intomainfrom
144-add-amend-operator-to-delete-ambigious-licenses
May 13, 2024
Merged

feat: Rework amend command, add optional operations#160
italvi merged 20 commits intomainfrom
144-add-amend-operator-to-delete-ambigious-licenses

Conversation

@mmarseu
Copy link
Copy Markdown
Collaborator

@mmarseu mmarseu commented Apr 15, 2024

Overview

This PR originally only intended to add a new operation to the amend command. This addition triggered others and snowballed into a rather large overhaul of the command. I'll try to go into all changes in adequate detail.

  1. New operation DeleteAmbigiousLicenses which deletes licenses identified only by name, with no other context that provides more insight about the license's content. This is not very contentious and requires little further explanation. See Add amend operator to delete ambigious licenses #144 for more.
  2. New command-line option to select operations to run.
  3. Breaking change: InferCopyright no longer runs by default.
  4. Breaking change: Broken up ProcessLicense
  5. Breaking change: InferSupplier no longer adds supplier information to components which already have comparable fields.
  6. Breaking change: Compositions now sets aggregate unknown instead of incomplete
  7. Improved documentation of design constraints in cdxev/amend/operations.py.
  8. Remove semi-integration tests from tests/test_amend.py

2. New command-line option to select operations to run

Because the added DeleteAmbigiousLicenses command can be dangerous, it shouldn't run by default. This necessitates a new mechanism for the user to select the operations to run.

This change introduces the --operations command-line option. It takes a list of operation names to run. If the option is not provided, a default set of operations is run.

3. Breaking change: InferCopyright no longer runs by default

The set of default operations includes all pre-existing operations with one exception: InferCopyright is extremely dangerous and its outright removal from the tool is currently in debate. Pending any decision, this PR makes it an opt-in operation.

4. Breaking change: Broken up ProcessLicense

4.1 Removed feature to delete licenses with name "unknown"

The ProcessLicense operation previously included a feature that deleted any licenses whose name includes the word "unknown" and that doesn't also specify the full text.
This has been been removed for the following reasons:

  • It also deleted licenses that include a URL to the license text but not the text itself.
  • It is unclear why licenses whose name contains the word "unknown" should be treated separately from any other license with similarly inexpressive names.

If the feature is nevertheless desired, it can always be re-added as a separate operation. In that case, it shouldn't be added to the default set.

4.2 New operation LicenseNameToId

This operation runs by default and takes over the part of ProcessLicense that translates well-known names to SPDX ids.

4.3 New operation AddLicenseText

This operation takes over the part of ProcessLicense that adds full text to licenses with only a name.

It doesn't run by default because it needs a parameter to be specified. In the interest of usability, we shouldn't force users to provide a command-line option if they don't want to use the feature.

There are a few important changes:

  • The new operation no longer overwrites existing license texts. I find this to be risky with no practical benefit. If an SBOM contains a license text, chances are there is a reason for that and we shouldn't overwrite it with another text that potentially only shares the same name by coincidence.
  • It no longer adds an empty string as text to licenses where no text has been found. That intentionally makes the SBOM worse than it was before. If it was meant to shut up our custom validator, I would very much prefer to force SBOM authors to deal with such cases explicitly than to include any number of "fake-compliant" licenses in the output.
    If this feature is nevertheless desired, we can add it as a separate, non-default operation.

5. Breaking change: InferSupplier is now more selective

In my effort to make the operation documentation fit for printing in the CLI, I discovered that InferSupplier was unnecessarily zealous. Given the inherent uncertainty in the information it generates (we can't really be sure that the generated supplier is valid), it probably should only operate on the components where it is really required. That's why it now no longer adds supplier information to components that already contain comparable fields (e.g., author, manufacturer, etc.).

6. Breaking change: Compositions now sets aggregate unknown instead of incomplete

Discussion: #161
Besides the aggregate, this PR also changes the behavior regarding the metadata component. Previously, this component was treated the same as any other. With this change, the metadata component keeps its original aggregate, it is not set to "unknown", unless it was "unknown" in the input.

7. Improved documentation of design constraints

The fact that individual operations are now exposed on the CLI for manual selection by the user leads to a few new design considerations for operations which have been documented in the module docstring in cdxev/amend/operations.py.
The upshot is: keep it simple, keep it small, and be damn sure you don't run dangerous operations by default.

8. Remove semi-integration tests from tests/test_amend.py

Some tests have been removed in preparation of #157, where they will be grouped together in one module.

@mmarseu mmarseu self-assigned this Apr 15, 2024
@mmarseu mmarseu linked an issue Apr 15, 2024 that may be closed by this pull request
@mmarseu mmarseu changed the title Add amend operator to delete ambigious licenses feat: Add amend operator to delete ambigious licenses Apr 15, 2024
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 15, 2024

Coverage

Coverage Report •
FileStmtsMissCoverMissing
__main__.py3101595%205–206, 223, 623–624, 628–633, 635, 638, 650–651
amend
   command.py340100% 
   license.py130100% 
   operations.py210597%386, 391, 429, 455, 523
TOTAL15836595% 

Tests Skipped Failures Errors Time
283 2 💤 0 ❌ 0 🔥 4.252s ⏱️

@mmarseu mmarseu added the dependencies Pull requests that update a dependency file label Apr 15, 2024
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 15, 2024
Comment thread cdxev/amend/operations.py Outdated
@mmarseu mmarseu linked an issue Apr 22, 2024 that may be closed by this pull request
@mmarseu mmarseu force-pushed the 144-add-amend-operator-to-delete-ambigious-licenses branch 2 times, most recently from fed482f to 0734ed2 Compare April 23, 2024 13:24
@mmarseu mmarseu marked this pull request as ready for review April 23, 2024 14:07
@mmarseu mmarseu requested review from CBeck-96 and italvi April 23, 2024 14:07
@mmarseu mmarseu linked an issue Apr 24, 2024 that may be closed by this pull request
Copy link
Copy Markdown
Collaborator

@italvi italvi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmarseu I did some tests and seems like most of the commands are working as expected, so after doing the changes, we should be able to merge this one (as the comments are mostly regarding grammar/comments)

However, some more remarks:

  • Maybe you should rename this MR as it does more than described and is more a rework than just feat.
  • I'm fine with "3. Breaking change: InferCopyright no longer runs by default", now it is up to anybody to decide whether they want to do this or not. Maybe a print to stdout with an INFO would be useful, like it is already done in class LicenseNameToId and AddLicenseText, would even add some consistency and then the changes are traceable 😉.
  • In "4. Breaking change: Broken up ProcessLicense" you mention that "It no longer adds an empty string as text to licenses where no text has been found. That intentionally makes the SBOM worse than it was before", however if I only have one ambiguous licenses in the licenses-array, then --operation delete-ambiguous-licenses leads to an empty array, i.e. licenses: []. Should not we, in this case, remove the key licenses completely, as an empty array does not add any (information) value?
    Why the "one" is bold: It seems that --operation delete-ambiguous-licenses only deletes the first license with license.name, e.g.
"licenses": [
                {"license": {"id": "Apache-2.0"}},
                {"license": {"name": "Some license"}},
                {"licence": {"name": "blub"}}
            ],

leads to

"licenses": [
                {"license": {"id": "Apache-2.0"}},
                {"licence": {"name": "blub"}}
            ],

This is something you do not have included in your tests, there you only have the case one id and one name.

  • Do we need a test for the case that license.name is similar to an SPDX-ID and the delete-ambiguous-licenses operation is used? Just to make sure that it maps it to a license.id and does not delete it.
  • The reason why we wanted supplier, which you now changed with "5. Breaking change: InferSupplier is now more selective", was the current tool landscape: Most of the tools (e.g. DependencyTrack and BlackDuck) do not understand e.g. publisher, mostly they either only supported supplier or, additionally, author.

Comment thread NOTICE Outdated
Comment thread cdxev/amend/command.py Outdated
Comment thread cdxev/amend/command.py Outdated
Comment thread cdxev/__main__.py Outdated
Comment thread cdxev/__main__.py Outdated
Comment thread tests/auxiliary/licenses/license_name
Comment thread docs/available_commands.md
Comment thread tests/test_main.py
Comment thread cdxev/amend/operations.py Outdated
Comment thread cdxev/amend/operations.py Outdated
@mmarseu
Copy link
Copy Markdown
Collaborator Author

mmarseu commented May 6, 2024

@mmarseu I did some tests and seems like most of the commands are working as expected, so after doing the changes, we should be able to merge this one (as the comments are mostly regarding grammar/comments)

However, some more remarks:

  • Maybe you should rename this MR as it does more than described and is more a rework than just feat.

Fair enough.

  • I'm fine with "3. Breaking change: InferCopyright no longer runs by default", now it is up to anybody to decide whether they want to do this or not. Maybe a print to stdout with an INFO would be useful, like it is already done in class LicenseNameToId and AddLicenseText, would even add some consistency and then the changes are traceable 😉.

Done

  • In "4. Breaking change: Broken up ProcessLicense" you mention that "It no longer adds an empty string as text to licenses where no text has been found. That intentionally makes the SBOM worse than it was before", however if I only have one ambiguous licenses in the licenses-array, then --operation delete-ambiguous-licenses leads to an empty array, i.e. licenses: []. Should not we, in this case, remove the key licenses completely, as an empty array does not add any (information) value?

We can. I'll add it.

Why the "one" is bold: It seems that --operation delete-ambiguous-licenses only deletes the first license with license.name, e.g.

"licenses": [
                {"license": {"id": "Apache-2.0"}},
                {"license": {"name": "Some license"}},
                {"licence": {"name": "blub"}}
            ],

leads to

"licenses": [
                {"license": {"id": "Apache-2.0"}},
                {"licence": {"name": "blub"}}
            ],

This is something you do not have included in your tests, there you only have the case one id and one name.

I've added a test for this. Weirdly, it works perfectly fine for me 😕

  • Do we need a test for the case that license.name is similar to an SPDX-ID and the delete-ambiguous-licenses operation is used? Just to make sure that it maps it to a license.id and does not delete it.

I don't think so. The user is free to make sure that they run license-name-to-id before delete-ambiguous-licenses. If they don't, they might run into this problem. In practice, this shouldn't be very frequent, because delete-ambiguous-licenses doesn't run by default. Most people outside Festo wouldn't ever use it. Internally, we need to make sure our pipeline templates invoke operations in the correct order.

  • The reason why we wanted supplier, which you now changed with "5. Breaking change: InferSupplier is now more selective", was the current tool landscape: Most of the tools (e.g. DependencyTrack and BlackDuck) do not understand e.g. publisher, mostly they either only supported supplier or, additionally, author.

I understand. I'll revert this particular change.

@mmarseu mmarseu changed the title feat: Add amend operator to delete ambigious licenses feat: Rework amend command, add optional operations May 6, 2024
@mmarseu mmarseu requested a review from italvi May 6, 2024 13:50
Copy link
Copy Markdown
Collaborator

@italvi italvi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmarseu please resolve the merge conflicts and comment how to proceed with the year in NOTICE. But otherwise I do not have any further comments 😃

@mmarseu mmarseu force-pushed the 144-add-amend-operator-to-delete-ambigious-licenses branch from 89d5669 to 14e5942 Compare May 13, 2024 12:27
@mmarseu
Copy link
Copy Markdown
Collaborator Author

mmarseu commented May 13, 2024

@mmarseu please resolve the merge conflicts and comment how to proceed with the year in NOTICE. But otherwise I do not have any further comments 😃

I've added the year and opened #179 to discuss options for automation.

@italvi italvi merged commit b04bdc5 into main May 13, 2024
@italvi italvi deleted the 144-add-amend-operator-to-delete-ambigious-licenses branch May 13, 2024 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file documentation Improvements or additions to documentation enhancement New feature or request settings_changes unittests

Projects

None yet

2 participants