Improve license detection for wrong SPDX license identifiers #3912
Comments
create a rule for
and make this 99 relevant that's the approach for BSD's that will be picked over the SPDX detection, it should at least
Add a new matcher_order attribute to LicenseMatch and use it for sorting matches rather than the matcher string. This way we can ensure that there is a proper precedence between matchers when two matches are matching exactly the same text.

The new sort order for matchers is:
- 0: 1-hash
- 1: 2-aho
- 2: 1-spdx-id
- 3: 3-seq
- 4: 5-undetected
- 5: 5-aho-frag
- 6: 6-unknown

The outcome is that a hash or aho match for the same text at the same position will take precedence over the SPDX id match, allowing us to curate and correct some incorrect license expressions if needed.

Reference: #3912
Reported-by: Ayan Sinha Mahapatra <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
I pushed a fix in c581828. The default sort order of LicenseMatch was based on the "matcher" string, hence a "1-spdx-id" match would always beat a "2-aho" match. Now we have a new "matcher_order" integer attribute that is used for sorting instead, so hash and aho matches always take precedence over SPDX.
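To illustrate the difference between the two sort keys, here is a minimal sketch. It is not ScanCode's actual code: the Match class, the two sort helpers, the (start, end, ...) key layout and the sample license expressions are all hypothetical, and only the matcher names and their order values come from the commit message.

```python
from dataclasses import dataclass

# Precedence values as listed in the commit message for this issue.
MATCHER_ORDER = {
    "1-hash": 0,
    "2-aho": 1,
    "1-spdx-id": 2,
    "3-seq": 3,
    "5-undetected": 4,
    "5-aho-frag": 5,
    "6-unknown": 6,
}


@dataclass(frozen=True)
class Match:
    """Hypothetical stand-in for a license match; not the real LicenseMatch."""
    matcher: str
    start: int
    end: int
    expression: str  # illustrative value only

    @property
    def matcher_order(self) -> int:
        return MATCHER_ORDER[self.matcher]


def sort_by_matcher_string(matches):
    # Old behaviour (assumed key layout): ties on position are broken by the
    # matcher *string*, so "1-spdx-id" sorts before "2-aho".
    return sorted(matches, key=lambda m: (m.start, m.end, m.matcher))


def sort_by_matcher_order(matches):
    # New behaviour: ties on position are broken by the integer order.
    return sorted(matches, key=lambda m: (m.start, m.end, m.matcher_order))


# Two matches covering exactly the same text span.
spdx = Match("1-spdx-id", 0, 3, "unknown-spdx")
aho = Match("2-aho", 0, 3, "bsd-new")

old_first = sort_by_matcher_string([aho, spdx])[0]
new_first = sort_by_matcher_order([spdx, aho])[0]

print(old_first.matcher)  # 1-spdx-id
print(new_first.matcher)  # 2-aho
```

With the string key the SPDX match wins the tie purely because "1" sorts before "2"; with the integer key a curated rule matched by hash or aho at the same position comes first.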
Consider the following text:
Here BSD is not a valid license expression, and even adding a rule is insufficient because the SPDX-License-Identifier based detection was moved before the hash license detection. We should either: