#3659 Fixed copyright detection normalization #3939
Closed
Normalize Copyright Symbols in License Detection
What does this PR do?
This PR normalizes copyright symbols from [C] to (C) in the license detection logic.
Why is this change necessary?
The current detection logic fails to recognize [C] as a valid copyright sign, leading to false negatives in copyright detection.
How was this change implemented?
A new function, normalize_copyright_symbols, was added to copyrights.py. It replaces [C] with (C) and handles variations of the copyright statement.
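As a rough illustration of the normalization described above, here is a minimal sketch. The regular expression, the case-insensitive handling of [c], and the tolerance for whitespace inside the brackets are assumptions made for this example; the code actually added in the PR may differ.

```python
import re

# Assumed pattern: "[C]" or "[c]", optionally with whitespace inside the brackets.
# This is a guess at the intended behavior, not the PR's actual pattern.
_BRACKETED_C = re.compile(r"\[\s*[cC]\s*\]")


def normalize_copyright_symbols(text: str) -> str:
    """Rewrite bracketed copyright signs such as "[C]" to "(C)".

    Text that already uses "(C)" is returned unchanged.
    """
    return _BRACKETED_C.sub("(C)", text)


if __name__ == "__main__":
    print(normalize_copyright_symbols("Copyright [C] 2024 Example Corp"))
```

A blanket substitution like this also rewrites unrelated text such as an array index `a[c]`, so a real implementation would probably need to apply it only to lines that look like copyright statements.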
What are the benefits of this change?
This change improves the accuracy of copyright detection, so that statements written with [C] are recognized alongside those written with (C).
Are there any breaking changes?
No breaking changes are introduced in this PR.
Testing and Validation:
Unit tests were updated to include cases with both [C] and (C) symbols to ensure they are correctly processed.
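The kind of test cases mentioned above could look roughly like the following sketch. The stand-in function, the sample strings, and the failing_cases helper are all hypothetical and exist only to keep the example self-contained; they are not the tests from this PR.

```python
import re


def normalize_copyright_symbols(text):
    # Stand-in for the function added in the PR (assumed behavior only).
    return re.sub(r"\[\s*[cC]\s*\]", "(C)", text)


# (input, expected) pairs covering both bracketed and parenthesized symbols.
CASES = [
    ("Copyright [C] 2024 Example Corp", "Copyright (C) 2024 Example Corp"),
    ("Copyright (C) 2024 Example Corp", "Copyright (C) 2024 Example Corp"),
    ("[C] 2024 Example Corp", "(C) 2024 Example Corp"),
]


def failing_cases():
    """Return (input, expected, actual) for every case that does not match."""
    failures = []
    for raw, expected in CASES:
        actual = normalize_copyright_symbols(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures


if __name__ == "__main__":
    assert failing_cases() == [], failing_cases()
```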
Additional Notes:
Related issue: #3929 (Improve Copyright Detection)