-
-
Notifications
You must be signed in to change notification settings - Fork 571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update maven detections #3447
Update maven detections #3447
Conversation
bc8c56d
to
7f96760
Compare
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Add a new config variable for datafile handlers, which if enabled will run a package-ecosystem specific implementation of license detection. Added a function to implement this for maven which only keeps the relevant information and runs license detection on the whole license statement and not on the respective values/attributes one-by-one. Enabled this for maven to improve license detection there. Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
7f96760
to
ac04984
Compare
Adds a fix for maven packacge assembly, combining MANIFEST.MF and pom.xml files into a single package, instead of multiple top level packages. Also fixes pom.properties file parsing. Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
c23072c
to
4456975
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AyanSinhaMahapatra Looks good, I just highlighted some small issues.
503f7d8
to
0165bdf
Compare
tests/summarycode/data/classify/with_package_data.expected.json
Outdated
Show resolved
Hide resolved
dd51a69
to
76ef47f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see some nits for your consideration
src/licensedcode/data/rules/bsd-3-clause-no-nuclear-warranty_and_jj2000_2.RULE
Outdated
Show resolved
Hide resolved
if comment or doc_url: | ||
package["extra_data"] = {} | ||
if comment: | ||
package["extra_data"]['notes'] = comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to find a better way than using extra_data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #3447 (comment). We might want to add fields to the PackageData class then. And we might handle this separately, AFAIK this data is not really used anywhere and this is just a fix to make sure MANIFEST parsing returns everything else, and not crash.
@@ -278,7 +248,7 @@ def dget(s): | |||
package['homepage_url'] = dget('Implementation-URL') or dget('Implementation-Url') | |||
|
|||
# Bundle-DocURL: http://logging.apache.org/log4j/1.2 | |||
package['documentation_url'] = dget('Bundle-DocURL') | |||
doc_url = dget('Bundle-DocURL') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get what you are doing, but we are losing data here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are returning this package
as-is to be used for creating a PackageData class, see https://github.com/nexB/scancode-toolkit/blob/update-maven-detections/src/packagedcode/maven.py#L138 with models.PackageData(**manifest,)
. We don't have a documentation_url
or a notes
field in PackageData, and this is why the Manifest parsing used to fail and not return anything. I'm just ensuring we return whatever we can, instead of crashing and not returning anything. And since I'm having this in extra_data
, we're not really losing any data IMHO
return package | ||
|
||
|
||
def parse_scm_connection(scm_connection): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TG1999 ping ... is this related or relevant to your purldb work?
if package_type in ['maven', 'jar']: | ||
return JavaJarManifestHandler.datasource_id | ||
elif package_type == 'osgi': | ||
return JavaOSGiManifestHandler.datasource_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if a maven package also has OSGi thing?s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can improve the parsing and handle more cases allright, but let's do this separately through another issue, and if you have an example of that case please provide the same. But this was also a fix, and I'm only making sure this works without crashing here.
76ef47f
to
8d8cbe3
Compare
@pombredanne @JonoYang addressed all your comments, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a few extra nitrs that you can fix or not as you like before merging. Thanks!
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
* Add rules for license detection fixes * Fix detection for pom.properties file * Fix --todo plugin for package detection * Fix false positive case for bare word rules * Fix issue of unknown license detection in package manifest Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Adds additional license rules to take care of maven package license detection adjustments. Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
dc8f545
to
e3a1c90
Compare
Update maven package and license detections:
Tasks
Run tests locally to check for errors.