Detect description and author fields from Cargo.toml manifest #1454
Signed-off-by: Ritiek Malhotra <[email protected]>
Codecov Report
@@             Coverage Diff             @@
##           develop    #1454      +/-   ##
===========================================
- Coverage     83.9%   83.11%   -0.79%
===========================================
  Files          120      120
  Lines        14095    14123      +28
===========================================
- Hits         11826    11739      -87
- Misses        2269     2384     +115
Continue to review full report at Codecov.
Codecov Report
@@             Coverage Diff             @@
##           develop    #1454      +/-   ##
===========================================
- Coverage    83.91%   82.94%   -0.98%
===========================================
  Files          120      121       +1
  Lines        14095    14453     +358
===========================================
+ Hits         11828    11988     +160
- Misses        2267     2465     +198
Continue to review full report at Codecov.
Force-pushed from 527e663 to 10c9458.
pombredanne left a comment
Thank you: looking good. See my review comments inline.
Also, there is something wrong in the parse function, left over from previous commits:
def parse(location):
    """
    Return a Package object from a Cargo.toml file or None.
    """
    if not is_cargo_toml(location):
        return
    with io.open(location, encoding='utf-8') as loc:  # <-- you open `location` as `loc` alright
        package_data = toml.load(location, _dict=OrderedDict)  # <-- ... but you pass a location string and not a loc file-like object
    return build_package(package_data)
src/packagedcode/cargo.py
Outdated
  # At the moment, this is only useful for making tests pass
  ordered_dict_map = {}
- for key in ("source_packages", "dependencies", "keywords", "parties"):
+ for key in ("source_packages", "dependencies", "keywords"):
Nitpicking: can you use single quotes consistently everywhere (except for docstrings, which use """)?
src/packagedcode/cargo.py
Outdated
        **ordered_dict_map
    )

field_mappers = [
Since you only call a single function here, maybe it would be simpler to just call it directly instead of using a list of partials, etc.? I reckon the code in npm does this, but it is not super readable IMHO, and there are many more functions being called there.
src/packagedcode/cargo.py
Outdated
def party_mapper(party, package, party_type):
    """
    Update package parties with party of `party_type` and return package.
Rather than updating the package, it would be simpler to return the parties and update the package at the call site instead. This pattern was borrowed from npm.py and is not something I am very proud of, so it is best not to carry it around.
Maybe something such as this:
def party_mapper(party, party_role):
    """
    ......
    """
    for auth in party:
        name, email = parse_person(auth)
        yield models.Party(
            type=models.party_person,
            name=name,
            role=party_role,
            email=email)
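A self-contained sketch of the suggested generator-based party_mapper, runnable outside scancode. Party and PARTY_PERSON are hypothetical stand-ins for models.Party and models.party_person, and parse_person is a simplified hypothetical parser that splits on the first '<'; none of them is the actual code from cargo.py.

```python
from collections import namedtuple

# Hypothetical stand-ins for models.Party and models.party_person.
Party = namedtuple('Party', 'type name role email')
PARTY_PERSON = 'person'


def parse_person(person):
    """
    Return a (name, email) tuple from a 'Name <email>' string.
    Hypothetical and simplified: splits on the first '<'.
    """
    name, _, email = person.partition('<')
    return name.strip() or None, email.rstrip('>').strip() or None


def party_mapper(party, party_role):
    """
    Yield a Party of role `party_role` for each author string in `party`.
    """
    for auth in party:
        name, email = parse_person(auth)
        yield Party(
            type=PARTY_PERSON,
            name=name,
            role=party_role,
            email=email)
```

Because party_mapper only yields parties, the caller decides what to do with them, e.g. `parties = list(party_mapper(authors, party_role='author'))`, which keeps the package update at the call site.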
I agree. Passing and then updating the package within functions can indeed cause confusion. Also, using yield seems a neat idea here!
    return name, email


person_parser = re.compile(
Do you mind adding a few unit tests that showcase how this regex and the regex below behave with real data? That would help in understanding them better.
Yep, I'll add in some tests shortly.
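A sketch of what such table-driven regex tests could look like. The pattern below is a hypothetical "Name <email>" regex chosen to reproduce the (name, email) shapes visible in the failure trace further down (the name keeps its trailing space and the email keeps its angle brackets); it is not necessarily the regex in the PR, and the addresses are made up.

```python
import re

# Hypothetical pattern: an optional name (anything up to '<') followed by an
# optional '<email>' part. Not necessarily the exact regex used in the PR.
person_parser = re.compile(r'^(?P<name>[^<]+)?(?P<email><[^>]+>)?$').match

# (input, expected (name, email)) rows, in the style of PERSON_PARSER_TEST_TABLE.
PERSON_PARSER_TEST_TABLE = [
    ('Barney Rubble <barney@example.com>', ('Barney Rubble ', '<barney@example.com>')),
    ('Barney Rubble', ('Barney Rubble', None)),
    ('<barney@example.com>', (None, '<barney@example.com>')),
]


def check_person_parser(person, expected_person):
    parsed_person = person_parser(person)
    person_information = parsed_person.groupdict()
    name, email = person_information.get('name'), person_information.get('email')
    # Include the input in the message so a failure says which row broke.
    assert (name, email) == expected_person, (person, name, email)


for person, expected_person in PERSON_PARSER_TEST_TABLE:
    check_person_parser(person, expected_person)
```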
src/packagedcode/cargo.py
Outdated
name = package_data.get('package').get('name')
version = package_data.get('package').get('version')
description = package_data.get('package').get('description')
IMHO, assign package_data.get('package') to a variable rather than calling it over and over.
src/packagedcode/cargo.py
Outdated
# TODO: Remove this ordered_dict_map once cargo.py is able to handle
# the appropriate data (source_packages, dependencies, etc..)
# At the moment, this is only useful for making tests pass
ordered_dict_map = {}
ordered_dict_map sounds a bit strange for a variable name: do not say what type it is, but name its content instead.
But it might also be much simpler not to create empty fields that are not needed, and instead just pass keyword arguments directly to your package constructor below:
authors = package_data.get('package', {}).get('authors', [])
# this assumes a refactored party_mapper based on the other comment
parties = list(party_mapper(authors, party_role='author'))
package = RustCargoCrate(
    name=name,
    version=version,
    description=description,
    parties=parties,
)
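Putting both suggestions together (a single lookup of the [package] table, and keyword arguments passed straight to the constructor), a runnable sketch could look like this. RustCargoCrate and Party here are hypothetical, trimmed-down dataclasses rather than the scancode models, and party_mapper is simplified to keep each author string as the name. Missing fields fall back to None or an empty list, which also covers manifests that omit authors or description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Party:
    # Hypothetical, trimmed-down stand-in for models.Party.
    type: str
    name: Optional[str]
    role: str
    email: Optional[str] = None


@dataclass
class RustCargoCrate:
    # Hypothetical, trimmed-down stand-in for the scancode package model.
    name: Optional[str] = None
    version: Optional[str] = None
    description: Optional[str] = None
    # Default to an empty list, matching the field's list type.
    parties: List[Party] = field(default_factory=list)


def party_mapper(party, party_role):
    # Simplified: keep each author string as the name, ignore emails.
    for auth in party:
        yield Party(type='person', name=auth, role=party_role)


def build_package(package_data):
    # Look up the [package] table once instead of on every field access.
    package = package_data.get('package', {})
    authors = package.get('authors', [])
    return RustCargoCrate(
        name=package.get('name'),
        version=package.get('version'),
        description=package.get('description'),
        parties=list(party_mapper(authors, party_role='author')),
    )
```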
> ordered_dict_map sounds a bit strange for a variable name: do not say what type it is, but name its content instead.
Good variable naming conventions; I'll keep that in mind. :)
> But it might also be much simpler not to create empty fields that are not needed, and instead just pass keyword arguments directly to your package constructor below:
...
I did try something like this in #1426, but the tests kept failing for a reason I couldn't figure out. Anyway, I just tried to make it work this way again, and it seems I also had to change the respective fields in the Cargo.toml.expected files from {} to [].
…ict instance Signed-off-by: Ritiek Malhotra <[email protected]>
Thanks for such a detailed review! I've made the appropriate changes. We're still missing the regex tests, which I'll add soon.
  "notice_text": null,
  "manifest_path": null,
- "dependencies": {},
+ "dependencies": [],
This was important as your use of empty dicts was changing the default type for this: it is a list.
Thank you. LGTM and I will merge as soon as we have these
I've added tests for our regex parser. By the way, IMO using re
Yup! Will do.
Um... I can't see any new inline comments since my last comment, and I guess the old ones have been addressed?
Somehow, I had created some review items but never submitted them.
Here they are! Sorry about that.
tests/packagedcode/test_cargo.py
Outdated
    ('<[email protected]>', (None, '<[email protected]>')),
]

class TestRegex:
You need to subclass object for now: we are still on Python 2, and otherwise you get an "old-style" class.
Ah yup, forgot about that. I force pushed my last commit to address this.
        self.check_package(package, expected_loc, regen=False)


PERSON_PARSER_TEST_TABLE = [
I am not sure I like this style where we only have positional arguments. That's not really easy to read and make sense of. Also, to be convinced that it is a better solution, I would need to see what a failure trace looks like and what feedback it provides to know what to fix where.
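One way to address the positional-arguments concern is to give each table row named fields. This sketch swaps in the standard library's unittest.subTest rather than pytest.mark.parametrize (which is what the PR uses), and subTest is Python 3 only, unlike the Python 2 codebase at the time. The regex is a hypothetical "Name <email>" pattern, not necessarily the one in cargo.py. Since the PersonCase rows are plain tuples, the same list should also be usable as pytest.mark.parametrize argument values.

```python
import re
import unittest
from collections import namedtuple

# Hypothetical "Name <email>" pattern; not necessarily the one in the PR.
person_parser = re.compile(r'^(?P<name>[^<]+)?(?P<email><[^>]+>)?$').match

# Each row names its fields, so it reads without counting positions.
PersonCase = namedtuple('PersonCase', 'person expected_name expected_email')

PERSON_PARSER_CASES = [
    PersonCase(person='Barney Rubble <barney@example.com>',
               expected_name='Barney Rubble ', expected_email='<barney@example.com>'),
    PersonCase(person='Barney Rubble',
               expected_name='Barney Rubble', expected_email=None),
    PersonCase(person='<barney@example.com>',
               expected_name=None, expected_email='<barney@example.com>'),
]


class TestPersonParser(unittest.TestCase):

    def test_person_parser(self):
        for case in PERSON_PARSER_CASES:
            # subTest labels a failure with the row's field names and values,
            # and the loop continues with the remaining rows.
            with self.subTest(**case._asdict()):
                match = person_parser(case.person)
                self.assertIsNotNone(match)
                self.assertEqual(match.group('name'), case.expected_name)
                self.assertEqual(match.group('email'), case.expected_email)
```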
I intentionally modified two input sets so the tests fail.
$ pytest tests/packagedcode/test_cargo.py
================================== test session starts ==================================
platform linux2 -- Python 2.7.15rc1, pytest-4.3.1, py-1.8.0, pluggy-0.9.0
rootdir: /home/ritiek/Downloads/scancode-toolkit, inifile: setup.cfg
collected 12 items
tests/packagedcode/test_cargo.py .....F.F.... [100%]
======================================= FAILURES =======================================
_____ TestRegex.test_person_parser[Barney Rubble <[email protected]>-expected_person0] ______
self = <test_cargo.TestRegex instance at 0x7f004f85c5a8>
person = 'Barney Rubble <[email protected]>'
expected_person = ('Barney Rubble ', '<[email protected]>')
    @pytest.mark.parametrize('person, expected_person', PERSON_PARSER_TEST_TABLE)
    def test_person_parser(self, person, expected_person):
        parsed_person = cargo.person_parser(person)
        person_information = parsed_person.groupdict()
        name, email = person_information.get('name'), person_information.get('email')
>       assert (name, email) == expected_person
E AssertionError: assert ('Barney [email protected]>') == ('Barney [email protected]>')
E At index 1 diff: u'<[email protected]>' != u'<[email protected]>'
E Use -v to get the full diff
tests/packagedcode/test_cargo.py:89: AssertionError
_ TestRegex.test_person_parser[Some Bad Guy <[email protected]>-expected_person2] __
self = <test_cargo.TestRegex instance at 0x7f004fb5c908>
person = 'Some Bad Guy <[email protected]>'
expected_person = ('Some Good Guy ', '<[email protected]>')
    @pytest.mark.parametrize('person, expected_person', PERSON_PARSER_TEST_TABLE)
    def test_person_parser(self, person, expected_person):
        parsed_person = cargo.person_parser(person)
        person_information = parsed_person.groupdict()
        name, email = person_information.get('name'), person_information.get('email')
>       assert (name, email) == expected_person
E AssertionError: assert ('Some Bad [email protected]>') == ('Some Good [email protected]>')
E At index 0 diff: u'Some Bad Guy ' != u'Some Good Guy '
E Use -v to get the full diff
tests/packagedcode/test_cargo.py:89: AssertionError
=============================== short test summary info ================================
FAILED tests/packagedcode/test_cargo.py::TestRegex::test_person_parser[Barney Rubble <[email protected]>-expected_person0]
FAILED tests/packagedcode/test_cargo.py::TestRegex::test_person_parser[Some Bad Guy <[email protected]>-expected_person2]
This is what it looks like when running pytest without any additional parameters. It shows the inputs for which the parametrized test function failed its assertion.
That said, this is merely a matter of preference, and I don't have strong opinions on either approach. I could settle for either way. :)
Thank you for the failure trace! It looks decent, so I think this can work for smaller tests, though I am not sure it is practical when a test has an expected failure. I also do not know whether you can select a subset of the data with the -k CLI argument of py.test. Anyway, let's keep it as you pushed it for now and see if I warm up to the approach :)
Sure! :D
Force-pushed from db4c5a9 to 7416220.
pombredanne left a comment
Thanks! LGTM, and I will merge as soon as the CI runs complete. They are so slow! But there are a lot of tests too.
Thank you again 💯
Closes #1424 and addresses #1436.
scancode can now parse the description and author fields from a Cargo.toml manifest, and handles edge cases where the authors or description are not specified in Cargo.toml.
This should be ready to merge if everything looks good!