Support for list Field Coverage #391
Codecov Report

@@            Coverage Diff             @@
##           master     #391      +/-   ##
==========================================
+ Coverage   76.44%   76.54%   +0.09%
==========================================
  Files          76       76
  Lines        3197     3214      +17
  Branches      379      384       +5
==========================================
+ Hits         2444     2460      +16
  Misses        683      683
- Partials       70       71       +1
I think this feature is interesting, but I'm concerned about the impact that iterating over each array of an item might have on performance. Have you tested this patch by running some Scrapy jobs with complex or larger items to see if they perform well compared to the basic version?
I haven't run any test jobs, but this will definitely impact performance. For example, if we assume all items in a job have one field which is a list of [...] All the ways I can think of doing this (using a [...] Maybe we can have it as a setting and inform the user of the possible performance impact?
Updated it: the feature is now enabled through a setting, and the number of coverage nesting levels can be configured.
If I understand correctly, this is disabled by default. If so, we can leave it up to users to decide whether they are willing to enable this at the cost of the corresponding performance hit. Maybe mentioning the potential performance hit in the setting documentation is enough.
Yup, you're correct.
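For readers following the thread, opting in might look roughly like the settings sketch below. The thread does not name the new setting, so `SPIDERMON_LIST_FIELDS_COVERAGE_LEVELS` and the `_items` segment in the rule key are assumptions to be checked against the Spidermon settings documentation; `variants` and `price` are hypothetical field names.

```python
# settings.py (sketch; verify every name against the Spidermon docs)

SPIDERMON_ENABLED = True

# Existing switch that makes Spidermon collect field coverage stats.
SPIDERMON_ADD_FIELD_COVERAGE = True

# ASSUMED name for the setting discussed in this thread.
# 0 would keep the feature disabled (the default described above);
# a positive value is how many levels of lists to descend into.
# Higher values mean more iteration per item, hence the performance cost.
SPIDERMON_LIST_FIELDS_COVERAGE_LEVELS = 1

# Thresholds per field. The "variants/_items/price" key format for
# fields inside list elements is an assumption, not confirmed here.
SPIDERMON_FIELD_COVERAGE_RULES = {
    "dict/variants": 1.0,
    "dict/variants/_items/price": 0.9,
}
```

Leaving the levels value at 0 keeps the per-item cost the same as before this PR, which is the trade-off the reviewers agreed to leave to users.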
Hi all! To make sure, is anything blocking this PR from getting approved? There's a comment in the docs from this PR about the performance impact of enabling this setting. I think that's covered now.
✔️ from me, I did not realize the new docs already mentioned performance.
Closes #390.
When a field scraped by a spider is a list containing objects, there is currently no way to set coverage thresholds for the fields inside those objects. This PR adds support for correctly counting and calculating the coverage of such fields, both at the top level of the item and inside nested structures.
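The idea can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: `count_fields`, `field_coverage`, the `list_levels` parameter, and the `_items` path segment are all hypothetical names chosen for illustration, and the choice of denominator for fields inside list elements is an assumption.

```python
from collections import Counter


def count_fields(item, list_levels=0, prefix=""):
    """Count field paths in one item (hypothetical helper).

    Nested dicts are always traversed. Lists are traversed only while
    list_levels > 0, and each list descended consumes one level.
    """
    counts = Counter()
    for key, value in item.items():
        path = f"{prefix}{key}"
        counts[path] += 1
        if isinstance(value, dict):
            counts.update(count_fields(value, list_levels, f"{path}/"))
        elif isinstance(value, list) and list_levels > 0:
            items_path = f"{path}/_items"
            # This per-element loop is the extra work that costs performance.
            for element in value:
                counts[items_path] += 1
                if isinstance(element, dict):
                    counts.update(
                        count_fields(element, list_levels - 1, f"{items_path}/")
                    )
    return counts


def field_coverage(items, list_levels=0):
    """Return {field path: coverage ratio} over all items.

    Assumption: a field inside list elements is measured against the
    number of elements in the enclosing list field, not the item count.
    """
    totals = Counter()
    for item in items:
        totals.update(count_fields(item, list_levels))
    result = {}
    for path, count in totals.items():
        if path.endswith("/_items"):
            continue  # raw element counts, used only as denominators
        head, sep, _ = path.rpartition("/_items/")
        denominator = totals[f"{head}/_items"] if sep else len(items)
        result[path] = count / denominator
    return result


sample_items = [
    {"name": "a", "variants": [{"price": 1, "sku": "x"}, {"price": 2}]},
    {"name": "b", "variants": [{"price": 3}]},
    {"variants": []},
]
coverage_by_field = field_coverage(sample_items, list_levels=1)
```

With `list_levels=0` the sketch reports only `name` and `variants`, mirroring the behavior before this change; with `list_levels=1` it also reports `variants/_items/price` and `variants/_items/sku`, which is what makes per-field thresholds on list contents possible.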