Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up arg_to_iter #51

Merged
merged 6 commits into from
Dec 20, 2023
Merged

Speed up arg_to_iter #51

merged 6 commits into from
Dec 20, 2023

Conversation

Gallaecio
Copy link
Member

Fixes #50

If we restrict the subset of sequence and generator types that we allow arg_to_iter to handle, we can remove the use of is_item altogether to significantly improve its performance.

I tested performance with:

from timeit import timeit

from itemloaders.utils import arg_to_iter


def a():
    return arg_to_iter({})

print(timeit(a, number=1000000))

Results:

@codecov
Copy link

codecov bot commented Mar 18, 2022

Codecov Report

Merging #51 (93d0dd7) into master (760e925) will decrease coverage by 0.01%.
Report is 2 commits behind head on master.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #51      +/-   ##
==========================================
- Coverage   99.25%   99.24%   -0.01%     
==========================================
  Files           4        4              
  Lines         267      266       -1     
==========================================
- Hits          265      264       -1     
  Misses          2        2              
Files Coverage Δ
itemloaders/utils.py 93.54% <100.00%> (-0.21%) ⬇️

@Gallaecio
Copy link
Member Author

The documentation issue is caused by scrapy/scrapy#5402

@wRAR
Copy link
Member

wRAR commented Apr 4, 2022

So I guess this changes the result from true to false for some types not listed in isinstance?

@Gallaecio
Copy link
Member Author

It does.

I think supporting only a limited number of iterables as input for values is a low price for this much of a performance improvement. To be honest, I would have even dropped generator support be it not for the existing tests for it.

I think it would be best to document the specific iterables that are supported as input for values, limit the implementation to that subset, and let users do the required conversion for scenarios where custom types are needed.

Not a strong opinion, though. But since is_item is more likely to support a wider range of values, and likely to become slower as well as a result, I think avoiding its usage here is key for performance.

@kmike kmike merged commit de94c5c into scrapy:master Dec 20, 2023
12 checks passed
@ava7
Copy link

ava7 commented May 12, 2024

It does.

I think supporting only a limited number of iterables as input for values is a low price for this much of a performance improvement. To be honest, I would have even dropped generator support be it not for the existing tests for it.

I think it would be best to document the specific iterables that are supported as input for values, limit the implementation to that subset, and let users do the required conversion for scenarios where custom types are needed.

Not a strong opinion, though. But since is_item is more likely to support a wider range of values, and likely to become slower as well as a result, I think avoiding its usage here is key for performance.

@Gallaecio Any chance of keeping the set? 🥺

@Gallaecio
Copy link
Member Author

Seems reasonable to me, feel free to open a PR.

@ava7 ava7 mentioned this pull request May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The re-introduction of nested item support caused a significant performance degradation
4 participants