count vs __len__() #49

Open
osmuser63783 opened this issue Feb 25, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@osmuser63783

(This is a very minor feature request)

I was wondering if there's a reason that the number of features in a feature set is provided by count instead of __len__.

It's more intuitive to call len(features) than features.count.

Also, I am working interactively (in a Jupyter notebook), and when I iterate over feature sets I use tqdm to show a progress bar. Because the feature set's length is provided by count instead of __len__(), I have to type:
for h in tqdm(planet("w[highway]"), total=planet("w[highway]").count):
instead of just
for h in tqdm(planet("w[highway]")):
for every loop :-)

And yes, I am very lazy :-D

@clarisma added the enhancement (New feature or request) label on Feb 26, 2024
@clarisma
Owner

I agree, it would be nice to be able to call len(features) in addition to (or instead of) features.count.

The reason __len__ is not implemented is related to the way the Python list constructor works. As an optimization, it attempts to pre-allocate the list's backing array with an exact size if the source collection implements __len__.

However, this would cause the query to execute twice: once to get its length (i.e. count), then again to retrieve the features and populate the list. This is, in fact, what your code sample does: it calls count to size the progress bar, then runs the query again in the for loop. In most cases, queries execute so quickly that the performance impact is negligible, but large or complex queries could take minutes.

So it's a compromise that lets the user decide if/when to incur the cost of the extra query run.
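Under this compromise, a user who is willing to pay for the extra run could opt in with a small wrapper. The sketch below is hypothetical (neither Sized nor DemoFeatures is a GeoDesk class): it forwards iteration to the wrapped feature set and answers `len()` from its `count` property, so tools that read `len()`, such as tqdm when no `total` is passed, can size their progress bar. As in the original loop, the query still runs once for `count` and again for iteration.

```python
class Sized:
    """Hypothetical opt-in wrapper that adds __len__ to a feature set."""

    def __init__(self, features):
        self._features = features

    def __iter__(self):
        return iter(self._features)

    def __len__(self):
        # Runs the query once just to count the results.
        return self._features.count


class DemoFeatures:
    """Hypothetical stand-in exposing `count` and iteration like a feature set."""

    def __init__(self, items):
        self._items = list(items)

    @property
    def count(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


wrapped = Sized(DemoFeatures("abc"))
print(len(wrapped), list(wrapped))  # → 3 ['a', 'b', 'c']

# Intended usage with GeoDesk and tqdm (not executed here):
# for h in tqdm(Sized(planet("w[highway]"))):
#     ...
```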

Ideally, PyObject_LengthHint() would try the object's __length_hint__ method before __len__; the Query Engine could then decide whether to run the query or just provide a cheap estimate.
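With the current order, an object can still offer a cheap estimate by defining `__length_hint__` without `__len__`, because `PyObject_LengthHint()` (exposed in Python as `operator.length_hint()`) falls back to `__length_hint__` when the object has no `__len__`. The sketch below uses a hypothetical EstimatingFeatureSet whose hint is a precomputed guess that does not run the query. The trade-off is that `len()` raises `TypeError`, so a progress bar would still need the estimate passed explicitly, for example as `total=operator.length_hint(features)`.

```python
import operator


class EstimatingFeatureSet:
    """Hypothetical feature set offering a cheap size estimate instead of len()."""

    def __init__(self, items, estimate):
        self._items = tuple(items)
        self._estimate = estimate  # precomputed guess (hypothetical)
        self.runs = 0  # how many times the simulated query has executed

    def __iter__(self):
        # Iterating executes the query.
        self.runs += 1
        return iter(self._items)

    def __length_hint__(self):
        # Cheap guess; does not execute the query.
        return self._estimate


features = EstimatingFeatureSet(range(100), estimate=90)
print(operator.length_hint(features), features.runs)  # → 90 0
print(len(list(features)), features.runs)  # → 100 1
```

Because the hint is only used for pre-allocation, an inexact estimate is harmless: `list()` simply grows or shrinks its buffer as needed.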
