Message when calling len() on LazyFilter #817

Merged 3 commits on Sep 24, 2022

desh2608
Collaborator

Currently, calling len() on an object of type LazyFilter (which is what you get when a filter is applied to a lazily loaded manifest) raises a TypeError:

TypeError: object of type 'LazyFilter' has no len()

After this change, we instead show an informative message:

NotImplementedError: LazyFilter does not support __len__ because it would require iterating over the whole iterator, which is not possible in a lazy fashion. If you really need to know the length, convert to eager mode first using `.to_eager()`. Note that this will require loading the whole iterator into memory.
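
A minimal sketch of the behavior change, using a hypothetical stand-in class. `ToyLazyFilter` and `describe_len` are illustrative names, not Lhotse's implementation, and the message is abbreviated:

```python
from typing import Callable, Iterable, Iterator


class ToyLazyFilter:
    """Hypothetical stand-in for a lazy filter; not Lhotse's actual class."""

    def __init__(self, iterator: Iterable, predicate: Callable[[object], bool]) -> None:
        self.iterator = iterator
        self.predicate = predicate

    def __iter__(self) -> Iterator:
        # Items are tested one at a time, only when requested.
        return (item for item in self.iterator if self.predicate(item))

    def __len__(self) -> int:
        # The number of matching items is unknown without a full pass.
        raise NotImplementedError(
            "ToyLazyFilter does not support __len__ because it would require "
            "iterating over the whole iterator."
        )


def describe_len(obj: object) -> str:
    """Return the length as text, or the name of the error that len() raised."""
    try:
        return str(len(obj))
    except (TypeError, NotImplementedError) as exc:
        return type(exc).__name__


evens = ToyLazyFilter(range(10), lambda x: x % 2 == 0)
print(describe_len(evens))            # explicit __len__ that raises
print(describe_len(iter(range(3))))   # no __len__ at all
print(sum(1 for _ in evens))          # counting requires a full pass
```

Without the explicit `__len__`, the object behaves like the plain iterator in the second call and `len()` fails with the generic `TypeError`.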

pzelasko
pzelasko previously approved these changes Sep 24, 2022
Collaborator

@pzelasko pzelasko left a comment

Thanks, great work! It should be very helpful for the users.

def from_features(features: Union[Iterable[Features], LazyMixin]) -> "FeatureSet":
    return (
        FeatureSet([f for f in features])
        if isinstance(features, LazyMixin)
Collaborator

Curious: why not use one of these two options for both cases? I recall list() has some odd behavior depending on whether __len__ is implemented, but maybe we can just always use the comprehension?

Collaborator Author

Yeah, list() calls __len__ as a size hint whenever it is defined, so it would hit the new NotImplementedError, which is why I had to use a list comprehension for the lazy case. However, it seems like a list comprehension may be slower than list() (see this), which is why I didn't make it the default.
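
The difference can be reproduced in CPython with a small, hypothetical class (`NoLenIterable` is illustrative only). `list()` asks the iterable for a size hint and propagates any exception from `__len__` other than `TypeError`, whereas a comprehension only calls `__iter__`:

```python
class NoLenIterable:
    """Hypothetical iterable whose __len__ raises, as in this PR."""

    def __init__(self, items):
        self.items = items

    def __iter__(self):
        return iter(self.items)

    def __len__(self):
        raise NotImplementedError("length is unknown without a full pass")


lazy = NoLenIterable([1, 2, 3])

# A comprehension only uses __iter__, so it works.
via_comprehension = [x for x in lazy]

# list() consults __len__ to presize the result; the error propagates (CPython).
try:
    via_list = list(lazy)
except NotImplementedError:
    via_list = None

# Passing an explicit iterator sidesteps the object's __len__ as well.
via_iter = list(iter(lazy))

print(via_comprehension, via_list, via_iter)
```

In this sketch `list(iter(...))` works for both cases too; whether it would be preferable in Lhotse is not something the example establishes.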

def __add__(self, other) -> "LazyIteratorChain":
    return LazyIteratorChain(self, other)

def __len__(self) -> int:
    raise NotImplementedError(
Collaborator

Can you check other lazy iterator classes and see if they can also use this error message? At least one that comes to my mind is LazyFlatten.

Collaborator Author

Added to LazyFlattener. All other lazy iterator classes already have __len__ defined.
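
How the message is shared between the two classes is not shown in this thread. One possible way to avoid duplicating it is a small mixin; the sketch below is an assumption for illustration, and `NoLenMixin`, `ToyFilter`, and `ToyFlattener` are hypothetical names rather than Lhotse classes:

```python
class NoLenMixin:
    """Hypothetical mixin: gives lazy wrappers a uniform __len__ error."""

    def __len__(self) -> int:
        # Use the concrete class name so each wrapper reports itself.
        name = type(self).__name__
        raise NotImplementedError(
            f"{name} does not support __len__ because it would require "
            "iterating over the whole iterator, which is not possible in a "
            "lazy fashion. If you really need to know the length, convert to "
            "eager mode first using `.to_eager()`. Note that this will require "
            "loading the whole iterator into memory."
        )


class ToyFilter(NoLenMixin):
    def __init__(self, iterable, predicate):
        self.iterable, self.predicate = iterable, predicate

    def __iter__(self):
        return (x for x in self.iterable if self.predicate(x))


class ToyFlattener(NoLenMixin):
    def __init__(self, iterable):
        self.iterable = iterable

    def __iter__(self):
        # Yield the items of each inner iterable in order.
        for inner in self.iterable:
            yield from inner


def len_error_prefix(obj) -> str:
    """Return the first word of the error message raised by len(obj)."""
    try:
        len(obj)
    except NotImplementedError as exc:
        return str(exc).split(" ", 1)[0]
    return ""


for obj in (ToyFilter(range(4), bool), ToyFlattener([[1], [2, 3]])):
    print(len_error_prefix(obj))
```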

@pzelasko pzelasko merged commit 32e8e81 into lhotse-speech:master Sep 24, 2022
@desh2608 desh2608 deleted the lazy_filter branch September 24, 2022 17:37
@pzelasko pzelasko added this to the v1.8 milestone Sep 25, 2022