[Data] Support passing pyarrow.dataset.Expression objects to Dataset.filter's expr #50799

Open
schmidt-ai opened this issue Feb 21, 2025 · 0 comments
Labels
enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks


schmidt-ai commented Feb 21, 2025

Description

Currently, Dataset.filter(expr=...) only accepts a string representation of an expression. It would be nice to allow passing either a string or a pyarrow Expression object.

Use case

I want to programmatically generate a filter expression:

from functools import reduce
from operator import or_

import pyarrow.compute as pc

cols = ["a", "b"]

# keep only rows where none of `cols` is null (NaN counts as null)
expr = ~reduce(or_, [pc.field(col).is_null(nan_is_null=True) for col in cols])

dataset.filter(expr=expr)

and I'd rather do this with the native pyarrow Expression API instead of string manipulation.

If I try dataset.filter(expr=str(expr)), it raises ValueError: Invalid syntax in the expression.

@schmidt-ai schmidt-ai added enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Feb 21, 2025
@gvspraveen gvspraveen added P1 Issue that should be fixed within a few weeks and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Feb 21, 2025