-
Notifications
You must be signed in to change notification settings - Fork 562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(api): refactor the implementation of windowing #9200
base: main
Are you sure you want to change the base?
Conversation
edae581
to
84bac37
Compare
fc5d2fd
to
25fe86d
Compare
00e09e8
to
6f14e43
Compare
@chloeh13q This LGTM! Just to clarify, are there any user-facing API changes, or is it only internals that have changed? |
769fb23
to
dec2c92
Compare
@cpcloud Thanks for reviewing!
There are user-facing API changes, although the changes are pretty subtle. We're changing: expr = (
t.window_by(time_col="i")
.tumble(window_size=ibis.interval(minutes=15))
.group_by(["window_start", "window_end", "g"])
.aggregate(mean=_.d.mean())
) to something like expr = (
t.window_by(time_col="i")
.tumble(window_size=ibis.interval(minutes=15))
.group_by(["g"])
.aggregate(mean=_.d.mean())
) The big difference is that we're trying to hide columns like "window_start" "window_end" from the user (as well as providing more alternative syntax in a more Ibis-like manner, e.g., you can use both [ps: only implemented windowed aggregations so far to see if we like this new API and IR changes. Planning to implement window top-n, window join, and cascading windows (aka chained window aggregations) (note that we didn't have alternative syntax for these previously, but you could do them with underlying ops like row number, so these are going to be new APIs) in subsequent PRs] With changes merged from another PR, I was able to add tests for window aggregations in the Spark backend too. |
Description of changes
Some design considerations:
ibis.window()
.The new API:
Issues closed
#8847