Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Determine causal window frames to produce early results. #8842

Merged
merged 15 commits into from
Jan 15, 2024
Merged

Determine causal window frames to produce early results. #8842

merged 15 commits into from
Jan 15, 2024

Conversation

mustafasrepo
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

In streaming execution, window functions generate result when frame end is smaller than batch size (This guarantees that future rows may not extend up the current window frame). However, for causal window frame (e.g window frame that doesn't include future rows than the current row). We can generate results even if frame end is same with batch size.

With this support we can decrease latency for causal queries.

What changes are included in this PR?

This PR adds support for causality analysis for window frame to decrease latency and memory during generating window function results.

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) substrait labels Jan 12, 2024
@mustafasrepo mustafasrepo changed the title Track causal window frames to produce early results. Determine causal window frames to produce early results. Jan 12, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR @mustafasrepo -- I feel I lack the context to review it properly, maybe could you explain the usecase (ideally in doc comments somewhere) with an example or two?

Also, the PR description talks about emitting batches earlier, but I didn't see any test coverage of that. Was that intended?

datafusion/expr/src/window_frame.rs Outdated Show resolved Hide resolved
datafusion/sqllogictest/test_files/window.slt Outdated Show resolved Hide resolved
datafusion/sql/src/expr/function.rs Show resolved Hide resolved
@mustafasrepo
Copy link
Contributor Author

mustafasrepo commented Jan 15, 2024

Also, the PR description talks about emitting batches earlier, but I didn't see any test coverage of that. Was that intended?

At first, I thought it would be hard to exactly test this behaviour. However, after some thinking I found out it wouldn't be that hard. I have added a test for this case in commit Thanks for pointing this out.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @mustafasrepo

Copy link
Contributor

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did another round of review after @alamb and it LGTM

@ozankabak
Copy link
Contributor

I will go ahead and merge this since it is a fairly simple change that is an incremental improvement to existing functionality without any major change to code structure. In case of any issues related to this PR, let us know and we will address with a quick follow-on PR.

@ozankabak ozankabak merged commit 259e12c into apache:main Jan 15, 2024
22 checks passed
/// Flag indicates whether window frame is causal
/// See documentation for [is_frame_causal] for what causal means in this context and how it is calculated.
is_causal: bool,
/// Flag indicating whether the frame is causal (i.e. computing the result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb added the api change Changes the API exposed to users of the crate label Jan 19, 2024
@alamb
Copy link
Contributor

alamb commented Jan 19, 2024

This slightly changed the API to create WindowFrame (can't use struct syntax to create anymore) so marking as API change. You can still create the struct using WindowFrame::new_bounds

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt) substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants