-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor: Extract BatchCoalescer
to its own module
#12047
Conversation
@@ -365,511 +335,3 @@ impl RecordBatchStream for CoalesceBatchesStream { | |||
self.coalescer.schema() | |||
} | |||
} | |||
|
|||
/// Concatenate multiple record batches into larger batches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of this PR is to move this code into its own module
@@ -346,7 +316,7 @@ impl CoalesceBatchesStream { | |||
} | |||
CoalesceBatchesStreamState::Exhausted => { | |||
// Handle the end of the input stream. | |||
return if self.coalescer.buffer.is_empty() { | |||
return if self.coalescer.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting the BatchCoalescer into its own module means this code must use an accessor rather than directly look at the coalescer's state, which I think improves the modularity (a tiny bit)
/// combined filter/coalesce operation. | ||
/// | ||
#[derive(Debug)] | ||
pub struct BatchCoalescer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only changes in this module are to make the struct's and some methods pub
and update the documentation slightly
6e9a036
to
3f9bf97
Compare
Will review tomorrow 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you 🚀
Thank you for the review @ozankabak |
Which issue does this PR close?
Part of #7957 and #11628
Rationale for this change
I have hopes to improve how coalesce batches works in DataFusion saving a copy to improve performance, partly by avoiding the need for
CoalesceBatchesExec
Part of this is to use the
BatchCoalescer
more, so let's pull it into its own module. This also makes the code boundaries a bit clearer (e.g. the functions that form the public interface now must be markedpub
)What changes are included in this PR?
BatchCoalescer
to its own moduleAre these changes tested?
By existing CI
Are there any user-facing changes?
No functional changes