Trigger closing a time partition after some event time has been observed. #103
Comments
@krisskross This seems like basically what the Is there a specific case you're thinking of where you need to close the file sooner? I think one common example might be a time-based partitioner where timestamps are guaranteed to be monotonically increasing, in which case you know when it is safe to rotate a file (although note that the requirement that it is monotonically increasing is harder to guarantee in practice than you may think). Is there some other example you're thinking of that isn't addressed by the existing approach, or are you just trying to reduce the latency of delivery for the final file?
The use case I'm after is closing files earlier so that they can be processed by batch jobs as early as possible. It might also be possible to get larger files (ideally one per partition), which have the benefit of more efficient compression and faster batch processing. We use a modified version of
Sure, so maybe a way to implement this would be to add a method to the
Yes, that sounds like a reasonable way forward. Maybe as a default method on the interface, so that it doesn't break backward compatibility?
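To illustrate the default-method suggestion, here is a minimal sketch. The interface and method names (`EventTimeAwarePartitioner`, `isPartitionComplete`, `LegacyPartitioner`) are hypothetical and are not the project's actual API. The sketch only shows that an existing implementation keeps compiling and keeps its current behavior when the new hook is added with a default body.

```java
import java.util.Map;

// Hypothetical interface for illustration only; not the project's real API.
interface EventTimeAwarePartitioner {
    // Stand-in for a method that every implementation already provides.
    String encodePartition(Map<String, Object> record);

    // New hook added as a default method. Implementations that do not
    // override it inherit "never close early", i.e. today's behavior.
    default boolean isPartitionComplete(String encodedPartition, long maxObservedEventTimeMs) {
        return false;
    }
}

// A pre-existing implementation compiles unchanged and inherits the default.
class LegacyPartitioner implements EventTimeAwarePartitioner {
    @Override
    public String encodePartition(Map<String, Object> record) {
        return "topic=" + record.get("topic");
    }
}
```

A time-based implementation could override `isPartitionComplete` to report when an hour is finished, while all other partitioners would be unaffected.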
Hi

It would be nice to be able to trigger a close of a file when a certain event time has been observed, so that a certain hour can be considered finished instead of waiting for flush.size or rotate.interval.ms to trigger the close. This would make parquet files larger and finish earlier. Win-win.

Cheers,
-Kristoffer
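One way the requested behavior could look is sketched below, assuming hourly partitions and event times in epoch milliseconds. The class `HourlyCloseTracker` and its `allowedLatenessMs` parameter are assumptions made for this example, not part of the connector. The lateness margin is there because, as noted in the comments, timestamps are rarely strictly monotonic in practice.

```java
import java.time.Duration;

// Hypothetical helper for illustration; names and parameters are assumptions.
final class HourlyCloseTracker {
    private static final long HOUR_MS = Duration.ofHours(1).toMillis();

    private final long allowedLatenessMs;
    private long maxEventTimeMs = Long.MIN_VALUE; // no event observed yet

    HourlyCloseTracker(long allowedLatenessMs) {
        this.allowedLatenessMs = allowedLatenessMs;
    }

    // Record the event time of each record as it is written.
    void observe(long eventTimeMs) {
        maxEventTimeMs = Math.max(maxEventTimeMs, eventTimeMs);
    }

    // The hour starting at hourStartMs may be closed once an event at or
    // beyond the end of that hour plus the lateness margin has been seen.
    boolean canClose(long hourStartMs) {
        if (maxEventTimeMs == Long.MIN_VALUE) {
            return false;
        }
        return maxEventTimeMs >= hourStartMs + HOUR_MS + allowedLatenessMs;
    }
}
```

With a one-minute margin, the hour starting at 0 would be closable only after an event with a timestamp of at least 3,660,000 ms has been observed. Records for that hour arriving after the close would still need a policy of their own (for example, a new file), which this sketch does not address.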