Iceberg with data migrations #24780
/dt
2241484 to b140307
5d9c8d7 to d493031
CI test results
test results on build#60655
test results on build#60858
test results on build#61039
test results on build#61291
test results on build#61361
d493031 to 796d262
Retry command for Build#60858
Please wait until all jobs are finished before running the slash command.
Meh. It does not fail when I run it locally with repeat.
796d262 to 9e89a98
I increased the timeout in the test, as in the coordinator loop, since one last use of the long one may be in progress while we are waiting. I also added some logging and removed dead code. Please re-review.
/dt
co_return co_await flush(partition);
auto block_offset = block_res.value();

auto deadline = model::timeout_clock::now() + 5s;
It seems to me that this timeout value is very low considering the amount of work that may need to be done. Should we consider making it larger and configurable?
It will be retried by migration reconciliation, including when controller leadership changes. Since the partition is already blocked, the translation backlog will eventually get processed. The translator flush doesn't do any active work; it just waits for a certain offset to be translated. The only problem is that the cloud storage flush will be invoked every time. @WillemKauf @andrwng if a partition does not receive any further writes, is it much overhead to flush its cloud data every few seconds?
With this call we are waiting for the translator to actually execute the translation work, and this may take a while depending on the translation gap. If all the calls are idempotent, then this is great: it will just be retried and eventually succeed.
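The retry argument in this thread can be sketched in plain C++. This is only an illustration under stated assumptions, not the PR's code: `fake_translator`, `wait_translated` and `reconcile` are hypothetical names, and a fixed poll budget stands in for the 5s deadline so the example stays deterministic.

```cpp
#include <cstdint>

// Hypothetical names throughout; none of this is Redpanda's API.
enum class flush_result { done, timed_out };

// Stand-in for the translator. Each poll simulates background translation
// advancing by `step` offsets while the caller is waiting.
struct fake_translator {
    std::int64_t translated = 0;
    std::int64_t step = 1;
    std::int64_t poll() {
        translated += step;
        return translated;
    }
};

// One flush attempt: wait until `target` is translated or the poll budget
// (a stand-in for the short deadline) runs out. The wait has no side effects
// of its own, so repeating it is safe.
inline flush_result
wait_translated(fake_translator& t, std::int64_t target, int max_polls) {
    for (int i = 0; i < max_polls; ++i) {
        if (t.poll() >= target) {
            return flush_result::done;
        }
    }
    return flush_result::timed_out;
}

// Reconciliation: retry the short attempt until it succeeds.
// Returns the attempt number that succeeded, or -1 if none did.
inline int reconcile(
  fake_translator& t, std::int64_t target, int max_polls, int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (wait_translated(t, target, max_polls) == flush_result::done) {
            return attempt;
        }
    }
    return -1;
}
```

With a large translation gap the first attempts time out, but because each attempt only waits, a later attempt picks up where the backlog has got to and succeeds.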
/dt
Check that with redpanda.iceberg.delete=false old table data remains available even before we recreate the topic.
And switch back to normal admin after disruptions are over.
add log lines, fix typos
If we unmount the topic before this, the table may lack metadata.
Introduce "offline mode" that cuts all ties to the topic in Redpanda cluster. It carries on querying the query engine and verifying results using info cached before going into offline mode.
to make sure the functionality is tested while the topic is being actively used
Make it possible to configure the number of messages produced by stream
Add scenarios: 1) On unmount all messages that made their way to the topic eventually become available via query engine 2) Upon remount and further produce both old and new messages are in the topic and in the table
to prevent archiver shutdown while waiting
This is mostly to preserve iceberg properties, but also to make sure any newly introduced topic properties are preserved by default.
Allows it to be used for subscriptions where feedback from a called function is necessary, such as a future or an error code. All functions are supposed to return the same type.
Make offset_monitor more universal so that it can be used for different data types.
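As a rough illustration of what a monitor that works for different data types could look like, here is a minimal callback-based sketch in standard C++. It is an assumption-laden stand-in: `generic_monitor`, `wait_for`, `notify` and `pending` are hypothetical names, and the real `offset_monitor` is asynchronous and has a different interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch: a monitor templated on the tracked type.
// Offset must be default-constructible, copyable and ordered by operator<.
template <typename Offset>
class generic_monitor {
public:
    using callback = std::function<void()>;

    // Run `cb` once the observed value reaches `target`,
    // or immediately if it already has.
    void wait_for(Offset target, callback cb) {
        if (has_value_ && !(last_ < target)) {
            cb();
            return;
        }
        waiters_.emplace(std::move(target), std::move(cb));
    }

    // Record progress and fire every waiter whose target is <= value.
    void notify(Offset value) {
        if (!has_value_ || last_ < value) {
            last_ = value;
            has_value_ = true;
        }
        auto end = waiters_.upper_bound(value);
        // Move callbacks out first so a callback may safely call wait_for.
        std::vector<callback> ready;
        for (auto it = waiters_.begin(); it != end; ++it) {
            ready.push_back(std::move(it->second));
        }
        waiters_.erase(waiters_.begin(), end);
        for (auto& cb : ready) {
            cb();
        }
    }

    std::size_t pending() const { return waiters_.size(); }

private:
    std::multimap<Offset, callback> waiters_;
    Offset last_{};
    bool has_value_ = false;
};
```

The same template can then be instantiated for any ordered type, for example `generic_monitor<std::int64_t>` for one kind of offset and a separate instantiation for another.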
Also create and subscribe one of these actions: flush data to cloud.
Wait for the offset to be translated when asked by partition to "flush".
When blocking writes, collect the offset of the blocking message. Then use it to dispatch an all-components flush through the partition (leading to a cloud storage flush that ignores the offset parameter and a datalake translator that waits for the corresponding Kafka offset).
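The fan-out described in these commits can be pictured with a small standard C++ sketch. Everything here is hypothetical (`flush_subscriptions`, `errc`, the handler signature, and the choice to run all handlers and report the first failure); it only illustrates subscribers that all return the same type and are invoked with the blocking offset.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical error code; the real code may return futures or other types.
enum class errc { success, timeout };

// Hypothetical sketch: a list of flush handlers sharing one return type.
class flush_subscriptions {
public:
    using handler = std::function<errc(std::int64_t)>;

    void subscribe(handler h) { handlers_.push_back(std::move(h)); }

    // Invoke every handler with the blocking offset. Assumption: all handlers
    // run even if one fails, and the first failure is reported.
    errc flush(std::int64_t offset) const {
        errc result = errc::success;
        for (const auto& h : handlers_) {
            errc r = h(offset);
            if (result == errc::success) {
                result = r;
            }
        }
        return result;
    }

private:
    std::vector<handler> handlers_;
};
```

A cloud-storage-like subscriber would ignore the offset argument and flush unconditionally, while a translator-like subscriber would compare it with what has been translated so far. Under these assumptions every retried flush invokes the cloud storage handler again, which matches the overhead question raised in the review thread.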
9e89a98 to ae89710
https://redpandadata.atlassian.net/browse/CORE-8439
Backports Required
Release Notes
Features