Iceberg with data migrations #24780

Open
bashtanov wants to merge 23 commits into dev from iceberg-w-data-migrations

@bashtanov bashtanov commented Jan 11, 2025

https://redpandadata.atlassian.net/browse/CORE-8439

  • Add a test for Iceberg reading from a table whose topic was deleted
  • Fix minor data migration test issues
  • Add a test that runs Iceberg translation for topics that are unmounted and then, optionally, mounted
  • For recovered and mounted topics, make Redpanda preserve most topic properties, including Iceberg ones (fixes https://redpandadata.atlassian.net/browse/CORE-563)
  • When unmounting, make sure all messages are translated for Iceberg

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Features

  • Make Iceberg and topic mount/unmount work well together

@bashtanov
Contributor Author

/dt

@bashtanov bashtanov force-pushed the iceberg-w-data-migrations branch from 2241484 to b140307 Compare January 13, 2025 08:46
@bashtanov bashtanov marked this pull request as ready for review January 13, 2025 09:25
@bashtanov bashtanov force-pushed the iceberg-w-data-migrations branch 5 times, most recently from 721330c to 5d9c8d7 Compare January 13, 2025 14:24
Check that with redpanda.iceberg.delete=false old table data remains
available even before we recreate the topic.
And switch back to normal admin after disruptions are over.
add log lines, fix typos
If we unmount the topic before this, the table may lack metadata
@bashtanov bashtanov force-pushed the iceberg-w-data-migrations branch from 5d9c8d7 to d493031 Compare January 13, 2025 15:45
@vbotbuildovich
Collaborator

vbotbuildovich commented Jan 13, 2025

CI test results

test results on build#60655

| test_id | test_kind | job_url | test_status | passed |
| --- | --- | --- | --- | --- |
| idempotency_tests_rpunit.idempotency_tests_rpunit | unit | https://buildkite.com/redpanda/redpanda/builds/60655#01946058-a634-40d0-9eaa-a36681179d0c | FLAKY | 1/2 |
| rptest.tests.partition_reassignments_test.PartitionReassignmentsTest.test_reassignments_kafka_cli | ducktape | https://buildkite.com/redpanda/redpanda/builds/60655#019460b1-911e-438a-9908-47112e140ed0 | FLAKY | 2/6 |

test results on build#60858

| test_id | test_kind | job_url | test_status | passed |
| --- | --- | --- | --- | --- |
| idempotency_tests_rpunit.idempotency_tests_rpunit | unit | https://buildkite.com/redpanda/redpanda/builds/60858#0194702b-4eab-49b6-a88e-86bc997a49fa | FLAKY | 1/2 |
| rptest.tests.datalake.simple_connect_test.RedpandaConnectIcebergTest.test_translating_avro_serialized_records.cloud_storage_type=CloudStorageType.S3.scenario=remount | ducktape | https://buildkite.com/redpanda/redpanda/builds/60858#01947072-773f-429b-924e-88e414d4e4b8 | FAIL | 0/1 |

Introduce "offline mode" that cuts all ties to the topic in Redpanda
cluster. It carries on querying the query engine and verifying results
using info cached before going into offline mode.
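
The offline mode described in this commit message can be pictured roughly as follows. This is only an illustrative sketch, not the test code from this PR: `TableVerifier`, its two callables, and the "row count equals high watermark" check are hypothetical stand-ins for whatever the real verifier caches and compares.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TableVerifier:
    """Hypothetical verifier comparing a table's row count with the topic."""

    # Assumed helper that talks to the Redpanda cluster.
    fetch_topic_high_watermark: Callable[[], int]
    # Assumed helper that talks to the query engine only.
    query_table_row_count: Callable[[], int]
    offline: bool = False
    _cached_high_watermark: Optional[int] = None

    def go_offline(self) -> None:
        # Take one last snapshot from the cluster, then stop contacting it.
        self._cached_high_watermark = self.fetch_topic_high_watermark()
        self.offline = True

    def expected_rows(self) -> Optional[int]:
        # In offline mode only the cached value is used.
        if self.offline:
            return self._cached_high_watermark
        return self.fetch_topic_high_watermark()

    def verify(self) -> bool:
        # The query engine is still queried in both modes.
        return self.query_table_row_count() == self.expected_rows()
```

After `go_offline()` the topic can be unmounted or deleted and `verify()` keeps working, because it no longer calls into the cluster.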
to make sure the functionality is tested while the topic is being actively used
Make it possible to configure the number of messages produced by stream
@bashtanov bashtanov force-pushed the iceberg-w-data-migrations branch from d493031 to 796d262 Compare January 16, 2025 17:29
@vbotbuildovich
Collaborator

Retry command for Build#60858

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/datalake/simple_connect_test.py::RedpandaConnectIcebergTest.test_translating_avro_serialized_records@{"cloud_storage_type":1,"scenario":"remount"}

@bashtanov
Contributor Author

Meh. It does not fail when I run it locally with repeat.

@bashtanov bashtanov marked this pull request as draft January 17, 2025 00:09
Add scenarios:
1) On unmount, all messages that made their way to the topic eventually
become available via the query engine
2) Upon remount and further producing, both old and new messages are in the
topic and in the table
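
The two scenarios boil down to set comparisons between what was produced and what the topic and table contain. A minimal sketch, assuming messages can be identified by some hashable key; the function names are made up for illustration and are not the helpers used in the actual test:

```python
from typing import Hashable, Iterable, Set


def missing_after_unmount(
    acked: Iterable[Hashable], table_rows: Iterable[Hashable]
) -> Set[Hashable]:
    """Scenario 1: return acknowledged messages that never reached the table."""
    return set(acked) - set(table_rows)


def remount_is_consistent(
    old: Iterable[Hashable],
    new: Iterable[Hashable],
    topic_msgs: Iterable[Hashable],
    table_rows: Iterable[Hashable],
) -> bool:
    """Scenario 2: old and new messages are in both the topic and the table."""
    expected = set(old) | set(new)
    return expected <= set(topic_msgs) and expected <= set(table_rows)
```

Scenario 1 passes when `missing_after_unmount(...)` is empty once translation has caught up; scenario 2 passes when `remount_is_consistent(...)` is true after producing to the remounted topic.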
This is mostly to preserve iceberg properties, but also to make sure any
newly introduced topic properties are preserved by default.
Allows it to be used for subscriptions where feedback from a called function
is necessary, such as a future or an error code.
All functions are supposed to return the same type.
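
The idea of a subscription list whose callbacks report something back can be sketched as below. This is a Python illustration of the concept only, not the C++ code changed in this PR; `ReturningNotificationList` and its methods are hypothetical names.

```python
from typing import Callable, Dict, Generic, List, TypeVar

A = TypeVar("A")  # argument passed to every subscriber
R = TypeVar("R")  # the single return type shared by all subscribers


class ReturningNotificationList(Generic[A, R]):
    """Hypothetical subscription list whose notify() collects return values."""

    def __init__(self) -> None:
        self._callbacks: Dict[int, Callable[[A], R]] = {}
        self._next_id = 0

    def register(self, callback: Callable[[A], R]) -> int:
        # Hand out an id so the subscriber can unregister later.
        sub_id = self._next_id
        self._next_id += 1
        self._callbacks[sub_id] = callback
        return sub_id

    def unregister(self, sub_id: int) -> None:
        self._callbacks.pop(sub_id, None)

    def notify(self, arg: A) -> List[R]:
        # Call every subscriber in registration order and return the results.
        return [cb(arg) for cb in list(self._callbacks.values())]
```

Because every callback returns the same type, the caller can treat the results uniformly, for example by waiting on all returned futures or by checking all returned error codes.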
Make offset_monitor more universal so that it can be used for different
data types.
Also create and subscribe one of these actions: flush data to cloud.
Wait for the offset to be translated when asked by partition to "flush".
When blocking writes, collect the offset of the blocking message.
Then use it to dispatch an all-components flush through the partition
(leading to a cloud storage flush that ignores the offset parameter and
a datalake translator that waits for the corresponding Kafka offset)
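
The flush flow from the last few commit messages might look roughly like this. It is a simplified asyncio sketch under stated assumptions, not Redpanda's implementation: all class and method names are hypothetical, and the mapping between log offsets and Kafka offsets is glossed over by using one offset space.

```python
import asyncio
from typing import List, Tuple


class CloudStorage:
    """Hypothetical component: flushes everything, ignoring the offset."""

    def __init__(self) -> None:
        self.flush_calls = 0

    async def flush(self, offset: int) -> None:
        self.flush_calls += 1  # the offset parameter is deliberately unused


class DatalakeTranslator:
    """Hypothetical component: flush waits until translation reaches offset."""

    def __init__(self) -> None:
        self.translated_offset = -1
        self._waiters: List[Tuple[int, asyncio.Future]] = []

    def on_translated(self, offset: int) -> None:
        # Advance the translated offset and wake up satisfied waiters.
        self.translated_offset = max(self.translated_offset, offset)
        ready = [w for w in self._waiters if w[0] <= self.translated_offset]
        self._waiters = [w for w in self._waiters if w[0] > self.translated_offset]
        for _, fut in ready:
            if not fut.done():
                fut.set_result(None)

    async def flush(self, offset: int) -> None:
        if self.translated_offset >= offset:
            return
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append((offset, fut))
        await fut


class Partition:
    """Hypothetical partition dispatching flush to subscribed components."""

    def __init__(self, components, message_count: int) -> None:
        self._components = components
        self._next_offset = message_count
        self.writes_blocked = False

    def block_writes(self) -> int:
        # The blocking message is appended last; remember its offset.
        self.writes_blocked = True
        offset = self._next_offset
        self._next_offset += 1
        return offset

    async def flush(self, offset: int) -> None:
        await asyncio.gather(*(c.flush(offset) for c in self._components))
```

In this sketch `Partition.flush` only completes once the translator has reported an offset at or beyond the blocking message, which is the property the unmount path relies on.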
@bashtanov bashtanov force-pushed the iceberg-w-data-migrations branch from 796d262 to 9e89a98 Compare January 17, 2025 08:49
@bashtanov bashtanov marked this pull request as ready for review January 17, 2025 08:49
@bashtanov
Contributor Author

I increased the timeout in the test, as it coordinator loop, as one last use of the long one may be in progress while we are waiting. I also added some logging and removed dead code. Please re-review.

3 participants