[Bug] Deletion of older Iceberg metadata files on new writes causes Iceberg Readers to fail #5189

Open
junmuz opened this issue Feb 28, 2025 · 1 comment · May be fixed by #5228
Labels
bug Something isn't working

junmuz commented Feb 28, 2025

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

In Paimon Iceberg compatibility mode, I have observed that Iceberg metadata files are completely replaced (old ones are deleted once new ones are created) as new data is written from either Spark or Flink. This causes existing readers based on old Iceberg metadata files to fail, making the tables unusable in scenarios where multiple readers are involved.

I was looking at the code, and it appears we delete the old metadata files every time a new one is created. This happens in two different scenarios.

  1. In the first case, when an Iceberg metadata file is created based on an older metadata file, the older file is simply deleted.
  2. In the other case, when Iceberg metadata files are created based on snapshots, expireAllBefore is called, which quietly removes all the old metadata files (see the sketch after this list).
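
To make the failure mode concrete, here is a minimal, self-contained sketch of the resulting race. It uses plain java.nio file operations and made-up file names, not Paimon's actual classes or metadata paths: a reader that resolved the table from v1.metadata.json fails as soon as a writer commits v2.metadata.json and deletes the old file.

// Illustrative only: the file names and java.nio calls stand in for what a
// concurrent Iceberg reader experiences when old metadata files are deleted.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MetadataDeletionRace {
    public static void main(String[] args) throws IOException {
        Path metadataDir = Files.createTempDirectory("iceberg-metadata");

        // Writer commits version 1; a reader resolves the table from this file.
        Path v1 = Files.writeString(metadataDir.resolve("v1.metadata.json"), "{}");
        Path readerView = v1;

        // Writer commits version 2 and, as described above, deletes the old file.
        Path v2 = Files.writeString(metadataDir.resolve("v2.metadata.json"), "{}");
        Files.delete(v1);

        // The reader now fails as soon as it re-reads the metadata file it holds.
        try {
            Files.readString(readerView);
        } catch (IOException e) {
            System.out.println("Reader failed: " + e); // NoSuchFileException for v1.metadata.json
        }

        Files.delete(v2);
        Files.delete(metadataDir);
    }
}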

Compute Engine

I have verified this with both Spark & Flink, but it should be applicable to any compute engine.

Minimal reproduce step

  • Create a new table in Iceberg compatibility mode and insert some data into it via Spark SQL.

CREATE TABLE paimon_catalog.`default`.cities (country STRING, name STRING) TBLPROPERTIES ('metadata.iceberg.storage' = 'hadoop-catalog');

INSERT INTO paimon_catalog.`default`.cities VALUES ('usa', 'sanjose'), ('germany', 'berlin');

  • At this point an Iceberg metadata file should be created.

  • Insert some more data into the Paimon table.

INSERT INTO paimon_catalog.`default`.cities VALUES ('usa', 'chicago'), ('germany', 'hamburg');

  • The old Iceberg metadata file is replaced with the newer one; the listing sketch below can be used to confirm this.
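
One way to confirm the behavior is to list the table's Iceberg metadata directory after each INSERT. The snippet below is only a helper sketch; the warehouse path is an assumption (the reproduce steps don't specify one) and needs to be adjusted to your setup.

// Lists the vN.metadata.json files under the table's Iceberg metadata directory.
// The warehouse path below is an assumption; point it at your own 'hadoop-catalog' location.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class ListIcebergMetadata {
    public static void main(String[] args) throws IOException {
        Path metadataDir = Path.of("/tmp/paimon/default.db/cities/metadata");
        try (Stream<Path> files = Files.list(metadataDir)) {
            files.map(p -> p.getFileName().toString())
                 .filter(name -> name.endsWith(".metadata.json"))
                 .sorted()
                 .forEach(System.out::println); // after the second INSERT, only the newest file remains
        }
    }
}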

What doesn't meet your expectations?

Right now, the code doesn't leverage the Iceberg options write.metadata.previous-versions-max and write.metadata.delete-after-commit.enabled as mentioned here. In my view, we can support these Iceberg options for Iceberg-compatible Paimon tables and delete old metadata files based on these configurations; a sketch of the intended retention rule follows below.
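
As a rough illustration of the proposal: with write.metadata.delete-after-commit.enabled set to true and write.metadata.previous-versions-max set to N, a commit would delete only the metadata files older than the newest N previous versions (plus the current one). The sketch below shows just that retention rule; the method name and signature are hypothetical, and this is not Paimon's or Iceberg's actual implementation.

// Hypothetical retention rule: keep the current metadata file plus the newest
// `previousVersionsMax` previous versions, and delete anything older.
import java.util.Collections;
import java.util.List;

public class MetadataRetentionSketch {

    static List<String> filesToDelete(List<String> metadataFilesOldestFirst,
                                      boolean deleteAfterCommitEnabled,
                                      int previousVersionsMax) {
        if (!deleteAfterCommitEnabled) {
            return Collections.emptyList();      // keep all metadata history
        }
        int keep = previousVersionsMax + 1;      // previous versions + the current file
        int deletable = Math.max(0, metadataFilesOldestFirst.size() - keep);
        return metadataFilesOldestFirst.subList(0, deletable);
    }

    public static void main(String[] args) {
        List<String> files = List.of("v1.metadata.json", "v2.metadata.json",
                                     "v3.metadata.json", "v4.metadata.json");
        // With previous-versions-max = 2, only v1 would be removed on this commit.
        System.out.println(filesToDelete(files, true, 2)); // [v1.metadata.json]
    }
}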

Anything else?

Let me know your thoughts on this.

Are you willing to submit a PR?

  • I'm willing to submit a PR!
junmuz added the bug label Feb 28, 2025
@JingsongLi
Contributor

+1 to support iceberg options.

junmuz linked a pull request Mar 7, 2025 that will close this issue