Delete the oldest tracked version metadata files after commit #23813
Conversation
Why do we need a new test class? Please use existing test classes as much as possible to avoid redundant bootstrap time.
Iceberg-connector: Delete the oldest tracked version metadata files after commit
Please follow the commit message guidelines: https://cbea.ms/git-commit/
Reminder, I see you changed a PR title. Please also update a commit title.
Sorry, I thought the PR's title would be the title of the final merge commit. I will reopen a PR
You don't need to reopen a PR. Please amend the commit title and push it.
Can we use Sets.difference instead?
nit: Unwrap and static import METADATA_DELETE_AFTER_COMMIT_ENABLED & METADATA_DELETE_AFTER_COMMIT_ENABLED_DEFAULT.
nit: Use enhanced instanceof.
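For readers unfamiliar with the term, "enhanced instanceof" refers to pattern matching for instanceof (available since Java 16). The sketch below is a generic illustration, not the code under review; the class and method names are hypothetical.

```java
public class InstanceofExample {
    // Classic style: type test first, then an explicit cast on a separate line.
    static int lengthClassic(Object value) {
        if (value instanceof String) {
            String text = (String) value;
            return text.length();
        }
        return -1;
    }

    // Enhanced style: the pattern variable `text` is bound when the test
    // succeeds, so no cast is needed.
    static int lengthEnhanced(Object value) {
        if (value instanceof String text) {
            return text.length();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(lengthClassic("metadata"));
        System.out.println(lengthEnhanced("metadata"));
        System.out.println(lengthEnhanced(42));
    }
}
```

Both methods behave identically; the enhanced form only removes the redundant cast.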
https://trino.io/docs/current/develop/tests.html
Test methods should be defined as package-private.
Same for tearDown
Parquet is the default format. Remove.
Unwrap.
Are these config properties related to the test purpose? Please remove if it's unrelated.
trino/plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergHiveMetastoreAutoCleanMetadataFile.java
The file path is wrong. It should be plugin/trino-iceberg/..., remove a leading trino.
We should verify that exactly the oldest metadata file was removed, instead of just checking the count.
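One way to check identity rather than count is to compare the file listing before and after the commit and assert on the difference. The following is a minimal plain-JDK sketch, not the actual Trino test; the class name, helper name, and file names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RemovedFileCheck {
    // Returns the files present before the commit but missing afterwards.
    static Set<String> removedFiles(List<String> before, List<String> after) {
        Set<String> removed = new LinkedHashSet<>(before);
        removed.removeAll(after);
        return removed;
    }

    public static void main(String[] args) {
        // Hypothetical listings, ordered oldest first.
        List<String> before = List.of("v1.metadata.json", "v2.metadata.json", "v3.metadata.json");
        List<String> after = List.of("v2.metadata.json", "v3.metadata.json", "v4.metadata.json");

        Set<String> removed = removedFiles(before, after);
        // A count check alone passes for any 3-file listing; this pins down which file went away.
        if (!removed.equals(Set.of("v1.metadata.json"))) {
            throw new AssertionError("Unexpected removed files: " + removed);
        }
        System.out.println("Removed: " + removed);
    }
}
```

A count-only assertion would also pass if a newer file were deleted by mistake, which is why comparing the concrete file names is the stronger check.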
Thank you very much for your patient review. I have completed the revisions according to your suggestions. Please review it again. Thank you very much. @ebyhr
Reminder, I see you changed a PR title. Please also update a commit title.
No need to put this constant in fields as this is used only from testInsertWithAutoCleanMetadataFile. Please change to a local variable.
This test class is for verifying file operation counts. Please move to TestIcebergV2 which has loadTable method or somewhere.
Please assert metadata files in this loop.
AssertJ provides hasSize method.
Unwrap.
Unwrap.
Rename f to file.
Single file without a delete in a partition can't be optimized any further
When a memory input file is created for avro, rc, and line readers, we need to update the length that will be passed in to the reader since the length of the memory input file can possibly be less than the original input file length.
While HTTP/2 was supported before, now it is used by default with CLI as well as the JDBC driver.
Previous logic was setting all columns as non-nullable which is not valid when the data contains nulls
Description
When using an Iceberg table, a new xxx-metadata.json file is generated each time a commit is executed. In Iceberg, previous metadata files can be cleaned up automatically through configuration:
write.metadata.delete-after-commit.enabled = true
write.metadata.previous-versions-max = 10
However, metadata files are not automatically cleaned up in Trino, so a large number of metadata.json files are left on HDFS.
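To make the intended behavior concrete, here is a minimal sketch of the retention rule those two properties describe: after a commit, keep at most the configured number of previous metadata files and delete the older ones. This is an illustration under stated assumptions, not the implementation in this PR or in Iceberg; the class, method, and file names are hypothetical.

```java
import java.util.List;

public class MetadataRetentionSketch {
    // previousOldestFirst: previously tracked metadata files, oldest first (assumed ordering).
    // deleteAfterCommit: stands in for write.metadata.delete-after-commit.enabled.
    // previousVersionsMax: stands in for write.metadata.previous-versions-max.
    static List<String> filesToDelete(List<String> previousOldestFirst, boolean deleteAfterCommit, int previousVersionsMax) {
        int excess = previousOldestFirst.size() - previousVersionsMax;
        if (!deleteAfterCommit || excess <= 0) {
            return List.of();
        }
        // Only the oldest entries beyond the retention limit are dropped.
        return List.copyOf(previousOldestFirst.subList(0, excess));
    }

    public static void main(String[] args) {
        // Hypothetical file names; real names follow the xxx-metadata.json pattern.
        List<String> tracked = List.of("v1.metadata.json", "v2.metadata.json", "v3.metadata.json");
        System.out.println(filesToDelete(tracked, true, 2));
        System.out.println(filesToDelete(tracked, false, 2));
    }
}
```

With cleanup enabled and a limit of 2, only the oldest of the three files is selected for deletion; with cleanup disabled, nothing is deleted.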
Unlike #20863, I have followed the existing Iceberg configuration properties (write.metadata.delete-after-commit.enabled and write.metadata.previous-versions-max). Instead of adding a new configuration, this keeps Trino compatible with Iceberg.
Release notes
( x) Release notes are required, with the following suggested text: