Core: Use avro compression properties from table properties when writing manifests and manifest lists #6799
I am picking up #5893 where @sumeetgajjar left off, as he is now pursuing other projects.
core/src/test/java/org/apache/iceberg/TestManifestListWriter.java
f5457a3 to c81b209
@nastra, thank you for your reviews. I have addressed all your feedback.
nastra
left a comment
I believe the only thing left is fixing the breaking API change. @rdblue or @danielcweeks could you guys review this as well please?
core/src/test/java/org/apache/iceberg/TestManifestListWriter.java
@rdblue thank you for reviewing. I have just been on a short vacation. I'll look into your comments.
@rdblue I have tried to address all your feedback. The only comment I did not understand is the one about not exposing
e7e52c2 to 32a9a3a
f01d1d8 to fb6fd2e
ConeyLiu
left a comment
+1 from my part
Is |
fb6fd2e to 8306e42
I think it would be good to get some additional review from one other committer. /cc @Fokko @amogh-jahagirdar @aokolnychyi could any of you review this one as well please?
Will do a review by the end of this week, sorry for the delay.
 * @param compressionLevel compression level of the compressionCodec
 * @return a manifest writer
 */
public static ManifestWriter<DataFile> write(
I thought I already commented before going on vacation but can't seem to find the old discussion. Sorry if I post the same question again. Have we considered using a builder? My worry with the current approach was that we need to offer an overloaded method every time we add a new parameter.
@aokolnychyi thanks for reviewing. I'm interested to hear what @rdblue thinks. In the meantime, let me think about how to address your concern. However, using a builder will mean an API break, right?
@sumeetgajjar originally used a Map parameter for this, and @rdblue commented that "we don't want to pass a map of properties around. That's exposing too much where it doesn't need to be, and people tend to misuse generic arguments like this."
What I propose to do then is to introduce a ManifestWriter.Options class and use that here (instead of a Map). I'll also introduce a ManifestListWriter.Options class and use that in ManifestLists.write. These Options classes define which additional parameters are applicable and may be set. If additional parameters are needed in the future, they can be added to these Options classes without adding another overload.
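To make the proposal concrete, here is a minimal, self-contained sketch of what such an Options class could look like. It is not the code in this PR: `WriterOptionsSketch`, its method names, and `describeWrite` are hypothetical stand-ins chosen for illustration, and the real `ManifestWriter.Options` may be shaped differently.

```java
/**
 * Hypothetical stand-in for the proposed ManifestWriter.Options class.
 * All names here are illustrative assumptions, not the Iceberg API.
 */
final class WriterOptionsSketch {
  private String compressionCodec;  // null means "let the writer pick its default"
  private Integer compressionLevel; // null means "let the codec pick its default"

  private WriterOptionsSketch() {}

  /** Start from an options object with nothing set. */
  static WriterOptionsSketch defaults() {
    return new WriterOptionsSketch();
  }

  /** Fluent setter: returns this so calls can be chained. */
  WriterOptionsSketch compressionCodec(String codec) {
    this.compressionCodec = codec;
    return this;
  }

  /** Fluent setter: returns this so calls can be chained. */
  WriterOptionsSketch compressionLevel(Integer level) {
    this.compressionLevel = level;
    return this;
  }

  String compressionCodec() {
    return compressionCodec;
  }

  Integer compressionLevel() {
    return compressionLevel;
  }

  /**
   * Hypothetical write entry point. It takes one Options argument, so a new
   * setting becomes a new setter on the options class, not a new overload.
   */
  static String describeWrite(String path, WriterOptionsSketch options) {
    return path + " codec=" + options.compressionCodec() + " level=" + options.compressionLevel();
  }
}
```

A caller would write something like `WriterOptionsSketch.defaults().compressionCodec("zstd").compressionLevel(3)`. Compared with a raw Map, the set of accepted settings is fixed by the class, which seems to be the concern behind the quoted comment.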
@nastra @aokolnychyi please let me know what you think. I updated the PR with the new approach.
1. remove unnecessary edits to use compression codec and compression level from tests 2. move ManifestWriter tests to TestManifestWriter 3. move ManifestListWriter tests to TestManifestListWriter 4. remove unwanted `ManifestLists#write` method
Remove NumberUtil. Use PropertyUtil.propertyAsNullableInt instead. Add convenience method propertyAsNullableInt to TableMetadata. Fix TestTableBase#writeManifest to write a file with .avro extension. Revert Flink 1.15 changes and make them in Flink 1.16 instead.
Retain but deprecate old newAppender method in ManifestWriter. Make Codec public in Avro, so we can use the enum values. Rename CODEC_METADATA_MAPPING to AVRO_CODEC_NAME_MAPPING in TableTestBase and provide explanatory comment. Use named constants in the map. Adopt suggestion to use AssertJ assertions in test verification. Update zstd-jni version.
Make zstd-jni dependency testRuntimeOnly. Fix some nits. Simplify validate methods in TestManifestWriter and TestManifestListWriter.
... and have the static methods in ManifestFiles and ManifestLists for writing use them.
041b7a0 to be048b5
Hmm, I think TestExpireSnapshotsAction > dataFilesCleanupWithParallelTasks might be a flaky test?
@nastra @aokolnychyi I have rebased on main and resolved the conflicts with the
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
This is a continuation of #5893.
Use the Avro compression properties (`write.avro.compression-codec` and `write.avro.compression-level`) set in the table properties to determine compression when writing manifests and manifest lists for the table.
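As a rough illustration of the lookup described above, the sketch below resolves the two properties from a plain `Map<String, String>` standing in for the table properties. The class `AvroCompressionProps` and its methods are hypothetical; the commit notes say the PR itself uses `PropertyUtil.propertyAsNullableInt` for the level, which `levelOrNull` only approximates. The fallback codec is passed in by the caller, since this sketch makes no claim about what the default is.

```java
import java.util.Map;

/** Hypothetical helper; only the two property keys come from the PR description. */
final class AvroCompressionProps {
  static final String CODEC_KEY = "write.avro.compression-codec";
  static final String LEVEL_KEY = "write.avro.compression-level";

  private AvroCompressionProps() {}

  /** Returns the configured codec name, or the caller-supplied fallback if unset. */
  static String codec(Map<String, String> tableProperties, String fallbackCodec) {
    return tableProperties.getOrDefault(CODEC_KEY, fallbackCodec);
  }

  /** Returns the configured level, or null if unset so the codec can use its own default. */
  static Integer levelOrNull(Map<String, String> tableProperties) {
    String value = tableProperties.get(LEVEL_KEY);
    return value == null ? null : Integer.valueOf(value);
  }
}
```

The resolved codec and level would then be handed to whatever writes the manifest or manifest list, so that those files follow the same compression settings as the table's Avro data files.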