API,Core,Spark: Add rewritten bytes to rewrite data files procedure results #6801
Conversation
Force-pushed 315b1e4 to 8e7d6c8
Fokko left a comment
LGTM
private final FileGroupInfo info;

/**
 * @deprecated Will be removed in 1.3.0; use {@link
Should we create an issue for this, and add it to the 1.3.0 milestone?
I'll usually clean things like this up right after a release, so I'll open an issue for that shortly.
Awesome, just to make sure that's in the collective memory of the community :)
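For context, the snippet under discussion follows the usual deprecate-then-remove cycle: keep the old member for one more release, point its Javadoc at the replacement, and drop it in the next minor version (1.3.0 here). Below is a minimal sketch of that pattern; the class and method names are hypothetical and are not the actual members touched by this PR.

```java
// Hypothetical names used purely to illustrate the deprecate-then-remove pattern
// discussed above; these are not the classes or methods changed in this PR.
public class FileGroupMetrics {

  private final long rewrittenBytes;

  public FileGroupMetrics(long rewrittenBytes) {
    this.rewrittenBytes = rewrittenBytes;
  }

  /**
   * @deprecated will be removed in 1.3.0; use {@link #rewrittenBytes()} instead.
   */
  @Deprecated
  public long bytesRewritten() {
    return rewrittenBytes();
  }

  /** Replacement accessor that callers should migrate to before 1.3.0. */
  public long rewrittenBytes() {
    return rewrittenBytes;
  }
}
```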
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java (conversation resolved)
RussellSpitzer left a comment
I'm on board for this adjustment. The only issue is that the test suite becomes a little brittle with the change. I left a note on how I think we can keep it from requiring a lot of patching in the future.
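To make the brittleness concern concrete: tests that assert the full output row of the procedure have to be patched every time a new column (such as rewritten bytes) is added. One way to avoid that is to assert only on the columns under test, looked up by name. This is a sketch of that idea, not the helper actually proposed in the review; the column names are assumptions based on the procedure's documented output.

```java
import static org.junit.Assert.assertEquals;

import java.util.List;
import org.apache.spark.sql.Row;

// Sketch: compare only the named columns a test cares about, so that adding a new
// output column (e.g. rewritten bytes) does not invalidate existing assertions.
public class RewriteResultAssertions {

  static void assertRewriteCounts(List<Row> output, long expectedRewritten, long expectedAdded) {
    Row first = output.get(0);
    assertEquals(expectedRewritten, ((Number) first.getAs("rewritten_data_files_count")).longValue());
    assertEquals(expectedAdded, ((Number) first.getAs("added_data_files_count")).longValue());
  }
}
```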
Force-pushed 8e7d6c8 to 43465c7
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java (outdated; conversation resolved)
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/SparkTestHelperBase.java (conversation resolved)
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java (conversation resolved)
Force-pushed 43465c7 to f7aa973
insertData(tableName(QUOTED_SPECIAL_CHARS_TABLE_NAME), 10);
insertData(tblName, 10);
// TODO: metadata table access currently fails with special chars in the table name
// long dataSizeBefore = testDataSize(tblName);
This seems like a bug to me: the table with special characters can't be found when running SELECT sum(file_size_in_bytes) FROM %s.files. It fails with:
Caused by: java.io.FileNotFoundException: File file:/tmp/warehouse2890706410427132468.tmp/default/table:with.special:chars/metadata/2ae7cad2-3cff-4c92-9536-9bf9652f119d-m1.avro does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:779)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:1100)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:769)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:462)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:160)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:372)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976)
at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:183)
Turns out this happened because of caching in the Hadoop catalog. Adding cache-enabled=false fixes this issue.
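For reference, this is the kind of catalog configuration the fix refers to. cache-enabled is a standard Iceberg Spark catalog property; the catalog name and warehouse path below are placeholders. A minimal sketch:

```java
import org.apache.spark.sql.SparkSession;

public class HadoopCatalogNoCacheExample {
  public static void main(String[] args) {
    // Catalog name and warehouse location are placeholders; cache-enabled=false is the
    // property mentioned above that avoids the stale-metadata FileNotFoundException.
    SparkSession spark = SparkSession.builder()
        .master("local[2]")
        .appName("iceberg-no-cache-example")
        .config("spark.sql.catalog.testhadoop", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.testhadoop.type", "hadoop")
        .config("spark.sql.catalog.testhadoop.warehouse", "file:/tmp/iceberg-warehouse")
        .config("spark.sql.catalog.testhadoop.cache-enabled", "false")
        .getOrCreate();

    // With caching disabled, metadata-table queries such as
    //   SELECT sum(file_size_in_bytes) FROM testhadoop.db.tbl.files
    // read the table's current metadata instead of a cached, possibly stale, snapshot.
    spark.stop();
  }
}
```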
Force-pushed e78ef43 to a017a34
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java (outdated; conversation resolved)
Force-pushed a017a34 to a00a9bd
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java (conversation resolved)
…#6117) Yesterday this PR got merged (apache/iceberg#6801), which introduces one more output value, so the strict check fails. This PR unblocks `query-engine-integration-tests`. Part of fixing #6114.
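To illustrate why a strict output check downstream breaks: the procedure now returns an additional value for rewritten bytes, so any test pinned to the exact column list fails. Below is a sketch of inspecting the output; the catalog and table names are placeholders, and the column names in the comments are assumptions based on this PR's title rather than verified against the released docs.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class RewriteProcedureOutputCheck {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.active();

    // Iceberg's rewrite_data_files procedure; catalog and table names are placeholders.
    Dataset<Row> result =
        spark.sql("CALL mycatalog.system.rewrite_data_files(table => 'db.tbl')");

    // Before this PR the result exposed counts such as rewritten_data_files_count and
    // added_data_files_count; after it there is one more value for rewritten bytes,
    // which is what tripped the strict column check in the integration tests.
    result.printSchema();
    result.show(false);
  }
}
```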
…esults (apache#6801) Co-authored-by: Alex Reid <[email protected]>
No description provided.