-
Notifications
You must be signed in to change notification settings - Fork 0
Flink - Fix incorrect row being written for delta files when using upsert mode #71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
data/src/test/java/org/apache/iceberg/io/TestTaskEqualityDeltaWriter.java
Outdated
Show resolved
Hide resolved
flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/sink/BaseDeltaTaskWriter.java
Show resolved
Hide resolved
|
Yes, there is no dedicated Upsert type in flink. When the row is transferred for deletion, I think we should prune the row first and retain only the key field. But it's not doing that at the moment. |
In our test case, only id province should be paased to deleteKey. It seems that we should use |
|
I found that it seemed that the |
|
By passing in the correct eqDeleteRowSchema, our test cases are now running properly. After you approve my PR, let's see if I can pass the CI. |
d17483f to
93b8487
Compare
|
Closing and reopening for tests. |
| // TODO provide the ability to customize the equality-delete row schema. | ||
| this.appenderFactory = new FlinkAppenderFactory(schema, flinkSchema, table.properties(), spec, | ||
| ArrayUtil.toIntArray(equalityFieldIds), schema, null); | ||
| ArrayUtil.toIntArray(equalityFieldIds), TypeUtil.select(schema, Sets.newHashSet(equalityFieldIds)), null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the present fix to the current issue.
|
This fixes the test cases presented, but causes a number of tests to fail. However, all of those tests are upsert or changelog based tests where the upsert happens on the The tests that fail are all Of note, tests where we upsert on So we might need to only use the update delete projection when the equality ids are not the same as the partition ids and update the delete schema accordingly. Or we possibly need to update the test harness source, given that it has special handling for this particular key. I'd also like to add tests for partition evolution, but this should be addressed first. |
1001f1c to
0bda9c9
Compare
flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/sink/BaseDeltaTaskWriter.java
Outdated
Show resolved
Hide resolved
|
|
||
| /** | ||
| * Delete an eleemnt with an equality delete using the | ||
| * @param row |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description needed
| case UPDATE_AFTER: | ||
| if (upsert) { | ||
| writer.delete(row); | ||
| // Upserts come in modeled as INSERT. We need to be sure that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's best to avoid personal pronouns in comments and docs. It isn't clear who "we" is and omitting it is typically much more direct. For example: "The columns that are not equality ID columns may have changed in the inserted row, so the delete file must be written using only the equality ID columns."
91af385 to
735b811
Compare
f8b590f to
607ce7c
Compare
5995ed6 to
3ce3286
Compare
…sert mode Co-authored-by: liliwei <hililiwei@gmail.com>
f538167 to
089e7e0
Compare
… several times in same statement batch
…on in BaseTaskWriter#write
No description provided.