Flink - Demonstrate (position-based) bug with multiple upserts to same row #4415
cc @openinx @stevenzwu @hililiwei you might be interested in this additional bug I found when using Flink in upsert mode. Of these two tests, the first one does not pass, and the second one does (as noted).
TestHelpers.assertRows(
    sql("SELECT * FROM %s", tableName),
    Lists.newArrayList(Row.of("aaa", dt, 6, false), Row.of("bbb", dt, 3, false)));
This fails every time, with the actual row for ("aaa", '2022-03-01') being Row.of("aaa", dt, 6, true)
In my local environment, it seems to be random and occasionally successful. I'm digging deeper to find out why.
Looks interesting. Let me try. Thanks for pinging me. 😃
sql("INSERT INTO %s VALUES " +
    "('aaa', TO_DATE('2022-03-01'), 1, false)," +
    "('aaa', TO_DATE('2022-03-01'), 2, false)," +
    "('aaa', TO_DATE('2022-03-01'), 3, false)," +
    "('aaa', TO_DATE('2022-03-01'), 6, false)",
    tableName);
This works correctly.
The usage of This is discussed at length here: #4515
I just spent some time debugging this and I think the problem is with Flink, not with Iceberg. When I run the case that fails, the issue is that the upserted rows are out of order. In both the final insert data file and the records that I see in the debugger, the column with

I also did a little exploration into what is different with the "succeeds" case. The significant change between the two in my testing was the use of

I'm going to close this issue. Feel free to reopen if you think it is a problem with Iceberg somehow reordering the rows, but that seems unlikely given that these rows are coming from Flink and are identical when they come through the Iceberg API (the
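To illustrate why row order matters for the upsert result, here is a minimal sketch in plain Java. It is not Flink or Iceberg code and does not reproduce the failing test: the `UpsertOrderSketch` class, its `Row` shape, and the sample values are all hypothetical, and the last-write-wins map is only an assumed stand-in for upsert semantics on the key `(id, dt)`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UpsertOrderSketch {

  // Hypothetical row shape loosely mirroring the test table; (id, dt) is the upsert key.
  static final class Row {
    final String id;
    final String dt;
    final int count;
    final boolean flag;

    Row(String id, String dt, int count, boolean flag) {
      this.id = id;
      this.dt = dt;
      this.count = count;
      this.flag = flag;
    }

    String key() {
      return id + "|" + dt;
    }

    @Override
    public String toString() {
      return "Row(" + id + ", " + dt + ", " + count + ", " + flag + ")";
    }
  }

  // Assumed upsert model: a later row replaces any earlier row with the same key.
  static Map<String, Row> applyUpserts(List<Row> rows) {
    Map<String, Row> table = new LinkedHashMap<>();
    for (Row row : rows) {
      table.put(row.key(), row);
    }
    return table;
  }

  public static void main(String[] args) {
    Row first = new Row("aaa", "2022-03-01", 1, true);
    Row last = new Row("aaa", "2022-03-01", 6, false);

    // Same two rows, two arrival orders: the surviving row differs.
    System.out.println(applyUpserts(Arrays.asList(first, last)).values());
    System.out.println(applyUpserts(Arrays.asList(last, first)).values());
  }
}
```

Under this model, the writer can only produce the expected final row if the rows for a given key reach it in the intended order, which is consistent with the reordering happening upstream of the Iceberg API.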
As a follow-up to #4364, it was discovered that there is an existing bug that is likely related to the positional deletes generated during a single snapshot.
Opening this draft so that others can inspect it. I will also open an issue for it.