
Conversation

@kbendick (Contributor)

As a follow-up to #4364, it was discovered that there is an existing bug likely related to the positional deletes generated during a single snapshot.

Opening this draft so that others can inspect it. Will also be opening an issue for it.

github-actions bot added the flink label on Mar 28, 2022
@kbendick force-pushed the add-test-for-flink-upsert-positional-issue branch from 24237d0 to c060426 on March 28, 2022 at 19:50
@kbendick (Contributor, Author)

cc @openinx @stevenzwu @hililiwei you might be interested in this additional bug I found when using Flink in upsert mode.

Of these two tests, the first one does not pass, and the second one does (as noted).


TestHelpers.assertRows(
    sql("SELECT * FROM %s", tableName),
    Lists.newArrayList(Row.of("aaa", dt, 6, false), Row.of("bbb", dt, 3, false)));
@kbendick (Contributor, Author), Mar 29, 2022

This fails every time, with the actual row for ("aaa", '2022-03-01') being Row.of("aaa", dt, 6, true).

Contributor reply:

In my local environment, it seems to be random and occasionally successful. I'm digging deeper to find out why.

@hililiwei (Contributor)

> cc @openinx @stevenzwu @hililiwei you might be interested in this additional bug I found when using Flink in upsert mode.
>
> Of these two tests, the first one does not pass, and the second one does (as noted).

Looks interesting. Let me try. Thanks for pinging me. 😃

Comment on lines 193 to 198
sql("INSERT INTO %s VALUES " +
"('aaa', TO_DATE('2022-03-01'), 1, false)," +
"('aaa', TO_DATE('2022-03-01'), 2, false)," +
"('aaa', TO_DATE('2022-03-01'), 3, false)," +
"('aaa', TO_DATE('2022-03-01'), 6, false)",
tableName);
@kbendick (Contributor, Author)

This works correctly.

@kbendick (Contributor, Author)

The usage of TO_DATE causes Calcite to generate a plan that adds a union across all of the VALUES entries, causing them to potentially be emitted in parallel, which makes the plan differ from what is expected.

This is discussed at length here: #4515
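For reference, a minimal sketch of the two literal forms; the table name t and its columns are illustrative, not taken from this PR:

-- TO_DATE form: per the explanation above, Calcite plans the VALUES
-- entries as a union, so the rows may be emitted in parallel and
-- arrive at the writer out of order.
INSERT INTO t VALUES
  ('aaa', TO_DATE('2022-03-01'), 1, false),
  ('aaa', TO_DATE('2022-03-01'), 6, false);

-- DATE literal form: plain literals keep VALUES as a single values
-- node, so the row order is preserved.
INSERT INTO t VALUES
  ('aaa', DATE '2022-03-01', 1, false),
  ('aaa', DATE '2022-03-01', 6, false);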

@rdblue added this to the Iceberg 0.13.2 Release milestone on Apr 15, 2022
@kbendick force-pushed the add-test-for-flink-upsert-positional-issue branch from 38f53b5 to 2054e26 on May 10, 2022 at 16:08
@kbendick force-pushed the add-test-for-flink-upsert-positional-issue branch from 2054e26 to ccb2eda on May 10, 2022 at 16:21
@rdblue (Contributor) commented on Jul 3, 2022

I just spent some time debugging this and I think the problem is with Flink, not with Iceberg.

When I run the case that fails, the issue is that the upserted rows are out of order. In both the final insert data file and the records that I see in the debugger, the row with bool=true is passed in last. Because this is an upsert, the last row is the version that you end up with. I'm not sure why Flink is passing the rows in the wrong order, but it could be that the parser doesn't keep track of the order of rows in a VALUES clause.

I also did a little exploration into what is different with the "succeeds" case. The significant change between the two in my testing was the use of TO_DATE('2022-03-01') instead of DATE '2022-03-01'. TO_DATE appears to trigger the rows getting out of order.

I'm going to close this issue. Feel free to reopen if you think it is a problem with Iceberg somehow reordering the rows, but that seems unlikely given that these rows are coming from Flink and are identical when they come through the Iceberg API (the TO_DATE / DATE distinction is internal to Flink).
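To illustrate the last-row-wins upsert semantics described above, here is a minimal Flink SQL sketch; the table name, schema, key, and properties are assumptions modeled on the test, not taken verbatim from this PR:

CREATE TABLE t (
  data STRING,
  dt DATE,
  id INT,
  bool BOOLEAN,
  PRIMARY KEY (data, dt) NOT ENFORCED
) PARTITIONED BY (dt)
WITH (
  'format-version' = '2',
  'write.upsert.enabled' = 'true'
);

-- Both rows share the key ('aaa', DATE '2022-03-01'). In upsert mode the
-- last row to reach the writer wins: the written order ends with
-- bool=false, so that is the row expected to survive. If the rows are
-- reordered along the way, (..., true) wins instead, which is the
-- failure observed in this PR.
INSERT INTO t VALUES
  ('aaa', DATE '2022-03-01', 6, true),
  ('aaa', DATE '2022-03-01', 6, false);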

@rdblue closed this on Jul 3, 2022