Skip to content

Conversation

@yittg
Copy link
Contributor

@yittg yittg commented Apr 11, 2022

In Flink, only VALUES with literal tuples can be converted to a single
LogicalValues node. VALUES with other type of tuples will be converted
to a chain with multiple LogicalProject nodes, like following:

LogicalValues -> (# number or rows)LogicalProject -> LogicalUnion.

The TO_DATE('xxxx-xx-xx') expression is not a literal node, so the
VALUES used in upsert test cases will be converted in the latter way.
The order of tuples can not be guaranteed, which can lead to unstable
test results.

Changing TO_DATE('xxxx-xx-xx') to DATE 'xxxx-xx-xx' can make it a
literal node and ensure that the VALUES expression can be converted to a
single LogicalValues node.

fixes #4515

@github-actions github-actions bot added the flink label Apr 11, 2022
Copy link
Contributor

@kbendick kbendick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great. Thank you to all involved @singhpk234 @yittg @openinx

I left some initial comments and will be reviewing more tomorrow.


@Test
public void testPrimaryKeyEqualToPartitionKey() {
// This is an SQL based reproduction of TestFlinkIcebergSinkV2#testUpsertOnDataKey
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this comment as well while you're updating this? It's no longer applicable really and not hepful.

Thank you!

Comment on lines +144 to +141
"(4, 'a', DATE '2022-03-02')," +
"(5, 'b', DATE '2022-03-02')," +
"(1, 'b', DATE '2022-03-02')",
Copy link
Contributor

@kbendick kbendick Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we addd a comment about the usage of DATE vs TO_DATE, given that we're relying on DATE to affect the generated job graph differently than using TO_DATE does, which affects ordering etc?

A short comment and then directing to this comment might be sufficent: #4515 (comment)

Copy link
Contributor Author

@yittg yittg Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the commit message is a more proper place to explain it.
I'll split this PR into two parts as you mentioned, and describe the reason in the commit message of the first.
WDYT @kbendick ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. As well as a proper description in the PR comment (which can be the same). As those are the two most commonly searched places when reviewing older things.

Thanks again for the great work.

Copy link
Contributor

@kbendick kbendick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yittg Would you mind spitting this into two PRs, one that has the minimum changes required to fix the issue (changing the use of TO_DATE to just DATE and anything else needed), and then another PR to add the dynamic data generation and things?

It would be very beneficial in the long term to have a more clear record of the fix that is needed, without the extra changes. The differences in the two plans generated will be helpful to have a good long term record of outside of just the issue.

Then a second PR can be opened to clean up and better normalize the tests.

Or at the least if you can add a good summary of the changes to the PR description, that would be highly appreciated. 🙂

In Flink, only VALUES with literal tuples can be converted to a single
LogicalValues node. VALUES with other type of tuples will be converted
to a chain with multiple LogicalProject nodes, like following:

    LogicalValues -> (# number or rows)LogicalProject -> LogicalUnion.

The `TO_DATE('xxxx-xx-xx')` expression is not a literal node, so the
VALUES used in upsert test cases will be converted in the latter way.
The order of tuples can not be guaranteed, which can lead to unstable
test results.

Changing `TO_DATE('xxxx-xx-xx')` to `DATE 'xxxx-xx-xx'` can make it a
literal node and ensure that the VALUES expression can be converted to a
single LogicalValues node.
@yittg yittg changed the title Flink: fix and normalize ut for upsert Flink: fix out of order appends in upsert UT Apr 11, 2022
@yittg
Copy link
Contributor Author

yittg commented Apr 11, 2022

@kbendick i've changed this PR to contain only the first part.

I'll create another PR for the second part after this being merged, if necessary. If the current test data is already enough, we can just drop the second part. :)

Copy link
Member

@openinx openinx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me ! Thanks @yittg for the contribution !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flink : UT testPrimaryKeyFieldsAtEndOfTableSchema fails probablistcally

3 participants