Flink: fix flink unit test testHashDistributeMode #4117
Conversation
@stevenzwu what do you think?
@zhongyujiang, I think I would prefer a fix that avoids the root cause but still runs the test in streaming mode. I understand your concern about not being able to necessarily guarantee we won't have a flaky test, but we can probably set that high enough (1s?) that we don't see it in practice.
508f0f2 to 0ac9cc7
@rdblue I have updated; not sure 1s is high enough, but let's give it a try first.
Assert.assertEquals("There should be 1 data file in partition 'ccc'", 1,
    SimpleDataUtil.matchingPartitions(dataFiles, table.spec(), ImmutableMap.of("data", "ccc")).size());
Assert.assertTrue("There should be no more than 1 data file in partition 'aaa'",
    SimpleDataUtil.matchingPartitions(dataFiles, table.spec(), ImmutableMap.of("data", "aaa")).size() < 2);
I changed the assert condition because, if there are multiple checkpoints, data may arrive like this:
ck1: (1, "aaa")
ck2: (1, "bbb")
...
so I think we should assert that each snapshot has no more than 1 file per partition, since a snapshot could contain 0 files for a partition as well.
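The "0 files" case above can be made concrete with a small plain-Java sketch (this is not Iceberg or Flink code; the class name and helper are made up for illustration). It assumes, as the test does, that hash distribution produces at most one file per partition per checkpoint:

```java
import java.util.*;

public class SnapshotFileCounts {
  // Count data files per partition for one checkpoint, assuming hash
  // distribution writes at most one file per partition per checkpoint.
  static Map<String, Long> filesPerPartition(List<String> recordPartitions) {
    Map<String, Long> files = new TreeMap<>();
    for (String partition : new HashSet<>(recordPartitions)) {
      files.put(partition, 1L);
    }
    return files;
  }

  public static void main(String[] args) {
    // ck1 only receives (1, "aaa"); ck2 only receives (1, "bbb").
    Map<String, Long> ck1 = filesPerPartition(List.of("aaa"));
    // The snapshot for ck1 has 1 file for "aaa" but 0 files for "bbb",
    // so asserting "exactly 1 file per partition" per snapshot is too strict.
    System.out.println("ck1 files for aaa: " + ck1.getOrDefault("aaa", 0L)); // 1
    System.out.println("ck1 files for bbb: " + ck1.getOrDefault("bbb", 0L)); // 0
  }
}
```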
It is unclear to me how this change of assertion is related to the potential cause you described, where 2 checkpoint cycles can be committed in one shot. In that case we would get 2 files for one partition. Why would we get 0 files for a partition?
I didn't figure out a way to validate hash distribution when there are merged results from multiple checkpoints, so originally I simply disabled this test in streaming mode.
This update is for rdblue's comment:
I think I would prefer a fix that avoids the root cause but still runs the test in streaming mode. I understand your concern about not being able to necessarily guarantee we won't have a flaky test, but we can probably set that high enough (1s?) that we don't see it in practice.
I increased the checkpoint interval to 1000ms to reduce the chance of merged results. And in streaming mode, I think the original assert is not right given the checkpoint scenario mentioned in my last comment.
The error you encountered is that the value is 2 (not 1). Hence I said this change from == 1 to < 2 won't even work around the error. Anyway, it seems that the other discussions in this PR have already led us to the right root cause and solution.
@zhongyujiang Flink by default only allows 1 concurrent checkpoint. Could the scenario you described happen in this case?
@zhongyujiang, what's the current failure stacktrace you encountered? I'd like to take a careful look at this problem, and hopefully we can fix it in this round of work.
I think it's not relevant to Flink
Like this:
Haven't encountered locally yet.
Having reconsidered this test case, I think @zhongyujiang is heading toward the root cause in the correct direction. Let me explain the cause here: in the unit test case, we are trying to write the following records into the apache iceberg table, shuffling by the partition field. As we may produce multiple checkpoints while the streaming job is running, it's possible that we write the records in the following checkpoints:
Then it will produce a separate data file for each partition in the given checkpoint. Let's say:
Assume the snapshotState & notifyCheckpointComplete calls arrive as follows:
Then in step#3, it will commit one transaction with all the data files from checkpoint#1 & checkpoint#2 (according to this IcebergFilesCommitter implementation); finally this latest snapshot will include all the data files from
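The merging behavior described above can be modeled with a minimal plain-Java sketch (this is not the actual IcebergFilesCommitter; the class and file names are invented, and data files are just strings here). The key point it mirrors is that completing one checkpoint commits everything pending up to and including that checkpoint in a single transaction:

```java
import java.util.*;

public class CommitterMergeSketch {
  // Pending data files per checkpoint, ordered by checkpoint id.
  private final NavigableMap<Long, List<String>> pendingFiles = new TreeMap<>();

  // Called on each checkpoint: stash that checkpoint's data files.
  void snapshotState(long checkpointId, List<String> files) {
    pendingFiles.put(checkpointId, files);
  }

  // Mirrors the idea in the committer's notifyCheckpointComplete: commit
  // everything up to (and including) the completed checkpoint in ONE commit.
  List<String> notifyCheckpointComplete(long checkpointId) {
    List<String> committed = new ArrayList<>();
    pendingFiles.headMap(checkpointId, true).values().forEach(committed::addAll);
    pendingFiles.headMap(checkpointId, true).clear();
    return committed;
  }

  public static void main(String[] args) {
    CommitterMergeSketch committer = new CommitterMergeSketch();
    committer.snapshotState(1, List.of("file-aaa-ck1", "file-bbb-ck1"));
    committer.snapshotState(2, List.of("file-aaa-ck2"));
    // If the notification for ck1 never arrives before ck2 completes,
    // the snapshot for ck2 contains TWO files for partition "aaa".
    System.out.println(committer.notifyCheckpointComplete(2));
  }
}
```

Under this model, the test's "exactly 1 file per partition" assertion fails whenever two checkpoints are merged into one snapshot, which matches the flaky failure being discussed.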
But I generally don't think the current fix is in the correct direction; here are my points:
I think the real intention behind this unit test is: we want to ensure that there is only one generated data file in each given partition if we commit those rows in one single deterministic iceberg transaction, once we enable the switch. The current root cause is: we cannot make the flink streaming sql job trigger only one checkpoint for the given 9 rows. So I think the correct direction is: make only one checkpoint write those 9 rows, and finally still assert that there is only one data file in each given partition. To accomplish this goal, I think we can use the BoundedTestSource to reimplement this unit test. About the BoundedTestSource, here is a good example of how to produce multiple rows into a single checkpoint.
Changing the assertion condition is not intended to solve the validation problem when there are merged results. Actually, like you said, there could be more than one checkpoint in streaming mode, but there is no guarantee that each checkpoint contains data for every partition. The situation could be like this:
When results of ck1 and ck2 are not merged, the snapshot of ck1 would have only 1 data file for partition
I also wanted to solve the problem by controlling the checkpoint in the beginning but didn't figure out a convenient way to do so. Using
@openinx I read the javadoc that you linked. It seems that notifyCheckpointComplete can be skipped, since it is best effort. It also says the notification can't be assumed to be for the latest snapshot. But it didn't say whether they can come out of order, so the scenario could be
Agree that precise control of the source could be the right solution here.
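Extending the earlier reasoning about the committer, a max-committed-checkpoint guard makes skipped or late notifications harmless, because each commit covers everything up to the completed checkpoint id. This is a speculative plain-Java model (invented names, not the real IcebergFilesCommitter code):

```java
import java.util.*;

public class NotificationOrderSketch {
  private final NavigableMap<Long, List<String>> pending = new TreeMap<>();
  private long maxCommitted = -1L;

  void snapshotState(long checkpointId, List<String> files) {
    pending.put(checkpointId, files);
  }

  List<String> notifyCheckpointComplete(long checkpointId) {
    if (checkpointId <= maxCommitted) {
      return List.of(); // late or duplicate notification: nothing left to commit
    }
    List<String> committed = new ArrayList<>();
    // Commit every pending checkpoint up to and including this one.
    pending.headMap(checkpointId, true).values().forEach(committed::addAll);
    pending.headMap(checkpointId, true).clear();
    maxCommitted = checkpointId;
    return committed;
  }

  public static void main(String[] args) {
    NotificationOrderSketch c = new NotificationOrderSketch();
    c.snapshotState(1, List.of("f1"));
    c.snapshotState(2, List.of("f2"));
    // ck2's notification arrives first and commits both checkpoints ...
    System.out.println(c.notifyCheckpointComplete(2)); // [f1, f2]
    // ... so a late notification for ck1 is a safe no-op.
    System.out.println(c.notifyCheckpointComplete(1)); // []
  }
}
```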
This PR fixes the unit test TestFlinkSink#testHashDistributeMode, which fails occasionally in Flink CI and has been discussed a lot in #2989 and #3365.
I think the root cause is the way notifyCheckpointComplete works in IcebergFilesCommitter:
iceberg/flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
Lines 182 to 195 in d43cb4c
iceberg/flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
Lines 261 to 277 in ca46fc9
As shown above, results of multiple checkpoints may be merged into one commit in streaming mode, and the checkpoint interval (400 ms) here is rather small, which makes this situation very likely.
Increasing the checkpoint interval would reduce such failures, but they cannot be completely eliminated in theory. So I simply made this unit test apply only to batch mode, which is enough to validate hash distribute mode in my opinion.
@openinx @szehon-ho could you help review this? Thanks!