[HUDI-5592] Fixing some of the flaky tests in CI #7720
@hudi-bot run azure
// round robin to ensure we generate inserts for all partition paths
String partitionToUse = partitionPaths[partitionIndex.get()];
partitionIndex.set((partitionIndex.get() + 1) % partitionPaths.length);
return partitionToUse;
With this change we can guarantee that all target partitions have records, so we don't need to bump 10 records to 100, and the tests stay faster. We just need to make sure num records > num partitions here.
Yes, but I don't expect any savings from going from 100 to 10 records. Also, most of our tests use 100 records, so this just keeps them in parity.
Yes, we have only 3 partitions in HoodieTestDataGenerator, so a minimum of 3 records is required to ensure we generate records for all 3 partitions.
This is minor: 1) from a unit-test perspective, 100 records shouldn't behave differently from 10 records, so the question is: if 10 works, why bump to 100? 2) We should ensure num records > num partitions without relying on there always being 3 partitions, so an argument check would help prevent misuse that leads to unexpected results.
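The round-robin selection and the argument check suggested above could be sketched roughly as follows. This is a standalone illustration, not the actual HoodieTestDataGenerator code: the class name `RoundRobinPartitions`, the method `partitionsForInserts`, and the partition names `p0`, `p1`, `p2` are hypothetical. The check uses `>=` because, with round robin, a record count equal to the partition count is already enough to cover every partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Hudi: illustrates round-robin partition
// assignment plus a guard on the requested record count.
public class RoundRobinPartitions {
    private final String[] partitionPaths;
    private final AtomicInteger partitionIndex = new AtomicInteger(0);

    public RoundRobinPartitions(String... partitionPaths) {
        if (partitionPaths == null || partitionPaths.length == 0) {
            throw new IllegalArgumentException("at least one partition path is required");
        }
        this.partitionPaths = partitionPaths.clone();
    }

    // Returns the current partition and advances the index, wrapping around.
    public String nextPartition() {
        int current = partitionIndex.getAndUpdate(i -> (i + 1) % partitionPaths.length);
        return partitionPaths[current];
    }

    // Assigns a partition to each of numRecords inserts. Rejects counts that
    // cannot cover every partition, without hard-coding the partition count.
    public List<String> partitionsForInserts(int numRecords) {
        if (numRecords < partitionPaths.length) {
            throw new IllegalArgumentException(
                "numRecords (" + numRecords + ") must be >= number of partitions ("
                    + partitionPaths.length + ") to cover all partitions");
        }
        List<String> assigned = new ArrayList<>(numRecords);
        for (int i = 0; i < numRecords; i++) {
            assigned.add(nextPartition());
        }
        return assigned;
    }

    public static void main(String[] args) {
        RoundRobinPartitions generator = new RoundRobinPartitions("p0", "p1", "p2");
        System.out.println(generator.partitionsForInserts(10));
    }
}
```

Because consecutive picks cycle through the list, any run of at least as many records as there are partitions touches every partition, regardless of where the index currently sits.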
tableBasePath = basePath + "/testtable_" + tableType;
prepareInitialConfigs(fs(), basePath, "foo");
TypedProperties props = prepareMultiWriterProps(fs(), basePath, propsFilePath);
props.setProperty(HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT.key(), "0");
Should this also be applied to some cases in TestHoodieDeltaStreamer?
Yes, but I did not want to boil the ocean just yet, so I am focusing on the most frequently failing tests. I will file a JIRA to track other similar tests: https://issues.apache.org/jira/browse/HUDI-5595
xushiyan left a comment
Let's land this without doing another round of CI. We should standardize data generation for testing once and for all in some later PRs.
CI failed due to flaky tests.
We have recently seen more flakiness in our CI runs, so this PR takes a stab at fixing some of the most frequently failing tests. Tests that are fixed: TestHoodieClientOnMergeOnReadStorage (testReadingMORTableWithoutBaseFile, testCompactionOnMORTable, testLogCompactionOnMORTable, testLogCompactionOnMORTableWithoutBaseFile). Reason for flakiness: the tests generate only 10 inserts, which does not guarantee records for all 3 partitions in HoodieTestDataGenerator. Fixes: HoodieTestDataGenerator chose a random partition from the list of partitions when generating insert records; it now uses round robin. The number of records inserted in some of the flaky tests is also bumped from 10 to 100, and the respective MOR tests now disable small file handling.
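As a rough estimate of how often the old behavior could leave a partition empty, assume each of the n inserts picks one of the 3 partitions uniformly and independently (an assumption for illustration; the PR only says the partition was chosen at random). Inclusion-exclusion over the partitions then gives:

```latex
\begin{aligned}
P(\text{some partition empty})
  &= 3\left(\tfrac{2}{3}\right)^{n} - 3\left(\tfrac{1}{3}\right)^{n} \\
n = 10:\quad
  &3 \cdot \tfrac{1024}{59049} - 3 \cdot \tfrac{1}{59049}
   = \tfrac{3069}{59049} \approx 0.052 \\
n = 100:\quad
  &\approx 3\left(\tfrac{2}{3}\right)^{100} \approx 7 \times 10^{-18}
\end{aligned}
```

Under that assumption, roughly one run in twenty with 10 inserts would miss at least one partition, which is consistent with an intermittent failure; round robin removes the randomness entirely once the record count reaches the partition count.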
Change Logs
We have recently seen more flakiness in our CI runs, so this PR takes a stab at fixing some of the most frequently failing tests.
Tests that are fixed:
TestHoodieDeltaStreamerWithMultiWriter.* (all tests)
TestHoodieClientOnMergeOnReadStorage (testReadingMORTableWithoutBaseFile, testCompactionOnMORTable, testLogCompactionOnMORTable, testLogCompactionOnMORTableWithoutBaseFile)
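Reasoning for flakiness:
The tests generate only 10 inserts, which does not guarantee records for all 3 partitions (HoodieTestDataGenerator).
Fixes:
HoodieTestDataGenerator now picks partitions round robin instead of at random when generating insert records.
Bumped the number of records inserted in some of the flaky tests from 10 to 100.
Fixed the respective MOR tests to disable small file handling.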
Impact
NA
Risk level
None
Documentation Update
NA
Contributor's checklist