
Conversation

@nsivabalan
Contributor

Change Logs

Release 0.12.3 prep triage flaky test

Impact

Release 0.12.3 prep triage flaky test

Risk level (write none, low, medium or high below)

low.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs is changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

loukey-lj and others added 30 commits March 21, 2023 17:00
…pache#5610)

- Alter table drop partition does not add a schema to the instant.
  A subsequent delete SQL then gets the latest instant to fetch the schema,
  which is "". This PR fixes the parsing of a null or empty schema.

Co-authored-by: Sagar Sumit <[email protected]>
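
A minimal sketch of the idea behind the fix, with a hypothetical helper name; the actual Hudi code path differs:

```java
// Hypothetical sketch: treat a null or empty schema string on the latest
// instant as "no schema" instead of failing to parse it.
import org.apache.avro.Schema;

final class SchemaParseSketch {
  static Schema parseOrNull(String schemaStr) {
    if (schemaStr == null || schemaStr.isEmpty()) {
      return null; // caller falls back to resolving the schema elsewhere
    }
    return new Schema.Parser().parse(schemaStr);
  }
}
```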
…dieFlinkWriteClient (apache#7509)

Different from other write clients, HoodieFlinkWriteClient invokes the dataset writing methods (#upsert or #insert)
for each batch of the new data set in the long-running task. In the current impl, an engine-specific hoodie table would be created before performing
these actions, and before the table creation, some table bootstrapping operations are performed (such as table upgrade/downgrade, the metadata table
bootstrap). These bootstrapping operations are guarded by a transaction lock.

In Flink, these bootstrapping operations can be avoided because they are all performed only once on the coordinator.

The changes (see the sketch after this list):

- Make BaseHoodieWriteClient#doInitTable non-abstract; it now only performs the bootstrapping operations
- Add a default impl BaseHoodieWriteClient#initMetadataTable for metadata table bootstrap specifically
- Add a new abstract method for creating engine-specific hoodie table
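
A minimal sketch of the refactor described above; the placeholder types stand in for Hudi's real classes, and the signatures are simplified assumptions:

```java
// Sketch only: BaseHoodieWriteClient's real signatures differ.
abstract class BaseWriteClientSketch {

  interface MetaClient {}   // placeholder for HoodieTableMetaClient
  interface EngineTable {}  // placeholder for the engine-specific HoodieTable

  // doInitTable is now non-abstract: it runs only the bootstrapping steps
  // (e.g. table upgrade/downgrade would be triggered here).
  protected void doInitTable(MetaClient metaClient) {
    initMetadataTable();
  }

  // Default impl for the metadata table bootstrap; engines override as
  // needed. Flink can leave this a no-op since the coordinator does it once.
  protected void initMetadataTable() {
    // no-op by default
  }

  // New abstract hook: each engine creates its own hoodie table.
  protected abstract EngineTable createTable(MetaClient metaClient);
}
```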
…pache#7527)

Push virtual key fetch inside createFileStatusUnchecked for MOR input format
…iles due to retry (apache#7517)

When a write transaction writes uncommitted log files in a delta commit, e.g., due to Spark task retries, these log files stay in the file system for some time after the successful delta commit (unlike uncommitted base files, which are deleted based on the markers). The delta commit metadata does not contain these log files, and the metadata table does not contain these entries either. This is a valid case where the metadata-table-based file listing (providing committed data files) differs from the file system (providing committed data files plus uncommitted log files in this case). In such a case, before this PR, the metadata table validator throws an exception for the mismatch, because the log blocks are checked based on the commit time, not validated against the commit metadata.

This PR fixes the logic of the metadata table validator to check whether the difference in the list of log files between the metadata table and the direct file system listing is due to committed log files, based on the commit metadata.
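
A hypothetical sketch of that check; the names here are illustrative, not Hudi's actual validator API:

```java
// Extra log files seen only on the file system are an error only if the
// commit metadata also lists them; otherwise they are uncommitted
// leftovers from task retries and the mismatch is benign.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LogFileDiffSketch {
  static boolean isBenignMismatch(List<String> mdtLogFiles,
                                  List<String> fsLogFiles,
                                  Set<String> logFilesInCommitMetadata) {
    Set<String> extra = new HashSet<>(fsLogFiles);
    mdtLogFiles.forEach(extra::remove);
    return extra.stream().noneMatch(logFilesInCommitMetadata::contains);
  }
}
```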
To match the date with the announcement email sent on 2022-12-28
…pache#7588)

In some of the execution modes, the execution env can only handle a single
job, so instantiate a fresh execution env instead of a global
singleton in service mode.
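
A hedged Flink sketch of the idea; the helper and its wiring are assumptions, not the PR's actual code:

```java
// In service mode, build a fresh environment per job instead of reusing
// a process-wide singleton, since some execution modes accept one job only.
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

final class ExecEnvPerJobSketch {
  static StreamExecutionEnvironment envFor(boolean serviceMode,
                                           StreamExecutionEnvironment shared) {
    return serviceMode
        ? StreamExecutionEnvironment.getExecutionEnvironment()
        : shared;
  }
}
```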
Before this change, the Hudi archived timeline is always loaded during the metastore sync process if the last sync time is given. Besides, the archived timeline is not cached inside the meta client if the start instant time is given. When the archived timeline is huge, e.g., hundreds of log files in the .hoodie/archived folder, loading it from storage causes performance issues and read timeouts on cloud storage due to rate limiting on requests.

This change improves the timeline loading by
(1) only reading the active timeline if the last sync time is the same as or after the start of the active timeline;
(2) caching the archived timeline based on the start instant time in the meta client, to avoid unnecessary repeated loading of the same archived timeline (see the sketch below).
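
An illustrative sketch of the caching idea in (2); the types and method names are placeholders, not Hudi's actual API:

```java
// Cache the archived timeline per start instant time so repeated syncs
// don't re-read hundreds of log files from storage.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class ArchivedTimelineCacheSketch<T> {
  private final Map<String, T> cache = new ConcurrentHashMap<>();

  T getArchivedTimeline(String startInstantTime, Function<String, T> loader) {
    // Load from storage only on the first request for this start instant.
    return cache.computeIfAbsent(startInstantTime, loader);
  }
}
```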
…ache#7620)

In the beginning, we bootstrap the ckp metadata by cleaning all the messages.
This introduces corner cases like 'the write task cannot fetch the pending instant correctly when restarting the job':
if a checkpoint succeeds and the job crashes suddenly before the instant has had time to commit, data loss happens,
because the last pending instant would be rolled back while the Flink engine thinks the checkpoint/instant is successful.

Q: Why do we clean the messages?
A: To prevent inconsistencies between timeline and the messages.

Q: Why do we decide to keep the messages?
A: There are two cases for the inconsistency:

1. the timeline instant is complete but the ckp message is inflight (for a committing instant),
2. the timeline instant is pending while the ckp message has not started (for starting a new instant).

For case 1, there is no need to re-commit the instant, so it's okay that the write task does not get any pending instant when recovering; for case 2, the instant is still pending and would be rolled back, which is in line with expectations.
Keeping the ckp messages as they are can actually preserve correctness.
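
A small sketch of the recovery reasoning above; the enum and method names are made up for illustration:

```java
final class CkpRecoverySketch {
  enum TimelineState { COMPLETE, PENDING }
  enum CkpMessageState { INFLIGHT, NOT_STARTED }

  static String decide(TimelineState timeline, CkpMessageState msg) {
    if (timeline == TimelineState.COMPLETE && msg == CkpMessageState.INFLIGHT) {
      return "case 1: already committed, nothing to re-commit";
    }
    if (timeline == TimelineState.PENDING && msg == CkpMessageState.NOT_STARTED) {
      return "case 2: pending instant, roll back as expected";
    }
    return "consistent: proceed normally";
  }
}
```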
…apache#7573)

Avoid using GenericRecord in ColumnStatMetadata.

HoodieMetadataPayload is constructed using GenericRecord with reflection, and columnStatMetadata stores minValue and maxValue, both of which are GenericRecord types.

Once a spill is generated, Kryo deserialization fails.

Root cause:
AVRO-2377 (Avro 1.9.2) modified the type of FIELD_RESERVED to Collections.unmodifiableSet.

https://github.com/apache/avro/blame/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L483

SPARK-27733 and SPARK-34778 (Spark 3.2.0) upgraded the Avro version from 1.8.2 to 1.10.2.

As a result, Hudi may encounter UnsupportedOperationException when running on Spark 3.2.0 or later.
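
A hypothetical standalone repro of the failure mode, assuming the Kryo library is on the classpath; the exact failure point may vary (instantiating the unmodifiable wrapper, or an add() during rebuild):

```java
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class UnmodifiableSetKryoRepro {
  public static void main(String[] args) {
    Kryo kryo = new Kryo();
    kryo.setRegistrationRequired(false);
    // Same shape as Avro's FIELD_RESERVED after AVRO-2377.
    Set<String> reserved =
        Collections.unmodifiableSet(new HashSet<>(Set.of("doc", "name")));
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (Output out = new Output(bytes)) {
      kryo.writeClassAndObject(out, reserved);
    }
    try (Input in = new Input(new ByteArrayInputStream(bytes.toByteArray()))) {
      // Expected to fail: Kryo cannot rebuild the unmodifiable wrapper
      // with its default collection handling.
      kryo.readClassAndObject(in);
    }
  }
}
```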
…ollback or clean in data table (apache#7580)

Before this change, the archival for the metadata table uses the earliest instant of all actions from the active timeline of the data table. In the archival process, CLEAN and ROLLBACK instants are archived separately from commits (check HoodieTimelineArchiver#getCleanInstantsToArchive). Because of this, a very old completed CLEAN or ROLLBACK instant in the data table can block the archival of the metadata table timeline and cause the active timeline of the metadata table to become extremely long, leading to performance issues when loading the timeline.

This commit changes the archival in the metadata table to not rely on completed rollbacks or cleans in the data table, by archiving the metadata table's instants based on the earliest commit (COMMIT, DELTA_COMMIT, and REPLACE_COMMIT only; considering non-savepoint commits only if archiving beyond savepoints is enabled) and the earliest inflight instant (all actions) in the data table's active timeline.
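
An illustrative computation of the new archival boundary under those rules; Instant here is a stand-in record, not Hudi's class, and savepoint handling is omitted:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Stream;

final class MdtArchivalBoundarySketch {
  record Instant(String timestamp, String action, boolean completed) {}

  static final Set<String> COMMIT_ACTIONS =
      Set.of("commit", "deltacommit", "replacecommit");

  // Metadata-table instants earlier than this boundary are safe to archive.
  static Optional<String> boundary(List<Instant> dataActiveTimeline) {
    Optional<String> earliestCommit = dataActiveTimeline.stream()
        .filter(i -> i.completed() && COMMIT_ACTIONS.contains(i.action()))
        .map(Instant::timestamp)
        .min(Comparator.naturalOrder());
    Optional<String> earliestInflight = dataActiveTimeline.stream()
        .filter(i -> !i.completed())
        .map(Instant::timestamp)
        .min(Comparator.naturalOrder());
    return Stream.of(earliestCommit, earliestInflight)
        .flatMap(Optional::stream)
        .min(Comparator.naturalOrder());
  }
}
```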
1032851561 and others added 17 commits March 22, 2023 14:55
…pache#8150)

The HoodieRetryWrapperFileSystem should override all the necessary methods.
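
A minimal generic sketch of the retry-wrapper idea (each overridden method delegates through a retry loop); the retry policy shown is an assumption, not the actual HoodieRetryWrapperFileSystem logic:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class RetrySketch {
  // Run an I/O operation, retrying transient IOExceptions a few times.
  static <T> T withRetry(Callable<T> op, int maxRetries) throws IOException {
    IOException last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return op.call();
      } catch (IOException e) {
        last = e; // retry
      } catch (Exception e) {
        throw new IOException(e);
      }
    }
    throw last;
  }
}
```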
- Enhancing Spark DS tests to ensure the MDT Spark datasource read tests are robust
… MDT (apache#8223)

- Fixing a corner-case bug where compaction in MDT could get triggered with a partially failed commit in DT.
…e#8248)

- Adding timeline server support for integ test suite
(cherry picked from commit 05fc359)

# Conflicts:
#	hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java
#	hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/format/TestInputFormat.java
…dieFlinkWriteClient (apache#7509)

Different from other write clients, HoodieFlinkWriteClient invokes the dataset writing methods (#upsert or #insert)
for each batch of the new data set in the long-running task. In the current impl, an engine-specific hoodie table would be created before performing
these actions, and before the table creation, some table bootstrapping operations are performed (such as table upgrade/downgrade, the metadata table
bootstrap). These bootstrapping operations are guarded by a transaction lock.

In Flink, these bootstrapping operations can be avoided because they are all performed only once on the coordinator.

The changes:

- Make BaseHoodieWriteClient#doInitTable non-abstract; it now only performs the bootstrapping operations
- Add a default impl BaseHoodieWriteClient#initMetadataTable for metadata table bootstrap specifically
- Add a new abstract method for creating engine-specific hoodie table

(cherry picked from commit fd62a14)
…annot be read normally by spark (apache#8026)

(cherry picked from commit 31e94ab)
@nsivabalan
Contributor Author

@hudi-bot run azure

@nsivabalan nsivabalan force-pushed the release-0.12.3-prep-triage-flaky-test branch from f1187da to 8101540 on March 24, 2023 21:20
@nsivabalan
Contributor Author

@hudi-bot run azure

@nsivabalan nsivabalan force-pushed the release-0.12.3-prep-triage-flaky-test branch from 6cc4481 to a5482d7 on March 25, 2023 01:55
@hudi-bot
Collaborator

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@voonhous
Member

voonhous commented Mar 31, 2023

@nsivabalan HUDI-5822 requires HUDI-5862 for the test that was added in HUDI-5822 to succeed.

Just a heads-up: if you pick one without the other, it'll shift the bug from appearing in COW tables back to MOR tables.

