
@TJX2014 (Contributor) commented Sep 5, 2022

Change Logs

  1. For MOR tables, make hudi-flink also generate a CreateHandle (i.e. write a base file) when the bucket's base file does not exist yet.
  2. Enable the deduplicate function for MOR tables.
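A minimal sketch of the write-path decision in the two items above, assuming a simplified model: `HandleType`, `Rec`, `deduplicate` and `chooseHandle` are hypothetical names, not Hudi's API. Deduplication here simply keeps the record with the largest ordering value per key; Hudi's real merge behavior depends on the configured record payload. Requires Java 16+ (records).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MorBucketWriteSketch {
    // Hypothetical handle kinds: a new base (parquet) file, or an append to the bucket's log files.
    enum HandleType { CREATE_BASE_FILE, APPEND_LOG_FILE }

    // Hypothetical minimal record: a record key plus an ordering value used to pick among duplicates.
    record Rec(String key, long orderingValue) {}

    // Keeps one record per key: the one with the largest ordering value (on ties, the later record wins).
    static List<Rec> deduplicate(List<Rec> records) {
        Map<String, Rec> byKey = new LinkedHashMap<>();
        for (Rec r : records) {
            byKey.merge(r.key(), r, (old, cur) -> cur.orderingValue() >= old.orderingValue() ? cur : old);
        }
        return List.copyOf(byKey.values());
    }

    // If the bucket has no base file yet, create one; otherwise append the records to log files.
    static HandleType chooseHandle(int bucketId, Set<Integer> bucketsWithBaseFile) {
        return bucketsWithBaseFile.contains(bucketId) ? HandleType.APPEND_LOG_FILE : HandleType.CREATE_BASE_FILE;
    }

    public static void main(String[] args) {
        List<Rec> batch = List.of(new Rec("id1", 1), new Rec("id2", 1), new Rec("id1", 3));
        System.out.println(deduplicate(batch)); // [Rec[key=id1, orderingValue=3], Rec[key=id2, orderingValue=1]]
        System.out.println(chooseHandle(5, Set.of(0, 1))); // CREATE_BASE_FILE
        System.out.println(chooseHandle(1, Set.of(0, 1))); // APPEND_LOG_FILE
    }
}
```

Deduplicating before choosing the handle means a brand-new bucket's first base file already contains one record per key.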

Impact

The duplicate issue comes from hudi-flink MOR tables: Flink first appends to log files and does not compact right away, so the bucket number does not yet appear in any base file.
When Spark uses loadPartitionBucketIdFileIdMapping of org.apache.hudi.index.bucket.HoodieSimpleBucketIndex, it does not find the bucket number written by hudi-flink, so it generates a new file ID for that bucket, which is inconsistent with what hudi-flink wrote.
After this change, when hudi-flink writes a MOR table with the bucket index, it first writes a base parquet file (after deduplication); if the base file already exists, it writes log files instead. Following the Spark approach seems more stable for MOR tables.
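To illustrate why the bucket goes missing, here is a simplified Java model, not Hudi's code: `FileSlice`, `loadBucketMapping` and the `includeLogOnlySlices` flag are hypothetical, and treating the 8-digit zero-padded prefix of the file ID as the bucket number is an assumption of this sketch. It shows that a mapping built only from slices that have a base file misses a bucket that so far exists only as log files, while also counting log-only slices (the alternative raised in review) finds it. Requires Java 16+ (records).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class BucketMappingSketch {
    // Simplified, hypothetical file slice: a file id, an optional base file, and its log file count.
    record FileSlice(String fileId, Optional<String> baseFile, int logFileCount) {}

    // Assumption: the bucket id is the zero-padded 8-digit prefix of the file id.
    static int bucketIdFromFileId(String fileId) {
        return Integer.parseInt(fileId.substring(0, 8));
    }

    // Builds a bucketId -> fileId mapping for one partition.
    // With includeLogOnlySlices == false, slices without a base file are skipped.
    static Map<Integer, String> loadBucketMapping(List<FileSlice> latestSlices, boolean includeLogOnlySlices) {
        Map<Integer, String> mapping = new HashMap<>();
        for (FileSlice slice : latestSlices) {
            if (slice.baseFile().isEmpty() && !includeLogOnlySlices) {
                continue; // a log-only slice (e.g. an uncompacted Flink write) is invisible here
            }
            mapping.put(bucketIdFromFileId(slice.fileId()), slice.fileId());
        }
        return mapping;
    }

    public static void main(String[] args) {
        List<FileSlice> slices = List.of(
            new FileSlice("00000000-aaaa", Optional.of("00000000-aaaa.parquet"), 0),
            // Bucket 1 was written by Flink as log files only and has not been compacted yet.
            new FileSlice("00000001-bbbb", Optional.empty(), 2));
        System.out.println(loadBucketMapping(slices, false).containsKey(1)); // false: bucket 1 looks unused
        System.out.println(loadBucketMapping(slices, true).containsKey(1));  // true
    }
}
```

In this model, a writer that trusts the first mapping would assign bucket 1 a second file ID, leaving two file groups for one bucket.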

Risk level: none

Contributor's checklist

  • Read through the contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@minihippo (Contributor)

When Spark loads the latest file slices, if it also includes the file slices that contain only log files, the problem would also be solved, right?

@TJX2014 TJX2014 closed this Sep 5, 2022
@TJX2014 (Contributor Author) commented Sep 5, 2022

> When Spark loads the latest file slices, if it also includes the file slices that contain only log files, the problem would also be solved, right?

It seems Spark does not need to include log files, since they are merged into the base file.

@TJX2014 (Contributor Author) commented Sep 5, 2022


Sorry, I closed this by mistake; please see #6595.

@minihippo (Contributor)

The PR can be reopened :)

