[HUDI-8626] fix: using rename to create consistent-hash-metadata file #12394
1. using rename to create consistent-hash-metadata file Signed-off-by: TheR1sing3un <[email protected]>
@hudi-bot run azure
...udi-client-common/src/main/java/org/apache/hudi/index/bucket/ConsistentBucketIndexUtils.java
…ting uncompleted file 1. using HoodieStorage.createImmutableFileInPath to avoid visiting uncompleted file Signed-off-by: TheR1sing3un <[email protected]>
OutputStream fsout = null;
StoragePath tmpPath = null;

boolean needTempFile = needCreateTempFile();
Why do we need an explicit flag for this?
`HoodieStorage::needCreateTempFile` is only true for HDFS, but on other file systems, such as the local fs, we should also avoid reading the file in an intermediate state. So, regardless of the underlying file system, we always use the temp-file method to create files in this scenario.
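The temp-file method mentioned here can be sketched as follows. This is a minimal illustration on the local file system using `java.nio.file`, not Hudi's `HoodieStorage` implementation; the class name `AtomicCreateSketch` and the method `createImmutableFile` are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Hudi: write to a temp file, then rename.
final class AtomicCreateSketch {

  static void createImmutableFile(Path target, byte[] content) throws IOException {
    // Best-effort immutability check. It is not atomic with the rename below,
    // so two concurrent writers could still both pass it.
    if (Files.exists(target)) {
      throw new FileAlreadyExistsException(target.toString());
    }
    // The temp file lives in the same directory so the rename stays on one file system.
    Path dir = target.toAbsolutePath().getParent();
    Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
    try {
      // Readers looking for `target` cannot see these bytes yet.
      Files.write(tmp, content);
      // Publish the complete file in one step. ATOMIC_MOVE throws
      // AtomicMoveNotSupportedException if the file system cannot do this.
      Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    } finally {
      // No-op after a successful move; cleans up the temp file on failure.
      Files.deleteIfExists(tmp);
    }
  }
}
```

With this approach a reader either finds no file at the target path or finds the complete content, which is the property the discussion is about.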
We assume that storage falls into two categories: HDFS and object stores. The latter should always be atomic for file creation, so renaming is only needed for HDFS.
If you think there is another file system we need to support, at least fix `HoodieStorage::needCreateTempFile` itself.
I changed `HoodieStorage::needCreateTempFile` so that every storage without write transactions creates a temp file.
We cannot do this: the `isWriteTransactional` flag is not set correctly for many object stores. Let's keep it as it is; the local fs is only for testing purposes, so we should be fine.
Got it! But can I just add a check for whether the storage is the local fs? The unit tests rely on the local fs, so without this check they will exercise unexpected logic.
Let's not change this logic.
Ok, I have restored the original logic~
1. for storage without write-transaction should create temp file Signed-off-by: TheR1sing3un <[email protected]>
49eda0a to 4f75f9f
…of local fs 1. keep original `needCreateTempFile` logic but add judgement of local fs Signed-off-by: TheR1sing3un <[email protected]>
1. keep original `needCreateTempFile` logic Signed-off-by: TheR1sing3un <[email protected]>
In consistent-bucket mode, the `hash_metadata_file` may be read by another task or job while it is still being created. In this case, the metadata may be loaded incorrectly because the `hash_metadata_file` is read in an intermediate state.

For example, when two tasks load the hash metadata of partition `p_date=20241202`, task-b can see the intermediate state of the hash metadata file.

I wrote a UT to verify it.
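The race described above can be reproduced deterministically on the local file system. The sketch below is not the UT from this PR; `PartialReadDemo` and its methods are hypothetical names. It pauses a writer halfway through and records what a second reader (standing in for task-b) would see at that moment, first with a direct write and then with a temp file plus rename.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical demo, not Hudi code: what a concurrent reader sees mid-write.
final class PartialReadDemo {

  // Stand-in for task-b: returns the current content, or null if the file is absent.
  static String readIfPresent(Path p) throws IOException {
    return Files.exists(p)
        ? new String(Files.readAllBytes(p), StandardCharsets.UTF_8)
        : null;
  }

  // Writes straight to the target and returns what a reader sees halfway through.
  static String observeDuringDirectWrite(Path target, String content) throws IOException {
    byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
    int half = bytes.length / 2;
    try (OutputStream out = Files.newOutputStream(target)) {
      out.write(bytes, 0, half);
      out.flush();
      String seen = readIfPresent(target); // reader arrives mid-write
      out.write(bytes, half, bytes.length - half);
      return seen;
    }
  }

  // Writes to a sibling temp file, renames at the end, and returns what a
  // reader of the target path sees halfway through.
  static String observeDuringTempWrite(Path target, String content) throws IOException {
    byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
    int half = bytes.length / 2;
    Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
    String seen;
    try (OutputStream out = Files.newOutputStream(tmp)) {
      out.write(bytes, 0, half);
      out.flush();
      seen = readIfPresent(target); // target does not exist yet
      out.write(bytes, half, bytes.length - half);
    }
    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    return seen;
  }
}
```

In the direct-write case the reader gets a truncated prefix of the content, which is the intermediate state task-b could load. In the temp-file case the reader finds no file at the target path until the rename publishes the complete content.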
