[HUDI-2573] Fixing double locking with multi-writers #3827
LGTM!
Overall approach/design comment: Exposing locking semantics to callers is a bit dangerous. Callers usually don't know the internals of abstract class interfaces and cannot decide when to use which lock/unlock version. The core problem here is nested transactions (and thereby lock reentrancy). A few higher-level actions have sub-actions, and taking the same exclusive lock across actions can lead to deadlock. Instead of viewing this as an implementation-specific (lock) issue, can we see it as a transaction manager issue? That is, a higher-level transaction that has already started doesn't want to start sub-transactions, right? If that's the case, can we instantiate the Rollback/Clean sub-actions with a no-op transaction manager? The transaction manager in these actions has the sole task of locking/unlocking, so the kind of transaction manager they are given can indirectly tell them to relax the transaction and thereby the locking. Please let me know if I got the core problem wrong.
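To make the no-op transaction manager suggestion concrete, here is a minimal, self-contained sketch. Every name in it (`SketchTxnManager`, `LockingTxnManager`, `NoOpTxnManager`, `SketchCleanAction`) is hypothetical and is not Hudi's actual API; a `Semaphore` with one permit stands in for a non-reentrant exclusive table lock.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical transaction manager contract; not Hudi's real interface. */
interface SketchTxnManager {
  boolean begin(); // true if the transaction (and its lock, if any) was obtained
  void end();
}

/** Takes the shared, non-reentrant table lock for the duration of the transaction. */
class LockingTxnManager implements SketchTxnManager {
  private final Semaphore permit;

  LockingTxnManager(Semaphore permit) {
    this.permit = permit;
  }

  public boolean begin() {
    return permit.tryAcquire(); // fails immediately if the lock is already held
  }

  public void end() {
    permit.release();
  }
}

/** Does nothing: meant for sub-actions whose caller already holds the lock. */
class NoOpTxnManager implements SketchTxnManager {
  public boolean begin() {
    return true;
  }

  public void end() {
    // intentionally empty
  }
}

/** Hypothetical sub-action (think clean or rollback) that commits inside a transaction. */
class SketchCleanAction {
  private final SketchTxnManager txn;

  SketchCleanAction(SketchTxnManager txn) {
    this.txn = txn;
  }

  boolean execute() {
    if (!txn.begin()) {
      return false; // could not start the transaction
    }
    try {
      return true; // the commit would happen here
    } finally {
      txn.end();
    }
  }
}

class NoOpTxnSketch {
  /** Returns {result with a nested locking manager, result with a no-op manager}. */
  static boolean[] run() {
    Semaphore tableLock = new Semaphore(1);
    SketchTxnManager outer = new LockingTxnManager(tableLock);
    if (!outer.begin()) {
      throw new IllegalStateException("outer transaction could not start");
    }
    try {
      boolean nested = new SketchCleanAction(new LockingTxnManager(tableLock)).execute();
      boolean relaxed = new SketchCleanAction(new NoOpTxnManager()).execute();
      return new boolean[] {nested, relaxed};
    } finally {
      outer.end();
    }
  }

  public static void main(String[] args) {
    boolean[] r = run();
    System.out.println("nested locking manager succeeded: " + r[0]);
    System.out.println("no-op manager succeeded: " + r[1]);
  }
}
```

In this sketch the sub-action never sees a locking flag. Whoever constructs it decides which manager it gets, which is the point of the proposal: the caller that already holds the lock passes the no-op manager.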
...lient/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
...i-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
@manojpec: thanks for your inputs. I do like the idea of the TransactionManager handling the locking depending on whether the lock acquisition is requested by the same owner or a different one. But I see some implementation hurdles in that. Let me see how I can go about it.
7b02f0f to 0faf3e0
@manojpec: addressed all comments.
As per our offline discussion, abstracting out the transaction manager for this purpose is not straightforward and needs more thought. I have filed a ticket to track it.
@hudi-bot azure run
manojpec
left a comment
Taking up the larger refactoring of transaction manager in a different ticket is ok with me.
0faf3e0 to 251a3d3
What is the purpose of the pull request
Brief change log
a. When a table is being upgraded and multi-writer is enabled, we roll back all partial commits. The upgrade step itself runs within a lock, but the rollback also tries to acquire the lock when committing the rollback.
b. When auto commit is enabled with inline cleaning, auto clean is triggered as part of post commit, which runs within a lock. But the clean, when committing at the end, tries to acquire the lock again.
Fix: Added an option to both rollback and clean to skip locking; these two code paths set skipLocking to true. Also added guards so that locks are taken only if metadata is enabled. Prior to this patch, we took the lock first and only then checked whether the metadata writer was available. This is a minor optimization.
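The double-locking scenario and the skipLocking fix described above can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: `SketchTableLock`, `SketchSubAction`, and `DoubleLockingSketch` are hypothetical names, the lock is modeled as a non-reentrant `Semaphore`, and a short timeout stands in for what would otherwise be a long wait on the nested acquire.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a non-reentrant, exclusive table lock. */
class SketchTableLock {
  private final Semaphore permit = new Semaphore(1);

  boolean tryLock(long timeoutMs) {
    try {
      return permit.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  void unlock() {
    permit.release();
  }
}

/** Hypothetical sub-action (think rollback or clean) that commits under the table lock. */
class SketchSubAction {
  private final SketchTableLock lock;

  SketchSubAction(SketchTableLock lock) {
    this.lock = lock;
  }

  /** Returns true if the commit went through, false if the lock could not be taken. */
  boolean commit(boolean skipLocking) {
    if (skipLocking) {
      return true; // caller already holds the lock, so commit directly
    }
    if (!lock.tryLock(100)) {
      return false; // nested acquire on a lock the caller still holds
    }
    try {
      return true; // commit under our own lock
    } finally {
      lock.unlock();
    }
  }
}

class DoubleLockingSketch {
  /** Outer action (think upgrade or post commit) that holds the lock while running the sub-action. */
  static boolean runOuter(boolean skipLockingInSubAction) {
    SketchTableLock lock = new SketchTableLock();
    SketchSubAction sub = new SketchSubAction(lock);
    if (!lock.tryLock(100)) {
      return false;
    }
    try {
      return sub.commit(skipLockingInSubAction);
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) {
    System.out.println("sub-action with its own locking succeeded: " + runOuter(false));
    System.out.println("sub-action with skipLocking succeeded: " + runOuter(true));
  }
}
```

With `skipLocking` set to false the sub-action cannot acquire the lock because the outer action still holds it; with `skipLocking` set to true it commits under the outer action's lock.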
Verify this pull request
This change added tests and can be verified as follows:
a. testMultiWriterForDoubleLocking
b. testRollbackDuringUpgradeForDoubleLocking
Points to discuss:
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.