Hive: make sure unlock is called when HiveTableOperations can not acq… #2263
```diff
@@ -319,8 +319,8 @@ private long acquireLock() throws UnknownHostException, TException, InterruptedE
     long duration = 0;
     boolean timeout = false;
 
-    if (state.get().equals(LockState.WAITING)) {
-      try {
+    try {
+      if (state.get().equals(LockState.WAITING)) {
         // Retry count is the typical "upper bound of retries" for Tasks.run() function. In fact, the maximum number of
         // attempts the Tasks.run() would try is `retries + 1`. Here, for checking locks, we use timeout as the
         // upper bound of retries. So it is just reasonable to set a large retry count. However, if we set
@@ -348,9 +348,13 @@ private long acquireLock() throws UnknownHostException, TException, InterruptedE
                 LOG.warn("Interrupted while waiting for lock.", e);
               }
             }, TException.class);
-      } catch (WaitingForLockException waitingForLockException) {
-        timeout = true;
-        duration = System.currentTimeMillis() - start;
       }
+    } catch (WaitingForLockException waitingForLockException) {
+      timeout = true;
+      duration = System.currentTimeMillis() - start;
+    } finally {
+      if (!state.get().equals(LockState.ACQUIRED)) {
+        unlock(Optional.of(lockId));
+      }
     }
```

Contributor
It is a little bit hard to see the final state of the code based on the PR, but I think it is this:

I would prefer the original version, where the exceptions are handled in one place. Maybe something like this:

What do you think @ZorTsou? Oh, and nice catch BTW! 👍

Contributor (Author)
I agree, it looks clearer. I updated the code, thanks.
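The fix relies on a standard try/catch/finally shape: however the wait ends (success, timeout, or an exception), the `finally` block releases the request unless the lock was actually acquired. Below is a minimal, self-contained sketch of that shape. `LockClient`, `LockAcquirer`, and `NeverGrantingClient` are hypothetical names invented for illustration; they are not the Hive metastore client or Iceberg API, and the sketch polls in a plain loop instead of using Iceberg's `Tasks` retry helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: these are NOT the Hive metastore client API.
enum LockState { WAITING, ACQUIRED }

interface LockClient {
  long lock();                       // enqueue a lock request and return its id
  LockState checkLock(long lockId);  // poll the current state of that request
  void unlock(long lockId);          // delete the request from the lock table
}

final class LockAcquirer {
  private final LockClient client;
  private final long timeoutMillis;
  private final long pollMillis;

  LockAcquirer(LockClient client, long timeoutMillis, long pollMillis) {
    this.client = client;
    this.timeoutMillis = timeoutMillis;
    this.pollMillis = pollMillis;
  }

  long acquire() throws InterruptedException {
    long lockId = client.lock();
    LockState state = LockState.WAITING; // assume queued until told otherwise
    long start = System.currentTimeMillis();
    try {
      state = client.checkLock(lockId);
      // Poll until the request is granted or the timeout elapses.
      while (state == LockState.WAITING
          && System.currentTimeMillis() - start < timeoutMillis) {
        Thread.sleep(pollMillis);
        state = client.checkLock(lockId);
      }
    } finally {
      // Runs on timeout, on an exception from checkLock, and on interrupt:
      // if we do not hold the lock, remove our own request (by id) so it does
      // not linger in WAITING state and block later requests.
      if (state != LockState.ACQUIRED) {
        client.unlock(lockId);
      }
    }
    if (state != LockState.ACQUIRED) {
      throw new IllegalStateException(
          "Timed out after " + timeoutMillis + " ms waiting for lock " + lockId);
    }
    return lockId;
  }
}

// Test double: never grants the lock and records which ids were unlocked.
final class NeverGrantingClient implements LockClient {
  final List<Long> unlocked = new ArrayList<>();
  private long nextId = 1;

  @Override public long lock() { return nextId++; }
  @Override public LockState checkLock(long lockId) { return LockState.WAITING; }
  @Override public void unlock(long lockId) { unlocked.add(lockId); }
}

class AcquireLockDemo {
  public static void main(String[] args) throws InterruptedException {
    NeverGrantingClient client = new NeverGrantingClient();
    try {
      new LockAcquirer(client, 50, 10).acquire();
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // Timed out after 50 ms waiting for lock 1
    }
    System.out.println("unlocked: " + client.unlocked); // unlocked: [1]
  }
}
```

Running `AcquireLockDemo` against the never-granting test double prints the timeout message followed by `unlocked: [1]`: the request that timed out was removed using its own id, while a request that is granted is left alone.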
Is this correct? If the lock wasn't acquired, why would it be correct to unlock it?
When the client calls `lock`, the Hive metastore first creates an exclusive lock in `WAITING` state on the Hive table. If there is no other lock on the table, the metastore changes the state to `ACQUIRED`; otherwise, it updates `hl_blockedby_ext_id` to the latest lock ID. Whether or not the lock is acquired, the lock information is stored in `HIVE_LOCKS`. In this case, the new lock remains in `WAITING` state until the latest lock is deleted by the client or by the transaction cleanup thread.
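To make the queueing behaviour concrete, here is a toy in-memory model of the lock table described above. It is a simplified sketch, not the metastore's implementation: `ToyLockTable`, its methods, and the rule that any earlier row (waiting or acquired) blocks a later exclusive request are assumptions chosen for illustration, with `blockedBy` playing the role of `hl_blockedby_ext_id`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy, in-memory model of an HMS-style exclusive lock table. Hypothetical:
// the real metastore persists rows in HIVE_LOCKS and handles many lock types.
final class ToyLockTable {
  enum State { WAITING, ACQUIRED }

  private static final class Row {
    State state = State.WAITING; // every request starts out WAITING
    Long blockedBy;              // plays the role of hl_blockedby_ext_id
  }

  // Ids are handed out in increasing order, so iteration order = request order.
  private final Map<Long, Row> rows = new LinkedHashMap<>();
  private long nextId = 1;

  /** Records the request whether or not it can be granted, then checks it once. */
  synchronized long lock() {
    long id = nextId++;
    rows.put(id, new Row());
    checkLock(id);
    return id;
  }

  /** An exclusive request is granted only when no earlier request remains. */
  synchronized State checkLock(long id) {
    Row row = rows.get(id);
    if (row == null) {
      throw new IllegalArgumentException("no such lock: " + id);
    }
    if (row.state == State.WAITING) {
      Long blocker = latestIdBefore(id);
      if (blocker == null) {
        row.state = State.ACQUIRED;
        row.blockedBy = null;
      } else {
        row.blockedBy = blocker;
      }
    }
    return row.state;
  }

  synchronized Long blockedBy(long id) {
    Row row = rows.get(id);
    return row == null ? null : row.blockedBy;
  }

  /** Deletes the row; an abandoned WAITING row stays until someone does this. */
  synchronized void unlock(long id) {
    rows.remove(id);
  }

  // Earlier WAITING rows block too, not only ACQUIRED ones (model assumption).
  private Long latestIdBefore(long id) {
    Long latest = null;
    for (long other : rows.keySet()) {
      if (other < id) {
        latest = other;
      }
    }
    return latest;
  }
}

class ToyLockTableDemo {
  public static void main(String[] args) {
    ToyLockTable table = new ToyLockTable();
    long a = table.lock();  // granted: the table was empty
    long b = table.lock();  // WAITING, blocked by a
    // b's client times out and walks away WITHOUT calling unlock(b).
    table.unlock(a);
    long c = table.lock();
    System.out.println(table.checkLock(c) + " blocked by " + table.blockedBy(c)); // WAITING blocked by 2
    table.unlock(b);        // what the fix does for its own lock id
    System.out.println(table.checkLock(c)); // ACQUIRED
  }
}
```

In the demo, request `b` is abandoned without `unlock`, so the later request `c` stays `WAITING` behind it even after the original holder `a` is gone; only after `unlock(b)` is `c` granted.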
Correct, `HiveMetaStoreClient.lock()` will enqueue a new lock request into the HMS `HIVE_LOCKS` table, and if we throw an exception here upon timeout, our lock request will be stuck there in `WAITING` state (potentially blocking other subsequent requests) unless we call `unlock()`, which makes sure to remove the lock from the HMS table.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cleanup process will remove these locks when the timeout is reached, but until that time this can block other lock requests. So it is good to clean these locks when possible
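The trade-off in this comment can be sketched with a second toy model: an abandoned row survives until a time-based cleanup pass deletes it, whereas an explicit `unlock` removes it at once. `ExpiringLockTable` and its milliseconds-as-arguments interface are hypothetical; the real metastore's cleanup has its own timeout rules, which this sketch does not attempt to reproduce.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model (hypothetical names): each lock row records when it was last
// refreshed, and a periodic cleanup pass deletes rows older than the timeout.
// Time is passed in explicitly so the behaviour is deterministic.
final class ExpiringLockTable {
  private final Map<Long, Long> lastSeenMillis = new LinkedHashMap<>();
  private final long timeoutMillis;
  private long nextId = 1;

  ExpiringLockTable(long timeoutMillis) {
    this.timeoutMillis = timeoutMillis;
  }

  long lock(long nowMillis) {
    long id = nextId++;
    lastSeenMillis.put(id, nowMillis);
    return id;
  }

  void unlock(long id) {
    lastSeenMillis.remove(id);
  }

  boolean contains(long id) {
    return lastSeenMillis.containsKey(id);
  }

  /** One cleanup pass; returns the number of expired rows it deleted. */
  int reap(long nowMillis) {
    int removed = 0;
    Iterator<Map.Entry<Long, Long>> it = lastSeenMillis.entrySet().iterator();
    while (it.hasNext()) {
      if (nowMillis - it.next().getValue() > timeoutMillis) {
        it.remove();
        removed++;
      }
    }
    return removed;
  }
}

class ExpiringLockTableDemo {
  public static void main(String[] args) {
    ExpiringLockTable table = new ExpiringLockTable(300);
    long abandoned = table.lock(0);   // client timed out without unlocking
    table.reap(100);
    System.out.println(table.contains(abandoned)); // true: still there, still blocking
    table.reap(301);
    System.out.println(table.contains(abandoned)); // false: removed by cleanup

    long released = table.lock(400);
    table.unlock(released);           // explicit unlock: gone immediately
    System.out.println(table.contains(released));  // false
  }
}
```

Between time 0 and 300 in this model the abandoned row is still present and could block other requests; calling `unlock` closes that window immediately.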
Okay, and passing the lock ID into unlock ensures that this is only for the lock request that was created when trying to acquire. Sounds good to me. Thanks for the explanation, @ZorTsou!