-
Notifications
You must be signed in to change notification settings - Fork 3k
Hive: make sure unlock is called when HiveTableOperations can not acq… #2263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…uire the lock on table
marton-bod
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks @ZorTsou
| // timeout and do not have lock acquired | ||
| if (timeout && !state.get().equals(LockState.ACQUIRED)) { | ||
| } catch (WaitingForLockException waitingForLockException) { | ||
| // timeout and do not have lock acquired |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think this is a 'legacy' comment from an earlier implementation of the timeout/retry logic. Now if you get here, the lock state is always WAITING, so it's sufficient to say it's due to timeout and the 'do not have lock acquired' part can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, isn't the unlock already called within cleanupMetadataAndUnlock at the moment?
| System.currentTimeMillis() - start, database, tableName); | ||
| } finally { | ||
| if (!state.get().equals(LockState.ACQUIRED)) { | ||
| unlock(Optional.of(lockId)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? If the lock wasn't acquired, why would it be correct to unlock it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While client calling lock, hive metastore creates an exclusive lock with WAITING state on the hive table first. If there is no other lock on the table, then metastore changes the state to ACQUIRED. Otherwise, updates hl_blockedby_ext_id to the latest lockId. No matter whether the lock state is acquired, the lock information will stored in HIVE_LOCKS as below.
| hl_lock_ext_id | hl_table | hl_lock_state | hl_lock_type | hl_last_heartbeat | hl_blockedby_ext_id |
|---|---|---|---|---|---|
| 1651514 | test_table | a | e | 1614062112285 | |
| 1651516 | test_table | w | e | 1614062132342 | 1651514 |
| 1651518 | test_table | w | e | 1614062152426 | 1651516 |
In this case, a new lock will remain in the waiting state until the latest lock is deleted by client or txn cleanup thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, HiveMetaStoreClient.lock() will enqueue a new lock request into the HMS HIVE_LOCKS table and if we throw an exception here upon timeout, our lock request will be stuck there in WAITING state (potentially blocking other subsequent requests) unless we call unlock() which makes sure to remove the lock from the HMS table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cleanup process will remove these locks when the timeout is reached, but until that time this can block other lock requests. So it is good to clean these locks when possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, and passing the lock ID into unlock ensures that this is only for the lock request that was created when trying to acquire. Sounds good to me. Thanks for the explanation, @ZorTsou!
In |
| } finally { | ||
| if (!state.get().equals(LockState.ACQUIRED)) { | ||
| unlock(Optional.of(lockId)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a little bit hard to see the final state of the code based on the PR, but I think it is this:
} catch (WaitingForLockException waitingForLockException) {
// timeout and do not have lock acquired
throw new CommitFailedException("Timed out after %s ms waiting for lock on %s.%s",
System.currentTimeMillis() - start, database, tableName);
} finally {
if (!state.get().equals(LockState.ACQUIRED)) {
unlock(Optional.of(lockId));
}
}
if (!state.get().equals(LockState.ACQUIRED)) {
throw new CommitFailedException("Could not acquire the lock on %s.%s, " +
"lock request ended in state %s", database, tableName, state);
}
return lockId;
}
I have preferred the original wording where the exceptions are handled in one place. Maybe something like this:
} catch (WaitingForLockException waitingForLockException) {
timeout = true;
duration = System.currentTimeMillis() - start;
} finally {
if (!state.get().equals(LockState.ACQUIRED)) {
unlock(Optional.of(lockId));
}
}
// timeout and do not have lock acquired
if (timeout && !state.get().equals(LockState.ACQUIRED)) {
throw new CommitFailedException("Timed out after %s ms waiting for lock on %s.%s",
duration, database, tableName);
}
if (!state.get().equals(LockState.ACQUIRED)) {
throw new CommitFailedException("Could not acquire the lock on %s.%s, " +
"lock request ended in state %s", database, tableName, state);
}
return lockId;
}
What do you think @ZorTsou?
Oh, and nice catch BTW! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. It looks more clear. I updated the code, thanks.
|
Good catch, and thank you for fixing this, @ZorTsou! Thanks for the reviews, @pvary and @marton-bod! |
|
Thanks for reviewing, @marton-bod @pvary @rdblue. |
Currently, there is no way to call
unlockifHiveTableOperations.acquireLockfails at waiting for lock on hive table. This PR aims to try to invokeunlockin the finally block.