Hive: make sure unlock is called when HiveTableOperations can not acq… #2263

scutzou · 2021-02-22T02:41:30Z

Currently, unlock is never called if HiveTableOperations.acquireLock fails while waiting for the lock on the Hive table. This PR invokes unlock in the finally block.

marton-bod

Looks good, thanks @ZorTsou

marton-bod · 2021-02-22T15:24:43Z

hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java

-    // timeout and do not have lock acquired
-    if (timeout && !state.get().equals(LockState.ACQUIRED)) {
+    } catch (WaitingForLockException waitingForLockException) {
+      // timeout and do not have lock acquired


nit: I think this is a 'legacy' comment from an earlier implementation of the timeout/retry logic. If you get here now, the lock state is always WAITING, so it's enough to say the failure is due to the timeout; the 'do not have lock acquired' part can be removed.

marton-bod

Actually, isn't the unlock already called within cleanupMetadataAndUnlock at the moment?

rdblue · 2021-02-23T00:54:26Z

hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java

+          System.currentTimeMillis() - start, database, tableName);
+    } finally {
+      if (!state.get().equals(LockState.ACQUIRED)) {
+        unlock(Optional.of(lockId));


Is this correct? If the lock wasn't acquired, why would it be correct to unlock it?

When a client calls lock, the Hive metastore first creates an exclusive lock in the WAITING state on the Hive table. If there is no other lock on the table, the metastore changes the state to ACQUIRED. Otherwise, it sets hl_blockedby_ext_id to the latest lockId. Whether or not the lock is acquired, the lock information is stored in HIVE_LOCKS as shown below.

| hl_lock_ext_id | hl_table | hl_lock_state | hl_lock_type | hl_last_heartbeat | hl_blockedby_ext_id |
| --- | --- | --- | --- | --- | --- |
| 1651514 | test_table | a | e | 1614062112285 | |
| 1651516 | test_table | w | e | 1614062132342 | 1651514 |
| 1651518 | test_table | w | e | 1614062152426 | 1651516 |

In this case, a new lock will remain in the WAITING state until the latest lock is deleted by the client or by the txn cleanup thread.

Correct, HiveMetaStoreClient.lock() will enqueue a new lock request into the HMS HIVE_LOCKS table and if we throw an exception here upon timeout, our lock request will be stuck there in WAITING state (potentially blocking other subsequent requests) unless we call unlock() which makes sure to remove the lock from the HMS table.

The cleanup process will remove these locks once the timeout is reached, but until then they can block other lock requests. So it is good to clean up these locks when possible.

Okay, and passing the lock ID into unlock ensures that this is only for the lock request that was created when trying to acquire. Sounds good to me. Thanks for the explanation, @ZorTsou!
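To make the blocking behaviour described above concrete, here is a minimal in-memory sketch of a lock queue. It is a toy model, not the metastore implementation: `LockQueueSketch`, its methods, and the "oldest request wins" rule are assumptions chosen to mirror the HIVE_LOCKS rows shown earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy stand-in for the HIVE_LOCKS table (hypothetical, for illustration only). */
public class LockQueueSketch {
  public enum State { ACQUIRED, WAITING }

  // lockId -> state, kept in request order like the rows in HIVE_LOCKS
  private final Map<Long, State> locks = new LinkedHashMap<>();
  private long nextId = 1;

  /** Enqueue an exclusive request; it is ACQUIRED only if the table has no other request. */
  public long lock() {
    long id = nextId++;
    locks.put(id, locks.isEmpty() ? State.ACQUIRED : State.WAITING);
    return id;
  }

  /** Re-check a request: it becomes ACQUIRED only when no older request is ahead of it. */
  public State checkLock(long id) {
    if (locks.containsKey(id) && locks.keySet().iterator().next() == id) {
      locks.put(id, State.ACQUIRED);
    }
    return locks.get(id);
  }

  /** Remove a request regardless of its state. */
  public void unlock(long id) {
    locks.remove(id);
  }

  /** State of a later request after the original holder releases its lock. */
  public static State stateOfLaterRequest(boolean unlockOnTimeout) {
    LockQueueSketch hms = new LockQueueSketch();
    long holder = hms.lock();    // ACQUIRED
    long timedOut = hms.lock();  // WAITING; this client times out and gives up
    if (unlockOnTimeout) {
      hms.unlock(timedOut);      // the cleanup this PR adds
    }
    long later = hms.lock();     // WAITING behind whatever is still queued
    hms.unlock(holder);
    return hms.checkLock(later);
  }

  public static void main(String[] args) {
    System.out.println(stateOfLaterRequest(false)); // prints WAITING
    System.out.println(stateOfLaterRequest(true));  // prints ACQUIRED
  }
}
```

Without the unlock on timeout, the abandoned request stays in the queue and the later request cannot be granted until something else removes it.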

scutzou · 2021-02-23T05:24:33Z

> Actually, isn't the unlock already called within cleanupMetadataAndUnlock at the moment?

In doCommit, lockId will be lost when an exception is thrown from acquireLock.
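A small sketch of why the ID is lost, assuming the caller holds the ID in an `Optional<Long>` that is only assigned from the return value of acquireLock. The class and method names below are hypothetical stand-ins, not the actual doCommit code.

```java
import java.util.Optional;

/** Hypothetical illustration of a lock ID that never reaches the caller. */
public class LostLockIdSketch {

  /**
   * Stand-in for acquireLock before the fix: the request has already been
   * enqueued (id recorded in enqueued[0]) when the wait times out.
   */
  static long acquireLockBeforeFix(long[] enqueued) {
    enqueued[0] = 42L; // the metastore now holds a WAITING request with this id
    throw new IllegalStateException("Timed out waiting for lock");
  }

  /** Returns the lock ID that a cleanup step in the caller would be able to see. */
  public static Optional<Long> lockIdSeenByCleanup() {
    Optional<Long> lockId = Optional.empty();
    long[] enqueued = new long[1];
    try {
      // The assignment never completes because the call throws first.
      lockId = Optional.of(acquireLockBeforeFix(enqueued));
    } catch (IllegalStateException e) {
      // The commit fails here; a cleanup step would run with lockId still empty.
    }
    return lockId;
  }

  public static void main(String[] args) {
    System.out.println(lockIdSeenByCleanup().isPresent()); // prints false
  }
}
```

Because the caller never learns the ID, only acquireLock itself is in a position to release the pending request.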

pvary · 2021-02-23T10:38:52Z

hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java

+    } finally {
+      if (!state.get().equals(LockState.ACQUIRED)) {
+        unlock(Optional.of(lockId));
+      }


It is a little bit hard to see the final state of the code based on the PR, but I think it is this:

    } catch (WaitingForLockException waitingForLockException) {
      // timeout and do not have lock acquired
      throw new CommitFailedException("Timed out after %s ms waiting for lock on %s.%s",
          System.currentTimeMillis() - start, database, tableName);
    } finally {
      if (!state.get().equals(LockState.ACQUIRED)) {
        unlock(Optional.of(lockId));
      }
    }

    if (!state.get().equals(LockState.ACQUIRED)) {
      throw new CommitFailedException("Could not acquire the lock on %s.%s, " +
          "lock request ended in state %s", database, tableName, state);
    }
    return lockId;
  }

I preferred the original structure, where the exceptions are handled in one place. Maybe something like this:

    } catch (WaitingForLockException waitingForLockException) {
      timeout = true;
      duration = System.currentTimeMillis() - start;
    } finally {
      if (!state.get().equals(LockState.ACQUIRED)) {
        unlock(Optional.of(lockId));
      }
    }

    // timeout and do not have lock acquired
    if (timeout && !state.get().equals(LockState.ACQUIRED)) {
      throw new CommitFailedException("Timed out after %s ms waiting for lock on %s.%s",
          duration, database, tableName);
    }

    if (!state.get().equals(LockState.ACQUIRED)) {
      throw new CommitFailedException("Could not acquire the lock on %s.%s, " +
          "lock request ended in state %s", database, tableName, state);
    }
    return lockId;
  }

What do you think @ZorTsou?

Oh, and nice catch BTW! 👍

I agree, it looks clearer. I updated the code, thanks.
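For readers who want to try the pattern discussed in this thread outside of Iceberg, here is a self-contained sketch: release the pending request in a finally block, then raise any failure in one place afterwards. The `LockClient` interface, the bounded polling loop, and the use of `IllegalStateException` are assumptions made for the example; the actual code uses a retry helper and CommitFailedException.

```java
/** Hypothetical, simplified version of the "unlock in finally, throw afterwards" pattern. */
public class AcquireLockSketch {
  public enum LockState { ACQUIRED, WAITING, NOT_ACQUIRED }

  /** Minimal client interface assumed for this sketch. */
  public interface LockClient {
    long lock();                  // enqueue a request and return its id
    LockState checkLock(long id); // poll the current state of the request
    void unlock(long id);         // remove the request in any state
  }

  public static long acquireLock(LockClient client, int maxChecks) {
    long lockId = client.lock();
    LockState state = LockState.WAITING;
    boolean timeout = false;
    try {
      int checks = 0;
      while (state == LockState.WAITING) {
        if (checks++ >= maxChecks) {
          timeout = true; // remember the reason, but do not throw yet
          break;
        }
        state = client.checkLock(lockId);
      }
    } finally {
      // Runs on timeout, on a non-ACQUIRED end state, and if checkLock throws.
      if (state != LockState.ACQUIRED) {
        client.unlock(lockId);
      }
    }

    // All failures are reported here, after the cleanup has run.
    if (timeout) {
      throw new IllegalStateException("Timed out waiting for lock " + lockId);
    }
    if (state != LockState.ACQUIRED) {
      throw new IllegalStateException("Lock request ended in state " + state);
    }
    return lockId;
  }
}
```

Keeping the throws after the finally block means the cleanup condition and the error messages each appear once, which is the readability point raised in the review.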

rdblue · 2021-02-24T18:12:29Z

Good catch, and thank you for fixing this, @ZorTsou!

Thanks for the reviews, @pvary and @marton-bod!

scutzou · 2021-02-25T02:20:15Z

Thanks for reviewing, @marton-bod @pvary @rdblue.

Hive: make sure unlock is called when HiveTableOperations can not acq…

fdfd5bd

…uire the lock on table

github-actions bot added the hive label Feb 22, 2021

Fix errant exception thrown in finally block

e5f7b57

marton-bod approved these changes Feb 22, 2021

marton-bod reviewed Feb 22, 2021

rdblue reviewed Feb 23, 2021

pvary reviewed Feb 23, 2021

Handling exceptions in one place

bf5e513

pvary approved these changes Feb 24, 2021

rdblue merged commit f356be7 into apache:master Feb 24, 2021

aokolnychyi pushed a commit that referenced this pull request Mar 24, 2021

Hive: Ensure unlock is called in HiveTableOperations to fix zombie lo…

08ba760

…cking (#2263)

aokolnychyi added this to the Java 0.11.1 Release milestone Mar 30, 2021

coolderli pushed a commit to coolderli/iceberg that referenced this pull request Apr 26, 2021

Hive: Ensure unlock is called in HiveTableOperations to fix zombie lo…

b780956

…cking (apache#2263)

RussellSpitzer mentioned this pull request Apr 28, 2021

Hive: Lock Issues with multithreaded commits #2540

Closed

stevenzwu pushed a commit to stevenzwu/iceberg that referenced this pull request Jul 28, 2021

Hive: Ensure unlock is called in HiveTableOperations to fix zombie lo…

e8c5891

…cking (apache#2263)

stevenzwu pushed a commit to stevenzwu/iceberg that referenced this pull request Jul 28, 2021

Hive: Ensure unlock is called in HiveTableOperations to fix zombie lo…

6531454

…cking (apache#2263)

autumnust mentioned this pull request Feb 1, 2022

Backport https://github.com/apache/iceberg/pull/2328 and its prerequisites linkedin/iceberg#89

Merged

maluchari mentioned this pull request Feb 8, 2022

Call unlock lockId before throwing an exception in acquireLock method linkedin/iceberg#90

Closed
