HDDS-7941. Race condition in getFileStatus causes flaky testObjectStoreCreateWithO3fs #5252
@sumitagrawl @sadanand48 @smengcl Please review.
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneListStatusHelper.java
I don't understand how this patch solves the issue; to me, nothing is changing. Instead of using the cache iterator, you are just getting the value directly from the cache. It would be great if you could add more details on how this change fixes it.
This test case fails in Now here for a given key, control returns from the `if` check, and after iterating over all keys in the batch, if a key is not found in the cache, we don't want to lose the deleted-key entries here because after this Here is the failed CI run before the patch and the green CI run after applying this patch.
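As a rough illustration of the idea described above, here is a minimal sketch of remembering deleted cache entries so that a later scan of the persisted table does not bring them back. It is a simplified model and not the actual Ozone code: the class `DeletedKeySketch`, the method `listKeys`, and the use of an empty `Optional` as a "deleted but not yet flushed" marker are all assumptions made for illustration. Only the `deletedKeys` set and the `exists` check mirror names from the patch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified model; not Ozone's real cache or table classes.
final class DeletedKeySketch {

  // cache: unflushed entries; an empty Optional marks a key deleted in the cache.
  // db: the persisted, sorted table, which may still hold stale deleted keys.
  static List<String> listKeys(Map<String, Optional<String>> cache,
                               NavigableMap<String, String> db,
                               String prefix) {
    Set<String> deletedKeys = new HashSet<>();
    TreeMap<String, String> merged = new TreeMap<>();

    // Pass 1: walk the cache, keeping live entries and recording deletions.
    for (Map.Entry<String, Optional<String>> e : cache.entrySet()) {
      String cacheKey = e.getKey();
      if (!cacheKey.startsWith(prefix)) {
        continue;
      }
      boolean exists = e.getValue().isPresent();
      if (exists) {
        merged.put(cacheKey, e.getValue().get());
      } else {
        deletedKeys.add(cacheKey);
      }
    }

    // Pass 2: walk the persisted table, skipping keys recorded as deleted.
    for (Map.Entry<String, String> e : db.tailMap(prefix, true).entrySet()) {
      if (!e.getKey().startsWith(prefix)) {
        break; // keys are sorted, so the prefix range has ended
      }
      if (deletedKeys.contains(e.getKey())) {
        continue;
      }
      merged.putIfAbsent(e.getKey(), e.getValue());
    }
    return new ArrayList<>(merged.keySet());
  }
}
```

In this model, a key that was deleted in the cache but is still present in the persisted table is filtered out in the second pass, which is the property the `deletedKeys` set is meant to preserve.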
        if (!exists) {
          deletedKeys.add(cacheKey);
        }
      }
This can be dumping a lot of keyTable cache entries into the temp set deletedKeys. And this method can be called frequently from getFileStatus / getOzoneFileStatus.
Is there a better way to fix this? Since the only caller is already holding a BUCKET_LOCK:
ozone/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
Line 1127 in e4cbc20
    metadataManager.getLock().acquireReadLock(BUCKET_LOCK, volumeName,
It might be fine for now since keyTable cache should only hold a few entries that are not flushed yet.
Thanks @smengcl for your review. Another way to fix this is PR #5093, which uses the table.isExist() API. My only concern is that the cache is always faster, and in #5093 isExist is called until we find the target key, and it is also called inside the getNextKey() call. Please have a look at PR #5093.
@smengcl Kindly advise.
Alright. Let's merge this PR now as a fix. We could improve this further in #5093.
smengcl
left a comment
+1 pending CI
Thanks @smengcl for the review. CI is all green.
sumitagrawl
left a comment
@devmadhuu Thanks for working on this, LGTM +1
…estObjectStoreCreateWithO3fs (apache#5252) (cherry picked from commit 89c76cf) Change-Id: I88c213bdbcbfea4226cd2297b78d68854a5ecd7e
What changes were proposed in this pull request?
This PR fixes a possible race condition in the getFileStatus call, where a keyTable cache flush can happen while the table iterator is being traversed in the method org.apache.hadoop.ozone.om.KeyManagerImpl#createFakeDirIfShould.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7941
How was this patch tested?
This patch was tested with multiple CI job runs to check for the flaky test failure tracked in this JIRA, which is due to the cause described above. Here is the green CI run after applying the patch.