HDDS-8292. Inconsistent key name handling for FSO bucket files. #4531
Conversation
sumitagrawl left a comment:
@ashishkumar50 Thanks for working on this. LGTM, +1
```java
    continue;
  }

  // Copy cache value to local copy and work on it
```
swagle: This is a good find @ashishkumar50. Could there be similar patterns with other objects, besides OmDirectoryInfo, leading to corruption?
ashishkumar50: Hi @swagle, thanks for the review. OmDirectoryInfo and OmKeyInfo are the only objects used for accessing the cache in OzoneListStatusHelper.
```java
  }

  // Copy cache value to local copy and work on it
  Value copyOmInfo = ObjectUtils.clone(cacheOmInfo);
```
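As context for why the copy matters, here is a minimal sketch (hypothetical cache and types, not Ozone code) of the difference between returning the cached reference and returning a copy:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch (hypothetical, not Ozone code) of the bug class being fixed:
// handing out the cached reference lets callers mutate the cache in place,
// while handing out a copy keeps the cache intact.
public class CacheAliasingDemo {
    public static final Map<String, StringBuilder> CACHE = new HashMap<>();

    // Buggy pattern: returns the cached reference directly.
    public static StringBuilder getUnsafe(String key) {
        return CACHE.get(key);
    }

    // Fixed pattern: return a copy the caller may mutate freely.
    public static StringBuilder getSafe(String key) {
        return new StringBuilder(CACHE.get(key));
    }
}
```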
kerneltime: Do we depend on the bucket locking for concurrency correctness here?
ashishkumar50: Hi @kerneltime, thanks for the review. The bucket read lock is already taken in MinHeapIterator.
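For background, the read-lock pattern being referred to can be sketched generically like this (a generic illustration, not the actual MinHeapIterator code):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Generic read-lock sketch: multiple readers may iterate concurrently,
// while a writer mutating the bucket must wait for all read locks to clear.
public class BucketReadLockDemo {
    public static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

    public static int sumUnderReadLock(int[] bucketContents) {
        LOCK.readLock().lock();
        try {
            int sum = 0;
            for (int v : bucketContents) {
                sum += v;
            }
            return sum;
        } finally {
            LOCK.readLock().unlock();
        }
    }
}
```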
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OmKeyInfo.java (review thread resolved)
@swamirishi @aswinshakil @SaketaChalamchala can you please review this?
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OmDirectoryInfo.java (review thread resolved)
errose28 left a comment:
The fix LGTM @ashishkumar50. I'm just wondering about your thoughts on the approach in this PR versus some alternatives:
- Make ListStatusHelper not modify the OmKeyInfo object, or
- Shallow copy in HeapEntry#getStatus and just update the string field that is needed, instead of deep copying the entire object just to update two string fields in the copy.
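The shallow-copy alternative could look roughly like the sketch below (KeyInfoSketch and its fields are hypothetical stand-ins, not the real OmKeyInfo API):

```java
import java.util.List;

// Hypothetical sketch of the shallow-copy alternative: copy only the field
// that getStatus needs to rewrite, and share everything else with the
// cached object.
public class KeyInfoSketch {
    public String keyName;
    public final List<String> locations; // shared with the cached object

    public KeyInfoSketch(String keyName, List<String> locations) {
        this.keyName = keyName;
        this.locations = locations;
    }

    // Shallow copy: new wrapper object, same locations reference.
    public KeyInfoSketch withKeyName(String newKeyName) {
        return new KeyInfoSketch(newKeyName, this.locations);
    }
}
```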
```java
      keyLocationVersion != null).forEach(keyLocationVersion ->
          omKeyInfo.keyLocationVersions.add(
              new OmKeyLocationInfoGroup(keyLocationVersion.getVersion(),
                  keyLocationVersion.getLocationList(),
```
Should we deep copy the contents of this list as well? OmKeyLocationInfo extends BlockLocationInfo which is not immutable.
I had the same question, but it looks like it is a deep copy:

```java
/**
 * Use this expensive method only when absolutely needed!
 * It creates a new list so it is not an O(1) operation.
 * Use getLocationLists() instead.
 * @return a list of OmKeyLocationInfo
 */
public List<OmKeyLocationInfo> getLocationList() {
  return locationVersionMap.values().stream().flatMap(List::stream)
      .collect(Collectors.toList());
}
```
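One general-Java caveat relevant to the question above about element mutability: `collect(Collectors.toList())` creates a new list, but the elements in it are the same object references, so mutating an element still affects the original. A small self-contained demonstration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// General Java behavior: collecting a stream into a list copies the list
// structure, not the elements -- both lists point at the same objects.
public class ListCopyDemo {
    public static void main(String[] args) {
        List<StringBuilder> src = new ArrayList<>();
        src.add(new StringBuilder("block1"));

        List<StringBuilder> copy = src.stream().collect(Collectors.toList());

        assert copy != src;               // a new list object...
        assert copy.get(0) == src.get(0); // ...holding the same elements
    }
}
```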
Hi @errose28, thanks for the review.
We could have done a shallow copy in HeapEntry#getStatus and modified just the two required fields, but the cache value with the same reference would still be held by HeapEntry. If some other field were modified in the future without first being deep copied, corruption would still be possible.
OmKeyLocationInfo is also deep copied, as pointed out by Ritesh.
> Use this expensive method only when absolutely needed!

There is a risk that fixing this issue with a deep copy causes a performance regression in list keys. Since this is a pretty critical bug, I think the first version of the fix should do a shallow copy. A later implementation can decide to either leave the fix this way, use a deep copy (with perf benchmarks to show no regression), or fix the method to not require modifying the key info.
@errose28 agreed that we need something more efficient. The entirety of OzoneListStatusHelper.java and the listing logic needs to be revisited, given that 99% of use cases do not care about the key info to begin with. But getting something off the cache and returning a mutable reference is dangerous, and this fix is needed to avoid corrupting keys.
The follow-up Jira is fine with me, as long as we are aware that list keys will probably be slower than before until it is resolved.
What changes were proposed in this pull request?
ListStatus should work on a local copy of the cache value. It should not use and update the main cache, which is used for the DB update.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-8292
How was this patch tested?
Tested using the steps below.
Before the fix, if ListStatus is called X times before the commit to DB, the key name is appended X times.
After the fix, ListStatus works on a local cache copy and does not affect the double-buffer cache. RocksDB is updated with the correct keyName, and ListStatus also returns the correct result.
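The failure mode described above can be sketched as follows (hypothetical names and logic; the real path rewriting happens in OzoneListStatusHelper on OmKeyInfo/OmDirectoryInfo):

```java
// Hypothetical repro sketch: if listStatus builds the full path by mutating
// the cached key name in place, every call prepends the parent dir again.
public class RepeatedListStatusDemo {
    public static String cachedKeyName = "file1"; // value sitting in the cache

    // Buggy listStatus: rewrites the cached value instead of a local copy.
    public static String listStatusBuggy(String parentDir) {
        cachedKeyName = parentDir + "/" + cachedKeyName;
        return cachedKeyName;
    }

    // Fixed listStatus: builds the result from a local copy of the value.
    public static String listStatusFixed(String parentDir) {
        String local = cachedKeyName; // strings are immutable; safe local copy
        return parentDir + "/" + local;
    }
}
```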