Skip to content

Conversation

@chungen0126
Copy link
Contributor

@chungen0126 chungen0126 commented Jul 20, 2023

What changes were proposed in this pull request?

When the fileSystemPaths isn't enable, listing status from cache will skip the keys with slash. To fix it, we need to create fakeDirs.

What is the link to the Apache JIRA

HDDS-8877

Please replace this section with the link to the Apache JIRA)

How was this patch tested?

Succeeded in all 10 * 10 run for TestOzoneFileSystem.
https://github.com/chungen0126/ozone/actions/runs/6081955837

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks fine. unit tests passed so I am +1. Some nits that's more cosmetic in nature.
Re-triggered the test to check the acceptance tests.

if (key.startsWith(targetKey)) {
if (!Objects.equals(key, targetKey)
&& !isKeyDeleted(key, keyTable)) {
&& keyTable.isExist(key)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chungen0126 Thanks for working on this patch, however instead of double checking , let deleted key flow outside of this method to caller and outside , it can be filtered. I think the issue is due to the cache is getting cleared after delete operation and isKeyDeleted will not tell as expected. Pls check my patch here. We may need not this extra call of "keyTable.isExist(key)"

Copy link
Contributor Author

@chungen0126 chungen0126 Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devmadhuu Thanks for your review. Here is an issue about that creating a deleteSet will lead to bloat in memory. I'm not sure if it is bothered now.

Copy link
Contributor Author

@chungen0126 chungen0126 Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is fine to add deleted keys to TreeMap. Once It is resolved by the PR, I will remove my changes about listing keys from DB.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chungen0126 @devmadhuu .

Should we consider this PR superceded by #5244 ? Since the latter is fixing the same issue in a cleaner way

Copy link
Contributor

@devmadhuu devmadhuu Sep 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if isExists() API for checking the existence of key till it finds target key will impact performance, as cache is always faster. I applied the patch and ran multiple iterations of each job run in CI for single test class TestOzoneFileSystem and I can notice the time taken in completing the test is slightly increased. Also I can notice multiple test run in CI still failing.

if (key.startsWith(targetKey)) {
if (!Objects.equals(key, targetKey)
&& !isKeyDeleted(key, keyTable)) {
&& keyTable.isExist(key)) {
Copy link
Contributor

@devmadhuu devmadhuu Sep 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if isExists() API for checking the existence of key till it finds target key will impact performance, as cache is always faster. I applied the patch and ran multiple iterations of each job run in CI for single test class TestOzoneFileSystem and I can notice the time taken in completing the test is slightly increased. Also I can notice multiple test run in CI still failing.

String entryKey = entry.getKey();
if (entryKey.startsWith(prefixKey)) {
if (!KeyManagerImpl.isKeyDeleted(entryKey, table)) {
if (table.isExist(entryKey)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getNextKey() call is on every hasNextCall(), so again here we are iterating on table and checking existence of key, so is it acceptable from performance point of view ? One more point that as per isExist API doc:

A lock on the key / bucket needs to be acquired before invoking this API.

@jojochuang
Copy link
Contributor

@chungen0126 if you can confirm #5244 fixes the same bug we can close it. Thank you.

@adoroszlai
Copy link
Contributor

@chungen0126 if you can confirm #5244 fixes the same bug we can close it. Thank you.

I guess this is a different problem, as it has also happened after the other fix was merged:

https://github.com/adoroszlai/ozone-build-results/blob/master/2023/09/19/25476/it-filesystem/hadoop-ozone/integration-test/org.apache.hadoop.fs.ozone.TestOzoneFileSystem.txt

@devmadhuu
Copy link
Contributor

@chungen0126 if you can confirm #5244 fixes the same bug we can close it. Thank you.

I guess this is a different problem, as it has also happened after the other fix was merged:

https://github.com/adoroszlai/ozone-build-results/blob/master/2023/09/19/25476/it-filesystem/hadoop-ozone/integration-test/org.apache.hadoop.fs.ozone.TestOzoneFileSystem.txt

I think the other issue also were handled in PR #5252 , but still open

@szetszwo
Copy link
Contributor

Is this good to merge? If not, could we mark TestOzoneFileSystem.testListStatusOnKeyNameContainDelimiter as @Flaky? It is failing quite often, e.g. https://github.com/apache/ozone/actions/runs/6285259623/job/17068827983?pr=5338#step:5:3769

@adoroszlai
Copy link
Contributor

could we mark TestOzoneFileSystem.testListStatusOnKeyNameContainDelimiter as @Flaky?

Unfortunately these filesystem tests are implemented using JUnit4, but the @Flaky annotation applies only to JUnit5 tests.

Also, the test itself is probably OK, fix is proposed for production code.

@chungen0126
Copy link
Contributor Author

chungen0126 commented Sep 25, 2023

This PR fixs the error of listing status from cache, not the race condition in listStatus. It might be different bug in PR #5252.

In master, listing status from cache only skip the keys with slash. To fix it, I create some fakeDirs for some condition if enableFileSystemPaths was false. When enableFileSystemPaths is false and I create a key with slash called "dir1/dir2/key1", there will be a key called "dir1/dir2/key1" in table.
Before fixs, If I listed "/", it skip the key "dir1/dir2/key1" and return nothing.
After fixs, If I listed "/", it creates a fakeDir "/dir1/" to be returned not just skip it.

I also make the code more clear in listStatus, and I think it is probably OK now.

@jojochuang
Copy link
Contributor

The fix makes sense to me. I'm curious why it is intermittent though.
There is no test so hard to confirm it does fix the problem. Also, what if there's another layer of directory?

@adoroszlai
Copy link
Contributor

@chungen0126 @jojochuang It seems #5399 also fixes the same problem and more. Can you please review that one?

@adoroszlai
Copy link
Contributor

Thanks a lot @chungen0126 for the patch. The bug is now fixed in HDDS-9347, so I'm closing this one. However, to give credit to you for your work on this PR, I have added you as co-author of 84fb0b4.

Also thanks @devmadhuu, @jojochuang for the review.

@adoroszlai adoroszlai closed this Oct 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants