HDDS-7121. Support namespace summaries (du, dist & counts) for legacy FS buckets #3746
Conversation
So there is a problem with clearing the NSSummary table during reprocess(). Both the FSO and Legacy tasks write to the NSSummary table. The reprocess() method for FSO clears the NSSummary table (here), but the Legacy task doesn't clear it. Since both tasks run in parallel on separate threads, there is no guarantee that FSO will run first. In addition, failed tasks get rerun (here); if one fails and the other doesn't, the table clear won't work correctly.
Also, because the two tasks write to the same table from separate threads, it would be good to clarify why no synchronization is needed.
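The race described above can be sketched in a few lines. This is a minimal illustration with stand-in names (reprocessFso/reprocessLegacy and the map are hypothetical, not the real Recon methods or tables): the FSO step clears the shared table before writing and the Legacy step does not, so if Legacy happens to run first its entries are wiped.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the ordering bug: stand-in names only, not Recon's API.
public class ReprocessOrderingSketch {
  public static final Map<String, Long> nsSummaryTable = new HashMap<>();

  public static void reprocessLegacy() {
    // writes without clearing, mirroring the Legacy task
    nsSummaryTable.put("legacyDir", 2L);
  }

  public static void reprocessFso() {
    // clears the shared table before writing, mirroring the FSO task
    nsSummaryTable.clear();
    nsSummaryTable.put("fsoDir", 1L);
  }

  public static void main(String[] args) {
    reprocessLegacy();   // no order guarantee: assume Legacy ran first
    reprocessFso();      // FSO's clear now wipes the Legacy entries
    System.out.println(nsSummaryTable.keySet()); // only fsoDir survives
  }
}
```

With the opposite interleaving the Legacy entries survive, which is exactly why the outcome depends on thread scheduling.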
Hi @xBis7 and @GeorgeJahad, thanks for the patch and the problem description. I believe, since we already separated the NSSummary handling for legacy and FSO buckets (here), we can maintain two separate NSSummary tables, one for each bucket layout. This would help with the table clear, and with retries if one of the tasks fails. Let me know what you think of this idea. cc @smengcl
Hi @dombizita, thanks for the feedback. With two separate tables, what happens when Recon gets statistics for both bucket layouts? Do we iterate both tables and sum the results?
@dombizita @smengcl Another solution to this would be to have a common NSSummary table for both bucket layouts. Instead of calling both tasks, we could add
If we have two separate tables for
@dombizita Yes, you are right. This way we can control the order in which we call any bucket-specific task.
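The approach agreed on here can be sketched as a single controller task that owns the shared table: it clears the table exactly once, then runs the bucket-specific steps in a fixed order. All names below (reprocessWithFso, reprocessWithLegacy, the map) are illustrative stand-ins, not the actual Recon NSSummaryTask API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: one controller owns the shared table and the order of subtasks.
public class NSSummaryControllerSketch {
  public static final Map<String, Long> table = new HashMap<>();

  public static void reprocessWithFso()    { table.put("fsoDir", 1L); }
  public static void reprocessWithLegacy() { table.put("legacyDir", 2L); }

  public static void reprocess() {
    table.clear();          // the single clear, owned by the controller
    reprocessWithFso();     // deterministic order, no cross-thread race
    reprocessWithLegacy();
  }

  public static void main(String[] args) {
    reprocess();
    System.out.println(table.size()); // entries for both layouts survive
  }
}
```

Because there is only one clear and one caller, a retry of the controller simply repeats the whole sequence, avoiding the partial-clear problem described earlier in the thread.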
why is the exception being swallowed? it will cause the bucketid to be returned.
why is the npe being swallowed?
This version of getBucketHandler will throw an NPE if bucketInfo is null. The other version just returns null. They should do the same thing, probably both throw.
We need the other version to return null, so that we know that the BucketHandler doesn't exist and later return EntityType.UNKNOWN. See here.
It's usually good practice to keep the methods consistent. If the null check is required, it would be better to put it in the one that is called by this one, so they both have it.
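The refactor suggested above can be sketched as follows: the null check lives in the overload that the other one calls, so both variants of getBucketHandler behave the same. BucketInfo/BucketHandler here are minimal stand-ins for illustration, not the real Recon classes.

```java
// Sketch: put the null check in the inner overload so both share it.
public class HandlerLookup {
  public static class BucketInfo {
    final String layout;
    BucketInfo(String layout) { this.layout = layout; }
  }

  public static class BucketHandler {
    final String layout;
    BucketHandler(String layout) { this.layout = layout; }
  }

  // inner overload: owns the null check; null signals a missing bucket,
  // which the caller maps to EntityType.UNKNOWN
  public static BucketHandler getBucketHandler(BucketInfo info) {
    if (info == null) {
      return null;
    }
    return new BucketHandler(info.layout);
  }

  // outer overload: resolves the bucket, then delegates -- it inherits
  // the null handling instead of throwing an NPE
  public static BucketHandler getBucketHandler(String bucketName) {
    return getBucketHandler(lookup(bucketName));
  }

  // stand-in for the OM metadata lookup
  static BucketInfo lookup(String name) {
    return "vol1/bucket1".equals(name) ? new BucketInfo("LEGACY") : null;
  }
}
```

This keeps the two entry points consistent while preserving the null return that the UNKNOWN-entity path relies on.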
...e/recon/src/main/java/org/apache/hadoop/ozone/recon/api/handlers/DirectoryEntityHandler.java
@GeorgeJahad I've made the changes based on your recommendations.

@dombizita I've made the changes so that
dombizita left a comment:
Thanks @xBis7 for updating your patch. This approach to processing and reprocessing looks good to me; I left a few inline comments.
If you are also testing the LEGACY buckets here, why did you add a ${BUCKET_LAYOUT} variable below that is set to FILE_SYSTEM_OPTIMIZED? Will you add test cases in this robot test file for LEGACY too?
@dombizita Apart from the Namespace Summary for a directory, there is no difference between FSO and LEGACY. To avoid duplication, we added BUCKET_LAYOUT as a parameter and gave it the default value of FSO. The idea is to add an extra command to test.sh to also run the test with the parameter -v BUCKET_LAYOUT:LEGACY.
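For illustration, the two invocations could look like the following. The suite path is taken from this PR's file list; Robot Framework's `-v` flag overrides a suite variable from the command line, so the second run exercises the same cases against a LEGACY bucket.

```shell
# default run: BUCKET_LAYOUT falls back to FILE_SYSTEM_OPTIMIZED
robot recon/recon-nssummary.robot

# second pass: same suite, LEGACY layout
robot -v BUCKET_LAYOUT:LEGACY recon/recon-nssummary.robot
```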
hadoop-ozone/dist/src/main/smoketest/recon/recon-nssummary.robot
...zone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/handlers/LegacyBucketHandler.java
...p-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/NSSummaryTaskWithLegacy.java
The parent object id for legacy records is 0 (which is why we added the setKeyParentId() method). Let's set them back to 0 in the test code as well, just to be sure that setKeyParentId() is really working.
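The test idea in the comment above can be sketched like this. OmKeyInfoStub and the local setKeyParentId are hypothetical stand-ins for the real Recon types; the point is only the reset-then-verify pattern: force the parent id back to 0, run the method, and assert it was actually filled in.

```java
// Sketch of the reset-then-verify pattern for setKeyParentId().
public class ParentIdResetSketch {
  // stand-in for OmKeyInfo: legacy key records start with parent id 0
  public static class OmKeyInfoStub {
    public long parentObjectId = 0L;
    public final String keyName;
    public OmKeyInfoStub(String keyName) { this.keyName = keyName; }
  }

  // stand-in for NSSummaryTaskWithLegacy#setKeyParentId
  public static void setKeyParentId(OmKeyInfoStub key, long dirObjectId) {
    key.parentObjectId = dirObjectId;
  }

  public static void main(String[] args) {
    OmKeyInfoStub key = new OmKeyInfoStub("dir1/file1");
    key.parentObjectId = 0L;               // reset, as the review suggests
    setKeyParentId(key, 42L);
    assert key.parentObjectId == 42L;      // proves the method did the work
  }
}
```

Without the explicit reset, a test fixture that happens to pre-populate the parent id could pass even if setKeyParentId() were a no-op.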
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/NSSummaryTask.java
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/NSSummaryTask.java
...p-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/NSSummaryTaskWithLegacy.java
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/NSSummaryTaskUtils.java
...p-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/NSSummaryTaskWithLegacy.java
I just tried running: "ozone admin namespace summary /vol1"
It gets a warning:
It should be updated to something like this:
...one/recon/src/test/java/org/apache/hadoop/ozone/recon/tasks/TestNSSummaryTaskWithLegacy.java
@smengcl CI looks good. Can we merge this?

Merged. Thanks @xBis7 for the work! Thanks @GeorgeJahad and @dombizita for the review.
* master: (718 commits)
  HDDS-7342. Move encryption-related code from MultipartCryptoKeyInputStream to OzoneCryptoInputStream (apache#3852)
  HDDS-7413. Fix logging while marking container state unhealthy (apache#3887)
  Revert "HDDS-7253. Fix exception when '/' in key name (apache#3774)"
  HDDS-7396. Force close non-RATIS containers in ReplicationManager (apache#3877)
  HDDS-7121. Support namespace summaries (du, dist & counts) for legacy FS buckets (apache#3746)
  HDDS-7258. Cleanup the allocated but uncommitted blocks (apache#3778)
  HDDS-7381. Cleanup of VolumeManagerImpl (apache#3873)
  HDDS-7253. Fix exception when '/' in key name (apache#3774)
  HDDS-7182. Add property to control RocksDB max open files (apache#3843)
  HDDS-7284. JVM crash for rocksdb for read/write after close (apache#3801)
  HDDS-7368. [Multi-Tenant] Add Volume Existence check in preExecute for OMTenantCreateRequest (apache#3869)
  HDDS-7403. README Security Improvement (apache#3879)
  HDDS-7199. Implement new mix workload Read/Write Freon command (apache#3872)
  HDDS-7248. Recon: Expand the container status page to show all unhealthy container states (apache#3837)
  HDDS-7141. Recon: Improve Disk Usage Page (apache#3789)
  HDDS-7369. Fix wrong order of command arguments in Nonrolling-Upgrade.md (apache#3866)
  HDDS-6210. EC: Add EC metrics (apache#3851)
  HDDS-7355. non-primordial scm fail to get signed cert from primordial SCM when converting an unsecure cluster to secure (apache#3859)
  HDDS-7356. Update SCM-HA.zh.md to match the English version (apache#3861)
  HDDS-6930. SCM,OM,RECON should not print ERROR and exit with code 1 on successful shutdown (apache#3848)
  ...

Conflicts:
  hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/LegacyReplicationManager.java
  hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestLegacyReplicationManager.java
… FS buckets (#3746) Change-Id: I478fc74ec8b0899441d1c003c2f200d7dd9dcc67
What changes were proposed in this pull request?
In the previous PR HDDS-5504, we refactored NSSummaryEndpoint and NSSummaryTask so that we could extract the bucket-specific code into new, separate classes. So far, Recon has only supported FSO buckets. All the existing bucket-related code lives in handlers/FSOBucketHandler and tasks/NSSummaryTaskWithFSO. To add support for Legacy buckets, we created two new classes, handlers/LegacyBucketHandler and tasks/NSSummaryTaskWithLegacy.

What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7121
How was this patch tested?
This patch was tested with new unit tests. We added two new test classes, TestNSSummaryEndpointWithLegacy and TestNSSummaryTaskWithLegacy. These two classes are as similar as possible to the corresponding FSO classes. We tried to keep the setups exactly the same as TestNSSummaryEndpointWithFSO and TestNSSummaryTaskWithFSO, as if they were handling FSO-related code.