@ArafatKhan2198
Contributor

@ArafatKhan2198 ArafatKhan2198 commented Feb 3, 2023

What changes were proposed in this pull request?

Recon currently returns a 500 error when given a du request for an object store (OBS) bucket, because there is no BucketHandler subclass for OBS buckets that BucketHandler#getBucketHandler can return. This JIRA adds an ObjectStoreBucketHandler for OBS buckets, which works very similarly to the LegacyBucketHandler.

Discussion doc: https://docs.google.com/document/d/1g2NCR-kgcSfLFQRBhZfYsMZmaZ6edlA_26Pq0-qfzT8/edit
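
The dispatch problem described above can be pictured with a minimal Java sketch. Everything in it is a hypothetical stand-in written for illustration: `Layout`, `Handler`, and `handlerFor` are not Recon's actual classes or signatures, and only the three layout names come from the commands in this PR. The point is just that a layout without a matching case has nothing to return, which is the gap this PR closes for OBJECT_STORE.

```java
import java.util.Objects;

public class BucketHandlerSketch {
  // Hypothetical stand-in for the three bucket layouts used in this PR.
  public enum Layout { FILE_SYSTEM_OPTIMIZED, LEGACY, OBJECT_STORE }

  // Hypothetical stand-in for a per-layout handler.
  public interface Handler { String name(); }

  // Picks a handler from the bucket layout. Without the OBJECT_STORE case,
  // the lookup would fall through to the exception below.
  public static Handler handlerFor(Layout layout) {
    Objects.requireNonNull(layout, "layout");
    switch (layout) {
      case FILE_SYSTEM_OPTIMIZED: return () -> "FSO";
      case LEGACY: return () -> "Legacy";
      case OBJECT_STORE: return () -> "ObjectStore";
      default: throw new IllegalStateException("Unsupported layout: " + layout);
    }
  }

  public static void main(String[] args) {
    for (Layout layout : Layout.values()) {
      System.out.println(layout + " -> " + handlerFor(layout).name());
    }
  }
}
```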

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7810?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel

How was this patch tested?

Tested manually on the UI with the following commands:

ozone sh bucket create --layout FILE_SYSTEM_OPTIMIZED /s3v/fso-bucket
ozone sh key put s3v/fso-bucket/key1-fso NOTICE.txt
ozone sh key put s3v/fso-bucket/key2-fso NOTICE.txt

ozone sh bucket create --layout OBJECT_STORE /s3v/obs-bucket
ozone sh key put s3v/obs-bucket/key1-obs NOTICE.txt
ozone sh key put s3v/obs-bucket/key2-obs NOTICE.txt

ozone sh bucket create s3v/legacy-bucket
ozone sh key put s3v/legacy-bucket/key1-legacy NOTICE.txt
ozone sh key put s3v/legacy-bucket/key2-legacy NOTICE.txt

@ArafatKhan2198 ArafatKhan2198 changed the title HDDS-7810 Support namespace summaries (du, dist & counts) for OBJECT_STORE buckets HDDS-7810 Support namespace summaries (du, dist & counts) for OBJECT_STORE buckets. Feb 3, 2023
@ArafatKhan2198 ArafatKhan2198 changed the title HDDS-7810 Support namespace summaries (du, dist & counts) for OBJECT_STORE buckets. HDDS-7810. Support namespace summaries (du, dist & counts) for OBJECT_STORE buckets. Feb 3, 2023
@ArafatKhan2198 ArafatKhan2198 marked this pull request as ready for review February 5, 2023 19:11
@umamaheswararao
Contributor

@GeorgeJahad

@GeorgeJahad
Contributor

GeorgeJahad commented Feb 13, 2023

Aren't unit tests required?

@GeorgeJahad
Contributor

GeorgeJahad commented Feb 14, 2023

Hey @ArafatKhan2198 @smengcl @dombizita:

Last year, when @xBis7 and I implemented this for legacy buckets, we were going to do object store buckets as well.

We didn't because @avijayanhwx (the engineer who originally created this epic), told us it would be good to support "object store prefixes", (and we didn't have time to do the extra work.)

It looks like this PR still doesn't support them. Instead it just assigns all object store keys to the corresponding bucket. I'm fine with that, but just want to be sure that everyone understands it wasn't the initial intention.

FYI: this is how @avijayanhwx defined the prefixes, (from an email he sent me):

"What I mean as object store prefix is not dissimilar to a directory in legacy FS buckets, but there is added complexity in Recon to track them since they will not be present in the OM metadata. For example, in a pure OS bucket, the rocksdb entry would be /vol1/bucket1/application1/instance1/file1. The prefixes ( /vol1/bucket1/application1 & /vol1/bucket1/application1/instance1) are not created as keys in RocksDB. We wanted to explore how to capture them and calculate summaries at prefix levels. Feel free to think about the need to do it as well as how we could design it. Thanks for taking this up."

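To make the quoted definition concrete, here is a minimal Java sketch of prefix-level summaries computed from flat keys. It is only an illustration under stated assumptions, not Recon's design (which was still open at this point): `PrefixSummarySketch` and `sizeByPrefix` are hypothetical names, the sizes are made up, and the second key is an invented sibling added to show aggregation. Because the prefixes are never stored as keys, the sketch derives them by splitting each key on `/`.

```java
import java.util.Map;
import java.util.TreeMap;

public class PrefixSummarySketch {
  /** Adds each key's size to every ancestor prefix of that key. */
  public static Map<String, Long> sizeByPrefix(Map<String, Long> keySizes) {
    Map<String, Long> totals = new TreeMap<>();
    for (Map.Entry<String, Long> entry : keySizes.entrySet()) {
      String key = entry.getKey();
      // Every '/' after the leading one marks the end of a prefix.
      for (int i = key.indexOf('/', 1); i != -1; i = key.indexOf('/', i + 1)) {
        totals.merge(key.substring(0, i), entry.getValue(), Long::sum);
      }
    }
    return totals;
  }

  public static void main(String[] args) {
    Map<String, Long> keys = new TreeMap<>();
    keys.put("/vol1/bucket1/application1/instance1/file1", 10L); // made-up size
    keys.put("/vol1/bucket1/application1/file2", 5L);           // invented sibling key
    sizeByPrefix(keys).forEach((prefix, size) -> System.out.println(prefix + " = " + size));
  }
}
```

With these two keys, `/vol1/bucket1/application1` totals 15 and `/vol1/bucket1/application1/instance1` totals 10. The volume and bucket also show up as prefixes, since the sketch makes no distinction between path levels.
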
@xBis7
Contributor

xBis7 commented Feb 15, 2023

I agree with @GeorgeJahad, there are so many unit tests for all the other bucket types. Shouldn't there be unit tests for OBS as well?

@ArafatKhan2198 ArafatKhan2198 marked this pull request as draft April 8, 2023 15:40
@ArafatKhan2198 ArafatKhan2198 marked this pull request as ready for review October 25, 2023 08:26
@ArafatKhan2198
Contributor Author

@adoroszlai @devmadhuu @GeorgeJahad Could you please take a look?

@ArafatKhan2198
Contributor Author

Hey @ArafatKhan2198 @smengcl @dombizita:

Last year, when @xBis7 and I implemented this for legacy buckets, we were going to do object store buckets as well.

We didn't because @avijayanhwx (the engineer who originally created this epic), told us it would be good to support "object store prefixes", (and we didn't have time to do the extra work.)

It looks like this PR still doesn't support them. Instead it just assigns all object store keys to the corresponding bucket. I'm fine with that, but just want to be sure that everyone understands it wasn't the initial intention.

FYI: this is how @avijayanhwx defined the prefixes, (from an email he sent me):

"What I mean as object store prefix is not dissimilar to a directory in legacy FS buckets, but there is added complexity in Recon to track them since they will not be present in the OM metadata. For example, in a pure OS bucket, the rocksdb entry would be /vol1/bucket1/application1/instance1/file1. The prefixes ( /vol1/bucket1/application1 & /vol1/bucket1/application1/instance1) are not created as keys in RocksDB. We wanted to explore how to capture them and calculate summaries at prefix levels. Feel free to think about the need to do it as well as how we could design it. Thanks for taking this up."

@GeorgeJahad @xBis7
That's a great suggestion! Thanks for explaining it so well. I'm presently in the process of creating a design document outlining how we can implement prefix-based searching for OBS (Object Store) buckets. You can monitor the progress of this initiative on Jira using the following link: https://issues.apache.org/jira/browse/HDDS-9535.

Our primary objective at the moment is to ensure that our customers don't encounter any confusion when using Recon with various bucket types. Implementing prefix-based filtering will likely be a crucial improvement, and I'll be working on it soon.

Contributor

@devmadhuu devmadhuu left a comment

@ArafatKhan2198 thanks for working on this patch. Here are a few comments to start with; the review is still in progress and I will continue with it.

@xBis7
Contributor

xBis7 commented Nov 2, 2023

@ArafatKhan2198 Thanks for persisting with this.

I'm presently in the process of creating a design document outlining how we can implement prefix-based searching for OBS (Object Store) buckets.

Is the purpose of this patch to implement prefix-based support for OBS buckets? Or is this going to be in a following PR after the design is finalized?

@ArafatKhan2198
Contributor Author

Thank you for your response, @xBis7

This patch will not implement prefix-based search; its primary purpose is to list the keys within an OBS bucket. The support for prefix-based searching in OBS buckets will be added later when I have the available bandwidth to complete the design doc.

We received several support tickets from customers who reported issues with NSSummary not functioning correctly due to the lack of OBS support. Therefore, the goal of this patch is to provide the essential functionality to address this issue and eliminate any confusion for now. This patch offers a straightforward implementation of how we can generate NSSummary for an OBS layout bucket.

The addition of prefix-based OBS filtering can be considered as an enhancement in subsequent updates and will be tracked by this JIRA :- https://issues.apache.org/jira/browse/HDDS-9535

@devmadhuu
Contributor

Thank you for your response, @xBis7

This patch will not implement prefix-based search; its primary purpose is to list the keys within an OBS bucket. The support for prefix-based searching in OBS buckets will be added later when I have the available bandwidth to complete the design doc.

We received several support tickets from customers who reported issues with NSSummary not functioning correctly due to the lack of OBS support. Therefore, the goal of this patch is to provide the essential functionality to address this issue and eliminate any confusion for now. This patch offers a straightforward implementation of how we can generate NSSummary for an OBS layout bucket.

The addition of prefix-based OBS filtering can be considered as an enhancement in subsequent updates and will be tracked by this JIRA :- https://issues.apache.org/jira/browse/HDDS-9535

I think simply providing the data size of individual OBS bucket keys will not address the users' pain point, and not providing the data size at the prefix level will create more confusion when users have a mix of bucket types.

@sumitagrawl
Contributor

@GeorgeJahad @xBis7 @devmadhuu @ArafatKhan2198 I have looked over the discussion and the current state of the feature. A few suggestions for going ahead:

  1. In the current implementation for FSO, the UI shows all files and directories, and all of them get queried. This does not work when there is a large number of directories or files at the same level: the pie chart becomes a mess and the data query itself fails. So we need to support queries with pagination at each level of directories and files.
    @ArafatKhan2198 Please raise a separate bug to handle this.

  2. We need to refactor the code to have:
    -- Legacy and OBS: the same handler, as both cases are OBS only
    -- Legacy with filesystem enabled: this needs a separate handler
    -- FSO: needs a separate handler
    Legacy with FSO is not supported; this support can be provided with this PR,
    and OBS is handled as a plain query of data (not an FSO simulation for OBS).

  3. We need another JIRA to support "OBS support with FSO simulation":
    -- This needs to be enabled using a flag at the backend
    -- Frontend: can provide a flag to query flat keys or FSO-simulated keys
    -- Backend: needs pre-processing into a separate NSSummary table that holds the data as a hierarchy, plus a separate NSSummary handler and query handler
    A detailed design of this needs to be done.

@ArafatKhan2198 IMO, we can break the work down as above and handle it.

@adoroszlai
Contributor

@ArafatKhan2198 please merge master into your branch and resolve compile error:

Error:  /home/runner/work/ozone/ozone/hadoop-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestNSSummaryEndpointWithOBS.java:[1129,12] no suitable constructor found for SCMNodeStat(long,long,long)
Error:      constructor org.apache.hadoop.hdds.scm.container.placement.metrics.SCMNodeStat.SCMNodeStat() is not applicable
Error:        (actual and formal argument lists differ in length)
Error:      constructor org.apache.hadoop.hdds.scm.container.placement.metrics.SCMNodeStat.SCMNodeStat(org.apache.hadoop.hdds.scm.container.placement.metrics.SCMNodeStat) is not applicable
Error:        (actual and formal argument lists differ in length)
Error:      constructor org.apache.hadoop.hdds.scm.container.placement.metrics.SCMNodeStat.SCMNodeStat(long,long,long,long,long) is not applicable
Error:        (actual and formal argument lists differ in length)

https://github.com/apache/ozone/actions/runs/7862913313/job/21479359409?pr=4245#step:6:2865

Contributor

@dombizita dombizita left a comment

Thank you for your continuous effort on this @ArafatKhan2198!! I went through the implementation and only have a few small nits; overall it looks good to me. I didn't go through the TestNSSummaryTaskWithOBS and TestNSSummaryEndpointWithOBS classes very thoroughly; I'll finish that today.


NSSummary nonExistentSummary =
    reconNamespaceSummaryManager.getNSSummary(BUCKET_ONE_OBJECT_ID);
Assertions.assertNull(nonExistentSummary);

I don't see that you changed the imports for the assertions, could you check it?

Contributor

@sumitagrawl sumitagrawl left a comment

LGTM

Contributor

@devmadhuu devmadhuu left a comment

Thanks @ArafatKhan2198 for working on this and addressing all the review comments. There is just one minor check on my comment that you can confirm. The rest LGTM, +1.

Contributor

@dombizita dombizita left a comment

Thanks for the work @ArafatKhan2198. Some of my comments are not addressed; can you check them?
Also, I understand that the TestNSSummaryEndpointWithLegacy and TestNSSummaryEndpointWithOBS classes have both the key and the file constants, but I don't think that makes sense. The constants also hold the same values, I can't find an explanation in the code for why we need both, and I think it will make the test classes hard to read later. @xBis7, could you help us, since you worked on the previous test classes?

@xBis7
Contributor

xBis7 commented Feb 28, 2024

KeyName refers to the entire path under the bucket while FileName refers just to the key, or to put it another way, the last component of the path. We kept the variables the same in the Legacy code so that the tests can easily be compared to FSO. I would suggest doing the same with OBS if it's feasible.

To explain it with an example /vol1/bucket1/dir1/dir2/key1

KeyName: /dir1/dir2/key1

FileName: key1

The issue in these tests is that there is no depth in the key structure. That's why all the key names and the file names are the same.
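
A small Java sketch of that distinction, assuming the file name is simply the last `/`-separated component of the key name as described above. `KeyNameSketch` and `fileNameOf` are hypothetical names for illustration only and are not part of OmKeyInfo or the test classes.

```java
public class KeyNameSketch {
  /** Returns the last path component of a key name. */
  public static String fileNameOf(String keyName) {
    int lastSlash = keyName.lastIndexOf('/');
    // With no '/' in the key, the whole key name is also the file name.
    return lastSlash == -1 ? keyName : keyName.substring(lastSlash + 1);
  }

  public static void main(String[] args) {
    // Key with depth: key name and file name differ.
    System.out.println(fileNameOf("/dir1/dir2/key1")); // prints "key1"
    // Flat key, as in the OBS tests: key name and file name are identical.
    System.out.println(fileNameOf("key1-obs"));        // prints "key1-obs"
  }
}
```

For flat keys the two values coincide, which is why separate file name constants carry no extra information in tests without depth in the key structure.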

Check FSO and Legacy

https://github.com/apache/ozone/blob/master/hadoop-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestNSSummaryEndpointWithFSO.java#L135-L159

https://github.com/apache/ozone/blob/master/hadoop-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/api/TestNSSummaryEndpointWithLegacy.java#L135-L159

Here is a definition reference regarding the fileName

https://github.com/apache/ozone/blob/master/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OmKeyInfo.java#L91-L96

@ArafatKhan2198
Contributor Author

@dombizita Could you take another look now?
Some of the changes were left out of the previous commit for some reason.

@adoroszlai adoroszlai dismissed their stale review February 29, 2024 17:00

patch updated

Contributor

@dombizita dombizita left a comment

Thank you so much for the tremendous effort on this @ArafatKhan2198! The changes are looking good to me!
Thanks @xBis7 for helping us out with the design of the previous test classes! I think it made sense to do that in those classes, as the values for key and file names were different and we needed both of them for testing. But here I'd go with only key names: for OBS buckets those are the ones that make sense, and file name constants would hold the same values anyway.

@sumitagrawl sumitagrawl merged commit 0a5fc69 into apache:master Mar 1, 2024
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Mar 15, 2024
…_STORE buckets. (apache#4245)

(cherry picked from commit 0a5fc69)
Change-Id: I80cdba64165c5f4786e9a50b22ccc395869879ed
swamirishi pushed a commit to swamirishi/ozone that referenced this pull request Jun 10, 2024
… for OBJECT_STORE buckets. (apache#4245)

(cherry picked from commit 0a5fc69)
Change-Id: Icd9d4dcb4d0dbb7160bc3d5a2eb44d163a556fc9
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Jul 17, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Jul 18, 2024