Contributor

@whbing whbing commented Jun 29, 2023

What changes were proposed in this pull request?

[S3G] Improve list performance in LEGACY/OBS bucket when listing delimited by '/'
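To illustrate why a listing delimited by '/' can be made cheaper, here is a minimal Python sketch of a "shallow" listing over a sorted key list. It shows the general technique only and is not the code in this patch: `list_shallow`, its parameters, and the sample keys are hypothetical, and a sorted Python list stands in for the key table. Once a common prefix such as test3/ is found, the sketch seeks past every key under it instead of visiting each one.

```python
from bisect import bisect_left


def list_shallow(sorted_keys, prefix="", delimiter="/"):
    """Return (keys, common_prefixes) found directly under `prefix`.

    `sorted_keys` must be sorted lexicographically (hypothetical stand-in
    for a sorted key table).
    """
    keys, common_prefixes = [], []
    i = bisect_left(sorted_keys, prefix)  # seek to the first key >= prefix
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        key = sorted_keys[i]
        cut = key.find(delimiter, len(prefix))
        if cut == -1:
            # No delimiter after the prefix: a plain key at this level.
            keys.append(key)
            i += 1
        else:
            common = key[: cut + len(delimiter)]
            common_prefixes.append(common)
            # Smallest string greater than every key starting with `common`:
            # bump the last character by one, then seek to it directly.
            upper = common[:-1] + chr(ord(common[-1]) + 1)
            i = bisect_left(sorted_keys, upper, i)
    return keys, common_prefixes


sample = sorted([
    "s3put/a", "test1/k", "test2/a", "test2/b",
    "test3/dir0/x", "test3/dir1/y", "top",
])
print(list_shallow(sample, prefix="t"))
# → (['top'], ['test1/', 'test2/', 'test3/'])
```

The work in this sketch is proportional to the number of entries returned rather than to the number of keys under each common prefix, which is the kind of saving the timings below measure.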

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8822

How was this patch tested?

  • Manual test

(1) LEGACY bucket list optimization

keys info:

hadoop fs -count ofs://om/s3v/buk-lg/*
           6        54712             109424 ofs://om/s3v/buk-lg/s3put
           1            1                238 ofs://om/s3v/buk-lg/test1
          61        30489              30489 ofs://om/s3v/buk-lg/test2
          21       100000             100000 ofs://om/s3v/buk-lg/test3

hadoop fs -count ofs://om/s3v/buk-lg/test3/*
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir0_47640
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir10_19047
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir11_99507
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir12_43926
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir13_98700
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir14_30007
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir15_32065
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir16_42162
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir17_71071
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir18_46050
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir19_86567
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir1_78108
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir2_53468
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir3_60271
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir4_41931
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir5_92207
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir6_93526
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir7_98836
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir8_71060
           1         5000               5000 ofs://om/s3v/buk-lg/test3/dir9_11147

time before and after optimization:

$ time aws s3api --endpoint http://<old-s3g-ip>:9878 list-objects-v2 --bucket buk-lg --prefix 't' --delimiter '/'
{
    "CommonPrefixes": [
        {
            "Prefix": "test1/"
        },
        {
            "Prefix": "test2/"
        },
        {
            "Prefix": "test3/"
        }
    ]
}

real	0m13.022s
user	0m0.952s
sys	0m0.161s

$ time aws s3api --endpoint http://<new-s3g-ip>:9878 list-objects-v2 --bucket buk-lg --prefix 't' --delimiter '/'
{
    "CommonPrefixes": [
        {
            "Prefix": "test1/"
        },
        {
            "Prefix": "test2/"
        },
        {
            "Prefix": "test3/"
        }
    ]
}

real	0m3.210s
user	0m0.969s
sys	0m0.169s

Detailed timings are shown in the following table:

| time with prefix | t | test3/ | '' |
|---|---|---|---|
| before optimization | 13.022s | 10.381s | 16.430s |
| after optimization | 3.210s | 3.134s | 3.132s |
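The LEGACY timings above can be turned into speedup ratios with a few lines of Python. This is only arithmetic on the reported `real` times; the `speedup` helper is a hypothetical name introduced for the example.

```python
# Wall-clock ("real") times in seconds for the LEGACY bucket, copied from
# the measurements above and keyed by the --prefix value used.
before = {"t": 13.022, "test3/": 10.381, "": 16.430}
after = {"t": 3.210, "test3/": 3.134, "": 3.132}


def speedup(before_s, after_s):
    """Ratio of the old elapsed time to the new elapsed time."""
    return before_s / after_s


for prefix in before:
    ratio = speedup(before[prefix], after[prefix])
    print(f"prefix {prefix!r}: {ratio:.2f}x faster")
```

The ratios fall between roughly 3x and 5x. The `user` and `sys` times are nearly identical in both runs, so the difference is in time spent waiting on the gateway rather than in the client.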

(2) OBS bucket list optimization

keys info:

aws s3api --endpoint http://<s3g-ip>:9878 list-objects-v2 --bucket buk-obs --prefix 'obs1/obs2/obs3/' --max-keys 100000 | grep Key | wc -l
9986
aws s3api --endpoint http://<s3g-ip>:9878 list-objects-v2 --bucket buk-obs --prefix 'obs1/obs2/obs31/' --max-keys 100000 | grep Key | wc -l
10002
aws s3api --endpoint http://<s3g-ip>:9878 list-objects-v2 --bucket buk-obs --prefix 'obs1/obs2/obs311/' --max-keys 100000 | grep Key | wc -l
57549

time before and after optimization:

$ time aws s3 --endpoint http://<old-s3g-ip>:9878 ls buk-obs/obs1/obs2/obs3
                           PRE obs3/
                           PRE obs31/
                           PRE obs311/

real	0m9.165s
user	0m0.919s
sys	0m0.114s
$ time aws s3 --endpoint http://<new-s3g-ip>:9878 ls buk-obs/obs1/obs2/obs3
                           PRE obs3/
                           PRE obs31/
                           PRE obs311/

real	0m3.124s
user	0m0.927s
sys	0m0.159s

Detailed timings are shown in the following table:

| time with prefix | obs1/obs2/obs3 | obs1/obs2/ | '' |
|---|---|---|---|
| before optimization | 9.165s | 9.112s | 9.045s |
| after optimization | 3.124s | 3.067s | 3.014s |

Note: reduce ozone.client.list.cache to simulate multiple calls to iterators.
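To see why a smaller list cache leads to more iterator calls, the following Python sketch counts the paged requests needed to walk a given number of keys. It assumes each iterator refill fetches at most `list_cache_size` entries; the function name is hypothetical, and the cache sizes are illustrative because the value used in the tests above is not stated.

```python
import math


def list_calls(total_keys, list_cache_size):
    """Number of paged list requests needed to iterate over `total_keys`.

    Assumes every request returns at most `list_cache_size` entries
    (an assumption made for this sketch).
    """
    if list_cache_size <= 0:
        raise ValueError("list_cache_size must be positive")
    # Even an empty listing costs one request.
    return max(1, math.ceil(total_keys / list_cache_size))


# 100000 is the file count reported for test3 above; 1000 and 100 are
# illustrative cache sizes, not values taken from the tests.
for size in (1000, 100):
    print(size, list_calls(100_000, size))
```

Under this assumption, a full scan issues more requests as the cache shrinks, while a listing that returns only a handful of common prefixes needs very few.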

@whbing whbing marked this pull request as draft June 29, 2023 09:27
Contributor Author

whbing commented Jun 29, 2023

Related to #4868
The optimization effect is shown in the above table.
I will open this for review once the related PR is merged.

@whbing whbing marked this pull request as ready for review June 30, 2023 12:19
@whbing whbing changed the title from "HDDS-8822. [S3G] Improve list performance in LEGACY bucket" to "HDDS-8822. [S3G] Improve list performance in LEGACY/OBS bucket" Jun 30, 2023
Contributor Author

whbing commented Jun 30, 2023

Member

@captainzmc captainzmc left a comment

Thanks @whbing for the patch; the change looks good. There are some minor comments. I guess CI has other problems.

Contributor

@xichen01 xichen01 left a comment

The change looks good, just one comment for you to consider.

Contributor

Constructing an OzoneKey from OmKeyInfo appears many times in the code, so maybe we can create a utility function for it (for example, at lines L1608 and L1508).

@whbing whbing marked this pull request as draft July 9, 2023 14:02
@whbing whbing marked this pull request as ready for review July 22, 2023 04:11
Contributor Author

whbing commented Jul 22, 2023

Sorry for the late follow-up. I added the test and method doc for listKeys, and also tested the S3 list for LEGACY and OBS buckets in the test environment. @captainzmc please help trigger CI, thanks!

@captainzmc
Member

@whbing, sorry for the late reply. CI has been triggered.

Contributor Author

whbing commented Aug 8, 2023

> @whbing, sorry for the late reply. CI has been triggered.

@captainzmc Thanks! I will add more test cases if needed and improve some other boundary cases in the coming days.

Contributor Author

whbing commented Aug 10, 2023

@captainzmc Thanks for the review. I added a parameter, ozone-s3g.list-keys.shallow.enabled, to compare the efficiency with and without this feature. The times measured with it set to true and to false match those in the initial description on this page.

Member

@captainzmc captainzmc left a comment

+1. Thanks @whbing for the update, the change looks good. Hi @xichen01 @ivandika3, would you take another look?

@captainzmc
Member

Let's merge this. Thanks @whbing for the patch, and thanks @ivandika3 @xichen01 for the review.

jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Feb 1, 2024
…ucket (apache#5003)

(cherry picked from commit 49955db)
Change-Id: I030ac8fb946504f5190fbb1f57d2b086a963fafc