HDDS-12779. Parallelize table Spliterator with multiple multiple iterators across different ranges #8517

swamirishi · 2025-05-28T07:33:08Z

What changes were proposed in this pull request?

Currently, most operations performed on the tables do not depend on the order in which keys are iterated, so they can be executed in parallel to speed them up. The proposal here is to introduce a central piece of functionality to achieve this.
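As a minimal sketch of why an order-agnostic operation can be parallelized: counting entries gives the same result whether the table is scanned once from start to end or split into disjoint sub-ranges that are counted independently and summed. The code below is illustrative only. `OrderAgnosticCount`, `sequentialCount`, `parallelCount` and `splitKeys` are hypothetical names, and a `TreeMap` stands in for a RocksDB table; none of this is taken from the patch.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.stream.IntStream;

/** Hypothetical illustration; a sorted map stands in for a RocksDB table. */
final class OrderAgnosticCount {

  /** Baseline: a single scan over the whole table. */
  static long sequentialCount(NavigableMap<String, String> table) {
    long n = 0;
    for (String ignored : table.keySet()) {
      n++;
    }
    return n;
  }

  /**
   * Counts the same table as disjoint sub-ranges processed in parallel.
   * splitKeys must be sorted ascending; with n split keys there are n + 1
   * sub-ranges: (-inf, s0), [s0, s1), ..., [s(n-1), +inf).
   * The map must not be modified while the count is running.
   */
  static long parallelCount(NavigableMap<String, String> table,
      List<String> splitKeys) {
    if (splitKeys.isEmpty()) {
      return table.size();
    }
    return IntStream.rangeClosed(0, splitKeys.size())
        .parallel()
        .mapToLong(i -> {
          NavigableMap<String, String> part;
          if (i == 0) {
            part = table.headMap(splitKeys.get(0), false);
          } else if (i == splitKeys.size()) {
            part = table.tailMap(splitKeys.get(i - 1), true);
          } else {
            part = table.subMap(splitKeys.get(i - 1), true,
                splitKeys.get(i), false);
          }
          // size() on a sub-map view walks only that sub-range.
          return part.size();
        })
        .sum();
  }
}
```

Because addition is commutative and associative, the order in which the sub-ranges finish does not affect the result. An operation that does depend on key order would not be safe to split this way.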

The idea behind this approach is to look at RocksDB's SST file metadata: for the column family backing the table, take the key range (smallestKey and largestKey) of each file from LiveFileMetadata and use these boundaries as an approximate partition of the table's key space. The number of iterators therefore depends on the number of SST files in the DB.
For instance, consider the case:
1.sst has key range [k1, k10]
2.sst has key range [k5, k15]
3.sst has key range [k15, k30]
With this approach the table would be split into 4 iterators, each iterating one of the key ranges:
[k1, k5), [k5, k10), [k10, k15), [k15, k30]
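The splitting step in the example above can be sketched as follows. This is a hedged illustration, not the code from this patch: `KeyRangeSplitter`, `SstRange` and `IterRange` are hypothetical names, and keys are modeled as zero-padded strings (k01, k05, ...) so that plain string ordering matches the intended key order, whereas RocksDB keys are byte arrays compared by the column family's comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper; not part of the Ozone code base. */
final class KeyRangeSplitter {

  /** Key range of one SST file; both bounds are inclusive. */
  static final class SstRange {
    final String smallestKey;
    final String largestKey;

    SstRange(String smallestKey, String largestKey) {
      this.smallestKey = smallestKey;
      this.largestKey = largestKey;
    }
  }

  /** One iterator range; the end is exclusive unless endInclusive is set. */
  static final class IterRange {
    final String start;
    final String end;
    final boolean endInclusive;

    IterRange(String start, String end, boolean endInclusive) {
      this.start = start;
      this.end = end;
      this.endInclusive = endInclusive;
    }

    @Override
    public String toString() {
      return "[" + start + ", " + end + (endInclusive ? "]" : ")");
    }
  }

  /** Turns possibly overlapping file ranges into disjoint iterator ranges. */
  static List<IterRange> split(List<SstRange> files) {
    // Collect every distinct boundary key in sorted order.
    TreeSet<String> bounds = new TreeSet<>();
    for (SstRange f : files) {
      bounds.add(f.smallestKey);
      bounds.add(f.largestKey);
    }
    List<String> sorted = new ArrayList<>(bounds);
    List<IterRange> ranges = new ArrayList<>();
    if (sorted.size() == 1) {
      // Degenerate case: every file holds the same single key.
      ranges.add(new IterRange(sorted.get(0), sorted.get(0), true));
      return ranges;
    }
    // Each pair of consecutive boundaries becomes one range; only the
    // last range includes its end key, so no key is visited twice.
    for (int i = 0; i + 1 < sorted.size(); i++) {
      boolean last = i + 2 == sorted.size();
      ranges.add(new IterRange(sorted.get(i), sorted.get(i + 1), last));
    }
    return ranges;
  }

  public static void main(String[] args) {
    List<SstRange> files = Arrays.asList(
        new SstRange("k01", "k10"),
        new SstRange("k05", "k15"),
        new SstRange("k15", "k30"));
    // Prints: [k01, k05), [k05, k10), [k10, k15), [k15, k30]
    split(files).forEach(System.out::println);
  }
}
```

The boundaries only approximate the table's contents. Keys that are not yet flushed to an SST file may fall outside every file's range, so a real implementation would presumably also need to cover the space before the smallest boundary and after the largest one.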

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-12779

How was this patch tested?

Added unit tests.


szetszwo

@swamirishi , please move all the code for parallel iterator implementation to a new package. We should not change the existing iterator code as mentioned in this comment.

szetszwo · 2025-06-02T18:11:22Z

BTW, as mentioned in #8243, it is better not to implement Comparator. Just use equals(..) to detect overlap.

github-actions · 2025-11-12T00:05:14Z

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

github-actions · 2025-11-19T00:05:33Z

Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.

swamirishi and others added 30 commits April 12, 2025 10:32

HDDS-12742. Make RDBStoreAbstractIterator return a reference counted …

30facb9

…KeyValue Change-Id: Ifd24aadf501f3af1f1385ea2a43b036ddee2c5a1

HDDS-12742. Remove log

aab6ebc

Change-Id: Ifdfafe0ec0a3467b012a0abb407bc8d5e75a88a3

HDDS-12742. Remove synchronized

5dd7460

Change-Id: I2b0ec8b91d0c9ac47b20b05665b5d31d8d486067

HDDS-12742. Fix seek

e4bf8f7

Change-Id: I5bd5babb70126b2a59ab082e7c0c7469398817b2

HDDS-12742. Add test case

9ab91c9

Change-Id: Id34d08479c4baedc28f177e82ed9707aff481da8

HDDS-12742. Fix iter

abe06ab

Change-Id: I54f7ebeb3967947131ae7acfefa17990965f5ca4

HDDS-12742. Fix NPE

d215d4b

Change-Id: Ic0e600dfa46951e048231a4e56a20bd9ce6fa423

HDDS-12742. Fix NPE

85c74b4

Change-Id: Ia271b1d6be0a4a526a4a65fda7d1ad6dc9cd2bd2

HDDS-12742. Add Blocking Deque instead custom implementation of a blc…

7d82a9e

…king stack Change-Id: I78272536972d58b76a10092bfde418a29ba8adae

HDDS-12742. Fix checkstyle

ce6ab81

Change-Id: I95dcc0d6e370329846272441dc70f4be2b671a29

HDDS-12742. Fix test cases

091e8ca

Change-Id: I4db792cbf85d60f371653e7b9119d5155661c228

Merge remote-tracking branch 'apache/master' into HEAD

6bc5c86

Change-Id: I91605a0d5e6e9390c5a75a518d75ef8e5998c97d

Merge remote-tracking branch 'apache/master' into HEAD

1b394c2

Change-Id: I825272d43b9c2614072ac41773813221c79dc130 # Conflicts: # hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RawKeyValue.java

HDDS-12742. Add Spliterator

7fdced2

Change-Id: I73c8ff72786dcc57650bbd1107915946fc291396

HDDS-12742. Fix Spliterator

f5b633b

Change-Id: I46afdf6e6510fece9bca9a0d677e52b33dfc97c5

Update hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/uti…

1971a96

…ls/db/RDBStoreCodecBufferIterator.java Co-authored-by: Ritesh H Shukla <[email protected]>

HDDS-12742. Make concurrent hash set

a33f265

Change-Id: Ifdff95b06c134ff3a2d2bf7022edcf7c630c86c2

Merge remote-tracking branch 'origin/HDDS-12742' into HEAD

0c3ae4f

Change-Id: I43c3342214fd698b692870e63c6566e43b24e350

Merge remote-tracking branch 'apache/master' into HEAD

fbe213b

Change-Id: Ic39f3016d8999b2492905828043ec22fa4da51ab

HDDS-12742. Fix checkstyle

71f9c28

Change-Id: I9ea4eaf4b9f4db8c279a90d5222cb448d46772f6

HDDS-12742. Fix pmd

5734027

Change-Id: I819e2a82cf30d5851762ca2fe8c881c716035df4

HDDS-12742. Fix checkstyle

99c508e

Change-Id: I251e28caef8c20263cff822792cf2de8ca6b61ca

HDDS-12742. Fix max buffer definition

f16e1b9

Change-Id: If5caeff1bcd4c5015b08f147a90be615dede3e38

HDDS-12742. Add tests

86fe101

Change-Id: Iaf7e8059d442e954e722c58620ccd4811e15c855

Merge remote-tracking branch 'apache/master' into HEAD

cfde81f

Change-Id: Idf8a61a54288fd25bcbfdcde362d76868ed9329a

HDDS-12742. Fix exception handling for memory leaks

f825754

Change-Id: Id6f11892e47103ebecd7b2cd8549d4ff4ebcf444

HDDS-12742. Make spliterator an interface parameter

e4155a6

Change-Id: I214cab272e61ce1de527b78190dd1c5560061c8e

HDDS-12742. Fix build

985d612

Change-Id: I8b80f584a316cc54778068572702fe3c69699185

HDDS-12742. Fix findbugs

b98968b

Change-Id: I6527e2a2ed8f22131880e5c7cf14601614254671

Merge remote-tracking branch 'apache/master' into HEAD

3abac4f

Change-Id: I7c720a3432efaa8e308ce7693e70b309df472851

HDDS-12779. Parallelize table Spliterator with multiple multiple iter…

af6d30d

…ators across different ranges Change-Id: I8ee22aed8c4be85ccb693cc9dcaf00880c4e736b

swamirishi changed the title ~~Hdds 12779 alt~~ HDDS-12779. Parallelize table Spliterator with multiple multiple iterators across different ranges May 28, 2025

swamirishi force-pushed the HDDS-12779_alt branch from 44cbab9 to af6d30d Compare May 28, 2025 08:54

swamirishi added 7 commits May 28, 2025 09:22

HDDS-12779. Revert creating an init function

231ad28

Change-Id: I2c32f89e4a9f65f46695bd9ab51ee75f78ebf2ab

HDDS-12742. Convert to Named Class from anonymous class of RawSpliter…

78d33af

…ator Change-Id: I73eb2d451d91d1f6177dc718f4b0cd2fbafb9d2c

HDDS-12742. Fix checkstyle

ed632a8

Change-Id: Id594fc104fa32d69ab2a84ee5d08363af142e3d3

HDDS-12779. Fix checkstyle

18deb3c

Change-Id: I04fe90cbfc65e4c52ed2b36c223ade76417c06c0

HDDS-12779. Fix bug

82c3f01

Change-Id: I65ca22255e2f0e5b9f58010a73fe53ac4cfe971e

HDDS-12779. Fix Boundary iteration logic

a6c25e7

Change-Id: I171b21260d363550ccd336b18042423055252547

swamirishi requested review from errose28, kerneltime and sumitagrawl May 28, 2025 17:37

smengcl requested a review from szetszwo June 2, 2025 16:51

szetszwo requested changes Jun 2, 2025


github-actions bot added the stale label Nov 12, 2025

swamirishi mentioned this pull request Nov 12, 2025

HDDS-12607. Parallelize recon tasks to speed up OM rocksdb reading tasks #9243

Merged

github-actions bot closed this Nov 19, 2025
