
@swamirishi
Contributor

What changes were proposed in this pull request?

Currently, most operations performed on the tables do not depend on the order in which keys are iterated, so they can be executed in parallel to speed them up. This PR proposes a central functionality to achieve this.
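To make the idea of a central parallel-execution helper concrete, here is a minimal sketch, assuming the table can be viewed as a sorted map and the split points are already known. `ParallelTableScan` and `forEachParallel` are hypothetical names, not the API introduced by this PR, and a `NavigableMap` stands in for the RocksDB-backed table. Each sub-range runs on its own thread, which is only valid because the per-entry action does not depend on iteration order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

/** Sketch only: hypothetical helper; a NavigableMap stands in for a RocksDB table. */
public final class ParallelTableScan {

  private ParallelTableScan() { }

  /**
   * Applies {@code action} to every entry of {@code table}, one task per key range.
   * {@code splitPoints} must be strictly increasing; each inner range is
   * [splitPoints[i-1], splitPoints[i]), and the first and last ranges are open-ended.
   * The action must be thread-safe and must not rely on global key order.
   */
  public static <K, V> void forEachParallel(NavigableMap<K, V> table, List<K> splitPoints,
      int threads, BiConsumer<? super K, ? super V> action) {
    List<NavigableMap<K, V>> ranges = new ArrayList<>();
    K lower = null; // null: this range starts at the first key of the table
    for (K split : splitPoints) {
      ranges.add(lower == null
          ? table.headMap(split, false)
          : table.subMap(lower, true, split, false));
      lower = split;
    }
    // The last range has no upper bound, so no key is skipped.
    ranges.add(lower == null ? table : table.tailMap(lower, true));

    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (NavigableMap<K, V> range : ranges) {
        futures.add(pool.submit(() -> range.forEach(action)));
      }
      for (Future<?> f : futures) {
        f.get(); // rethrows a failure from any worker
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while scanning", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Range scan failed", e.getCause());
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Because every range is half-open and the ranges are contiguous, each key is visited exactly once as long as the split points are strictly increasing.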

The idea behind this approach is to read the RocksDB SST file metadata (LiveFileMetaData) for the table's column family, take each file's key range (smallestKey and largestKey), and derive from them an approximate set of key ranges covering the table. The table is then iterated with one iterator per derived range, so the degree of parallelism is driven by the number of SST files in the DB.
For instance, consider the following case:
1.sst has key range [k1, k10]
2.sst has key range [k5, k15]
3.sst has key range [k15, k30]
With this approach, the table would be split into 4 iterators over the key ranges:
[k1, k5), [k5, k10), [k10, k15), [k15, k30]
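The splitting step in the example can be sketched as follows, assuming the per-SST [smallestKey, largestKey] pairs have already been read (in RocksJava they are exposed by `LiveFileMetaData` through `smallestKey()` and `largestKey()`). `SstRangeSplitter`, `SstKeyRange` and `KeyRange` are hypothetical names, not code from this PR. Integer keys are used instead of `k1`...`k30` because RocksDB's default comparator orders keys bytewise, under which `"k10"` sorts before `"k5"`; a real implementation would pass the column family's comparator. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Sketch only: derives non-overlapping iteration ranges from per-SST key ranges. */
public final class SstRangeSplitter {

  /** Hypothetical stand-in for one SST file's [smallestKey, largestKey], both inclusive. */
  public record SstKeyRange<K>(K smallestKey, K largestKey) { }

  /** Iteration range [start, end), or [start, end] when inclusiveEnd is true. */
  public record KeyRange<K>(K start, K end, boolean inclusiveEnd) {
    @Override
    public String toString() {
      return "[" + start + ", " + end + (inclusiveEnd ? "]" : ")");
    }
  }

  private SstRangeSplitter() { }

  public static <K> List<KeyRange<K>> split(List<SstKeyRange<K>> sstRanges, Comparator<K> cmp) {
    // Every smallest/largest key is a boundary; the TreeSet sorts and de-duplicates them.
    TreeSet<K> boundaries = new TreeSet<>(cmp);
    for (SstKeyRange<K> r : sstRanges) {
      boundaries.add(r.smallestKey());
      boundaries.add(r.largestKey());
    }
    List<K> points = new ArrayList<>(boundaries);
    List<KeyRange<K>> result = new ArrayList<>();
    if (points.size() == 1) {
      // A single distinct key: one closed range [k, k].
      result.add(new KeyRange<>(points.get(0), points.get(0), true));
      return result;
    }
    // Consecutive boundaries form half-open ranges; only the last range is closed.
    for (int i = 0; i + 1 < points.size(); i++) {
      boolean last = i + 2 == points.size();
      result.add(new KeyRange<>(points.get(i), points.get(i + 1), last));
    }
    return result;
  }
}
```

With the three SST files from the example (keys 1, 5, 10, 15, 30), `split` returns `[1, 5)`, `[5, 10)`, `[10, 15)` and `[15, 30]`, matching the four ranges listed above. Each range could then back its own iterator; in RocksJava, `ReadOptions#setIterateLowerBound` and `ReadOptions#setIterateUpperBound` (exclusive) are one way to bound one. Leaving the first range without a lower bound and the last without an upper bound would also cover keys that are not yet in any SST file.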

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-12779

How was this patch tested?

Added unit tests.

swamirishi and others added 30 commits April 12, 2025 10:32
…KeyValue

Change-Id: Ifd24aadf501f3af1f1385ea2a43b036ddee2c5a1
Change-Id: Ifdfafe0ec0a3467b012a0abb407bc8d5e75a88a3
Change-Id: I2b0ec8b91d0c9ac47b20b05665b5d31d8d486067
Change-Id: I5bd5babb70126b2a59ab082e7c0c7469398817b2
Change-Id: Id34d08479c4baedc28f177e82ed9707aff481da8
Change-Id: I54f7ebeb3967947131ae7acfefa17990965f5ca4
Change-Id: Ic0e600dfa46951e048231a4e56a20bd9ce6fa423
Change-Id: Ia271b1d6be0a4a526a4a65fda7d1ad6dc9cd2bd2
…king stack

Change-Id: I78272536972d58b76a10092bfde418a29ba8adae
Change-Id: I95dcc0d6e370329846272441dc70f4be2b671a29
Change-Id: I4db792cbf85d60f371653e7b9119d5155661c228
Change-Id: I91605a0d5e6e9390c5a75a518d75ef8e5998c97d
Change-Id: I825272d43b9c2614072ac41773813221c79dc130

# Conflicts:
#	hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RawKeyValue.java
Change-Id: I73c8ff72786dcc57650bbd1107915946fc291396
Change-Id: I46afdf6e6510fece9bca9a0d677e52b33dfc97c5
…ls/db/RDBStoreCodecBufferIterator.java

Co-authored-by: Ritesh H Shukla <[email protected]>
Change-Id: Ifdff95b06c134ff3a2d2bf7022edcf7c630c86c2
Change-Id: I43c3342214fd698b692870e63c6566e43b24e350
Change-Id: Ic39f3016d8999b2492905828043ec22fa4da51ab
Change-Id: I9ea4eaf4b9f4db8c279a90d5222cb448d46772f6
Change-Id: I819e2a82cf30d5851762ca2fe8c881c716035df4
Change-Id: I251e28caef8c20263cff822792cf2de8ca6b61ca
Change-Id: If5caeff1bcd4c5015b08f147a90be615dede3e38
Change-Id: Iaf7e8059d442e954e722c58620ccd4811e15c855
Change-Id: Idf8a61a54288fd25bcbfdcde362d76868ed9329a
Change-Id: Id6f11892e47103ebecd7b2cd8549d4ff4ebcf444
Change-Id: I214cab272e61ce1de527b78190dd1c5560061c8e
Change-Id: I8b80f584a316cc54778068572702fe3c69699185
Change-Id: I6527e2a2ed8f22131880e5c7cf14601614254671
Change-Id: I7c720a3432efaa8e308ce7693e70b309df472851
…ators across different ranges

Change-Id: I8ee22aed8c4be85ccb693cc9dcaf00880c4e736b
@swamirishi changed the title from "Hdds 12779 alt" to "HDDS-12779. Parallelize table Spliterator with multiple multiple iterators across different ranges" on May 28, 2025
Change-Id: I2c32f89e4a9f65f46695bd9ab51ee75f78ebf2ab
…ator

Change-Id: I73eb2d451d91d1f6177dc718f4b0cd2fbafb9d2c
Change-Id: Id594fc104fa32d69ab2a84ee5d08363af142e3d3
Change-Id: Ia69d68b1058350f4004c49e7ad2d41c984af1455

# Conflicts:
#	hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBTable.java
#	hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RawSpliterator.java
#	hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/TypedTable.java
Change-Id: I04fe90cbfc65e4c52ed2b36c223ade76417c06c0
Change-Id: I65ca22255e2f0e5b9f58010a73fe53ac4cfe971e
Change-Id: I171b21260d363550ccd336b18042423055252547
@smengcl requested a review from szetszwo on June 2, 2025 16:51
Contributor

@szetszwo left a comment


@swamirishi, please move all of the parallel iterator implementation to a new package. We should not change the existing iterator code, as mentioned in this comment.

@szetszwo
Contributor

szetszwo commented Jun 2, 2025

BTW, as mentioned in #8243, it is better not to implement Comparator. Just use equals(..) to detect overlap.

@github-actions

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

@github-actions

Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.

github-actions bot closed this on Nov 19, 2025

2 participants