HDDS-12779. Parallelize table Spliterator with multiple iterators across different ranges #8517
Conversation
szetszwo left a comment:
@swamirishi , please move all the code for the parallel iterator implementation to a new package. We should not change the existing iterator code, as mentioned in this comment.
BTW, as mentioned in #8243, it is better not to implement …
This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.
Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.
What changes were proposed in this pull request?
Currently, most operations performed on the tables are agnostic to the order in which keys are iterated, so they can be executed in parallel to speed them up. The proposal here is to introduce a central functionality to achieve this.
The idea behind this parallelization is to peek into RocksDB's SST file metadata: each LiveFileMetaData entry for the table's column family exposes the file's key range (smallestKey and largestKey), which yields an approximate partitioning of the keys in the table. The table would thus be split into roughly as many iteration ranges as there are SST files in the DB.
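A minimal sketch (not the PR's actual code) of how the per-SST key ranges could be read through the RocksJava API; the DB handle and column-family name are assumed to be supplied by the caller:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.rocksdb.LiveFileMetaData;
import org.rocksdb.RocksDB;

public final class SstKeyRanges {
  /**
   * Collects the [smallestKey, largestKey] pair of every live SST file
   * that belongs to the given column family.
   */
  static List<byte[][]> sstRanges(RocksDB db, String columnFamily) {
    byte[] cfName = columnFamily.getBytes(StandardCharsets.UTF_8);
    List<byte[][]> ranges = new ArrayList<>();
    for (LiveFileMetaData meta : db.getLiveFilesMetaData()) {
      if (Arrays.equals(cfName, meta.columnFamilyName())) {
        ranges.add(new byte[][] {meta.smallestKey(), meta.largestKey()});
      }
    }
    return ranges;
  }
}
```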
For instance, consider the case:
1.sst has key range [k1, k10]
2.sst has key range [k5, k15]
3.sst has key range [k15, k30]
With this approach, the table would be split into 4 iterators covering the key ranges
[k1, k5), [k5, k10), [k10, k15), [k15, k30] (a sketch of this split-and-scan step follows below).
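Below is a hedged sketch, under the assumptions above, of how the overlapping SST ranges could be turned into disjoint split points and scanned with one RocksDB iterator per range. The helper names (splitPoints, scanInParallel) and the unsigned-byte comparator are illustrative, not the PR's implementation:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.function.BiConsumer;
import java.util.stream.IntStream;

import org.rocksdb.ColumnFamilyHandle;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksIterator;

public final class ParallelRangeScan {

  /** Unsigned lexicographic byte[] order, matching RocksDB's default comparator. */
  private static final Comparator<byte[]> KEY_ORDER = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int c = (a[i] & 0xff) - (b[i] & 0xff);
      if (c != 0) {
        return c;
      }
    }
    return a.length - b.length;
  };

  /** Sort and de-duplicate the SST boundary keys into split points. */
  static List<byte[]> splitPoints(List<byte[][]> sstRanges) {
    TreeSet<byte[]> points = new TreeSet<>(KEY_ORDER);
    for (byte[][] range : sstRanges) {
      points.add(range[0]);  // smallestKey
      points.add(range[1]);  // largestKey
    }
    return new ArrayList<>(points);
  }

  /**
   * Scans each [points(i), points(i+1)) range with its own iterator;
   * the last range is inclusive of its upper bound.
   */
  static void scanInParallel(RocksDB db, ColumnFamilyHandle cf,
      List<byte[]> points, BiConsumer<byte[], byte[]> consumer) {
    IntStream.range(0, points.size() - 1).parallel().forEach(i -> {
      byte[] lower = points.get(i);
      byte[] upper = points.get(i + 1);
      boolean lastRange = i == points.size() - 2;
      try (RocksIterator it = db.newIterator(cf)) {
        for (it.seek(lower); it.isValid(); it.next()) {
          int cmp = KEY_ORDER.compare(it.key(), upper);
          if (cmp > 0 || (cmp == 0 && !lastRange)) {
            break;  // stop at the exclusive upper bound (inclusive for the last range)
          }
          consumer.accept(it.key(), it.value());
        }
      }
    });
  }
}
```

With the three example SST files, splitPoints would yield k1, k5, k10, k15, k30, and scanInParallel would run the four resulting ranges on the common thread pool, each range on its own RocksDB iterator.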
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-12779
How was this patch tested?
Adding unit tests