-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fullrt rework batching #720
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple of comments to consider, but looks good overall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Improvements to tuning parallelism and timeouts can be made after further observation of performance.
Removed divideIntoGroups Renamed divideIntoChunks into divideByChunkSize Reworked tests to target divideByChunkSize instead of divideIntoGroups Fixed divideByChunkSize to pass tests Made divideByChunkSize handle the cases of empty key lists and panic on invalid chunk sizes
af6797d
to
dae2082
Compare
Reworked full batching so that it's more explicitly peer oriented rather than key oriented.
There's still room for improvement here on dealing with slow peers (e.g. having more message-based rather than peer-based parallelism).
Some tests on a 2 core machine with about 800k CIDs being provided (after a fresh crawl, although that is hopefully not super relevant)