-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH]: better ordered writer block splitting #3177
base: main
Are you sure you want to change the base?
[ENH]: better ordered writer block splitting #3177
Conversation
This reverts commit 7dde094.
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
Please tag your PR title with one of: [ENH | BUG | DOC | TST | BLD | PERF | TYP | CLN | CHORE]. See https://docs.trychroma.com/contributing#contributing-code-and-ideas |
…writer-block-splitting
Please tag your PR title with one of: [ENH | BUG | DOC | TST | BLD | PERF | TYP | CLN | CHORE]. See https://docs.trychroma.com/contributing#contributing-code-and-ideas |
…writer-block-splitting
Please tag your PR title with one of: [ENH | BUG | DOC | TST | BLD | PERF | TYP | CLN | CHORE]. See https://docs.trychroma.com/contributing#contributing-code-and-ideas |
33b6b42
to
82ebeae
Compare
82ebeae
to
5e2a058
Compare
@@ -316,7 +316,8 @@ impl ArrowOrderedBlockfileWriter { | |||
.current_block_delta | |||
.take() | |||
.expect("We already checked above that there is a current delta"); | |||
let new_delta = current_delta.split_off_half::<K, V>(); | |||
|
|||
let new_delta = current_delta.split_off_last_key().expect("This returns None only if the delta is empty. We just added to the delta, so it is not empty."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main change. Since the ordered writer writes keys in strictly increasing order, we can always split deltas at the last key instead of splitting them at their midpoint. This results in full blocks (rather than half-sized blocks) and better performance.
Description of changes
These changes result in fuller blocks when using ordered writing mode, leading to less fragmentation and slightly better writing performance depending on the write pattern:
Test plan
How are these changes tested?
pytest
for python,yarn test
for js,cargo test
for rustDocumentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?
n/a