fix(query): fix Inverted/Vector index panic with Native Storage Format #18932
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
Problem: When using `storage_format = native` for data storage, queries involving inverted or vector indexes could panic. The panic occurred because the native format processes data page by page: a single data block is internally divided into multiple pages.
This PR addresses the panic by introducing a mechanism to correctly transform the block-level row indices (idx) from the inverted and vector indexes into page-specific indices.
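To illustrate the idea of translating block-level row indices into page-specific ones, here is a minimal sketch. It is not the code from this PR: the helper name `to_page_indices`, its signature, and the assumption that the block-level indices arrive sorted in ascending order are all hypothetical and chosen only for illustration.

```rust
/// Hypothetical helper (not part of the PR): splits sorted block-level row
/// indices into per-page local indices.
///
/// `page_rows[i]` is the number of rows stored in page `i` of the block.
/// Assumes `block_indices` is sorted ascending; indices beyond the last
/// page are ignored.
fn to_page_indices(block_indices: &[usize], page_rows: &[usize]) -> Vec<Vec<usize>> {
    let mut out = vec![Vec::new(); page_rows.len()];
    let mut page_start = 0;
    let mut it = block_indices.iter().peekable();

    for (page, &rows) in page_rows.iter().enumerate() {
        let page_end = page_start + rows;
        // Consume every index that falls inside [page_start, page_end).
        while let Some(&&idx) = it.peek() {
            if idx >= page_end {
                break;
            }
            // Rebase the block-level index onto this page.
            out[page].push(idx - page_start);
            it.next();
        }
        page_start = page_end;
    }
    out
}
```

With three pages of four rows each, the block-level indices `[1, 4, 5, 9]` map to local index `1` in the first page, `0` and `1` in the second, and `1` in the third. Using a block-level index directly against a single page is the kind of out-of-range access this translation is meant to avoid.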
To efficiently manage the row offsets in each page, `std::vec::Vec` has been replaced with `RoaringTreemap`. `RoaringTreemap` provides a highly optimized way to store and query sets of integers.
fixes: #[Link the issue here]
Tests
Type of change