fix: spurious bug during compaction#1829

Merged
eddyxu merged 1 commit into lance-format:main from westonpace:fix/scan_in_order_if_capturing_row_ids_in_sequence on Jan 14, 2024

Conversation

@westonpace (Member)

During compaction we read in the row ids so we can use them for remapping. We read the row ids into a tree map, which is most efficient when the ids arrive in order, and the current code assumes that they do. However, we weren't actually scanning in order, and this could lead to failures during compaction.
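To illustrate the ordering assumption described above, here is a minimal sketch (not the actual Lance implementation; `build_remapping` is a hypothetical helper) of collecting scanned row ids into a tree map that maps old row ids to new ones. Because a `BTreeMap` is cheapest to fill with ascending keys, the scan that produces the ids must yield them in order:

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: build an old-row-id -> new-row-id remapping table.
/// Inserting keys in ascending order keeps BTreeMap insertion cheap, which
/// is why the scan feeding this function must return row ids in order.
fn build_remapping(old_row_ids: &[u64]) -> BTreeMap<u64, u64> {
    let mut map = BTreeMap::new();
    for (new_id, &old_id) in old_row_ids.iter().enumerate() {
        // new_id is the position of the row in the compacted output
        map.insert(old_id, new_id as u64);
    }
    map
}

fn main() {
    // Row ids arriving in ascending order, as an in-order scan would produce them.
    let remap = build_remapping(&[10, 12, 17, 20]);
    assert_eq!(remap.get(&17), Some(&2));
    println!("{:?}", remap);
}
```

If the scan instead returned ids out of order, the insertions would still succeed, but the "in order" efficiency assumption (and any code relying on ordered arrival) would be violated.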

@eddyxu eddyxu merged commit 0bfcd30 into lance-format:main Jan 14, 2024
scanner.with_fragments(fragments.clone()).with_row_id();
scanner
    .with_fragments(fragments.clone())
    .scan_in_order(true)
Contributor

I'm not sure this impacts the bug, but I should point out that scan_in_order guarantees that results will be in order of local row ids, but not in order of fragment ids. This is because the fragment list in the manifest is not guaranteed to be sorted by row_id. You should be able to force a scan to be globally sorted by row_id by passing in the .fragments() parameter with the list of fragments sorted by fragment id.
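The reviewer's suggestion above can be sketched as follows. This is not Lance's actual API usage, just an illustration (the `Fragment` struct here is a stand-in for the real type): sort the fragment list by fragment id before handing it to the scanner, so that in-order scanning within each fragment yields a globally row-id-ordered result:

```rust
// Stand-in for the Lance fragment type; only the id matters for this sketch.
#[derive(Debug, Clone)]
struct Fragment {
    id: u64,
}

/// Return the fragments sorted by fragment id, so that scanning them in
/// sequence (each in local row-id order) produces globally ordered row ids.
fn sorted_fragments(mut fragments: Vec<Fragment>) -> Vec<Fragment> {
    fragments.sort_by_key(|f| f.id);
    fragments
}

fn main() {
    // A manifest's fragment list is not guaranteed to be sorted by id.
    let frags = vec![Fragment { id: 3 }, Fragment { id: 1 }, Fragment { id: 2 }];
    let sorted = sorted_fragments(frags);
    assert_eq!(sorted.iter().map(|f| f.id).collect::<Vec<_>>(), vec![1, 2, 3]);
}
```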

eddyxu pushed a commit that referenced this pull request Jan 16, 2024
