Fix Iceberg read when positional delete files are unaligned#10261
Fix Iceberg read when positional delete files are unaligned#10261yingsu00 wants to merge 2 commits intofacebookincubator:mainfrom
Conversation
✅ Deploy Preview for meta-velox canceled.
|
40bf1c3 to
f965154
Compare
| return rowsScanned; | ||
| } | ||
|
|
||
| void IcebergSplitReader::shiftDeleteBitmap() { |
There was a problem hiding this comment.
There was a problem hiding this comment.
Yes there is definitely room to improve copyBits. The overall timing of that function in the query is not large though.
| mutation.deletedRows = | ||
| deleteBitmap_ ? deleteBitmap_->as<uint64_t>() : nullptr; | ||
|
|
||
| auto rowsScanned = baseRowReader_->next(size, output, &mutation); |
There was a problem hiding this comment.
There is a function to get actual scanning size in row reader called nextReadSize. There is also nextRowNumber to help you track the number of rows scanned on base file. You can consider using these methods to make reading delta positions a little easier and fewer states to maintain (maybe in subsequent diffs).
There was a problem hiding this comment.
@Yuhta Thanks Jimmy! I checked these functions and yes we can get the actual rows scanned before calling base RowReader's next(). So I'll send a separate PR to simplify the logic using these functions.
For now I updated the PR with your another comment addressed. Would appreciate your review again.
f965154 to
b23f34c
Compare
This commit makes it possible to create base data files and delete fiels with un-aligned Rowgroup boundaries. It also added several new test cases and improved some variable and function namings.
b23f34c to
f848d6e
Compare
|
@Yuhta Would you please review again? Thank you! |
|
@Yuhta Thanks for reviewing and approving this PR. Do we know when it will be merged? |
|
@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
When the base data file and positional delete files contains multiple unaligned RowGroups, some of the bits at the end of IcebergSplitReader::deleteBitmap_ could be mistakenly skipped, causing wrong result. This commit fixes it by introducing an offset into this deleteBitmap_ and shift the unused bits to the beginning for each batch.
f848d6e to
e4df770
Compare
|
@Yuhta Thanks for importing the PR and let me know the failures! I have updated the PR with your comments addressed. |
|
@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
|
This pull request has been reverted by e8a73ae. |

When the base data file and positional delete files contains multiple
unaligned RowGroups, some of the bits at the end of
IcebergSplitReader::deleteBitmap_ could be mistakenly skipped, causing
wrong result. This commit fixes it by introducing an offset into this
deleteBitmap_ and shift the unused bits to the beginning for each batch.
Fixes #9856