Fix incorrect results when writing deletion vectors in Delta Lake#23231
Force-pushed the branch from 5eb9e07 to e1cd4ba
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Can we add a test that fails deterministically for this case?
I don't understand why this failure was flaky.
The test is still flaky. Databricks/Delta Lake probably assumes that deletion vectors are used even when all rows are rewritten.
Force-pushed the branch from e1cd4ba to ff6fcb0
...t-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeDeleteCompatibility.java
Force-pushed the branch from 570ea53 to 513a53a
Force-pushed the branch from 513a53a to 6b47720
Pushed a new commit to fix a correctness issue when rewriting all rows.
  long rowCount = parquetMetadata.getBlocks().stream().map(BlockMetadata::rowCount).mapToLong(Long::longValue).sum();
  RoaringBitmapArray rowsRetained = new RoaringBitmapArray();
- rowsRetained.addRange(0, rowCount);
+ rowsRetained.addRange(0, rowCount - 1);
Do we have a test that detects this issue?
The following line throws an exception without this change:
assertThat(getEntriesFromJson(3, tableLocation + "/_delta_log", FILE_SYSTEM).orElseThrow().get(1).getRemove().deletionVector().orElseThrow())
        .isEqualTo(deletionVector);
(Not an ideal test, but I assume it's enough.)
  long rowCount = parquetMetadata.getBlocks().stream().map(BlockMetadata::rowCount).mapToLong(Long::longValue).sum();
  RoaringBitmapArray rowsRetained = new RoaringBitmapArray();
- rowsRetained.addRange(0, rowCount);
+ rowsRetained.addRange(0, rowCount - 1);
What was the consequence of the range being too long before?
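For readers following the thread, the off-by-one behind this question can be sketched as follows. This is only an illustration and not Trino code: `InclusiveBitmap` is a hypothetical stand-in built on `java.util.BitSet`, and it assumes that `addRange` treats both bounds as inclusive, which is what the change to `rowCount - 1` suggests.

```java
import java.util.BitSet;

public class InclusiveRangeSketch
{
    // Hypothetical stand-in for a bitmap whose addRange includes BOTH bounds.
    // That inclusive behavior is an assumption inferred from the diff above.
    static final class InclusiveBitmap
    {
        private final BitSet bits = new BitSet();

        void addRange(int start, int endInclusive)
        {
            // BitSet.set(from, to) excludes `to`, so add 1 to make the end inclusive
            bits.set(start, endInclusive + 1);
        }

        int cardinality()
        {
            return bits.cardinality();
        }
    }

    // Returns how many positions are marked as retained for a file with rowCount rows
    public static int retainedPositions(int rowCount, boolean subtractOne)
    {
        InclusiveBitmap rowsRetained = new InclusiveBitmap();
        rowsRetained.addRange(0, subtractOne ? rowCount - 1 : rowCount);
        return rowsRetained.cardinality();
    }

    public static void main(String[] args)
    {
        int rowCount = 3; // a file holding rows 0, 1, 2
        System.out.println(retainedPositions(rowCount, false)); // 4: one position past the last row
        System.out.println(retainedPositions(rowCount, true));  // 3: exactly the rows in the file
    }
}
```

Under that assumption, the old call marks one position that does not exist in the file, so any bitmap or cardinality derived from it is off by one.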
{
    String sourceRelativePath = relativePath(rootTableLocation.toString(), sourcePath);
    DeltaLakeMergeResult result = new DeltaLakeMergeResult(deletion.partitionValues(), Optional.of(sourceRelativePath), Optional.empty());
    DeletionVectorEntry deletionVector = deletionVectors.get(sourceRelativePath);
follow-up: do we need any special handling for deletion vectors from shallowly cloned tables?
  long deletionTimestamp,
- boolean dataChange)
+ boolean dataChange,
+ Optional<DeletionVectorEntry> deletionVector)
Where does Delta Lake need deletionVector in the remove file entries?
Just noticed:
Optional<DeletionVectorEntry> deletionVector = Optional.empty();
if (deletionVectorsEnabled) {
    deletionVector = Optional.ofNullable(remove.getRow("deletionVector"))
            .map(row -> parseDeletionVectorFromParquet(session, row, removeDeletionVectorType.orElseThrow()));
}
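The snippet above reads an optional field behind a feature flag. A self-contained sketch of the same pattern is given here for illustration only: the `DeletionVector` record, the `Map`-based row, and `readDeletionVector` are hypothetical simplifications, not the Trino types or the actual checkpoint reader.

```java
import java.util.Map;
import java.util.Optional;

public class OptionalDeletionVectorSketch
{
    // Hypothetical, simplified stand-in for DeletionVectorEntry
    public record DeletionVector(String pathOrInlineDv, long cardinality) {}

    // `removeRow` is a hypothetical stand-in for the remove entry read from a checkpoint
    public static Optional<DeletionVector> readDeletionVector(Map<String, Object> removeRow, boolean deletionVectorsEnabled)
    {
        Optional<DeletionVector> deletionVector = Optional.empty();
        if (deletionVectorsEnabled) {
            // A missing field yields Optional.empty() instead of a NullPointerException
            deletionVector = Optional.ofNullable((Map<?, ?>) removeRow.get("deletionVector"))
                    .map(OptionalDeletionVectorSketch::parse);
        }
        return deletionVector;
    }

    private static DeletionVector parse(Map<?, ?> row)
    {
        return new DeletionVector((String) row.get("pathOrInlineDv"), (Long) row.get("cardinality"));
    }
}
```

With this shape, a remove entry without a deletion vector and a table with the feature disabled both produce an empty `Optional`, so callers handle a single case.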
Description
Fixes #23229
Release notes