Add support for writing deletion vectors in Delta Lake#22102
Conversation
72af1af to
8ea6475
Compare
a55bc53 to
7d52f10
Compare
7d52f10 to
e3ce7cb
Compare
e3ce7cb to
8f8cddc
Compare
4152170 to
8ab2964
Compare
5242022 to
42769ca
Compare
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/delete/Base85Codec.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
37eaaa8 to
c9bc2da
Compare
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/delete/DeletionVectors.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
c9bc2da to
70ead35
Compare
70ead35 to
b2de0c2
Compare
| deletedRows.or(deletion.rowsDeletedByDelete()); | ||
| deletedRows.or(deletion.rowsDeletedByUpdate()); | ||
|
|
||
| TrinoInputFile inputFile = fileSystem.newInputFile(Location.of(path.toStringUtf8())); |
There was a problem hiding this comment.
Do we know size of deletion vector in advance from io.trino.plugin.deltalake.transactionlog.DeletionVectorEntry#sizeInBytes or any other metadata ?
If we do, then using it in newInputFile would probably save a FS call.
There was a problem hiding this comment.
This inputFile is a Parquet file, not deletion vector. I think that's possible, but it requires some refactoring. Let me handle in a follow-up.
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/delete/DeletionVectors.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/delete/DeletionVectors.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/delete/RoaringBitmapArray.java
Show resolved
Hide resolved
6883a04 to
a1587b5
Compare
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
When we write a new deletion vector, is it mandatory as per the spec that it should be a union of all deleted/updated rows so far, or is that just how we've implemented it ?
Is cleanup of old deletion vectors something already handled by some optimize/vaccum procedure in Trino ?
There was a problem hiding this comment.
It's mandatory for compatiblity with Delta Lake. Otherwise, they return the wrong results if I remember correctly.
The cleanup should be handled in #22809
a1587b5 to
37622e1
Compare
37622e1 to
e9ebfcb
Compare
|
Addressed comments. |
| RoaringBitmapArray deletedRows = loadDeletionVector(Location.of(path.toStringUtf8())); | ||
| deletedRows.or(deletion.rowsDeletedByDelete()); | ||
| deletedRows.or(deletion.rowsDeletedByUpdate()); |
There was a problem hiding this comment.
There was a discussion with @findepi about whether If the amount of rows deleted from file depasses a certain quota, we should consider to proactively rewrite the file.
Is there a potential follow-up here?
There was a problem hiding this comment.
It's a potential follow-up task. I don't expect we will handle it shortly though.
Description
Fixes #17063
Release notes
(x) Release notes are required, with the following suggested text: