Iceberg: support row-level delete and update #8565

jackye1995 wants to merge 1 commit into trinodb:master from
Conversation
@jackye1995 can you please add a product test that would assert compatibility between Trino and Spark?
If I wanted to try this out, I'd need to create an Iceberg table adhering to the Iceberg Format Specification V2, since you are proposing using delete snapshots, right? And should we bump Iceberg to 0.12 (that version has the final V2 spec)?
 @Override
-protected Block[] getRawFieldBlocks()
+public Block[] getRawFieldBlocks()
Wonder why this is needed, and whether this is actually used correctly.
public static IcebergColumnHandle createUpdateRowIdColumnHandle(Schema tableSchema, TypeManager typeManager)
{
    return create(required(ROW_ID_COLUMN_INDEX, ROW_ID_COLUMN_NAME, DeleteSchemaUtil.posDeleteSchema(tableSchema).asStruct()), typeManager);
Is it used for deletes only, or for updates as well?
serializeToBytes(table.schema()),
serializeToBytes(table.spec()),
I think there was already an idea to add the schema to IcebergTableHandle, and it was rejected (?) for some reason.
@phd3 do you remember?
}
else {
    Schema posDeleteSchema = DeleteSchemaUtil.posDeleteSchema(table.getSchema());
    ConnectorPageSink posDeleteSink = new IcebergPageSink(
private final List<IcebergColumnHandle> allTableColumns;
private final List<IcebergColumnHandle> updateColumns;
private final ConnectorPageSource source;
private final ConnectorPageSink posDeleteSink;
FileContent.POSITION_DELETES,
maxOpenPartitions);
ConnectorPageSink updateRowSink = new IcebergPageSink(
private final List<IcebergColumnHandle> updateColumns;
private final ConnectorPageSource source;
private final ConnectorPageSink posDeleteSink;
private final ConnectorPageSink updateRowSink;
}
Block[] updatedRows = new Block[allTableColumns.size()];
Block[] oldRows = ((RowBlock) rowIdBlock.getRawFieldBlocks()[2]).getRawFieldBlocks();
Cast to RowBlock isn't entirely correct.
See #9354 and perhaps we should use ColumnarRow here.
cc @djsstarburst
resultBlocks[i] = RowBlock.fromFieldBlocks(pageSize, Optional.empty(), rowIdComponentBlocks);
}
else {
    resultBlocks[i] = sourcePage.getBlock(allTableColumns.indexOf(columnHandle));
indexOf use here looks quadratic, and we seem to be doing this for every page.
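One way to avoid the per-page `indexOf` scan is to precompute a column-to-channel map once, e.g. when the page source is constructed. A minimal self-contained sketch (plain strings stand in for `IcebergColumnHandle`; names are hypothetical, not from this PR):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnIndexLookup
{
    public static void main(String[] args)
    {
        // Hypothetical column names standing in for IcebergColumnHandle instances
        List<String> allTableColumns = List.of("id", "name", "price");

        // Build the column -> channel index map once, instead of calling
        // allTableColumns.indexOf(column) for every column of every page.
        Map<String, Integer> channelByColumn = new HashMap<>();
        for (int i = 0; i < allTableColumns.size(); i++) {
            channelByColumn.put(allTableColumns.get(i), i);
        }

        // Per-page lookup is now O(1) per column instead of O(columns)
        int channel = channelByColumn.get("price");
        System.out.println(channel); // prints 2
    }
}
```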
close in favor of #10075
This PR adds support for writing Iceberg position deletes. Similar to #8534, I first present our working internal implementation backported to Trino; some parts might not work because of internal differences, but once we agree on the general approach I will make the fixes and add unit tests.
Also, there is a missing piece that has to be added after #8534 is merged first, so that the IcebergPageSource has the ability to retain the row position channel and pass it to the updatable page source.

A few key points:

- The row ID column type is `ROW(string file_path, long pos, row(table schema))`, which matches Iceberg's position delete file schema.
- Delete and update share most of the `beginXXX` and `finishXXX` operation implementation. The only difference is that update writes new data files after writing the delete files. This is because update in Iceberg is modelled as delete + insert.

This is a bare-minimum backport. I left some inline TODOs, and there are many optimizations we can make after the base version is checked in; I tried to keep this as simple as possible to avoid too many disagreements around optimization-related changes. Please let me know if this looks good or not, thanks!
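To illustrate the delete + insert model, here is a minimal self-contained sketch of how position deletes identify rows: each delete record pairs a data file path with a row position in that file, and an update marks the old positions deleted before writing the changed rows to new data files. All names and the file path below are hypothetical and only stand in for the Iceberg structures:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PositionDeleteSketch
{
    // Hypothetical record mirroring the (file_path, pos) part of Iceberg's
    // position delete file schema
    record PositionDelete(String filePath, long pos) {}

    public static void main(String[] args)
    {
        String dataFilePath = "s3://bucket/table/data/00000.parquet"; // hypothetical
        List<String> dataFileRows = List.of("row0", "row1", "row2", "row3");

        // An update is delete + insert: first, record the positions being replaced...
        Set<Long> deletedPositions = List.of(
                        new PositionDelete(dataFilePath, 1L),
                        new PositionDelete(dataFilePath, 3L)).stream()
                .filter(d -> d.filePath().equals(dataFilePath))
                .map(PositionDelete::pos)
                .collect(Collectors.toSet());

        // ...then, on read, rows at deleted positions are skipped; the updated
        // versions live in newly written data files.
        List<String> surviving = IntStream.range(0, dataFileRows.size())
                .filter(i -> !deletedPositions.contains((long) i))
                .mapToObj(dataFileRows::get)
                .toList();

        System.out.println(surviving); // prints [row0, row2]
    }
}
```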
@phd3 @electrum @findepi @losipiuk @caneGuy @rdblue @hashhar