Iceberg: support Parquet read with delete filter #8534
jackye1995 wants to merge 1 commit into trinodb:master from
Conversation
@hashhar You've been working on this one; I'm not sure what its current shape is. PTAL.
hashhar left a comment
Looks good at first glance. Both position and equality deletes are supported with this change.
One question - the FileIO ends up using the Iceberg Parquet readers to read the delete instead of the Trino native parquet reader. This is different from the normal read path. How difficult would it be to use the Trino parquet reader for reading and applying the deletes?
I've yet to look at the calls into the Iceberg library code to see if something more is involved.
```diff
  <properties>
      <air.main.basedir>${project.parent.basedir}</air.main.basedir>
-     <dep.iceberg.version>0.11.0</dep.iceberg.version>
+     <dep.iceberg.version>0.11.1</dep.iceberg.version>
```
Is this to make the MetadataColumns available?
```java
        .collect(toImmutableList());
Schema deleteReadSchema = new Schema(deleteReadFields);
TrinoDeleteFilter deleteFilter = new TrinoDeleteFilter(fileIo, split.getTask(), deleteReadSchema, deleteReadSchema);
getColumns(deleteFilter.requiredSchema(), typeManager).stream()
```
If I understand correctly, `deleteFilter.requiredSchema()` will always be a superset of the columns we request (the `columns` arg to this method). So do we need to create the initial `regularColumns` at all? Can we just assign the result of this stream to `regularColumns`?
@hashhar did you have an implementation for positional deletes with the Trino Parquet reader?
Using `TrinoRow` to reconstruct `Page` and `Block`s looks good!
Hello @jackye1995, I am very interested in using Trino to read Iceberg tables with a delete filter, and have compiled your branch to test, but an exception was thrown. Thank you for your reply!
For anyone subscribing to this PR: I was mostly focused on multi-catalog support in the past few weeks, and will start the work on this one.
Looking forward to this feature. We are currently using Iceberg v2 tables and writing binlogs via Flink, and want to read them and complete ETL through Trino.
```java
private final String tableName;
private final TableType tableType;
private final Optional<Long> snapshotId;
private final byte[] serializedSchema;
```
I think there already was an idea to add schema to IcebergTableHandle and it was rejected (?) for some reason.
@phd3 do you remember?
```java
public Schema getSchema()
{
    if (schema == null) {
        schema = deserializeFromBytes(serializedSchema);
```
Unsafe publication of the Schema object, since this.schema is not volatile.
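A minimal sketch of the fix this comment points at, under the assumption that the handle lazily deserializes a cached schema field (`LazySchemaHolder` and the `String` stand-in for `Schema` are illustrative, not the actual Trino classes): marking the cached field `volatile` makes the lazily computed value safely published across threads, at the cost of a benign race where two threads may deserialize independently.

```java
import java.nio.charset.StandardCharsets;

public class LazySchemaHolder
{
    private final byte[] serializedSchema;
    // volatile ensures a fully constructed object is safely published;
    // without it, another thread could observe a partially initialized value
    private volatile String schema;

    public LazySchemaHolder(byte[] serializedSchema)
    {
        this.serializedSchema = serializedSchema;
    }

    public String getSchema()
    {
        // read the field once into a local to avoid re-reading a changing value
        String result = schema;
        if (result == null) {
            // benign race: worst case the bytes are deserialized twice,
            // but both results are equivalent and safely published
            result = deserializeFromBytes(serializedSchema);
            schema = result;
        }
        return result;
    }

    // stand-in for the real Schema deserialization
    private static String deserializeFromBytes(byte[] bytes)
    {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args)
    {
        LazySchemaHolder holder = new LazySchemaHolder("schema-json".getBytes(StandardCharsets.UTF_8));
        System.out.println(holder.getSchema()); // prints schema-json
    }
}
```

Deserializing twice is harmless here because `Schema` is immutable once constructed; if single deserialization matters, a `synchronized` double-checked block or a memoizing supplier would be the alternative.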
```java
}

@Override
protected InputFile getInputFile(String s)
```
```java
}

@Override
public <T> T get(int i, Class<T> aClass)
```
Is it necessary to implement equality-based deletes?
```java
    value = aClass.cast(type.getDouble(block, position));
}
else if (type.equals(TIME_MICROS)) {
    value = aClass.cast(type.getLong(block, position) / PICOSECONDS_PER_MICROSECOND);
```
This logic could ideally be in a shared function, doing the mapping reverse to `io.trino.plugin.iceberg.IcebergTypes#convertIcebergValueToTrino`.
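A hedged sketch of what such a shared helper could look like for the time case shown in the diff (class and method names here are illustrative, not actual Trino API): Trino stores `TIME(6)` values as picoseconds of day in a `long`, while Iceberg's time type uses microseconds of day, so the Trino-to-Iceberg direction divides by the picoseconds-per-microsecond factor.

```java
public class TrinoToIcebergTime
{
    // Trino represents TIME(6) values as picoseconds of day;
    // Iceberg's time type uses microseconds of day
    private static final long PICOSECONDS_PER_MICROSECOND = 1_000_000L;

    // illustrative helper: inverse of the Iceberg-to-Trino time conversion
    public static long toIcebergTimeMicros(long trinoPicosOfDay)
    {
        return trinoPicosOfDay / PICOSECONDS_PER_MICROSECOND;
    }

    public static void main(String[] args)
    {
        // 12:00:00 expressed as picoseconds of day
        long noonPicos = 12L * 60 * 60 * 1_000_000_000_000L;
        System.out.println(toIcebergTimeMicros(noonPicos)); // prints 43200000000
    }
}
```

Centralizing both directions next to each other would make it obvious when a new Iceberg type is handled on the read path but missing from the delete-filter path.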
```java
@Override
public <T> void set(int i, T t)
{
    throw new TrinoException(NOT_SUPPORTED, "writing to TrinoRow is not supported");
```
`TrinoException` should be used only when we know the reason for the failure.

```java
throw new UnsupportedOperationException();
```
```java
@Override
protected InputFile getInputFile(String s)
{
    return fileIO.newInputFile(s);
```
How can we test this?
Closing in favor of #10075.
Support using Iceberg's `DeleteFilter` to apply delete files in the read path. This implementation only supports Parquet first, because the Parquet reader already has the ability to generate a row ID channel. Will add ORC later if this implementation is accepted. The general idea is that:
- `Page` is wrapped as an iterable of `TrinoRow`s, where each row is defined by the underlying block array and the position in the page.
- `TrinoRow`s are implemented as Iceberg `StructLike`, so they can directly leverage Iceberg's `DeleteFilter`.
- `DeleteFilter` is used to filter pages produced by the Parquet page source.
- `Page.getPositions` is used to retain only the rows at the surviving positions and complete the merge-on-read process.

I have not added unit tests yet; I only tested with an internal Trino installation that supports multi-catalog, against tables in the Glue catalog. There might be some backport errors I missed. Once we agree upon the general implementation idea, I will add back tests and fix performance issues if any.
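The last two steps above can be sketched in plain Java (this is an illustrative model of the idea, not the actual Trino `Page`/`Block` API): first collect the positions that survive the delete filter, then build a view of the page containing only those positions, analogous to `Page.getPositions`.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

public class PositionFilterSketch
{
    // stand-in for a single-column page: one block of long values;
    // analogous to Page.getPositions, which builds a view containing
    // only the rows at the retained positions, in order
    static long[] getPositions(long[] block, int[] retained, int length)
    {
        long[] result = new long[length];
        for (int i = 0; i < length; i++) {
            result[i] = block[retained[i]];
        }
        return result;
    }

    // apply a row-level predicate (standing in for the delete filter) to a page
    static long[] filterPage(long[] block, IntPredicate isLive)
    {
        int[] retained = new int[block.length];
        int count = 0;
        for (int position = 0; position < block.length; position++) {
            if (isLive.test(position)) {
                retained[count++] = position;
            }
        }
        return getPositions(block, retained, count);
    }

    public static void main(String[] args)
    {
        long[] block = {10, 20, 30, 40, 50};
        // pretend positions 1 and 3 were marked deleted by a positional delete file
        long[] live = filterPage(block, position -> position != 1 && position != 3);
        System.out.println(Arrays.toString(live)); // prints [10, 30, 50]
    }
}
```

In the real implementation the predicate is not a lambda but the decision of Iceberg's `DeleteFilter` evaluated against each `TrinoRow`, and the retained-position view is produced lazily by the page rather than by copying.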
@phd3 @electrum @findepi @losipiuk @caneGuy @rdblue