Add support for AVRO in Iceberg #12125
This should always be a string, correct?
A more general question: is there a big difference between the Iceberg libraries for Avro and the Avro libraries themselves? I know Ryan works on both, so I know this works; I'm just curious whether this is done to keep things in sync, for ergonomics, or for some other reason.
Does the Iceberg Avro library do away with the Utf8 class? 😅
General question: why does this need to be sent as part of the split if the schema is already part of the file?
Force-pushed from 7343096 to a5c6550.
alexjo2144 left a comment:
Just an initial skim. It looks like Avro table stats aren't working. Is that because Avro doesn't have them, or because the read path just isn't implemented for them yet?
I think these need to be in the else block?
Current:

    rowIndexChannels.add(false);
    columnNames.add(column.getName());
    columnTypes.add(column.getType());
    if (field == null) {
        constantPopulatingPageSourceBuilder.addConstantColumn(nativeValueToBlock(column.getType(), null));
    }
    else {
        constantPopulatingPageSourceBuilder.addDelegateColumn(avroSourceChannel);
    }
    avroSourceChannel++;

Suggested change:

    if (field == null) {
        constantPopulatingPageSourceBuilder.addConstantColumn(nativeValueToBlock(column.getType(), null));
    }
    else {
        rowIndexChannels.add(false);
        columnNames.add(column.getName());
        columnTypes.add(column.getType());
        constantPopulatingPageSourceBuilder.addDelegateColumn(avroSourceChannel);
        avroSourceChannel++;
    }
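To illustrate why the bookkeeping belongs in the else block, here is a minimal, self-contained sketch of the same channel-mapping idea. It is not the code in this PR: `ChannelMappingSketch`, `planChannels`, and `CONSTANT_NULL` are hypothetical names, and columns are modeled as plain strings rather than Trino column handles. The point it shows is that a column missing from the file should not be registered with the reader or consume a source channel; otherwise later delegate columns would point at the wrong channel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for the page source setup discussed above;
// none of these names are Trino APIs.
final class ChannelMappingSketch
{
    static final int CONSTANT_NULL = -1;

    private ChannelMappingSketch() {}

    /**
     * For each requested column, returns the Avro source channel it reads from,
     * or CONSTANT_NULL when the file has no such field. Columns that are actually
     * read from the file are appended to readColumns in channel order.
     */
    static List<Integer> planChannels(List<String> requestedColumns, Set<String> fileFields, List<String> readColumns)
    {
        List<Integer> outputChannels = new ArrayList<>();
        int avroSourceChannel = 0;
        for (String column : requestedColumns) {
            if (!fileFields.contains(column)) {
                // Missing field: filled with a constant null, consumes no source channel.
                outputChannels.add(CONSTANT_NULL);
            }
            else {
                // Present field: register it with the reader, then advance the channel.
                readColumns.add(column);
                outputChannels.add(avroSourceChannel);
                avroSourceChannel++;
            }
        }
        return outputChannels;
    }

    public static void main(String[] args)
    {
        List<String> readColumns = new ArrayList<>();
        List<Integer> channels = planChannels(List.of("a", "b", "c"), Set.of("a", "c"), readColumns);
        System.out.println(channels + " " + readColumns);
    }
}
```

With requested columns `a`, `b`, `c` and a file that only contains `a` and `c`, the sketch prints `[0, -1, 1] [a, c]`: the missing column `b` maps to a constant null, and `c` reads from channel 1 rather than channel 2.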
Avro files/tables don't support statistics, as far as I know.
Do we need multiple row ID columns? Or could this be a single index?
Since we're using the Iceberg Avro reader, I'm thinking we might get the row ID automatically by including the ROW_POSITION column in the schema.
Force-pushed from cf3c508 to 53317f6.
Just rebased on upstream to resolve a logical conflict introduced by 308e717.
Looks like a real test failure:
Co-Authored-By: Xingyuan Lin <xingyuan_lin@apple.com>
Description
Add support for AVRO in Iceberg
Supersedes #4776
Documentation
(x) Sufficient documentation is included in this PR.
Release notes
(x) Release notes entries required with the following suggested text: