
Add Avro support to Iceberg Connector #4776

Closed
lxynov wants to merge 4 commits into trinodb:master from lxynov:iceberg-avro

Conversation


@lxynov lxynov commented Aug 11, 2020

Spec

https://iceberg.apache.org/spec/#avro

Values should be stored in Avro using the Avro types and logical type annotations in the table.

Iceberg struct, list, and map types identify nested types by ID. When writing data to Avro files, these IDs must be stored in the Avro schema to support ID-based column pruning.

Implementation

This PR uses the Iceberg Avro reader/writer to read and write Iceberg Avro files. The Iceberg classes involved include DataReader, DataWriter, and Avro. A new class, IcebergAvroDataConversion, converts between the Presto data representation and the Iceberg Avro data representation. The Presto representation refers to how data is laid out in Blocks; the Iceberg Avro representation refers to the objects produced and consumed by DataReader/DataWriter. The table below illustrates the conversion mapping.

| Presto Type | Iceberg Type | Presto Representation (in Blocks) | Iceberg Avro Representation (via DataReader/DataWriter) |
|---|---|---|---|
| BooleanType.BOOLEAN | BOOLEAN | boolean | boolean |
| IntegerType.INTEGER | INTEGER | long | int |
| BigintType.BIGINT | LONG | long | long |
| RealType.REAL | FLOAT | bit representation in int | float |
| DoubleType.DOUBLE | DOUBLE | bit representation in long | double |
| ShortDecimalType / LongDecimalType | DECIMAL | ShortDecimalType: unscaled value is stored in long; LongDecimalType: unscaled value is encoded and stored in Slice | BigDecimal |
| VarcharType.VARCHAR | STRING | Slice | String |
| VarbinaryType | BINARY, FIXED(L) | Slice | BINARY: ByteBuffer; FIXED(L): byte[] |
| DateType.DATE | DATE | days since epoch in long | LocalDate |
| TIME_MICROS | TIME | picos of day in long | LocalTime |
| TIMESTAMP_MICROS | TimestampType.withoutZone() | microseconds since epoch in long | LocalDateTime |
| TIMESTAMP_TZ_MICROS | TimestampType.withZone() | LongTimestampWithTimeZone | OffsetDateTime |
| ArrayType | LIST | Block | Collection |
| MapType | MAP | Block | Map |
| RowType | STRUCT | Block | Record |
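A few rows of this mapping can be sketched with JDK types alone. This is illustrative only: the real conversion in IcebergAvroDataConversion reads these values out of Presto Blocks, while here plain ints/longs stand in for the Block contents.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Self-contained sketch of three rows of the conversion table above.
public final class ConversionSketch
{
    // REAL: Presto stores the float's bit pattern in an int
    public static float realToFloat(int bits)
    {
        return Float.intBitsToFloat(bits);
    }

    // DATE: Presto stores days since the epoch in a long
    public static LocalDate dateToLocalDate(long epochDays)
    {
        return LocalDate.ofEpochDay(epochDays);
    }

    // TIMESTAMP_MICROS: Presto stores microseconds since the epoch in a long
    public static LocalDateTime timestampToLocalDateTime(long epochMicros)
    {
        long epochSeconds = Math.floorDiv(epochMicros, 1_000_000L);
        int nanos = (int) Math.floorMod(epochMicros, 1_000_000L) * 1000;
        return LocalDateTime.ofEpochSecond(epochSeconds, nanos, ZoneOffset.UTC);
    }
}
```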

Tests and TODOs

This PR doesn't add new tests, but it applies AbstractTestIcebergSmoke and TestSparkCompatibility to the Avro format.
There are two test failures.

  1. AbstractTestIcebergSmoke.testCreateNestedPartitionedTable. This test depends on apache/iceberg#1321 (Avro: Fix pruning columns when a logical-map array's value type is nested), which is not in apache-iceberg-0.9.1 but is in apache-iceberg-0.10.0-rc0.
  2. TestSparkCompatibility.testPrestoReadingSparkData. This test fails because of how TIMESTAMP objects are handled. I wasn't able to get it to pass, and I feel Presto's spec for TIMESTAMP is not really identical to Iceberg's.
    Presto's spec on TIMESTAMP: Instant in time that includes the date and time of day without a time zone with P digits of precision for the fraction of seconds. A precision of up to 12 (picoseconds) is supported. Values of this type are parsed and rendered in the session time zone.
    Iceberg's spec on TIMESTAMP: All time and timestamp values are stored with microsecond precision. Timestamps with time zone represent a point in time: values are stored as UTC and do not retain a source time zone (2017-11-16 17:10:34 PST is stored/retrieved as 2017-11-17 01:10:34 UTC and these values are considered identical). Timestamps without time zone represent a date and time of day regardless of zone: the time value is independent of zone adjustments (2017-11-16 17:10:34 is always retrieved as 2017-11-16 17:10:34). Timestamp values are stored as a long that encodes microseconds from the unix epoch.

    I feel this PR's implementation is correct and perhaps there's something wrong on the Spark side.
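The Iceberg timestamptz behavior quoted above (values are normalized to UTC and the source zone is dropped) can be reproduced with plain java.time types. This sketch is JDK-only and independent of the PR's code:

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Illustration of the Iceberg timestamptz semantics quoted above:
// 2017-11-16 17:10:34 -08:00 and 2017-11-17 01:10:34 UTC are the same instant.
public final class TimestamptzSketch
{
    public static OffsetDateTime normalizeToUtc(OffsetDateTime value)
    {
        // Iceberg stores the instant in UTC and does not retain the source zone
        return value.withOffsetSameInstant(ZoneOffset.UTC);
    }
}
```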

Closes #2298
Part of #1324


@phd3 phd3 left a comment


Added some comments (mostly minor), still reviewing.


OrcFileWriter --> IcebergAvroPageSource

}

private IcebergFileWriter createAvroWriter(
String schemaName,

The requirement of providing a "tableName" in the Avro.WriteBuilder#named() API feels a bit strange to me. However, I was also wondering if we could just pass hdfsContext as an argument here and use the table name from there.


Agreed. We already pass HdfsContext to createParquetWriter()


@Test
public void testHourTransform()
{

are changes in this file from a different commit?

import static io.prestosql.plugin.iceberg.util.IcebergAvroDataConversion.serializeToPrestoObject;
import static java.util.Objects.requireNonNull;

public class IcebergAvroPageSource

Is there a reason to not use RecordPageSource with a cursor implementation? The page building mechanism there is pretty similar.

IIRC using RecordPageSource with an extended record cursor had some performance advantage over using a ConnectorPageSource that is internally row-oriented, but don't remember the details. @dain is that still the case? If so, is the performance difference considerable?


That's correct, RecordPageSource has special handling in the engine. It has the advantage of not materializing entire column pages if there is filtering. Consider this:

SELECT x WHERE y > 5

With ConnectorPageSource, we materialize entire pages of x even for rows where the y predicate is false. Since we're not using lazy pages, we actually materialize x even if the predicate is false for the entire page (and thus we don't need x at all).

Note that I'm not saying we need to do it this way -- just something to consider.

.build();
}
catch (IOException e) {
throw new PrestoException(ICEBERG_WRITER_OPEN_ERROR, "Error creating Avro file", e);

nit: add file path in the error message for ease of debugging?

else {
unscaledValue = Decimals.decodeUnscaledValue(decimalType.getSlice(block, position));
}
return new BigDecimal(unscaledValue, decimalType.getScale());

Should we use new BigDecimal(unscaledValue, decimalType.getScale(), type.getPrecision())? Or maybe just Decimals#readBigDecimal?


Decimals.readBigDecimal() is the best way
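Decimals.readBigDecimal() is Trino's helper for this; a JDK-only sketch of what the decimal conversion amounts to (in the real code the unscaled value comes out of a Block: a long for short decimals, an encoded Slice for long decimals):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// JDK-only sketch of the short/long decimal conversion discussed above.
public final class DecimalSketch
{
    // Short decimal: unscaled value fits in a long
    public static BigDecimal fromShortDecimal(long unscaledValue, int scale)
    {
        return BigDecimal.valueOf(unscaledValue, scale);
    }

    // Long decimal: unscaled value as a BigInteger (decoded from a Slice in the real code)
    public static BigDecimal fromLongDecimal(BigInteger unscaledValue, int scale)
    {
        return new BigDecimal(unscaledValue, scale);
    }
}
```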

return type.getSlice(block, position).toStringUtf8();
}
if (type.equals(VARBINARY)) {
if (icebergType.typeId().equals(FIXED)) {

Could you elaborate on the reason for this special case?

if (type instanceof MapType) {
Type keyType = type.getTypeParameters().get(0);
Type valueType = type.getTypeParameters().get(1);
org.apache.iceberg.types.Type keyIcebergtype = icebergType.asMapType().keyType();

nit: camel case in keyIcebergType and valueIcebergType

List<Types.NestedField> icebergFields = icebergType.asStructType().fields();
BlockBuilder currentBuilder = builder.beginBlockEntry();
for (int i = 0; i < typeParameters.size(); i++) {
serializeToPrestoObject(typeParameters.get(i), icebergFields.get(1).type(), currentBuilder, record.get(i), timeZoneKey);

typo? icebergFields.get(i).type()

Map<?, ?> map = (Map<?, ?>) object;
Type keyType = ((MapType) type).getKeyType();
Type valueType = ((MapType) type).getValueType();
org.apache.iceberg.types.Type keyIcebergtype = icebergType.asMapType().keyType();

ditto: camelcase in the name


@phd3 phd3 left a comment


finished reviewing, it looks great, only a couple more comments.

Block rowBlock = block.getObject(position, Block.class);

List<Type> fieldTypes = type.getTypeParameters();
checkCondition(fieldTypes.size() == rowBlock.getPositionCount(), GENERIC_INTERNAL_ERROR, "Expected row value field count does not match type field count");

nit: instead of relying on a Hive module class, maybe throw PrestoException here directly?

}
return new BigDecimal(unscaledValue, decimalType.getScale());
}
if (type.equals(VARCHAR)) {

Does this cause bounded varchar types to throw an exception? We might want to use instanceof VarcharType, right?


Yep, need to use instanceof VarcharType here

}
return;
}
if (type.equals(VARCHAR)) {

same comment about supporting bounded varchar types

@lxynov lxynov mentioned this pull request Jan 9, 2021

rdsr commented Jan 20, 2021

@lxynov what's pending? Is there something we can help with?


lxynov commented Jan 20, 2021

@lxynov what's pending? Is there something we can help with?

@rdsr Let me rebase it to Trino master and also address @phd3 's comments so that you can help review


rdsr commented Jan 21, 2021

@lxynov what's pending? Is there something we can help with?

@rdsr Let me rebase it to Trino master and also address @phd3 's comments so that you can help review

Thanks @lxynov !

"(DATE '2015-05-15', 2, NULL, NULL, 4, 5), " +
"(DATE '2020-02-21', 2, NULL, NULL, 6, 7)";
}
if (!columnStatisticsCollected) {

I think these tests will become simpler if we make them similar to how we handle ORC. For example, instead of defining columnStatisticsCollected, adding new methods, and subclassing, could we not just do if (format == AVRO) and test appropriately?


lxynov commented Jan 26, 2021

@electrum @rdsr @phd3 Hey I'm thinking of dividing this PR into 3 parts:

  1. Test all file formats in TestSparkCompatibility #6699: clean up TestSparkCompatibility and test both ORC and Parquet in it.
  2. An independent PR that upgrades Iceberg dependency to 0.10.0. io.trino.plugin.iceberg.HiveTableOperations needs to be updated in that PR. It references org.apache.iceberg.hive.HiveTypeConverter which no longer exists in Iceberg 0.10.0. Furthermore, we need to figure out if more updates are needed.
  3. The rest of this PR that adds Avro integration.

Please LMK if you have comments.


rdsr commented Jan 26, 2021

@lxynov sounds good to me!

if (closed) {
return;
}
closed = true;

Should this line be moved after recordIterator.close()?
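The question above is about the usual idempotent-close pattern: setting the flag before closing the delegate guarantees the underlying close() is attempted at most once, even if it throws. A generic JDK-only sketch (the class and names are illustrative, not the PR's code):

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Generic sketch of the idempotent close pattern discussed above.
public final class GuardedPageSource
        implements Closeable
{
    private final Closeable delegate;
    private boolean closed;

    public GuardedPageSource(Closeable delegate)
    {
        this.delegate = delegate;
    }

    public boolean isClosed()
    {
        return closed;
    }

    @Override
    public void close()
    {
        if (closed) {
            return;
        }
        // Flag first: a second call returns immediately even if delegate.close() throws
        closed = true;
        try {
            delegate.close();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```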

import static org.apache.iceberg.util.DateTimeUtil.timestampFromMicros;
import static org.apache.iceberg.util.DateTimeUtil.timestamptzFromMicros;

public final class IcebergAvroDataConversion

Are there tests that cover this class?


@electrum electrum left a comment


I started reviewing this a while back and had a bunch of pending comments. I'll submit them now -- not sure if they are still relevant after the more recent changes.

ICEBERG_MISSING_DATA(5, EXTERNAL),
ICEBERG_CANNOT_OPEN_SPLIT(6, EXTERNAL),
ICEBERG_WRITER_OPEN_ERROR(7, EXTERNAL),
ICEBERG_FILESYSTEM_ERROR(8, EXTERNAL),

Did you mean to change the existing error codes?

{
ORC,
PARQUET,
AVRO

Nit: add trailing comma


@Test(groups = {ICEBERG, PROFILE_SPECIFIC_TESTS})
public void testPrestoReadingSparkData()
@DataProvider(name = "storage_formats")

You can leave off name and have it default to the method name

String baseTableName = "test_spark_reads_presto_partitioned_table";
String prestoTableName = prestoTableName(baseTableName);
onPresto().executeQuery(format("CREATE TABLE %s (_string VARCHAR, _bigint BIGINT) WITH (partitioning = ARRAY['_string'])", prestoTableName));
onPresto().executeQuery(format("CREATE TABLE %s (_string VARCHAR, _bigint BIGINT) WITH (partitioning = ARRAY['_string'], format = '" + storageFormat + "')", prestoTableName));

Use the existing string formatting instead of concatenation

String baseTableName = "test_spark_reads_presto_partitioned_table";
String sparkTableName = sparkTableName(baseTableName);
onSpark().executeQuery(format("CREATE TABLE %s (_string STRING, _bigint BIGINT) USING ICEBERG PARTITIONED BY (_string)", sparkTableName));
onSpark().executeQuery(format("CREATE TABLE %s (_string STRING, _bigint BIGINT) USING ICEBERG PARTITIONED BY (_string)" +

Nit: missing space after last )

private List<Block> columnBlocks;
private List<Type> types;
private List<org.apache.iceberg.types.Type> icebergTypes;
private Schema icebergSchema;

final

public long getSystemMemoryUsage()
{
//TODO: try to add memory used by recordIterator
return INSTANCE_SIZE + pageBuilder.getRetainedSizeInBytes();

We could reset the PageBuilder at the end, then we don't need to calculate retained size for it.

@Override
public Page getNextPage()
{
if (closed) {

We could simplify this by removing the closed flag

if (!recordIterator.hasNext()) {
    return null;
}

The engine won't call getNextPage() after closing the page source.

This also allows removing the explicit close() below.
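The simplification suggested above boils down to driving the page source from the iterator alone. A JDK-only sketch of that loop shape (hypothetical names, plain Lists standing in for Trino Pages):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of the iterator-driven getNextPage() shape suggested above:
// return null when the iterator is exhausted; no separate `closed` flag,
// because the engine stops calling getNextPage() after closing the source.
public final class IteratorPageSource<T>
{
    private static final int MAX_PAGE_SIZE = 2;

    private final Iterator<T> recordIterator;

    public IteratorPageSource(Iterator<T> recordIterator)
    {
        this.recordIterator = recordIterator;
    }

    public List<T> getNextPage()
    {
        if (!recordIterator.hasNext()) {
            return null;
        }
        List<T> page = new ArrayList<>();
        while (recordIterator.hasNext() && page.size() < MAX_PAGE_SIZE) {
            page.add(recordIterator.next());
        }
        return page;
    }
}
```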

else {
unscaledValue = Decimals.decodeUnscaledValue(decimalType.getSlice(block, position));
}
return new BigDecimal(unscaledValue, decimalType.getScale());

Decimals.readBigDecimal() is the best way

}
return new BigDecimal(unscaledValue, decimalType.getScale());
}
if (type.equals(VARCHAR)) {

Yep, need to use instanceof VarcharType here

@roman-ambinder roman-ambinder mentioned this pull request Jul 7, 2021
11 tasks

caneGuy commented Jul 19, 2021

Any progress on this? @lxynov Thanks!


rdsr commented Jul 22, 2021

@lxynov is this patch now split into 3 parts? Or is it safe to use this existing patch? I wanted to backport this to our internal Trino repo


phd3 commented Jul 22, 2021

@caneGuy I don't think @lxynov is continuing work on this anymore. Feel free to pick it up if you'd like to.

@rdsr FWIW, w.r.t. part-2 in #4776 (comment), we've upgraded to 0.11.0.


findepi commented Aug 24, 2021

@jackye1995 is it correct you have picked this up?


findepi commented Apr 25, 2022

Superseded by @ebyhr in #12125

@findepi findepi closed this Apr 25, 2022

Development

Successfully merging this pull request may close these issues.

Allow querying Iceberg table by its location, without registering it in metastore

7 participants