Fix partitioning in Iceberg on varbinary column by homar · Pull Request #10214 · trinodb/trino

homar · 2021-12-07T13:32:13Z

fixes #9755
Encoding on the read path was different from encoding on the write path,
which produced different results. Now both paths use the same encoding, and
it is compatible with Spark Iceberg.
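
A minimal sketch of why the old read path could not agree with the write path, assuming the partition value is an arbitrary byte array such as the one used in the tests of this PR. The class and method names are hypothetical and are not part of the Trino code base. Decoding raw bytes as UTF-8 replaces invalid sequences with U+FFFD, so the original bytes cannot be recovered, while Base64 round-trips any input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical illustration class, not part of Trino
public class VarbinaryPartitionEncoding
{
    // Old read path: treat the raw bytes as UTF-8 text.
    // Malformed sequences become U+FFFD, so information is lost.
    static byte[] roundTripUtf8(byte[] value)
    {
        String text = new String(value, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // New read path: Base64 maps every byte sequence to ASCII text and back.
    static byte[] roundTripBase64(byte[] value)
    {
        String text = Base64.getEncoder().encodeToString(value);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args)
    {
        byte[] value = {15, -15, 2, -16, -2, -1};
        System.out.println(Arrays.equals(value, roundTripUtf8(value)));   // prints "false"
        System.out.println(Arrays.equals(value, roundTripBase64(value))); // prints "true"
    }
}
```

The bytes 0xfe and 0xff never occur in valid UTF-8, which is why this particular value is guaranteed to be damaged by the UTF-8 path.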

findepi · 2021-12-08T09:43:58Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java

                if (type.typeId() == FIXED || type.typeId() == BINARY) {
                    // this is safe because Iceberg PartitionData directly wraps the byte array
-                    partitionValue = new String(((ByteBuffer) value).array(), UTF_8);
+                    partitionValue = StandardCharsets.ISO_8859_1.decode(Base64.getEncoder().encode((ByteBuffer) value)).toString();


use Base64.getEncoder().encodeToString instead of StandardCharsets.ISO_8859_1.decode

does the comment in the preceding line require an update?

sure

The comment seems to be valid.
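
The suggestion above can be sketched as follows, assuming the `ByteBuffer` wraps its whole backing array (which is what the code comment about Iceberg `PartitionData` states). The class and method names are hypothetical. Note that `Base64.Encoder.encodeToString` accepts a `byte[]`, so the simpler form relies on `array()`, and that `Base64.Encoder.encode(ByteBuffer)` advances the position of the source buffer, so the sketch encodes a duplicate.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration class, not part of Trino
public class Base64PartitionValue
{
    // Form used in the diff under review
    static String viaCharsetDecode(ByteBuffer value)
    {
        // duplicate() keeps the caller's buffer position untouched
        ByteBuffer encoded = Base64.getEncoder().encode(value.duplicate());
        return StandardCharsets.ISO_8859_1.decode(encoded).toString();
    }

    // Form suggested in the review
    static String viaEncodeToString(ByteBuffer value)
    {
        // Assumption: the buffer wraps the entire backing array (offset 0, full length)
        return Base64.getEncoder().encodeToString(value.array());
    }

    public static void main(String[] args)
    {
        ByteBuffer value = ByteBuffer.wrap(new byte[] {15, -15, 2, -16, -2, -1});
        System.out.println(viaCharsetDecode(value).equals(viaEncodeToString(value))); // prints "true"
    }
}
```

If the buffer were a slice of a larger array, `array()` would return more bytes than the buffer exposes, which is why the validity of the comment matters for the shorter form.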

findepi · 2021-12-08T09:44:39Z

...roduct-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java

    public void testSparkReadsTrinoPartitionedTable(StorageFormat storageFormat)
    {
-        String baseTableName = "test_spark_reads_trino_partitioned_table_" + storageFormat;
+        String baseTableName = "test_spark_reads_trino_partitioned_table_5" + storageFormat;


findepi · 2021-12-08T09:47:05Z

...roduct-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java

+                .containsOnly(row1);
+
+        Row row2 = row("a", new byte[]{15, -15, 2, -16, -2, -1}, 1001);
+        String selectByVarbinaryTrino = "SELECT * FROM %s WHERE _varbinary = X'0ff102f0feff'"; //for now this fails on spark see https://githubmemory.com/repo/apache/iceberg/issues/2934


link to apache/iceberg#2934 instead

selectByVarbinaryTrino -> selectByVarbinary
(otherwise commenting that "trino select doesn't work in spark" is weird)

also, add assertQueryFailure

also, add assertQueryFailure

... with onSpark()

bump
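
For readers checking the test data in this thread, here is a small sketch of how the signed Java bytes in the `row(...)` values correspond to the `X'...'` literal in the query. The helper name is hypothetical and is not part of the test class.

```java
// Hypothetical illustration class, not part of the Trino product tests
public class VarbinaryLiteral
{
    // Renders a byte array as a SQL varbinary literal, e.g. X'0ff1'
    static String toSqlLiteral(byte[] value)
    {
        StringBuilder literal = new StringBuilder("X'");
        for (byte b : value) {
            // Mask with 0xff so negative bytes print as two hex digits
            literal.append(String.format("%02x", b & 0xff));
        }
        return literal.append("'").toString();
    }

    public static void main(String[] args)
    {
        System.out.println(toSqlLiteral(new byte[] {15, -15, 2, -16, -2, -1})); // prints "X'0ff102f0feff'"
    }
}
```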

findepi · 2021-12-08T09:47:46Z

...roduct-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java

+                .containsOnly(row1);
+
+        Row row2 = row("c", new byte[]{15, -15, 2, -3, -2, -1}, 1003);
+        String selectByVarbinaryTrino = "SELECT * FROM %s WHERE _varbinary = X'0ff102fdfeff'"; //for now this fails on spark see https://githubmemory.com/repo/apache/iceberg/issues/2934


homar · 2021-12-08T11:50:55Z

@findepi comments addressed


cla-bot bot added the cla-signed label Dec 7, 2021

findepi reviewed Dec 8, 2021

homar force-pushed the homar/fix_iceberg_partitioning_on_varbinary_column branch from 2ef9013 to fdeb485 · December 8, 2021 11:50

findepi reviewed Dec 8, 2021

homar force-pushed the homar/fix_iceberg_partitioning_on_varbinary_column branch from fdeb485 to 289ccea · December 8, 2021 12:26

findepi approved these changes Dec 8, 2021

Fix partitioning in Iceberg on varbinary column

bbbc14d

fixes trinodb#9755 Encoding on the read path was different from encoding on the write path, which produced different results. Now both paths use the same encoding, and it is compatible with Spark Iceberg.

homar force-pushed the homar/fix_iceberg_partitioning_on_varbinary_column branch from 289ccea to bbbc14d · December 8, 2021 16:07

findepi approved these changes Dec 8, 2021

findepi merged commit ff63ca8 into trinodb:master Dec 9, 2021

findepi mentioned this pull request Dec 9, 2021

Release notes for 366 #10181

Closed

10 tasks

github-actions bot added this to the 366 milestone Dec 9, 2021

izchen mentioned this pull request Jan 6, 2022

Spark: Fix IllegarlArgumentException when filtering on BinaryType column apache/iceberg#3460

Closed

findepi merged 1 commit into trinodb:master from homar:homar/fix_iceberg_partitioning_on_varbinary_column
