Skip to content

Fix partitioning in Iceberg on varbinary column#10214

Merged
findepi merged 1 commit intotrinodb:masterfrom
homar:homar/fix_iceberg_partitioning_on_varbinary_column
Dec 9, 2021
Merged

Fix partitioning in Iceberg on varbinary column#10214
findepi merged 1 commit intotrinodb:masterfrom
homar:homar/fix_iceberg_partitioning_on_varbinary_column

Conversation

@homar
Copy link
Copy Markdown
Member

@homar homar commented Dec 7, 2021

fixes #9755
Econding on the read path was different than encoding on the write path
which resulted in different results. Now it is the same and it compatible
with spark iceberg

@cla-bot cla-bot bot added the cla-signed label Dec 7, 2021
if (type.typeId() == FIXED || type.typeId() == BINARY) {
// this is safe because Iceberg PartitionData directly wraps the byte array
partitionValue = new String(((ByteBuffer) value).array(), UTF_8);
partitionValue = StandardCharsets.ISO_8859_1.decode(Base64.getEncoder().encode((ByteBuffer) value)).toString();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. use Base64.getEncoder().encodeToString instead of StandardCharsets.ISO_8859_1.decode
  2. does the comment in the preceding line require an update?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. sure
  2. The comment seems to be valid.

public void testSparkReadsTrinoPartitionedTable(StorageFormat storageFormat)
{
String baseTableName = "test_spark_reads_trino_partitioned_table_" + storageFormat;
String baseTableName = "test_spark_reads_trino_partitioned_table_5" + storageFormat;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert

.containsOnly(row1);

Row row2 = row("a", new byte[]{15, -15, 2, -16, -2, -1}, 1001);
String selectByVarbinaryTrino = "SELECT * FROM %s WHERE _varbinary = X'0ff102f0feff'"; //for now this fails on spark see https://githubmemory.com/repo/apache/iceberg/issues/2934
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link to apache/iceberg#2934 instead

selectByVarbinaryTrino -> selectByVarbinary
(otherwise commenting that "trino select doesn't work in spark" is weird)

also, add assertQueryFailure

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, add assertQueryFailure

... with onSpark()

bump

.containsOnly(row1);

Row row2 = row("c", new byte[]{15, -15, 2, -3, -2, -1}, 1003);
String selectByVarbinaryTrino = "SELECT * FROM %s WHERE _varbinary = X'0ff102fdfeff'"; //for now this fails on spark see https://githubmemory.com/repo/apache/iceberg/issues/2934
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above

@homar homar force-pushed the homar/fix_iceberg_partitioning_on_varbinary_column branch from 2ef9013 to fdeb485 Compare December 8, 2021 11:50
@homar
Copy link
Copy Markdown
Member Author

homar commented Dec 8, 2021

@findepi comments addressed

@homar homar force-pushed the homar/fix_iceberg_partitioning_on_varbinary_column branch from fdeb485 to 289ccea Compare December 8, 2021 12:26
fixes trinodb#9755
Econding on the read path was different than encoding on the write path
which resulted in different results. Now it is the same and it compatible
with spark iceberg
@homar homar force-pushed the homar/fix_iceberg_partitioning_on_varbinary_column branch from 289ccea to bbbc14d Compare December 8, 2021 16:07
@findepi findepi merged commit ff63ca8 into trinodb:master Dec 9, 2021
@findepi findepi mentioned this pull request Dec 9, 2021
10 tasks
@github-actions github-actions bot added this to the 366 milestone Dec 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Incorrect query results for Iceberg table partitioned on varbinary / binary

2 participants