Support reading uniontype as struct from Avro/ORC Hive tables by lxynov · Pull Request #3483 · trinodb/trino

lxynov · 2020-04-20T01:22:32Z

Read uniontypes by converting them into structs. Take the type
"uniontype<int, string>" as an example:

It is regarded as "struct<tag int, field0 int, field1 string>".
Data {1: 'hello'}, {0: 312}, {1: 'world'} is read as [1, NULL,
'hello'], [0, 312, NULL], [1, NULL, 'world'].
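
The mapping described above can be sketched in a few lines of plain Java. This is only an illustration, not the code from this PR: the class `UnionAsStructSketch` and the method `toStructRow` are hypothetical names, and the row is modelled as a `List<Object>` instead of Trino's block types.

```java
import java.util.Arrays;
import java.util.List;

final class UnionAsStructSketch {
    /**
     * Expands one union value into [tag, field0, ..., field(memberCount-1)].
     * Only the slot selected by the zero-based tag is populated; all other slots stay null.
     */
    static List<Object> toStructRow(int tag, Object value, int memberCount) {
        if (tag < 0 || tag >= memberCount) {
            throw new IllegalArgumentException("tag out of range: " + tag);
        }
        Object[] row = new Object[memberCount + 1];
        row[0] = tag;          // first struct field is the tag
        row[tag + 1] = value;  // field<tag> holds the value
        return Arrays.asList(row);
    }

    public static void main(String[] args) {
        // The three values from the description, for a union with two member types.
        System.out.println(toStructRow(1, "hello", 2)); // [1, null, hello]
        System.out.println(toStructRow(0, 312, 2));     // [0, 312, null]
        System.out.println(toStructRow(1, "world", 2)); // [1, null, world]
    }
}
```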

Writing into uniontypes remains unsupported.

(Note: support for Parquet is not added because Parquet itself doesn't support union types yet.)

Closes #1751

electrum · 2020-04-20T05:05:15Z

Starting the names at “field1” would be more consistent with SQL one-based numbering.

findepi · 2020-04-20T06:47:55Z

presto-hive/src/main/java/io/prestosql/plugin/hive/HiveType.java

Add a comment explaining why this is storage format dependent.

Thanks, I've added such a comment

lxynov · 2020-04-22T18:59:39Z

Starting the names at “field1” would be more consistent with SQL one-based numbering.

@electrum Thanks for the suggestion, but the tag field starts at zero in Hive. If we start the names at "field1", we'll also need to make tag start at 1 to reduce confusion? But this will be inconsistent with Hive. What do you think?

electrum · 2020-04-29T00:49:49Z

That makes sense. Let’s keep it consistent with Hive.
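
As an illustration of the naming that was agreed on here, the sketch below derives the struct signature from the union's member types, with field names numbered from zero so that `field<i>` lines up with Hive's zero-based tag. The class and method names are hypothetical, and the `tag int` field type is taken from the PR description, not from the actual implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

final class UnionStructNaming {
    /** Builds "struct<tag int, field0 T0, field1 T1, ...>" for the given member types. */
    static String structSignature(List<String> memberTypes) {
        StringJoiner fields = new StringJoiner(", ", "struct<", ">");
        fields.add("tag int");
        for (int i = 0; i < memberTypes.size(); i++) {
            // Zero-based on purpose: a row with tag i has its value in field<i>.
            fields.add("field" + i + " " + memberTypes.get(i));
        }
        return fields.toString();
    }

    public static void main(String[] args) {
        // prints struct<tag int, field0 int, field1 string>
        System.out.println(structSignature(Arrays.asList("int", "string")));
    }
}
```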

dain

Looking good. One comment about using a dictionary block instead of copying the data. Also, some union tests to AbstractTestHiveFileFormats?

dain · 2020-05-05T21:17:20Z

presto-orc/src/main/java/io/prestosql/orc/reader/UnionColumnReader.java

Instead of copying the data, use a dictionary block.

@dain Thanks for the pointer, but I didn't find a good way to handle NULLs. Is there a way to append a NULL to the raw block so that it can serve as a dictionary?

Ah, yes. This is the same problem we had with unnest. In that case, we scanned the block for a null and if present, we used that; otherwise we copied. We can leave this for now.
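
A rough sketch of the scan-for-null idea mentioned above, using plain arrays in place of Trino's block classes. Everything here is hypothetical (`NullableDictionarySketch`, `DictionaryView`, `project`), and it assumes `rawValues` holds the values of one union member in row order, one entry per row carrying that tag. If the raw values already contain a null, that position serves as the dictionary entry for rows belonging to other members; otherwise the values are copied once with a trailing null appended.

```java
import java.util.Arrays;

final class NullableDictionarySketch {
    /** A dictionary plus one id per output row; ids index into the dictionary. */
    static final class DictionaryView {
        final Object[] dictionary;
        final int[] ids;
        final boolean copied; // true when no null was found and a copy was needed

        DictionaryView(Object[] dictionary, int[] ids, boolean copied) {
            this.dictionary = dictionary;
            this.ids = ids;
            this.copied = copied;
        }

        Object get(int row) {
            return dictionary[ids[row]];
        }
    }

    static DictionaryView project(Object[] rawValues, int[] tags, int memberTag) {
        // Look for an existing null that can stand in for "row belongs to another member".
        int nullIndex = -1;
        for (int i = 0; i < rawValues.length; i++) {
            if (rawValues[i] == null) {
                nullIndex = i;
                break;
            }
        }
        Object[] dictionary = rawValues;
        boolean copied = false;
        if (nullIndex < 0) {
            // No null available: fall back to copying, leaving one trailing null slot.
            dictionary = Arrays.copyOf(rawValues, rawValues.length + 1);
            nullIndex = rawValues.length;
            copied = true;
        }
        int[] ids = new int[tags.length];
        int next = 0;
        for (int row = 0; row < tags.length; row++) {
            ids[row] = (tags[row] == memberTag) ? next++ : nullIndex;
        }
        if (next != rawValues.length) {
            throw new IllegalArgumentException("rawValues does not match the tags for member " + memberTag);
        }
        return new DictionaryView(dictionary, ids, copied);
    }
}
```

For the rows `{1: 'hello'}, {0: 312}, {1: 'world'}`, calling `project(new Object[] {"hello", "world"}, new int[] {1, 0, 1}, 1)` takes the copy path, because neither string is null.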

dain

Let me know when it is updated

lxynov · 2020-05-17T20:54:50Z

@dain Thanks for the review! I've rebased it to the latest master.

Also, some union tests to AbstractTestHiveFileFormats?

AbstractTestHiveFileFormats uses serializeObject to build the objects it writes, so union types would end up being written as structs. That is why I didn't add these tests. Only a product test, TestReadUniontype, was added; it uses HQL statements to write union types.

dain · 2020-06-07T18:15:38Z

presto-orc/src/main/java/io/prestosql/orc/reader/UnionColumnReader.java

Ah, yes. This is the same problem we had with unnest. In that case, we scanned the block for a null and if present, we used that; otherwise we copied. We can leave this for now.

Cherry-pick of trinodb/trino#1067, trinodb/trino#2042, trinodb/trino#4055, trinodb/trino#1629, trinodb/trino#3483

Co-authored-by: Parth Brahmbhatt <pbrahmbhatt@netflix.com>
Co-authored-by: David Phillips <david@acz.org>
Co-authored-by: Xingyuan Lin <linxingyuan1102@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>

cla-bot bot added the cla-signed label Apr 20, 2020

lxynov requested review from dain and ebyhr April 20, 2020 01:22

lxynov force-pushed the uniontype branch from f6a64a3 to 9b3ba80 Compare April 20, 2020 03:09

findepi reviewed Apr 20, 2020

lxynov force-pushed the uniontype branch 4 times, most recently from 354e320 to 6d54bec Compare April 22, 2020 17:20

dain reviewed May 5, 2020

dain requested changes May 5, 2020

lxynov force-pushed the uniontype branch from 6d54bec to 2b4f2ad Compare May 17, 2020 20:47

dain self-requested a review May 19, 2020 03:36

dain approved these changes Jun 7, 2020

dain merged commit e3d798d into trinodb:master Jun 7, 2020

dain mentioned this pull request Jun 8, 2020

Release notes for 335 #3886

Closed

9 tasks

lxynov deleted the uniontype branch June 12, 2020 22:25

junyi1313 mentioned this pull request Jul 12, 2021

Add ORC support for iceberg connector prestodb/presto#16391

Merged

autumnust mentioned this pull request Nov 9, 2021

[Coral-Common] Convert Hive uniontype into a struct-RelDataType that conforms Trino' schema linkedin/coral#192

Merged

funcheetah mentioned this pull request Mar 8, 2022

Support non-optional union types for Avro apache/iceberg#4242

Closed

This was referenced Apr 27, 2022

Support non-optional union types for ORC apache/iceberg#4654

Closed

Docs: Union type support spec apache/iceberg#4664

Closed

groupcache4321 mentioned this pull request Nov 15, 2022

Allow Coercion between hive Union and hive struct for Hive ORC table #15017

Merged

groupcache4321 mentioned this pull request Dec 2, 2022

Fix dereference operations for union type in Hive Connector #15278

Merged

homatthew mentioned this pull request Feb 1, 2023

[GOBBLIN-1774] Util for detecting non optional uniontype columns based on Hive Table metadata apache/gobblin#3632

Merged

4 tasks

KevinGe00 mentioned this pull request May 7, 2024

Correctly handle single type uniontypes in Coral linkedin/coral#504

Closed

KevinGe00 mentioned this pull request May 29, 2024

Correctly handle single type uniontypes in Coral linkedin/coral#507

Merged

dain merged 1 commit into trinodb:master from lxynov:uniontype
