Support reading uniontype as struct from Avro/ORC Hive tables #3483
dain merged 1 commit into trinodb:master
Starting the names at “field1” would be more consistent with SQL one-based numbering.
Add a comment explaining why this is storage format dependent.
Thanks, I've added such a comment.
@electrum Thanks for the suggestion, but the |
That makes sense. Let’s keep it consistent with Hive.
Instead of copying the data, use a dictionary block.
@dain Thanks for the pointer, but I didn't find a good way to handle NULLs. Is there a way to append a NULL to the raw block so that it can serve as a dictionary?
Ah, yes. This is the same problem we had with unnest. In that case, we scanned the block for a null and if present, we used that; otherwise we copied. We can leave this for now.
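For readers following the thread, the approach described in this reply can be sketched roughly as below. This is only an illustration under stated assumptions, not the Trino implementation: `NullAwareDictionarySketch`, `build`, and the convention that a negative position means "emit NULL" are all hypothetical, and plain arrays stand in for blocks. The fallback here copies the raw values into a new array with one extra null slot; the real code may copy differently.

```java
import java.util.Arrays;

// Hypothetical sketch: plain arrays stand in for Trino blocks.
final class NullAwareDictionarySketch {
    final Object[] dictionary; // values the ids point into
    final int[] ids;           // one dictionary index per output position

    private NullAwareDictionarySketch(Object[] dictionary, int[] ids) {
        this.dictionary = dictionary;
        this.ids = ids;
    }

    // positions[i] is an index into raw, or negative (assumed convention) for a NULL output.
    static NullAwareDictionarySketch build(Object[] raw, int[] positions) {
        // Scan the raw values for an existing null that can represent NULL outputs.
        int nullId = -1;
        for (int i = 0; i < raw.length; i++) {
            if (raw[i] == null) {
                nullId = i;
                break;
            }
        }

        Object[] dictionary = raw; // reuse the raw values when a null is present
        if (nullId < 0) {
            // No null available: fall back to a copy with one trailing null slot.
            dictionary = Arrays.copyOf(raw, raw.length + 1);
            nullId = raw.length;
        }

        int[] ids = new int[positions.length];
        for (int i = 0; i < positions.length; i++) {
            ids[i] = positions[i] < 0 ? nullId : positions[i];
        }
        return new NullAwareDictionarySketch(dictionary, ids);
    }
}
```

When the raw values already contain a null, `dictionary` is the same array as `raw` and nothing is copied; otherwise the copy path is taken.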
Reading uniontypes by converting them into structs. Take type
"uniontype<int, string>" as an example:
1. It will be regarded as "struct<tag int, field0 int, field1 string>"
2. Data {1: 'hello'}, {0: 312}, {1: 'world'} will be read as [1, NULL,
'hello'], [0, 312, NULL], [1, NULL, 'world']
Writing into uniontypes remains unsupported.
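
The mapping in the commit message can be illustrated with a small standalone sketch. It is not the code from this PR: `UnionAsStructSketch` and `toStructRow` are hypothetical names, and an `Object[]` stands in for a struct row. It only shows the layout rule: slot 0 holds the tag, slot `tag + 1` holds the value, and every other field is NULL.

```java
// Hypothetical sketch: an Object[] stands in for the struct row.
final class UnionAsStructSketch {
    private UnionAsStructSketch() {}

    /**
     * Builds the row [tag, field0, ..., field(memberCount - 1)] for one union value.
     * Only the field selected by the tag is populated; the others stay null.
     */
    static Object[] toStructRow(int tag, Object value, int memberCount) {
        if (tag < 0 || tag >= memberCount) {
            throw new IllegalArgumentException("tag out of range: " + tag);
        }
        Object[] row = new Object[memberCount + 1]; // slots default to null
        row[0] = tag;
        row[tag + 1] = value;
        return row;
    }
}
```

With two member types, `toStructRow(1, "hello", 2)` yields `[1, null, "hello"]` and `toStructRow(0, 312, 2)` yields `[0, 312, null]`, matching the rows listed above.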
@dain Thanks for the review! I've rebased it to the latest master.
Cherry-pick of trinodb/trino#1067, trinodb/trino#2042, trinodb/trino#4055, trinodb/trino#1629, trinodb/trino#3483 Co-authored-by: Parth Brahmbhatt <pbrahmbhatt@netflix.com> Co-authored-by: David Phillips <david@acz.org> Co-authored-by: Xingyuan Lin <linxingyuan1102@gmail.com> Co-authored-by: Dain Sundstrom <dain@iq80.com>
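Reading uniontypes by converting them into structs. Take type
"uniontype<int, string>" as an example:
1. It will be regarded as "struct<tag int, field0 int, field1 string>"
2. Data {1: 'hello'}, {0: 312}, {1: 'world'} will be read as [1, NULL,
'hello'], [0, 312, NULL], [1, NULL, 'world']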
Writing into uniontypes remains unsupported.
(Note: support for Parquet is not added because Parquet itself doesn't support union types yet.)
Closes #1751