[WIP] Add ORC support for the Iceberg Connector #1290

Closed
lxynov wants to merge 4 commits into trinodb:master from lxynov:iceberg-orc
Conversation

@lxynov
Member

@lxynov lxynov commented Aug 14, 2019

Although the spec for ORC with Iceberg has not been finalized (https://iceberg.apache.org/spec/#orc is expected to change), the comments under apache/iceberg#227 show the direction. The problem to solve is that Iceberg tracks columns by ID while ORC tracks them by name, so ORC files cannot directly support Iceberg's semantics. According to the discussion in the Iceberg community, the current idea is to add user-defined type annotations to ORC (apache/orc#410) and use them to store Iceberg column IDs.

This WIP PR prototypes this idea in Presto.
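
To illustrate the idea (this is not the code in this PR), here is a minimal Python sketch of resolving columns by ID instead of by name. It assumes each ORC column carries a string attribute holding its Iceberg column ID; the attribute key `iceberg.id`, the `OrcColumn` class, and both helper functions are hypothetical names chosen for this example, since the spec is not final.

```python
from dataclasses import dataclass, field

# Assumed attribute key; the real key depends on the final Iceberg/ORC spec.
ICEBERG_ID_KEY = "iceberg.id"


@dataclass
class OrcColumn:
    """Hypothetical stand-in for an ORC column type with user-defined attributes."""
    name: str
    orc_type: str
    attributes: dict = field(default_factory=dict)


def index_by_iceberg_id(file_columns):
    """Map Iceberg column ID -> position of the column in the ORC file."""
    by_id = {}
    for position, column in enumerate(file_columns):
        raw_id = column.attributes.get(ICEBERG_ID_KEY)
        if raw_id is not None:
            by_id[int(raw_id)] = position
    return by_id


def resolve(table_schema, file_columns):
    """For each (column_id, name) in the table schema, return the file
    position of the matching column, or None if the file does not have it."""
    by_id = index_by_iceberg_id(file_columns)
    return [by_id.get(column_id) for column_id, _name in table_schema]


# A file written when the table was (order_id, price, date).
file_columns = [
    OrcColumn("order_id", "int", {ICEBERG_ID_KEY: "1"}),
    OrcColumn("price", "double", {ICEBERG_ID_KEY: "2"}),
    OrcColumn("date", "string", {ICEBERG_ID_KEY: "3"}),
]

# The column with ID 2 was later renamed; the ID still finds the old data.
renamed_schema = [(1, "order_id"), (2, "cost"), (3, "date")]
positions = resolve(renamed_schema, file_columns)
```

Because the lookup never consults `name`, a rename in the table schema does not affect which file column is read, and a column ID that is absent from the file resolves to `None`.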

Tests done:

  1. TestIcebergSmoke passes with the ORC format.
  2. End-to-end test on schema evolution.
CREATE TABLE iceberg.u_xinlin.exp (order_id int, price double, date varchar) WITH (format = 'orc', partitioning = array['date']);
INSERT INTO iceberg.u_xinlin.exp VALUES (0, 1.0, '2019-08-13');
SELECT * FROM iceberg.u_xinlin.exp;

It shows

 order_id | price |    date    
----------+-------+------------
        0 |   1.0 | 2019-08-13 
(1 row)
ALTER TABLE iceberg.u_xinlin.exp DROP COLUMN price;
ALTER TABLE iceberg.u_xinlin.exp ADD COLUMN price double;
SELECT * FROM iceberg.u_xinlin.exp;

It shows

 order_id |    date    | price 
----------+------------+-------
        0 | 2019-08-13 |  NULL 

This is consistent with Iceberg's schema evolution semantics.
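
The behavior above can be reproduced with a small in-memory model, shown here as a hedged Python sketch rather than anything from the connector. It assumes only that column IDs are assigned once and never reused, so a dropped and re-added column gets a new ID; the `Table` class and its methods are hypothetical.

```python
class Table:
    """Toy table that stores rows keyed by column ID, not by column name."""

    def __init__(self, column_names):
        self.last_id = 0
        self.schema = []  # list of (column_id, name), in display order
        self.rows = []    # each row: {column_id: value}
        for name in column_names:
            self.add_column(name)

    def add_column(self, name):
        # IDs only ever increase, so a re-added column never reuses an old ID.
        self.last_id += 1
        self.schema.append((self.last_id, name))

    def drop_column(self, name):
        # Only the schema changes; stored data is left untouched.
        self.schema = [(cid, n) for cid, n in self.schema if n != name]

    def insert(self, *values):
        self.rows.append({cid: v for (cid, _), v in zip(self.schema, values)})

    def select_all(self):
        # Columns whose ID is missing from a stored row read as None (NULL).
        return [tuple(row.get(cid) for cid, _ in self.schema) for row in self.rows]


table = Table(["order_id", "price", "date"])
table.insert(0, 1.0, "2019-08-13")
before = table.select_all()

table.drop_column("price")
table.add_column("price")
after = table.select_all()
```

Here the re-added `price` has ID 4, while the stored row only holds a value under ID 2, so the old value is not resurrected and the new column reads as NULL.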

cc: @wagnermarkd @electrum

@ebyhr
Member

ebyhr commented Nov 24, 2019

@lxynov I see that you have sent a new PR to support ORC on Iceberg. Can we close this PR?

@lxynov
Member Author

lxynov commented Nov 25, 2019

@ebyhr Yes

@lxynov lxynov closed this Nov 25, 2019
@lxynov lxynov deleted the iceberg-orc branch June 6, 2020 19:48

3 participants