Iceberg Connector ORC support #2042

Merged
electrum merged 7 commits into trinodb:master from lxynov:icebergOrc
Jan 19, 2020

@lxynov lxynov commented Nov 18, 2019

This is ready for review.

Some notes:

  1. The implementation conforms to this spec.
  2. TestIcebergSmoke passes with the ORC format.
  3. TIME and TIME WITH TIME ZONE are not supported. Currently, an Iceberg table is also a Hive table (see HiveTableOperations), so types unsupported by Hive are not supported by the Iceberg connector either.
  4. The commit "Handle data files with no column metrics" is related because collecting column metrics for ORC files is not supported yet. This commit ensures that partitioned tables don't break.

Supersedes #1290
Umbrella issue: #1324

cc: @electrum @wagnermarkd @phd3

@cla-bot cla-bot bot added the cla-signed label Nov 18, 2019
@lxynov lxynov added the WIP label Nov 18, 2019
@lxynov lxynov force-pushed the icebergOrc branch 3 times, most recently from b9db21f to 8a2f0ae Compare November 19, 2019 19:41
@lxynov lxynov removed the WIP label Nov 19, 2019
@lxynov lxynov force-pushed the icebergOrc branch 2 times, most recently from c82ef7e to 774e172 Compare December 4, 2019 18:18

@electrum electrum left a comment

A few comments. Overall looks good.

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two appear to be unused

Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, ICEBERG_WRITER_OPEN_ERROR and ICEBERG_UNSUPPORTED_FORMAT are used in IcebergFileWriterFactory; ICEBERG_FILESYSTEM_ERROR is used in IcebergPageSourceProvider

Column metrics of data files might be uncollected, in which case nulls
would be returned by methods in `DataFile`.
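
A minimal sketch of the null-safe handling that commit message implies. This is not the code from this PR: `FileMetrics` is a hypothetical stand-in for the metrics accessors on `DataFile`, and `orEmpty`, `isAllNull`, and `metricsOf` are illustrative names. The idea is to treat uncollected (null) metrics as "unknown" rather than dereferencing them.

```java
import java.util.Collections;
import java.util.Map;

final class NullSafeMetrics {
    // Hypothetical stand-in for the metrics accessors on a data file.
    // Either method may return null when metrics were not collected
    // (assumed here to be the case for ORC files).
    interface FileMetrics {
        Map<Integer, Long> valueCounts();
        Map<Integer, Long> nullValueCounts();
    }

    // Treat an uncollected (null) metrics map as an empty one.
    static <K, V> Map<K, V> orEmpty(Map<K, V> metrics) {
        return metrics == null ? Collections.<K, V>emptyMap() : metrics;
    }

    // Returns true only when the metrics prove that every value in the
    // column is null. Missing metrics yield false ("unknown"), so a caller
    // never draws a conclusion from data that was never collected.
    static boolean isAllNull(FileMetrics file, int columnId) {
        Long values = orEmpty(file.valueCounts()).get(columnId);
        Long nulls = orEmpty(file.nullValueCounts()).get(columnId);
        return values != null && nulls != null && values.equals(nulls);
    }

    // Illustrative helper that builds a FileMetrics from two (possibly null) maps.
    static FileMetrics metricsOf(Map<Integer, Long> values, Map<Integer, Long> nulls) {
        return new FileMetrics() {
            @Override
            public Map<Integer, Long> valueCounts() {
                return values;
            }

            @Override
            public Map<Integer, Long> nullValueCounts() {
                return nulls;
            }
        };
    }
}
```

With this shape, a file without metrics is simply never reported as all-null, which is the conservative choice when partition or predicate logic depends on the answer.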

lxynov commented Jan 11, 2020

@electrum thanks a lot for the review! I've updated the PR.

@lxynov lxynov mentioned this pull request Jan 16, 2020
93 tasks
@electrum electrum merged commit 3fd3c8c into trinodb:master Jan 19, 2020
@electrum

Thanks!

@lxynov lxynov deleted the icebergOrc branch January 20, 2020 07:17
@findepi findepi added this to the 329 milestone Jan 23, 2020
zhenxiao pushed a commit to prestodb/presto that referenced this pull request Jul 15, 2021
Cherry-pick of trinodb/trino#1067, trinodb/trino#2042, trinodb/trino#4055, trinodb/trino#1629, trinodb/trino#3483

Co-authored-by: Parth Brahmbhatt <pbrahmbhatt@netflix.com>
Co-authored-by: David Phillips <david@acz.org>
Co-authored-by: Xingyuan Lin <linxingyuan1102@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>