Skip to content

Add ORC support for iceberg connector#16391

Merged
zhenxiao merged 2 commits intoprestodb:masterfrom
junyi1313:master
Jul 15, 2021
Merged

Add ORC support for iceberg connector#16391
zhenxiao merged 2 commits intoprestodb:masterfrom
junyi1313:master

Conversation

@junyi1313
Copy link
Copy Markdown
Contributor

@junyi1313 junyi1313 commented Jul 6, 2021

Cherry-pick of trinodb/trino#1067, trinodb/trino#2042, trinodb/trino#4055, trinodb/trino#1629, trinodb/trino#3483, trinodb/trino@ecce4a2

Co-authored-by: Parth Brahmbhatt pbrahmbhatt@netflix.com
Co-authored-by: David Phillips david@acz.org
Co-authored-by: Xingyuan Lin linxingyuan1102@gmail.com
Co-authored-by: Dain Sundstrom dain@iq80.com

This PR implements the issue: #16305
Test plan - Unit Tests

== RELEASE NOTES ==

Iceberg Connector Changes
* Add ORC support for iceberg connector

Cherry-pick of trinodb/trino@ecce4a2

Co-authored-by: Xingyuan Lin <linxingyuan1102@gmail.com>
@junyi1313
Copy link
Copy Markdown
Contributor Author

@ChunxuTang @zhenxiao @beinan Could you help take a look? Thanks.

Copy link
Copy Markdown
Collaborator

@zhenxiao zhenxiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, @junyi1313 one minor thing
@beinan @ChunxuTang could you please take a look?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presto-iceberg or iceberg? either is fine, just a note

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine. Just like mysql connector: assembly/presto.xml#presto-mysql

@zhenxiao zhenxiao requested a review from beinan July 7, 2021 08:01
@ChunxuTang
Copy link
Copy Markdown
Member

@junyi1313 @zhenxiao
Sure, I'm glad to help with the review. will take a look this weekend.

Copy link
Copy Markdown
Member

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junyi1313 Thanks for your work! This is a very nice feature we want for the iceberg connector!!
From my initial review, the implementation on the iceberg connector part generally looks good to me.
Just one reminder: I noticed that there're some new features/improvements but are not in the PRs you cherry-picked. I think the new features may deserve a bit more documentation or clarification as they are unique contributions.

@zhenxiao @beinan There are some changes in the presto-orc package, including the upgrade of ORC. I'm not familiar with that package. Could you folks have a closer look at those presto-orc changes? Or any other folks know more?

Comment on lines 187 to 231
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the PRs you cherry-picked, but it seems that this snippet of code is not in those PRs. Is this a new feature?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try to provide some context of these two functions -- they are for caching the metadata of orc files -- more details: #13501

But looks like these two are duplicated with the functions in presto-hive/src/main/java/com/facebook/presto/hive/HiveClientModule.java
Have we included HiveClientModule in the iceberg connector? if not, can we reuse the functions in HiveClientModule or extract them to a separate module?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beinan @ChunxuTang These two functions are copied from HiveClientModule because we haven't included HiveClientModule in the iceberg connector. Besides, the presto-raptor StorageModule also has these two functions. I think extracting them to a separate module is better. Should we do it in this PR or open a new PR?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junyi1313 I'm ok with either. @ChunxuTang what do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junyi1313 @beinan
Thanks for your clarification! Yeah, I'm also ok with either way. @junyi1313 your call.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I will find time to send a new PR about this work after this PR has been merged.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is a new file that is not in the PRs cherry-picked. Any specific reasons to create this class?

nit: Some setter functions (e.g. setOrcType, setAttributes, etc.) are unused.

Copy link
Copy Markdown
Contributor Author

@junyi1313 junyi1313 Jul 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IcebergOrcColumn.java is similar to the OrcColumn. I have updated the cherry-pick infos(add trinodb/trino#1629, trinodb/trino#3483) and removed the unused functions. Pls help with the review again. Thanks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Thanks for your work!

Comment on lines 230 to 231
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May we directly import DecimalType?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Copy Markdown
Member

@beinan beinan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Great contribution, many thanks!

Copy link
Copy Markdown
Member

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junyi1313
Thanks for your nice work! Looks that there're some errors in CI tests. Could you fix the errors and update the PR to pass the tests?

@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla bot commented Jul 14, 2021

CLA Signed

The committers are authorized under a signed CLA.

Cherry-pick of trinodb/trino#1067, trinodb/trino#2042, trinodb/trino#4055, trinodb/trino#1629, trinodb/trino#3483

Co-authored-by: Parth Brahmbhatt <pbrahmbhatt@netflix.com>
Co-authored-by: David Phillips <david@acz.org>
Co-authored-by: Xingyuan Lin <linxingyuan1102@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>
@junyi1313
Copy link
Copy Markdown
Contributor Author

@ChunxuTang I have updated the PR and fixed the CI errors.
@zhenxiao @beinan Pls help to merge.
Thanks so much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants