Add ORC support for iceberg connector #16391
zhenxiao merged 2 commits into prestodb:master from junyi1313:master
Cherry-pick of trinodb/trino@ecce4a2
Co-authored-by: Xingyuan Lin <linxingyuan1102@gmail.com>
@ChunxuTang @zhenxiao @beinan Could you take a look? Thanks.
zhenxiao left a comment
Looks good, @junyi1313, one minor thing:
@beinan @ChunxuTang could you please take a look?
presto-iceberg or iceberg? Either is fine; just a note.
I think it's fine. It follows the same convention as the mysql connector: assembly/presto.xml#presto-mysql
@junyi1313 @zhenxiao
ChunxuTang left a comment
@junyi1313 Thanks for your work! This is a very nice feature we want for the iceberg connector!!
From my initial review, the implementation on the iceberg connector part generally looks good to me.
Just one reminder: I noticed that there are some new features/improvements that are not in the PRs you cherry-picked. I think these may deserve a bit more documentation or clarification, as they are unique contributions.
@zhenxiao @beinan There are some changes in the presto-orc package, including the ORC upgrade. I'm not familiar with that package. Could you take a closer look at those presto-orc changes? Or does anyone else know more about it?
I checked the PRs you cherry-picked, but it seems that this snippet of code is not in those PRs. Is this a new feature?
I can provide some context for these two functions: they cache the metadata of ORC files (more details in #13501).
But it looks like these two duplicate the functions in presto-hive/src/main/java/com/facebook/presto/hive/HiveClientModule.java.
Have we included HiveClientModule in the iceberg connector? If not, can we reuse the functions in HiveClientModule or extract them into a separate module?
@beinan @ChunxuTang These two functions are copied from HiveClientModule because we haven't included HiveClientModule in the iceberg connector. Besides, the presto-raptor StorageModule also has these two functions. I think extracting them to a separate module is better. Should we do it in this PR or open a new PR?
@junyi1313 I'm ok with either. @ChunxuTang what do you think?
@junyi1313 @beinan
Thanks for your clarification! Yeah, I'm also ok with either way. @junyi1313 your call.
Got it. I will find time to send a new PR about this work after this PR has been merged.
Looks like this is a new file that is not in the PRs cherry-picked. Any specific reasons to create this class?
nit: Some setter functions (e.g. setOrcType, setAttributes, etc.) are unused.
IcebergOrcColumn.java is similar to OrcColumn. I have updated the cherry-pick info (adding trinodb/trino#1629 and trinodb/trino#3483) and removed the unused functions. Please review again. Thanks.
Gotcha. Thanks for your work!
Could we import DecimalType directly?
beinan left a comment
Looks good to me! Great contribution, many thanks!
ChunxuTang left a comment
@junyi1313
Thanks for your nice work! It looks like there are some errors in the CI tests. Could you fix them and update the PR so that the tests pass?
Cherry-pick of trinodb/trino#1067, trinodb/trino#2042, trinodb/trino#4055, trinodb/trino#1629, trinodb/trino#3483
Co-authored-by: Parth Brahmbhatt <pbrahmbhatt@netflix.com>
Co-authored-by: David Phillips <david@acz.org>
Co-authored-by: Xingyuan Lin <linxingyuan1102@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>
@ChunxuTang I have updated the PR and fixed the CI errors.
Cherry-pick of trinodb/trino#1067, trinodb/trino#2042, trinodb/trino#4055, trinodb/trino#1629, trinodb/trino#3483, trinodb/trino@ecce4a2
Co-authored-by: Parth Brahmbhatt <pbrahmbhatt@netflix.com>
Co-authored-by: David Phillips <david@acz.org>
Co-authored-by: Xingyuan Lin <linxingyuan1102@gmail.com>
Co-authored-by: Dain Sundstrom <dain@iq80.com>
This PR implements issue #16305.
Test plan: unit tests