Skip to content

Upgrade parquet version to 1.11.0#14960

Closed
shangxinli wants to merge 6 commits intoprestodb:masterfrom
shangxinli:parquet11
Closed

Upgrade parquet version to 1.11.0#14960
shangxinli wants to merge 6 commits intoprestodb:masterfrom
shangxinli:parquet11

Conversation

@shangxinli
Copy link
Collaborator

Test plan - (Please fill in how you tested your changes)

Please make sure your submission complies with our Development, Formatting, and Commit Message guidelines.

Fill in the release notes towards the bottom of the PR description.
See Release Notes Guidelines for details.

== RELEASE NOTES ==

General Changes
* ...
* ...

Hive Changes
* ...
* ...

If release note is NOT required, use:

== NO RELEASE NOTE ==

@vkorukanti
Copy link
Contributor

Thanks @shangxinli for the patch.

@highker are there any concerns with removing the upperbounds on version for javax.annotation-api.
@zhenxiao could you please also take a look at it?

Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; will leave it to @zhenxiao to take a look

@zhenxiao
Copy link
Collaborator

hi @vkorukanti how about we keep hold until Uber run new Parquet version(for Presto) in production? as far as I know, no other companies are running new Parquet version in production yet. Would like to see if production traffic triggers any problem :)

@shangxinli
Copy link
Collaborator Author

@zhenxiao There are a few companies running the new version of Parquet(1.11.x) with Iceberg. The version of our case is a little complex because our version is hybrid. For the deployment, better to talk offline. What do you think? Let me know I can set up a short meeting with you, @vkorukanti, @highker.

This PR is the blocker for #15454 and #12048

@chliang71
Copy link
Contributor

Hi @zhenxiao yes, we have deployed the parquet version to two of our prod clusters already. We haven't seen any issue on this so far, and are planning to continue rolling it out the upgrade to other prod clusters as well.

@zhenxiao
Copy link
Collaborator

zhenxiao commented Apr 4, 2021

thank you, @chliang71 @shangxinli how long has Uber run it on production?
could you please resolve conflict?
would love to hear from @vkorukanti 's point of view :)

@zhenxiao zhenxiao requested a review from vkorukanti April 4, 2021 16:17
@shangxinli
Copy link
Collaborator Author

Thanks @zhenxiao! We talked to @vkorukanti and we want to wait one week after both DCs are deployed. The 1st DCs was deployed several weeks ago and the 2nd DCs was deployed this week. We will update if there are issues found.

For the conflict you mentioned, I just found it is because the PR #15805 removed the direct dependency of Parquet. Currently, Presto only uses Parquet via Hive after that change. If we want to upgrade the Parquet version, we need to make a change in Hive first and wait till Hive's new release and then upgrade Hive's version in Presto. This seems a longer route to go. I left comments in PR 15805. Feel free to chime in.

@beinan
Copy link
Member

beinan commented Apr 21, 2021

@zhenxiao There are a few companies running the new version of Parquet(1.11.x) with Iceberg. The version of our case is a little complex because our version is hybrid. For the deployment, better to talk offline. What do you think? Let me know I can set up a short meeting with you, @vkorukanti, @highker.

This PR is the blocker for #15454 and #12048

As we discussed in prestodb/presto-hive-apache#46 , the current iceberg connector PR might prefer using parquet 1.11.0, there might be a lots of conflict if we go with 1.11.1 here. In Twitter, we also planned to upgrade to 1.11.0 (rather than 1.11.1) in the near future.

@zhenxiao
Copy link
Collaborator

thank you, @beinan
hi @shangxinli could you please rebase, resolve conflicts, and squash all commits into one?
@vkorukanti could you please confirm parquet 1.11.0 is running good in production at Uber?

@beinan
Copy link
Member

beinan commented Apr 22, 2021

thank you, @beinan
hi @shangxinli could you please rebase, resolve conflicts, and squash all commits into one?
@vkorukanti could you please confirm parquet 1.11.0 is running good in production at Uber?

Twitter started to canary parquet 1.11.0 on production from 2~3 days ago, so far so good. I will keep watching it and let you guys know if we saw anything abnormal. Thank you @shangxinli for this great contribution!

@zhenxiao
Copy link
Collaborator

nice. if running good for 2+ weeks, we could merge this PR. Looking forward to it

@shangxinli
Copy link
Collaborator Author

shangxinli commented Apr 29, 2021

@zhenxiao @beinan We cannot use this PR for upgrading the parquet version anymore because PR #15805 removed the direct dependency of Parquet as I mentioned above. What we can do is to Merge the PR 46, release new version of presto-hive-apache and then make changes in Presto to upgrade presto-hive-apache.

@beinan, I see you are a reviewer of PR [46](https://github.com/prestodb/presto-hive-apache/pull/46]. If you don't have more comments on that, can you help to merge PR46?

@beinan
Copy link
Member

beinan commented May 5, 2021

@zhenxiao @beinan We cannot use this PR for upgrading the parquet version anymore because PR #15805 removed the direct dependency of Parquet as I mentioned above. What we can do is to Merge the PR 46, release new version of presto-hive-apache and then make changes in Presto to upgrade presto-hive-apache.

@beinan, I see you are a reviewer of PR [46]([https://github.com/prestodb/presto-hive-apache/pull/46]](prestodb/presto-hive-apache#46). If you don't have more comments on that, can you help to merge PR46?

The change looks good to me. I haven't merge it yet just because we would like to watch it running on twitter's production for a couple of days more. I will sync with @zhenxiao tomorrow to see when we could merge it.

@beinan
Copy link
Member

beinan commented May 6, 2021

Hello @shangxinli ,
We're running prestodb on parquet 1.11.0 in twitter for around 2 weeks, everything looks good so far, thank you so much for the contribution! I also discussed with @zhenxiao , there is no concern from us to merge this change.

But looks like a couple of tests are failing, @shangxinli could you push -f again to trigger a rerun of the CI? Thanks!

@timrobbins1 timrobbins1 mentioned this pull request Jul 3, 2021
@ajaygeorge
Copy link
Contributor

ajaygeorge commented Sep 16, 2021

Hi @shangxinli , looks like we are looking good to merge. Can you rebase with master and fix the conflicts in pom.xml
cc @jbapple

@shangxinli
Copy link
Collaborator Author

Hi @shangxinli , looks like we are looking good to merge. Can you rebase with master and fix the conflicts in pom.xml
cc @jbapple

@ajaygeorge, as mentioned above, upgrading Parquet version can only be done via presto-hive-apache repo. This PR has no effect on changing parquet version anymore.

@ajaygeorge
Copy link
Contributor

@shangxinli Thanks for taking a look. Just wanted to understand this better. If this PR has no effect as you mentioned, can we close it.?

@shangxinli shangxinli closed this Nov 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants