Upgrade parquet version to 1.11.0 #14960
|
Thanks @shangxinli for the patch. @highker, are there any concerns with removing the upper bounds on the version for |
|
Hi @vkorukanti, how about we hold off until Uber runs the new Parquet version (for Presto) in production? As far as I know, no other companies are running the new Parquet version in production yet. I would like to see whether production traffic triggers any problems :) |
|
@zhenxiao There are a few companies running the new version of Parquet (1.11.x) with Iceberg. Our case is a little complex because our version is a hybrid. For the deployment details, it is better to talk offline. What do you think? Let me know and I can set up a short meeting with you, @vkorukanti, and @highker. |
|
Hi @zhenxiao, yes, we have already deployed the Parquet version to two of our prod clusters. We haven't seen any issues so far, and we are planning to continue rolling out the upgrade to other prod clusters as well. |
|
Thank you, @chliang71 @shangxinli. How long has Uber been running it in production? |
|
Thanks @zhenxiao! We talked to @vkorukanti and we want to wait one week after both DCs are deployed. The first DC was deployed several weeks ago and the second DC was deployed this week. We will post an update if any issues are found. As for the conflict you mentioned, I just found that it is because PR #15805 removed the direct dependency on Parquet. After that change, Presto only uses Parquet via Hive. If we want to upgrade the Parquet version, we need to make the change in Hive first, wait for Hive's new release, and then upgrade the Hive version in Presto. This seems like a longer route. I left comments in PR 15805. Feel free to chime in. |
As we discussed in prestodb/presto-hive-apache#46, the current Iceberg connector PR might prefer using Parquet 1.11.0, and there might be a lot of conflicts if we go with 1.11.1 here. At Twitter, we also plan to upgrade to 1.11.0 (rather than 1.11.1) in the near future. |
|
thank you, @beinan |
Twitter started to canary Parquet 1.11.0 in production 2 to 3 days ago; so far so good. I will keep watching it and let you know if we see anything abnormal. Thank you @shangxinli for this great contribution! |
|
Nice. If it runs well for 2+ weeks, we could merge this PR. Looking forward to it. |
|
@zhenxiao @beinan We can no longer use this PR to upgrade the Parquet version because PR #15805 removed the direct dependency on Parquet, as I mentioned above. What we can do is merge PR 46, release a new version of presto-hive-apache, and then change Presto to upgrade presto-hive-apache. @beinan, I see you are a reviewer of PR [46](https://github.com/prestodb/presto-hive-apache/pull/46). If you don't have more comments on it, can you help merge PR 46? |
The change looks good to me. I haven't merged it yet only because we would like to watch it running in Twitter's production for a couple more days. I will sync with @zhenxiao tomorrow to see when we could merge it. |
|
Hello @shangxinli, it looks like a couple of tests are failing. Could you push -f again to trigger a rerun of the CI? Thanks! |
|
Hi @shangxinli, it looks like we are good to merge. Can you rebase on master and fix the conflicts in pom.xml? |
@ajaygeorge, as mentioned above, upgrading the Parquet version can only be done via the presto-hive-apache repo. This PR no longer has any effect on the Parquet version. |
|
@shangxinli Thanks for taking a look. I just wanted to understand this better. If this PR has no effect, as you mentioned, can we close it? |