[HUDI-1602] update parquet version from 1.10.1 to 1.11.1 #2601
Codecov Report
@@ Coverage Diff @@
## master #2601 +/- ##
============================================
- Coverage 51.17% 9.69% -41.49%
+ Complexity 3226 48 -3178
============================================
Files 438 53 -385
Lines 20089 1929 -18160
Branches 2068 230 -1838
============================================
- Hits 10281 187 -10094
+ Misses 8961 1729 -7232
+ Partials 847 13 -834
@vc I wonder how we want to proceed here. It looks like the parquet 1.10.1 dependency is baked into Spark, so I am struggling to make this work properly, as I am potentially getting class conflicts with 1.11.1. Any ideas?
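One way to check whether such a class conflict is actually happening is to print which JAR each parquet class is loaded from at runtime. The following is a minimal sketch, not code from this PR: `ClassOriginProbe` and `locate` are hypothetical names, and the two probed class names are assumed to be representative of parquet-avro and parquet-column. It uses only the JDK.

```java
/**
 * Hypothetical diagnostic helper (not part of Hudi, Spark, or Parquet):
 * reports which JAR or directory a class was loaded from.
 */
final class ClassOriginProbe {

    /** Returns a one-line description of where the named class comes from. */
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers.
            Class<?> cls = Class.forName(
                className, false, ClassOriginProbe.class.getClassLoader());
            java.security.CodeSource source = cls.getProtectionDomain().getCodeSource();
            // JDK bootstrap classes have no code source.
            if (source == null || source.getLocation() == null) {
                return className + " -> bootstrap/unknown location";
            }
            return className + " -> " + source.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not on classpath (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        // Assumed class names: one from parquet-avro, one from parquet-column.
        String[] probes = {
            "org.apache.parquet.avro.AvroSchemaConverter",
            "org.apache.parquet.schema.MessageType"
        };
        for (String name : probes) {
            System.out.println(locate(name));
        }
    }
}
```

Run with the same classpath as the Spark job: if the two classes resolve to JARs with different parquet versions (for example a bundled parquet-avro 1.11.1 next to Spark's own parquet 1.10.1 JARs), that mismatch is a likely source of the conflict.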
@vburenin Yes. The general principle here is to keep the parquet version aligned with the Spark version, so this is definitely trickier. If we upgrade, it almost seems like we would need to maintain a parquet-avro kind of layer ourselves? (Again, that seems like a tall order.)
One thing we could do is not include the parquet library in the fat jar, since Spark already ships with it. Then it would be just a trivial JAR swap.
@vburenin We only bundle parquet-avro, not all of parquet, so the culprit here is our dependency on parquet-avro. Can you try building with a different parquet-avro version? In general, I think we should build differently for Spark 3 and Spark 2, since they may ship different parquet versions and hence need different parquet-avro versions. Is that what you are running into? What Spark version are you on?
@vinothchandar I am still running on Spark 2; however, I am using a custom Docker image in which I replaced the older parquet libraries with newer ones. Technically, a hack.
@vburenin Got it. With respect to Hudi, we want to keep the parquet version matched with the Spark version. Spark 3.1 still seems to be on 1.10.1: https://github.com/apache/spark/blob/branch-3.1/pom.xml#L138 Once #2625 has landed, let's see what Spark land is saying about 1.11.1.
CC @li36909 |
What is the purpose of the pull request
The parquet-avro 1.10.1 library has a bug that incorrectly converts complex Parquet schemas into Avro schemas.
See https://issues.apache.org/jira/browse/HUDI-1602 for details.
Brief change log
The parquet-avro library has been upgraded from 1.10.1 to 1.11.1.
Verify this pull request
This pull request is a trivial rework without any test coverage.
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change