[SPARK-46718][BUILD] Upgrade Arrow to 15.0.0 #44797
dongjoon-hyun left a comment
Nice!
val name = v.data.getName
name.startsWith("pmml-model-") || name.startsWith("scala-collection-compat_") ||
  name.startsWith("jsr305-") || name.startsWith("netty-") || name == "unused-1.0.0.jar"
val validPrefixes = Set("spark-connect", "unused-", "guava-", "failureaccess-",
Without this change, the Connect server assembly jar fails to start after the upgrade to Arrow 15: the assembly jar includes too many unnecessary jar files, which causes conflicts once Arrow 15 is in place.
However, I think this change is fairly general, so I will submit it in a separate PR first.
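The snippet above appears to replace a list of excluded jar-name prefixes with a set of allowed ones (`validPrefixes`). A minimal sketch of that allow-list idea in plain Scala follows. The object and function names (`AssemblyJarFilter`, `keepInAssembly`) and the sample jar names are hypothetical, and the prefix set contains only the entries visible in the snippet; the real set continues past them and is not reproduced here.

```scala
object AssemblyJarFilter {
  // Only the prefixes visible in the review snippet; the real set in the
  // build continues past these, and the elided entries are not guessed here.
  val validPrefixes: Set[String] =
    Set("spark-connect", "unused-", "guava-", "failureaccess-")

  // Hypothetical helper: keep a jar only if its file name starts with one of
  // the allowed prefixes; any other jar is left out of the assembly.
  def keepInAssembly(jarName: String): Boolean =
    validPrefixes.exists(prefix => jarName.startsWith(prefix))

  def main(args: Array[String]): Unit = {
    // Illustrative jar names, not taken from a real build.
    val jars = Seq(
      "spark-connect_2.13-4.0.0.jar",
      "guava-32.0.1-jre.jar",
      "unrelated-lib-1.0.jar"
    )
    val (kept, excluded) = jars.partition(keepInAssembly)
    println(s"kept: ${kept.mkString(", ")}")
    println(s"excluded: ${excluded.mkString(", ")}")
  }
}
```

An allow-list fails closed: a new transitive jar, such as one pulled in by a dependency upgrade, stays out of the assembly until its prefix is added, whereas a deny-list lets it in by default.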
+1 for that.
dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
flatbuffers-java/1.12.0//flatbuffers-java-1.12.0.jar
eclipse-collections-api/11.1.0//eclipse-collections-api-11.1.0.jar
eclipse-collections/11.1.0//eclipse-collections-11.1.0.jar
Just a question: are these new dependencies unavoidable?
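Each manifest line above follows a `name/version//jar-file` pattern, so spotting newly added dependencies can be done mechanically by comparing artifact names between two versions of the manifest. The sketch below assumes that line format; `DepsManifestDiff`, `Dep`, `parse`, and `added` are hypothetical names, and the "before" list in `main` is an assumed example built from the lines shown, not the real pre-upgrade manifest.

```scala
object DepsManifestDiff {
  // One parsed manifest entry, e.g. from
  // "eclipse-collections/11.1.0//eclipse-collections-11.1.0.jar".
  final case class Dep(name: String, version: String, jar: String)

  // Parse "name/version//jar"; returns None for lines that do not match.
  def parse(line: String): Option[Dep] =
    line.trim.split("//", 2) match {
      case Array(coords, jar) =>
        coords.split("/", 2) match {
          case Array(name, version) => Some(Dep(name, version, jar))
          case _                    => None
        }
      case _ => None
    }

  // Entries whose artifact name appears in `after` but not in `before`.
  // Version bumps of an existing artifact are deliberately not reported.
  def added(before: Seq[String], after: Seq[String]): Seq[Dep] = {
    val beforeNames = before.flatMap(parse).map(_.name).toSet
    after.flatMap(parse).filterNot(d => beforeNames.contains(d.name))
  }

  def main(args: Array[String]): Unit = {
    // Assumed example input, not the full Spark manifest.
    val before = Seq(
      "dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar"
    )
    val after = before ++ Seq(
      "eclipse-collections-api/11.1.0//eclipse-collections-api-11.1.0.jar",
      "eclipse-collections/11.1.0//eclipse-collections-11.1.0.jar"
    )
    added(before, after).foreach(d => println(s"new: ${d.name} ${d.version}"))
  }
}
```

Comparing by name only keeps the report focused on the question asked here (new artifacts) rather than on version changes.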
In apache/arrow#38493, netty-common was replaced by eclipse-collections in the arrow-vector module. Let's test excluding it; I expect there will be test failures.
If there are no test failures, I will confirm further by reading the code later.
Thank you for trying.
https://github.com/LuciferYang/spark/actions/runs/7608110702/job/20716682222

The test results show that we must add this dependency: it is used when a StructVector is initialized.
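Because the failure only shows up when a StructVector is created, a quick runtime probe of whether a class is loadable can help confirm that a dependency actually made it onto the classpath. This is a minimal sketch; `ClasspathProbe` and `isOnClasspath` are hypothetical names, and the probed eclipse-collections class name is an assumption about what Arrow's struct-vector code relies on, to be checked against the Arrow sources rather than taken as confirmed.

```scala
object ClasspathProbe {
  // Returns true if the named class can be loaded by this class loader.
  // initialize = false avoids running the class's static initializers.
  def isOnClasspath(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
      case _: LinkageError           => false
    }

  def main(args: Array[String]): Unit = {
    // Assumption: an eclipse-collections class that Arrow 15's vector code
    // may depend on; verify the exact class against the Arrow sources.
    val probe = "org.eclipse.collections.impl.map.mutable.primitive.IntObjectHashMap"
    println(s"$probe on classpath: ${isOnClasspath(probe)}")
  }
}
```

Running such a probe from the assembled jar, rather than from the build's test classpath, is what would reveal a dependency that the assembly filter dropped.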
Thank you for confirming.
This reverts commit b702f40.
dongjoon-hyun left a comment
+1, LGTM. Thank you, @LuciferYang.
Merged to master for Apache Spark 4.0.0.
Thanks @dongjoon-hyun ~
What changes were proposed in this pull request?
This PR aims to upgrade Arrow from 14.0.2 to 15.0.0. This version fixes the compatibility issue with Netty 4.1.104.Final (GH-39265).
Additionally, since the arrow-vector module now uses eclipse-collections instead of netty-common as a compile-scope dependency, Apache Spark gains a dependency on eclipse-collections after upgrading to Arrow 15.0.0.
Why are the changes needed?
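For a downstream project that depends on Arrow directly, the same version move might look like the sbt fragment below. This is a hedged illustration for a hypothetical standalone project, not Spark's own build definition; the choice of `arrow-memory-netty` as the allocator module is an assumption (Arrow Java needs some memory implementation at runtime, and projects pick one).

```scala
// build.sbt for a hypothetical downstream project (not Spark's own build).
libraryDependencies ++= Seq(
  // Arrow Java vector module at the version this PR moves to.
  "org.apache.arrow" % "arrow-vector" % "15.0.0",
  // A runtime memory allocator module; netty-based here by assumption.
  "org.apache.arrow" % "arrow-memory-netty" % "15.0.0"
)
// eclipse-collections is not declared here: as a compile-scope dependency of
// arrow-vector it should arrive transitively.
```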
The new version brings the following major changes:
Bug Fixes
GH-34610 - [Java] Fix valueCount and field name when loading/transferring NullVector
GH-38242 - [Java] Fix incorrect internal struct accounting for DenseUnionVector#getBufferSizeFor
GH-38254 - [Java] Add reusable buffer getters to char/binary vectors
GH-38366 - [Java] Fix Murmur hash on buffers less than 4 bytes
GH-38387 - [Java] Fix JDK8 compilation issue with TestAllTypes
GH-38614 - [Java] Add VarBinary and VarCharWriter helper methods to more writers
GH-38725 - [Java] decompression in Lz4CompressionCodec.java does not set writer index
New Features and Improvements
GH-38511 - [Java] Add getTransferPair(Field, BufferAllocator, CallBack) for StructVector and MapVector
GH-14936 - [Java] Remove netty dependency from arrow-vector
GH-38990 - [Java] Upgrade to flatc version 23.5.26
GH-39265 - [Java] Make it run well with the netty newest version 4.1.104
The full release notes are as follows:
Does this PR introduce any user-facing change?
No
How was this patch tested?
Pass GitHub Actions
Was this patch authored or co-authored using generative AI tooling?
No