[SPARK-43185][BUILD] Inline hadoop-client related properties in pom.xml
#40847
Conversation
cc @sunchao
spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala Lines 118 to 124 in 09a4353
Update: leave it; #33160 didn't get in, and Spark does not support building against the vanilla Hadoop 3 client.
Friendly ping @sunchao
LGTM.
Note this also makes it impossible for users to pick a Hadoop version without shaded client support, like Hadoop 3.1.x. Previously they could do:
./mvn package -Dhadoop.version=3.1.2 -Dhadoop-client-api.artifact=hadoop-client ...
cc @xkrogen too (I vaguely remember you did something similar).
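For reference, a minimal sketch of what these overridable properties looked like in `pom.xml` before this PR; the property names come from the command above, while the default values shown are assumptions for illustration:

```xml
<!-- Sketch of the pre-PR pom.xml properties (names taken from the command
     above; default values assumed). Passing
     -Dhadoop-client-api.artifact=hadoop-client on the command line swapped
     the shaded client artifact for the vanilla hadoop-client one. -->
<properties>
  <hadoop-client-api.artifact>hadoop-client-api</hadoop-client-api.artifact>
  <hadoop-client-runtime.artifact>hadoop-client-runtime</hadoop-client-runtime.artifact>
  <hadoop-client-minicluster.artifact>hadoop-client-minicluster</hadoop-client-minicluster.artifact>
</properties>
```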
Some related JIRA: https://issues.apache.org/jira/browse/SPARK-37994
@sunchao so the currently supported Hadoop versions are 3.2.2+ and 3.3.1+? There is some code for Hadoop 3.0 and 3.1; should we remove it then?
Yea, the shaded Hadoop client only works for Hadoop 3.2.2+ and 3.3.1+. I'm not sure whether there are people who still use Hadoop 3.0/3.1 with Spark, though. I'm not aware of any code in Spark that specifically depends on Hadoop 3.0/3.1. Could you point it out for me?
@sunchao I found two examples.
No strong opinion on this, but we should make it clear that this PR is explicitly dropping support for Hadoop 3.0/3.1 and earlier versions of 3.2. cc @mridulm
@xkrogen @sunchao @pan3793 I would like to clarify: dropping Hadoop 3.0/3.1 support for build and test is not the original intention of this PR. So if there was a way to build and test with Hadoop 3.0/3.1 successfully before this PR that is lost after it, I think we should stop this work, because Apache Spark has not previously stated on any occasion that it no longer supports Hadoop 3.0/3.1, right? Can you give a command that can be used to build & test with Hadoop 3.0/3.1? I want to check it manually, thanks ~
Yes, I think that's probably a sensible thing to do.
You can check this JIRA for the command to build: https://issues.apache.org/jira/browse/SPARK-37994
I encountered the following error while compiling. Due to the fix version of HADOOP-15691 being
Convert to draft to avoid accidental merging.
@xkrogen @sunchao @pan3793 Sharing my experimental results:
otherwise, the following compilation error will occur with
Otherwise, it cannot build
Overall, the current master cannot compile
Interesting, thanks for the detailed analysis @LuciferYang!
Is this Hadoop 3.2.2? I remember at some point we started to enable
I tested with Hadoop 3.2.4.
Just to be clear, are we saying this is OK to merge, or are there issues with hadoop-cloud?
I think it's OK to get in, because:
- Spark does not officially claim to support building against the vanilla Hadoop 3 client, and that did not work before this change either, so this PR breaks nothing.
- Before and after this PR, Spark supports building against the Hadoop 3.3.1+ shaded client.
- Before and after this PR, Spark can NOT build against the Hadoop 3.2.x shaded client because of SPARK-40039; it's a separate issue if we want to restore support for the Hadoop 3.2.x shaded client.
@srowen I'm also +1 that we should clearly document the Hadoop client version support strategy.
sunchao
left a comment
I'm OK with the PR too, given that Spark already doesn't support most other Hadoop versions before 3.3.1.
Merged to master
dongjoon-hyun
left a comment
+1, LGTM.
What changes were proposed in this pull request?
SPARK-36835 introduced `hadoop-client-api.artifact`, `hadoop-client-runtime.artifact`, and `hadoop-client-minicluster.artifact` to be compatible with the dependency definitions of Hadoop 2 and Hadoop 3. After SPARK-42452, Spark no longer supports Hadoop 2, so this PR inlines these properties to simplify the dependency definitions.
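To illustrate what "inlines these properties" means, here is a hedged before/after sketch of one dependency declaration in `pom.xml`; the exact coordinates are assumptions based on the description above, not the literal diff:

```xml
<!-- Before (sketch): the artifactId was indirected through a property so
     that a Hadoop 2 build could substitute a different artifact. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>${hadoop-client-api.artifact}</artifactId>
  <version>${hadoop.version}</version>
</dependency>

<!-- After (sketch): with Hadoop 2 support removed, the artifactId is
     inlined and the property is deleted. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```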
Why are the changes needed?
Compatibility with Hadoop 2 is no longer required.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Pass GitHub Actions