Conversation

@sunchao (Member) commented Jul 25, 2022

What changes were proposed in this pull request?

This PR aims to upgrade to Hadoop 3.3.4, which was just announced today.

Why are the changes needed?

Hadoop 3.3.4 comes with many bug fixes as well as CVE fixes. Please check release notes and change log.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

@dongjoon-hyun (Member) left a comment

Nice!

@sunchao (Member, Author) commented Jul 25, 2022

RC0 is actually gonna be cancelled but still worth testing it here.

@steveloughran (Contributor)

cancelled rc0 to get a later version of reload4j in (XXE) and an aws sdk upgrade. no direct security issue there, just that it depended on jackson, and, well....

@steveloughran (Contributor)

btw, my build file for validating the rc can not only build spark, it can test the s3/azure/gcs stores through the release.
https://github.com/steveloughran/validate-hadoop-client-artifacts

@sunchao (Member, Author) commented Aug 4, 2022

Some tests are failing with the following error:

[info] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
[info] Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.apache.hadoop#hadoop-aws;3.3.4: not found]
[info] 	at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1458)
[info] 	at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185)
[info] 	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:308)
[info] 	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:901)
[info] 	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
[info] 	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
[info] 	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
[info] 	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1048)
[info] 	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1057)
[info] 	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[info]   at scala.Predef$.assert(Predef.scala:223)
[info]   at org.apache.spark.deploy.k8s.integrationtest.ProcessUtils$.executeProcess(ProcessUtils.scala:54)
[info]   at org.apache.spark.deploy.k8s.integrationtest.SparkAppLauncher$.launch(KubernetesTestComponents.scala:136)
[info]   at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.runSparkApplicationAndVerifyCompletion(KubernetesSuite.scala:457)
[info]   at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.runSparkRemoteCheckAndVerifyCompletion(KubernetesSuite.scala:290)
[info]   at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$2(DepsTestsSuite.scala:166)
[info]   at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.tryDepsTest(DepsTestsSuite.scala:328)
[info]   at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)

(from the CI run at https://github.com/sunchao/spark/runs/7660949772)

Will take a look. cc @steveloughran .
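For context on the failure above: `spark-submit --packages` hands each `group:artifact:version` coordinate to Ivy, which maps it onto a repository path and probes the configured repositories for it; if no repository has published that path (as with a not-yet-released 3.3.4), resolution fails with exactly this `unresolved dependency` error. A minimal sketch of that mapping, with a hypothetical helper name (this is not Spark's actual resolver code):

```python
def coordinate_to_path(coordinate: str) -> str:
    """Map a Maven coordinate 'group:artifact:version' onto the relative
    repository path that Ivy/Maven would probe for the jar."""
    group, artifact, version = coordinate.split(":")
    return "{}/{}/{}/{}-{}.jar".format(
        group.replace(".", "/"), artifact, version, artifact, version)

# The coordinate from the failing test resolves to this path; until the
# 3.3.4 artifacts are published (or a staging repository is configured
# via --repositories), no repository serves it and resolution fails.
print(coordinate_to_path("org.apache.hadoop:hadoop-aws:3.3.4"))
```

In practice, testing against unreleased artifacts means pointing the resolver at the release candidate's staging repository in addition to Maven Central.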

@github-actions github-actions bot added the CORE label Aug 4, 2022
@sunchao (Member, Author) commented Aug 4, 2022

Looks OK now except one test failure which is unrelated to the Hadoop upgrade.

@dongjoon-hyun dongjoon-hyun changed the title [WIP][SPARK-39863][BUILD] Upgrade Hadoop to 3.3.4 [SPARK-39863][BUILD] Upgrade Hadoop to 3.3.4 Aug 8, 2022
@dongjoon-hyun (Member) left a comment

The Python linter failure is irrelevant to this PR. Could you fill in the PR description and convert this to a normal PR, @sunchao?

@sunchao sunchao marked this pull request as ready for review August 9, 2022 05:34
@sunchao (Member, Author) commented Aug 9, 2022

Sure @dongjoon-hyun . Updated.

@dongjoon-hyun (Member) left a comment

+1, LGTM. Thank you, @sunchao and @steveloughran .
Merged to master for Apache Spark 3.4.0.

  avro-mapred/1.11.0//avro-mapred-1.11.0.jar
  avro/1.11.0//avro-1.11.0.jar
- aws-java-sdk-bundle/1.11.1026//aws-java-sdk-bundle-1.11.1026.jar
+ aws-java-sdk-bundle/1.12.132//aws-java-sdk-bundle-1.12.132.jar
Review comment (Member):

@sunchao Oh, the master branch found that this is inconsistent with the official Apache Hadoop 3.3.4.
The staging artifacts in this PR seem to be outdated somehow. Let me make a follow-up.

@dongjoon-hyun (Member) commented Aug 9, 2022

HADOOP-18344 changed the AWS SDK version in Apache Hadoop 3.3.4 RC1, so this RC0-based dependency file was not updated. I found that the dependency test was not triggered due to the flaky Python linter failure.
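The dependency test mentioned here essentially compares the checked-in manifest (`dev/deps`) against the jars of the built assembly, line by line. A rough sketch of that kind of comparison, with hypothetical helper names (this is not Spark's actual dev/test-dependencies.sh logic):

```python
def parse_manifest(lines):
    """Parse 'name/version//jar-file' manifest lines into {name: version}."""
    deps = {}
    for line in lines:
        name, version = line.split("/")[:2]
        deps[name] = version
    return deps

def diff_manifests(expected, actual):
    """Return {name: (expected_version, actual_version)} for every
    dependency whose version differs between the two manifests."""
    exp, act = parse_manifest(expected), parse_manifest(actual)
    return {name: (exp[name], act[name])
            for name in exp.keys() & act.keys() if exp[name] != act[name]}

# The RC0-based manifest in this PR vs. the final 3.3.4 release,
# where HADOOP-18344 bumped the bundled AWS SDK.
rc0   = ["aws-java-sdk-bundle/1.12.132//aws-java-sdk-bundle-1.12.132.jar"]
final = ["aws-java-sdk-bundle/1.12.262//aws-java-sdk-bundle-1.12.262.jar"]
print(diff_manifests(rc0, final))
```

A check like this only catches the drift if it actually runs, which is why the skipped CI job let the RC0-based manifest slip through.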

@steveloughran (Contributor)

we are at 1.12.262. there's a CVE out on the aws sdk transfer manager for releases < 1.12.261, which the s3a connector isn't exposed to (it's only for downloads through that class), but which other apps using the same sdk may be.
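To illustrate the cutoff described above (affected SDK releases are those below 1.12.261), dotted version strings like these are typically ordered by comparing their numeric components as tuples; a minimal sketch with hypothetical function names:

```python
def version_key(v: str):
    """Turn a dotted version string into a comparable integer tuple,
    so that '1.12.132' < '1.12.261' compares numerically, not lexically."""
    return tuple(int(part) for part in v.split("."))

def affected_by_transfer_manager_cve(sdk_version: str) -> bool:
    """True if the AWS SDK release predates the 1.12.261 fix."""
    return version_key(sdk_version) < version_key("1.12.261")

print(affected_by_transfer_manager_cve("1.12.132"))  # RC0's bundled SDK
print(affected_by_transfer_manager_cve("1.12.262"))  # the SDK shipped in 3.3.4
```

Note that a plain string comparison would get this wrong ("1.12.132" > "1.12.261" lexically is false here, but e.g. "1.9.0" > "1.12.0" lexically), hence the tuple conversion.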

dongjoon-hyun added a commit that referenced this pull request Aug 9, 2022
### What changes were proposed in this pull request?

This PR aims to update the dependency manifest for Hadoop 3.

[HADOOP-18344](https://issues.apache.org/jira/browse/HADOOP-18344) changes AWS SDK at Apache Hadoop 3.3.4 RC1.

### Why are the changes needed?

#37281 missed this inconsistency.

### Does this PR introduce _any_ user-facing change?

No. This will recover the dependency check CI job.

### How was this patch tested?

Pass the CI on this job.

Closes #37447 from dongjoon-hyun/SPARK-39863.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>