
Conversation

@sunchao
Member

@sunchao sunchao commented Jun 30, 2021

What changes were proposed in this pull request?

Add a new Maven profile, no-shaded-hadoop-client, that, when activated, switches Spark to the non-shaded Hadoop client artifacts (e.g., hadoop-client, hadoop-yarn-client).

Why are the changes needed?

Currently, Spark uses the shaded Hadoop client by default. However, if users want to build Spark against an older version of Hadoop, such as 3.1.x, the shaded client cannot be used, since it is currently only supported for Hadoop 3.2.2+ and 3.3.1+. Therefore, this PR proposes a new Maven profile, "no-shaded-hadoop-client", for that use case.

Does this PR introduce any user-facing change?

Yes. Users can now choose to build Apache Spark with the non-shaded Hadoop client, e.g.:

build/mvn package -DskipTests -Dhadoop.version=3.1.1 -Pno-shaded-hadoop-client
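
A quick way to confirm which Hadoop client artifacts such a build resolves is to inspect the dependency tree of a module such as core; the expected artifact names below are an assumption about the profile's effect, not output taken from this PR:

build/mvn dependency:tree -Phadoop-3.2 -Dhadoop.version=3.1.1 -Pno-shaded-hadoop-client -pl core | grep hadoop
# With the profile active, this should list org.apache.hadoop:hadoop-client and
# org.apache.hadoop:hadoop-yarn-client instead of hadoop-client-api / hadoop-client-runtime.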

How was this patch tested?

Existing tests.

@SparkQA

SparkQA commented Jul 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44983/

@SparkQA

SparkQA commented Jul 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44983/

@SparkQA

SparkQA commented Jul 1, 2021

Test build #140469 has finished for PR 33160 at commit 0bf197b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sunchao sunchao changed the title [SPARK-35959][BUILD] Add a new Maven profile "no-shaded-client" for older Hadoop 3.x versions [SPARK-35959][BUILD] Add a new Maven profile "no-shaded-hadoop-client" for older Hadoop 3.x versions Jul 1, 2021
@SparkQA

SparkQA commented Jul 1, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44999/

@SparkQA

SparkQA commented Jul 1, 2021

Test build #140487 has finished for PR 33160 at commit 37130a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment


Hi, @sunchao .

To verify via CI, could you make the profile active by default? After testing, we should remove it.

@dongjoon-hyun
Member

FYI, if you enable it by default, the dependency manifest files need to be updated accordingly.
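
If the profile is made the default, even temporarily, the pinned manifests under dev/deps would typically be regenerated with the dependency script; the exact invocation below is assumed from Spark's build tooling rather than taken from this PR:

./dev/test-dependencies.sh --replace-manifest
# Regenerates the dev/deps/spark-deps-hadoop-* manifests from the resolved dependency tree.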

@sunchao sunchao changed the title [SPARK-35959][BUILD] Add a new Maven profile "no-shaded-hadoop-client" for older Hadoop 3.x versions [SPARK-35959][BUILD] Add a new Maven profile "no-shaded-hadoop-client" for Hadoop versions older than 3.2.2/3.3.1 Jul 1, 2021
@sunchao
Member Author

sunchao commented Jul 1, 2021

To verify via CI, could you make the profile active by default? After testing, we should remove it.

Thanks @dongjoon-hyun . Will do.

@sunchao sunchao marked this pull request as draft July 1, 2021 17:51
@SparkQA

SparkQA commented Jul 1, 2021

Test build #140531 has finished for PR 33160 at commit bf51f50.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45044/

@SparkQA

SparkQA commented Jul 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45044/

@SparkQA

SparkQA commented Jul 1, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45049/

@dongjoon-hyun
Member

dongjoon-hyun commented Jul 1, 2021

For the Hadoop 2 build, I noticed that the GitHub Actions job uses sbt directly. So, I verified that combination's compilation manually on top of that GitHub Actions job command.

$ ./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Phadoop-2.7 -Pno-shaded-hadoop-client compile test:compile
...
[info] compiling 18 Scala sources to /Users/dongjoon/APACHE/spark-merge/sql/hive-thriftserver/target/scala-2.12/test-classes ...
[success] Total time: 142 s (02:22), completed Jul 1, 2021 1:48:49 PM

For the rest, it looks good to me. If GitHub Actions passes in about an hour, let's revert the dev/run-tests.py change and merge this PR.

@SparkQA

SparkQA commented Jul 1, 2021

Test build #140537 has finished for PR 33160 at commit 13932bf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sunchao
Member Author

sunchao commented Jul 1, 2021

Cause: java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
[info]   at org.apache.hadoop.http.HttpServer2.initializeWebServer(HttpServer2.java:707)
[info]   at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:687)

Hmm, for some reason it is still using Hadoop 3.3.1 classes, which are only compatible with Jetty 9.4+. Let me check why this happens.
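
One way to check which Jetty and Hadoop client versions the Maven build actually resolves is to grep the dependency tree, mirroring the checks done later in this thread; this is a diagnostic sketch, not a command taken from the PR:

build/mvn dependency:tree -Phadoop-3.2 -Dhadoop.version=3.1.1 -Pno-shaded-hadoop-client -pl core | grep -E 'jetty-server|hadoop-client'
# If the hadoop-client artifacts still resolve to 3.3.1 here, the version override is not taking effect.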

@dongjoon-hyun
Member

Let me check the PR builder.

@dongjoon-hyun
Member

So, SBT with the following conf still fails?

-Phadoop-3.2 -Dhadoop.version=3.1.1 -Pno-shaded-hadoop-client -Phive-2.3 
-Pmesos -Pspark-ganglia-lgpl -Pyarn -Pdocker-integration-tests -Pkubernetes
-Phive-thriftserver -Pkinesis-asl -Phive -Phadoop-cloud test:package streaming-kinesis-asl-assembly/assembly

@sunchao
Member Author

sunchao commented Jul 1, 2021

Yeah, but somehow it references Hadoop 3.3.1 classes like org.apache.hadoop.http.HttpServer2, judging from the line numbers.

@dongjoon-hyun
Member

It seems to be a Spark build issue: overriding the Hadoop version works only with Maven, not with sbt.

$ build/mvn dependency:tree -Phadoop-3.2 -Dhadoop.version=3.1.1 -pl core | grep hadoop.client
exec: curl --silent --show-error -L https://downloads.lightbend.com/scala/2.12.14/scala-2.12.14.tgz
Using `mvn` from path: /opt/homebrew/bin/mvn
[INFO] +- org.apache.hadoop:hadoop-client-api:jar:3.1.1:compile
[INFO] +- org.apache.hadoop:hadoop-client-runtime:jar:3.1.1:compile
$ build/sbt "core/dependencyTree" -Phadoop-3.2 | grep hadoop.client
[info]   +-org.apache.hadoop:hadoop-client-api:3.3.1
[info]   +-org.apache.hadoop:hadoop-client-runtime:3.3.1
[info]   | +-org.apache.hadoop:hadoop-client-api:3.3.1

@dongjoon-hyun
Member

In this case, we used to use the [test-hadoop3.2][test-java11] combination, which still works in Jenkins. However, the Jenkins PR builder seems to have been broken for [test-java11] for a while. Only the backend Jenkins job is working.

@dongjoon-hyun
Member

Let me try.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-35959][BUILD] Add a new Maven profile "no-shaded-hadoop-client" for Hadoop versions older than 3.2.2/3.3.1 [SPARK-35959][BUILD][test-hadoop3.2][test-java11] Add a new Maven profile "no-shaded-hadoop-client" for Hadoop versions older than 3.2.2/3.3.1 Jul 2, 2021
@dongjoon-hyun
Member

Retest this please

@sunchao
Member Author

sunchao commented Jul 2, 2021

@dongjoon-hyun Ahh, I think you are right! It seems sbt doesn't pick up the -Dhadoop.version parameter:

build/sbt "core/dependencyTree" -Phadoop-3.2 -Dhadoop.version=3.1.1 | grep hadoop.client

[info]   +-org.apache.hadoop:hadoop-client-api:3.3.1
[info]   +-org.apache.hadoop:hadoop-client-runtime:3.3.1
[info]   | +-org.apache.hadoop:hadoop-client-api:3.3.1
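
Since sbt ignores the -Dhadoop.version override here, the 3.3.1 artifacts above simply reflect the default hadoop.version property in the root pom. A quick way to confirm (the property name is assumed from Spark's root pom.xml):

grep -m1 '<hadoop.version>' pom.xml
# Expected to show 3.3.1, matching what sbt resolves above.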

@SparkQA

SparkQA commented Jul 2, 2021

Test build #140560 has finished for PR 33160 at commit b1e0583.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 2, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45072/

@github-actions github-actions bot added the SQL label Jul 6, 2021
@SparkQA

SparkQA commented Jul 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45225/

@SparkQA

SparkQA commented Jul 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45225/

@SparkQA

SparkQA commented Jul 6, 2021

Test build #140714 has finished for PR 33160 at commit a22c1e7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45232/

@SparkQA

SparkQA commented Jul 6, 2021

Test build #140721 has finished for PR 33160 at commit 3ffecf8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45232/

@sunchao
Member Author

sunchao commented Jul 7, 2021

It seems Spark can't use the non-shaded Hadoop 3.3.1 client as-is because of a jetty-server incompatibility: Hadoop 3.3.1 uses Jetty 9.4.40, while Spark master uses 9.4.42 (upgraded via #33053). The method SessionHandler.setHttpOnly was removed in 9.4.42, so we get an exception when trying to use the non-shaded Hadoop client:

sbt.ForkMain$ForkError: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
	at org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:384)
	at org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:129)
	at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:500)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:122)
	at org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:333)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
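
To verify whether a given jetty-server jar still provides the method Hadoop calls, one can disassemble the class with javap; the jar path below is a placeholder, so adjust it to the jar actually on the classpath:

javap -cp /path/to/jetty-server-<version>.jar org.eclipse.jetty.server.session.SessionHandler | grep setHttpOnly || echo 'setHttpOnly not found'
# Present in the 9.4.40 jar that Hadoop 3.3.1 expects; absent in 9.4.42 per the analysis above.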

@SparkQA

SparkQA commented Aug 12, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46877/

@SparkQA

SparkQA commented Aug 12, 2021

Test build #142369 has finished for PR 33160 at commit 4bf4533.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@steveloughran
Contributor

Jetty 9.4.40 while Spark master uses 9.4.42

We could move Hadoop 3.3.2 to the same Jetty version; if we get that release out, then things will briefly be in sync.

@sunchao
Member Author

sunchao commented Aug 19, 2021

@steveloughran Yes, we can. This is only an issue when Spark uses the non-shaded client, though, so I think it's OK, since it's better to just use the shaded client.

@SparkQA

SparkQA commented Sep 7, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47538/

@steveloughran
Contributor

Do you have any plans to update that Hadoop Jetty version alongside this?

@sunchao
Member Author

sunchao commented Sep 9, 2021

@steveloughran You mean upgrade the Jetty version in Hadoop? Yeah, I can check, but in any case Spark is not blocked by the Jetty issue.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
