[SPARK-28903][STREAMING][PYSPARK][TESTS] Fix AWS JDK version conflict that breaks Pyspark Kinesis tests #25559
Conversation
Test build #109592 has finished for PR 25559 at commit

Test build #109633 has finished for PR 25559 at commit
I'm pretty stumped on this. The NoSuchMethodError refers to a method that is definitely there, the AWS dependencies are harmonized, the Scala tests work, and the dependency graph from SBT looks right. I'm guessing something about how the Python tests are run is the issue, but I'm not sure what would cause this: it suggests a runtime AWS SDK version difference, yet I don't see any other versions being pulled in.
Test build #109745 has finished for PR 25559 at commit

Test build #109748 has finished for PR 25559 at commit

Test build #109751 has finished for PR 25559 at commit
Well, I made a little breakthrough: if you build the Spark assembly, something puts
That's probably the hadoop-cloud module, if you have that profile enabled.
Bingo, that's it. Thanks! I'll work on harmonizing its dependencies with the Kinesis module.
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
</exclusion>
<exclusion>
Preemptively CCing @steveloughran for a look at this. The TL;DR is that hadoop-cloud is bringing an old aws-java-sdk dependency into the assembly, and it interferes with the Kinesis dependencies, which are newer. Excluding these is a bit extreme, but the aws-java-sdk dependency brings in something like 20 other AWS JARs, and I'm not clear whether that's intended anyway.
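For context, a minimal sketch of the kind of exclusion being discussed, assuming hadoop-aws is the dependency in the hadoop-cloud module that drags the old SDK into the assembly (the coordinates and version property here are illustrative, not copied from the actual PR):

```xml
<!-- Sketch only: keep hadoop-aws in the hadoop-cloud module but drop its bundled
     AWS SDK, so the newer SDK from the Kinesis module is the only one on the
     assembly classpath. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <exclusion>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```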
Won't you break the hadoop-cloud profile by doing this?
The Kinesis integration is not packaged as part of the Spark distribution even when you enable its profile, while hadoop-cloud is.
Yeah, this is the thing. Right now we only pull in the core aws-java-sdk JAR. If I include aws-java-sdk as an explicit dependency, it pulls in tons of other dependencies that seem irrelevant to Spark. Hm, maybe I need to use <dependencyManagement> to more narrowly manage up the version of aws-java-sdk without affecting the transitive dependency resolution? Well, if this change works, at least we are on to the cause, and then I'll try that.
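As a rough sketch of that `<dependencyManagement>` idea (not the change made in this PR): pinning the SDK in the parent POM would leave the transitive dependency in place but force it to resolve to the newer version. The `aws.java.sdk.version` property below is an assumed placeholder, not necessarily the name used in Spark's build:

```xml
<!-- Sketch only: force every transitive com.amazonaws:aws-java-sdk to resolve to
     the version the Kinesis module already uses, without adding new dependencies. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk</artifactId>
      <version>${aws.java.sdk.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```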
1.7.4 is a really old version; Hadoop 2.9+ uses a (fat) shaded JAR which bundles a consistent Kinesis SDK; 2.8 is on a 1.10.x release, I think.
Go on, move off Hadoop 2.7 as a baseline. It's many years old, EOL/unsupported, and was never actually qualified against Java 8.
Thanks @steveloughran -- so, given that we are, for better or worse, still on Hadoop 2.7 here (because I think I need to backport this to 2.4 at least), is it safe to exclude the whole aws-java-sdk dependency? It doesn't seem so, as it would mean the user has to re-include it. But is it safe to assume they would be running this on Hadoop anyway?
Sounds like you are saying that in Hadoop 2.9, this dependency wouldn't exist or could be excluded.
So, excluding it definitely worked to solve the problem. Right now I'm seeing what happens if we explicitly manage its version up as a direct dependency, because just managing it up with <dependencyManagement> wasn't enough. The downside is probably that the assembly then brings in everything aws-java-sdk depends on, which is a lot of stuff. We don't distribute the assembly per se (right?), so it shouldn't mean more careful license checks of all those dependencies.
Still, if somehow it were fine to exclude this dependency, that's the tidiest option from Spark's perspective. Does that fly for Hadoop 2.7, or does it pretty well break the point of hadoop-cloud?
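For contrast, the "manage it up as a direct dependency" option mentioned above might look roughly like the following (again illustrative, with an assumed version property). The trade-off discussed here is that the newer monolithic SDK then pulls its own large set of transitive dependencies into the assembly:

```xml
<!-- Sketch only: declare the newer monolithic SDK directly so it wins over the
     1.7.4 version pulled in transitively, at the cost of many extra AWS JARs. -->
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <version>${aws.java.sdk.version}</version>
</dependency>
```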
+1 to excluding the AWS dependency. It is not actually something you can bundle into ASF releases anyway: https://issues.apache.org/jira/browse/HADOOP-13794. But it'd be good for a spark-hadoop-cloud artifact to be published with that dependency for downstream users, or at least to have the things you need to add documented somewhere.
FWIW, I do build and test the Spark Kinesis module as part of my AWS SDK update process; one that actually went pretty smoothly for a change last time. No regressions, no new error messages in logs, shaded JARs really are shaded, etc. This is progress, and it means backporting is something we should be doing.
See https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md#-qualifying-an-aws-sdk-update for the runbook there.
Test build #109756 has finished for PR 25559 at commit

Test build #4841 has finished for PR 25559 at commit

Test build #4843 has finished for PR 25559 at commit

Test build #4844 has finished for PR 25559 at commit
Good that this passes. Now going to try the less drastic change at #25559 (comment) to see if that works.
Test build #109826 has finished for PR 25559 at commit

Test build #109832 has finished for PR 25559 at commit

Test build #4845 has finished for PR 25559 at commit

Test build #4846 has finished for PR 25559 at commit

Test build #109867 has finished for PR 25559 at commit
srowen left a comment:
@steveloughran I'll re-run tests without my placeholder TODOs, but it won't matter (Kinesis tests won't run anyway). Looks like it passes with the exclusion. That's OK by you?
Could someone stick up the output of the mvn dependency listing here?

Base:
Test build #109879 has finished for PR 25559 at commit
@steveloughran does that seem reasonable? The above is after removing
Per #25559 (comment), I'm going to proceed to make the tests work again. If we need to make a more nuanced change, we can do so in a follow-up.
… that breaks Pyspark Kinesis tests
The Pyspark Kinesis tests are failing, at least in master:
```
======================================================================
ERROR: test_kinesis_stream (pyspark.streaming.tests.test_kinesis.KinesisStreamTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/pyspark/streaming/tests/test_kinesis.py", line 44, in test_kinesis_stream
kinesisTestUtils = self.ssc._jvm.org.apache.spark.streaming.kinesis.KinesisTestUtils(2)
File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1554, in __call__
answer, self._gateway_client, None, self._fqn)
File "/home/jenkins/workspace/SparkPullRequestBuilder2/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling None.org.apache.spark.streaming.kinesis.KinesisTestUtils.
: java.lang.NoSuchMethodError: com.amazonaws.regions.Region.getAvailableEndpoints()Ljava/util/Collection;
at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1(KinesisTestUtils.scala:211)
at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1$adapted(KinesisTestUtils.scala:211)
at scala.collection.Iterator.find(Iterator.scala:993)
at scala.collection.Iterator.find$(Iterator.scala:990)
at scala.collection.AbstractIterator.find(Iterator.scala:1429)
at scala.collection.IterableLike.find(IterableLike.scala:81)
at scala.collection.IterableLike.find$(IterableLike.scala:80)
at scala.collection.AbstractIterable.find(Iterable.scala:56)
at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:211)
at org.apache.spark.streaming.kinesis.KinesisTestUtils.<init>(KinesisTestUtils.scala:46)
...
```
The non-Python Kinesis tests are fine though. It turns out that this is because Pyspark tests use the output of the Spark assembly, and it pulls in `hadoop-cloud`, which in turn pulls in an old AWS Java SDK.
Per Steve Loughran (below), it seems like we can just resolve this by excluding the aws-java-sdk dependency. See the attached PR for some more detail about the debugging and other options.
See #25558 (comment)
Closes #25559 from srowen/KinesisTest.
Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit d5b7eed)
Signed-off-by: Sean Owen <[email protected]>
Merged to master/2.4. If this turns out to cause problems for
Thank you for this fix, @srowen!
Sorry, been in hiding. Hadoop branch-2 and the cloud stuff is trouble, as the AWS SDK support has had to chase a moving target dependency-wise.
FWIW, we use the Apache hadoop-cloud-storage POM as the source of truth for which cloud-store bits dependent apps (Spark, Hive, etc.) pick up; the Spark hadoop-cloud POM pulls that in and then gets it into the Spark releases. This provides a one-stop pipeline to get in things which aren't normally in releases (Google GCS) and leave out bits which aren't currently supported (Aliyun OSS). That generally keeps things under control, leaving only configuration settings...
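To illustrate the arrangement described above, a sketch of what a Spark hadoop-cloud POM might declare when delegating to Hadoop's aggregate cloud-storage POM (the exact coordinates and version property Spark uses may differ):

```xml
<!-- Sketch only: let Hadoop's hadoop-cloud-storage POM decide which cloud
     connectors and their dependencies are shipped. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-cloud-storage</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```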