Conversation

srowen (Member) commented Aug 22, 2019

The Pyspark Kinesis tests are failing, at least in master:

======================================================================
ERROR: test_kinesis_stream (pyspark.streaming.tests.test_kinesis.KinesisStreamTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/streaming/tests/test_kinesis.py", line 44, in test_kinesis_stream
    kinesisTestUtils = self.ssc._jvm.org.apache.spark.streaming.kinesis.KinesisTestUtils(2)
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1554, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling None.org.apache.spark.streaming.kinesis.KinesisTestUtils.
: java.lang.NoSuchMethodError: com.amazonaws.regions.Region.getAvailableEndpoints()Ljava/util/Collection;
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1(KinesisTestUtils.scala:211)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1$adapted(KinesisTestUtils.scala:211)
	at scala.collection.Iterator.find(Iterator.scala:993)
	at scala.collection.Iterator.find$(Iterator.scala:990)
	at scala.collection.AbstractIterator.find(Iterator.scala:1429)
	at scala.collection.IterableLike.find(IterableLike.scala:81)
	at scala.collection.IterableLike.find$(IterableLike.scala:80)
	at scala.collection.AbstractIterable.find(Iterable.scala:56)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:211)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils.<init>(KinesisTestUtils.scala:46)
...

The non-Python Kinesis tests are fine, though. It turns out this is because the Pyspark tests run against the output of the Spark assembly, which pulls in hadoop-cloud, which in turn pulls in an old AWS Java SDK.

Per Steve Loughran (below), it seems like we can just resolve this by excluding the aws-java-sdk dependency. See the attached PR for some more detail about the debugging and other options.

See #25558 (comment)
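
As a sketch of what that exclusion could look like in the hadoop-cloud module's pom (hypothetical: the parent artifact `hadoop-aws` is an assumption, and the actual PR may attach the exclusion elsewhere):

```xml
<!-- Hypothetical sketch: exclude the old AWS SDK that the Hadoop cloud
     dependencies drag into the assembly, so the Kinesis module's newer
     SDK is the only one on the runtime classpath. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <exclusions>
    <exclusion>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```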

srowen self-assigned this Aug 22, 2019

SparkQA commented Aug 22, 2019

Test build #109592 has finished for PR 25559 at commit 68efc6a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Aug 23, 2019

Test build #109633 has finished for PR 25559 at commit 8085b7f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

srowen (Member, Author) commented Aug 23, 2019

I'm pretty stumped on this. The NoSuchMethodError refers to a method that is definitely there. The AWS dependencies are harmonized. The Scala tests work. The dependency graph from SBT looks right. I am guessing something about how python tests work or are run is the issue, but not sure what would cause this, as it suggests a runtime AWS SDK version difference, but I don't see any other versions pulled in.
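
One way to chase a NoSuchMethodError like this is to find every JAR on the runtime classpath that bundles the offending class; a minimal sketch (the assembly path in the example is a guess at the local build layout):

```python
import glob
import os
import zipfile


def jars_containing(class_entry, jar_dir):
    """Return the JARs under jar_dir whose archives contain class_entry."""
    hits = []
    for jar in sorted(glob.glob(os.path.join(jar_dir, "*.jar"))):
        with zipfile.ZipFile(jar) as zf:
            if class_entry in zf.namelist():
                hits.append(jar)
    return hits


# e.g. (path is an assumption about the local checkout):
# jars_containing("com/amazonaws/regions/Region.class",
#                 "assembly/target/scala-2.12/jars")
```

If more than one JAR shows up, the class resolved at runtime may come from the older one, which would explain the missing method.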

sarutak (Member) commented Aug 26, 2019

The reason for the unit test failure in #24651 seems to be the same as the one you hit...
@sekikn said the unit test finished successfully on his laptop, so I guess there is something wrong in the CI environment.


SparkQA commented Aug 26, 2019

Test build #109745 has finished for PR 25559 at commit 0fbfdf7.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Aug 26, 2019

Test build #109748 has finished for PR 25559 at commit d8d0e59.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Aug 26, 2019

Test build #109751 has finished for PR 25559 at commit 361c90d.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

srowen (Member, Author) commented Aug 26, 2019

Well, I made a little breakthrough: if you build the Spark assembly, something puts aws-java-sdk-1.7.4.jar in the jars/ dir, and that has to be the source of the problem. I'm trying to figure out what. I'll ping @vanzin preemptively as a possible expert on how the assembly works, but I still need to do some homework. The JAR doesn't appear in the dependency tree.

vanzin (Contributor) commented Aug 26, 2019

That's probably the hadoop-cloud module if you have that profile enabled.

srowen (Member, Author) commented Aug 26, 2019

Bingo, that's it. Thanks! I'll work on harmonizing its dependencies with the Kinesis module.

<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
</exclusion>
<exclusion>
Member Author:

Preemptively CCing @steveloughran for a look at this. The TL;DR is that hadoop-cloud is bringing in an old aws-java-sdk dependency to the assembly, and it interferes with the Kinesis dependencies, which are newer. Excluding these is a bit extreme, but the aws-java-sdk dependency brings in like 20 other AWS JARs. I'm not clear whether that's the intent anyway.

Contributor:

Won't you break the hadoop-cloud profile by doing this?

The kinesis integration is not packaged as part of the Spark distribution (when you enable its profile), while hadoop-cloud is.

Member Author:

Yeah, this is the thing. Right now we only pull in the core aws-java-sdk JAR. If I include aws-java-sdk as an explicit dependency, it pulls in tons of other dependencies that seem irrelevant to Spark. Hm, maybe I need to use <dependencyManagement> to more narrowly manage up the version of aws-java-sdk without affecting the transitive dependency resolution? Well, if this change works, at least we are on to the cause, and then I'll try that.
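
A rough sketch of that <dependencyManagement> approach (hypothetical: the property name aws.java.sdk.version stands in for whatever harmonized version the Kinesis module actually uses):

```xml
<!-- Hypothetical sketch: pin the version chosen for the transitive
     aws-java-sdk dependency, without declaring it as a direct dependency
     (which would pull in its many transitive AWS JARs). -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk</artifactId>
      <version>${aws.java.sdk.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```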

steveloughran (Contributor) commented Aug 27, 2019

1.7.4 is a really old version; Hadoop 2.9+ uses a (fat) shaded JAR which has a consistent Kinesis SDK in it; 2.8 is on 1.10.x, I think.

Go on, move off Hadoop 2.7 as a baseline. It's many years old, EOL/unsupported, and never actually qualified against Java 8.

Member Author:

Thanks @steveloughran -- so, given that we are, for better or worse, still on Hadoop 2.7 here (because I think I need to backport this to 2.4 at least), is it safe to exclude the whole aws-java-sdk dependency? It doesn't seem so, as it would mean the user has to re-include it. But is it safe to assume they would be running this on Hadoop anyway?

Sounds like you are saying that in Hadoop 2.9, this dependency wouldn't exist or could be excluded.

So, excluding it definitely worked to solve the problem. Right now I'm seeing what happens if we explicitly manage its version up as a direct dependency, because just managing it up with <dependencyManagement> wasn't enough. The downside is probably that the assembly brings in everything aws-java-sdk depends on, which is a lot of stuff. We don't distribute the assembly per se (right?), so it shouldn't mean more careful license checks of all those dependencies.

Still, if somehow it were fine to exclude this dependency, that's the tidiest from Spark's perspective. Does that fly for Hadoop 2.7, or does it pretty well break the point of hadoop-cloud?

Contributor:

+1 to excluding the AWS dependency. It's not actually something you can bundle into ASF releases anyway: https://issues.apache.org/jira/browse/HADOOP-13794. But it'd be good for a spark-hadoop-cloud artifact to be published with that dependency for downstream users, or at least for the things you have to add to be documented somewhere.

FWIW, I do build and test the Spark Kinesis module as part of my AWS SDK update process, one that actually went pretty smoothly for a change last time. No regressions, no new error messages in logs, shaded JARs really are shaded, etc. This is progress, and means that backporting is something we should be doing.

See https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md#-qualifying-an-aws-sdk-update for the runbook there.


SparkQA commented Aug 27, 2019

Test build #109756 has finished for PR 25559 at commit e760ebb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Aug 27, 2019

Test build #4841 has finished for PR 25559 at commit e760ebb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Aug 27, 2019

Test build #4843 has finished for PR 25559 at commit e760ebb.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Aug 27, 2019

Test build #4844 has finished for PR 25559 at commit e760ebb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

srowen (Member, Author) commented Aug 27, 2019

That's good that this passes. Now going to try the less drastic change at #25559 (comment) to see if that works.


SparkQA commented Aug 27, 2019

Test build #109826 has finished for PR 25559 at commit 101c4ce.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Aug 27, 2019

Test build #109832 has finished for PR 25559 at commit 78c4e62.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Aug 28, 2019

Test build #4845 has finished for PR 25559 at commit 78c4e62.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Aug 28, 2019

Test build #4846 has finished for PR 25559 at commit 78c4e62.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Aug 28, 2019

Test build #109867 has finished for PR 25559 at commit 24bea1e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

srowen (Member, Author) left a comment

@steveloughran I'll re-run tests without my placeholder TODOs, but it won't matter (Kinesis tests won't run anyway). Looks like it passes with the exclusion. That's OK by you?

srowen changed the title [WIP][DO-NOT-MERGE] Test updating Kinesis deps and current state of Kinesis Python tests [SPARK-28903][STREAMING][PYSPARK][TESTS] Fix AWS JDK version conflict that breaks Pyspark Kinesis tests Aug 28, 2019
steveloughran (Contributor) commented:

Could someone stick up the output of the mvn dependency listing here (mvn dependency:tree -Dverbose) for the base profile and the hadoop-3.1 branch? I'll be able to review them.

srowen (Member, Author) commented Aug 28, 2019

Base:

[INFO] org.apache.spark:spark-hadoop-cloud_2.12:jar:3.0.0-SNAPSHOT
[INFO] +- org.apache.spark:spark-sql_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- com.univocity:univocity-parsers:jar:2.7.3:provided
[INFO] |  +- org.apache.spark:spark-sketch_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- org.apache.spark:spark-core_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- org.apache.spark:spark-catalyst_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  |  +- org.scala-lang.modules:scala-parser-combinators_2.12:jar:1.1.0:provided
[INFO] |  |  +- org.codehaus.janino:janino:jar:3.0.15:provided
[INFO] |  |  +- org.codehaus.janino:commons-compiler:jar:3.0.15:provided
[INFO] |  |  +- org.antlr:antlr4-runtime:jar:4.7.1:provided
[INFO] |  |  \- org.apache.arrow:arrow-vector:jar:0.12.0:provided
[INFO] |  |     +- org.apache.arrow:arrow-format:jar:0.12.0:provided
[INFO] |  |     +- org.apache.arrow:arrow-memory:jar:0.12.0:provided
[INFO] |  |     +- com.carrotsearch:hppc:jar:0.7.2:provided
[INFO] |  |     \- com.google.flatbuffers:flatbuffers-java:jar:1.9.0:provided
[INFO] |  +- org.apache.spark:spark-tags_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- org.apache.orc:orc-core:jar:nohive:1.5.5:compile
[INFO] |  |  +- org.apache.orc:orc-shims:jar:1.5.5:compile
[INFO] |  |  +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
[INFO] |  |  +- commons-lang:commons-lang:jar:2.6:compile
[INFO] |  |  \- io.airlift:aircompressor:jar:0.10:compile
[INFO] |  +- org.apache.orc:orc-mapreduce:jar:nohive:1.5.5:compile
[INFO] |  +- org.apache.parquet:parquet-column:jar:1.10.1:compile
[INFO] |  |  +- org.apache.parquet:parquet-common:jar:1.10.1:compile
[INFO] |  |  \- org.apache.parquet:parquet-encoding:jar:1.10.1:compile
[INFO] |  +- org.apache.parquet:parquet-hadoop:jar:1.10.1:compile
[INFO] |  |  +- org.apache.parquet:parquet-format:jar:2.4.0:compile
[INFO] |  |  \- org.apache.parquet:parquet-jackson:jar:1.10.1:compile
[INFO] |  \- org.apache.xbean:xbean-asm7-shaded:jar:4.14:provided
[INFO] +- org.apache.spark:spark-core_2.12:test-jar:tests:3.0.0-SNAPSHOT:test
[INFO] |  +- com.thoughtworks.paranamer:paranamer:jar:2.8:runtime
[INFO] |  +- org.apache.avro:avro:jar:1.8.2:compile
[INFO] |  |  +- org.apache.commons:commons-compress:jar:1.8.1:compile
[INFO] |  |  \- org.tukaani:xz:jar:1.5:compile
[INFO] |  +- org.apache.avro:avro-mapred:jar:hadoop2:1.8.2:compile
[INFO] |  |  \- org.apache.avro:avro-ipc:jar:1.8.2:compile
[INFO] |  +- com.twitter:chill_2.12:jar:0.9.3:provided
[INFO] |  |  \- com.esotericsoftware:kryo-shaded:jar:4.0.2:provided
[INFO] |  |     +- com.esotericsoftware:minlog:jar:1.3.0:provided
[INFO] |  |     \- org.objenesis:objenesis:jar:2.5.1:provided
[INFO] |  +- com.twitter:chill-java:jar:0.9.3:provided
[INFO] |  +- org.apache.spark:spark-launcher_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- org.apache.spark:spark-kvstore_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  |  \- org.fusesource.leveldbjni:leveldbjni-all:jar:1.8:provided
[INFO] |  +- org.apache.spark:spark-network-common_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- org.apache.spark:spark-network-shuffle_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- org.apache.spark:spark-unsafe_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- javax.activation:activation:jar:1.1.1:compile
[INFO] |  +- org.apache.curator:curator-recipes:jar:2.7.1:compile
[INFO] |  |  +- org.apache.curator:curator-framework:jar:2.7.1:compile
[INFO] |  |  \- com.google.guava:guava:jar:14.0.1:provided
[INFO] |  +- org.apache.zookeeper:zookeeper:jar:3.4.6:compile
[INFO] |  +- javax.servlet:javax.servlet-api:jar:3.1.0:provided
[INFO] |  +- org.apache.commons:commons-lang3:jar:3.8.1:compile
[INFO] |  +- org.apache.commons:commons-math3:jar:3.4.1:provided
[INFO] |  +- org.apache.commons:commons-text:jar:1.6:provided
[INFO] |  +- com.google.code.findbugs:jsr305:jar:3.0.0:provided
[INFO] |  +- org.slf4j:slf4j-api:jar:1.7.16:compile
[INFO] |  +- org.slf4j:jul-to-slf4j:jar:1.7.16:provided
[INFO] |  +- org.slf4j:jcl-over-slf4j:jar:1.7.16:provided
[INFO] |  +- log4j:log4j:jar:1.2.17:compile
[INFO] |  +- org.slf4j:slf4j-log4j12:jar:1.7.16:compile
[INFO] |  +- com.ning:compress-lzf:jar:1.0.3:provided
[INFO] |  +- org.xerial.snappy:snappy-java:jar:1.1.7.3:compile
[INFO] |  +- org.lz4:lz4-java:jar:1.6.0:provided
[INFO] |  +- com.github.luben:zstd-jni:jar:1.4.2-1:provided
[INFO] |  +- org.roaringbitmap:RoaringBitmap:jar:0.7.45:provided
[INFO] |  |  \- org.roaringbitmap:shims:jar:0.7.45:provided
[INFO] |  +- commons-net:commons-net:jar:3.1:provided
[INFO] |  +- org.scala-lang.modules:scala-xml_2.12:jar:1.2.0:provided
[INFO] |  +- org.scala-lang:scala-library:jar:2.12.8:provided
[INFO] |  +- org.scala-lang:scala-reflect:jar:2.12.8:provided
[INFO] |  +- org.json4s:json4s-jackson_2.12:jar:3.6.6:provided
[INFO] |  |  \- org.json4s:json4s-core_2.12:jar:3.6.6:provided
[INFO] |  |     +- org.json4s:json4s-ast_2.12:jar:3.6.6:provided
[INFO] |  |     \- org.json4s:json4s-scalap_2.12:jar:3.6.6:provided
[INFO] |  +- org.glassfish.jersey.core:jersey-client:jar:2.29:provided
[INFO] |  |  +- jakarta.ws.rs:jakarta.ws.rs-api:jar:2.1.5:provided
[INFO] |  |  \- org.glassfish.hk2.external:jakarta.inject:jar:2.5.0:provided
[INFO] |  +- org.glassfish.jersey.core:jersey-common:jar:2.29:provided
[INFO] |  |  +- jakarta.annotation:jakarta.annotation-api:jar:1.3.4:provided
[INFO] |  |  \- org.glassfish.hk2:osgi-resource-locator:jar:1.0.3:provided
[INFO] |  +- org.glassfish.jersey.core:jersey-server:jar:2.29:provided
[INFO] |  |  +- org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:provided
[INFO] |  |  \- javax.validation:validation-api:jar:2.0.1.Final:provided
[INFO] |  +- org.glassfish.jersey.containers:jersey-container-servlet:jar:2.29:provided
[INFO] |  +- org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.29:provided
[INFO] |  +- org.glassfish.jersey.inject:jersey-hk2:jar:2.29:provided
[INFO] |  |  \- org.glassfish.hk2:hk2-locator:jar:2.5.0:provided
[INFO] |  |     +- org.glassfish.hk2.external:aopalliance-repackaged:jar:2.5.0:provided
[INFO] |  |     +- org.glassfish.hk2:hk2-api:jar:2.5.0:provided
[INFO] |  |     +- org.glassfish.hk2:hk2-utils:jar:2.5.0:provided
[INFO] |  |     \- org.javassist:javassist:jar:3.22.0-CR2:provided
[INFO] |  +- io.netty:netty-all:jar:4.1.30.Final:provided
[INFO] |  +- com.clearspring.analytics:stream:jar:2.9.6:provided
[INFO] |  +- io.dropwizard.metrics:metrics-core:jar:3.1.5:provided
[INFO] |  +- io.dropwizard.metrics:metrics-jvm:jar:3.1.5:provided
[INFO] |  +- io.dropwizard.metrics:metrics-json:jar:3.1.5:provided
[INFO] |  +- io.dropwizard.metrics:metrics-graphite:jar:3.1.5:provided
[INFO] |  +- com.fasterxml.jackson.module:jackson-module-scala_2.12:jar:2.9.9:provided
[INFO] |  |  \- com.fasterxml.jackson.module:jackson-module-paranamer:jar:2.9.9:provided
[INFO] |  +- org.apache.ivy:ivy:jar:2.4.0:provided
[INFO] |  +- oro:oro:jar:2.0.8:provided
[INFO] |  +- net.razorvine:pyrolite:jar:4.30:provided
[INFO] |  +- net.sf.py4j:py4j:jar:0.10.8.1:provided
[INFO] |  \- org.apache.commons:commons-crypto:jar:1.0.0:provided
[INFO] +- org.apache.hadoop:hadoop-client:jar:2.7.4:provided
[INFO] |  +- org.apache.hadoop:hadoop-common:jar:2.7.4:provided
[INFO] |  |  +- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  |  +- xmlenc:xmlenc:jar:0.52:provided
[INFO] |  |  +- commons-httpclient:commons-httpclient:jar:3.1:provided
[INFO] |  |  +- commons-collections:commons-collections:jar:3.2.2:provided
[INFO] |  |  +- org.mortbay.jetty:jetty-sslengine:jar:6.1.26:provided
[INFO] |  |  +- javax.servlet.jsp:jsp-api:jar:2.1:provided
[INFO] |  |  +- commons-configuration:commons-configuration:jar:1.6:provided
[INFO] |  |  |  \- commons-digester:commons-digester:jar:1.8:provided
[INFO] |  |  |     \- commons-beanutils:commons-beanutils:jar:1.9.3:provided
[INFO] |  |  +- com.google.code.gson:gson:jar:2.2.4:provided
[INFO] |  |  +- org.apache.hadoop:hadoop-auth:jar:2.7.4:provided
[INFO] |  |  |  \- org.apache.directory.server:apacheds-kerberos-codec:jar:2.0.0-M15:provided
[INFO] |  |  |     +- org.apache.directory.server:apacheds-i18n:jar:2.0.0-M15:provided
[INFO] |  |  |     +- org.apache.directory.api:api-asn1-api:jar:1.0.0-M20:provided
[INFO] |  |  |     \- org.apache.directory.api:api-util:jar:1.0.0-M20:provided
[INFO] |  |  +- org.apache.curator:curator-client:jar:2.7.1:compile
[INFO] |  |  \- org.apache.htrace:htrace-core:jar:3.1.0-incubating:provided
[INFO] |  +- org.apache.hadoop:hadoop-hdfs:jar:2.7.4:provided
[INFO] |  |  +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
[INFO] |  |  \- xerces:xercesImpl:jar:2.9.1:provided
[INFO] |  |     \- xml-apis:xml-apis:jar:1.4.01:test
[INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.7.4:provided
[INFO] |  |  +- org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.7.4:provided
[INFO] |  |  |  +- org.apache.hadoop:hadoop-yarn-client:jar:2.7.4:compile
[INFO] |  |  |  \- org.apache.hadoop:hadoop-yarn-server-common:jar:2.7.4:provided
[INFO] |  |  \- org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.7.4:provided
[INFO] |  +- org.apache.hadoop:hadoop-yarn-api:jar:2.7.4:compile
[INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.7.4:provided
[INFO] |  |  \- org.apache.hadoop:hadoop-yarn-common:jar:2.7.4:compile
[INFO] |  |     +- javax.xml.bind:jaxb-api:jar:2.2.2:compile
[INFO] |  |     |  \- javax.xml.stream:stax-api:jar:1.0-2:compile
[INFO] |  |     +- org.codehaus.jackson:jackson-jaxrs:jar:1.9.13:compile
[INFO] |  |     \- org.codehaus.jackson:jackson-xc:jar:1.9.13:compile
[INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.7.4:provided
[INFO] |  \- org.apache.hadoop:hadoop-annotations:jar:2.7.4:provided
[INFO] +- org.apache.hadoop:hadoop-aws:jar:2.7.4:compile
[INFO] +- org.apache.hadoop:hadoop-openstack:jar:2.7.4:compile
[INFO] |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
[INFO] |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
[INFO] |  \- commons-io:commons-io:jar:2.4:compile
[INFO] +- joda-time:joda-time:jar:2.9.3:compile
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.9.9.3:compile
[INFO] |  \- com.fasterxml.jackson.core:jackson-core:jar:2.9.9:compile
[INFO] +- com.fasterxml.jackson.core:jackson-annotations:jar:2.9.9:compile
[INFO] +- com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:jar:2.9.9:compile
[INFO] +- org.apache.httpcomponents:httpclient:jar:4.5.6:compile
[INFO] |  +- commons-logging:commons-logging:jar:1.1.3:compile
[INFO] |  \- commons-codec:commons-codec:jar:1.10:compile
[INFO] +- org.apache.httpcomponents:httpcore:jar:4.4.10:compile
[INFO] +- org.apache.hadoop:hadoop-azure:jar:2.7.4:compile
[INFO] |  \- com.microsoft.azure:azure-storage:jar:2.0.0:compile
[INFO] +- org.spark-project.spark:unused:jar:1.0.0:compile
[INFO] +- org.scalatest:scalatest_2.12:jar:3.0.5:test
[INFO] |  \- org.scalactic:scalactic_2.12:jar:3.0.5:test
[INFO] +- junit:junit:jar:4.12:test
[INFO] |  \- org.hamcrest:hamcrest-core:jar:1.3:test
[INFO] \- com.novocode:junit-interface:jar:0.11:test
[INFO]    \- org.scala-sbt:test-interface:jar:1.0:test

-Phadoop-3.2:

[INFO] org.apache.spark:spark-hadoop-cloud_2.12:jar:3.0.0-SNAPSHOT
[INFO] +- org.apache.spark:spark-sql_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- com.univocity:univocity-parsers:jar:2.7.3:provided
[INFO] |  +- org.apache.spark:spark-sketch_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- org.apache.spark:spark-core_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- org.apache.spark:spark-catalyst_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  |  +- org.scala-lang.modules:scala-parser-combinators_2.12:jar:1.1.0:provided
[INFO] |  |  +- org.codehaus.janino:janino:jar:3.0.15:provided
[INFO] |  |  +- org.codehaus.janino:commons-compiler:jar:3.0.15:provided
[INFO] |  |  +- org.antlr:antlr4-runtime:jar:4.7.1:provided
[INFO] |  |  \- org.apache.arrow:arrow-vector:jar:0.12.0:provided
[INFO] |  |     +- org.apache.arrow:arrow-format:jar:0.12.0:provided
[INFO] |  |     +- org.apache.arrow:arrow-memory:jar:0.12.0:provided
[INFO] |  |     +- com.carrotsearch:hppc:jar:0.7.2:provided
[INFO] |  |     \- com.google.flatbuffers:flatbuffers-java:jar:1.9.0:provided
[INFO] |  +- org.apache.spark:spark-tags_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- org.apache.orc:orc-core:jar:nohive:1.5.5:provided
[INFO] |  |  +- org.apache.orc:orc-shims:jar:1.5.5:provided
[INFO] |  |  +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
[INFO] |  |  \- io.airlift:aircompressor:jar:0.10:provided
[INFO] |  +- org.apache.orc:orc-mapreduce:jar:nohive:1.5.5:provided
[INFO] |  +- org.apache.parquet:parquet-column:jar:1.10.1:compile
[INFO] |  |  +- org.apache.parquet:parquet-common:jar:1.10.1:compile
[INFO] |  |  \- org.apache.parquet:parquet-encoding:jar:1.10.1:compile
[INFO] |  +- org.apache.parquet:parquet-hadoop:jar:1.10.1:compile
[INFO] |  |  +- org.apache.parquet:parquet-format:jar:2.4.0:compile
[INFO] |  |  +- org.apache.parquet:parquet-jackson:jar:1.10.1:compile
[INFO] |  |  \- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
[INFO] |  \- org.apache.xbean:xbean-asm7-shaded:jar:4.14:provided
[INFO] +- org.apache.spark:spark-core_2.12:test-jar:tests:3.0.0-SNAPSHOT:test
[INFO] |  +- com.thoughtworks.paranamer:paranamer:jar:2.8:runtime
[INFO] |  +- org.apache.avro:avro:jar:1.8.2:compile
[INFO] |  |  +- org.apache.commons:commons-compress:jar:1.8.1:compile
[INFO] |  |  \- org.tukaani:xz:jar:1.5:compile
[INFO] |  +- org.apache.avro:avro-mapred:jar:hadoop2:1.8.2:compile
[INFO] |  |  \- org.apache.avro:avro-ipc:jar:1.8.2:compile
[INFO] |  +- com.twitter:chill_2.12:jar:0.9.3:provided
[INFO] |  |  \- com.esotericsoftware:kryo-shaded:jar:4.0.2:provided
[INFO] |  |     +- com.esotericsoftware:minlog:jar:1.3.0:provided
[INFO] |  |     \- org.objenesis:objenesis:jar:2.5.1:provided
[INFO] |  +- com.twitter:chill-java:jar:0.9.3:provided
[INFO] |  +- org.apache.spark:spark-launcher_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- org.apache.spark:spark-kvstore_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  |  \- org.fusesource.leveldbjni:leveldbjni-all:jar:1.8:provided
[INFO] |  +- org.apache.spark:spark-network-common_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- org.apache.spark:spark-network-shuffle_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- org.apache.spark:spark-unsafe_2.12:jar:3.0.0-SNAPSHOT:provided
[INFO] |  +- javax.activation:activation:jar:1.1.1:compile
[INFO] |  +- org.apache.curator:curator-recipes:jar:2.13.0:compile
[INFO] |  |  \- org.apache.curator:curator-framework:jar:2.13.0:compile
[INFO] |  +- org.apache.zookeeper:zookeeper:jar:3.4.13:compile
[INFO] |  |  \- org.apache.yetus:audience-annotations:jar:0.5.0:compile
[INFO] |  +- javax.servlet:javax.servlet-api:jar:3.1.0:compile
[INFO] |  +- org.apache.commons:commons-lang3:jar:3.8.1:provided
[INFO] |  +- org.apache.commons:commons-math3:jar:3.4.1:provided
[INFO] |  +- org.apache.commons:commons-text:jar:1.6:provided
[INFO] |  +- com.google.code.findbugs:jsr305:jar:3.0.0:provided
[INFO] |  +- org.slf4j:slf4j-api:jar:1.7.16:compile
[INFO] |  +- org.slf4j:jul-to-slf4j:jar:1.7.16:provided
[INFO] |  +- org.slf4j:jcl-over-slf4j:jar:1.7.16:provided
[INFO] |  +- log4j:log4j:jar:1.2.17:compile
[INFO] |  +- org.slf4j:slf4j-log4j12:jar:1.7.16:compile
[INFO] |  +- com.ning:compress-lzf:jar:1.0.3:provided
[INFO] |  +- org.xerial.snappy:snappy-java:jar:1.1.7.3:compile
[INFO] |  +- org.lz4:lz4-java:jar:1.6.0:provided
[INFO] |  +- com.github.luben:zstd-jni:jar:1.4.2-1:provided
[INFO] |  +- org.roaringbitmap:RoaringBitmap:jar:0.7.45:provided
[INFO] |  |  \- org.roaringbitmap:shims:jar:0.7.45:provided
[INFO] |  +- commons-net:commons-net:jar:3.1:provided
[INFO] |  +- org.scala-lang.modules:scala-xml_2.12:jar:1.2.0:provided
[INFO] |  +- org.scala-lang:scala-library:jar:2.12.8:provided
[INFO] |  +- org.scala-lang:scala-reflect:jar:2.12.8:provided
[INFO] |  +- org.json4s:json4s-jackson_2.12:jar:3.6.6:provided
[INFO] |  |  \- org.json4s:json4s-core_2.12:jar:3.6.6:provided
[INFO] |  |     +- org.json4s:json4s-ast_2.12:jar:3.6.6:provided
[INFO] |  |     \- org.json4s:json4s-scalap_2.12:jar:3.6.6:provided
[INFO] |  +- org.glassfish.jersey.core:jersey-client:jar:2.29:provided
[INFO] |  |  +- jakarta.ws.rs:jakarta.ws.rs-api:jar:2.1.5:provided
[INFO] |  |  \- org.glassfish.hk2.external:jakarta.inject:jar:2.5.0:provided
[INFO] |  +- org.glassfish.jersey.core:jersey-common:jar:2.29:provided
[INFO] |  |  +- jakarta.annotation:jakarta.annotation-api:jar:1.3.4:provided
[INFO] |  |  \- org.glassfish.hk2:osgi-resource-locator:jar:1.0.3:provided
[INFO] |  +- org.glassfish.jersey.core:jersey-server:jar:2.29:provided
[INFO] |  |  +- org.glassfish.jersey.media:jersey-media-jaxb:jar:2.29:provided
[INFO] |  |  \- javax.validation:validation-api:jar:2.0.1.Final:provided
[INFO] |  +- org.glassfish.jersey.containers:jersey-container-servlet:jar:2.29:provided
[INFO] |  +- org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.29:provided
[INFO] |  +- org.glassfish.jersey.inject:jersey-hk2:jar:2.29:provided
[INFO] |  |  \- org.glassfish.hk2:hk2-locator:jar:2.5.0:provided
[INFO] |  |     +- org.glassfish.hk2.external:aopalliance-repackaged:jar:2.5.0:provided
[INFO] |  |     +- org.glassfish.hk2:hk2-api:jar:2.5.0:provided
[INFO] |  |     +- org.glassfish.hk2:hk2-utils:jar:2.5.0:provided
[INFO] |  |     \- org.javassist:javassist:jar:3.22.0-CR2:provided
[INFO] |  +- io.netty:netty-all:jar:4.1.30.Final:provided
[INFO] |  +- com.clearspring.analytics:stream:jar:2.9.6:provided
[INFO] |  +- io.dropwizard.metrics:metrics-core:jar:3.1.5:provided
[INFO] |  +- io.dropwizard.metrics:metrics-jvm:jar:3.1.5:provided
[INFO] |  +- io.dropwizard.metrics:metrics-json:jar:3.1.5:provided
[INFO] |  +- io.dropwizard.metrics:metrics-graphite:jar:3.1.5:provided
[INFO] |  +- com.fasterxml.jackson.module:jackson-module-scala_2.12:jar:2.9.9:provided
[INFO] |  |  \- com.fasterxml.jackson.module:jackson-module-paranamer:jar:2.9.9:provided
[INFO] |  +- org.apache.ivy:ivy:jar:2.4.0:provided
[INFO] |  +- oro:oro:jar:2.0.8:provided
[INFO] |  +- net.razorvine:pyrolite:jar:4.30:provided
[INFO] |  +- net.sf.py4j:py4j:jar:0.10.8.1:provided
[INFO] |  \- org.apache.commons:commons-crypto:jar:1.0.0:provided
[INFO] +- org.apache.hadoop:hadoop-client:jar:3.2.0:provided
[INFO] |  +- org.apache.hadoop:hadoop-common:jar:3.2.0:provided
[INFO] |  |  +- com.google.guava:guava:jar:14.0.1:provided
[INFO] |  |  +- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  |  +- commons-io:commons-io:jar:2.4:compile
[INFO] |  |  +- commons-collections:commons-collections:jar:3.2.2:provided
[INFO] |  |  +- org.eclipse.jetty:jetty-servlet:jar:9.4.18.v20190429:provided
[INFO] |  |  |  \- org.eclipse.jetty:jetty-security:jar:9.4.18.v20190429:provided
[INFO] |  |  +- org.eclipse.jetty:jetty-webapp:jar:9.3.24.v20180605:provided
[INFO] |  |  |  \- org.eclipse.jetty:jetty-xml:jar:9.3.24.v20180605:provided
[INFO] |  |  +- javax.servlet.jsp:jsp-api:jar:2.1:provided
[INFO] |  |  +- commons-beanutils:commons-beanutils:jar:1.9.3:provided
[INFO] |  |  +- org.apache.commons:commons-configuration2:jar:2.1.1:provided
[INFO] |  |  +- com.google.re2j:re2j:jar:1.1:provided
[INFO] |  |  +- com.google.code.gson:gson:jar:2.2.4:provided
[INFO] |  |  +- org.apache.hadoop:hadoop-auth:jar:3.2.0:compile
[INFO] |  |  |  +- com.nimbusds:nimbus-jose-jwt:jar:4.41.1:compile
[INFO] |  |  |  |  \- com.github.stephenc.jcip:jcip-annotations:jar:1.0-1:compile
[INFO] |  |  |  \- net.minidev:json-smart:jar:2.3:compile
[INFO] |  |  |     \- net.minidev:accessors-smart:jar:1.2:compile
[INFO] |  |  +- org.apache.curator:curator-client:jar:2.13.0:compile
[INFO] |  |  +- org.apache.htrace:htrace-core4:jar:4.1.0-incubating:provided
[INFO] |  |  +- org.apache.kerby:kerb-simplekdc:jar:1.0.1:compile
[INFO] |  |  |  +- org.apache.kerby:kerb-client:jar:1.0.1:compile
[INFO] |  |  |  |  +- org.apache.kerby:kerby-config:jar:1.0.1:compile
[INFO] |  |  |  |  +- org.apache.kerby:kerb-core:jar:1.0.1:compile
[INFO] |  |  |  |  |  \- org.apache.kerby:kerby-pkix:jar:1.0.1:compile
[INFO] |  |  |  |  |     +- org.apache.kerby:kerby-asn1:jar:1.0.1:compile
[INFO] |  |  |  |  |     \- org.apache.kerby:kerby-util:jar:1.0.1:compile
[INFO] |  |  |  |  +- org.apache.kerby:kerb-common:jar:1.0.1:compile
[INFO] |  |  |  |  |  \- org.apache.kerby:kerb-crypto:jar:1.0.1:compile
[INFO] |  |  |  |  +- org.apache.kerby:kerb-util:jar:1.0.1:compile
[INFO] |  |  |  |  \- org.apache.kerby:token-provider:jar:1.0.1:compile
[INFO] |  |  |  \- org.apache.kerby:kerb-admin:jar:1.0.1:compile
[INFO] |  |  |     +- org.apache.kerby:kerb-server:jar:1.0.1:compile
[INFO] |  |  |     |  \- org.apache.kerby:kerb-identity:jar:1.0.1:compile
[INFO] |  |  |     \- org.apache.kerby:kerby-xdr:jar:1.0.1:compile
[INFO] |  |  +- org.codehaus.woodstox:stax2-api:jar:3.1.4:provided
[INFO] |  |  +- com.fasterxml.woodstox:woodstox-core:jar:5.0.3:provided
[INFO] |  |  \- dnsjava:dnsjava:jar:2.1.7:provided
[INFO] |  +- org.apache.hadoop:hadoop-hdfs-client:jar:3.2.0:compile
[INFO] |  |  \- com.squareup.okhttp:okhttp:jar:2.7.5:compile
[INFO] |  |     \- com.squareup.okio:okio:jar:1.6.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-yarn-api:jar:3.2.0:compile
[INFO] |  |  \- javax.xml.bind:jaxb-api:jar:2.2.11:compile
[INFO] |  +- org.apache.hadoop:hadoop-yarn-client:jar:3.2.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.2.0:provided
[INFO] |  |  \- org.apache.hadoop:hadoop-yarn-common:jar:3.2.0:compile
[INFO] |  |     +- com.fasterxml.jackson.module:jackson-module-jaxb-annotations:jar:2.9.9:compile
[INFO] |  |     \- com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider:jar:2.9.5:compile
[INFO] |  |        \- com.fasterxml.jackson.jaxrs:jackson-jaxrs-base:jar:2.9.5:compile
[INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:3.2.0:provided
[INFO] |  |  \- org.apache.hadoop:hadoop-mapreduce-client-common:jar:3.2.0:provided
[INFO] |  \- org.apache.hadoop:hadoop-annotations:jar:3.2.0:compile
[INFO] +- org.apache.hadoop:hadoop-aws:jar:3.2.0:compile
[INFO] |  \- com.amazonaws:aws-java-sdk-bundle:jar:1.11.375:compile
[INFO] +- org.apache.hadoop:hadoop-openstack:jar:3.2.0:compile
[INFO] +- joda-time:joda-time:jar:2.9.3:compile
[INFO] +- com.fasterxml.jackson.core:jackson-databind:jar:2.9.9.3:compile
[INFO] |  \- com.fasterxml.jackson.core:jackson-core:jar:2.9.9:compile
[INFO] +- com.fasterxml.jackson.core:jackson-annotations:jar:2.9.9:compile
[INFO] +- com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:jar:2.9.9:compile
[INFO] +- org.apache.httpcomponents:httpclient:jar:4.5.6:compile
[INFO] |  +- commons-logging:commons-logging:jar:1.1.3:compile
[INFO] |  \- commons-codec:commons-codec:jar:1.10:compile
[INFO] +- org.apache.httpcomponents:httpcore:jar:4.4.10:compile
[INFO] +- org.apache.hadoop:hadoop-azure:jar:3.2.0:compile
[INFO] |  +- com.microsoft.azure:azure-storage:jar:7.0.0:compile
[INFO] |  |  \- com.microsoft.azure:azure-keyvault-core:jar:1.0.0:compile
[INFO] |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
[INFO] |  \- org.wildfly.openssl:wildfly-openssl:jar:1.0.4.Final:compile
[INFO] +- org.apache.hadoop:hadoop-cloud-storage:jar:3.2.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-aliyun:jar:3.2.0:compile
[INFO] |  |  \- com.aliyun.oss:aliyun-sdk-oss:jar:2.8.3:compile
[INFO] |  |     \- org.jdom:jdom:jar:1.1:compile
[INFO] |  \- org.apache.hadoop:hadoop-azure-datalake:jar:3.2.0:compile
[INFO] |     \- com.microsoft.azure:azure-data-lake-store-sdk:jar:2.2.9:compile
[INFO] +- org.eclipse.jetty:jetty-util:jar:9.4.18.v20190429:compile
[INFO] +- org.eclipse.jetty:jetty-util-ajax:jar:9.4.18.v20190429:compile
[INFO] +- org.spark-project.spark:unused:jar:1.0.0:compile
[INFO] +- org.scalatest:scalatest_2.12:jar:3.0.5:test
[INFO] |  \- org.scalactic:scalactic_2.12:jar:3.0.5:test
[INFO] +- junit:junit:jar:4.12:test
[INFO] |  \- org.hamcrest:hamcrest-core:jar:1.3:test
[INFO] +- com.novocode:junit-interface:jar:0.11:test
[INFO] |  \- org.scala-sbt:test-interface:jar:1.0:test
[INFO] \- org.apache.hive:hive-storage-api:jar:2.6.0:compile
[INFO]    \- commons-lang:commons-lang:jar:2.6:compile

@SparkQA

SparkQA commented Aug 28, 2019

Test build #109879 has finished for PR 25559 at commit 36830b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member Author

srowen commented Aug 29, 2019

@steveloughran does that seem reasonable? The above is after removing aws-java-sdk per your suggestion. That definitely fixes the issue, and it would be good to get these tests back online. However, I don't want to fundamentally break the purpose of hadoop-cloud. Would a user of this be running on Hadoop and already have this stuff on the classpath?
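
As a sketch of what such an exclusion looks like in a Maven POM (coordinates here are illustrative, taken from the dependency tree above; the actual change should be checked against the hadoop-cloud POM in the PR):

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>3.2.0</version>
  <exclusions>
    <!-- Keep the old bundled AWS SDK off the assembly classpath so it
         cannot shadow the newer SDK the Kinesis tests expect. -->
    <exclusion>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk-bundle</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```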

@srowen
Member Author

srowen commented Aug 30, 2019

Per #25559 (comment) I'm going to proceed, to make the tests work again. If we need to make a more nuanced change we can do so in a follow-up.

@srowen srowen closed this in d5b7eed Aug 31, 2019
srowen added a commit that referenced this pull request Aug 31, 2019
… that breaks Pyspark Kinesis tests

The Pyspark Kinesis tests are failing, at least in master:

```
======================================================================
ERROR: test_kinesis_stream (pyspark.streaming.tests.test_kinesis.KinesisStreamTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/streaming/tests/test_kinesis.py", line 44, in test_kinesis_stream
    kinesisTestUtils = self.ssc._jvm.org.apache.spark.streaming.kinesis.KinesisTestUtils(2)
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1554, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling None.org.apache.spark.streaming.kinesis.KinesisTestUtils.
: java.lang.NoSuchMethodError: com.amazonaws.regions.Region.getAvailableEndpoints()Ljava/util/Collection;
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1(KinesisTestUtils.scala:211)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1$adapted(KinesisTestUtils.scala:211)
	at scala.collection.Iterator.find(Iterator.scala:993)
	at scala.collection.Iterator.find$(Iterator.scala:990)
	at scala.collection.AbstractIterator.find(Iterator.scala:1429)
	at scala.collection.IterableLike.find(IterableLike.scala:81)
	at scala.collection.IterableLike.find$(IterableLike.scala:80)
	at scala.collection.AbstractIterable.find(Iterable.scala:56)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:211)
	at org.apache.spark.streaming.kinesis.KinesisTestUtils.<init>(KinesisTestUtils.scala:46)
...
```

The non-Python Kinesis tests are fine though. It turns out that this is because Pyspark tests use the output of the Spark assembly, and it pulls in `hadoop-cloud`, which in turn pulls in an old AWS Java SDK.

Per Steve Loughran (below), it seems like we can just resolve this by excluding the aws-java-sdk dependency. See the attached PR for some more detail about the debugging and other options.

See #25558 (comment)

Closes #25559 from srowen/KinesisTest.

Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit d5b7eed)
Signed-off-by: Sean Owen <[email protected]>
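
The failure mode in the commit message above — an old AWS SDK pulled into the assembly shadowing the version the tests expect — can be diagnosed by checking which jars on the classpath bundle a given class. A minimal, hypothetical helper (not part of this PR; the function name and usage are illustrative):

```python
import os
import zipfile


def find_class_in_jars(class_name, jar_dir):
    """Return paths of jars under jar_dir that bundle the given class.

    Useful for spotting classpath shadowing: if more than one jar
    contains e.g. com.amazonaws.regions.Region, the JVM loads whichever
    comes first on the classpath, which can surface later as a
    NoSuchMethodError like the one in the Kinesis test failure.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for root, _dirs, files in os.walk(jar_dir):
        for name in files:
            if name.endswith(".jar"):
                path = os.path.join(root, name)
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        hits.append(path)
    return sorted(hits)
```

Pointing this at the assembly's jars directory would list every jar providing `com.amazonaws.regions.Region`; more than one hit signals a possible version conflict.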
@srowen
Member Author

srowen commented Aug 31, 2019

Merged to master/2.4. If this turns out to cause problems for hadoop-cloud's usage, we can look at another solution, like instead including this (and all its transitive dependencies) as a direct dependency and harmonizing the version. I wasn't clear whether that was necessary given how it's used, or whether it's even OK to push the SDK version much higher for 2.7. Sounds like it's kind of a non-issue in 2.9+.

@dongjoon-hyun
Member

Thank you for this fix, @srowen!

@srowen srowen deleted the KinesisTest branch September 3, 2019 20:11
@steveloughran
Contributor

Sorry, been in hiding. Hadoop branch-2 and cloud stuff is trouble, as the AWS SDK support has had to chase a moving target dependency-wise.

  • If the hadoop-cloud-storage pom/jar isn't consistent in branch-2, then, well, so be it. It's time to move off Hadoop 2.x anyway.
  • I do want it to work on Hadoop 3.2; if that's no longer working, then that can be fixed. It does include the full shaded AWS SDK, so it's unlikely that you'd get inconsistencies across modules; it has also made upgrading that SDK easier, especially as the SDK classes haven't made any incompatible changes for a while (mostly).

FWIW, we use the Apache hadoop-cloud-storage POM as the source of truth for which cloud store bits dependent apps (Spark, Hive, etc.) pick up; the Spark hadoop-cloud POM pulls that in and then gets it into the Spark releases. This provides a one-stop pipeline to get in things which aren't normally in releases (Google GCS) and leave out bits which aren't currently supported (Aliyun OSS). That generally keeps things under control, leaving only configuration settings...

rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019