[SPARK-18057][SS] Update Kafka client version from 0.10.0.1 to 2.0.0 #21488
Test build #91429 has finished for PR 21488 at commit
zkUtils = ZkUtils(s"$zkHost:$zkPort", zkSessionTimeout, zkConnectionTimeout, false)
zkUtils = ZkUtils(zkSvr, zkSessionTimeout, zkConnectionTimeout, false)
zkClient = KafkaZkClient(zkSvr, false, 6000, 10000, Int.MaxValue, Time.SYSTEM)
adminZkClient = new AdminZkClient(zkClient)
Can we use the Java AdminClient instead of these internal classes?
AdminClient is abstract.
KafkaAdminClient doesn't provide addPartitions.
Mind giving some pointers?
AdminClient.create gives you a concrete instance. createPartitions is the method you're looking for.
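For reference, a minimal sketch of that approach; the broker address and topic name here are hypothetical, for illustration only:

```scala
import java.util.Properties

import scala.collection.JavaConverters._

import org.apache.kafka.clients.admin.{AdminClient, NewPartitions}

// Hypothetical connection settings, for illustration only.
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")

val adminClient = AdminClient.create(props) // concrete instance
try {
  // Grow "my-topic" to 10 partitions; all().get() blocks until the
  // broker confirms the change.
  val request = Map("my-topic" -> NewPartitions.increaseTo(10)).asJava
  adminClient.createPartitions(request).all().get()
} finally {
  adminClient.close()
}
```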
Test build #91431 has finished for PR 21488 at commit
Test build #91432 has finished for PR 21488 at commit
val existingAssignment = zkClient.getReplicaAssignmentForTopics(
  collection.immutable.Set(topic)).map {
  case (topicPartition, replicas) => topicPartition.partition -> replicas
}
We can get replica assignment information via AdminClient too. I think we should try to avoid the internal ZkUtils and KafkaZkClient as much as we can.
+1
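For reference, a minimal sketch of reading the same assignment through the public admin API, reusing the hypothetical `adminClient` and topic from the sketch above:

```scala
import scala.collection.JavaConverters._

// Reusing the hypothetical `adminClient` from the earlier sketch.
val topic = "my-topic"
val description = adminClient
  .describeTopics(Seq(topic).asJava)
  .all().get()        // java.util.Map[String, TopicDescription]
  .get(topic)

// partition id -> replica broker ids, analogous to what
// getReplicaAssignmentForTopics returns.
val existingAssignment = description.partitions().asScala.map { info =>
  info.partition() -> info.replicas().asScala.map(_.id())
}.toMap
```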
Test build #91485 has finished for PR 21488 at commit
Test build #91487 has finished for PR 21488 at commit
KafkaMicroBatchSourceSuite hangs with the current PR. Looking at KafkaOffsetReader.scala, I see a call to KafkaConsumer.poll(0).
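For context on why the poll behaviour matters here: KIP-266, which this upgrade pulls in, added a KafkaConsumer.poll(Duration) overload whose timeout bounds the whole call, including metadata fetches, whereas the older poll(long) can block indefinitely waiting for metadata. A minimal sketch contrasting the two overloads, with connection settings and topic name made up for illustration:

```scala
import java.time.{Duration => JDuration}
import java.util.{Collections, Properties}

import org.apache.kafka.clients.consumer.KafkaConsumer

// Hypothetical connection settings and topic, for illustration only.
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("group.id", "spark-kafka-source-test")
props.put("key.deserializer",
  "org.apache.kafka.common.serialization.ByteArrayDeserializer")
props.put("value.deserializer",
  "org.apache.kafka.common.serialization.ByteArrayDeserializer")

val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
consumer.subscribe(Collections.singletonList("my-topic"))

// Pre-KIP-266 overload: blocks until cluster metadata is available, even
// with a zero timeout, so callers see their assignment after it returns.
consumer.poll(0L)

// KIP-266 overload: the timeout bounds the entire call, metadata fetches
// included, so this may return before any partitions are assigned; code
// relying on poll(0) side effects has to account for that.
consumer.poll(JDuration.ZERO)
```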
I tried the following change but didn't seem to get more output from Kafka:
external/kafka-0-10-sql/pom.xml (outdated)
<kafka.version>2.0.0-SNAPSHOT</kafka.version>
</properties>
<packaging>jar</packaging>
<name>Kafka 0.10 Source for Structured Streaming</name>
We should change this line to reflect the change too.
Test build #91508 has finished for PR 21488 at commit
There is only target/surefire-reports/TEST-org.apache.spark.sql.kafka010.KafkaMicroBatchV2SourceSuite.xml under target/surefire-reports. That file doesn't contain the test log.
Test build #91529 has finished for PR 21488 at commit
|
|
Test build #91587 has finished for PR 21488 at commit
|
|
Made some progress in testing.
Test build #91589 has finished for PR 21488 at commit
|
|
Located the test output. Still need to find out the cause of the assertion failure.
Any luck getting to the bottom of the issue? It would be great to include this in the next version of Spark.
@tedyu could you please just bump to 1.1.0, the current official latest release from Apache Kafka?
external/kafka-0-10-sql/pom.xml (outdated)
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
Where does this come from? Or can it be just a test dependency?
assert(Thread.currentThread().isInstanceOf[UninterruptibleThread])
// Poll to get the latest assigned partitions
consumer.poll(0)
consumer.poll(JDuration.ofMillis(0))
Could you revert these changes? We don't use java.time.Duration in Spark.
Depending on the Kafka release we agree upon, I can revert.
Duration is the recommended API in the 2.0.0 release.
@tedyu just realized this is ofMillis rather than toMillis. We definitely cannot use it as this poll overload doesn't exist in previous versions and we want to support Kafka versions from 0.10 to 2.0.
@zsxwing Why do you want to support Kafka clients jars from 0.10 to 2.0? Since newer clients jars support older brokers, we recommend people use the latest Kafka clients jar whenever possible.
That's a good point. However, supporting all these versions is pretty cheap for Spark right now. Spark is using only APIs in 0.10. In addition, if the Kafka client version we pick up here has some critical issue, the user can just switch to an old version.
Ryan: Once Kafka 2.0.0 is released, I will incorporate the above.
Test build #93736 has finished for PR 21488 at commit
Doesn't seem to be related to the PR.
retest this please
Test build #93745 has finished for PR 21488 at commit
The Kafka 2.0.0 release is publicly available: https://github.com/apache/kafka/releases. Can we finish this before the code freeze?
@zsxwing
I might have missed this in the shuffle here: is this fully compatible with 0.10.x brokers too?
Yep. @ijuma could you confirm it?
Yes, Kafka clients 0.10.2 and higher support brokers from 0.10.0 and higher, as long as the protocols being used are available in that version. Since we only changed test code in this PR, upgrading to 2.0.0 is safe. If Spark were to use the AdminClient to create topics, then the brokers would have to be 0.10.1 or higher. Other admin protocols, like config management, were added in more recent versions, and so on.
@ijuma this changes the version of
@srowen I meant that the library is compatible, so if you just change the version in the pom, it's fine. If you changed the code to use some methods in AdminClient, then you'd have to be more careful. Does that make sense? The major version bump is because we now require Java 8 (which I believe you require too) and the Scala clients were removed.
Yes, and that then impacts programs that depend on this module to use Kafka 0.10+ from Spark. They'd have to update. Yes, Spark requires Java 8. You're saying that although the (two) major version bumps look scary, there was little change in the client library itself?
@srowen just to be clear, In addition, even if this upgrade has some unexpected issues, the user can still switch back to an old
Yeah, the Java client libraries have evolved in a compatible manner for the most part since 0.10.0. The set of broker versions supported by 0.10.0 and 2.0.0 is exactly the same. The consumer/producer API has been enriched (transactions, idempotent producer, offsetsForTimes), but the existing methods have been kept. A small number of deprecated, but rarely used, methods have been removed (not in KafkaProducer or KafkaConsumer though). In 0.10.1, a heartbeat thread was added to the Java consumer; this is helpful to users who could not call
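As a side note, a minimal sketch of the offsetsForTimes API mentioned above (added in 0.10.1), reusing the hypothetical `consumer` and topic from the earlier sketch:

```scala
import java.util.Collections

import org.apache.kafka.common.TopicPartition

// Reusing the hypothetical `consumer` from the earlier sketch; the
// timestamp is epoch milliseconds and the partition is assumed assigned.
val tp = new TopicPartition("my-topic", 0)
val query = Collections.singletonMap(tp, java.lang.Long.valueOf(1533081600000L))

// The map value is null for a partition with no message at or after the
// requested timestamp.
val offsetAndTs = consumer.offsetsForTimes(query).get(tp)
if (offsetAndTs != null) {
  // Seek to the first offset whose timestamp is >= the requested one.
  consumer.seek(tp, offsetAndTs.offset())
}
```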
@zsxwing
@ijuma I see. I was looking at the 0.10 jar. Thanks for correcting me.
Anyway, overall I think you should definitely make this change. Spark users are currently penalised heavily when running on clusters with the message format introduced in 0.11.0, which has important resiliency improvements (e.g. KIP-101). And, as @zsxwing said, people can choose an older version if necessary (I can't think of a reason why; the reason we have focused on making client jars compatible is so that people can just use the latest independently of broker versions).
Yes, good argument. Having been burned in the past (actually by Kafka and ZK changes, though that's in the past), I'm aware that even compiling against version B instead of A, with no library code changes, may still mean the library no longer runs against version A. Changing exception signatures is also sometimes a problem. It may not be the case here, and it's unlikely, but I do want to think this through fully. Yes, differences like apache/kafka@a4c2921 are what I am worried about. Still, I have no concrete example of this type of problem in the wild.
@srowen May I read your comment as "no objections"? The current PR looks good to me. If you don't have objections, I will go ahead and merge it.
Yes, no objections. There's no concrete problem, and there is upside to making the change. I think you're aware of the potential issues, and have thought through them in the context of deeper knowledge of Kafka's APIs than I have.
Thanks! Merging to master.
It seems this commit causes the following test failure:
[info] KafkaSourceStressSuite:
[info] - stress test with multiple topics and partitions (19 seconds, 255 milliseconds)
[info] KafkaSourceStressForDontFailOnDataLossSuite:
[info] - stress test for failOnDataLoss=false *** FAILED *** (45 seconds, 648 milliseconds)
[info] java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.KafkaStorageException: Disk error when trying to access log file on the disk.
[info] at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
[info] at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:77)
[info] at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
[info] at org.apache.spark.sql.kafka010.KafkaTestUtils$$anonfun$2.apply(KafkaTestUtils.scala:254)
[info] at org.apache.spark.sql.kafka010.KafkaTestUtils$$anonfun$2.apply(KafkaTestUtils.scala:248)
[info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[info] at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[info] at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
[info] at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
[info] at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
[info] at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
[info] at org.apache.spark.sql.kafka010.KafkaTestUtils.sendMessages(KafkaTestUtils.scala:248)
[info] at org.apache.spark.sql.kafka010.KafkaTestUtils.sendMessages(KafkaTestUtils.scala:238)
[info] at org.apache.spark.sql.kafka010.KafkaSourceStressForDontFailOnDataLossSuite$$anonfun$48$$anonfun$apply$26$$anonfun$apply$27.apply(KafkaMicroBatchSourceSuite.scala:1268)
[info] at org.apache.spark.sql.kafka010.KafkaSourceStressForDontFailOnDataLossSuite$$anonfun$48$$anonfun$apply$26$$anonfun$apply$27.apply(KafkaMicroBatchSourceSuite.scala:1267)
[info] at scala.collection.immutable.Range.foreach(Range.scala:160)
[info] at org.apache.spark.sql.kafka010.KafkaSourceStressForDontFailOnDataLossSuite$$anonfun$48$$anonfun$apply$26.apply(KafkaMicroBatchSourceSuite.scala:1267)
[info] at org.apache.spark.sql.kafka010.KafkaSourceStressForDontFailOnDataLossSuite$$anonfun$48$$anonfun$apply$26.apply(KafkaMicroBatchSourceSuite.scala:1265)
[info] at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
[info] at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
[info] at org.apache.spark.sql.kafka010.KafkaSourceStressForDontFailOnDataLossSuite$$anonfun$48.apply(KafkaMicroBatchSourceSuite.scala:1265)
[info] at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info] at org.scalatest.Transformer.apply(Transformer.scala:22)
[info] at org.scalatest.Transformer.apply(Transformer.scala:20)
[info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
[info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:103)
[info] at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
[info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
[info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
[info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
[info] at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
[info] at org.apache.spark.sql.kafka010.KafkaSourceStressForDontFailOnDataLossSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(KafkaMicroBatchSourceSuite.scala:1155)
[info] at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:221)
[info] at org.apache.spark.sql.kafka010.KafkaSourceStressForDontFailOnDataLossSuite.runTest(KafkaMicroBatchSourceSuite.scala:1155)
[info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
[info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
[info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
[info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
[info] at scala.collection.immutable.List.foreach(List.scala:392)
[info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
[info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
[info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
[info] at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
[info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
[info] at org.scalatest.Suite$class.run(Suite.scala:1147)
[info] at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
[info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
[info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
[info] at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
[info] at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
[info] at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:52)
[info] at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
[info] at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
[info] at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:52)
[info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314)
[info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480)
[info] at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[info] at sbt.ForkMain$Run$2.call(ForkMain.java:286)
[info] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info] at java.lang.Thread.run(Thread.java:748)
[info] Cause: org.apache.kafka.common.errors.KafkaStorageException: Disk error when trying to access log file on the disk.
Exception in thread "Thread-9" java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2954)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
at org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1$React.react(Framework.scala:809)
at org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1.run(Framework.scala:798)
at java.lang.Thread.run(Thread.java:748)

How to reproduce: build/sbt -Phadoop-2.6 "sql-kafka-0-10/testOnly"
I used the following command and the test passed:
mvn test -Phadoop-2.6 -Pyarn -Phive -Dtest=KafkaMicroBatchSourceSuite -rf external/kafka-0-10-sql
Please take a look at the 'Disk error' message and see if it was related to the test failure.
Hm, looking at the dashboard, I don't see this failure consistently in the master tests: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/
Let's keep a close eye on it to see if it repeats. It might have been something transient.
@wangyum, can you please file a Kafka JIRA with details of what the test is doing (even if the failure is transient)? From the stacktrace, it looks like a potential broker issue (assuming there are no real disk issues where these tests were executed). If there is indeed a new issue (we have to verify, since the test failure seems to be transient), it would likely only affect tests.
Ack, I missed something here: there's an override of kafka.version for Scala 2.12, from when it had to be bumped up to work with 2.12. That no longer works when compiling with 2.12. I'll submit a follow-up.
What changes were proposed in this pull request?
This PR upgrades to the Kafka 2.0.0 release, where KIP-266 is integrated.
How was this patch tested?
This PR uses existing Kafka-related unit tests.