
Conversation

@tedyu (Contributor) commented Jun 3, 2018

What changes were proposed in this pull request?

This PR upgrades to the Kafka 2.0.0 release where KIP-266 is integrated.

How was this patch tested?

This PR relies on the existing Kafka-related unit tests.


@SparkQA commented Jun 3, 2018

Test build #91429 has finished for PR 21488 at commit 0a22686.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

zkUtils = ZkUtils(s"$zkHost:$zkPort", zkSessionTimeout, zkConnectionTimeout, false)
zkUtils = ZkUtils(zkSvr, zkSessionTimeout, zkConnectionTimeout, false)
zkClient = KafkaZkClient(zkSvr, false, 6000, 10000, Int.MaxValue, Time.SYSTEM)
adminZkClient = new AdminZkClient(zkClient)
Member:

Can we use the Java AdminClient instead of these internal classes?

Contributor Author:

AdminClient is abstract.
KafkaAdminClient doesn't provide addPartitions.

Mind giving some pointers?

Member:

AdminClient.create gives you a concrete instance. createPartitions is the method you're looking for.
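
For reference, a minimal sketch of what that could look like against the public Java AdminClient in 2.0.0 (the broker address, topic name, and partition count below are placeholders, not values from this PR):

import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.admin.{AdminClient, NewPartitions}

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")  // placeholder broker address
val admin = AdminClient.create(props)
try {
  // Grow "my-topic" to 10 partitions in total (the argument is the new total, not a delta).
  admin.createPartitions(Map("my-topic" -> NewPartitions.increaseTo(10)).asJava).all().get()
} finally {
  admin.close()
}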

@SparkQA commented Jun 3, 2018

Test build #91431 has finished for PR 21488 at commit f76da89.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 3, 2018

Test build #91432 has finished for PR 21488 at commit 062c6d0.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val existingAssignment = zkClient.getReplicaAssignmentForTopics(
collection.immutable.Set(topic)).map {
case (topicPartition, replicas) => topicPartition.partition -> replicas
}
Member:

We can get replica assignment information via AdminClient too. I think we should try to avoid the internal ZkUtils and KafkaZkClient as much as we can.


+1
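
For illustration, a rough sketch of reading the same assignment through the public AdminClient instead of KafkaZkClient (the topic name is a placeholder, and admin is assumed to be an already-configured AdminClient as in the earlier sketch):

import scala.collection.JavaConverters._

val description = admin.describeTopics(Seq("my-topic").asJava).all().get().get("my-topic")
val existingAssignment = description.partitions().asScala.map { info =>
  info.partition() -> info.replicas().asScala.map(_.id())
}.toMap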

@SparkQA commented Jun 5, 2018

Test build #91485 has finished for PR 21488 at commit 48f5698.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 5, 2018

Test build #91487 has finished for PR 21488 at commit 758378e.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tedyu (Contributor Author) commented Jun 5, 2018

KafkaMicroBatchSourceSuite hangs with the current PR. I saw the following in its stack trace:

"stream execution thread for [id = ab211e05-02f1-4e17-91c6-294b8ab4400f, runId = 882c5af4-0497-47e4-9abe-1fc947822c43]" #108 daemon prio=5 os_prio=0 tid=0x00007f7e72828000 nid=0x10ce    runnable [0x00007f7de6dfb000]
   java.lang.Thread.State: RUNNABLE
  at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
  at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
  at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
  at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
  - locked <0x0000000792ef55f8> (a sun.nio.ch.Util$3)
  - locked <0x0000000792ef55e8> (a java.util.Collections$UnmodifiableSet)
  - locked <0x0000000792ef54d0> (a sun.nio.ch.EPollSelectorImpl)
  at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
  at org.apache.kafka.common.network.Selector.select(Selector.java:686)
  at org.apache.kafka.common.network.Selector.poll(Selector.java:408)
  at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:478)
  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:265)
  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:227)
  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:155)
  at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:228)
  - locked <0x0000000793232fa0> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
  at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:310)
  at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1215)
  at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1178)
  at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1112)
  at org.apache.spark.sql.kafka010.KafkaOffsetReader$$anonfun$fetchLatestOffsets$1$$anonfun$apply$9.apply(KafkaOffsetReader.scala:199)
  at org.apache.spark.sql.kafka010.KafkaOffsetReader$$anonfun$fetchLatestOffsets$1$$anonfun$apply$9.apply(KafkaOffsetReader.scala:197)
  at org.apache.spark.sql.kafka010.KafkaOffsetReader$$anonfun$org$apache$spark$sql$kafka010$KafkaOffsetReader$$withRetriesWithoutInterrupt$1.apply$mcV$sp(KafkaOffsetReader.scala:288)
  at org.apache.spark.sql.kafka010.KafkaOffsetReader$$anonfun$org$apache$spark$sql$kafka010$KafkaOffsetReader$$withRetriesWithoutInterrupt$1.apply(KafkaOffsetReader.scala:287)
  at org.apache.spark.sql.kafka010.KafkaOffsetReader$$anonfun$org$apache$spark$sql$kafka010$KafkaOffsetReader$$withRetriesWithoutInterrupt$1.apply(KafkaOffsetReader.scala:287)
  at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
  at org.apache.spark.sql.kafka010.KafkaOffsetReader.org$apache$spark$sql$kafka010$KafkaOffsetReader$$withRetriesWithoutInterrupt(KafkaOffsetReader.scala:286)
  - locked <0x00000007916b4918> (a org.apache.spark.sql.kafka010.KafkaOffsetReader)
  at org.apache.spark.sql.kafka010.KafkaOffsetReader$$anonfun$fetchLatestOffsets$1.apply(KafkaOffsetReader.scala:197)
  at org.apache.spark.sql.kafka010.KafkaOffsetReader$$anonfun$fetchLatestOffsets$1.apply(KafkaOffsetReader.scala:197)
  at org.apache.spark.sql.kafka010.KafkaOffsetReader.runUninterruptibly(KafkaOffsetReader.scala:255)
  at org.apache.spark.sql.kafka010.KafkaOffsetReader.fetchLatestOffsets(KafkaOffsetReader.scala:196)
  at org.apache.spark.sql.kafka010.KafkaMicroBatchReader$$anonfun$getOrCreateInitialPartitionOffsets$1.apply(KafkaMicroBatchReader.scala:194)
  at org.apache.spark.sql.kafka010.KafkaMicroBatchReader$$anonfun$getOrCreateInitialPartitionOffsets$1.apply(KafkaMicroBatchReader.scala:189)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.kafka010.KafkaMicroBatchReader.getOrCreateInitialPartitionOffsets(KafkaMicroBatchReader.scala:189)
  at org.apache.spark.sql.kafka010.KafkaMicroBatchReader.org$apache$spark$sql$kafka010$KafkaMicroBatchReader$$initialPartitionOffsets$lzycompute(KafkaMicroBatchReader.scala:82)
  - locked <0x00000007916b5448> (a org.apache.spark.sql.kafka010.KafkaMicroBatchReader)
  at org.apache.spark.sql.kafka010.KafkaMicroBatchReader.org$apache$spark$sql$kafka010$KafkaMicroBatchReader$$initialPartitionOffsets(KafkaMicroBatchReader.scala:82)
  at org.apache.spark.sql.kafka010.KafkaMicroBatchReader.setOffsetRange(KafkaMicroBatchReader.scala:86)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$5$$anonfun$apply$2.apply$mcV$sp(MicroBatchExecution.scala:344)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$5$$anonfun$apply$2.apply(MicroBatchExecution.scala:344)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$5$$anonfun$apply$2.apply(MicroBatchExecution.scala:344)
  at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:337)
  at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$5.apply(MicroBatchExecution.scala:340)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$5.apply(MicroBatchExecution.scala:332)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)

Looking at KafkaOffsetReader.scala, I see a call to KafkaConsumer.poll(0).
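
For context, a hedged sketch of the two poll overloads in play here (the consumer is assumed to be an already-configured KafkaConsumer; KIP-266 adds the Duration overload in 2.0.0 and deprecates the long-based one):

import java.time.Duration
import org.apache.kafka.clients.consumer.KafkaConsumer

def refreshAssignment(consumer: KafkaConsumer[String, String]): Unit = {
  // Pre-2.0 overload: the timeout does not bound metadata fetches, so poll(0)
  // can block until the group coordinator is reachable -- the hang seen above.
  consumer.poll(0)

  // KIP-266 overload added in 2.0.0: the entire call, including metadata
  // updates, is bounded by the supplied Duration.
  consumer.poll(Duration.ofMillis(0))
}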

@tedyu (Contributor Author) commented Jun 5, 2018

I tried the following change, but it didn't seem to produce more output from Kafka:

diff --git a/external/kafka-0-10-sql/src/test/resources/log4j.properties b/external/kafka-0-10-sql/src/test/resources/log4j.properties
index 75e3b53..0d65339 100644
--- a/external/kafka-0-10-sql/src/test/resources/log4j.properties
+++ b/external/kafka-0-10-sql/src/test/resources/log4j.properties
@@ -25,4 +25,4 @@ log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{

 # Ignore messages below warning level from Jetty, because it's a bit verbose
 log4j.logger.org.spark-project.jetty=WARN
-
+log4j.logger.org.apache.kafka=DEBUG

<kafka.version>2.0.0-SNAPSHOT</kafka.version>
</properties>
<packaging>jar</packaging>
<name>Kafka 0.10 Source for Structured Streaming</name>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should change this line to reflect the change too

@SparkQA commented Jun 6, 2018

Test build #91508 has finished for PR 21488 at commit b773982.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tedyu (Contributor Author) commented Jun 6, 2018

There is only target/surefire-reports/TEST-org.apache.spark.sql.kafka010.KafkaMicroBatchV2SourceSuite.xml under target/surefire-reports.

That file doesn't contain the test log.

@SparkQA commented Jun 7, 2018

Test build #91529 has finished for PR 21488 at commit 90745b2.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jun 8, 2018

Test build #91587 has finished for PR 21488 at commit 15d23bb.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tedyu (Contributor Author) commented Jun 8, 2018

Made some progress in testing.
Now facing:

- assign from latest offsets (failOnDataLoss: true) *** FAILED ***
  java.lang.IllegalArgumentException: requirement failed
  at scala.Predef$.require(Predef.scala:212)
  at org.apache.spark.sql.kafka010.KafkaSourceSuiteBase.org$apache$spark$sql$kafka010$KafkaSourceSuiteBase$$testFromLatestOffsets(KafkaMicroBatchSourceSuite.scala:993)
  at org.apache.spark.sql.kafka010.KafkaSourceSuiteBase$$anonfun$37$$anonfun$apply$2.apply$mcV$sp(KafkaMicroBatchSourceSuite.scala:734)
  at org.apache.spark.sql.kafka010.KafkaSourceSuiteBase$$anonfun$37$$anonfun$apply$2.apply(KafkaMicroBatchSourceSuite.scala:732)
  at org.apache.spark.sql.kafka010.KafkaSourceSuiteBase$$anonfun$37$$anonfun$apply$2.apply(KafkaMicroBatchSourceSuite.scala:732)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)

@SparkQA commented Jun 8, 2018

Test build #91589 has finished for PR 21488 at commit 7a04afe.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tedyu (Contributor Author) commented Jun 13, 2018

Located the test output:

-rw-r--r-- 1 hbase hadoop 35335485506 Jun 13 20:36 target/unit-tests.log

Still need to find the cause of the assertion failure.

@ijuma (Member) commented Jul 8, 2018

Any luck getting to the bottom of the issue? It would be great to include this in the next version of Spark.

@zsxwing (Member) commented Jul 17, 2018

@tedyu could you please just bump to 1.1.0, the current official latest release from Apache Kafka?

<scope>test</scope>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
Member:

Where does this come from? Or can it be just a test dependency?

assert(Thread.currentThread().isInstanceOf[UninterruptibleThread])
// Poll to get the latest assigned partitions
consumer.poll(0)
consumer.poll(JDuration.ofMillis(0))
Member:

Could you revert these changes? We don't use java.time.Duration in Spark.

Contributor Author:

Depending on the Kafka release we agree upon, I can revert.
Duration is the recommended API in the 2.0.0 release.

Member:

@tedyu just realized this is ofMillis rather than toMillis. We definitely cannot use it as this poll overload doesn't exist in previous versions and we want to support Kafka versions from 0.10 to 2.0.

Member:

@zsxwing Why do you want to support Kafka clients jars from 0.10 to 2.0? Since newer clients jars support older brokers, we recommend people use the latest Kafka clients jar whenever possible.

@zsxwing (Member) commented Jul 18, 2018

That's a good point. However, supporting all these versions is pretty cheap for Spark right now. Spark is using only APIs in 0.10. In addition, if the Kafka client version we pick up here has some critical issue, the user can just switch to an old version.

@tedyu (Contributor Author) commented Jul 23, 2018

Ryan:
Thanks for the close follow-up.

Once Kafka 2.0.0 is released, I will incorporate the above.

@tedyu changed the title from "[SPARK-18057][SS] Update Kafka client version from 0.10.0.1 to 1.1.1" to "[SPARK-18057][SS] Update Kafka client version from 0.10.0.1 to 2.0.0" on Jul 29, 2018
@SparkQA commented Jul 29, 2018

Test build #93736 has finished for PR 21488 at commit aa69915.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tedyu (Contributor Author) commented Jul 29, 2018

22:36:05.028 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 16314.0 (TID 39181, localhost, executor driver): java.io.FileNotFoundException: File file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-0bbc239c-37c5-4df2-b86d-e9c7628ceb28/f1=1/f2=1/part-00000-390ac6da-50dc-4d32-ba08-462da1e8a0c2.c000.snappy.parquet does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:131)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:182)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)

Doesn't seem to be related to this PR.

@tedyu (Contributor Author) commented Jul 29, 2018

retest this please

@SparkQA commented Jul 29, 2018

Test build #93745 has finished for PR 21488 at commit aa69915.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member):

https://github.com/apache/kafka/releases — the Kafka 2.0.0 release is publicly available. Can we finish this before the code freeze?

@tedyu (Contributor Author) commented Jul 31, 2018

@zsxwing Is there anything I should do for this PR?

@srowen (Member) commented Jul 31, 2018

I might have missed this in the shuffle here -- is this fully compatible with 0.10.x brokers too?

@zsxwing (Member) commented Jul 31, 2018

> I might have missed this in the shuffle here -- is this fully compatible with 0.10.x brokers too?

Yep. @ijuma could you confirm it?

@ijuma (Member) commented Jul 31, 2018

Yes, Kafka clients 0.10.2 and higher support brokers from 0.10.0 and higher, as long as the protocols being used are available in that broker version. Since this PR only changes test code, upgrading to 2.0.0 is safe.

If Spark were to use the AdminClient to create topics, then the brokers would have to be 0.10.1 or higher. Other admin protocols, such as config management, were added in more recent versions, and so on.

@srowen (Member) commented Jul 31, 2018

@ijuma this changes the version of kafka-clients in compile scope, too -- right? It looks like it in the POM. That's not just a test change. It's probably OK if the client is backwards-compatible with 0.10.x brokers, which is what this integration currently supports. I guess I'm always nervous about making a major-version change to a user-visible dependency, and this is a big one. But you all may know better whether there is any potential user impact. Would this change break any user code that was previously compiling against the 0.10.x client?

@ijuma (Member) commented Jul 31, 2018

@srowen I meant that the library is compatible, so if you just change the version in the pom, it's fine. If you changed the code to use some methods in AdminClient, then you'd have to be more careful. Does that make sense?

The major version bump is because we now require Java 8 (which I believe you require too) and the Scala clients were removed.

@srowen (Member) commented Jul 31, 2018

Yes, and that then impacts programs that depend on this module to use Kafka 0.10+ from Spark. They'd have to update kafka-clients too -- or really, it would be updated automatically. I don't see a reason a caller would use AdminClient directly in a Spark program, though. Any other API changes you can see possibly causing a problem? My concern is breaking compatibility in a minor release.

Yes, Spark requires Java 8. You're saying that although the (two) major version bumps look scary, there was little change in the client library itself?

@zsxwing (Member) commented Jul 31, 2018

@srowen just to be clear, AdminClient is not in the kafka-clients jar. The user has to add the kafka jar as a dependency to use AdminClient.

In addition, even if this upgrade has some unexpected issues, the user can still switch back to an old kafka-clients version by pinning the version in their pom.xml. This Kafka connector in Spark uses only 0.10 APIs.
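
As a hedged sketch of that escape hatch in an sbt build (the artifact versions below are illustrative placeholders, not values from this PR):

// build.sbt
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.0"
// Pin kafka-clients back to an older release if the bundled version causes trouble.
dependencyOverrides += "org.apache.kafka" % "kafka-clients" % "0.10.2.2"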

@ijuma (Member) commented Jul 31, 2018

Yeah, the Java client libraries have evolved in a compatible manner for the most part since 0.10.0. The set of broker versions supported by 0.10.0 and 2.0.0 is exactly the same.

The consumer/producer API has been enriched (transactions, idempotent producer, offsetsForTimes), but the existing methods have been kept. A small number of deprecated but rarely used methods have been removed (though not in KafkaProducer or KafkaConsumer):

apache/kafka@a4c2921

In 0.10.1, a heartbeat thread was added to the Java consumer:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread

This is helpful to users who could not call poll often enough, but the default configs should be fine for most cases.
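
As a rough illustration only (values are placeholders, not recommendations or Spark's settings), these are the consumer configs that KIP-62 separates:

import java.util.Properties

val consumerProps = new Properties()
// Bounds the background heartbeats that keep the consumer in its group.
consumerProps.put("session.timeout.ms", "10000")
// Bounds how long the application may go between poll() calls.
consumerProps.put("max.poll.interval.ms", "300000")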

@ijuma (Member) commented Jul 31, 2018

@zsxwing org.apache.kafka.clients.admin.AdminClient is in the clients jar. It's not relevant for this PR; I am just mentioning it in case Spark decides to migrate to it eventually. I see code using ZkUtils, which is an internal class that will be removed eventually.

@zsxwing (Member) commented Jul 31, 2018

@ijuma I see. I was looking at 0.10 jar. Thanks for correcting me.

@ijuma (Member) commented Jul 31, 2018

Anyway, overall I think you should definitely make this change. Spark users are currently penalised heavily when running on clusters with the message format introduced in 0.11.0, which has important resiliency improvements (e.g. KIP-101). And, as @zsxwing said, people can choose an older version if necessary (I can't think of a reason why; the reason we have focused on keeping client jars compatible is so that people can just use the latest independently of broker versions).

@srowen (Member) commented Jul 31, 2018

Yes, good argument. Having been burned in the past (actually by Kafka and ZK changes, though that's in the past), I'm aware that even compiling against version B instead of A, with no library code changes, may still mean the library no longer runs against version A. Changing exception signatures is also sometimes a problem.

It may not be the case here -- it's unlikely -- but I do want to think this through fully. Yes, differences like apache/kafka@a4c2921 are what I am worried about.

Still, I have no concrete example of this type of problem in the wild.

@zsxwing (Member) commented Jul 31, 2018

@srowen May I read your comment as "no objections"? The current PR looks good to me. If you don't have objections, I will go ahead and merge it.

@srowen (Member) commented Jul 31, 2018

Yes, no objections. There's no concrete problem, and there is upside to making the change. I think you're aware of the potential issues, and you have thought through them in the context of deeper knowledge of Kafka's APIs than I have.

@zsxwing (Member) commented Jul 31, 2018

Thanks! Merging to master.

@asfgit closed this in e82784d on Jul 31, 2018
@wangyum (Member) commented Aug 1, 2018

It seems this commit causes KafkaSourceStressForDontFailOnDataLossSuite to fail:

...
[info] KafkaSourceStressSuite:
[info] - stress test with multiple topics and partitions (19 seconds, 255 milliseconds)
[info] KafkaSourceStressForDontFailOnDataLossSuite:
[info] - stress test for failOnDataLoss=false *** FAILED *** (45 seconds, 648 milliseconds)
[info]   java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.KafkaStorageException: Disk error when trying to access log file on the disk.
[info]   at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
[info]   at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:77)
[info]   at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
[info]   at org.apache.spark.sql.kafka010.KafkaTestUtils$$anonfun$2.apply(KafkaTestUtils.scala:254)
[info]   at org.apache.spark.sql.kafka010.KafkaTestUtils$$anonfun$2.apply(KafkaTestUtils.scala:248)
[info]   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[info]   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[info]   at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
[info]   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
[info]   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
[info]   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
[info]   at org.apache.spark.sql.kafka010.KafkaTestUtils.sendMessages(KafkaTestUtils.scala:248)
[info]   at org.apache.spark.sql.kafka010.KafkaTestUtils.sendMessages(KafkaTestUtils.scala:238)
[info]   at org.apache.spark.sql.kafka010.KafkaSourceStressForDontFailOnDataLossSuite$$anonfun$48$$anonfun$apply$26$$anonfun$apply$27.apply(KafkaMicroBatchSourceSuite.scala:1268)
[info]   at org.apache.spark.sql.kafka010.KafkaSourceStressForDontFailOnDataLossSuite$$anonfun$48$$anonfun$apply$26$$anonfun$apply$27.apply(KafkaMicroBatchSourceSuite.scala:1267)
[info]   at scala.collection.immutable.Range.foreach(Range.scala:160)
[info]   at org.apache.spark.sql.kafka010.KafkaSourceStressForDontFailOnDataLossSuite$$anonfun$48$$anonfun$apply$26.apply(KafkaMicroBatchSourceSuite.scala:1267)
[info]   at org.apache.spark.sql.kafka010.KafkaSourceStressForDontFailOnDataLossSuite$$anonfun$48$$anonfun$apply$26.apply(KafkaMicroBatchSourceSuite.scala:1265)
[info]   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
[info]   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
[info]   at org.apache.spark.sql.kafka010.KafkaSourceStressForDontFailOnDataLossSuite$$anonfun$48.apply(KafkaMicroBatchSourceSuite.scala:1265)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:103)
[info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
[info]   at org.apache.spark.sql.kafka010.KafkaSourceStressForDontFailOnDataLossSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(KafkaMicroBatchSourceSuite.scala:1155)
[info]   at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:221)
[info]   at org.apache.spark.sql.kafka010.KafkaSourceStressForDontFailOnDataLossSuite.runTest(KafkaMicroBatchSourceSuite.scala:1155)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
[info]   at scala.collection.immutable.List.foreach(List.scala:392)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
[info]   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
[info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
[info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
[info]   at org.scalatest.Suite$class.run(Suite.scala:1147)
[info]   at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
[info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:52)
[info]   at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
[info]   at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:52)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:748)
[info]   Cause: org.apache.kafka.common.errors.KafkaStorageException: Disk error when trying to access log file on the disk.
Exception in thread "Thread-9" java.io.EOFException
        at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2954)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
        at org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1$React.react(Framework.scala:809)
        at org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1.run(Framework.scala:798)
        at java.lang.Thread.run(Thread.java:748)

How to reproduce:

build/sbt -Phadoop-2.6  "sql-kafka-0-10/testOnly"

@tedyu (Contributor Author) commented Aug 1, 2018

I used the following command and the test passed:

mvn test -Phadoop-2.6 -Pyarn -Phive -Dtest=KafkaMicroBatchSourceSuite -rf external/kafka-0-10-sql

Please take a look at the 'Disk error' message and see whether it is related to the test failure.

@srowen (Member) commented Aug 1, 2018

Hm, looking at the dashboard, I don't see this failure consistently in the master tests:

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/

Let's keep a close eye on it to see if it repeats. It might have been something transient.

@ijuma (Member) commented Aug 1, 2018

@wangyum, can you please file a Kafka JIRA with details of what the test is doing (even if the failure is transient)? From the stacktrace, it looks like a potential broker issue (assuming there are no real disk issues where these tests were executed). If there is indeed a new issue (we have to verify since the test failure seems to be transient), it would likely only affect tests.

@srowen (Member) commented Aug 2, 2018

Ack, I missed something here: there's an override of kafka.version for Scala 2.12, from when it had to be bumped to work with 2.12. That no longer works when compiling with 2.12. I'll submit a follow-up.
