[SPARK-20452][SS][Kafka]Fix a potential ConcurrentModificationException for batch Kafka DataFrame #17752
Conversation
```diff
  def close(): Unit = {
-   consumer.close()
-   kafkaReaderThread.shutdownNow()
+   runUninterruptibly {
```
If the kafkaReaderThread is using the consumer, consumer.close will throw ConcurrentModificationException. Putting it inside runUninterruptibly prevents this case from happening.
How does this prevent it? It seems like you want a lock, so that the consumer is not being used while close is called?
This is just like the other methods wrapped with runUninterruptibly, which run either in the stream thread or in kafkaReaderThread.
Never mind, I understand that runUninterruptibly ensures that.
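To make the mechanism concrete: KafkaConsumer is not safe for multi-threaded access and throws ConcurrentModificationException when two threads touch it at once, so the fix funnels every consumer call, including close(), onto the single reader thread. Below is a minimal sketch of how such a runUninterruptibly helper can work; it is an illustration, not the actual KafkaOffsetReader code, and the names kafkaReaderThread and execContext merely mirror the surrounding diff.

```scala
import java.util.concurrent.{Executors, ThreadFactory}

import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration.Duration

import org.apache.spark.util.{ThreadUtils, UninterruptibleThread}

class OffsetReaderSketch {
  // The single daemon thread that is the only thread ever allowed to touch the consumer.
  private val kafkaReaderThread = Executors.newSingleThreadExecutor(new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new UninterruptibleThread("Kafka Offset Reader") {
        override def run(): Unit = r.run()
      }
      t.setDaemon(true)
      t
    }
  })
  private val execContext = ExecutionContext.fromExecutorService(kafkaReaderThread)

  // If the caller is already on the UninterruptibleThread, run the body directly; otherwise
  // ship it to kafkaReaderThread and block for the result. Either way the body (for example
  // consumer.close()) only ever runs on the one thread that owns the KafkaConsumer.
  def runUninterruptibly[T](body: => T): T = {
    if (!Thread.currentThread.isInstanceOf[UninterruptibleThread]) {
      val future = Future { body }(execContext)
      ThreadUtils.awaitResult(future, Duration.Inf)
    } else {
      body
    }
  }
}
```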
```diff
+   runUninterruptibly {
+     consumer.close()
+   }
+   kafkaReaderThread.shutdown()
```
No need to interrupt, since kafkaReaderThread is an UninterruptibleThread.
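A generic java.util.concurrent illustration of the difference (not the PR code): shutdown() stops accepting new tasks and lets queued work finish without interrupting the worker thread, while shutdownNow() also interrupts running tasks.

```scala
import java.util.concurrent.Executors

object ShutdownDemo extends App {
  val pool = Executors.newSingleThreadExecutor()
  pool.submit(new Runnable {
    override def run(): Unit = Thread.sleep(100) // stand-in for the reader thread's pending work
  })
  pool.shutdown() // no interrupt: the running task is allowed to complete
  // pool.shutdownNow() // would interrupt the worker and return the tasks that never started
}
```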
```scala
// Each running query should use its own group id. Otherwise, the query may be only assigned
// partial data since Kafka will assign partitions to multiple consumers having the same group
// id. Hence, we should generate a unique id for each query.
val uniqueGroupId = s"spark-kafka-relation-${UUID.randomUUID}"
```
Generate the unique group id and the KafkaOffsetReader here, and close the reader inside this method, so that we never use the same reader in different threads (such as when the same DataFrame is used in different threads).
Note: the previous code forgot to close the KafkaOffsetReader.
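A minimal, self-contained sketch of the pattern described here; OffsetReader and its methods are hypothetical stand-ins, not the real KafkaOffsetReader API. The idea is to build a fresh group id and reader per call and close it in a finally block, so a DataFrame reused from several threads never shares a reader.

```scala
import java.util.UUID

object FreshReaderSketch {
  // Hypothetical stand-in for KafkaOffsetReader, just to show the create/use/close lifecycle.
  class OffsetReader(val groupId: String) {
    def fetchOffsetRange(): (Long, Long) = (0L, 0L) // placeholder
    def close(): Unit = ()
  }

  def withFreshReader[T](use: OffsetReader => T): T = {
    // One unique group id and one reader per invocation, never shared across threads.
    val reader = new OffsetReader(s"spark-kafka-relation-${UUID.randomUUID}")
    try use(reader)
    finally reader.close() // always closed, addressing the "forgot to close" note above
  }

  // Each call builds and closes its own reader, even if `use` throws.
  def example(): (Long, Long) = withFreshReader(_.fetchOffsetRange())
}
```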
```scala
    ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG -> classOf[ByteArraySerializer].getName)
}

private def kafkaParamsForDriver(specifiedKafkaParams: Map[String, String]) =
```
Move to object KafkaSourceProvider.
```scala
}

/** Class to conveniently update Kafka config params, while logging the changes */
private case class ConfigUpdater(module: String, kafkaParams: Map[String, String]) {
```
Move to object KafkaSourceProvider, and change logInfo to logDebug. The Kafka consumer prints all of its configs anyway, so there is no need to duplicate that information in the logs.
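Roughly what that suggestion looks like, as a self-contained sketch: the logDebug below is a no-op stand-in for Spark's Logging trait, and in the actual suggestion the class would live in object KafkaSourceProvider.

```scala
import java.{util => ju}

import scala.collection.JavaConverters._

/** Sketch: conveniently update Kafka config params while logging changes at debug level. */
private case class ConfigUpdater(module: String, kafkaParams: Map[String, String]) {
  private val map = new ju.HashMap[String, Object](kafkaParams.asJava)

  private def logDebug(msg: => String): Unit = () // stand-in for Logging.logDebug

  def set(key: String, value: Object): this.type = {
    map.put(key, value)
    logDebug(s"$module: Set $key to $value, earlier value: ${kafkaParams.getOrElse(key, "")}")
    this
  }

  def setIfUnset(key: String, value: Object): this.type = {
    if (!map.containsKey(key)) {
      map.put(key, value)
      logDebug(s"$module: Set $key to $value")
    }
    this
  }

  def build(): ju.Map[String, Object] = map
}
```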
```scala
      .setIfUnset(ConsumerConfig.RECEIVE_BUFFER_CONFIG, 65536: java.lang.Integer)
      .build()

private def kafkaParamsForExecutors(
```
Move to object KafkaSourceProvider.
```scala
if (!reuseKafkaConsumer) {
  // If we can't reuse CachedKafkaConsumers, create a new CachedKafkaConsumer. Since we use
  // `assign` here, we don't need to worry about "group.id" conflicts.
  new CachedKafkaConsumer(new TopicPartition(topic, kafkaPartition), executorKafkaParams)
```
This is the major change.
It would be more consistent with getOrCreate if you just added a create method to CachedKafkaConsumer.
```diff
  // If this is reattempt at running the task, then invalidate cache and start with
  // a new consumer
- if (TaskContext.get != null && TaskContext.get.attemptNumber > 1) {
+ if (TaskContext.get != null && TaskContext.get.attemptNumber >= 1) {
```
Fix the attemptNumber check: it starts at 0.
```diff
  val consumer = if (useConsumerCache) {
    CachedKafkaConsumer.init(cacheInitialCapacity, cacheMaxCapacity, cacheLoadFactor)
-   if (context.attemptNumber > 1) {
+   if (context.attemptNumber >= 1) {
```
Fix the attemptNumber check: it starts at 0.
Test build #76120 has finished for PR 17752 at commit
```diff
  * but processing the same topicpartition and group id in multiple threads is usually bad anyway.
  */
- private[kafka010] case class CachedKafkaConsumer private(
+ private[kafka010] case class CachedKafkaConsumer(
```
Isn't it cleaner to create a new method `create` in object CachedKafkaConsumer which returns a new CachedKafkaConsumer without putting it in the map?
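A simplified sketch of that suggestion, not the real or merged CachedKafkaConsumer (the cache key and fields are reduced to the bare minimum): keep the constructor private and expose both a caching getOrCreate and a non-caching create in the companion object.

```scala
import scala.collection.mutable

import org.apache.kafka.common.TopicPartition

// Sketch only: the real class also carries the group id and wraps a KafkaConsumer.
case class CachedKafkaConsumer private(
    topicPartition: TopicPartition,
    kafkaParams: java.util.Map[String, Object])

object CachedKafkaConsumer {
  private val cache = mutable.HashMap.empty[TopicPartition, CachedKafkaConsumer]

  /** Cached path: reuse one consumer per topic partition on this executor. */
  def getOrCreate(
      tp: TopicPartition,
      params: java.util.Map[String, Object]): CachedKafkaConsumer = synchronized {
    cache.getOrElseUpdate(tp, new CachedKafkaConsumer(tp, params))
  }

  /** Uncached path: always a brand-new consumer, never registered in the cache. */
  def create(
      tp: TopicPartition,
      params: java.util.Map[String, Object]): CachedKafkaConsumer =
    new CachedKafkaConsumer(tp, params)
}
```

With such a factory, the executor-side code could call CachedKafkaConsumer.getOrCreate when reuseKafkaConsumer is true and CachedKafkaConsumer.create otherwise, instead of invoking the constructor directly as in the hunk above.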
Test build #76124 has finished for PR 17752 at commit
LGTM. Merging to master and 2.2.
Merge commit:

[SPARK-20452][SS][Kafka]Fix a potential ConcurrentModificationException for batch Kafka DataFrame

Author: Shixiong Zhu <[email protected]>

Closes #17752 from zsxwing/kafka-fix.

(cherry picked from commit 823baca)
Signed-off-by: Tathagata Das <[email protected]>
What changes were proposed in this pull request?

Cancelling a batch Kafka query when one of its tasks cannot be cancelled, and then rerunning the same DataFrame, may cause a ConcurrentModificationException because it may launch two tasks sharing the same group id. This PR always creates a new consumer when `reuseKafkaConsumer = false` to avoid the ConcurrentModificationException. It also contains other minor fixes.

How was this patch tested?

Jenkins.