[SPARK-28489][SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets by zsxwing · Pull Request #25237 · apache/spark

zsxwing · 2019-07-24T00:11:42Z

What changes were proposed in this pull request?

KafkaOffsetRangeCalculator.getRanges may drop offsets due to round-off errors. The test added in this PR is one example.

This PR rewrites the logic in KafkaOffsetRangeCalculator.getRanges to ensure it never drops offsets.

How was this patch tested?

The regression test.

zsxwing · 2019-07-24T00:13:13Z

...kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala

+        val tp = range.topicPartition
+        val size = range.size
+        // number of partitions to divvy up this topic partition to
+        val parts = math.max(math.round(size.toDouble / totalSize * minPartitions.get), 1).toInt


This one ensures we never drop a TopicPartition.

The ratio calculation looks good, but round seems to generate fewer partitions. Is there a reason to choose round instead of ceiling?

Yeah, I'm seeing the same. Suppose 4 offsetRanges split 1 partition among them, 0.25 each: every share rounds down, so we lose that partition. The number of lost partitions may vary.

On the other hand, if we use ceil, the result may overshoot the minimum partitions, and the number of excess partitions may vary. We don't guarantee that this calculator returns the partition count closest to the minimum partitions, so it seems OK.

If we really want to make this strict, we could apply "allocation": calculate the ratio for each offsetRange, allocate partitions to each offsetRange according to that ratio (with a minimum of 1 for safety), and hand any remaining partitions to some of the offsetRanges. I'm not sure we want to deal with the complexity.
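To make the round-versus-ceil trade-off concrete, here is a minimal sketch that applies only the per-range formula from the diff above. The object name, the helper names, and the sample sizes (three ranges of 29 offsets each, minPartitions = 4) are assumptions chosen for illustration; this is not the Spark implementation.

```scala
object RoundVsCeil {
  // Rule from the diff: round the range's share of minPartitions, but never go below 1.
  def partsRound(size: Long, totalSize: Long, minPartitions: Int): Int =
    math.max(math.round(size.toDouble / totalSize * minPartitions), 1L).toInt

  // Hypothetical alternative discussed in the review: round the share up instead.
  def partsCeil(size: Long, totalSize: Long, minPartitions: Int): Int =
    math.max(math.ceil(size.toDouble / totalSize * minPartitions).toLong, 1L).toInt

  // Total number of Spark partitions produced for the given range sizes.
  def total(sizes: Seq[Long], minPartitions: Int)(rule: (Long, Long, Int) => Int): Int = {
    val totalSize = sizes.sum
    sizes.map(size => rule(size, totalSize, minPartitions)).sum
  }
}
```

With three equal ranges each share is about 1.33, so rounding yields 1 + 1 + 1 = 3 partitions (below the hint of 4), while the ceiling yields 2 + 2 + 2 = 6 (above it). Neither rule hits the target exactly, which is the point made in the thread.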
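One possible reading of the "allocation" idea is sketched below as a largest-remainder scheme. This was not adopted in the PR; the object and method names are hypothetical, and the sketch assumes every range is non-empty and ignores the fact that a range cannot be split into more pieces than it has offsets.

```scala
object AllocationSketch {
  // Returns the number of partitions assigned to each range, in input order.
  def allocate(sizes: Seq[Long], minPartitions: Int): Seq[Int] = {
    val totalSize = sizes.sum
    // Ideal (fractional) share of minPartitions for each range.
    val shares = sizes.map(_.toDouble / totalSize * minPartitions)
    // Start from the floor of each share, with a minimum of 1 per range.
    val base = shares.map(s => math.max(math.floor(s).toInt, 1))
    // Partitions still unassigned after the base allocation (may be 0 or negative).
    val remaining = math.max(minPartitions - base.sum, 0)
    // Give the leftovers to the ranges with the largest fractional parts.
    val byFraction = shares.zipWithIndex
      .sortBy { case (s, _) => -(s - math.floor(s)) }
      .map(_._2)
    val extra = byFraction.take(remaining).toSet
    base.zipWithIndex.map { case (b, i) => if (extra(i)) b + 1 else b }
  }
}
```

For three ranges of 29 offsets and minPartitions = 4, the base allocation is one partition each and the single leftover goes to one of the ranges, for a total of exactly 4. The extra bookkeeping is the complexity the comment refers to.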

Yep, it's a hint. And when the number of partitions is less than minPartitions, we will try our best to split. Agreed that the option name minPartitions is not accurate.

Then could you update the documentation to be more accurate instead?

@dongjoon-hyun I think the doc for this method is accurate:

spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala

Line 38 in c4010a2

* The number of Spark tasks will be *approximately* `numPartitions`. It can be less or more

I meant the Structured Streaming Kafka integration documentation.

A few days ago, minPartitions was added to the documentation for master/branch-2.4 via #25219.

zsxwing · 2019-07-24T00:14:13Z

...kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala

+        var startOffset = range.fromOffset
+        (0 until parts).map { part =>
+          // Fine to do integer division. Last partition will consume all the round off errors
+          val thisPartition = remaining / (parts - part)


thisPartition will be the same as remaining for the last part. This will ensure we always get a KafkaOffsetRange ending with range.untilOffset.
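The splitting step can be sketched in isolation as follows. The loop mirrors the diff excerpt above (`remaining / (parts - part)`), but the surrounding object, the method signature, and the use of plain `(start, end)` tuples instead of `KafkaOffsetRange` are assumptions made to keep the example self-contained.

```scala
object SplitSketch {
  // Splits the half-open offset interval [fromOffset, untilOffset) into `parts`
  // contiguous pieces using integer division.
  def split(fromOffset: Long, untilOffset: Long, parts: Int): Seq[(Long, Long)] = {
    var remaining = untilOffset - fromOffset
    var start = fromOffset
    (0 until parts).map { part =>
      // For the last part, parts - part == 1, so it takes everything that is left.
      val thisPartition = remaining / (parts - part)
      remaining -= thisPartition
      val end = start + thisPartition
      val piece = (start, end)
      start = end
      piece
    }
  }
}
```

Splitting offsets 0 to 29 into 4 parts gives pieces of size 7, 7, 7, and 8: the pieces are contiguous, and the last one ends exactly at the original until offset, so no offsets are dropped.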

...-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculatorSuite.scala

SparkQA · 2019-07-24T00:43:01Z

Test build #108067 has finished for PR 25237 at commit d2d3e95.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun

Thank you for the fix, @zsxwing. Could you also handle the following corner case in this PR? Although this is not a regression (the master branch behaves the same), we currently get fewer partitions than the given minPartitions in some cases. For example, the following test passes.

  test("with minPartition = 4") {
    val options = new CaseInsensitiveStringMap(Map("minPartitions" -> "4").asJava)
    val calc = KafkaOffsetRangeCalculator(options)
    assert(
      calc.getRanges(
        fromOffsets = Map(tp1 -> 0, tp2 -> 0, tp3 -> 0),
        untilOffsets = Map(tp1 -> 29, tp2 -> 29, tp3 -> 29)) ==
        Seq(
          KafkaOffsetRange(tp1, 0, 29, None),
          KafkaOffsetRange(tp2, 0, 29, None),
          KafkaOffsetRange(tp3, 0, 29, None)))
  }

dongjoon-hyun · 2019-07-24T01:28:44Z

cc @tdas, @HeartSaVioR, @gaborgsomogyi

Also, cc @gatorsmile since this is reported as a blocker issue for 2.4.4.
I'll include this in the 2.4.4 release.

HeartSaVioR · 2019-07-24T03:14:22Z

...kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala

+          offsetRange
        }
-      }
+      }.filter(_.size > 0)


I'm not sure this can happen, but suppose it can (we have this filter, and we are doing integer division): then we can still end up with fewer than minPartitions even if the ratio-based distribution is calculated correctly.
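A small sketch suggests the situation can arise when a range holds fewer offsets than the number of parts it is split into. It assumes the splitting loop shown in the earlier diff excerpt followed by the `.filter(_.size > 0)` from this one; `OffsetRange` and the method names are simplified stand-ins, not Spark's classes.

```scala
object EmptyRangeSketch {
  // Simplified stand-in for KafkaOffsetRange: a half-open interval [from, until).
  final case class OffsetRange(from: Long, until: Long) {
    def size: Long = until - from
  }

  // Integer-division split, as in the diff excerpt.
  def split(range: OffsetRange, parts: Int): Seq[OffsetRange] = {
    var remaining = range.size
    var start = range.from
    (0 until parts).map { part =>
      val thisPartition = remaining / (parts - part)
      remaining -= thisPartition
      val piece = OffsetRange(start, start + thisPartition)
      start = piece.until
      piece
    }
  }

  // Same split, followed by the filter that drops empty pieces.
  def splitNonEmpty(range: OffsetRange, parts: Int): Seq[OffsetRange] =
    split(range, parts).filter(_.size > 0)
}
```

Splitting a range of 2 offsets into 4 parts produces pieces of size 0, 0, 1, and 1. After the filter only 2 ranges remain, which is fewer than the 4 requested, although every offset is still covered.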

HeartSaVioR · 2019-07-24T03:33:33Z

Hmm... I'm now reading the comment on getRanges. I'm not sure numPartitions is actually minPartitions (if so, there are some typos in the javadoc, and it may be better to fix them here), but if they're the same, the comment below says the method doesn't guarantee that the returned number of partitions is equal to or greater than minPartitions.

The number of Spark tasks will be approximately numPartitions. It can be less or more depending on rounding errors or Kafka partitions that didn't receive any new data.

spark/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala

Lines 32 to 46 in d67b98e

    
  /**
   * Calculate the offset ranges that we are going to process this batch. If `minPartitions`
   * is not set or is set less than or equal the number of `topicPartitions` that we're going to
   * consume, then we fall back to a 1-1 mapping of Spark tasks to Kafka partitions. If
   * `numPartitions` is set higher than the number of our `topicPartitions`, then we will split up
   * the read tasks of the skewed partitions to multiple Spark tasks.
   * The number of Spark tasks will be *approximately* `numPartitions`. It can be less or more
   * depending on rounding errors or Kafka partitions that didn't receive any new data.
   *
   * Empty ranges (`KafkaOffsetRange.size <= 0`) will be dropped.
   */
  def getRanges(
      fromOffsets: PartitionOffsetMap,
      untilOffsets: PartitionOffsetMap,
      executorLocations: Seq[String] = Seq.empty): Seq[KafkaOffsetRange] = {

Please ignore my review comments if that is what the javadoc meant. Looks great.

SparkQA · 2019-07-24T20:28:14Z

Test build #108129 has finished for PR 25237 at commit c4010a2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun

+1, LGTM. Thank you for the fix. For the documentation, let's update it later.
Merged to master/2.4.

… may drop offsets

## What changes were proposed in this pull request?

`KafkaOffsetRangeCalculator.getRanges` may drop offsets due to round off errors. The test added in this PR is one example.

This PR rewrites the logic in `KafkaOffsetRangeCalculator.getRanges` to ensure it never drops offsets.

## How was this patch tested?

The regression test.

Closes #25237 from zsxwing/fix-range.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit b9c2521)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

HeartSaVioR · 2019-08-02T01:24:08Z

#25332 is a follow-up PR to address documentation.

dongjoon-hyun · 2019-08-02T16:16:30Z

Thanks, @HeartSaVioR ! It's merged.

fix range

d2d3e95

zsxwing changed the title ~~[SPARK-28489]Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets~~ [SPARK-28489][SS]Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets Jul 24, 2019

zsxwing added the STRUCTURED STREAMING label Jul 24, 2019

dongjoon-hyun changed the title ~~[SPARK-28489][SS]Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets~~ [SPARK-28489][SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets Jul 24, 2019

dongjoon-hyun reviewed Jul 24, 2019

...-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculatorSuite.scala Outdated

dongjoon-hyun requested changes Jul 24, 2019

HeartSaVioR reviewed Jul 24, 2019

address

c4010a2

dongjoon-hyun approved these changes Jul 26, 2019

dongjoon-hyun closed this in b9c2521 Jul 26, 2019

zsxwing deleted the fix-range branch July 26, 2019 07:42
