[SPARK-25705][BUILD][STREAMING][test-maven] Remove Kafka 0.8 integration #22703

srowen · 2018-10-11T22:14:12Z

What changes were proposed in this pull request?

Remove Kafka 0.8 integration

How was this patch tested?

Existing tests, build scripts

srowen · 2018-10-11T22:14:50Z

python/pyspark/streaming/tests.py

        self.ssc.stop(True, True)


-class KafkaStreamTests(PySparkStreamingTestCase):


Am I correct that all of this PySpark Kafka integration is for 0.8, not 0.10, and that Structured Streaming is now the only option for PySpark + Kafka?

Yup. Kafka 0.10 support was not added to PySpark, per SPARK-16534.

OK, you or @holdenk or @koeninger might want to skim this change to make sure I didn't inadvertently delete PySpark + Structured Streaming + Kafka support. I don't think so, but it's not really my area.

I skimmed it and it seems fine. I will try to take a few more looks while it's open (don't block on me).
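For readers following the thread above, here is a minimal sketch of what "Structured Streaming is the only option for PySpark + Kafka" looks like in practice. It is not part of this patch. The helper names `kafka_source_options` and `read_kafka_stream` are hypothetical, and the sketch assumes the `spark-sql-kafka-0-10` package is on the classpath when the stream is actually started. The option keys (`kafka.bootstrap.servers`, `subscribe`, `startingOffsets`) are the ones documented for the Structured Streaming Kafka source.

```python
from typing import Dict, Iterable, Union


def kafka_source_options(bootstrap_servers: str,
                         topics: Union[str, Iterable[str]],
                         starting_offsets: str = "latest") -> Dict[str, str]:
    """Build the option map for the Structured Streaming Kafka source.

    Hypothetical helper for illustration; only the option keys come from Spark.
    """
    # Accept a single topic name as well as an iterable of names.
    if isinstance(topics, str):
        topics = [topics]
    topic_list = ",".join(topics)
    if not topic_list:
        raise ValueError("at least one topic is required")
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic_list,  # comma-separated list of topics
        "startingOffsets": starting_offsets,
    }


def read_kafka_stream(spark, options: Dict[str, str]):
    """Return a streaming DataFrame with key and value cast to strings.

    `spark` is expected to be an existing SparkSession; it is passed in so
    that this module does not import pyspark itself.
    """
    return (spark.readStream
            .format("kafka")
            .options(**options)
            .load()
            .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)"))
```

With a live `SparkSession` named `spark`, usage would look like `read_kafka_stream(spark, kafka_source_options("localhost:9092", ["events"]))`, where the broker address and topic name are placeholders.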

SparkQA · 2018-10-11T22:18:20Z

Test build #97285 has finished for PR 22703 at commit 4f0bab8.

This patch fails executing the dev/run-tests script.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-12T07:05:02Z

Test build #97294 has finished for PR 22703 at commit 6e34ce7.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-10-12T08:35:12Z

retest this please

SparkQA · 2018-10-12T12:21:22Z

Test build #97300 has finished for PR 22703 at commit 6e34ce7.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-10-12T14:59:30Z

retest this please

SparkQA · 2018-10-12T19:39:21Z

Test build #97307 has finished for PR 22703 at commit 6e34ce7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

koeninger · 2018-10-12T19:57:57Z

docs/streaming-kafka-0-10-integration.md

+The Spark Streaming integration for Kafka 0.10 provides simple parallelism, 1:1 correspondence between Kafka 
+partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses 
+the [new Kafka consumer API](https://kafka.apache.org/documentation.html#newconsumerapi) instead of the simple API, 
+there are notable differences in usage. This version of the integration is marked as experimental, so the API is 


Do we want to leave the new integration marked as experimental if it is now the only available one?

Yeah, good general point. Is the Kafka 0.10 integration experimental at all anymore? Is anything that survives from 2.x to 3.x? I'd say "no" in almost all cases. What are your personal views on that?

koeninger · 2018-10-13T00:55:53Z

I guess the only argument to the contrary would be that, if some of the known issues end up being better solved with minor API changes, leaving it marked as experimental would technically give better notice. I personally think it's clearer to remove the experimental marking.


SparkQA · 2018-10-13T19:04:41Z

Test build #97343 has finished for PR 22703 at commit 3d44772.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-13T21:17:45Z

Test build #4377 has finished for PR 22703 at commit 6e34ce7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-14T01:26:40Z

Test build #4378 has finished for PR 22703 at commit 3d44772.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2018-10-15T16:52:30Z

So far looking good to those who have looked, and it passed Maven and SBT tests. I think this will help reduce complexity a bit (and test time in some cases), so will go for it tomorrow.

## What changes were proposed in this pull request?

Remove Kafka 0.8 integration

## How was this patch tested?

Existing tests, build scripts

Closes apache#22703 from srowen/SPARK-25705.

Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>

Remove Kafka 0.8 integration

4f0bab8

srowen commented Oct 11, 2018


Fix reference to streaming-kafka

6e34ce7

koeninger reviewed Oct 12, 2018


srowen changed the title ~~[SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 integration~~ [SPARK-25705][BUILD][STREAMING][test-maven] Remove Kafka 0.8 integration Oct 13, 2018

Remove spark.streaming.kafka.maxRetries doc as it is no longer in the code; declare existing Kafka integration non-experimental

3d44772

asfgit closed this in 703e6da Oct 16, 2018

srowen deleted the SPARK-25705 branch October 24, 2018 16:47
