[SPARK-25705][BUILD][STREAMING][test-maven] Remove Kafka 0.8 integration #22703
Conversation
Review comment on the removed PySpark streaming test code:

    self.ssc.stop(True, True)

    class KafkaStreamTests(PySparkStreamingTestCase):
Am I correct that all of this PySpark Kafka integration is 0.8, not 0.10, and that Structured Streaming is now the only option for PySpark + Kafka?
Yup. Kafka 0.10 support was never added to PySpark, per SPARK-16534.
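For readers following along: with the 0.8 integration gone and no 0.10 DStream API in PySpark, the remaining Kafka path from Python is Structured Streaming's Kafka source. Below is a minimal sketch of that source, shown in Scala (the `spark.readStream.format("kafka")` chain has the same shape in PySpark); the broker address and topic name are placeholders:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of the Structured Streaming Kafka source, the Kafka path
// that remains available to PySpark after this PR. Broker and topic names
// are placeholder values; local master is for illustration only.
val spark = SparkSession.builder
  .appName("StructuredKafkaSketch")
  .master("local[2]")
  .getOrCreate()

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("subscribe", "topic1")
  .load()

// Records arrive as binary key/value columns; cast to strings to inspect.
val records = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
```

The returned DataFrame also carries `topic`, `partition`, and `offset` metadata columns alongside the key and value.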
OK. You, @holdenk, or @koeninger might want to skim this change to make sure I didn't inadvertently delete PySpark + Structured Streaming + Kafka support. I don't think so, but it's not really my area.
I skimmed it and it seems fine. I'll try to take a few more looks while it's open. (Don't be blocked by me.)
Test build #97285 has finished for PR 22703 at commit

Test build #97294 has finished for PR 22703 at commit

retest this please

Test build #97300 has finished for PR 22703 at commit

retest this please

Test build #97307 has finished for PR 22703 at commit
In docs/streaming-kafka-0-10-integration.md:

The Spark Streaming integration for Kafka 0.10 provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses the [new Kafka consumer API](https://kafka.apache.org/documentation.html#newconsumerapi) instead of the simple API, there are notable differences in usage. This version of the integration is marked as experimental, so the API is potentially subject to change.
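As a point of reference for the passage above, here is a sketch of the 0.10 direct stream usage it describes, based on the integration guide; the broker address, group id, and topic are placeholder values. The offset access it mentions comes from the 1:1 Kafka-to-Spark partition mapping:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

// Sketch of the Kafka 0.10 direct stream; broker address, group id, and
// topic are placeholders, and the local master is for illustration only.
val conf = new SparkConf().setAppName("Kafka010Sketch").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "host1:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("topic1"), kafkaParams)
)

// The 1:1 partition correspondence exposes per-partition offset ranges.
stream.foreachRDD { rdd =>
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach { o =>
    println(s"${o.topic} ${o.partition}: ${o.fromOffset} -> ${o.untilOffset}")
  }
}
```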
Do we want to leave the new integration marked as experimental if it is now the only available one?
Yeah, good general point. Is the Kafka 0.10 integration at all experimental anymore? Is anything that survives from 2.x to 3.x still experimental? I'd say "no" in almost all cases. What are your personal views on that?
I guess the only argument to the contrary would be that, if some of the known issues end up being better solved with minor API changes, leaving it marked as experimental would technically be better notice. I personally think it's clearer to remove the experimental marker.
… code; declare existing Kafka integration non-experimental
Test build #97343 has finished for PR 22703 at commit

Test build #4377 has finished for PR 22703 at commit

Test build #4378 has finished for PR 22703 at commit
So far it's looking good to those who have looked, and it passed the Maven and SBT tests. I think this will help reduce complexity a bit (and test time in some cases), so I'll go for it tomorrow.
## What changes were proposed in this pull request?

Remove Kafka 0.8 integration.

## How was this patch tested?

Existing tests, build scripts.

Closes apache#22703 from srowen/SPARK-25705.
Authored-by: Sean Owen <[email protected]>
Signed-off-by: Sean Owen <[email protected]>