6 changes: 3 additions & 3 deletions core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -991,9 +991,9 @@ private[spark] object SparkSubmitUtils {

// Exposed for testing.
// These components are used to make the default exclusion rules for Spark dependencies.
// We need to specify each component explicitly, otherwise we miss spark-streaming-kafka-0-8 and
// other spark-streaming utility components. Underscore is there to differentiate between
// spark-streaming_2.1x and spark-streaming-kafka-0-8-assembly_2.1x
// We need to specify each component explicitly, otherwise we miss
// spark-streaming utility components. Underscore is there to differentiate between
// spark-streaming_2.1x and spark-streaming-kafka-0-10-assembly_2.1x
val IVY_DEFAULT_EXCLUDES = Seq("catalyst_", "core_", "graphx_", "kvstore_", "launcher_", "mllib_",
"mllib-local_", "network-common_", "network-shuffle_", "repl_", "sketch_", "sql_", "streaming_",
"tags_", "unsafe_")
4 changes: 3 additions & 1 deletion dev/create-release/release-build.sh
@@ -115,7 +115,9 @@ SCALA_2_10_PROFILES="-Pscala-2.10"
SCALA_2_11_PROFILES=
if [[ $SPARK_VERSION > "2.3" ]]; then
  BASE_PROFILES="$BASE_PROFILES -Pkubernetes"
  SCALA_2_11_PROFILES="-Pkafka-0-8"
  if [[ $SPARK_VERSION < "3.0." ]]; then
    SCALA_2_11_PROFILES="-Pkafka-0-8"
  fi
else
  PUBLISH_SCALA_2_10=1
fi
2 changes: 1 addition & 1 deletion dev/mima
@@ -24,7 +24,7 @@ set -e
FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
cd "$FWDIR"

SPARK_PROFILES="-Pmesos -Pkafka-0-8 -Pkubernetes -Pyarn -Pspark-ganglia-lgpl -Pkinesis-asl -Phive-thriftserver -Phive"
SPARK_PROFILES="-Pmesos -Pkubernetes -Pyarn -Pspark-ganglia-lgpl -Pkinesis-asl -Phive-thriftserver -Phive"
TOOLS_CLASSPATH="$(build/sbt -DcopyDependencies=false "export tools/fullClasspath" | tail -n1)"
OLD_DEPS_CLASSPATH="$(build/sbt -DcopyDependencies=false $SPARK_PROFILES "export oldDeps/fullClasspath" | tail -n1)"

1 change: 0 additions & 1 deletion dev/run-tests.py
@@ -332,7 +332,6 @@ def build_spark_sbt(hadoop_version):
# Enable all of the profiles for the build:
build_profiles = get_hadoop_profiles(hadoop_version) + modules.root.build_profile_flags
sbt_goals = ["test:package", # Build test jars as some tests depend on them
"streaming-kafka-0-8-assembly/assembly",
"streaming-kinesis-asl-assembly/assembly"]
profiles_and_goals = build_profiles + sbt_goals

1 change: 0 additions & 1 deletion dev/sbt-checkstyle
@@ -23,7 +23,6 @@ ERRORS=$(echo -e "q\n" \
| build/sbt \
-Pkinesis-asl \
-Pmesos \
-Pkafka-0-8 \
-Pkubernetes \
-Pyarn \
-Phive \
1 change: 0 additions & 1 deletion dev/scalastyle
@@ -23,7 +23,6 @@ ERRORS=$(echo -e "q\n" \
| build/sbt \
-Pkinesis-asl \
-Pmesos \
-Pkafka-0-8 \
-Pkubernetes \
-Pyarn \
-Phive \
22 changes: 0 additions & 22 deletions dev/sparktestsupport/modules.py
@@ -252,24 +252,6 @@ def __hash__(self):
)


streaming_kafka = Module(
    name="streaming-kafka-0-8",
    dependencies=[streaming],
    source_file_regexes=[
        "external/kafka-0-8",
        "external/kafka-0-8-assembly",
    ],
    build_profile_flags=[
        "-Pkafka-0-8",
    ],
    environ={
        "ENABLE_KAFKA_0_8_TESTS": "1"
    },
    sbt_test_goals=[
        "streaming-kafka-0-8/test",
    ]
)

streaming_kafka_0_10 = Module(
    name="streaming-kafka-0-10",
    dependencies=[streaming],
@@ -374,15 +356,11 @@ def __hash__(self):
    dependencies=[
        pyspark_core,
        streaming,
        streaming_kafka,
        streaming_kinesis_asl
    ],
    source_file_regexes=[
        "python/pyspark/streaming"
    ],
    environ={
        "ENABLE_KAFKA_0_8_TESTS": "1"
    },
    python_test_goals=[
        "pyspark.streaming.util",
        "pyspark.streaming.tests",
2 changes: 1 addition & 1 deletion dev/test-dependencies.sh
@@ -29,7 +29,7 @@ export LC_ALL=C
# TODO: This would be much nicer to do in SBT, once SBT supports Maven-style resolution.

# NOTE: These should match those in the release publishing script
HADOOP2_MODULE_PROFILES="-Phive-thriftserver -Pmesos -Pkafka-0-8 -Pkubernetes -Pyarn -Phive"
HADOOP2_MODULE_PROFILES="-Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive"
MVN="build/mvn"
HADOOP_PROFILES=(
hadoop-2.7
9 changes: 0 additions & 9 deletions docs/building-spark.md
@@ -89,15 +89,6 @@ like ZooKeeper and Hadoop itself.
## Building with Kubernetes support

./build/mvn -Pkubernetes -DskipTests clean package

## Building with Kafka 0.8 support

Kafka 0.8 support must be explicitly enabled with the `kafka-0-8` profile.
Note: Kafka 0.8 support is deprecated as of Spark 2.3.0.

./build/mvn -Pkafka-0-8 -DskipTests clean package

Kafka 0.10 support is still automatically built.

## Building submodules individually

12 changes: 1 addition & 11 deletions docs/configuration.md
@@ -2017,7 +2017,7 @@ showDF(properties, numRows = 200, truncate = FALSE)
<td>
Maximum rate (number of records per second) at which data will be read from each Kafka
partition when using the new Kafka direct stream API. See the
<a href="streaming-kafka-integration.html">Kafka Integration guide</a>
<a href="streaming-kafka-0-10-integration.html">Kafka Integration guide</a>
for more details.
</td>
</tr>
@@ -2029,16 +2029,6 @@ showDF(properties, numRows = 200, truncate = FALSE)
partition when using the new Kafka direct stream API.
</td>
</tr>
<tr>
<td><code>spark.streaming.kafka.maxRetries</code></td>
<td>1</td>
<td>
Maximum number of consecutive retries the driver will make in order to find
the latest offsets on the leader of each partition (a default value of 1
means that the driver will make a maximum of 2 attempts). Only applies to
the new Kafka direct stream API.
</td>
</tr>
<tr>
<td><code>spark.streaming.ui.retainedBatches</code></td>
<td>1000</td>
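For context, a brief Scala sketch of how an application might set the per-partition rate limit documented above (the figures are placeholders, not recommended values; the minimum-rate property in the second row is set the same way):

    // Illustrative only: configuring the Kafka direct-stream rate limit kept in
    // the table above. The numbers are placeholders, not tuning advice.
    import org.apache.spark.SparkConf

    object RateLimitConf {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("KafkaRateLimitExample")
          // Read at most 1000 records per second from each Kafka partition.
          .set("spark.streaming.kafka.maxRatePerPartition", "1000")
          // Let the backpressure mechanism adjust the effective rate at runtime.
          .set("spark.streaming.backpressure.enabled", "true")
        println(conf.toDebugString)
      }
    }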
5 changes: 4 additions & 1 deletion docs/streaming-kafka-0-10-integration.md
@@ -3,7 +3,10 @@ layout: global
title: Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher)
---

The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 [Direct Stream approach](streaming-kafka-0-8-integration.html#approach-2-direct-approach-no-receivers). It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses the [new Kafka consumer API](http://kafka.apache.org/documentation.html#newconsumerapi) instead of the simple API, there are notable differences in usage. This version of the integration is marked as experimental, so the API is potentially subject to change.
The Spark Streaming integration for Kafka 0.10 provides simple parallelism, 1:1 correspondence between Kafka
partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses
the [new Kafka consumer API](https://kafka.apache.org/documentation.html#newconsumerapi) instead of the simple API,
there are notable differences in usage.
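As a rough illustration of that partition correspondence and offset access (the topic name, group id, and broker address below are placeholders), a direct stream against the 0.10 consumer API can be set up along these lines:

    // Sketch of a Kafka 0.10 direct stream; connection details are placeholders.
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.{HasOffsetRanges, KafkaUtils}

    object Kafka010Sketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("Kafka010Sketch")
        val ssc = new StreamingContext(conf, Seconds(5))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "localhost:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "example-group",
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        // One Spark partition per Kafka topic-partition.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          PreferConsistent,
          Subscribe[String, String](Array("exampleTopic"), kafkaParams)
        )

        // Offsets and metadata are available per batch on the driver.
        stream.foreachRDD { rdd =>
          val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
          ranges.foreach(r => println(s"${r.topic} ${r.partition} ${r.fromOffset} ${r.untilOffset}"))
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }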

### Linking
For Scala/Java applications using SBT/Maven project definitions, link your streaming application with the following artifact (see [Linking section](streaming-programming-guide.html#linking) in the main programming guide for further information).
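For instance, an SBT dependency declaration along the following lines would pull in the 0.10 integration (the artifact naming is assumed from the spark-streaming-kafka-0-10 module referenced elsewhere in this change; set the version to the Spark release actually in use):

    // build.sbt sketch — the artifact name is assumed; adjust sparkVersion as needed.
    val sparkVersion = "2.4.0"  // placeholder version

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-streaming"            % sparkVersion % "provided",
      "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion
    )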