
Conversation

@sekikn
Contributor

@sekikn sekikn commented May 20, 2019

What changes were proposed in this pull request?

KinesisInputDStream currently does not provide a way to disable
CloudWatch metrics push. Its default level is "DETAILED", which pushes
tens of metrics every 10 seconds. With multiple streaming jobs, this
adds up quickly, leading to thousands of dollars in cost.
To address this problem, this PR adds interfaces for accessing
KinesisClientLibConfiguration's `withMetrics` and
`withMetricsEnabledDimensions` methods to KinesisInputDStream
so that users can configure KCL's metrics levels and dimensions.
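
The plumbing described above can be sketched in plain Scala. Everything in this sketch is illustrative: `MetricsLevel`, `KclConfig`, and `DStreamBuilder` are stand-ins for Amazon KCL's MetricsLevel and KinesisClientLibConfiguration and for KinesisInputDStream.Builder, not the actual classes, and the default dimension set is a placeholder.

```scala
// Illustrative stand-in for Amazon KCL's MetricsLevel (not the real class).
object MetricsLevel extends Enumeration {
  val NONE, SUMMARY, DETAILED = Value
}

// KCL-style immutable config whose `with*` methods return an updated copy.
// DETAILED is the default level, matching the behavior described above;
// the default dimension set here is only a placeholder.
case class KclConfig(
    metricsLevel: MetricsLevel.Value = MetricsLevel.DETAILED,
    metricsEnabledDimensions: Set[String] = Set("Operation", "ShardId")) {
  def withMetrics(level: MetricsLevel.Value): KclConfig =
    copy(metricsLevel = level)
  def withMetricsEnabledDimensions(dims: Set[String]): KclConfig =
    copy(metricsEnabledDimensions = dims)
}

// A builder that records the metrics settings only if the caller supplied
// them, and applies them when constructing the KCL config -- so KCL's own
// defaults still apply when the user sets nothing.
class DStreamBuilder {
  private var maybeLevel: Option[MetricsLevel.Value] = None
  private var maybeDims: Option[Set[String]] = None

  def metricsLevel(level: MetricsLevel.Value): DStreamBuilder = {
    maybeLevel = Some(level); this
  }
  def metricsEnabledDimensions(dims: Set[String]): DStreamBuilder = {
    maybeDims = Some(dims); this
  }
  def buildKclConfig(): KclConfig = {
    var conf = KclConfig()
    maybeLevel.foreach(l => conf = conf.withMetrics(l))
    maybeDims.foreach(d => conf = conf.withMetricsEnabledDimensions(d))
    conf
  }
}

// Disabling metrics push entirely, as tested in this PR with MetricsLevel.NONE.
val conf = new DStreamBuilder().metricsLevel(MetricsLevel.NONE).buildKclConfig()
println(conf.metricsLevel)
```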

How was this patch tested?

By running updated unit tests in KinesisInputDStreamBuilderSuite.
In addition, I ran a Streaming job with MetricsLevel.NONE and confirmed:

  • there's no data point for the "Operation", "Operation, ShardId" and "WorkerIdentifier" dimensions on the AWS management console
  • there's no DEBUG level message from Amazon KCL, such as "Successfully published xx datums."

Please review http://spark.apache.org/contributing.html before opening a pull request.

@sekikn
Contributor Author

sekikn commented Aug 19, 2019

@brkyvz Would you take a look at this PR? I think you're an expert on the Kinesis-Streaming integration and have done several contributions and reviews on it.

@sarutak
Member

sarutak commented Aug 19, 2019

ok to test.

@SparkQA

SparkQA commented Aug 19, 2019

Test build #109327 has finished for PR 24651 at commit c4ada94.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 19, 2019

Test build #109330 has finished for PR 24651 at commit a7800e6.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sekikn
Contributor Author

sekikn commented Aug 22, 2019

Hmm, I'm not sure why the CI failed on PySpark; it succeeds with the same options in my local environment.

$ git checkout master
$ curl -sLO https://github.com/apache/spark/pull/24651.patch
$ git apply 24651.patch
$ build/mvn clean install -Pkinesis-asl -DskipTests

(snip)

[INFO] Spark Project Kinesis Assembly ..................... SUCCESS [  5.044 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  20:39 min
[INFO] Finished at: 2019-08-22T15:20:42+09:00
[INFO] ------------------------------------------------------------------------
$ ENABLE_KINESIS_TESTS=1 python/run-tests --modules=pyspark-streaming,pyspark-mllib,pyspark-ml
Running PySpark tests. Output is in /home/sekikn/repos/spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'python3.6', 'pypy']
Will test the following Python modules: ['pyspark-streaming', 'pyspark-mllib', 'pyspark-ml']
Starting test(pypy): pyspark.streaming.tests.test_dstream
Starting test(pypy): pyspark.streaming.tests.test_listener
Starting test(pypy): pyspark.streaming.tests.test_context
Starting test(pypy): pyspark.streaming.tests.test_kinesis

(snip)

Finished test(pypy): pyspark.streaming.tests.test_kinesis (178s)

(snip)

Tests passed in 1140 seconds

Skipped tests in pyspark.ml.tests.test_image with python2.7:
    test_read_images_multiple_times (pyspark.ml.tests.test_image.ImageFileFormatOnHiveContextTest) ... skipped 'Hive is not available.'

Skipped tests in pyspark.ml.tests.test_image with python3.6:
    test_read_images_multiple_times (pyspark.ml.tests.test_image.ImageFileFormatOnHiveContextTest) ... skipped 'Hive is not available.'
$ echo $?
0

@sarutak
Member

sarutak commented Aug 22, 2019

Let's retry the test on Jenkins and confirm whether the failure is caused by flaky tests.

@sarutak
Member

sarutak commented Aug 22, 2019

retest this please.

@SparkQA

SparkQA commented Aug 22, 2019

Test build #109590 has finished for PR 24651 at commit a7800e6.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 23, 2019

Test build #109640 has finished for PR 24651 at commit fd2229c.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

sarutak previously approved these changes Aug 23, 2019
@sekikn
Contributor Author

sekikn commented Sep 2, 2019

retest this please.

1 similar comment
@sarutak
Member

sarutak commented Sep 2, 2019

retest this please.

@suraj95

suraj95 commented Sep 2, 2019

Hello,

I am new to this community. How can I start contributing?

@gaborgsomogyi
Contributor

If it's still failing, one can merge the latest master on top of this change.

@sarutak
Member

sarutak commented Sep 2, 2019

Hi @suraj95, could you refer to the contribution guide?
Also, please post general questions to the user mailing list next time.
You can subscribe to the mailing list by following these instructions.

Contributor

@gaborgsomogyi gaborgsomogyi left a comment


Basically looks good.
Maybe this feature can be mentioned in streaming-kinesis-integration.md.

@SparkQA

SparkQA commented Sep 2, 2019

Test build #110004 has finished for PR 24651 at commit fd2229c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak
Member

sarutak commented Sep 2, 2019

The last unit test failure might be unrelated to the AWS SDK stuff, but it's better to merge master and push again. @sekikn, could you do that?
I also think it's good to document the new configuration.
@sekikn, you can include the documentation within this PR; otherwise we can open a follow-up PR.

@SparkQA

SparkQA commented Sep 2, 2019

Test build #110008 has finished for PR 24651 at commit d487679.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak
Member

sarutak commented Sep 2, 2019

LGTM, but I'd like one more committer to review this too, because I'm not so familiar with Kinesis.
@srowen @brkyvz Can either of you take a look at this?

Member

@srowen srowen left a comment


Very minor comment about docs.
Seems OK, just plumbing through another optional config.

* [[KinesisClientLibConfiguration.DEFAULT_METRICS_ENABLED_DIMENSIONS]]
* if no custom value is specified.
*
* @param metricsEnabledDimensions [[Set[String]]] to specify
Member


This is a small thing, but I know we have had problems generating scaladoc with references to library code. It might be worth building the doc HTML (only) as in https://github.com/apache/spark/blob/master/docs/README.md to make sure.

You don't really have to link a type like Set.
Also we usually do a continuation indent of two spaces.

* @param metricsEnabledDimensions [[Set[String]]] to specify
* the enabled CloudWatch metrics dimensions
* @return Reference to this [[KinesisInputDStream.Builder]]
* @see [[https://docs.aws.amazon.com/streams/latest/dev/monitoring-with-kcl.html#metric-levels]]
Member


If it helps avoid turning off scalastyle, you could just write:

See
[[...]]

in the main body of the doc above. That seems short enough.

* Kinesis integration documentation

  * Add explanation and usage examples for the new API

  * Fix the existing Java grammatical mistakes

* Scaladoc

  * Insert newlines between the @see directives and URLs
    to avoid disabling scalastyle

  * Remove unnecessary brackets around a standard class

  * Use two spaces for a continuation indent
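
To illustrate the scalastyle fix above: moving the long URL onto its own continuation line keeps the line length legal without a `scalastyle:off` comment. The surrounding doc text below is a sketch in the shape described, not the exact diff from the PR:

```scala
/**
 * Sets the CloudWatch metrics level. Defaults to MetricsLevel.DETAILED
 * if no custom value is specified.
 *
 * @param metricsLevel [[MetricsLevel]] to specify the CloudWatch metrics level
 * @return Reference to this [[KinesisInputDStream.Builder]]
 * @see
 *   [[https://docs.aws.amazon.com/streams/latest/dev/monitoring-with-kcl.html#metric-levels]]
 */
```
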
@sekikn
Contributor Author

sekikn commented Sep 7, 2019

@gaborgsomogyi @srowen Thank you for the review! I've just updated the PR following your advice.

As @srowen pointed out, some references to other classes (e.g., KinesisClientLibConfiguration and MetricsLevel) in the scaladoc don't seem to resolve. But generating the docs itself succeeds and they read fine, so I left the brackets as they were.

@SparkQA

SparkQA commented Sep 7, 2019

Test build #110278 has finished for PR 24651 at commit c9341a0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen srowen closed this in 1f056eb Sep 9, 2019
@srowen
Member

srowen commented Sep 9, 2019

Merged to master

@sekikn
Contributor Author

sekikn commented Sep 9, 2019

@srowen @sarutak Thank you for reviewing and merging!
BTW, my client, who uses v2.4.x, is facing this problem, so I'd like to backport this fix to the 2.4 branch. Would you review it if I submit a PR for branch-2.4?

@sekikn sekikn deleted the SPARK-27420 branch September 9, 2019 06:15
@srowen
Member

srowen commented Sep 9, 2019

We usually don't back-port minor features to maintenance branches. I think we'd want to see that it's of broader interest to do so. Can you continue to work around it in 2.4?

PavithraRamachandran pushed a commit to PavithraRamachandran/spark that referenced this pull request Sep 15, 2019
…way to configure CloudWatch metrics


Closes apache#24651 from sekikn/SPARK-27420.

Authored-by: Kengo Seki <[email protected]>
Signed-off-by: Sean Owen <[email protected]>