[SPARK-29450][SS] Measure the number of output rows for streaming aggregation with append mode by HeartSaVioR · Pull Request #26104 · apache/spark

HeartSaVioR · 2019-10-13T07:16:54Z

What changes were proposed in this pull request?

This patch adds a missing metric: the number of output rows for streaming aggregation in append mode. The other output modes already measure it correctly.

Why are the changes needed?

Without the patch, the value of this metric is always 0.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test added. Also manually tested with the query below:

query

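// window, max, min and avg come from functions._ (spark-shell imports it automatically)
import org.apache.spark.sql.functions._
import spark.implicits._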

spark.conf.set("spark.sql.shuffle.partitions", "5")

val df = spark.readStream
  .format("rate")
  .option("rowsPerSecond", 1000)
  .load()
  .withWatermark("timestamp", "5 seconds")
  .selectExpr("timestamp", "mod(value, 100) as mod", "value")
  .groupBy(window($"timestamp", "10 seconds"), $"mod")
  .agg(max("value").as("max_value"), min("value").as("min_value"), avg("value").as("avg_value"))

val query = df
  .writeStream
  .format("memory")
  .option("queryName", "test")
  .outputMode("append")
  .start()

query.awaitTermination()

before the patch

after the patch


HeartSaVioR · 2019-10-13T07:24:10Z

cc. @tdas @zsxwing @jose-torres @gaborgsomogyi
also cc. to @jaceklaskowski as reporter of the issue

SparkQA · 2019-10-13T10:49:19Z

Test build #111992 has finished for PR 26104 at commit bf14f96.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HeartSaVioR · 2019-11-18T03:48:26Z

I've added the query and screenshots to show which issue this PR fixes.

HeartSaVioR · 2019-11-18T03:48:35Z

retest this, please

SparkQA · 2019-11-18T07:45:05Z

Test build #113983 has finished for PR 26104 at commit bf14f96.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HeartSaVioR · 2019-12-06T12:45:42Z

@tdas @zsxwing @jose-torres @gaborgsomogyi Kind reminder.

HyukjinKwon

Looks fine to me.

dongjoon-hyun · 2019-12-19T00:03:09Z

Retest this please.

SparkQA · 2019-12-19T04:04:16Z

Test build #115529 has finished for PR 26104 at commit bf14f96.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-12-19T09:15:32Z

Merged to master.

HeartSaVioR · 2019-12-19T09:58:42Z

Thanks all for reviewing and merging!

gatorsmile · 2020-01-14T04:09:23Z

                  finished = true
                  null
                } else {
+                  numOutputRows += 1


Is this a regression introduced in #18107?

HeartSaVioR · 2020-01-14

Yes, it looks like it. The API seems to have been completely revised in #18107, though I don't have the background on that change.
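For readers without the Spark source at hand: the hunk above is a one-line counter increment on the branch that emits a row. The following is a minimal, Spark-free sketch of that pattern, not the actual operator code. `RowCounter`, `CountingEvictIterator`, the `evict` predicate, and the demo object are hypothetical names introduced for illustration; `RowCounter` only stands in for a metric accumulator and is not Spark's `SQLMetric`.

```scala
// Hypothetical stand-in for a metric accumulator (NOT Spark's SQLMetric).
final class RowCounter {
  private var n = 0L
  def +=(delta: Long): Unit = n += delta
  def value: Long = n
}

// Wraps `source`, emits only the elements matching `evict`, and counts each
// element at the moment it is handed to the consumer.
final class CountingEvictIterator[A](
    source: Iterator[A],
    evict: A => Boolean,
    numOutputRows: RowCounter) extends Iterator[A] {

  private var nextRow: Option[A] = None
  private var finished = false

  // Advance until a matching element is found or the source is exhausted.
  private def fetch(): Unit = {
    while (nextRow.isEmpty && source.hasNext) {
      val candidate = source.next()
      if (evict(candidate)) nextRow = Some(candidate)
    }
    if (nextRow.isEmpty) finished = true
  }

  override def hasNext: Boolean = {
    if (nextRow.isEmpty && !finished) fetch()
    nextRow.nonEmpty
  }

  override def next(): A = {
    if (!hasNext) throw new NoSuchElementException("iterator exhausted")
    val row = nextRow.get
    nextRow = None
    numOutputRows += 1 // the counting step: one increment per emitted row
    row
  }
}

object CountingEvictIteratorDemo {
  // Emits the even numbers from 1..6 and reports how many rows were counted.
  def run(): (List[Int], Long) = {
    val counter = new RowCounter
    val it = new CountingEvictIterator[Int](Iterator(1, 2, 3, 4, 5, 6), _ % 2 == 0, counter)
    (it.toList, counter.value)
  }
}
```

If the increment in `next()` is left out, the iterator still emits the same rows but the counter stays at 0, which matches the symptom described in the PR description.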

gatorsmile · 2020-01-15T03:18:18Z

@HeartSaVioR Could you help backport this to 2.4?

HeartSaVioR · 2020-01-15T03:24:27Z

Ah yes, I didn't get the intention of the comment at first, but I see it now. Happy to submit a PR to backport this. Thanks!

…regation with append mode

### What changes were proposed in this pull request?

This patch addresses missing metric, the number of output rows for streaming aggregation with append mode. Other modes are correctly measuring it.

### Why are the changes needed?

Without the patch, the value for such metric is always 0.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Unit test added. Also manually tested with below query:

> query

```
import spark.implicits._

spark.conf.set("spark.sql.shuffle.partitions", "5")

val df = spark.readStream
  .format("rate")
  .option("rowsPerSecond", 1000)
  .load()
  .withWatermark("timestamp", "5 seconds")
  .selectExpr("timestamp", "mod(value, 100) as mod", "value")
  .groupBy(window($"timestamp", "10 seconds"), $"mod")
  .agg(max("value").as("max_value"), min("value").as("min_value"), avg("value").as("avg_value"))

val query = df
  .writeStream
  .format("memory")
  .option("queryName", "test")
  .outputMode("append")
  .start()

query.awaitTermination()
```

> before the patch

![screenshot-before-SPARK-29450](https://user-images.githubusercontent.com/1317309/69023217-58d7bc80-0a01-11ea-8cac-40f1cced6d16.png)

> after the patch

![screenshot-after-SPARK-29450](https://user-images.githubusercontent.com/1317309/69023221-5c6b4380-0a01-11ea-8a66-7bf1b7d09fc7.png)

Closes apache#26104 from HeartSaVioR/SPARK-29450.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

[SPARK-29450][SS] Measure output rows for streaming aggregation with append mode

bf14f96

dongjoon-hyun added the STRUCTURED STREAMING label Oct 14, 2019

HyukjinKwon approved these changes Dec 16, 2019


HyukjinKwon closed this in ab87bfd Dec 19, 2019

HeartSaVioR deleted the SPARK-29450 branch December 19, 2019 09:58

gatorsmile reviewed Jan 14, 2020


HeartSaVioR mentioned this pull request Jan 15, 2020

[SPARK-29450][SS][2.4] Measure the number of output rows for streaming aggregation with append mode #27209

Closed
