[SPARK-21051][SQL] Add hash map metrics to aggregate #18258

viirya · 2017-06-10T04:28:01Z

What changes were proposed in this pull request?

This PR adds an average hash map probe metric to the hash aggregate operator.

BytesToBytesMap already has an API to get this metric; this PR adds an API to UnsafeFixedWidthAggregationMap to expose it.

Preparing a test for this metric is tricky, because we don't know in advance which keys will collide. For now, the test case generates random data large enough to produce the desired number of probes.
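To make the metric and the testing approach concrete, here is a minimal sketch of a probe-counting hash map. It is a toy linear-probing map, not Spark's BytesToBytesMap; the class name `ProbeCountingMap`, the method `averageProbesPerLookup`, the hash mixing constant, and the capacity are all assumptions made for illustration. The average is taken as total probes divided by total lookups, and random keys are inserted until collisions push it above 1.

```java
import java.util.Random;

// Toy open-addressing map (long key -> long count) with linear probing.
// Illustration only: this is NOT Spark's BytesToBytesMap.
final class ProbeCountingMap {
    private final long[] keys;
    private final long[] counts;
    private final boolean[] used;
    private final int mask;          // capacity - 1; capacity is a power of two
    private int size = 0;
    private long numKeyLookups = 0;  // incremented once per increment() call
    private long numProbes = 0;      // incremented once per slot inspected

    ProbeCountingMap(int capacity) {
        if (capacity < 2 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        keys = new long[capacity];
        counts = new long[capacity];
        used = new boolean[capacity];
        mask = capacity - 1;
    }

    // Adds one to the count stored for key, probing linearly on collisions.
    void increment(long key) {
        numKeyLookups++;
        int pos = Long.hashCode(key * 0x9E3779B97F4A7C15L) & mask;
        while (true) {
            numProbes++;
            if (!used[pos]) {
                // Keep one slot free so probing always terminates.
                if (size == mask) throw new IllegalStateException("map is full");
                used[pos] = true;
                keys[pos] = key;
                counts[pos] = 1;
                size++;
                return;
            }
            if (keys[pos] == key) {
                counts[pos]++;
                return;
            }
            pos = (pos + 1) & mask;
        }
    }

    // Average number of slots inspected per lookup; 1.0 means no collisions.
    double averageProbesPerLookup() {
        return numKeyLookups == 0 ? 0.0 : (double) numProbes / numKeyLookups;
    }

    public static void main(String[] args) {
        // Random keys at a high load factor make collisions very likely,
        // without having to know which specific keys collide.
        ProbeCountingMap map = new ProbeCountingMap(1 << 12);
        Random rnd = new Random(42L);
        for (int i = 0; i < 3000; i++) {
            map.increment(rnd.nextLong());
        }
        System.out.println("avg probes per lookup = " + map.averageProbesPerLookup());
    }
}
```

The two counters are plain increments on the lookup path, which is the cost the discussion below is about.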

TODO in later PR: add hash map metrics to join.

How was this patch tested?

Added test to SQLMetricsSuite.

rxin · 2017-06-10T04:34:34Z

Why would the tracking have a perf impact? It's just a simple counter increment, isn't it?

viirya · 2017-06-10T04:38:06Z

The enablePerfMetrics parameter of UnsafeFixedWidthAggregationMap has this comment:

* @param enablePerfMetrics if true, performance metrics will be recorded (has minor perf impact)

It's true that those metrics are simple counters.

rxin · 2017-06-10T04:40:16Z

Can you test the perf degradation?

viirya · 2017-06-10T04:41:12Z

Sure. Will update later.

viirya · 2017-06-10T04:57:39Z

I just ran the existing AggregateBenchmark with the new tracking config:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.27-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = F                       10655 / 11043          7.9         127.0       1.0X
codegen = T hashmap = F, track = F             6923 / 7133         12.1          82.5       1.5X
codegen = T hashmap = T, track = F             1325 / 1511         63.3          15.8       8.0X


Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.27-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = T                      10809 / 11007          7.8         128.9       1.0X
codegen = T hashmap = F, track = T            6581 / 6629         12.7          78.4       1.6X
codegen = T hashmap = T, track = T            1411 / 1552         59.4          16.8       7.7X

Looks like there is no obvious perf degradation.

rxin · 2017-06-10T05:01:12Z

16.8 vs 15.8?

viirya · 2017-06-10T05:05:42Z

Is it significant? It seems to me that it's within the variance of different runs.

rxin · 2017-06-10T05:07:31Z

Can you run it a few more times to tell? Right now it's a difference of almost 7%...

viirya · 2017-06-10T05:30:09Z

Sure. Three times for each.

Track = F:

Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = F                      12657 / 12700          6.6         150.9       1.0X
codegen = T hashmap = F, track = F            6779 / 7582         12.4          80.8       1.9X
codegen = T hashmap = T, track = F            1505 / 1619         55.7          17.9       8.4X

Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = F                      10085 / 10597          8.3         120.2       1.0X
codegen = T hashmap = F, track = F            5915 / 6069         14.2          70.5       1.7X
codegen = T hashmap = T, track = F            1610 / 1796         52.1          19.2       6.3X

Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = F                      10275 / 10584          8.2         122.5       1.0X
codegen = T hashmap = F, track = F            6140 / 6557         13.7          73.2       1.7X
codegen = T hashmap = T, track = F            1301 / 1565         64.5          15.5       7.9X

Track = T:

Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = T                      10723 / 10865          7.8         127.8       1.0X
codegen = T hashmap = F, track = T            6246 / 6432         13.4          74.5       1.7X
codegen = T hashmap = T, track = T            1465 / 1571         57.3          17.5       7.3X

Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = T                       9964 / 10348          8.4         118.8       1.0X
codegen = T hashmap = F, track = T            6225 / 6375         13.5          74.2       1.6X
codegen = T hashmap = T, track = T            1361 / 1485         61.6          16.2       7.3X

Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = T                      10125 / 10674          8.3         120.7       1.0X
codegen = T hashmap = F, track = T            6865 / 6980         12.2          81.8       1.5X
codegen = T hashmap = T, track = T            1491 / 1579         56.3          17.8       6.8X
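As a rough check on the numbers above, the following sketch averages the Per Row(ns) values of the "codegen = T hashmap = T" rows from the three runs of each setting and computes the relative change. The class and method names are hypothetical, and three runs per setting is a small sample, so this is only an indication rather than a statistical test.

```java
// Compares Per Row(ns) figures copied from the benchmark tables in this thread.
final class PerRowComparison {
    // Arithmetic mean of the given values.
    static double mean(double... xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Relative change of candidate with respect to baseline (0.05 means +5%).
    static double relativeChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline;
    }

    public static void main(String[] args) {
        // "codegen = T hashmap = T" rows, Per Row(ns), three runs each.
        double[] trackOff = {17.9, 19.2, 15.5};
        double[] trackOn = {17.5, 16.2, 17.8};

        // The single pair of runs that prompted the question: 15.8 vs 16.8.
        System.out.printf("first pair: %+.1f%%%n", 100 * relativeChange(15.8, 16.8));

        double off = mean(trackOff);
        double on = mean(trackOn);
        System.out.printf("mean track=F: %.2f ns, mean track=T: %.2f ns%n", off, on);
        System.out.printf("mean change: %+.1f%%%n", 100 * relativeChange(off, on));
    }
}
```

The first pair differs by about 6.3%, while the three-run means come out at roughly 17.5 ns without tracking and 17.2 ns with it. The spread among the track = F runs alone (15.5 to 19.2) is larger than either gap.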

rxin · 2017-06-10T06:32:37Z

Thanks!

rxin · 2017-06-10T06:32:55Z

If there is no regression, I'd remove the flag.

SparkQA · 2017-06-10T06:54:00Z

Test build #77866 has finished for PR 18258 at commit e4cfe1c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2017-06-10T07:18:49Z

Ok. I'll remove the flag. Thanks.

SparkQA · 2017-06-10T11:42:50Z

Test build #77872 has finished for PR 18258 at commit 55cd6ad.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-06-10T12:25:50Z

Test build #77876 has finished for PR 18258 at commit ee3d88f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2017-06-10T14:58:04Z

It seems to me that adding hash map metrics to the join operator can be done in a later PR, so this change stays small to review.

rxin · 2017-06-10T18:20:53Z

That's a good idea. In that case, can you create a subtask on JIRA for this and another one for join?

viirya · 2017-06-12T02:38:38Z

cc @cloud-fan @gatorsmile for review.

gatorsmile · 2017-06-12T06:13:57Z

sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala

+   */
+  def createAverageMetric(sc: SparkContext, name: String): SQLMetric = {
+    // The final result of this metric in physical operator UI may looks like:
+    // probe avg (min, med, max):


med is medium? why 6?

oh. right. will fix this typo. :)
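For readers unfamiliar with the "(min, med, max)" display mentioned in the code comment, here is a small sketch of how per-task averages could be summarized in that shape. It is not the code from SQLMetrics.scala: the class name, the one-decimal formatting, the choice of the upper median for even counts, and the sample values are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Locale;

// Summarizes per-task average values as "<name> avg (min, med, max):".
final class AverageMetricFormat {
    static String format(String name, double[] perTaskAverages) {
        if (perTaskAverages.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        // Sort a copy so the caller's array is left untouched.
        double[] sorted = perTaskAverages.clone();
        Arrays.sort(sorted);
        double min = sorted[0];
        double med = sorted[sorted.length / 2]; // upper median for even counts
        double max = sorted[sorted.length - 1];
        return String.format(Locale.ROOT,
            "%s avg (min, med, max):\n(%.1f, %.1f, %.1f)", name, min, med, max);
    }

    public static void main(String[] args) {
        // Hypothetical per-task average probe counts.
        System.out.println(format("probe", new double[] {1.4, 1.0, 2.2}));
    }
}
```

Here "med" stands for the median of the per-task values, which is why it sits between the min and max entries in the display.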

gatorsmile · 2017-06-12T06:24:15Z

sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala

      1024 * 16, // initial capacity
      TaskContext.get().taskMemoryManager().pageSizeBytes,
-      false // disable tracking of performance metrics
+      true // tracking of performance metrics


Always turn it on?

If we decide to always turn it on, why do we still keep this param?

Yeah, based on the benchmark, the performance degradation does not seem to be an issue. We can remove this parameter completely.

viirya · 2017-06-12T08:48:37Z

sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeFixedWidthAggregationMap.java

  }

  @SuppressWarnings("UseOfSystemOutOrSystemErr")
  public void printPerfMetrics() {


I can't find any place that actually uses this method. Not sure if we want to remove it.

Yes, please remove it. Thanks!

SparkQA · 2017-06-12T11:15:29Z

Test build #77925 has finished for PR 18258 at commit 250054c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-06-13T05:49:04Z

LGTM pending Jenkins

SparkQA · 2017-06-13T06:37:39Z

Test build #77961 has finished for PR 18258 at commit c7de74c.

This patch fails from timeout after a configured wait of `250m`.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2017-06-13T06:48:38Z

retest this please.

SparkQA · 2017-06-13T06:52:37Z

Test build #77974 has started for PR 18258 at commit c7de74c.

cloud-fan · 2017-06-13T07:07:41Z

LGTM

viirya · 2017-06-13T07:19:31Z

retest this please.

SparkQA · 2017-06-13T09:59:43Z

Test build #77978 has finished for PR 18258 at commit c7de74c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

## What changes were proposed in this pull request?

This adds the average hash map probe metrics to hash aggregate.

`BytesToBytesMap` already has API to get the metrics, this PR adds an API to `UnsafeFixedWidthAggregationMap` to access it.

Preparing a test for this metrics seems tricky, because we don't know what collision keys are. For now, the test case generates random data large enough to have desired probe.

TODO in later PR: add hash map metrics to join.

## How was this patch tested?

Added test to SQLMetricsSuite.

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#18258 from viirya/SPARK-20953.

This adds the average hash map probe metrics to hash aggregate.

`BytesToBytesMap` already has API to get the metrics, this PR adds an API to `UnsafeFixedWidthAggregationMap` to access it.

Preparing a test for this metrics seems tricky, because we don't know what collision keys are. For now, the test case generates random data large enough to have desired probe.

TODO in later PR: add hash map metrics to join.

Added test to SQLMetricsSuite.

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#18258 from viirya/SPARK-20953.

(cherry picked from commit bcf3643)

Conflicts:
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala

(cherry picked from commit ded5a0029ebab9404d2b9fb1dca98bbffccd5837)

NETFLIX-BUILD: SPARK-21051:HOTFIX: ignore test case "ObjectHashAggregate metrics"

ObjectHashAggregate feature was introduced in 2.2.

(cherry picked from commit 0f412c1935f15ed96fa043054880d9796e71d563)

Report average hashmap probe when the config is enabled.

e4cfe1c

viirya added 2 commits June 10, 2017 09:06

Remove the config flag.

55cd6ad

Improve code comment.

ee3d88f

viirya changed the title ~~[SPARK-20953][SQL][WIP] Add hash map metrics to aggregate~~ [SPARK-20953][SQL] Add hash map metrics to aggregate Jun 10, 2017

viirya changed the title ~~[SPARK-20953][SQL] Add hash map metrics to aggregate~~ [SPARK-21051][SQL] Add hash map metrics to aggregate Jun 11, 2017

gatorsmile reviewed Jun 12, 2017

Fix typo. Remove enablePerfMetrics param.

250054c

viirya commented Jun 12, 2017

Remove unused method.

c7de74c

asfgit closed this in bcf3643 Jun 13, 2017

viirya deleted the SPARK-20953 branch December 27, 2023 18:20
