Member

@viirya viirya commented Jun 10, 2017

What changes were proposed in this pull request?

This adds an average hash map probe metric to the hash aggregate operator.

BytesToBytesMap already has an API to get this metric; this PR adds an API to UnsafeFixedWidthAggregationMap to access it.
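
For readers unfamiliar with the metric, here is a minimal sketch of how an average-probes-per-lookup counter can be kept in an open-addressing hash map. It is not Spark's `BytesToBytesMap`: the class `ProbeCountingMap`, its linear probing, and its fixed capacity are assumptions made for illustration only. The one idea it shares with the metric discussed here is the ratio of slots inspected to key lookups.

```java
/**
 * Hypothetical, simplified stand-in for a probe-counting hash map.
 * Not Spark's BytesToBytesMap: linear probing, long keys, no resizing.
 */
class ProbeCountingMap {
    private final long[] keys;
    private final boolean[] used;
    private final int mask;
    private long numProbes = 0;      // slots inspected across all lookups
    private long numKeyLookups = 0;  // number of lookups performed

    /** Capacity must be a power of two and larger than the number of distinct keys. */
    ProbeCountingMap(int capacityPowerOfTwo) {
        keys = new long[capacityPowerOfTwo];
        used = new boolean[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    /** Returns the slot holding the key, inserting it if absent. */
    int lookupOrInsert(long key) {
        numKeyLookups++;
        int pos = Long.hashCode(key) & mask;
        while (true) {
            numProbes++;                     // one counter increment per inspected slot
            if (!used[pos]) {
                used[pos] = true;
                keys[pos] = key;
                return pos;
            }
            if (keys[pos] == key) {
                return pos;
            }
            pos = (pos + 1) & mask;          // collision: try the next slot
        }
    }

    /** Average number of slots inspected per lookup; 1.0 means no collisions. */
    double getAverageProbesPerLookup() {
        return numKeyLookups == 0 ? 0.0 : (double) numProbes / numKeyLookups;
    }

    public static void main(String[] args) {
        ProbeCountingMap map = new ProbeCountingMap(8);
        // 0, 8 and 16 all hash to slot 0 in a table of 8 slots, so they collide.
        map.lookupOrInsert(0L);
        map.lookupOrInsert(8L);
        map.lookupOrInsert(16L);
        System.out.println(map.getAverageProbesPerLookup());
    }
}
```

In this toy version the colliding keys are chosen by hand. In a real map the hash function makes that hard to arrange, which is why a test may fall back on large random inputs.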

Preparing a test for this metric seems tricky, because we don't know which keys collide. For now, the test case generates random data large enough to produce the desired number of probes.

TODO in later PR: add hash map metrics to join.

How was this patch tested?

Added a test to SQLMetricsSuite.

Contributor

rxin commented Jun 10, 2017

Why would the tracking have a perf impact? It's just a simple counter increment, isn't it?

Member Author

viirya commented Jun 10, 2017

The enablePerfMetrics parameter of UnsafeFixedWidthAggregationMap has this comment:

* @param enablePerfMetrics if true, performance metrics will be recorded (has minor perf impact)

It's true that those metrics are simple counters.

Contributor

rxin commented Jun 10, 2017

Can you test the perf degradation?

Member Author

viirya commented Jun 10, 2017

Sure. Will update later.

Member Author

viirya commented Jun 10, 2017

I just ran the existing AggregateBenchmark with the new tracking config:

Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.27-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = F                       10655 / 11043          7.9         127.0       1.0X
codegen = T hashmap = F, track = F             6923 / 7133         12.1          82.5       1.5X
codegen = T hashmap = T, track = F             1325 / 1511         63.3          15.8       8.0X


Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.27-moby
Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = T                      10809 / 11007          7.8         128.9       1.0X
codegen = T hashmap = F, track = T            6581 / 6629         12.7          78.4       1.6X
codegen = T hashmap = T, track = T            1411 / 1552         59.4          16.8       7.7X

Looks like there is no obvious perf degradation.

Contributor

rxin commented Jun 10, 2017

16.8 vs 15.8?

Member Author

viirya commented Jun 10, 2017

Is it significant? It seems to me that it's within the variance between runs.

Contributor

rxin commented Jun 10, 2017

Can you run it a few more times to tell? Right now it's a difference of almost 7%.
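
The size of that gap can be checked directly from the first pair of tables. The snippet below is only a sketch: the class and method names are made up for illustration, and the inputs are the `codegen = T hashmap = T` rows quoted above (15.8 vs 16.8 ns per row, 1325 vs 1411 ms best time).

```java
/** Hypothetical helper for comparing two benchmark numbers. */
class RelativeDiff {
    /** Percentage by which candidate exceeds baseline. */
    static double relativeIncreasePercent(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Per Row(ns): track = F vs track = T
        System.out.printf("per row:   %.1f%%%n", relativeIncreasePercent(15.8, 16.8));
        // Best Time(ms): track = F vs track = T
        System.out.printf("best time: %.1f%%%n", relativeIncreasePercent(1325, 1411));
    }
}
```

Both ratios come out between 6% and 7%, which matches the "almost 7%" estimate. A single pair of runs cannot say whether that is signal or noise.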

Member Author

viirya commented Jun 10, 2017

Sure. Three times for each.

Track = F:

Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = F                      12657 / 12700          6.6         150.9       1.0X
codegen = T hashmap = F, track = F            6779 / 7582         12.4          80.8       1.9X
codegen = T hashmap = T, track = F            1505 / 1619         55.7          17.9       8.4X

Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = F                      10085 / 10597          8.3         120.2       1.0X
codegen = T hashmap = F, track = F            5915 / 6069         14.2          70.5       1.7X
codegen = T hashmap = T, track = F            1610 / 1796         52.1          19.2       6.3X

Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = F                      10275 / 10584          8.2         122.5       1.0X
codegen = T hashmap = F, track = F            6140 / 6557         13.7          73.2       1.7X
codegen = T hashmap = T, track = F            1301 / 1565         64.5          15.5       7.9X

Track = T:

Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = T                      10723 / 10865          7.8         127.8       1.0X
codegen = T hashmap = F, track = T            6246 / 6432         13.4          74.5       1.7X
codegen = T hashmap = T, track = T            1465 / 1571         57.3          17.5       7.3X

Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = T                       9964 / 10348          8.4         118.8       1.0X
codegen = T hashmap = F, track = T            6225 / 6375         13.5          74.2       1.6X
codegen = T hashmap = T, track = T            1361 / 1485         61.6          16.2       7.3X

Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
codegen = F, track = T                      10125 / 10674          8.3         120.7       1.0X
codegen = T hashmap = F, track = T            6865 / 6980         12.2          81.8       1.5X
codegen = T hashmap = T, track = T            1491 / 1579         56.3          17.8       6.8X
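
One way to read the six runs above is to compare the mean and the spread of the `Per Row(ns)` column for the `codegen = T hashmap = T` rows. This is a rough sketch, not a statistical test; the class name is hypothetical and the numbers are copied from the tables above.

```java
import java.util.Arrays;

/** Hypothetical helper summarizing repeated benchmark runs. */
class BenchmarkSpread {
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    /** Difference between the slowest and fastest run. */
    static double range(double[] xs) {
        double max = Arrays.stream(xs).max().orElse(Double.NaN);
        double min = Arrays.stream(xs).min().orElse(Double.NaN);
        return max - min;
    }

    public static void main(String[] args) {
        double[] trackOff = {17.9, 19.2, 15.5}; // Per Row(ns), track = F
        double[] trackOn = {17.5, 16.2, 17.8};  // Per Row(ns), track = T
        System.out.printf("track = F: mean %.2f ns, range %.1f ns%n", mean(trackOff), range(trackOff));
        System.out.printf("track = T: mean %.2f ns, range %.1f ns%n", mean(trackOn), range(trackOn));
    }
}
```

With these inputs the two means differ by well under half a nanosecond, while the runs within each setting differ by more than that. That is consistent with the gap in the first measurement being run-to-run variance.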

Contributor

rxin commented Jun 10, 2017

Thanks!

Contributor

rxin commented Jun 10, 2017

If there is no regression, I'd remove the flag.


SparkQA commented Jun 10, 2017

Test build #77866 has finished for PR 18258 at commit e4cfe1c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

viirya commented Jun 10, 2017

Ok. I'll remove the flag. Thanks.


SparkQA commented Jun 10, 2017

Test build #77872 has finished for PR 18258 at commit 55cd6ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jun 10, 2017

Test build #77876 has finished for PR 18258 at commit ee3d88f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

viirya commented Jun 10, 2017

It seems to me that adding the hash map metrics to the join operator can be done in a later PR, so that this change stays small and easy to review.

@viirya viirya changed the title from "[SPARK-20953][SQL][WIP] Add hash map metrics to aggregate" to "[SPARK-20953][SQL] Add hash map metrics to aggregate" Jun 10, 2017
Contributor

rxin commented Jun 10, 2017

That's a good idea. In that case, can you create a subtask on JIRA for this and another one for join?

@viirya viirya changed the title from "[SPARK-20953][SQL] Add hash map metrics to aggregate" to "[SPARK-21051][SQL] Add hash map metrics to aggregate" Jun 11, 2017
Member Author

viirya commented Jun 12, 2017

cc @cloud-fan @gatorsmile for review.

*/
def createAverageMetric(sc: SparkContext, name: String): SQLMetric = {
// The final result of this metric in physical operator UI may look like:
// probe avg (min, med, max):
Member

Is med the median? Why 6?

Member Author

@viirya viirya Jun 12, 2017

Oh, right. I will fix this typo. :)
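
To make the `(min, med, max)` label concrete, here is a rough sketch of how per-task averages could be condensed into that form. It does not reproduce Spark's `SQLMetrics` code: the class `AverageMetricSummary`, the exact output string, and the choice of the upper-middle element as the median for even-sized inputs are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical formatter for an average-style metric collected from several tasks. */
class AverageMetricSummary {
    /** Expects at least one value; each value is one task's average probes per lookup. */
    static String summarize(double[] perTaskAverages) {
        double[] sorted = perTaskAverages.clone();
        Arrays.sort(sorted);
        double min = sorted[0];
        double med = sorted[sorted.length / 2]; // upper-middle element for even lengths
        double max = sorted[sorted.length - 1];
        return String.format(Locale.ROOT,
            "probe avg (min, med, max): (%.1f, %.1f, %.1f)", min, med, max);
    }

    public static void main(String[] args) {
        // Made-up per-task values, not measurements from this PR.
        System.out.println(summarize(new double[] {2.2, 1.2, 6.3}));
    }
}
```

Whatever the exact format, the values should appear in the order the label promises: smallest first, median second, largest last.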

  1024 * 16, // initial capacity
  TaskContext.get().taskMemoryManager().pageSizeBytes,
- false // disable tracking of performance metrics
+ true // tracking of performance metrics
Member

Always turn it on?

If we decide to always turn it on, why do we still keep this parameter?

Member Author

Yeah, based on the benchmark, the performance degradation does not seem to be an issue. We can remove this parameter completely.

}

@SuppressWarnings("UseOfSystemOutOrSystemErr")
public void printPerfMetrics() {
Member Author

I can't find any place that actually uses this method. Not sure if we want to remove it.

Member

Yes, please remove it. Thanks!


SparkQA commented Jun 12, 2017

Test build #77925 has finished for PR 18258 at commit 250054c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

LGTM pending Jenkins


SparkQA commented Jun 13, 2017

Test build #77961 has finished for PR 18258 at commit c7de74c.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

viirya commented Jun 13, 2017

retest this please.


SparkQA commented Jun 13, 2017

Test build #77974 has started for PR 18258 at commit c7de74c.

@cloud-fan
Contributor

LGTM

Member Author

viirya commented Jun 13, 2017

retest this please.


SparkQA commented Jun 13, 2017

Test build #77978 has finished for PR 18258 at commit c7de74c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in bcf3643 Jun 13, 2017
dataknocker pushed a commit to dataknocker/spark that referenced this pull request Jun 16, 2017
Author: Liang-Chi Hsieh <[email protected]>

Closes apache#18258 from viirya/SPARK-20953.
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Aug 20, 2018
(cherry picked from commit bcf3643)

Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
(cherry picked from commit ded5a0029ebab9404d2b9fb1dca98bbffccd5837)

NETFLIX-BUILD: SPARK-21051:HOTFIX: ignore test case "ObjectHashAggregate metrics"

ObjectHashAggregate feature was introduced in 2.2.

(cherry picked from commit 0f412c1935f15ed96fa043054880d9796e71d563)
@viirya viirya deleted the SPARK-20953 branch December 27, 2023 18:20