-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-21051][SQL] Add hash map metrics to aggregate #18258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Why would the tracking have perf impact? It's just a simple counter increase isn't it. |
|
The It's true those metrics are simple counter. |
|
Can you test the perf degradation? |
|
Sure. Will update later. |
|
I just ran the existing Looks like no obvious perf degradation. |
|
16.8 vs 15.8? |
|
Is it significant? Seems to me that it's in the variance of different runs? |
|
Can you run it a few more times to tell? Right now it's a difference of 7% almost .... |
|
Sure. Three times for each. Track = F: Track = T: |
|
Thanks! |
|
If there is no regression, I'd remove the flag. |
|
Test build #77866 has finished for PR 18258 at commit
|
|
Ok. I'll remove the flag. Thanks. |
|
Test build #77872 has finished for PR 18258 at commit
|
|
Test build #77876 has finished for PR 18258 at commit
|
|
Seems to me that the hash map metrics to join operator can be done in later PR. So this change can be small to review. |
|
That's a good idea. In that case, create a subtask on jira for this and another one for join? |
|
cc @cloud-fan @gatorsmile for review. |
| */ | ||
| def createAverageMetric(sc: SparkContext, name: String): SQLMetric = { | ||
| // The final result of this metric in physical operator UI may looks like: | ||
| // probe avg (min, med, max): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
med is medium? why 6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh. right. will fix this typo. :)
| 1024 * 16, // initial capacity | ||
| TaskContext.get().taskMemoryManager().pageSizeBytes, | ||
| false // disable tracking of performance metrics | ||
| true // tracking of performance metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always turn it on?
If we decide to always turn it on, why we still keep this parm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, based on the benchmark, seems the performance degradation is not an issue. We can completely remove this parameter.
| } | ||
|
|
||
| @SuppressWarnings("UseOfSystemOutOrSystemErr") | ||
| public void printPerfMetrics() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find anyplace actually uses this method. Not sure if we want to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please remove it. Thanks!
|
Test build #77925 has finished for PR 18258 at commit
|
|
LGTM pending Jenkins |
|
Test build #77961 has finished for PR 18258 at commit
|
|
retest this please. |
|
Test build #77974 has started for PR 18258 at commit |
|
LGTM |
|
retest this please. |
|
Test build #77978 has finished for PR 18258 at commit
|
## What changes were proposed in this pull request? This adds the average hash map probe metrics to hash aggregate. `BytesToBytesMap` already has API to get the metrics, this PR adds an API to `UnsafeFixedWidthAggregationMap` to access it. Preparing a test for this metrics seems tricky, because we don't know what collision keys are. For now, the test case generates random data large enough to have desired probe. TODO in later PR: add hash map metrics to join. ## How was this patch tested? Added test to SQLMetricsSuite. Author: Liang-Chi Hsieh <[email protected]> Closes apache#18258 from viirya/SPARK-20953.
This adds the average hash map probe metrics to hash aggregate. `BytesToBytesMap` already has API to get the metrics, this PR adds an API to `UnsafeFixedWidthAggregationMap` to access it. Preparing a test for this metrics seems tricky, because we don't know what collision keys are. For now, the test case generates random data large enough to have desired probe. TODO in later PR: add hash map metrics to join. Added test to SQLMetricsSuite. Author: Liang-Chi Hsieh <[email protected]> Closes apache#18258 from viirya/SPARK-20953. (cherry picked from commit bcf3643) Conflicts: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala (cherry picked from commit ded5a0029ebab9404d2b9fb1dca98bbffccd5837) NETFLIX-BUILD: SPARK-21051:HOTFIX: ignore test case "ObjectHashAggregate metrics" ObjectHashAggregate feature was introduced in 2.2. (cherry picked from commit 0f412c1935f15ed96fa043054880d9796e71d563)
What changes were proposed in this pull request?
This adds the average hash map probe metrics to hash aggregate.
BytesToBytesMapalready has API to get the metrics, this PR adds an API toUnsafeFixedWidthAggregationMapto access it.Preparing a test for this metrics seems tricky, because we don't know what collision keys are. For now, the test case generates random data large enough to have desired probe.
TODO in later PR: add hash map metrics to join.
How was this patch tested?
Added test to SQLMetricsSuite.