Skip pre hash computation for join when input is table scan #20948
feilong-liu merged 1 commit into prestodb:master
Conversation
vivek-bharathan
left a comment
Please add tests showing plan changes
I'm curious why we would ever add this hash computation when the parent does not require it. I.e. wouldn't this check simply always be
"return hashComputation.isPresent() && !parentPreference.getHashes().contains(hashComputation.get());"
This optimization is based on our observation that a TableScan below a join is significantly faster than ScanProject (here the project is for hash generation) for a BIGINT join key. We do not observe the same for other cases.
What about other operators like filter/project on top of table scan, or values?
In the verifier suite, I didn't observe the same performance improvement for those cases, hence I limit this to the specific case where I see the most significant performance improvement.
This was similar to my question above.
If you look at this comment in the code, it seems to suggest that aggregations in general perform better for BIGINTs without the generated hash. I suspect the same principle applies to joins. I wonder if we are special-casing this too much by adding the TableScanNode check.
Would it be possible to share the benchmarks you are seeing this behavior on?
If you look at this comment in the code, it seems to suggest that aggregations in general perform better for BIGINTs without the generated hash.
I see; this is because we have a custom group-by hash table, BigintGroupByHash, for group by on a single BIGINT column, which does not use an existing pre-computed hash. I'm not sure whether join sees the same pattern, but currently we do not have a specialized join hash implementation for BIGINT like we do for group by, so what applies to group by may not carry over to join here.
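To illustrate the point above: a specialized single-BIGINT hash table can hash the raw `long` key directly, so a separately precomputed hash column is never read. The sketch below is hypothetical (`hashBigint`, `probeSpecialized`, and the mixing constant are illustrative, not Presto's actual code); it only shows why the extra hash column is dead weight on the specialized path.

```java
// Hypothetical sketch: why a precomputed hash column is wasted work when a
// specialized single-BIGINT hash table (like BigintGroupByHash) is in play.
public class BigintHashSketch {
    // Simple 64-bit mix standing in for a real bigint hash function.
    static long hashBigint(long value) {
        long h = value * 0x9E3779B97F4A7C15L;
        return h ^ (h >>> 32);
    }

    // Specialized path: hashes the raw key itself; the precomputed hash
    // column is carried along but never consulted.
    static long probeSpecialized(long key, long precomputedHash) {
        return hashBigint(key); // precomputedHash is dead weight here
    }

    public static void main(String[] args) {
        long key = 42L;
        long precomputed = hashBigint(key); // cost paid upstream for nothing
        System.out.println(probeSpecialized(key, precomputed) == hashBigint(key));
    }
}
```

A generic join, by contrast, has no such specialized path today, so whether the precomputed hash helps or hurts is an empirical question, which is what the benchmarking discussion below addresses.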
Would it be possible to share the benchmarks you are seeing this behavior on?
The benchmarks are based on production queries and cannot be shared.
But the queries that improve most are those with a table scan as the input source of a join. The plan changes from join <- ScanProject to join <- TableScan, and the biggest savings come from replacing ScanProject with TableScan, especially when the input table is huge. This is why I want to specialize for this case in this optimization.
Fair enough. We can always relax this constraint in the future if needed
Added unit plan test
Description
Skip hash generation for a join when the input is a table scan and the hash is on a single BIGINT column that is not reused later.
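The skip condition in this description can be sketched as a predicate over the plan. This is a minimal illustration, not the actual optimizer code: the stub node classes and the `skipHashPrecomputation` helper are hypothetical, and real planner types carry far more state.

```java
import java.util.List;

// Hypothetical sketch of the skip condition described above: the join input
// must be a bare table scan, the join key a single BIGINT, and the computed
// hash must not be reused by any parent node.
public class SkipHashSketch {
    interface PlanNode {}
    static class TableScanNode implements PlanNode {}
    static class ProjectNode implements PlanNode {}

    static boolean skipHashPrecomputation(PlanNode joinInput,
                                          List<String> joinKeyTypes,
                                          boolean hashReusedByParent) {
        return joinInput instanceof TableScanNode   // bare scan, no project below the join
                && joinKeyTypes.size() == 1         // single join key
                && joinKeyTypes.get(0).equals("bigint")
                && !hashReusedByParent;             // no parent wants the hash
    }

    public static void main(String[] args) {
        // Qualifies: table scan input, single BIGINT key, hash not reused.
        System.out.println(skipHashPrecomputation(new TableScanNode(), List.of("bigint"), false));
        // Does not qualify: a project already sits between the scan and the join.
        System.out.println(skipHashPrecomputation(new ProjectNode(), List.of("bigint"), false));
    }
}
```

When the predicate holds, the plan keeps the shape join <- TableScan instead of join <- ScanProject, which is where the discussion above reports the savings.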
Motivation and Context
We observed in production queries that hash precomputation actually hurts performance (both CPU and latency) for the case described above. Hence, add an option to disable hash precomputation for it.
Impact
CPU and latency improvement for the targeted queries.
Test Plan
Existing unit tests and verifier test
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.