@gabotechs
Contributor

@gabotechs gabotechs commented Jan 12, 2026

Which issue does this PR close?

It does not close any issue, but it's related to:

Rationale for this change

This is a PR from a batch of PRs that attempt to improve performance in hash joins:

It adds the new BufferExec node at the top of the probe side of hash joins so that some work is eagerly performed before the build side of the hash join is completely finished.

Why should this speed up joins?

To better understand the impact of this PR, it's useful to understand how streams work in Rust: creating a stream does not perform any work; progress is only made when the stream is polled.

This means that when we call .execute() on an ExecutionPlan (like the probe side of a join), nothing happens: not even the most basic TCP connections or system calls are performed. Instead, all of this work is deferred until the stream is first polled, losing the opportunity to make some early progress.
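
The laziness described above can be seen with plain standard-library futures, which follow the same poll-driven model as streams. This is a minimal sketch, not DataFusion code: `open_input`, `NoopWaker`, and `lazy_demo` are hypothetical names, and the counter only stands in for real work such as opening a connection.

```rust
use std::future::Future;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough to poll a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for opening an input: bumps a counter when it runs.
async fn open_input(work_done: Arc<AtomicUsize>) -> u32 {
    work_done.fetch_add(1, Ordering::SeqCst);
    42
}

/// Returns (work done before the first poll, work done after it, value).
fn lazy_demo() -> (usize, usize, u32) {
    let work_done = Arc::new(AtomicUsize::new(0));

    // Creating the future runs none of the body of `open_input`.
    let mut fut = Box::pin(open_input(Arc::clone(&work_done)));
    let before = work_done.load(Ordering::SeqCst);

    // Only polling drives the future forward.
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let value = match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => v,
        Poll::Pending => panic!("no await points, so the first poll completes"),
    };
    let after = work_done.load(Ordering::SeqCst);

    (before, after, value)
}
```

Calling `lazy_demo()` shows the counter is still zero after the future is created and only moves once it is polled.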

This gets worse when multiple hash joins are chained together: they execute in cascade, like falling dominoes. That keeps the memory footprint small, but it leaves machine resources idle that could be used to run the query faster.
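
One way to picture the eager-buffering idea is a background worker that starts pulling from the probe side immediately and parks the results in a bounded buffer. The sketch below uses a thread and a bounded channel from the standard library instead of async streams; `buffered` and `buffered_demo` are hypothetical helpers and say nothing about how BufferExec is actually implemented.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Starts consuming `input` right away on a background thread and keeps up
/// to `capacity` items ready for the consumer. The bound caps memory use.
fn buffered<I>(input: I, capacity: usize) -> Receiver<I::Item>
where
    I: Iterator + Send + 'static,
    I::Item: Send + 'static,
{
    let (tx, rx) = sync_channel(capacity);
    thread::spawn(move || {
        for item in input {
            // `send` blocks while the buffer is full and fails once the
            // consumer has gone away, in which case we stop producing.
            if tx.send(item).is_err() {
                break;
            }
        }
    });
    rx
}

/// Hypothetical usage: the "probe side" is just an iterator of numbers.
fn buffered_demo() -> Vec<u32> {
    let probe_side = (1..=5u32).map(|x| x * 10);
    let rx = buffered(probe_side, 2);

    // A real join would finish its build side here while the background
    // thread is already filling the buffer.

    rx.into_iter().collect()
}
```

The consumer still sees every item in order; the only difference is that production starts before the first item is requested.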

NOTE: still don't know if this improves the benchmarks, just experimenting for now

What changes are included in this PR?

Adds a new HashJoinBuffering physical optimizer rule that idempotently places BufferExec nodes on the probe side of hash joins:

            ┌───────────────────┐
            │   HashJoinExec    │
            └─────▲────────▲────┘
          ┌───────┘        └─────────┐
          │                          │
 ┌────────────────┐         ┌─────────────────┐
 │   Build side   │       + │   BufferExec    │
 └────────────────┘         └────────▲────────┘
                                     │
                            ┌────────┴────────┐
                            │   Probe side    │
                            └─────────────────┘
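
The placement in the diagram can be sketched on a toy plan tree. This is an illustration under stated assumptions, not the rule from this PR: the `Plan` enum, `add_probe_buffers`, and `chained_joins` are hypothetical and do not use DataFusion's ExecutionPlan API. Idempotency here means an already-buffered probe side is left alone, so running the rule twice gives the same plan as running it once.

```rust
/// Toy physical plan, standing in for real execution plan nodes.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(&'static str),
    Buffer(Box<Plan>),
    HashJoin { build: Box<Plan>, probe: Box<Plan> },
}

/// Wraps the probe side of every hash join in a `Buffer`, unless one is
/// already there.
fn add_probe_buffers(plan: Plan) -> Plan {
    match plan {
        Plan::Scan(name) => Plan::Scan(name),
        Plan::Buffer(child) => Plan::Buffer(Box::new(add_probe_buffers(*child))),
        Plan::HashJoin { build, probe } => {
            let build = add_probe_buffers(*build);
            let probe = match add_probe_buffers(*probe) {
                // Already buffered: keep as-is so the rule stays idempotent.
                already @ Plan::Buffer(_) => already,
                other => Plan::Buffer(Box::new(other)),
            };
            Plan::HashJoin {
                build: Box::new(build),
                probe: Box::new(probe),
            }
        }
    }
}

/// Two hash joins chained through the probe side.
fn chained_joins() -> Plan {
    Plan::HashJoin {
        build: Box::new(Plan::Scan("a")),
        probe: Box::new(Plan::HashJoin {
            build: Box::new(Plan::Scan("b")),
            probe: Box::new(Plan::Scan("c")),
        }),
    }
}
```

Applying `add_probe_buffers` to `chained_joins()` puts one buffer above the inner join and one above the scan of `c`; applying it again changes nothing.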

Are these changes tested?

yes, by existing tests

Are there any user-facing changes?

yes, users will see a new BufferExec placed on top of the probe side of each hash join. (Still unsure whether this should be enabled by default)


Results

Warning

I'm very skeptical about these benchmarks, which were run on my laptop. Take them with a grain of salt; they should be run in a more controlled environment.

Comparing main and hash-join-buffering-on-probe-side
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃       main ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │   37.80 ms │                          19.07 ms │ +1.98x faster │
│ QQuery 2  │  130.36 ms │                          54.25 ms │ +2.40x faster │
│ QQuery 3  │   99.05 ms │                          90.99 ms │ +1.09x faster │
│ QQuery 4  │  894.61 ms │                         340.70 ms │ +2.63x faster │
│ QQuery 5  │  151.16 ms │                         147.84 ms │     no change │
│ QQuery 6  │  566.37 ms │                         513.89 ms │ +1.10x faster │
│ QQuery 7  │  290.12 ms │                         248.25 ms │ +1.17x faster │
│ QQuery 8  │   97.46 ms │                          90.59 ms │ +1.08x faster │
│ QQuery 9  │   88.59 ms │                          94.18 ms │  1.06x slower │
│ QQuery 10 │   85.89 ms │                          48.71 ms │ +1.76x faster │
│ QQuery 11 │  567.85 ms │                         180.30 ms │ +3.15x faster │
│ QQuery 12 │   35.66 ms │                          32.78 ms │ +1.09x faster │
│ QQuery 13 │  313.89 ms │                         312.86 ms │     no change │
│ QQuery 14 │  741.51 ms │                         367.39 ms │ +2.02x faster │
│ QQuery 15 │   23.11 ms │                          49.44 ms │  2.14x slower │
│ QQuery 16 │   32.72 ms │                         109.53 ms │  3.35x slower │
│ QQuery 17 │  220.05 ms │                         160.70 ms │ +1.37x faster │
│ QQuery 18 │  114.36 ms │                         162.51 ms │  1.42x slower │
│ QQuery 19 │  133.50 ms │                         123.87 ms │ +1.08x faster │
│ QQuery 20 │   12.37 ms │                          52.66 ms │  4.26x slower │
│ QQuery 21 │   15.53 ms │                         132.58 ms │  8.54x slower │
│ QQuery 22 │  288.69 ms │                         375.91 ms │  1.30x slower │
│ QQuery 23 │  772.46 ms │                         488.07 ms │ +1.58x faster │
│ QQuery 24 │  340.42 ms │                         287.51 ms │ +1.18x faster │
│ QQuery 25 │  307.77 ms │                         195.09 ms │ +1.58x faster │
│ QQuery 26 │   81.78 ms │                         123.89 ms │  1.51x slower │
│ QQuery 27 │  297.72 ms │                         240.88 ms │ +1.24x faster │
│ QQuery 28 │  127.20 ms │                         127.28 ms │     no change │
│ QQuery 29 │  261.03 ms │                         161.52 ms │ +1.62x faster │
│ QQuery 30 │   35.53 ms │                          26.18 ms │ +1.36x faster │
│ QQuery 31 │  120.02 ms │                         101.47 ms │ +1.18x faster │
│ QQuery 32 │   48.49 ms │                          43.37 ms │ +1.12x faster │
│ QQuery 33 │  112.83 ms │                         110.45 ms │     no change │
│ QQuery 34 │   85.92 ms │                          80.71 ms │ +1.06x faster │
│ QQuery 35 │   81.94 ms │                          51.65 ms │ +1.59x faster │
│ QQuery 36 │  165.56 ms │                         168.79 ms │     no change │
│ QQuery 37 │  153.98 ms │                         155.81 ms │     no change │
│ QQuery 38 │   60.75 ms │                          53.06 ms │ +1.14x faster │
│ QQuery 39 │   81.49 ms │                         294.01 ms │  3.61x slower │
│ QQuery 40 │   87.94 ms │                          76.12 ms │ +1.16x faster │
│ QQuery 41 │   10.61 ms │                           9.61 ms │ +1.10x faster │
│ QQuery 42 │   89.63 ms │                          88.33 ms │     no change │
│ QQuery 43 │   69.61 ms │                          63.42 ms │ +1.10x faster │
│ QQuery 44 │    9.08 ms │                           7.78 ms │ +1.17x faster │
│ QQuery 45 │   53.17 ms │                          32.19 ms │ +1.65x faster │
│ QQuery 46 │  175.44 ms │                         167.41 ms │     no change │
│ QQuery 47 │  478.10 ms │                         123.03 ms │ +3.89x faster │
│ QQuery 48 │  224.20 ms │                         212.88 ms │ +1.05x faster │
│ QQuery 49 │  206.10 ms │                         200.87 ms │     no change │
│ QQuery 50 │  176.44 ms │                         141.12 ms │ +1.25x faster │
│ QQuery 51 │  141.42 ms │                         105.32 ms │ +1.34x faster │
│ QQuery 52 │   90.66 ms │                          89.26 ms │     no change │
│ QQuery 53 │   89.56 ms │                          83.37 ms │ +1.07x faster │
│ QQuery 54 │  123.43 ms │                         119.06 ms │     no change │
│ QQuery 55 │   88.73 ms │                          90.23 ms │     no change │
│ QQuery 56 │  114.66 ms │                         112.92 ms │     no change │
│ QQuery 57 │  131.64 ms │                          69.73 ms │ +1.89x faster │
│ QQuery 58 │  228.01 ms │                         127.59 ms │ +1.79x faster │
│ QQuery 59 │  169.17 ms │                         127.03 ms │ +1.33x faster │
│ QQuery 60 │  118.92 ms │                         115.28 ms │     no change │
│ QQuery 61 │  149.06 ms │                         147.06 ms │     no change │
│ QQuery 62 │  441.11 ms │                         433.50 ms │     no change │
│ QQuery 63 │   95.44 ms │                          85.84 ms │ +1.11x faster │
│ QQuery 64 │  606.32 ms │                         442.72 ms │ +1.37x faster │
│ QQuery 65 │  208.68 ms │                          91.03 ms │ +2.29x faster │
│ QQuery 66 │  188.17 ms │                         177.41 ms │ +1.06x faster │
│ QQuery 67 │  249.91 ms │                         234.31 ms │ +1.07x faster │
│ QQuery 68 │  235.92 ms │                         224.15 ms │     no change │
│ QQuery 69 │   89.95 ms │                          46.44 ms │ +1.94x faster │
│ QQuery 70 │  278.67 ms │                         203.35 ms │ +1.37x faster │
│ QQuery 71 │  109.23 ms │                         109.86 ms │     no change │
│ QQuery 72 │  508.24 ms │                         391.84 ms │ +1.30x faster │
│ QQuery 73 │   90.02 ms │                          78.49 ms │ +1.15x faster │
│ QQuery 74 │  373.75 ms │                         112.90 ms │ +3.31x faster │
│ QQuery 75 │  227.43 ms │                         172.97 ms │ +1.31x faster │
│ QQuery 76 │  116.42 ms │                         110.72 ms │     no change │
│ QQuery 77 │  170.31 ms │                         144.66 ms │ +1.18x faster │
│ QQuery 78 │  422.27 ms │                         245.42 ms │ +1.72x faster │
│ QQuery 79 │  190.47 ms │                         166.21 ms │ +1.15x faster │
│ QQuery 80 │  265.88 ms │                         242.36 ms │ +1.10x faster │
│ QQuery 81 │   23.05 ms │                          17.96 ms │ +1.28x faster │
│ QQuery 82 │  173.94 ms │                         162.41 ms │ +1.07x faster │
│ QQuery 83 │   40.37 ms │                          18.62 ms │ +2.17x faster │
│ QQuery 84 │   40.52 ms │                          26.07 ms │ +1.55x faster │
│ QQuery 85 │  138.45 ms │                          71.38 ms │ +1.94x faster │
│ QQuery 86 │   30.41 ms │                          28.27 ms │ +1.08x faster │
│ QQuery 87 │   62.64 ms │                          54.20 ms │ +1.16x faster │
│ QQuery 88 │   84.50 ms │                          74.60 ms │ +1.13x faster │
│ QQuery 89 │  108.95 ms │                          89.03 ms │ +1.22x faster │
│ QQuery 90 │   19.19 ms │                          16.36 ms │ +1.17x faster │
│ QQuery 91 │   53.45 ms │                          34.82 ms │ +1.54x faster │
│ QQuery 92 │   49.13 ms │                          25.47 ms │ +1.93x faster │
│ QQuery 93 │  151.86 ms │                         134.34 ms │ +1.13x faster │
│ QQuery 94 │   52.94 ms │                          46.45 ms │ +1.14x faster │
│ QQuery 95 │  125.23 ms │                          50.85 ms │ +2.46x faster │
│ QQuery 96 │   59.70 ms │                          54.86 ms │ +1.09x faster │
│ QQuery 97 │   99.90 ms │                          71.00 ms │ +1.41x faster │
│ QQuery 98 │  129.60 ms │                         111.11 ms │ +1.17x faster │
│ QQuery 99 │ 4562.37 ms │                        4353.70 ms │     no change │
└───────────┴────────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)                                │ 21975.53ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 17884.01ms │
│ Average Time (main)                              │   221.98ms │
│ Average Time (hash-join-buffering-on-probe-side) │   180.65ms │
│ Queries Faster                                   │         70 │
│ Queries Slower                                   │          9 │
│ Queries with No Change                           │         20 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      main ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │  44.90 ms │                          40.62 ms │ +1.11x faster │
│ QQuery 2  │  18.76 ms │                          12.43 ms │ +1.51x faster │
│ QQuery 3  │  28.97 ms │                          23.39 ms │ +1.24x faster │
│ QQuery 4  │  17.85 ms │                          16.29 ms │ +1.10x faster │
│ QQuery 5  │  93.97 ms │                          43.91 ms │ +2.14x faster │
│ QQuery 6  │  17.08 ms │                          17.50 ms │     no change │
│ QQuery 7  │  90.73 ms │                          46.86 ms │ +1.94x faster │
│ QQuery 8  │  85.72 ms │                          36.05 ms │ +2.38x faster │
│ QQuery 9  │  74.19 ms │                          43.14 ms │ +1.72x faster │
│ QQuery 10 │  89.22 ms │                          39.76 ms │ +2.24x faster │
│ QQuery 11 │  13.64 ms │                           9.49 ms │ +1.44x faster │
│ QQuery 12 │  53.55 ms │                          28.44 ms │ +1.88x faster │
│ QQuery 13 │  20.46 ms │                          20.60 ms │     no change │
│ QQuery 14 │  44.52 ms │                          22.86 ms │ +1.95x faster │
│ QQuery 15 │  33.20 ms │                          27.10 ms │ +1.22x faster │
│ QQuery 16 │  12.82 ms │                          11.75 ms │ +1.09x faster │
│ QQuery 17 │  82.07 ms │                          50.03 ms │ +1.64x faster │
│ QQuery 18 │ 109.41 ms │                          62.02 ms │ +1.76x faster │
│ QQuery 19 │  39.01 ms │                          34.62 ms │ +1.13x faster │
│ QQuery 20 │  53.24 ms │                          26.53 ms │ +2.01x faster │
│ QQuery 21 │  76.87 ms │                          53.66 ms │ +1.43x faster │
│ QQuery 22 │   9.18 ms │                           8.46 ms │ +1.09x faster │
└───────────┴───────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)                                │ 1109.37ms │
│ Total Time (hash-join-buffering-on-probe-side)   │  675.51ms │
│ Average Time (main)                              │   50.43ms │
│ Average Time (hash-join-buffering-on-probe-side) │   30.71ms │
│ Queries Faster                                   │        20 │
│ Queries Slower                                   │         0 │
│ Queries with No Change                           │         2 │
│ Queries with Failure                             │         0 │
└──────────────────────────────────────────────────┴───────────┘
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      main ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 333.88 ms │                         333.10 ms │     no change │
│ QQuery 2  │ 149.56 ms │                          95.79 ms │ +1.56x faster │
│ QQuery 3  │ 291.89 ms │                         272.45 ms │ +1.07x faster │
│ QQuery 4  │ 115.77 ms │                         116.32 ms │     no change │
│ QQuery 5  │ 435.41 ms │                         408.67 ms │ +1.07x faster │
│ QQuery 6  │ 122.00 ms │                         119.41 ms │     no change │
│ QQuery 7  │ 597.53 ms │                         554.64 ms │ +1.08x faster │
│ QQuery 8  │ 505.06 ms │                         447.98 ms │ +1.13x faster │
│ QQuery 9  │ 718.08 ms │                         664.75 ms │ +1.08x faster │
│ QQuery 10 │ 355.45 ms │                         318.31 ms │ +1.12x faster │
│ QQuery 11 │ 117.63 ms │                          87.23 ms │ +1.35x faster │
│ QQuery 12 │ 229.20 ms │                         197.97 ms │ +1.16x faster │
│ QQuery 13 │ 250.32 ms │                         219.43 ms │ +1.14x faster │
│ QQuery 14 │ 197.94 ms │                         173.28 ms │ +1.14x faster │
│ QQuery 15 │ 318.42 ms │                         288.27 ms │ +1.10x faster │
│ QQuery 16 │  85.11 ms │                          66.98 ms │ +1.27x faster │
│ QQuery 17 │ 723.73 ms │                         667.37 ms │ +1.08x faster │
│ QQuery 18 │ 794.77 ms │                         726.88 ms │ +1.09x faster │
│ QQuery 19 │ 320.78 ms │                         292.61 ms │ +1.10x faster │
│ QQuery 20 │ 293.52 ms │                         258.06 ms │ +1.14x faster │
│ QQuery 21 │ 786.11 ms │                         732.63 ms │ +1.07x faster │
│ QQuery 22 │  84.85 ms │                          79.90 ms │ +1.06x faster │
└───────────┴───────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)                                │ 7827.02ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 7122.04ms │
│ Average Time (main)                              │  355.77ms │
│ Average Time (hash-join-buffering-on-probe-side) │  323.73ms │
│ Queries Faster                                   │        19 │
│ Queries Slower                                   │         0 │
│ Queries with No Change                           │         3 │
│ Queries with Failure                             │         0 │
└──────────────────────────────────────────────────┴───────────┘
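
The Change column in the tables above is consistent with a simple ratio rule. A minimal sketch, assuming a 5% noise band (an assumption inferred from the rows shown, not taken from the benchmark script); `describe_change` and `NOISE_THRESHOLD` are hypothetical names:

```rust
/// Assumed noise band: changes within ±5% are reported as "no change".
const NOISE_THRESHOLD: f64 = 0.05;

/// Formats the change between a baseline time and a branch time, both in ms.
fn describe_change(baseline_ms: f64, branch_ms: f64) -> String {
    let ratio = branch_ms / baseline_ms;
    if (ratio - 1.0).abs() <= NOISE_THRESHOLD {
        "no change".to_string()
    } else if ratio < 1.0 {
        // The branch took less time: report how many times faster it is.
        format!("+{:.2}x faster", 1.0 / ratio)
    } else {
        // The branch took more time: report how many times slower it is.
        format!("{:.2}x slower", ratio)
    }
}
```

For example, the TPC-DS row with 37.80 ms on main and 19.07 ms on the branch gives 37.80 / 19.07 ≈ 1.98, matching the "+1.98x faster" entry.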

@github-actions github-actions bot added the optimizer, core, sqllogictest, common, execution, proto, datasource, and physical-plan labels Jan 12, 2026
@gabotechs
Contributor Author

run benchmarks

@alamb-ghbot

🤖 Hi @gabotechs, thanks for the request (#19761 (comment)). scrape_comments.py only responds to whitelisted users. Allowed users: Dandandan, Omega359, adriangb, alamb, comphead, geoffreyclaude, klion26, rluvaton, xudong963, zhuqi-lucas.

@github-actions github-actions bot added the documentation label Jan 12, 2026
@gabotechs
Contributor Author

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (3e4660b) to 0c5c97b diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and hash-join-buffering-on-probe-side
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2479.50 ms │                        2365.91 ms │     no change │
│ QQuery 1 │   933.04 ms │                         961.61 ms │     no change │
│ QQuery 2 │  2128.72 ms │                        1828.41 ms │ +1.16x faster │
│ QQuery 3 │  1140.67 ms │                        1106.77 ms │     no change │
│ QQuery 4 │  2349.73 ms │                        2265.79 ms │     no change │
│ QQuery 5 │ 28477.94 ms │                       27819.90 ms │     no change │
│ QQuery 6 │  3913.85 ms │                        3886.72 ms │     no change │
│ QQuery 7 │  2907.17 ms │                        2857.38 ms │     no change │
└──────────┴─────────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 44330.62ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 43092.50ms │
│ Average Time (HEAD)                              │  5541.33ms │
│ Average Time (hash-join-buffering-on-probe-side) │  5386.56ms │
│ Queries Faster                                   │          1 │
│ Queries Slower                                   │          0 │
│ Queries with No Change                           │          7 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     1.91 ms │                           1.94 ms │     no change │
│ QQuery 1  │    50.86 ms │                          51.03 ms │     no change │
│ QQuery 2  │   129.07 ms │                         131.06 ms │     no change │
│ QQuery 3  │   151.75 ms │                         154.89 ms │     no change │
│ QQuery 4  │  1070.04 ms │                        1218.71 ms │  1.14x slower │
│ QQuery 5  │  1377.65 ms │                        1501.78 ms │  1.09x slower │
│ QQuery 6  │     1.82 ms │                           1.87 ms │     no change │
│ QQuery 7  │    56.03 ms │                          61.22 ms │  1.09x slower │
│ QQuery 8  │  1423.84 ms │                        1561.18 ms │  1.10x slower │
│ QQuery 9  │  1748.54 ms │                        1871.82 ms │  1.07x slower │
│ QQuery 10 │   343.11 ms │                         350.58 ms │     no change │
│ QQuery 11 │   390.93 ms │                         400.26 ms │     no change │
│ QQuery 12 │  1249.28 ms │                        1460.10 ms │  1.17x slower │
│ QQuery 13 │  1916.12 ms │                        2067.22 ms │  1.08x slower │
│ QQuery 14 │  1214.64 ms │                        1359.01 ms │  1.12x slower │
│ QQuery 15 │  1224.35 ms │                        1382.17 ms │  1.13x slower │
│ QQuery 16 │  2587.35 ms │                        2651.10 ms │     no change │
│ QQuery 17 │  2481.42 ms │                        2645.83 ms │  1.07x slower │
│ QQuery 18 │  6019.63 ms │                        4969.84 ms │ +1.21x faster │
│ QQuery 19 │   118.04 ms │                         122.91 ms │     no change │
│ QQuery 20 │  1977.36 ms │                        1907.42 ms │     no change │
│ QQuery 21 │  2282.79 ms │                        2227.74 ms │     no change │
│ QQuery 22 │  4147.94 ms │                        3809.68 ms │ +1.09x faster │
│ QQuery 23 │ 18037.69 ms │                       12405.70 ms │ +1.45x faster │
│ QQuery 24 │   203.52 ms │                         236.74 ms │  1.16x slower │
│ QQuery 25 │   482.62 ms │                         517.70 ms │  1.07x slower │
│ QQuery 26 │   218.15 ms │                         233.60 ms │  1.07x slower │
│ QQuery 27 │  2805.96 ms │                        2772.92 ms │     no change │
│ QQuery 28 │ 22174.76 ms │                       21847.75 ms │     no change │
│ QQuery 29 │   977.94 ms │                         952.08 ms │     no change │
│ QQuery 30 │  1315.68 ms │                        1336.40 ms │     no change │
│ QQuery 31 │  1366.16 ms │                        1421.09 ms │     no change │
│ QQuery 32 │  5155.78 ms │                        4350.28 ms │ +1.19x faster │
│ QQuery 33 │  5715.37 ms │                        5687.38 ms │     no change │
│ QQuery 34 │  6016.46 ms │                        5853.35 ms │     no change │
│ QQuery 35 │  1918.61 ms │                        2098.03 ms │  1.09x slower │
│ QQuery 36 │    67.22 ms │                          70.35 ms │     no change │
│ QQuery 37 │    45.47 ms │                          49.54 ms │  1.09x slower │
│ QQuery 38 │    65.42 ms │                          68.24 ms │     no change │
│ QQuery 39 │   104.44 ms │                         111.22 ms │  1.06x slower │
│ QQuery 40 │    27.46 ms │                          27.62 ms │     no change │
│ QQuery 41 │    23.04 ms │                          24.38 ms │  1.06x slower │
│ QQuery 42 │    19.89 ms │                          21.71 ms │  1.09x slower │
└───────────┴─────────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 98706.12ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 91995.42ms │
│ Average Time (HEAD)                              │  2295.49ms │
│ Average Time (hash-join-buffering-on-probe-side) │  2139.43ms │
│ Queries Faster                                   │          4 │
│ Queries Slower                                   │         18 │
│ Queries with No Change                           │         21 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 140.65 ms │                         101.97 ms │ +1.38x faster │
│ QQuery 2  │  37.21 ms │                          30.95 ms │ +1.20x faster │
│ QQuery 3  │  44.92 ms │                          32.31 ms │ +1.39x faster │
│ QQuery 4  │  31.87 ms │                          30.19 ms │ +1.06x faster │
│ QQuery 5  │  92.53 ms │                          94.55 ms │     no change │
│ QQuery 6  │  21.01 ms │                          20.99 ms │     no change │
│ QQuery 7  │ 157.97 ms │                         165.53 ms │     no change │
│ QQuery 8  │  41.01 ms │                          35.06 ms │ +1.17x faster │
│ QQuery 9  │ 102.50 ms │                          93.90 ms │ +1.09x faster │
│ QQuery 10 │  68.82 ms │                          67.90 ms │     no change │
│ QQuery 11 │  19.57 ms │                          17.92 ms │ +1.09x faster │
│ QQuery 12 │  52.47 ms │                          54.41 ms │     no change │
│ QQuery 13 │  50.52 ms │                          47.74 ms │ +1.06x faster │
│ QQuery 14 │  15.26 ms │                          15.25 ms │     no change │
│ QQuery 15 │  31.19 ms │                          30.51 ms │     no change │
│ QQuery 16 │  30.26 ms │                          28.22 ms │ +1.07x faster │
│ QQuery 17 │ 144.19 ms │                         150.21 ms │     no change │
│ QQuery 18 │ 286.83 ms │                         262.07 ms │ +1.09x faster │
│ QQuery 19 │  40.60 ms │                          41.31 ms │     no change │
│ QQuery 20 │  57.30 ms │                          56.06 ms │     no change │
│ QQuery 21 │ 188.92 ms │                         179.62 ms │     no change │
│ QQuery 22 │  22.42 ms │                          22.15 ms │     no change │
└───────────┴───────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 1678.03ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 1578.79ms │
│ Average Time (HEAD)                              │   76.27ms │
│ Average Time (hash-join-buffering-on-probe-side) │   71.76ms │
│ Queries Faster                                   │        10 │
│ Queries Slower                                   │         0 │
│ Queries with No Change                           │        12 │
│ Queries with Failure                             │         0 │
└──────────────────────────────────────────────────┴───────────┘

@gabotechs
Contributor Author

run benchmark tpcds tpch10

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (3e4660b) to 0c5c97b diff using: tpcds
Results will be posted here when complete

@alamb-ghbot

Benchmark script failed with exit code 1.

Last 10 lines of output:

BRANCH_NAME: HEAD
DATA_DIR: /home/alamb/arrow-datafusion/benchmarks/data
RESULTS_DIR: /home/alamb/arrow-datafusion/benchmarks/results/HEAD
CARGO_COMMAND: cargo run --release
PREFER_HASH_JOIN: true
***************************

Please prepare TPC-DS data first by following instructions:
  ./bench.sh data tpcds

@gabotechs
Contributor Author

run benchmark tpch10

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (3e4660b) to 0c5c97b diff using: tpch10
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and hash-join-buffering-on-probe-side
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃ HEAD ┃ hash-join-buffering-on-probe-side ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ FAIL │                              FAIL │ incomparable │
│ QQuery 2  │ FAIL │                              FAIL │ incomparable │
│ QQuery 3  │ FAIL │                              FAIL │ incomparable │
│ QQuery 4  │ FAIL │                              FAIL │ incomparable │
│ QQuery 5  │ FAIL │                              FAIL │ incomparable │
│ QQuery 6  │ FAIL │                              FAIL │ incomparable │
│ QQuery 7  │ FAIL │                              FAIL │ incomparable │
│ QQuery 8  │ FAIL │                              FAIL │ incomparable │
│ QQuery 9  │ FAIL │                              FAIL │ incomparable │
│ QQuery 10 │ FAIL │                              FAIL │ incomparable │
│ QQuery 11 │ FAIL │                              FAIL │ incomparable │
│ QQuery 12 │ FAIL │                              FAIL │ incomparable │
│ QQuery 13 │ FAIL │                              FAIL │ incomparable │
│ QQuery 14 │ FAIL │                              FAIL │ incomparable │
│ QQuery 15 │ FAIL │                              FAIL │ incomparable │
│ QQuery 16 │ FAIL │                              FAIL │ incomparable │
│ QQuery 17 │ FAIL │                              FAIL │ incomparable │
│ QQuery 18 │ FAIL │                              FAIL │ incomparable │
│ QQuery 19 │ FAIL │                              FAIL │ incomparable │
│ QQuery 20 │ FAIL │                              FAIL │ incomparable │
│ QQuery 21 │ FAIL │                              FAIL │ incomparable │
│ QQuery 22 │ FAIL │                              FAIL │ incomparable │
└───────────┴──────┴───────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark Summary                                ┃        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ Total Time (HEAD)                                │ 0.00ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 0.00ms │
│ Average Time (HEAD)                              │ 0.00ms │
│ Average Time (hash-join-buffering-on-probe-side) │ 0.00ms │
│ Queries Faster                                   │      0 │
│ Queries Slower                                   │      0 │
│ Queries with No Change                           │      0 │
│ Queries with Failure                             │     22 │
└──────────────────────────────────────────────────┴────────┘

@gabotechs
Contributor Author

run benchmark tpch

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (3e4660b) to 0c5c97b diff using: tpch
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and hash-join-buffering-on-probe-side
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 186.54 ms │                         180.81 ms │     no change │
│ QQuery 2  │  92.79 ms │                          48.71 ms │ +1.90x faster │
│ QQuery 3  │ 129.28 ms │                         106.07 ms │ +1.22x faster │
│ QQuery 4  │  80.78 ms │                          74.64 ms │ +1.08x faster │
│ QQuery 5  │ 186.74 ms │                         163.71 ms │ +1.14x faster │
│ QQuery 6  │  70.54 ms │                          66.87 ms │ +1.06x faster │
│ QQuery 7  │ 222.50 ms │                         194.54 ms │ +1.14x faster │
│ QQuery 8  │ 175.16 ms │                         125.23 ms │ +1.40x faster │
│ QQuery 9  │ 231.17 ms │                         174.24 ms │ +1.33x faster │
│ QQuery 10 │ 190.18 ms │                         148.84 ms │ +1.28x faster │
│ QQuery 11 │  70.01 ms │                          46.31 ms │ +1.51x faster │
│ QQuery 12 │ 120.18 ms │                         109.09 ms │ +1.10x faster │
│ QQuery 13 │ 219.34 ms │                         204.01 ms │ +1.08x faster │
│ QQuery 14 │  95.98 ms │                          88.23 ms │ +1.09x faster │
│ QQuery 15 │ 132.46 ms │                         100.40 ms │ +1.32x faster │
│ QQuery 16 │  64.09 ms │                          46.41 ms │ +1.38x faster │
│ QQuery 17 │ 280.98 ms │                         211.97 ms │ +1.33x faster │
│ QQuery 18 │ 332.62 ms │                         271.65 ms │ +1.22x faster │
│ QQuery 19 │ 140.44 ms │                         130.87 ms │ +1.07x faster │
│ QQuery 20 │ 135.30 ms │                         100.57 ms │ +1.35x faster │
│ QQuery 21 │ 265.90 ms │                         234.12 ms │ +1.14x faster │
│ QQuery 22 │  41.36 ms │                          37.33 ms │ +1.11x faster │
└───────────┴───────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 3464.36ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 2864.63ms │
│ Average Time (HEAD)                              │  157.47ms │
│ Average Time (hash-join-buffering-on-probe-side) │  130.21ms │
│ Queries Faster                                   │        21 │
│ Queries Slower                                   │         0 │
│ Queries with No Change                           │         1 │
│ Queries with Failure                             │         0 │
└──────────────────────────────────────────────────┴───────────┘

@gabotechs
Contributor Author

run benchmark tpcds

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (3e4660b) to 0c5c97b diff using: tpcds
Results will be posted here when complete

@alamb-ghbot

Benchmark script failed with exit code 1.

Last 10 lines of output:

BRANCH_NAME: HEAD
DATA_DIR: /home/alamb/arrow-datafusion/benchmarks/data
RESULTS_DIR: /home/alamb/arrow-datafusion/benchmarks/results/HEAD
CARGO_COMMAND: cargo run --release
PREFER_HASH_JOIN: true
***************************

Please prepare TPC-DS data first by following instructions:
  ./bench.sh data tpcds

@gabotechs
Contributor Author

🤔 the tpcds benchmark command seems broken

@gabotechs
Contributor Author

> Big picture question: could this be part of the HashJoinExec code?

Potentially yes, although I do see value in having this be its own node. That way we can:

  • collect metrics associated with it independently of the hash join node
  • use it in more places

This is actually something we (DataDog) have had for a long time: a similar BufferExec node that we place not only on the probe side of hash joins, but also in other parts of our plans.
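
To make the idea of a standalone buffering node concrete, here is a minimal sketch of eager buffering in isolation. It is not the `BufferExec` implementation from this PR: it assumes plain std threads and a bounded channel in place of async streams, and `buffer_eagerly` and its `capacity` parameter are hypothetical names chosen for illustration.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical helper (NOT DataFusion's `BufferExec`): eagerly drives `input`
/// on a background thread, keeping at most `capacity` items queued in a
/// bounded channel until the consumer asks for them.
fn buffer_eagerly<I>(input: I, capacity: usize) -> Receiver<I::Item>
where
    I: Iterator + Send + 'static,
    I::Item: Send + 'static,
{
    let (tx, rx) = sync_channel(capacity);
    thread::spawn(move || {
        for item in input {
            // `send` blocks while the buffer is full and returns an error
            // once the receiver has been dropped, so we stop producing.
            if tx.send(item).is_err() {
                break;
            }
        }
    });
    rx
}

fn main() {
    // Stand-in for the probe side: a lazy iterator that does no work until pulled.
    let probe_side = (0..5).map(|i| i * 10);
    let buffered = buffer_eagerly(probe_side, 2);

    // Stand-in for the build side. While this runs, the background thread
    // is already pulling items from `probe_side`.
    let build_side: Vec<i32> = (0..3).collect();

    // The consumer drains the buffer only once the build side is ready.
    let probed: Vec<i32> = buffered.iter().collect();
    assert_eq!(build_side.len(), 3);
    assert_eq!(probed, vec![0, 10, 20, 30, 40]);
}
```

Because the wrapper only needs an input and a capacity, it can be placed in front of any consumer, which is the reuse argument made above. The bounded channel is what keeps the memory cost of starting early under control.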

@gabotechs gabotechs force-pushed the hash-join-buffering-on-probe-side branch from 09c6b68 to d8e32b1 Compare January 16, 2026 13:03
@gabotechs gabotechs force-pushed the hash-join-buffering-on-probe-side branch from 6fa7c76 to 139cf50 Compare January 16, 2026 13:24
@gabotechs
Contributor Author

run benchmark tpcds tpch

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (139cf50) to ca904b3 diff using: tpcds
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and hash-join-buffering-on-probe-side
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │    73.20 ms │                          72.20 ms │     no change │
│ QQuery 2  │   209.88 ms │                         166.95 ms │ +1.26x faster │
│ QQuery 3  │   158.47 ms │                         155.49 ms │     no change │
│ QQuery 4  │  1866.97 ms │                        1397.66 ms │ +1.34x faster │
│ QQuery 5  │   281.39 ms │                         296.97 ms │  1.06x slower │
│ QQuery 6  │  1378.68 ms │                        1457.54 ms │  1.06x slower │
│ QQuery 7  │   495.94 ms │                         505.73 ms │     no change │
│ QQuery 8  │   174.50 ms │                         176.50 ms │     no change │
│ QQuery 9  │   302.23 ms │                         295.84 ms │     no change │
│ QQuery 10 │   179.18 ms │                         171.69 ms │     no change │
│ QQuery 11 │  1327.09 ms │                         857.65 ms │ +1.55x faster │
│ QQuery 12 │    70.95 ms │                          60.27 ms │ +1.18x faster │
│ QQuery 13 │   537.04 ms │                         505.94 ms │ +1.06x faster │
│ QQuery 14 │  1904.23 ms │                        1582.52 ms │ +1.20x faster │
│ QQuery 15 │    30.86 ms │                          32.44 ms │  1.05x slower │
│ QQuery 16 │    67.06 ms │                          58.85 ms │ +1.14x faster │
│ QQuery 17 │   362.76 ms │                         361.85 ms │     no change │
│ QQuery 18 │   192.36 ms │                         189.70 ms │     no change │
│ QQuery 19 │   231.27 ms │                         232.61 ms │     no change │
│ QQuery 20 │    27.68 ms │                          26.37 ms │     no change │
│ QQuery 21 │    40.11 ms │                          34.05 ms │ +1.18x faster │
│ QQuery 22 │   747.50 ms │                         703.86 ms │ +1.06x faster │
│ QQuery 23 │  1750.16 ms │                        1735.08 ms │     no change │
│ QQuery 24 │   655.13 ms │                         642.33 ms │     no change │
│ QQuery 25 │   523.30 ms │                         512.55 ms │     no change │
│ QQuery 26 │   128.50 ms │                         129.16 ms │     no change │
│ QQuery 27 │   491.21 ms │                         506.80 ms │     no change │
│ QQuery 28 │   287.94 ms │                         303.87 ms │  1.06x slower │
│ QQuery 29 │   444.97 ms │                         445.72 ms │     no change │
│ QQuery 30 │    75.39 ms │                          62.77 ms │ +1.20x faster │
│ QQuery 31 │   312.87 ms │                         320.73 ms │     no change │
│ QQuery 32 │    84.70 ms │                          85.83 ms │     no change │
│ QQuery 33 │   209.39 ms │                         209.49 ms │     no change │
│ QQuery 34 │   163.14 ms │                         149.48 ms │ +1.09x faster │
│ QQuery 35 │   180.20 ms │                         169.66 ms │ +1.06x faster │
│ QQuery 36 │   289.05 ms │                         303.29 ms │     no change │
│ QQuery 37 │   253.74 ms │                         276.29 ms │  1.09x slower │
│ QQuery 38 │   159.54 ms │                         139.11 ms │ +1.15x faster │
│ QQuery 39 │   212.38 ms │                         197.46 ms │ +1.08x faster │
│ QQuery 40 │   170.56 ms │                         162.39 ms │     no change │
│ QQuery 41 │    23.56 ms │                          21.79 ms │ +1.08x faster │
│ QQuery 42 │   145.50 ms │                         148.16 ms │     no change │
│ QQuery 43 │   128.18 ms │                         118.50 ms │ +1.08x faster │
│ QQuery 44 │    29.16 ms │                          28.17 ms │     no change │
│ QQuery 45 │    89.26 ms │                          84.75 ms │ +1.05x faster │
│ QQuery 46 │   323.26 ms │                         310.91 ms │     no change │
│ QQuery 47 │  1063.80 ms │                         611.34 ms │ +1.74x faster │
│ QQuery 48 │   412.79 ms │                         377.71 ms │ +1.09x faster │
│ QQuery 49 │   362.32 ms │                         347.63 ms │     no change │
│ QQuery 50 │   342.17 ms │                         336.72 ms │     no change │
│ QQuery 51 │   304.31 ms │                         294.95 ms │     no change │
│ QQuery 52 │   146.14 ms │                         144.14 ms │     no change │
│ QQuery 53 │   151.03 ms │                         151.82 ms │     no change │
│ QQuery 54 │   225.87 ms │                         221.13 ms │     no change │
│ QQuery 55 │   146.84 ms │                         145.32 ms │     no change │
│ QQuery 56 │   209.34 ms │                         208.06 ms │     no change │
│ QQuery 57 │   299.26 ms │                         241.25 ms │ +1.24x faster │
│ QQuery 58 │   470.84 ms │                         381.81 ms │ +1.23x faster │
│ QQuery 59 │   293.29 ms │                         250.60 ms │ +1.17x faster │
│ QQuery 60 │   216.46 ms │                         212.93 ms │     no change │
│ QQuery 61 │   244.54 ms │                         255.30 ms │     no change │
│ QQuery 62 │  1274.77 ms │                        1275.72 ms │     no change │
│ QQuery 63 │   151.99 ms │                         154.21 ms │     no change │
│ QQuery 64 │  1149.58 ms │                        1153.85 ms │     no change │
│ QQuery 65 │   351.55 ms │                         326.36 ms │ +1.08x faster │
│ QQuery 66 │   403.22 ms │                         417.72 ms │     no change │
│ QQuery 67 │   559.38 ms │                         537.67 ms │     no change │
│ QQuery 68 │   372.75 ms │                         371.33 ms │     no change │
│ QQuery 69 │   175.32 ms │                         163.03 ms │ +1.08x faster │
│ QQuery 70 │   505.85 ms │                         418.04 ms │ +1.21x faster │
│ QQuery 71 │   188.24 ms │                         186.10 ms │     no change │
│ QQuery 72 │  2056.33 ms │                        2028.52 ms │     no change │
│ QQuery 73 │   158.95 ms │                         149.71 ms │ +1.06x faster │
│ QQuery 74 │   840.78 ms │                         479.78 ms │ +1.75x faster │
│ QQuery 75 │   402.97 ms │                         403.47 ms │     no change │
│ QQuery 76 │   189.46 ms │                         194.31 ms │     no change │
│ QQuery 77 │   287.12 ms │                         300.86 ms │     no change │
│ QQuery 78 │   939.94 ms │                         692.03 ms │ +1.36x faster │
│ QQuery 79 │   327.22 ms │                         314.23 ms │     no change │
│ QQuery 80 │   506.49 ms │                         490.73 ms │     no change │
│ QQuery 81 │    52.42 ms │                          52.08 ms │     no change │
│ QQuery 82 │   288.81 ms │                         286.53 ms │     no change │
│ QQuery 83 │    81.20 ms │                          72.30 ms │ +1.12x faster │
│ QQuery 84 │    69.93 ms │                          65.07 ms │ +1.07x faster │
│ QQuery 85 │   222.94 ms │                         174.64 ms │ +1.28x faster │
│ QQuery 86 │    60.20 ms │                          58.43 ms │     no change │
│ QQuery 87 │   155.69 ms │                         139.57 ms │ +1.12x faster │
│ QQuery 88 │   276.47 ms │                         274.15 ms │     no change │
│ QQuery 89 │   173.29 ms │                         159.86 ms │ +1.08x faster │
│ QQuery 90 │    46.31 ms │                          46.87 ms │     no change │
│ QQuery 91 │    96.05 ms │                          70.57 ms │ +1.36x faster │
│ QQuery 92 │    85.03 ms │                          84.30 ms │     no change │
│ QQuery 93 │   264.55 ms │                         245.48 ms │ +1.08x faster │
│ QQuery 94 │    93.23 ms │                          89.59 ms │     no change │
│ QQuery 95 │   242.40 ms │                         237.81 ms │     no change │
│ QQuery 96 │   116.21 ms │                         111.83 ms │     no change │
│ QQuery 97 │   190.30 ms │                         171.59 ms │ +1.11x faster │
│ QQuery 98 │   219.09 ms │                         174.78 ms │ +1.25x faster │
│ QQuery 99 │ 14262.08 ms │                       14135.21 ms │     no change │
└───────────┴─────────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 50517.49ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 47295.97ms │
│ Average Time (HEAD)                              │   510.28ms │
│ Average Time (hash-join-buffering-on-probe-side) │   477.74ms │
│ Queries Faster                                   │         37 │
│ Queries Slower                                   │          5 │
│ Queries with No Change                           │         57 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘
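
For reading the "Change" column, the sketch below reproduces a few of the labels above from the raw timings. It assumes a 5% noise threshold, which is consistent with the rows shown but is not stated anywhere in this thread. The `classify` function is a hypothetical re-creation, not the comparison script's code.

```rust
/// Hypothetical re-creation of the "Change" column.
/// ASSUMPTION: differences within `threshold` (e.g. 0.05 = 5%) count as noise.
fn classify(head_ms: f64, branch_ms: f64, threshold: f64) -> String {
    if head_ms > branch_ms * (1.0 + threshold) {
        // The branch is faster than HEAD by more than the threshold.
        format!("+{:.2}x faster", head_ms / branch_ms)
    } else if branch_ms > head_ms * (1.0 + threshold) {
        // The branch is slower than HEAD by more than the threshold.
        format!("{:.2}x slower", branch_ms / head_ms)
    } else {
        "no change".to_string()
    }
}

fn main() {
    // Timings copied from the tpcds_sf1 table above.
    assert_eq!(classify(1063.80, 611.34, 0.05), "+1.74x faster"); // Query 47
    assert_eq!(classify(281.39, 296.97, 0.05), "1.06x slower"); // Query 5
    assert_eq!(classify(289.05, 303.29, 0.05), "no change"); // Query 36

    // The same ratio applied to the summary totals.
    assert_eq!(classify(50517.49, 47295.97, 0.05), "+1.07x faster");
}
```

Applied to the totals, the ratio works out to roughly 1.07x overall for this run, with most of the gain coming from a handful of long-running queries.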

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (139cf50) to ca904b3 diff using: tpch
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and hash-join-buffering-on-probe-side
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 175.65 ms │                         179.02 ms │     no change │
│ QQuery 2  │  89.87 ms │                          78.50 ms │ +1.14x faster │
│ QQuery 3  │ 125.34 ms │                         122.27 ms │     no change │
│ QQuery 4  │  79.15 ms │                          74.96 ms │ +1.06x faster │
│ QQuery 5  │ 175.79 ms │                         181.39 ms │     no change │
│ QQuery 6  │  68.11 ms │                          62.60 ms │ +1.09x faster │
│ QQuery 7  │ 212.98 ms │                         210.80 ms │     no change │
│ QQuery 8  │ 163.59 ms │                         163.50 ms │     no change │
│ QQuery 9  │ 232.96 ms │                         229.13 ms │     no change │
│ QQuery 10 │ 185.74 ms │                         193.67 ms │     no change │
│ QQuery 11 │  63.32 ms │                          59.65 ms │ +1.06x faster │
│ QQuery 12 │ 120.67 ms │                         119.02 ms │     no change │
│ QQuery 13 │ 219.39 ms │                         206.11 ms │ +1.06x faster │
│ QQuery 14 │  89.87 ms │                          91.05 ms │     no change │
│ QQuery 15 │ 124.56 ms │                         101.46 ms │ +1.23x faster │
│ QQuery 16 │  60.16 ms │                          55.35 ms │ +1.09x faster │
│ QQuery 17 │ 271.62 ms │                         258.37 ms │     no change │
│ QQuery 18 │ 313.52 ms │                         306.53 ms │     no change │
│ QQuery 19 │ 135.88 ms │                         136.65 ms │     no change │
│ QQuery 20 │ 133.43 ms │                         124.26 ms │ +1.07x faster │
│ QQuery 21 │ 263.86 ms │                         244.11 ms │ +1.08x faster │
│ QQuery 22 │  42.09 ms │                          39.17 ms │ +1.07x faster │
└───────────┴───────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 3347.55ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 3237.59ms │
│ Average Time (HEAD)                              │  152.16ms │
│ Average Time (hash-join-buffering-on-probe-side) │  147.16ms │
│ Queries Faster                                   │        10 │
│ Queries Slower                                   │         0 │
│ Queries with No Change                           │        12 │
│ Queries with Failure                             │         0 │
└──────────────────────────────────────────────────┴───────────┘

@gabotechs
Contributor Author

run benchmark tpch10

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (139cf50) to ca904b3 diff using: tpch10
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and hash-join-buffering-on-probe-side
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃ HEAD ┃ hash-join-buffering-on-probe-side ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ FAIL │                              FAIL │ incomparable │
│ QQuery 2  │ FAIL │                              FAIL │ incomparable │
│ QQuery 3  │ FAIL │                              FAIL │ incomparable │
│ QQuery 4  │ FAIL │                              FAIL │ incomparable │
│ QQuery 5  │ FAIL │                              FAIL │ incomparable │
│ QQuery 6  │ FAIL │                              FAIL │ incomparable │
│ QQuery 7  │ FAIL │                              FAIL │ incomparable │
│ QQuery 8  │ FAIL │                              FAIL │ incomparable │
│ QQuery 9  │ FAIL │                              FAIL │ incomparable │
│ QQuery 10 │ FAIL │                              FAIL │ incomparable │
│ QQuery 11 │ FAIL │                              FAIL │ incomparable │
│ QQuery 12 │ FAIL │                              FAIL │ incomparable │
│ QQuery 13 │ FAIL │                              FAIL │ incomparable │
│ QQuery 14 │ FAIL │                              FAIL │ incomparable │
│ QQuery 15 │ FAIL │                              FAIL │ incomparable │
│ QQuery 16 │ FAIL │                              FAIL │ incomparable │
│ QQuery 17 │ FAIL │                              FAIL │ incomparable │
│ QQuery 18 │ FAIL │                              FAIL │ incomparable │
│ QQuery 19 │ FAIL │                              FAIL │ incomparable │
│ QQuery 20 │ FAIL │                              FAIL │ incomparable │
│ QQuery 21 │ FAIL │                              FAIL │ incomparable │
│ QQuery 22 │ FAIL │                              FAIL │ incomparable │
└───────────┴──────┴───────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark Summary                                ┃        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ Total Time (HEAD)                                │ 0.00ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 0.00ms │
│ Average Time (HEAD)                              │ 0.00ms │
│ Average Time (hash-join-buffering-on-probe-side) │ 0.00ms │
│ Queries Faster                                   │      0 │
│ Queries Slower                                   │      0 │
│ Queries with No Change                           │      0 │
│ Queries with Failure                             │     22 │
└──────────────────────────────────────────────────┴────────┘

Labels

  • common: Related to common crate
  • core: Core DataFusion crate
  • datasource: Changes to the datasource crate
  • documentation: Improvements or additions to documentation
  • execution: Related to the execution crate
  • optimizer: Optimizer rules
  • physical-plan: Changes to the physical-plan crate
  • proto: Related to proto crate
  • sqllogictest: SQL Logic Tests (.slt)

5 participants