Single mode for multi column group by -- Almost 2x for ClickBench Q32 #11792

Draft
wants to merge 16 commits into
base: main

@jayzhan211 jayzhan211 commented Aug 3, 2024

Which issue does this PR close?

Production ready PR

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ single-mode-groupby ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.43ms │              0.48ms │  1.11x slower │
│ QQuery 1     │    40.66ms │             41.20ms │     no change │
│ QQuery 2     │    78.22ms │             76.61ms │     no change │
│ QQuery 3     │    74.18ms │             61.82ms │ +1.20x faster │
│ QQuery 4     │   449.60ms │            421.74ms │ +1.07x faster │
│ QQuery 5     │   706.87ms │            708.16ms │     no change │
│ QQuery 6     │    37.11ms │             37.35ms │     no change │
│ QQuery 7     │    41.30ms │             39.29ms │     no change │
│ QQuery 8     │   757.87ms │            673.52ms │ +1.13x faster │
│ QQuery 9     │   675.59ms │            678.64ms │     no change │
│ QQuery 10    │   204.05ms │            193.12ms │ +1.06x faster │
│ QQuery 11    │   233.20ms │            238.46ms │     no change │
│ QQuery 12    │   772.65ms │            761.90ms │     no change │
│ QQuery 13    │  1386.74ms │           1080.38ms │ +1.28x faster │
│ QQuery 14    │  1047.01ms │            758.85ms │ +1.38x faster │
│ QQuery 15    │   530.34ms │            501.92ms │ +1.06x faster │
│ QQuery 16    │  1770.70ms │           1337.31ms │ +1.32x faster │
│ QQuery 17    │  1753.15ms │           1301.97ms │ +1.35x faster │
│ QQuery 18    │  4425.26ms │           2690.27ms │ +1.64x faster │
│ QQuery 19    │    67.44ms │             57.48ms │ +1.17x faster │
│ QQuery 20    │  1687.14ms │           1577.46ms │ +1.07x faster │
│ QQuery 21    │  1945.38ms │           1822.41ms │ +1.07x faster │
│ QQuery 22    │  4323.40ms │           4102.97ms │ +1.05x faster │
│ QQuery 23    │  8903.54ms │           8579.50ms │     no change │
│ QQuery 24    │   493.88ms │            605.72ms │  1.23x slower │
│ QQuery 25    │   498.31ms │            505.79ms │     no change │
│ QQuery 26    │   566.23ms │            580.24ms │     no change │
│ QQuery 27    │  1394.61ms │           1554.94ms │  1.11x slower │
│ QQuery 28    │ 10598.40ms │          10953.84ms │     no change │
│ QQuery 29    │   419.42ms │            423.21ms │     no change │
│ QQuery 30    │   861.82ms │            765.23ms │ +1.13x faster │
│ QQuery 31    │   971.98ms │            819.20ms │ +1.19x faster │
│ QQuery 32    │  9351.65ms │           4740.51ms │ +1.97x faster │
│ QQuery 33    │  4043.72ms │           4451.66ms │  1.10x slower │
│ QQuery 34    │  3632.00ms │           4386.58ms │  1.21x slower │
│ QQuery 35    │  1095.49ms │            999.99ms │ +1.10x faster │
│ QQuery 36    │   144.76ms │            145.26ms │     no change │
│ QQuery 37    │   103.17ms │            104.42ms │     no change │
│ QQuery 38    │   106.97ms │            105.39ms │     no change │
│ QQuery 39    │   385.52ms │            275.47ms │ +1.40x faster │
│ QQuery 40    │    34.35ms │             32.81ms │     no change │
│ QQuery 41    │    32.84ms │             31.17ms │ +1.05x faster │
│ QQuery 42    │    40.06ms │             41.04ms │     no change │
└──────────────┴────────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)                  │ 66687.00ms │
│ Total Time (single-mode-groupby)   │ 59265.28ms │
│ Average Time (main)                │  1550.86ms │
│ Average Time (single-mode-groupby) │  1378.26ms │
│ Queries Faster                     │         20 │
│ Queries Slower                     │          5 │
│ Queries with No Change             │         18 │
└────────────────────────────────────┴────────────┘

This change should only affect multi-column group by queries, so it has nothing to do with QQuery 24; that slowdown should be considered noise.
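
For readers checking the table by hand, here is a minimal sketch of how the "Change" column could be reproduced from the two timing columns. The 5% "no change" band is an assumption inferred from the rows above (for example, QQuery 7 at about -4.9% is "no change" while QQuery 41 at about -5.1% is "+1.05x faster"); it is not taken from the benchmark tool itself, and `classify` is a hypothetical helper name.

```rust
/// Classify a timing pair the way the "Change" column appears to.
/// ASSUMPTION: the 5% band is inferred from the table rows, not from the tool.
fn classify(main_ms: f64, branch_ms: f64) -> String {
    // Ratio of branch time to baseline time; below 1.0 means the branch is faster.
    let ratio = branch_ms / main_ms;
    if ratio > 0.95 && ratio < 1.05 {
        "no change".to_string()
    } else if ratio < 1.0 {
        format!("+{:.2}x faster", main_ms / branch_ms)
    } else {
        format!("{:.2}x slower", ratio)
    }
}

fn main() {
    // Timings copied from the QQuery 32, QQuery 24, and QQuery 7 rows above.
    println!("Q32: {}", classify(9351.65, 4740.51));
    println!("Q24: {}", classify(493.88, 605.72));
    println!("Q7:  {}", classify(41.30, 39.29));
}
```

Because the table shows rounded timings, the last digit of a recomputed ratio may differ from the printed one for very small queries.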

TODO

This change only replaces the partial/final aggregation pair with a single-mode aggregation. Repartitioning is not included inside the group by operator.
The next step is to find out whether including repartitioning inside the group by operator is helpful or not.
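
To make the partial/final versus single distinction concrete, here is a minimal, self-contained sketch using only the Rust standard library. It is not DataFusion code: the function names and the `Key` type are hypothetical, and it ignores parallelism and repartitioning entirely. It only illustrates that both strategies compute the same `COUNT(*)` per `(a, b)` group, and that the two-phase version builds and then merges one hash table per partition.

```rust
use std::collections::HashMap;

/// Group key for a two-column GROUP BY (a, b). Hypothetical type for illustration.
type Key = (i64, i64);

/// Two-phase sketch: each partition builds a partial COUNT(*) table,
/// then a final step merges the partial tables.
fn partial_then_final(partitions: &[Vec<Key>]) -> HashMap<Key, u64> {
    // Partial phase: one hash table per input partition.
    let partials: Vec<HashMap<Key, u64>> = partitions
        .iter()
        .map(|rows| {
            let mut counts = HashMap::new();
            for key in rows {
                *counts.entry(*key).or_insert(0) += 1;
            }
            counts
        })
        .collect();

    // Final phase: add up the partial counts for each group.
    let mut merged = HashMap::new();
    for partial in partials {
        for (key, count) in partial {
            *merged.entry(key).or_insert(0) += count;
        }
    }
    merged
}

/// Single-phase sketch: every row updates one hash table directly.
fn single(partitions: &[Vec<Key>]) -> HashMap<Key, u64> {
    let mut counts = HashMap::new();
    for key in partitions.iter().flatten() {
        *counts.entry(*key).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let partitions = vec![
        vec![(1, 1), (1, 2), (1, 1)],
        vec![(1, 1), (2, 2)],
    ];
    // Both strategies must agree on the per-group counts.
    assert_eq!(partial_then_final(&partitions), single(&partitions));
    println!("count for (1, 1): {:?}", single(&partitions).get(&(1, 1)));
}
```

One plausible reading of the benchmark results is that when almost every row is its own group, the partial phase reduces very little, so the final merge repeats most of the hashing work; with few groups, the partial phase shrinks the data substantially before the merge.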

Signed-off-by: jayzhan211 <[email protected]>
@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Aug 3, 2024
@jayzhan211 jayzhan211 changed the title Single mode for multi column group by Single mode for multi column group by -- Almost 2x for ClickBench Q32 Aug 3, 2024
let mut ctx = create_context(batches, Arc::clone(&schema)).unwrap();
b.iter(|| block_on(query(&mut ctx, "select a, b, count(*) from t group by a, b order by count(*) desc limit 10")))
});
}
Contributor Author

Gnuplot not found, using plotters backend
benchmark high cardinality
                        time:   [273.00 ms 350.24 ms 451.67 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

benchmark low cardinality
                        time:   [15.764 ms 16.531 ms 17.145 ms]

@jayzhan211 jayzhan211 Aug 3, 2024

main branch

Gnuplot not found, using plotters backend
Benchmarking benchmark high cardinality: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 11.0s.
benchmark high cardinality
                        time:   [565.74 ms 964.40 ms 1.3864 s]
                        change: [+69.461% +175.35% +332.28%] (p = 0.01 < 0.05)
                        Performance has regressed.

benchmark low cardinality
                        time:   [10.736 ms 11.140 ms 11.577 ms]
                        change: [-35.106% -31.565% -28.118%] (p = 0.00 < 0.05)
                        Performance has improved.

This PR has a slight regression for low cardinality (about 16.5 ms vs. 11.1 ms on main) but a huge gain for high cardinality (about 350 ms vs. 964 ms on main).
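
As a quick check of the numbers, the ratios can be worked out from the Criterion point estimates (the middle value of each `time:` triple above). This is only arithmetic on the quoted figures; `speedup` is a hypothetical helper, and the wide confidence interval on the high-cardinality run means the ratio is rough.

```rust
/// Ratio of the main-branch time to the PR-branch time.
/// Values above 1.0 mean the PR is faster; values below 1.0 mean it is slower.
fn speedup(main_ms: f64, pr_ms: f64) -> f64 {
    main_ms / pr_ms
}

fn main() {
    // Point estimates quoted in the two benchmark outputs above.
    let high = speedup(964.40, 350.24);
    let low = speedup(11.140, 16.531);
    println!("high cardinality: {:.2}x", high);
    println!("low cardinality:  {:.2}x", low);
}
```

On these figures the PR is roughly 2.75x faster for high cardinality and runs at roughly 0.67x the speed of main for low cardinality.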

@jayzhan211 jayzhan211 marked this pull request as ready for review August 4, 2024 02:14

alamb commented Aug 4, 2024

Thanks @jayzhan211 -- this looks quite interesting. I will try to study this PR carefully early this week.


alamb commented Aug 5, 2024

I am running some benchmarks on this one


alamb commented Aug 5, 2024

I also measured some non-trivial improvements (on a 16-core machine), along with some slowdowns:

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ single-mode-groupby ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.65ms │              0.66ms │     no change │
│ QQuery 1     │    70.34ms │             69.76ms │     no change │
│ QQuery 2     │   124.93ms │            125.33ms │     no change │
│ QQuery 3     │   131.01ms │            129.34ms │     no change │
│ QQuery 4     │   917.67ms │            971.24ms │  1.06x slower │
│ QQuery 5     │  1061.52ms │           1103.04ms │     no change │
│ QQuery 6     │    65.61ms │             67.14ms │     no change │
│ QQuery 7     │    73.85ms │             72.77ms │     no change │
│ QQuery 8     │  1422.06ms │           1199.52ms │ +1.19x faster │
│ QQuery 9     │  1281.64ms │           1347.37ms │  1.05x slower │
│ QQuery 10    │   446.75ms │            449.22ms │     no change │
│ QQuery 11    │   479.20ms │            512.66ms │  1.07x slower │
│ QQuery 12    │  1149.46ms │           1148.56ms │     no change │
│ QQuery 13    │  2360.71ms │           1915.35ms │ +1.23x faster │
│ QQuery 14    │  1549.49ms │           1217.84ms │ +1.27x faster │
│ QQuery 15    │  1054.82ms │           1090.16ms │     no change │
│ QQuery 16    │  2843.22ms │           2396.08ms │ +1.19x faster │
│ QQuery 17    │  2707.32ms │           2313.85ms │ +1.17x faster │
│ QQuery 18    │  5627.90ms │           4249.07ms │ +1.32x faster │
│ QQuery 19    │   119.94ms │            122.76ms │     no change │
│ QQuery 20    │  1614.43ms │           1635.75ms │     no change │
│ QQuery 21    │  1994.01ms │           2007.84ms │     no change │
│ QQuery 22    │  4810.56ms │           4824.01ms │     no change │
│ QQuery 23    │ 11246.33ms │          11301.58ms │     no change │
│ QQuery 24    │   754.54ms │            745.52ms │     no change │
│ QQuery 25    │   673.00ms │            663.95ms │     no change │
│ QQuery 26    │   825.60ms │            812.36ms │     no change │
│ QQuery 27    │  2487.83ms │           2442.42ms │     no change │
│ QQuery 28    │ 15504.67ms │          15553.00ms │     no change │
│ QQuery 29    │   565.35ms │            563.80ms │     no change │
│ QQuery 30    │  1286.72ms │           1109.49ms │ +1.16x faster │
│ QQuery 31    │  1608.83ms │           1251.64ms │ +1.29x faster │
│ QQuery 32    │  7555.94ms │           4070.29ms │ +1.86x faster │
│ QQuery 33    │  4877.63ms │           4743.06ms │     no change │
│ QQuery 34    │  4887.61ms │           4842.37ms │     no change │
│ QQuery 35    │  1763.00ms │           1464.69ms │ +1.20x faster │
│ QQuery 36    │   314.89ms │            322.15ms │     no change │
│ QQuery 37    │   219.26ms │            219.07ms │     no change │
│ QQuery 38    │   184.48ms │            189.16ms │     no change │
│ QQuery 39    │   990.12ms │            601.18ms │ +1.65x faster │
│ QQuery 40    │    85.76ms │             81.27ms │ +1.06x faster │
│ QQuery 41    │    79.70ms │             77.11ms │     no change │
│ QQuery 42    │    95.45ms │             93.21ms │     no change │
└──────────────┴────────────┴─────────────────────┴───────────────┘

It seems to have hurt TPCH a bit more:

--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ single-mode-groupby ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  104.42ms │            371.48ms │  3.56x slower │
│ QQuery 2     │   24.85ms │             25.11ms │     no change │
│ QQuery 3     │   39.93ms │             42.32ms │  1.06x slower │
│ QQuery 4     │   32.57ms │             32.83ms │     no change │
│ QQuery 5     │   62.08ms │             60.75ms │     no change │
│ QQuery 6     │    8.38ms │              8.65ms │     no change │
│ QQuery 7     │  115.87ms │            112.41ms │     no change │
│ QQuery 8     │   27.71ms │             27.06ms │     no change │
│ QQuery 9     │   62.80ms │             64.47ms │     no change │
│ QQuery 10    │   72.36ms │             69.71ms │     no change │
│ QQuery 11    │   63.73ms │             63.81ms │     no change │
│ QQuery 12    │   28.73ms │             28.54ms │     no change │
│ QQuery 13    │   39.70ms │             39.93ms │     no change │
│ QQuery 14    │   11.17ms │             11.14ms │     no change │
│ QQuery 15    │   20.03ms │             21.21ms │  1.06x slower │
│ QQuery 16    │   26.95ms │             23.04ms │ +1.17x faster │
│ QQuery 17    │   95.26ms │            102.77ms │  1.08x slower │
│ QQuery 18    │  226.12ms │            237.74ms │  1.05x slower │
│ QQuery 19    │   28.43ms │             29.93ms │  1.05x slower │
│ QQuery 20    │   45.35ms │             35.96ms │ +1.26x faster │
│ QQuery 21    │  172.54ms │            168.60ms │     no change │
│ QQuery 22    │   13.48ms │             14.03ms │     no change │
└──────────────┴───────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main_base)             │ 1322.44ms │
│ Total Time (single-mode-groupby)   │ 1591.48ms │
│ Average Time (main_base)           │   60.11ms │
│ Average Time (single-mode-groupby) │   72.34ms │
│ Queries Faster                     │         2 │
│ Queries Slower                     │         6 │
│ Queries with No Change             │        14 │
└────────────────────────────────────┴───────────┘

--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ single-mode-groupby ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  227.72ms │            330.23ms │  1.45x slower │
│ QQuery 2     │  125.12ms │            128.35ms │     no change │
│ QQuery 3     │  124.25ms │            125.15ms │     no change │
│ QQuery 4     │   95.52ms │             93.86ms │     no change │
│ QQuery 5     │  172.74ms │            174.37ms │     no change │
│ QQuery 6     │   58.73ms │             63.04ms │  1.07x slower │
│ QQuery 7     │  203.25ms │            217.22ms │  1.07x slower │
│ QQuery 8     │  158.45ms │            164.02ms │     no change │
│ QQuery 9     │  257.60ms │            258.15ms │     no change │
│ QQuery 10    │  230.00ms │            219.76ms │     no change │
│ QQuery 11    │   96.91ms │            101.00ms │     no change │
│ QQuery 12    │  133.99ms │            129.95ms │     no change │
│ QQuery 13    │  282.25ms │            288.57ms │     no change │
│ QQuery 14    │   87.25ms │             85.62ms │     no change │
│ QQuery 15    │  118.44ms │            149.96ms │  1.27x slower │
│ QQuery 16    │   84.88ms │             62.51ms │ +1.36x faster │
│ QQuery 17    │  221.45ms │            218.67ms │     no change │
│ QQuery 18    │  315.36ms │            324.40ms │     no change │
│ QQuery 19    │  159.36ms │            156.41ms │     no change │
│ QQuery 20    │  143.00ms │            122.93ms │ +1.16x faster │
│ QQuery 21    │  278.87ms │            291.86ms │     no change │
│ QQuery 22    │   67.32ms │             68.52ms │     no change │
└──────────────┴───────────┴─────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                  ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main_base)             │ 3642.44ms │
│ Total Time (single-mode-groupby)   │ 3774.56ms │
│ Average Time (main_base)           │  165.57ms │
│ Average Time (single-mode-groupby) │  171.57ms │
│ Queries Faster                     │         2 │
│ Queries Slower                     │         4 │
│ Queries with No Change             │        16 │
└────────────────────────────────────┴───────────┘

Maybe we can do some profiling and figure out whether there is a way to recover the performance on the queries that got slower.


alamb commented Aug 8, 2024

Here are some more thoughts about why this approach works and an alternate idea of how we can make these queries all faster: #11680 (comment)
