Impl `convert_to_state` for `GroupsAccumulatorAdapter` (faster median for high cardinality aggregates) #11827

Rachelint · 2024-08-05T18:30:39Z

Which issue does this PR close?

Closes #11819

Rationale for this change

See #11819

What changes are included in this PR?

Implement `convert_to_state` for `GroupsAccumulatorAdapter`.

Are these changes tested?

Covered by existing tests.

Are there any user-facing changes?

No.

alamb · 2024-08-05T19:21:09Z

datafusion/physical-expr/src/aggregate/groups_accumulator/adapter.rs

@@ -342,6 +374,50 @@ impl GroupsAccumulator for GroupsAccumulatorAdapter {
    fn size(&self) -> usize {
        self.allocation_bytes
    }
+
+    fn convert_to_state(


thanks @Rachelint

This might be interesting to you: #11825

Seems interesting, I am studying that PR now (I'm still not very familiar with Arrow ops).

Yeah, it takes some getting used to thinking in terms of Arrays and masks, etc

Sorry for the delay, fixed.

I found that the quick way, using `filtered_null_mask` + `set_nulls`, only sets the filtered-out rows to null; it does not change the number of rows.

// `filtered_null_mask` + `set_nulls`
left: PrimitiveArray<Int32>
[
  null,
  null,
  6,
  null,
  null,
  null,
]

// `compute::filter`
right: PrimitiveArray<Int32>
[
  null,
  6,
]

Could this affect correctness for some accumulators? For example, a UDF count that treats a null row as 1?
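The concern above can be illustrated with a minimal sketch that uses plain `Vec<Option<i32>>` values in place of Arrow arrays. The input values, the filter, and the three helper functions are assumptions made for illustration (the thread only shows the two outputs); none of them are DataFusion or arrow-rs APIs.

```rust
// Masking approach: the row count is preserved, and every row rejected
// by the filter is replaced with null.
fn mask_filtered_rows(values: &[Option<i32>], filter: &[bool]) -> Vec<Option<i32>> {
    values
        .iter()
        .zip(filter)
        .map(|(v, &keep)| if keep { *v } else { None })
        .collect()
}

// Filtering approach: rejected rows are dropped, so the output is shorter.
fn drop_filtered_rows(values: &[Option<i32>], filter: &[bool]) -> Vec<Option<i32>> {
    values
        .iter()
        .zip(filter)
        .filter(|(_, keep)| **keep)
        .map(|(v, _)| *v)
        .collect()
}

// Hypothetical user-defined count that counts every row it receives,
// including null rows.
fn count_rows_including_nulls(values: &[Option<i32>]) -> usize {
    values.len()
}
```

With assumed input `[1, null, 6, 2, null, 3]` and a filter that keeps only rows 1 and 2, masking yields six rows and filtering yields two, so an aggregate that counts null rows would see 6 in one case and 2 in the other.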

alamb

Thank you @Rachelint -- this looks very cool. I am sorry for the delay in the review

It is my understanding that this will allow aggregates that do not yet implement GroupsAccumulator to benefit from the intermediate aggregate state.

Thus the primary benefit of this code is to make aggregates on such queries faster.

Unfortunately I don't think we have any examples of such aggregates in the benchmarks (e.g. calculating median or approx_median). I will make a PR to add some to see if we can measure improvement of this PR

cc @korowa
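As a rough illustration of the idea discussed in this review, the sketch below turns each input row into the intermediate state that a one-row group would produce, creating a fresh accumulator per row. The `Accumulator` trait, the `SumCount` type, and the function signature are simplified stand-ins invented for this example; they are not the DataFusion definitions, which operate on Arrow arrays.

```rust
// Simplified, hypothetical stand-in for a row-based accumulator.
trait Accumulator {
    fn update(&mut self, values: &[Option<f64>]);
    fn state(&self) -> Vec<f64>;
}

// Toy accumulator whose intermediate state is (sum, count).
#[derive(Default)]
struct SumCount {
    sum: f64,
    count: u64,
}

impl Accumulator for SumCount {
    fn update(&mut self, values: &[Option<f64>]) {
        // Null inputs are skipped.
        for v in values.iter().flatten() {
            self.sum += *v;
            self.count += 1;
        }
    }

    fn state(&self) -> Vec<f64> {
        vec![self.sum, self.count as f64]
    }
}

// Convert every input row into the state of a group containing only that row.
// Rows rejected by the optional filter produce the state of an empty accumulator.
fn convert_to_state<F>(
    values: &[Option<f64>],
    filter: Option<&[bool]>,
    factory: F,
) -> Vec<Vec<f64>>
where
    F: Fn() -> Box<dyn Accumulator>,
{
    let mut results = Vec::with_capacity(values.len());
    for row_idx in 0..values.len() {
        // A new accumulator per row, so no state leaks between rows.
        let mut acc = factory();
        let keep = filter.map_or(true, |f| f[row_idx]);
        if keep {
            acc.update(&values[row_idx..row_idx + 1]);
        }
        results.push(acc.state());
    }
    results
}
```

The per-row factory call keeps the sketch simple and correct for any accumulator, at the cost of one allocation per row.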

Rachelint · 2024-09-11T15:57:31Z

Sounds great! And we can continue to improve the performance after having such benchmarks.

alamb · 2024-09-11T16:18:54Z

I created a proposal in #12438

I tested a little locally on the PR here and it seems like this PR does not improve the performance much.

Rachelint · 2024-09-11T16:28:31Z

OK, I will check it locally soon.

alamb · 2024-09-11T16:46:09Z

It may be that the query doesn't show the correct pattern of high cardinality intermediate aggregates, btw. I am not sure

Rachelint · 2024-09-12T17:42:33Z

I found I can't run it successfully locally... #12438
Memory usage keeps climbing, swap seems to kick in, and that makes it even slower... In the end it almost never finishes.
I am trying to find the reason.

alamb · 2024-09-12T17:47:25Z

It is probably because there are 17M groups, each holding multiple aggregates, each with non-trivial state 🤔

Rachelint · 2024-09-12T18:36:45Z

Yes... I found the state is much larger than that of the simple accumulators (e.g. count, sum, average).
For example, the digest for approx_percentile_cont:

pub struct TDigest {
    centroids: Vec<Centroid>,
    max_size: usize,
    sum: f64,
    count: u64,
    max: f64,
    min: f64,
}
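To put a rough number on this, the sketch below mirrors the struct above and estimates the bytes of state held across many groups. The `Centroid` layout (a mean and a weight, both `f64`) is an assumption, since it is not shown in this thread, and `estimated_state_bytes` is a hypothetical helper that ignores allocator overhead and spare `Vec` capacity.

```rust
use std::mem::size_of;

// Assumed centroid layout (not shown in the thread): a mean and a weight.
#[allow(dead_code)]
struct Centroid {
    mean: f64,
    weight: f64,
}

// Mirror of the quoted struct, used only to measure its inline size.
#[allow(dead_code)]
struct TDigest {
    centroids: Vec<Centroid>,
    max_size: usize,
    sum: f64,
    count: u64,
    max: f64,
    min: f64,
}

// Rough lower bound on the bytes of digest state for `groups` groups,
// assuming each digest stores `centroids_per_group` centroids on the heap.
fn estimated_state_bytes(groups: u64, centroids_per_group: u64) -> u64 {
    let inline = size_of::<TDigest>() as u64;
    let heap = centroids_per_group * size_of::<Centroid>() as u64;
    groups * (inline + heap)
}
```

On a typical 64-bit target `size_of::<TDigest>()` should come to 64 bytes, so `estimated_state_bytes(17_000_000, 1)` is already above 1 GB for a single digest-based aggregate, whereas a plain counter needs only one 8-byte integer per group.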

Rachelint · 2024-09-12T19:10:42Z

I created a subset of hits_partitioned, and with this smaller dataset the q32-like high cardinality query now runs successfully.

SELECT "WatchID", "ClientIP", COUNT(*) AS c, approx_median("ResponseStartTiming") tmed FROM hits GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10;

Rachelint · 2024-09-12T19:36:19Z

These are the numbers from my local test with a 15% subset of hits_partitioned:

// Main(3ece7a736193)
Q5: SELECT "WatchID", "ClientIP", COUNT(*) AS c, approx_median("ResponseStartTiming") tmed FROM hits GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10;
Query 5 iteration 0 took 17567.8 ms and returned 10 rows
Query 5 iteration 1 took 17462.8 ms and returned 10 rows
Query 5 iteration 2 took 17442.4 ms and returned 10 rows
Query 5 iteration 3 took 17569.0 ms and returned 10 rows
Query 5 iteration 4 took 17527.8 ms and returned 10 rows

// This PR's branch (rebased onto 3ece7a736193)
Q5: SELECT "WatchID", "ClientIP", COUNT(*) AS c, approx_median("ResponseStartTiming") tmed FROM hits GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10;
Query 5 iteration 0 took 12093.5 ms and returned 10 rows
Query 5 iteration 1 took 12234.2 ms and returned 10 rows
Query 5 iteration 2 took 11951.3 ms and returned 10 rows
Query 5 iteration 3 took 12066.6 ms and returned 10 rows
Query 5 iteration 4 took 11963.1 ms and returned 10 rows

alamb · 2024-09-13T19:26:15Z

I just ran this with some new queries on #12438 and this branch goes about 2x faster

Elapsed 9.777 seconds. --> Elapsed 5.361 seconds.
Elapsed 6.919 seconds. --> Elapsed 3.952 seconds.

./datafusion-cli-intermediate-state   -f tq1.sql
DataFusion CLI v41.0.0
0 row(s) fetched.
Elapsed 0.019 seconds.

+-------------+---------------------+---+------+------+------+
| ClientIP    | WatchID             | c | tmin | tp95 | tmax |
+-------------+---------------------+---+------+------+------+
| 1611957945  | 6655575552203051303 | 2 | 0    | 0    | 0    |
| -1402644643 | 8566928176839891583 | 2 | 0    | 0    | 0    |
+-------------+---------------------+---+------+------+------+
2 row(s) fetched.
Elapsed 5.361 seconds.

+-------------+---------------------+---+------+------+------+
| ClientIP    | WatchID             | c | tmin | tmed | tmax |
+-------------+---------------------+---+------+------+------+
| 1611957945  | 6655575552203051303 | 2 | 0    | 0    | 0    |
| -1402644643 | 8566928176839891583 | 2 | 0    | 0    | 0    |
+-------------+---------------------+---+------+------+------+
2 row(s) fetched.
Elapsed 3.952 seconds.

andrewlamb@Andrews-MacBook-Pro-2 Downloads % datafusion-cli    -f tq1.sql
datafusion-cli    -f tq1.sql
DataFusion CLI v41.0.0
0 row(s) fetched.
Elapsed 0.020 seconds.

+-------------+---------------------+---+------+------+------+
| ClientIP    | WatchID             | c | tmin | tp95 | tmax |
+-------------+---------------------+---+------+------+------+
| 1611957945  | 6655575552203051303 | 2 | 0    | 0    | 0    |
| -1402644643 | 8566928176839891583 | 2 | 0    | 0    | 0    |
+-------------+---------------------+---+------+------+------+
2 row(s) fetched.
Elapsed 9.777 seconds.

+-------------+---------------------+---+------+------+------+
| ClientIP    | WatchID             | c | tmin | tmed | tmax |
+-------------+---------------------+---+------+------+------+
| 1611957945  | 6655575552203051303 | 2 | 0    | 0    | 0    |
| -1402644643 | 8566928176839891583 | 2 | 0    | 0    | 0    |
+-------------+---------------------+---+------+------+------+
2 row(s) fetched.
Elapsed 6.919 seconds.

alamb

Thank you @Rachelint and @korowa

While I think the real way to make MEDIAN and APPROX_PERCENTILE_CONT etc faster is to implement GroupsAccumulator, this PR makes them faster for certain cases.

Nice work. Thanks again and sorry for the delay in reviewing while we sorted out benchmarking

alamb · 2024-09-13T19:30:31Z

datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs

+        let mut results = vec![];
+        for row_idx in 0..num_rows {
+            // Create the empty accumulator for converting
+            let mut converted_accumulator = (self.factory)()?;


As a follow-on PR, I wonder if we could improve performance by adding a clear() or reset() type function to each accumulator, to avoid having to create a new accumulator for each group.

Yes, I wanted to reuse the converted_accumulator at first, but there is no guarantee that the state is reset after calling state.
Adding such a function to do the reset explicitly is a clever idea.
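A minimal sketch of the follow-on idea, assuming a hypothetical `reset` method: one accumulator is reused for every row instead of building a new one each time. The `ResettableAccumulator` trait, `SumCountAcc`, and `convert_rows_reusing` are invented for this illustration; no such `reset` API is confirmed by this thread.

```rust
// Hypothetical accumulator interface with an explicit reset.
trait ResettableAccumulator {
    fn update(&mut self, value: Option<i64>);
    fn state(&self) -> Vec<i64>;
    // Hypothetical method proposed in the discussion: restore the
    // accumulator to its freshly created state.
    fn reset(&mut self);
}

// Toy accumulator whose intermediate state is (sum, count).
#[derive(Default)]
struct SumCountAcc {
    sum: i64,
    count: i64,
}

impl ResettableAccumulator for SumCountAcc {
    fn update(&mut self, value: Option<i64>) {
        // Null inputs are skipped.
        if let Some(v) = value {
            self.sum += v;
            self.count += 1;
        }
    }

    fn state(&self) -> Vec<i64> {
        vec![self.sum, self.count]
    }

    fn reset(&mut self) {
        self.sum = 0;
        self.count = 0;
    }
}

// Convert each row to its one-row state while reusing a single accumulator.
fn convert_rows_reusing(
    acc: &mut dyn ResettableAccumulator,
    values: &[Option<i64>],
) -> Vec<Vec<i64>> {
    values
        .iter()
        .map(|v| {
            // Explicit reset, because calling `state` is not assumed to clear anything.
            acc.reset();
            acc.update(*v);
            acc.state()
        })
        .collect()
}
```

The explicit reset before each row is what makes reuse safe here: without it, the sum and count from earlier rows would leak into later states.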

alamb · 2024-09-15T12:00:29Z

The 42.0.0 release has been cut -- let's start the code flowing!

alamb · 2024-09-15T12:00:35Z

Thanks again @Rachelint

github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions labels Aug 5, 2024

alamb reviewed Aug 5, 2024

Rachelint force-pushed the support-convert-to-state-for-adapter branch from f29d0bb to 76f5aba Compare August 13, 2024 15:06

github-actions bot added sqllogictest SQL Logic Tests (.slt) functions and removed logical-expr Logical plan and expressions labels Aug 13, 2024

Rachelint added 2 commits September 3, 2024 20:43

make a draft for convert_to_state in GroupsAccumulatorAdapter.

3792645

tmp

b8ccaa6

Rachelint force-pushed the support-convert-to-state-for-adapter branch from 76f5aba to be5316f Compare September 4, 2024 15:38

github-actions bot removed the physical-expr Physical Expressions label Sep 4, 2024

use filter nulls to impl quick filter for some arrays.

0396fc4

Rachelint force-pushed the support-convert-to-state-for-adapter branch from be5316f to 0396fc4 Compare September 4, 2024 15:39

add unique group by test for median, approx_median, `approx_distinct`.

b964757

github-actions bot added sqllogictest SQL Logic Tests (.slt) and removed sqllogictest SQL Logic Tests (.slt) labels Sep 4, 2024

Rachelint added 3 commits September 5, 2024 00:18

add normal cases & nullable cases for median, approx_median, `approx_distinct`.

792160a

add filter cases for median, approx_median, approx_distinct.

974f576

fix clippy.

7594c02

github-actions bot removed the functions label Sep 5, 2024

fix fmt.

50013f7

Rachelint marked this pull request as ready for review September 5, 2024 04:26

Rachelint added 5 commits September 5, 2024 12:28

add todo.

1e52308

fix comments.

a55f7b4

fallback to filter kernal for general.

e85e82d

remove unused imports.

1a2e192

remove unused Array.

179f8f8

Rachelint force-pushed the support-convert-to-state-for-adapter branch from fc2a0b5 to 179f8f8 Compare September 5, 2024 10:29

This was referenced Sep 9, 2024

DataFusion weekly project plan (Andrew Lamb) - Sep 9, 2024 #12391

Closed

DataFusion weekly project plan (Andrew Lamb) - Sep 2, 2024 #12336

Closed

alamb reviewed Sep 11, 2024

alamb mentioned this pull request Sep 11, 2024

Add "Extended Clickbench" benchmark for median and approx_median for high cardinality aggregates #12438

Merged

Rachelint mentioned this pull request Sep 13, 2024

Quick clickbench with a smaller dataset #12455

Open

alamb changed the title ~~Impl convert_to_state for GroupsAccumulatorAdapter.~~ Impl convert_to_state for GroupsAccumulatorAdapter (faster median for high cardinality aggregates) Sep 13, 2024

alamb approved these changes Sep 13, 2024

alamb merged commit f48e0b2 into apache:main Sep 15, 2024
26 checks passed

Rachelint deleted the support-convert-to-state-for-adapter branch September 16, 2024 08:00

alamb mentioned this pull request Sep 16, 2024

DataFusion weekly project plan (Andrew Lamb) - Sep 16, 2024 #12494

Open

8 tasks

This pull request was closed.
