[DF] Add support for filtered groupby aggregations #760

charlesbluca · 2022-09-13T19:00:28Z

Adds back support for filtered aggregations with FILTER (WHERE ...), which allows us to unxfail test_group_by_filtered and unblocks several related tests in #746 and #759.
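To show what `FILTER (WHERE ...)` means for a grouped aggregate, here is a minimal pure-Python sketch of the SQL semantics. It is not how dask-sql implements the feature, and the helper names `filtered_group_sum` and `filtered_group_count` are hypothetical. Rows that fail the predicate are ignored only for that aggregate, so every group still appears in the output. A group with no qualifying rows gets `None` (SQL NULL) from `SUM` and `0` from `COUNT(*)`.

```python
from typing import Any, Callable, Dict, Iterable, Mapping, Optional

Row = Mapping[str, Any]


def filtered_group_sum(
    rows: Iterable[Row], key: str, value: str, predicate: Callable[[Row], bool]
) -> Dict[Any, Optional[Any]]:
    """SUM(value) FILTER (WHERE predicate) ... GROUP BY key (hypothetical helper)."""
    sums: Dict[Any, Optional[Any]] = {}
    for row in rows:
        k = row[key]
        sums.setdefault(k, None)  # the group exists even if no row passes the filter
        v = row[value]
        if predicate(row) and v is not None:  # SUM also skips NULL inputs
            sums[k] = v if sums[k] is None else sums[k] + v
    return sums


def filtered_group_count(
    rows: Iterable[Row], key: str, predicate: Callable[[Row], bool]
) -> Dict[Any, int]:
    """COUNT(*) FILTER (WHERE predicate) ... GROUP BY key (hypothetical helper)."""
    counts: Dict[Any, int] = {}
    for row in rows:
        counts.setdefault(row[key], 0)  # an empty group counts as 0, not NULL
        if predicate(row):
            counts[row[key]] += 1
    return counts
```

With a plain `WHERE a > 1`, a group with no qualifying rows would vanish from the result entirely. With `FILTER`, it stays and only that aggregate sees the reduced input.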

charlesbluca · 2022-09-20T18:38:30Z

dask_sql/physical/rel/logical/aggregate.py

+                input_col = input_expr.column_name(input_rel)
+                if input_col in cc._frontend_backend_mapping:
+                    continue
+                random_name = new_temporary_column(df)


A potential issue with new_temporary_column that came to mind while working on this (I say potential because I'm unsure of the behavior of uuid.uuid4()):

Since we use the table's columns attribute to check that a random column name hasn't already been used, and we don't actually assign any of these random column names until several of them have been generated, it is technically possible (though rare) to assign multiple input / filter columns to the same random backend name, which would certainly cause issues.

Since assign calls are expensive and we ideally want to be adding all required backend columns in a single go, it might make sense to refactor new_temporary_column to instead look at some attribute of the DataContainer or ColumnContainer to check for duplicates, which are both cheaper to update on the fly.

Don't intend to block this PR, but could be worthwhile to open an issue / TODO to handle this down the line.
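Here is a minimal sketch of the suggested refactor, under the assumption that the set of names would eventually live on the DataContainer or ColumnContainer. The class `TemporaryColumnNamer` and the function `plan_backend_columns` are hypothetical and not part of dask-sql. Each name is reserved when it is issued, so names generated in one batch cannot collide with each other before any `assign` runs. The resulting mapping can then feed a single `assign` call.

```python
import uuid
from typing import Dict, Iterable, Set


class TemporaryColumnNamer:
    """Hypothetical helper: issues temporary column names that are unique
    against the table's existing columns AND every name issued so far."""

    def __init__(self, existing_columns: Iterable[str]):
        # One set holds existing and already-issued names for O(1) lookups
        self._taken: Set[str] = set(existing_columns)

    def new_name(self) -> str:
        while True:
            name = str(uuid.uuid4())
            if name not in self._taken:
                self._taken.add(name)  # reserve now, before any column is assigned
                return name


def plan_backend_columns(
    existing_columns: Iterable[str], required_inputs: Iterable[str]
) -> Dict[str, str]:
    """Map each required input column to a fresh backend name so all new
    columns could be added with one df.assign(**{...}) call."""
    namer = TemporaryColumnNamer(existing_columns)
    mapping: Dict[str, str] = {}
    for col in required_inputs:
        if col not in mapping:  # a repeated input reuses its backend name
            mapping[col] = namer.new_name()
    return mapping
```

Because reservation happens on the cheap in-memory set rather than on `df.columns`, the check stays correct even when the columns are only materialized later in one batch.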

Agreed that moving the check to dc or cc should make things significantly cheaper.
Based on the article here, the probability of a name collision is almost negligible, especially at the scale at which we generate new columns.
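The reviewer's estimate can be checked with the standard birthday-bound approximation. A version-4 UUID carries 122 random bits, because 6 bits encode the version and variant. This sketch assumes that `new_temporary_column` uses the full `str(uuid.uuid4())` value as the name. The function name `uuid4_collision_probability` is hypothetical.

```python
import math


def uuid4_collision_probability(n: int, random_bits: int = 122) -> float:
    """Approximate probability that at least two of n random values collide,
    drawn uniformly from 2**random_bits possibilities (birthday bound)."""
    space = 2.0 ** random_bits
    # P(collision) ~= 1 - exp(-n(n-1) / (2N)); expm1 keeps precision when tiny
    return -math.expm1(-n * (n - 1) / (2.0 * space))
```

Even for a million generated names the approximation gives roughly 1e-25, far below the number of temporary columns a single query creates.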


codecov-commenter · 2022-09-21T17:52:37Z

Codecov Report

❗ No coverage uploaded for pull request base (datafusion-sql-planner@528108c).
The diff coverage is n/a.

@@                    Coverage Diff                    @@
##             datafusion-sql-planner     #760   +/-   ##
=========================================================
  Coverage                          ?   75.55%           
=========================================================
  Files                             ?       73           
  Lines                             ?     3682           
  Branches                          ?      767           
=========================================================
  Hits                              ?     2782           
  Misses                            ?      766           
  Partials                          ?      134
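The 75.55% figure follows from the counts above. This sketch assumes Codecov computes hits / (hits + misses + partials) and rounds down to two decimals, which is its documented default `round: down` / `precision: 2`. The helper name `codecov_percentage` is hypothetical.

```python
from decimal import ROUND_DOWN, Decimal


def codecov_percentage(hits: int, misses: int, partials: int) -> Decimal:
    """Coverage percentage, assuming hits / (hits + misses + partials)
    truncated (rounded down) to two decimal places."""
    lines = hits + misses + partials  # every tracked line falls in one bucket
    ratio = Decimal(hits) * 100 / Decimal(lines)
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_DOWN)
```

With 2782 hits, 766 misses, and 134 partials, the total is 3682 lines and the exact ratio is about 75.5567%. Rounding down gives the reported 75.55% rather than 75.56%.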


Uncomment test_group_by_filtered

40b861c

charlesbluca requested review from ayushdg and galipremsagar as code owners September 13, 2022 19:00

sarahyurick mentioned this pull request Sep 14, 2022

Resolve test_stats_aggregation #746

Merged

galipremsagar added 2 commits September 19, 2022 10:15

Merge branch 'datafusion-sql-planner' into df-filter-where

c6f6e06

Merge branch 'datafusion-sql-planner' into df-filter-where

4de1390

charlesbluca marked this pull request as draft September 20, 2022 14:11

charlesbluca mentioned this pull request Sep 20, 2022

[DF] Support complex queries with multiple DISTINCT aggregates #759

Merged

charlesbluca added 2 commits September 20, 2022 09:40

Enable filtered aggs, implement get_filter_expr

646b9e8

Compute filter columns in _collect_aggregations

665fe84

charlesbluca commented Sep 20, 2022


charlesbluca added 2 commits September 20, 2022 12:27

Resolve test failures

a5224e8

Add back in aggregate function assertion

ea797f2

charlesbluca marked this pull request as ready for review September 20, 2022 19:39

charlesbluca requested a review from jdye64 as a code owner September 20, 2022 19:39

Merge remote-tracking branch 'origin/datafusion-sql-planner' into df-filter-where

cbeb7e3

galipremsagar approved these changes Sep 21, 2022


ayushdg merged commit 2af0b06 into dask-contrib:datafusion-sql-planner Sep 21, 2022

charlesbluca deleted the df-filter-where branch February 5, 2024 19:43

4 participants

[DF] Add support for filtered groupby aggregations #760

[DF] Add support for filtered groupby aggregations #760

Uh oh!

Conversation

charlesbluca commented Sep 13, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

charlesbluca Sep 20, 2022

Choose a reason for hiding this comment

Uh oh!

ayushdg Sep 20, 2022

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented Sep 21, 2022

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

charlesbluca commented Sep 13, 2022 •

edited

Loading