Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Built-in sum and Rename to lowercase sum #10831

Merged
merged 6 commits into from
Jun 8, 2024

Conversation

jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented Jun 8, 2024

Which issue does this PR close?

Close #10731 .
Part of #10695

Rationale for this change

What changes are included in this PR?

Since we need stub fo test in datafusion::expr, so I move it to expr crate

Although the change looks scary, but most of them are changes to lowercase name

Are these changes tested?

Are there any user-facing changes?

Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate labels Jun 8, 2024
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) substrait labels Jun 8, 2024
@@ -621,17 +592,6 @@ mod tests {
assert_eq!(*input_type, result.unwrap());
}
}
// test sum
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need coercion test, because they are handled in slt query already.

@@ -54,6 +54,7 @@ strum = { version = "0.26.1", features = ["derive"] }
[dev-dependencies]
ctor = { workspace = true }
datafusion-functions = { workspace = true, default-features = true }
datafusion-functions-aggregate = { workspace = true }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imported for test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ok.

We have had bad experiences in the past with making semi-circular dependencies for tests (as I recall it messes up the release process to crates.io. THough maybe this one is ok, because datafusion-functions-aggregate doesn't depend on datafusion-sql either directly or transitively

If we are going to go with this approach, perhaps we could remove the stubs added in #10816 🤔

@@ -43,6 +43,7 @@ prost = "0.12"
substrait = { version = "0.34.0", features = ["serde"] }

[dev-dependencies]
datafusion-functions-aggregate = { workspace = true }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imported for test

Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 marked this pull request as ready for review June 8, 2024 10:41
@jayzhan211 jayzhan211 changed the title Remove builtin Sum Remove Built-in sum and Rename to lowercase sum Jun 8, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- looks great to me

@@ -54,6 +54,7 @@ strum = { version = "0.26.1", features = ["derive"] }
[dev-dependencies]
ctor = { workspace = true }
datafusion-functions = { workspace = true, default-features = true }
datafusion-functions-aggregate = { workspace = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ok.

We have had bad experiences in the past with making semi-circular dependencies for tests (as I recall it messes up the release process to crates.io. THough maybe this one is ok, because datafusion-functions-aggregate doesn't depend on datafusion-sql either directly or transitively

If we are going to go with this approach, perhaps we could remove the stubs added in #10816 🤔

}),
r#"SUM(a)"#,
),
(sum(col("a")), r#"sum(a)"#),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

@jayzhan211
Copy link
Contributor Author

If we are going to go with this approach, perhaps we could remove the stubs added in #10816 🤔

It is nice if we don't need stubs 👍

@jayzhan211
Copy link
Contributor Author

I found it is not too trivial and need review, so I will merge this one first and file another PR for removing stubs

@jayzhan211
Copy link
Contributor Author

Thanks @alamb

@jayzhan211 jayzhan211 merged commit 6b70214 into apache:main Jun 8, 2024
27 checks passed
@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Jun 8, 2024

It seems I cant remove stub for test in datafusion/expr/src/utils.rs because we can't import datafusion-aggregate (which depends on datafusion-expr) in dev-dependency. We should move the test to other crate before removing stub.

@jayzhan211 jayzhan211 mentioned this pull request Jun 11, 2024
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* rm sum

Signed-off-by: jayzhan211 <[email protected]>

* mv stub to df:expr

Signed-off-by: jayzhan211 <[email protected]>

* fix sql example

Signed-off-by: jayzhan211 <[email protected]>

* lowercase in slt

Signed-off-by: jayzhan211 <[email protected]>

* rename to lowercase

Signed-off-by: jayzhan211 <[email protected]>

* rename stl in tpch

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this pull request Jul 24, 2024
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this pull request Jul 25, 2024
andygrove pushed a commit to apache/datafusion-python that referenced this pull request Jul 31, 2024
* chore: update datafusion deps

* feat: impl ExecutionPlan::static_name() for DatasetExec

This required trait method was added upstream [0] and recommends to simply forward to `static_name`.

[0]: apache/datafusion#10266

* feat: update first_value and last_value wrappers.

Upstream signatures were changed for the new new `AggregateBuilder` api [0].

This simply gets the code to work. We should better incorporate that API into `datafusion-python`.

[0] apache/datafusion#10560

* migrate count to UDAF

Builtin Count was removed upstream.

TBD whether we want to re-implement `count_star` with new API.

Ref: apache/datafusion#10893

* migrate approx_percentile_cont, approx_distinct, and approx_median to UDAF

Ref: approx_distinct apache/datafusion#10851
Ref: approx_median apache/datafusion#10840
Ref: approx_percentile_cont and _with_weight apache/datafusion#10917

* migrate avg to UDAF

Ref: apache/datafusion#10964

* migrage corr to UDAF

Ref: apache/datafusion#10884

* migrate grouping to UDAF

Ref: apache/datafusion#10906

* add alias `mean` for UDAF `avg`

* migrate stddev to UDAF

Ref: apache/datafusion#10827

* remove rust alias for stddev

The python wrapper now provides stddev_samp alias.

* migrage var_pop to UDAF

Ref: apache/datafusion#10836

* migrate regr_* functions to UDAF

Ref: apache/datafusion#10898

* migrate bitwise functions to UDAF

The functions now take a single expression instead of a Vec<_>.

Ref: apache/datafusion#10930

* add missing variants for ScalarValue with todo

* fix typo in approx_percentile_cont

* add distinct arg to count

* comment out failing test

`approx_percentile_cont` is now returning a DoubleArray instead of an IntArray.

This may be a bug upstream; it requires further investigation.

* update tests to expect lowercase `sum` in query plans

This was changed upstream.

Ref: apache/datafusion#10831

* update ScalarType data_type map

* add docs dependency pickleshare

* re-implement count_star

* lint: ruff python lint

* lint: rust cargo fmt

* include name of window function in error for find_window_fn

* refactor `find_window_fn` for debug clarity

* search default aggregate functions by both name and aliases

The alias list no longer includes the name of the function.

Ref: apache/datafusion#10658

* fix markdown in find_window_fn docs

* parameterize test_window_functions

`first_value` and `last_value` are currently failing and marked as xfail.

* add test ids to test_simple_select tests marked xfail

* update find_window_fn to search built-ins first

The behavior of `first_value` and `last_value` UDAFs currently does not match the built-in behavior.
This allowed me to remove `marks=pytest.xfail` from the window tests.

* improve first_call and last_call use of the builder API

* remove trailing todos

* fix examples/substrait.py

* chore: remove explicit aliases from functions.rs

Ref: #779

* remove `array_fn!` aliases

* remove alias rules for `expr_fn_vec!`

* remove alias rules from `expr_fn!` macro

* remove unnecessary pyo3 var-arg signatures in functions.rs

* remove pyo3 signatures that provided defaults for first_value and last_value

* parametrize test_string_functions

* test regr_ function wrappers

Closes #778
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt) substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert builtin Sum aggregate function to UDAF
2 participants