Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert BitAnd, BitOr, BitXor to UDAF #10930

Merged

Conversation

dharanad
Copy link
Contributor

@dharanad dharanad commented Jun 15, 2024

Which issue does this PR close?

Closes #10907 and is part of #8708

Rationale for this change

What changes are included in this PR?

  • moved BitAnd, BitOr, BitXor to UDAF

Are these changes tested?

Running existing test cases

cargo build
cargo test
cargo test --test sqllogictests 

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jun 15, 2024
@dharanad dharanad changed the title remove bit and or xor from expr Convert BitAnd, BitOr, BitXor to UDAF Jun 15, 2024
@github-actions github-actions bot added the physical-expr Physical Expressions label Jun 15, 2024
@dharanad dharanad marked this pull request as ready for review June 16, 2024 16:31
…o-udaf

# Conflicts:
#	datafusion/functions-aggregate/src/lib.rs
#	datafusion/physical-expr/src/aggregate/build_in.rs
#	datafusion/proto/src/physical_plan/to_proto.rs
@dharanad
Copy link
Contributor Author

@jayzhan211 / @alamb can you please help me with a review here.

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, remember to add the test in roundtrip_expr_api in datafusion/proto/tests/cases/roundtrip_logical_plan.rs

datafusion/functions-aggregate/src/bit_and_or_xor.rs Outdated Show resolved Hide resolved
datafusion/functions-aggregate/src/bit_and_or_xor.rs Outdated Show resolved Hide resolved
@dharanad
Copy link
Contributor Author

cc: @jayzhan211 I have made the changes and also updated the UT.

@dharanad dharanad requested a review from jayzhan211 June 17, 2024 08:42
true
}

fn create_groups_accumulator(&self) -> Result<Box<dyn GroupsAccumulator>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think create_groups_accumulator is not supported in new UDAF, but we should

Copy link
Contributor Author

@dharanad dharanad Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Can you help me, how to do i test groups accumulator ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment @jayzhan211

It seems like the groups accumulator code was moved from the existing function here

https://github.com/apache/datafusion/pull/10930/files#diff-9b0874795d7dc847d4137426e205de0a2e3b0b56c03ad80d55b000aa020ca132L438-L462

Are you saying we shouldn't support the groups accumulator? Or that we should and it is missing from some places?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update -- NM I see now now that @dharanad added the groups accumulator code

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @dharanad -- this is so cool to see

true
}

fn create_groups_accumulator(&self) -> Result<Box<dyn GroupsAccumulator>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment @jayzhan211

It seems like the groups accumulator code was moved from the existing function here

https://github.com/apache/datafusion/pull/10930/files#diff-9b0874795d7dc847d4137426e205de0a2e3b0b56c03ad80d55b000aa020ca132L438-L462

Are you saying we shouldn't support the groups accumulator? Or that we should and it is missing from some places?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @dharanad -- this looks good to me.

Let's wait for @jayzhan211 to give it another review as well

@jayzhan211
Copy link
Contributor

👍 Thanks @dharanad and @alamb

@jayzhan211 jayzhan211 merged commit 2daadb7 into apache:main Jun 17, 2024
23 checks passed
@dharanad
Copy link
Contributor Author

Thank you very much @jayzhan211 and @alamb for your help and time.

@alamb
Copy link
Contributor

alamb commented Jun 17, 2024

Thank you for the great initial contribution @dharanad

@dharanad dharanad deleted the dharanad/convert-bit-and-or-xor-to-udaf branch June 17, 2024 15:46
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* remove bit and or xor from expr

* remove bit and or xor from physical expr and proto

* add proto regen changes

* impl BitAnd, BitOr, BitXor UADF

* add support for float

* removing support for float

* refactor helper macros

* clippy'fy

* simplify Bitwise operation

* add documentation

* formatting

* fix lint issue

* remove XorDistinct

* update roundtrip_expr_api test

* linting

* support groups accumulator
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this pull request Jul 24, 2024
The functions now take a single expression instead of a Vec<_>.

Ref: apache/datafusion#10930
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this pull request Jul 24, 2024
The functions now take a single expression instead of a Vec<_>.

Ref: apache/datafusion#10930
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this pull request Jul 25, 2024
The functions now take a single expression instead of a Vec<_>.

Ref: apache/datafusion#10930
andygrove pushed a commit to apache/datafusion-python that referenced this pull request Jul 31, 2024
* chore: update datafusion deps

* feat: impl ExecutionPlan::static_name() for DatasetExec

This required trait method was added upstream [0] and recommends to simply forward to `static_name`.

[0]: apache/datafusion#10266

* feat: update first_value and last_value wrappers.

Upstream signatures were changed for the new new `AggregateBuilder` api [0].

This simply gets the code to work. We should better incorporate that API into `datafusion-python`.

[0] apache/datafusion#10560

* migrate count to UDAF

Builtin Count was removed upstream.

TBD whether we want to re-implement `count_star` with new API.

Ref: apache/datafusion#10893

* migrate approx_percentile_cont, approx_distinct, and approx_median to UDAF

Ref: approx_distinct apache/datafusion#10851
Ref: approx_median apache/datafusion#10840
Ref: approx_percentile_cont and _with_weight apache/datafusion#10917

* migrate avg to UDAF

Ref: apache/datafusion#10964

* migrage corr to UDAF

Ref: apache/datafusion#10884

* migrate grouping to UDAF

Ref: apache/datafusion#10906

* add alias `mean` for UDAF `avg`

* migrate stddev to UDAF

Ref: apache/datafusion#10827

* remove rust alias for stddev

The python wrapper now provides stddev_samp alias.

* migrage var_pop to UDAF

Ref: apache/datafusion#10836

* migrate regr_* functions to UDAF

Ref: apache/datafusion#10898

* migrate bitwise functions to UDAF

The functions now take a single expression instead of a Vec<_>.

Ref: apache/datafusion#10930

* add missing variants for ScalarValue with todo

* fix typo in approx_percentile_cont

* add distinct arg to count

* comment out failing test

`approx_percentile_cont` is now returning a DoubleArray instead of an IntArray.

This may be a bug upstream; it requires further investigation.

* update tests to expect lowercase `sum` in query plans

This was changed upstream.

Ref: apache/datafusion#10831

* update ScalarType data_type map

* add docs dependency pickleshare

* re-implement count_star

* lint: ruff python lint

* lint: rust cargo fmt

* include name of window function in error for find_window_fn

* refactor `find_window_fn` for debug clarity

* search default aggregate functions by both name and aliases

The alias list no longer includes the name of the function.

Ref: apache/datafusion#10658

* fix markdown in find_window_fn docs

* parameterize test_window_functions

`first_value` and `last_value` are currently failing and marked as xfail.

* add test ids to test_simple_select tests marked xfail

* update find_window_fn to search built-ins first

The behavior of `first_value` and `last_value` UDAFs currently does not match the built-in behavior.
This allowed me to remove `marks=pytest.xfail` from the window tests.

* improve first_call and last_call use of the builder API

* remove trailing todos

* fix examples/substrait.py

* chore: remove explicit aliases from functions.rs

Ref: #779

* remove `array_fn!` aliases

* remove alias rules for `expr_fn_vec!`

* remove alias rules from `expr_fn!` macro

* remove unnecessary pyo3 var-arg signatures in functions.rs

* remove pyo3 signatures that provided defaults for first_value and last_value

* parametrize test_string_functions

* test regr_ function wrappers

Closes #778
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert BitAnd, BitOr, BitXor to UDAF
3 participants