[Epic] Port `BuiltInFunctions` to `datafusion-functions-*` crates #9285
Let me give this a try:
Update:
I'll try to do regex_expressions parts.
I'll try to do array_expressions parts.
I filed #9336 which I think may be necessary before we port functions.
work on
Hi @jayzhan211, I am already working on
Oh, sorry. I will work on other
Take ArrayHas, ArrayHasAll, ArrayHasAny |
Take Atan, Atan2, Acosh. |
Take ArrayPopFront, ArrayPopBack, ArrayDistinct, ArrayElement |
Take ArrayDims, ArrayNdims, Cardinality, ArrayNdims |
I think this needs to wait until
Take ArrayIntersect, ArrayUnion, ArrayExcept
Edit: general_set_op needs make_array too
Take: ArraySlice + ArrayElement
take Atan, Acosh, Asinh, Atanh, #9872
`datafusion` completed an Epic that ported many of the `BuiltInFunctions` enum to `ScalarUDF`. I created new macros to simplify the port, and used these macros to refactor a few existing functions. Ref: apache/datafusion#9285
* deps: upgrade datafusion to 37.1.0
* feat: re-implement SessionContext::tables
  The method was removed upstream but is used in many tests for `datafusion-python`. Ref: apache/datafusion#9627
* feat: upgrade dataframe write_parquet and write_json
  The options to write_parquet changed. write_json has a new argument that I defaulted to None. We can expose that config later. Ref: apache/datafusion#9382
* feat: impl new ExecutionPlanProperties for DatasetExec
  Ref: apache/datafusion#9346
* feat: add upstream variant and method params
  - `WindowFunction` and `AggregateFunction` have `null_treatment` options.
  - `ScalarValue` and `DataType` have new variants
  - `SchemaProvider::table` now returns a `Result`
* lint: allow(deprecated) for make_scalar_function
* feat: migrate functions.rs
  `datafusion` completed an Epic that ported many of the `BuiltInFunctions` enum to `ScalarUDF`. I created new macros to simplify the port, and used these macros to refactor a few existing functions. Ref: apache/datafusion#9285
* fixme: commented out last failing test
  This is a bug upstream in datafusion
  FAILED datafusion/tests/test_functions.py::test_array_functions - pyo3_runtime.PanicException: range end index 9 out of range for slice of length 8
* chore: update Cargo.toml package info
Is your feature request related to a problem or challenge?
As part of making DataFusion even more customizable (#8045), it is valuable to let system designers mix and match different packages of functions to get the precise behavior they want (e.g. postgres style `to_date` or spark style `to_date`).

To support this functionality as well as to ensure the `ScalarUDF` API exposes the full power of DataFusion, we are in the process of extracting the "built in" functions out of the core and into separate crates.

This epic tracks the work to actually move the functions out of the core datafusion crate (spread through `datafusion_expr` and `datafusion-physical-expr`) and into the new `datafusion-functions` / `datafusion-functions-array` crates.

Tasks:
Here is a list of many of the items necessary to complete this transition. Eventually there should be tickets for all tasks, and there are tickets for some already, but I don't want to make 100s of tickets all at once. I plan to make more as we make it through more of this project.
Anyone should feel free to make other tickets if they want to help with items below.
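
To make the "mix and match packages of functions" goal concrete, here is a minimal sketch of the idea in plain Rust. Everything in it is hypothetical and for illustration only: `FunctionRegistry`, `ScalarFn`, and the "strict" and "lenient" `to_date` variants are invented names, not the DataFusion `ScalarUDF` API, and the two date behaviors are toy stand-ins rather than the actual postgres or spark semantics.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a scalar function: string arguments in,
/// string result or error message out. Not the DataFusion `ScalarUDF` type.
type ScalarFn = fn(&[&str]) -> Result<String, String>;

/// Hypothetical registry mapping function names to implementations.
#[derive(Default)]
struct FunctionRegistry {
    functions: HashMap<String, ScalarFn>,
}

impl FunctionRegistry {
    /// Register every function in a package. A later package replaces an
    /// earlier function that uses the same name.
    fn register_package(&mut self, package: &[(&str, ScalarFn)]) {
        for (name, f) in package {
            self.functions.insert(name.to_string(), *f);
        }
    }

    /// Look a function up by name and invoke it.
    fn call(&self, name: &str, args: &[&str]) -> Result<String, String> {
        let f = self
            .functions
            .get(name)
            .ok_or_else(|| format!("unknown function: {}", name))?;
        f(args)
    }
}

/// True if `s` has the shape YYYY-MM-DD (digits and dashes only; no
/// calendar validation).
fn is_iso_date(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 10
        && b.iter().enumerate().all(|(i, c)| {
            if i == 4 || i == 7 { *c == b'-' } else { c.is_ascii_digit() }
        })
}

/// Toy "strict" variant: accepts only YYYY-MM-DD.
fn to_date_strict(args: &[&str]) -> Result<String, String> {
    match args {
        [s] if is_iso_date(s) => Ok(s.to_string()),
        _ => Err("to_date: expected one YYYY-MM-DD argument".to_string()),
    }
}

/// Toy "lenient" variant: also accepts YYYY/MM/DD by normalizing slashes.
fn to_date_lenient(args: &[&str]) -> Result<String, String> {
    match args {
        [s] => to_date_strict(&[s.replace('/', "-").as_str()]),
        _ => Err("to_date: expected one argument".to_string()),
    }
}

/// A function shared by every configuration.
fn upper(args: &[&str]) -> Result<String, String> {
    match args {
        [s] => Ok(s.to_uppercase()),
        _ => Err("upper: expected one argument".to_string()),
    }
}

fn core_package() -> Vec<(&'static str, ScalarFn)> {
    vec![("upper", upper as ScalarFn)]
}

fn strict_dates_package() -> Vec<(&'static str, ScalarFn)> {
    vec![("to_date", to_date_strict as ScalarFn)]
}

fn lenient_dates_package() -> Vec<(&'static str, ScalarFn)> {
    vec![("to_date", to_date_lenient as ScalarFn)]
}

fn main() {
    // A system designer picks the packages that give the behavior they want.
    let mut registry = FunctionRegistry::default();
    registry.register_package(&core_package());
    registry.register_package(&lenient_dates_package());
    println!("{:?}", registry.call("to_date", &["2024/03/01"]));
}
```

The point of the sketch is only that once functions live outside the core and are looked up by name, swapping one package for another changes the behavior of `to_date` without touching anything else.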
math_expressions
These should be located in the `datafusion-functions` crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/math/mod.rs
* Move `nullif` and `isnan` to datafusion-functions #9216
* `abs` to `datafusion_functions` #9286
* `ceil`, `exp`, `factorial` to `datafusion-functions` crate #9939

array_expressions
Note that given their size and specialization, these functions are put in their own subcrate, `datafusion-functions-array`.
* `datafusion-functions-array` crate and move `ArrayToString` function into it #9113
* `functions-array` #9496
* `make_array` to datafusion-functions #9288
* move make_array array_append array_prepend array_concat function to datafusion-functions-array crate #9504
* `StringToArray` to `function-arrays` #9497
* `ArraySort` to `function-arrays` subcrate #9551
* `ArrayDistinct` to `functions-array` subcrate #9549
* `ArrayRepeat` to `functions-array` subcrate #9565
* `ArrayResize` to `functions-array` subcrate #9570
* `functions-array` #9615
* `ArrayPosition` and `ArrayPositions` to `functions-array` subcrate #9617
* `array_reverse` function to datafusion-function-* crate #9630
* `functions-array` #9629
* `ArrayExcept` to `functions-array` subcrate #9634
* `ArrayRemove`, `ArrayRemoveN`, `ArrayRemoveAll` to `functions-array` subcrate #9635
* `MakeArray`: construct an array from columns (union/except depends on this)
* `datafusion_array_function` specific rewrite rules like to `datafusion_functions_array` crate #9519

Core functions
These should be located in the `datafusion-functions` crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/core/mod.rs
* `core` module, extract `nullif`: Move `nullif` and `isnan` to datafusion-functions #9216
* `arrow_cast` to `datafusion-functions` crate #9287
* `ArrowTypeOf`: return the arrow type of a value. Port `arrow_typeof` to datafusion-function #9524
* `Coalesce`: return the first non-null value
* `Struct`: Create a struct
* `NullIf`: return null if the two values are equal
* `Random`: return a random number
* `Nanvl`: return the first non-NaN value

crypto_expressions
These should be located in the `datafusion-functions` crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/crypto/mod.rs
* `crypto` module in `datafusion/functions/src/crypto` and `crypto_expressions` feature flag, move `digest` function

string_expressions
These should be located in the `datafusion-functions` crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/string/mod.rs
* `string` module in `datafusion/functions/src/string` and `string_expressions` feature flag, move `ascii` function

unicode_expressions
These should be located in the `datafusion-functions` crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/unicode/mod.rs
* `unicode` module in `datafusion/functions/src/unicode` and `unicode_expressions` feature flag, move `charlength` function

regex_expressions
These should be located in the `datafusion-functions` crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/regexp/mod.rs

datetime_expressions
These should be located in the `datafusion-functions` crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/datetime/mod.rs
* `datetime` module in `datafusion/functions/src/datetime` and `datetime_expressions` feature flag, move `date_part`

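
Several of the tasks above pair a new module with a feature flag (for example the `datetime` module and the `datetime_expressions` flag). As a rough sketch of what that pairing could look like in Rust, the snippet below gates a module behind a Cargo feature. The module bodies, `function_names`, and `all_function_names` are placeholders invented for this illustration and are not taken from the DataFusion source; only the feature name and the `nullif` / `date_part` function names come from this issue.

```rust
// Sketch only. In a real crate the flag would be declared in Cargo.toml:
//
//   [features]
//   datetime_expressions = []
//
// and enabled with `cargo build --features datetime_expressions`.

/// Always compiled: placeholder for functions that every build needs.
mod core_functions {
    pub fn function_names() -> Vec<&'static str> {
        vec!["nullif"]
    }
}

/// Compiled only when the `datetime_expressions` feature is enabled.
#[cfg(feature = "datetime_expressions")]
mod datetime {
    pub fn function_names() -> Vec<&'static str> {
        vec!["date_part"]
    }
}

/// Collect the names of all functions compiled into this build.
fn all_function_names() -> Vec<&'static str> {
    let mut names = Vec::new();
    names.extend(core_functions::function_names());
    // This statement disappears entirely when the feature is off.
    #[cfg(feature = "datetime_expressions")]
    names.extend(datetime::function_names());
    names
}

fn main() {
    for name in all_function_names() {
        println!("{}", name);
    }
}
```

With this layout, a build that does not need date handling never compiles the gated module, so the resulting binary carries only the function groups that were requested.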
Infrastructure
Describe alternatives you've considered
No response
Additional context
The organization was discussed in #9100