-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Demonstrate what a function package might look like -- encoding expressions #8046
Conversation
b3e25be
to
c441a0d
Compare
@@ -710,30 +704,6 @@ impl BuiltinScalarFunction { | |||
BuiltinScalarFunction::Digest => { | |||
utf8_or_binary_to_binary_type(&input_expr_types[0], "digest") | |||
} | |||
BuiltinScalarFunction::Encode => Ok(match input_expr_types[0] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This metadata information about the functions is now moved into functions/encoding.rs
module, along side its implementation
datafusion/expr/src/udf.rs
Outdated
} | ||
} | ||
|
||
/// Convenience trait for implementing ScalarUDF. See [`ScalarUDF::from_impl()`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This echo's the trait that @2010YOUY01 proposed in ) #7752, but does so in a way that is backwards compatible (makes a ScalarUDF out of the trait, to retain backwards compatibly)
|
||
pub(super) struct EncodeFunc {} | ||
|
||
static ENCODE_SIGNATURE: OnceLock<Signature> = OnceLock::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is what encode
and decode
look like using the ScalarUDF
API -- I think they are much clearer when all this type information is in one place (though I still kept it separate from the implementation to show the implementation did not change at all)
use std::collections::HashMap; | ||
use std::sync::Arc; | ||
|
||
/// Registers the `encode` and `decode` functions with the function registry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the conditional registration of these functions based on feature flag -- there are probably nicer ways to do this but I don't think it is any worse than the current solution.
datafusion/functions/src/lib.rs
Outdated
|
||
/// Registers all "built in" functions from this crate with the provided registry | ||
pub fn register_all(registry: &mut HashMap<String, Arc<ScalarUDF>>) { | ||
encoding::register(registry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I envision extending this list with other packages over time.
@2010YOUY01 and @viirya I wonder if you have any thoughts on this approach / proposal? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this looks great. I have several questions/suggestions:
-
to support the expr_fns encode and decode, I think we will need a Expr::ScalarFunction call or something that can take a function by name rather than fully resolved function
Now constructing an Expr
for built-in functions is stateless (does not require context), so it's tricky to be backwards compatible for Expr
API, the best solution I can think of is to also support initializing a UDF Expr
with only name string, and resolve them during logical plan optimization.
-
Extract registration functions from SessionContext into their own trait / consolidate the function registry code rather than passing
around a set of HahsMaps.... And make a way to actually modify them
It's a good idea to pack 3 HashMaps for scalar/aggr/window UDFs into a new struct like FunctionRegistry
👍🏼
datafusion/functions/src/lib.rs
Outdated
pub mod utils; | ||
|
||
/// Registers all "built in" functions from this crate with the provided registry | ||
pub fn register_all(registry: &mut HashMap<String, Arc<ScalarUDF>>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should support registering a single function here, there might be a use case that the user wants to override only one function from a function package
(possibly by changing this interface to something like
pub fn register_all() {
register_package(encoding::all_functions());
register_function(my_encoding::decode()); // override a method in default function package
}
datafusion/expr/src/udf.rs
Outdated
/// Returns this function's name | ||
pub fn name(&self) -> &str { | ||
&self.name | ||
} | ||
/// Returns this function's signature | ||
pub fn signature(&self) -> &Signature { | ||
&self.signature | ||
} | ||
/// return the return type of this function given the types of the arguments | ||
pub fn return_type(&self, args: &[DataType]) -> Result<DataType> { | ||
// Old API returns an Arc of the datatype for some reason | ||
let res = (self.return_type)(args)?; | ||
Ok(res.as_ref().clone()) | ||
} | ||
/// return the implementation of this function | ||
pub fn fun(&self) -> &ScalarFunctionImplementation { | ||
&self.fun | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that the case this set of interfaces is internal-faced for execution, we might extend it during separating function packages?
And trait FunctionImplementation
is the user-faced API for defining functions in separate crates
Yes, I agree this approach is the best I can come up with.
👍 Unfortunately that name is already taken :) Maybe |
@@ -34,6 +34,17 @@ pub trait FunctionRegistry { | |||
|
|||
/// Returns a reference to the udwf named `name`. | |||
fn udwf(&self, name: &str) -> Result<Arc<WindowUDF>>; | |||
|
|||
/// Registers a new `ScalarUDF`, returning any previously registered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a new proposed API -- to allow registering new scalar UDFs with a FunctionRegistry.
@@ -1228,30 +1229,4 @@ mod test { | |||
unreachable!(); | |||
} | |||
} | |||
|
|||
#[test] | |||
fn encode_function_definitions() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these tests add a lot -- they simply encode the signature again. This is also covered by actually calling encode()
via the expr API / dataframe tests which is done.
pub mod expr_fn { | ||
use super::*; | ||
/// Return encode(arg) | ||
pub fn encode(args: Vec<Expr>) -> Expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here are the new expr_fn implementations
Which issue does this PR close?
Builds on #8039
Demonstrates what #8045 might look like
Rationale for this change
This PR demonstrates what a function package API might look like by removing encoding expressions encode/decode from
BuiltInScalarFunction
enum and adding it in a separate crate (datafusion-functions
)What changes are included in this PR?
FunctionImplementation
trait and integration intoScalarUDF
to make it easier to write ScalarUDFs;datafusion-functions
crate that has the implementation ofencode
anddecode
.SessionState::new()
, similarly to the automatically registeredListingTable
sOpen Questions:
expr_fn
s encode and decode, I think we will need aExpr::ScalarFunction
call or something that can take a function by name rather than fully resolved functionaround a set of HahsMaps.... And make a way to actually modify them
Are these changes tested?
Are there any user-facing changes?