Support `Expr` creation for `ScalarUDF`: Resolve function calls by name during planning #8157
Comments
Thanks @2010YOUY01 !
What if we changed how `ScalarFunction` is represented in `Expr`?

    enum Expr {
        ...
        /// Represents the call of a built-in, or UDF scalar function with a set of arguments.
        ScalarFunction(ScalarFunction),
        ...
    }

Instead of

    #[derive(Clone, PartialEq, Eq, Hash, Debug)]
    pub struct ScalarFunction {
        /// The function
        pub fun: built_in_function::BuiltinScalarFunction,
        /// List of expressions to feed to the functions as arguments
        pub args: Vec<Expr>,
    }

make it look like

    pub enum ScalarFunctionDefinition {
        /// Resolved to a built in scalar function
        /// (will be removed long term)
        BuiltIn(built_in_function::BuiltinScalarFunction),
        /// Resolved to a user defined function
        UDF(ScalarUDF),
        /// A scalar function that will be called by name
        Name(Arc<str>),
    }

    #[derive(Clone, PartialEq, Eq, Hash, Debug)]
    pub struct ScalarFunction {
        /// The function
        pub fun: ScalarFunctionDefinition,
        /// List of expressions to feed to the functions as arguments
        pub args: Vec<Expr>,
    }

And that way an expr function could look like

    fn abs(arg: Expr) -> Expr {
        // `Expr::ScalarFunction` is a tuple variant wrapping the `ScalarFunction` struct
        Expr::ScalarFunction(ScalarFunction {
            fun: ScalarFunctionDefinition::Name("abs".into()),
            args: vec![arg],
        })
    }

I am not sure how large of a change this would be -- we would have to try it and see what it looked like.
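To make the shape of this proposal concrete, here is a minimal, self-contained sketch. It does not use the real DataFusion crates: `BuiltinScalarFunction`, `ScalarUDF`, `Expr`, and `col` below are simplified stand-ins written only for illustration, and the real types carry far more information.

```rust
use std::sync::Arc;

// Toy stand-ins for the DataFusion types (assumptions for illustration only).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum BuiltinScalarFunction {
    Abs,
    Sin,
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ScalarUDF {
    pub name: String,
}

// The three states a function reference can be in, as proposed above.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum ScalarFunctionDefinition {
    BuiltIn(BuiltinScalarFunction),
    UDF(ScalarUDF),
    Name(Arc<str>),
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ScalarFunction {
    pub fun: ScalarFunctionDefinition,
    pub args: Vec<Expr>,
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Expr {
    Column(String),
    ScalarFunction(ScalarFunction),
}

/// Builds a column reference (toy version of the `col` helper).
pub fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

/// Stateless constructor: records only the function name, so no
/// function registry is needed at expression-creation time.
pub fn abs(arg: Expr) -> Expr {
    Expr::ScalarFunction(ScalarFunction {
        fun: ScalarFunctionDefinition::Name("abs".into()),
        args: vec![arg],
    })
}
```

With these definitions, `abs(col("x"))` produces an expression that carries `Name("abs")` and would still need to be resolved to a `BuiltIn` or `UDF` variant before execution.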
I think passing a
@2010YOUY01 I wonder if you still plan to work on this item? If not, no worries, I can do it, but I wanted to check with you before doing so.
Please feel free to proceed if you would like it to get done sooner, thank you @alamb 🙏🏼
@alamb I checked out main, and the structs and enums are already written as you recommended:

    pub enum ScalarFunctionDefinition {
        /// Resolved to a built in scalar function
        /// (will be removed long term)
        BuiltIn(built_in_function::BuiltinScalarFunction),
        /// Resolved to a user defined function
        UDF(ScalarUDF),
        /// A scalar function that will be called by name
        Name(Arc<str>),
    }

    #[derive(Clone, PartialEq, Eq, Hash, Debug)]
    pub struct ScalarFunction {
        /// The function
        pub fun: ScalarFunctionDefinition,
        /// List of expressions to feed to the functions as arguments
        pub args: Vec<Expr>,
    }

So I am not sure I understood correctly. I could see that `call_fn` currently looks like this:

    /// Calls a named built in function
    /// ```
    /// use datafusion_expr::{col, lit, call_fn};
    ///
    /// // create the expression sin(x) < 0.2
    /// let expr = call_fn("sin", vec![col("x")]).unwrap().lt(lit(0.2));
    /// ```
    pub fn call_fn(name: impl AsRef<str>, args: Vec<Expr>) -> Result<Expr> {
        match name.as_ref().parse::<BuiltinScalarFunction>() {
            Ok(fun) => Ok(Expr::ScalarFunction(ScalarFunction::new(fun, args))),
            Err(e) => Err(e),
        }
    }
I think the idea would be to make it something like this:

    /// Calls a named function
    /// ```
    /// use datafusion_expr::{col, lit, call_fn};
    ///
    /// // create the expression sin(x) < 0.2
    /// let expr = call_fn("sin", vec![col("x")]).unwrap().lt(lit(0.2));
    /// ```
    pub fn call_fn(name: impl AsRef<str>, args: Vec<Expr>) -> Result<Expr> {
        // Only record the name here; it is resolved to a concrete function later
        Ok(Expr::ScalarFunction(ScalarFunction {
            fun: ScalarFunctionDefinition::Name(Arc::from(name.as_ref())),
            args,
        }))
    }

And then add code to the optimizer that replaces instances of `ScalarFunctionDefinition::Name` with either `ScalarFunctionDefinition::BuiltIn` or `ScalarFunctionDefinition::UDF`.
Perhaps you can use the same logic here: https://github.com/apache/arrow-datafusion/blob/d9d8ddd5f770817f325190c4c0cc02436e7777e6/datafusion/sql/src/expr/function.rs#L66-L76
BTW the point of doing this is so we can remove `BuiltinScalarFunction` incrementally, and eventually treat all functions the same.
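The replacement step described above could be sketched roughly as follows. This is not DataFusion code: the types are toy stand-ins, `Registry` and `resolve` are hypothetical names, and checking the UDF registry before the built-ins is an assumption about precedence rather than something confirmed in this thread.

```rust
use std::collections::HashMap;
use std::str::FromStr;
use std::sync::Arc;

// Toy stand-ins for the DataFusion types (assumptions for illustration only).
#[derive(Clone, Debug, PartialEq)]
pub enum BuiltinScalarFunction {
    Abs,
    Sin,
}

impl FromStr for BuiltinScalarFunction {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, String> {
        match s {
            "abs" => Ok(Self::Abs),
            "sin" => Ok(Self::Sin),
            other => Err(format!("no function named '{other}'")),
        }
    }
}

#[derive(Clone, Debug, PartialEq)]
pub struct ScalarUDF {
    pub name: String,
}

#[derive(Clone, Debug, PartialEq)]
pub enum ScalarFunctionDefinition {
    BuiltIn(BuiltinScalarFunction),
    UDF(Arc<ScalarUDF>),
    Name(Arc<str>),
}

#[derive(Clone, Debug, PartialEq)]
pub struct ScalarFunction {
    pub fun: ScalarFunctionDefinition,
    pub args: Vec<Expr>,
}

#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    Column(String),
    ScalarFunction(ScalarFunction),
}

/// Hypothetical registry of user-defined functions, keyed by name.
pub type Registry = HashMap<String, Arc<ScalarUDF>>;

/// Recursively replaces every `Name` definition with `UDF` or `BuiltIn`.
/// Returns an error if a name matches neither.
pub fn resolve(expr: Expr, registry: &Registry) -> Result<Expr, String> {
    match expr {
        Expr::ScalarFunction(ScalarFunction { fun, args }) => {
            // Resolve nested calls in the arguments first.
            let args = args
                .into_iter()
                .map(|arg| resolve(arg, registry))
                .collect::<Result<Vec<_>, _>>()?;
            let fun = match fun {
                ScalarFunctionDefinition::Name(name) => match registry.get(&*name) {
                    // Assumed precedence: a registered UDF wins over a built-in.
                    Some(udf) => ScalarFunctionDefinition::UDF(Arc::clone(udf)),
                    None => ScalarFunctionDefinition::BuiltIn(
                        name.parse::<BuiltinScalarFunction>()?,
                    ),
                },
                // Already resolved: leave untouched.
                resolved => resolved,
            };
            Ok(Expr::ScalarFunction(ScalarFunction { fun, args }))
        }
        other => Ok(other),
    }
}
```

The key property is that the pass needs access to the registry, which is exactly the state that is unavailable when the expression is first created.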
Update: @edmondop tried to implement what is suggested in this issue (❤️) and we got a bit hung up on keeping the expr_api backwards compatible. See #8447 (comment). I took another shot at trying to implement
Therefore, I think we might be able to close this issue as "won't fix". What do you think, @edmondop and @2010YOUY01? If we do that, we should consider what to do with the existing `Name` function definition.
How does this affect the effort of having a single API for scalar functions, removing the separation between BuiltIn and UDF?
I think it offers a way to still have a single API for scalar functions without having to do String --> ScalarUDF resolution as an analysis pass. It also has the nice property that the
Is your feature request related to a problem or challenge?
Motivation
There is ongoing work migrating `BuiltinScalarFunction`s -> `ScalarUDF` (#8045), and we noticed one `Expr`-related issue during prototyping:
We can use the `Expr` API to create builtin scalar functions directly:
We want to still use this API after functions are migrated to a `ScalarUDF`-based implementation. It's not possible now because those `Expr` creations are stateless, and `ScalarUDF`s are registered inside `SessionState`. It's only possible if we do
Describe the solution you'd like
To continue supporting the original stateless `Expr` creation API, we can create an `Expr` with only the string function name, and resolve the function during logical optimization.
Then `call_fn("abs", vec![lit(-1)])` can be supported (currently it only supports `BuiltinScalarFunction` and does not support UDFs).
Another macro-based API, `abs(lit(-1))`, can be supported if we hard code all possible function names within the core (should we do that?)
Potential implementation is:
1. For `let expr2 = call_fn("abs", vec![lit(-1)]);`, create a `ScalarUDF` expression with a dummy implementation.
2. Add an `AnalyzerRule` (a mandatory logical optimizer rule) to resolve this UDF name using external functions registered in `SessionState`.
Issue:
Currently the function implementations live inside `SessionState` but outside `SessionConfig`, and the analyzer can only access `SessionConfig`.
We have to move the function registry into `SessionConfig` first (or is there any better way?)
cc @alamb I wonder if you have any thoughts on this approach 🤔
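The "dummy implementation" idea in the potential implementation above might look something like the sketch below. Everything here is a simplified assumption: `ScalarFn`, the toy `ScalarUDF` and `Expr`, `resolve_udfs` (standing in for the analyzer rule), and `evaluate` are hypothetical and do not mirror the real DataFusion signatures.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Toy scalar function signature (an assumption; real UDFs operate on arrays).
pub type ScalarFn = Arc<dyn Fn(&[f64]) -> Result<f64, String> + Send + Sync>;

#[derive(Clone)]
pub struct ScalarUDF {
    pub name: String,
    pub fun: ScalarFn,
}

#[derive(Clone)]
pub enum Expr {
    Literal(f64),
    ScalarUDF { fun: ScalarUDF, args: Vec<Expr> },
}

pub fn lit(value: f64) -> Expr {
    Expr::Literal(value)
}

/// Step 1: stateless creation. The UDF carries only its name plus a
/// placeholder body that fails if it is ever invoked unresolved.
pub fn call_fn(name: &str, args: Vec<Expr>) -> Expr {
    let owned = name.to_string();
    let placeholder: ScalarFn = Arc::new(move |_args: &[f64]| -> Result<f64, String> {
        Err(format!("function '{owned}' was never resolved"))
    });
    Expr::ScalarUDF {
        fun: ScalarUDF { name: name.to_string(), fun: placeholder },
        args,
    }
}

/// Step 2: stand-in for the analyzer rule. Swaps each placeholder for the
/// implementation registered under the same name.
pub fn resolve_udfs(
    expr: Expr,
    registry: &HashMap<String, ScalarUDF>,
) -> Result<Expr, String> {
    match expr {
        Expr::ScalarUDF { fun, args } => {
            let args = args
                .into_iter()
                .map(|arg| resolve_udfs(arg, registry))
                .collect::<Result<Vec<_>, _>>()?;
            let fun = registry
                .get(&fun.name)
                .cloned()
                .ok_or_else(|| format!("unknown function '{}'", fun.name))?;
            Ok(Expr::ScalarUDF { fun, args })
        }
        other => Ok(other),
    }
}

/// Evaluates an expression tree; used here only to show the effect of resolution.
pub fn evaluate(expr: &Expr) -> Result<f64, String> {
    match expr {
        Expr::Literal(value) => Ok(*value),
        Expr::ScalarUDF { fun, args } => {
            let values = args.iter().map(evaluate).collect::<Result<Vec<_>, _>>()?;
            (fun.fun)(&values)
        }
    }
}
```

In this sketch, evaluating `call_fn("abs", vec![lit(-1.0)])` before resolution returns an error, while the same expression evaluates normally once `resolve_udfs` has been run against a registry that contains `abs`. The sketch passes the registry in explicitly, which sidesteps the `SessionState` versus `SessionConfig` question raised above rather than answering it.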
Describe alternatives you've considered
No response
Additional context
No response