Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support computing ScalarValue return types from argument values (e.g. constants) rather than just types #8624

Closed
alamb opened this issue Dec 22, 2023 · 4 comments · Fixed by #8985

Comments

@alamb
Copy link
Contributor

alamb commented Dec 22, 2023

Def one UDF:

my_udf(expr, 'return_type')

Using:

case1:

my_duf(col, 'UInt32')

Return Type is:
DataType::UInt32

case2:

my_duf(col, 'UInt64')

Return Type is:
DataType::UInt64

At present, the above udf cannot be realized through the ReturnTypeFunction. This is because the ReturnTypeFunction provides only the data types of the input parameters.

As we discussed on #7657 (comment) arrow_cast does this but it is handled specially. This ticket tracks adding UDF support for it

Originally posted by @JasonLi-cn in #7657

@alamb
Copy link
Contributor Author

alamb commented Jan 1, 2024

I think we can probably work on this now that #8568 is merged

@yyy1000
Copy link
Contributor

yyy1000 commented Jan 24, 2024

Since my first PR is close to be merged. I'd like to work on this as a next. 😀

@alamb
Copy link
Contributor Author

alamb commented Jan 24, 2024

Thanks @yyy1000 -- FYI aware this is likely to be an exercise in careful API design

Perhaps you could look into extending ScalarFunctionImpl https://github.com/apache/arrow-datafusion/blob/bc0ba6a724aaaf312b770451718cbce696de6640/datafusion/expr/src/udf.rs#L237 in backwards compatible way

For example:

trait ScalarUDFImpl { 
...

    /// What [`DataType`] will be returned by this function, given the types of
    /// the expr arguments
    fn return_type_from_exprs(&self, arg_exprs: &[Expr], schema: &SchemaRef) -> Result<DataType> {
     // provide default implementation that calls `self.return_type()`
     // so that people don't have to implement `return_type_from_exprs` if they dont want to
    }
 ...
}

An exellent test of the new API would be to see if you can implement arrow_cast as a ScalarUDF we can put it into datafusion-functions after #8705 is merged

@yyy1000
Copy link
Contributor

yyy1000 commented Jan 24, 2024

I created a PR in #8985 that wants to fix this. ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants