Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ScalarUDF: Remove supports_zero_argument and avoid creating null array for empty args #10205

Closed
jayzhan211 opened this issue Apr 24, 2024 · 0 comments · Fixed by #10193
Closed
Labels
enhancement New feature or request

Comments

@jayzhan211
Copy link
Contributor

jayzhan211 commented Apr 24, 2024

Is your feature request related to a problem or challenge?

File an issue to discuss about design in #10193, since it is no longer a minor change

Previously, we always provided a null array if the function supports zero args, like random(), pi(), make_array().
I think the additional null array should not provided for all the function that supports 0 args, instead handle case by case in each function. It turns out the null array is not useful either. random, pi, and uuid takes the number of rows instead of the actual null array.

We need to design an alternative way to communicate the number of rows to the function.

Proposal 1:
Add support_randomness -> bool to ScalarUDFImpl, and we provide the number of rows as the first argument for invoke.

Proposal 2:
We always provide the number of rows to invoke.

Change from

fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue>;

to

/// batch_rows is the number of rows in each batch
fn invoke(&self, args: &[ColumnarValue], batch_rows: usize) -> Result<ColumnarValue>;
fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
        let inputs = self
            .args
            .iter()
            .map(|e| e.evaluate(batch))
            .collect::<Result<Vec<_>>>()?;

        // evaluate the function
        match self.fun {
            ScalarFunctionDefinition::UDF(ref fun) => fun.invoke(&inputs, batch.num_rows()),
            ScalarFunctionDefinition::Name(_) => {
                internal_err!(
                    "Name function must be resolved to one of the other variants prior to physical planning"
                )
            }
        }
    }

It is more aggressive and breaks the signature, but batch is part of the evaluate(), provide information about batch and args to invoke() makes sense to me.

Proposol 3 from alamb
Introduce another function with batch rows only.

fn invoke_no_args(number_rows: usize) -> {
  not_yet_impl_err!
}

#10205 (comment)

We need support_randomness -> bool, so we know to switch to invoke_no_args.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

I decide to go for 3, since there are already over 100 invoke() used, not worth to break the signature

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
1 participant