ScalarUDF: Remove `supports_zero_argument` and avoid creating null array for empty args #10205

jayzhan211 · 2024-04-24T00:41:41Z

Is your feature request related to a problem or challenge?

Filing this issue to discuss the design in #10193, since it is no longer a minor change.

Previously, we always provided a null array if the function supported zero args, such as random(), pi(), and make_array().
I think the additional null array should not be provided for every function that supports zero args; instead, each function should handle this case by case. It turns out the null array is not useful either: random, pi, and uuid use only the number of rows, not the actual null array.
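
To make the problem concrete, here is a minimal, self-contained sketch. `ColumnarValue` below is a simplified stand-in for the DataFusion type of the same name (reduced to `f64` values), and `pi_with_placeholder` is an illustrative name, not DataFusion code. It only shows that the placeholder null array is read for nothing but its length:

```rust
/// Simplified stand-in for DataFusion's `ColumnarValue` (assumption: f64 only).
#[derive(Debug, Clone, PartialEq)]
enum ColumnarValue {
    Array(Vec<Option<f64>>),
    Scalar(Option<f64>),
}

/// Old-style zero-argument function (hypothetical name): the caller passes a
/// placeholder array of nulls, and the function uses only its length.
fn pi_with_placeholder(args: &[ColumnarValue]) -> Result<ColumnarValue, String> {
    match args {
        [ColumnarValue::Array(placeholder)] => {
            // The null values themselves are never inspected.
            let num_rows = placeholder.len();
            Ok(ColumnarValue::Array(vec![Some(std::f64::consts::PI); num_rows]))
        }
        _ => Err("expected exactly one placeholder array".to_string()),
    }
}
```

Calling `pi_with_placeholder(&[ColumnarValue::Array(vec![None; 3])])` in this sketch yields a three-row array, which is why passing the row count directly would carry the same information.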

We need to design an alternative way to communicate the number of rows to the function.

Proposal 1:
Add support_randomness -> bool to ScalarUDFImpl, and provide the number of rows as the first argument to invoke.

Proposal 2:
We always provide the number of rows to invoke.

Change from

fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue>;

to

/// batch_rows is the number of rows in each batch
fn invoke(&self, args: &[ColumnarValue], batch_rows: usize) -> Result<ColumnarValue>;

fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
    let inputs = self
        .args
        .iter()
        .map(|e| e.evaluate(batch))
        .collect::<Result<Vec<_>>>()?;

    // evaluate the function
    match self.fun {
        ScalarFunctionDefinition::UDF(ref fun) => fun.invoke(&inputs, batch.num_rows()),
        ScalarFunctionDefinition::Name(_) => {
            internal_err!(
                "Name function must be resolved to one of the other variants prior to physical planning"
            )
        }
    }
}

It is more aggressive and breaks the signature, but the batch is already available in evaluate(), so passing information about both the batch and the args to invoke() makes sense to me.
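
As a rough illustration of Proposal 2, the sketch below uses simplified stand-ins rather than the real DataFusion types: `ColumnarValue` is reduced to `i64`, and `ScalarFn`, `RowIndex`, and `Negate` are hypothetical names. `RowIndex` stands in for functions such as random() or uuid() that must emit one value per row, and it is deterministic only to keep the sketch easy to check:

```rust
/// Simplified stand-in for DataFusion's `ColumnarValue` (assumption: i64 only).
#[derive(Debug, Clone, PartialEq)]
enum ColumnarValue {
    Array(Vec<i64>),
    Scalar(i64),
}

/// Hypothetical trait mirroring the Proposal 2 signature.
trait ScalarFn {
    /// `batch_rows` is the number of rows in the current batch.
    fn invoke(&self, args: &[ColumnarValue], batch_rows: usize) -> Result<ColumnarValue, String>;
}

/// Zero-argument function: sizes its output from `batch_rows`.
struct RowIndex;

impl ScalarFn for RowIndex {
    fn invoke(&self, args: &[ColumnarValue], batch_rows: usize) -> Result<ColumnarValue, String> {
        if !args.is_empty() {
            return Err("row_index takes no arguments".to_string());
        }
        Ok(ColumnarValue::Array((0..batch_rows as i64).collect()))
    }
}

/// One-argument function: receives `batch_rows` but has no use for it.
struct Negate;

impl ScalarFn for Negate {
    fn invoke(&self, args: &[ColumnarValue], _batch_rows: usize) -> Result<ColumnarValue, String> {
        match args {
            [ColumnarValue::Array(v)] => Ok(ColumnarValue::Array(v.iter().map(|x| -x).collect())),
            [ColumnarValue::Scalar(x)] => Ok(ColumnarValue::Scalar(-x)),
            _ => Err("negate expects exactly one argument".to_string()),
        }
    }
}
```

The cost of this design is visible in `Negate`: every existing implementation has to accept the extra parameter even when it ignores it.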

Proposal 3 (from alamb):
Introduce another function that takes only the number of batch rows.

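// Default implementation; functions that support zero args override it.
fn invoke_no_args(&self, _number_rows: usize) -> Result<ColumnarValue> {
    not_impl_err!("invoke_no_args is not implemented for this function")
}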

#10205 (comment)

We need support_randomness -> bool so that we know when to switch to invoke_no_args.
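
The dispatch described for Proposal 3 could look roughly like the sketch below. It is not the code from the PR: `ColumnarValue` is a simplified `i64`-only stand-in, `ScalarFn` is a hypothetical trait in place of `ScalarUDFImpl`, and the free function `evaluate` as well as the `RowIndex` and `Negate` types are illustrative assumptions. `RowIndex` again stands in for random()-like functions and is deterministic only for readability:

```rust
/// Simplified stand-in for DataFusion's `ColumnarValue` (assumption: i64 only).
#[derive(Debug, Clone, PartialEq)]
enum ColumnarValue {
    Array(Vec<i64>),
    Scalar(i64),
}

/// Hypothetical trait standing in for `ScalarUDFImpl`.
trait ScalarFn {
    fn name(&self) -> &str;

    /// Existing entry point; its signature stays unchanged.
    fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue, String>;

    /// Opt-in flag; functions that need only the row count return true.
    fn support_randomness(&self) -> bool {
        false
    }

    /// New entry point; the default reports that it is not implemented.
    fn invoke_no_args(&self, _number_rows: usize) -> Result<ColumnarValue, String> {
        Err(format!("{} does not implement invoke_no_args", self.name()))
    }
}

/// Caller side: switch to `invoke_no_args` only for opted-in functions
/// that were called without arguments.
fn evaluate(fun: &dyn ScalarFn, args: &[ColumnarValue], num_rows: usize) -> Result<ColumnarValue, String> {
    if args.is_empty() && fun.support_randomness() {
        fun.invoke_no_args(num_rows)
    } else {
        fun.invoke(args)
    }
}

struct RowIndex;

impl ScalarFn for RowIndex {
    fn name(&self) -> &str {
        "row_index"
    }
    fn invoke(&self, _args: &[ColumnarValue]) -> Result<ColumnarValue, String> {
        Err("row_index takes no arguments".to_string())
    }
    fn support_randomness(&self) -> bool {
        true
    }
    fn invoke_no_args(&self, number_rows: usize) -> Result<ColumnarValue, String> {
        Ok(ColumnarValue::Array((0..number_rows as i64).collect()))
    }
}

/// Existing-style function: needs no changes under this proposal.
struct Negate;

impl ScalarFn for Negate {
    fn name(&self) -> &str {
        "negate"
    }
    fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue, String> {
        match args {
            [ColumnarValue::Array(v)] => Ok(ColumnarValue::Array(v.iter().map(|x| -x).collect())),
            [ColumnarValue::Scalar(x)] => Ok(ColumnarValue::Scalar(-x)),
            _ => Err("negate expects exactly one argument".to_string()),
        }
    }
}
```

In this sketch `Negate` compiles without touching its `invoke`, which is the main appeal of adding a separate method instead of changing the existing signature.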

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

I decided to go with Proposal 3: invoke() is already used in over 100 places, so it is not worth breaking the signature.

jayzhan211 added the enhancement New feature or request label Apr 24, 2024

jayzhan211 mentioned this issue Apr 24, 2024

ScalarUDF: Remove supports_zero_argument and avoid creating null array for empty args #10193

Merged

jayzhan211 closed this as completed in #10193 Apr 26, 2024
