Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ScalarUDF: Remove supports_zero_argument and avoid creating null array for empty args #10193

Merged
Original file line number Diff line number Diff line change
Expand Up @@ -434,15 +434,10 @@ impl ScalarUDFImpl for RandomUDF {
}

fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
let len: usize = match &args[0] {
// This udf is always invoked with zero argument so its argument
// is a null array indicating the batch size.
ColumnarValue::Array(array) if array.data_type().is_null() => array.len(),
_ => {
return Err(datafusion::error::DataFusionError::Internal(
"Invalid argument type".to_string(),
))
}
let len = if args.is_empty() {
1
} else {
return internal_err!("Invalid argument type");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to have the test match the rest of the functions for consistency

Suggested change
return internal_err!("Invalid argument type");
return exec_err!("Expect function to take no parameters");

};
let mut rng = thread_rng();
let values = iter::repeat_with(|| rng.gen_range(0.1..1.0)).take(len);
Expand Down
7 changes: 4 additions & 3 deletions datafusion/functions/src/math/pi.rs
Original file line number Diff line number Diff line change
Expand Up @@ -63,9 +63,10 @@ impl ScalarUDFImpl for PiFunc {
}

fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
if !matches!(&args[0], ColumnarValue::Array(_)) {
return exec_err!("Expect pi function to take no param");
}
if !args.is_empty() {
return exec_err!("Expect {} function to take no param", self.name());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return exec_err!("Expect {} function to take no param", self.name());
return exec_err!("Expect {} function to take no parameters", self.name());

};

let array = Float64Array::from_value(std::f64::consts::PI, 1);
Ok(ColumnarValue::Array(Arc::new(array)))
}
Expand Down
19 changes: 7 additions & 12 deletions datafusion/functions/src/math/random.rs
Original file line number Diff line number Diff line change
Expand Up @@ -69,12 +69,13 @@ impl ScalarUDFImpl for RandomFunc {
}
}

/// Random SQL function
fn random(args: &[ColumnarValue]) -> Result<ColumnarValue> {
let len: usize = match &args[0] {
ColumnarValue::Array(array) => array.len(),
_ => return exec_err!("Expect random function to take no param"),
let len = if args.is_empty() {
1
} else {
return exec_err!("Expect random function to take no param");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be more consistent with the other changes in this PR:

Suggested change
return exec_err!("Expect random function to take no param");
return exec_err!("Expect {} function to take no parameters", self.name());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this before, and I noticed that random is used in test so I need to move it out as an independent function, but adding name as another argument looks so weird, so I keep it like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can even return ScalarValue::F64 instead of Array. Let me fix these too

};

let mut rng = thread_rng();
let values = iter::repeat_with(|| rng.gen_range(0.0..1.0)).take(len);
let array = Float64Array::from_iter_values(values);
Expand All @@ -83,19 +84,13 @@ fn random(args: &[ColumnarValue]) -> Result<ColumnarValue> {

#[cfg(test)]
mod test {
use std::sync::Arc;

use arrow::array::NullArray;

use datafusion_common::cast::as_float64_array;
use datafusion_expr::ColumnarValue;

use crate::math::random::random;
use super::*;

#[test]
fn test_random_expression() {
let args = vec![ColumnarValue::Array(Arc::new(NullArray::new(1)))];
let array = random(&args)
let array = random(&[])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this test, since it is already covered

SELECT
random() BETWEEN 0.0 AND 1.0,
random() = random()

.expect("failed to initialize function random")
.into_array(1)
.expect("Failed to convert to array");
Expand Down
7 changes: 4 additions & 3 deletions datafusion/functions/src/string/uuid.rs
Original file line number Diff line number Diff line change
Expand Up @@ -61,9 +61,10 @@ impl ScalarUDFImpl for UuidFunc {
/// Prints random (v4) uuid values per row
/// uuid() = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'
fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
let len: usize = match &args[0] {
ColumnarValue::Array(array) => array.len(),
_ => return exec_err!("Expect uuid function to take no param"),
let len = if args.is_empty() {
1
} else {
return exec_err!("Expect {} function to take no param", self.name());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return exec_err!("Expect {} function to take no param", self.name());
return exec_err!("Expect {} function to take no parameters", self.name());

};

let values = iter::repeat_with(|| Uuid::new_v4().to_string()).take(len);
Expand Down
20 changes: 5 additions & 15 deletions datafusion/physical-expr/src/scalar_function.rs
Original file line number Diff line number Diff line change
Expand Up @@ -142,21 +142,11 @@ impl PhysicalExpr for ScalarFunctionExpr {
}

fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
// evaluate the arguments, if there are no arguments we'll instead pass in a null array
// indicating the batch size (as a convention)
let inputs = match self.args.is_empty() {
// If the function supports zero argument, we pass in a null array indicating the batch size.
// This is for user-defined functions.
// MakeArray support zero argument but has the different behavior from the array with one null.
alamb marked this conversation as resolved.
Show resolved Hide resolved
true if self.supports_zero_argument && self.name != "make_array" => {
vec![ColumnarValue::create_null_array(batch.num_rows())]
}
_ => self
.args
.iter()
.map(|e| e.evaluate(batch))
.collect::<Result<Vec<_>>>()?,
};
let inputs = self
.args
.iter()
.map(|e| e.evaluate(batch))
.collect::<Result<Vec<_>>>()?;

// evaluate the function
match self.fun {
Expand Down