Add field trait method to WindowUDFImpl #12374

Open
wants to merge 54 commits into base: main

Contributor

@jcsherin jcsherin commented Sep 7, 2024

Which issue does this PR close?

Closes #12373.

Rationale for this change

The result field from evaluating the user-defined window function is composed from the return_type and nullable trait methods in WindowUDFImpl.

This change explores folding both methods into a single trait method. User-defined window functions then only have to implement the field trait method, which makes the intent more explicit.

The current implementation of a user-defined window function (without the field trait method) looks like this:

impl WindowUDFImpl for RowNumber {
    fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> {
        Ok(DataType::UInt64)
    }

    fn nullable(&self) -> bool {
        true
    }
}

The implementation for a user-defined window function after this change:

impl WindowUDFImpl for RowNumber {
    fn field(&self, field_args: WindowUDFFieldArgs) -> Result<Field> {
        Ok(Field::new(
            field_args.name(), /* window function display name */
            DataType::UInt64,  /* result data type */
            false,             /* row number is not nullable */
        ))
    }
}
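
The folding can be tried outside DataFusion with a minimal, self-contained sketch. Everything here is an assumption made for illustration: `DataType`, `Field` and `WindowUDFFieldArgs` are simplified stand-ins for the Arrow/DataFusion types, the methods return plain values instead of `Result`, and `legacy_field` is a hypothetical helper that mimics how a caller would compose the result field from `return_type` and `nullable`.

```rust
// Simplified stand-ins for the Arrow/DataFusion types (assumptions, not the real API).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    UInt64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

impl Field {
    pub fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

pub struct WindowUDFFieldArgs<'a> {
    pub input_types: &'a [DataType],
    pub function_name: &'a str,
}

impl<'a> WindowUDFFieldArgs<'a> {
    pub fn name(&self) -> &str {
        self.function_name
    }
}

pub struct RowNumber;

impl RowNumber {
    // "Before": two separate methods, mirroring the first snippet above.
    pub fn return_type(&self, _arg_types: &[DataType]) -> DataType {
        DataType::UInt64
    }

    pub fn nullable(&self) -> bool {
        true
    }

    // "After": one method that states name, type and nullability together.
    pub fn field(&self, field_args: WindowUDFFieldArgs) -> Field {
        Field::new(field_args.name(), DataType::UInt64, false)
    }
}

// Hypothetical helper: the caller-side composition that `field` replaces.
pub fn legacy_field(udwf: &RowNumber, args: &WindowUDFFieldArgs) -> Field {
    Field::new(args.name(), udwf.return_type(args.input_types), udwf.nullable())
}
```

With a single `field` method, the implementer decides all three properties of the result field in one place instead of having them assembled elsewhere.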

What changes are included in this PR?

  1. Add field trait method:
    fn field(&self, field_args: WindowUDFFieldArgs) -> Result<Field>
  2. Remove return_type trait method:
    /// What [`DataType`] will be returned by this function, given the types of
    /// the arguments
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;
  3. Remove nullable trait method, which was added in #12030 (Convert built-in row_number to user-defined window function).
  4. Add WindowUDFFieldArgs:
/// Contains metadata necessary for defining the field which represents
/// the final result of evaluating a user-defined window function.
pub struct WindowUDFFieldArgs<'a> {
    /// The data types of input expressions to the user-defined window
    /// function.
    input_types: &'a [DataType],
    /// The display name of the user-defined window function.
    function_name: &'a str,
}
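
As a rough sketch of how such a struct might be constructed and read, the block below adds a constructor and two accessors. `new` and `name` match calls that appear elsewhere in this PR (`WindowUDFFieldArgs::new(input_expr_types, display_name)` and `field_args.name()`); the `input_types` accessor and the reduced `DataType` enum are assumptions made to keep the example self-contained.

```rust
// Reduced stand-in for arrow's DataType (assumption for illustration).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Float64,
}

pub struct WindowUDFFieldArgs<'a> {
    /// The data types of input expressions to the user-defined window function.
    input_types: &'a [DataType],
    /// The display name of the user-defined window function.
    function_name: &'a str,
}

impl<'a> WindowUDFFieldArgs<'a> {
    /// Borrows both pieces of metadata; nothing is copied.
    pub fn new(input_types: &'a [DataType], function_name: &'a str) -> Self {
        WindowUDFFieldArgs { input_types, function_name }
    }

    /// Assumed accessor: the input expression types.
    pub fn input_types(&self) -> &[DataType] {
        self.input_types
    }

    /// The display name used for the result field.
    pub fn name(&self) -> &str {
        self.function_name
    }
}
```

Borrowing with a lifetime parameter keeps the struct cheap to build at each call site, at the cost of tying it to the lifetime of the planner's data.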

Are these changes tested?

Yes, against existing tests in CI.

Are there any user-facing changes?

Yes, this is a breaking change to the user-defined window function API.

@jcsherin
Contributor Author

jcsherin commented Sep 7, 2024

I lack permission to apply the api-change label to this PR.

.iter()
.map(|e| e.get_type(input_schema))
.collect::<Result<Vec<_>>>()?;
let input_types = data_types_with_window_udf(&data_types, udwf)?;
Contributor

I think we should handle this computation in to_field. data_types_with_window_udf could reuse to_field, and data_types and nullable could reuse data_types_with_window_udf.

.map(|e| e.get_type(input_schema))
.collect::<Result<Vec<_>>>()?;
let input_types = data_types_with_window_udf(&data_types, udwf)?;
let function_name = self.schema_name().to_string();
Contributor

Similar to self.qualified_name() in to_field, we might have a different name for Column and Alias?

@jcsherin jcsherin marked this pull request as draft September 7, 2024 21:32
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Sep 17, 2024
@github-actions github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Sep 17, 2024
Comment on lines +458 to +472
impl Expr {
    /// Common method for window functions that applies type coercion
    /// to all arguments of the window function to check if it matches
    /// its signature.
    ///
    /// If successful, this method returns the data type and
    /// nullability of the window function's result.
    ///
    /// Otherwise, returns an error if there's a type mismatch between
    /// the window function's signature and the provided arguments.
    fn data_type_and_nullable_with_window_function(
        &self,
        schema: &dyn ExprSchema,
        window_function: &WindowFunction,
    ) -> Result<(DataType, bool)> {
Contributor Author

Extracted a common method to handle type coercion for all window function types (built-in, UDAF and UDWF), which is then reused by these methods:

  • data_type_and_nullable
  • get_type
  • nullable
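
The delegation pattern described here can be sketched in isolation. This is a toy model, not the code in expr.rs: `WindowCall`, `WindowFn` and the reduced `DataType` are hypothetical stand-ins, errors are plain strings, and the signature check is a simple pattern match rather than real type coercion.

```rust
// Reduced stand-in for arrow's DataType (assumption for illustration).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    UInt64,
    Float64,
    Utf8,
}

// Hypothetical window functions with fixed toy signatures.
pub enum WindowFn {
    RowNumber, // takes no arguments
    Avg,       // takes exactly one numeric argument
}

pub struct WindowCall {
    pub fun: WindowFn,
    pub arg_types: Vec<DataType>,
}

impl WindowCall {
    /// Single place that validates arguments and derives both the result
    /// type and its nullability.
    fn data_type_and_nullable_with_window_function(&self) -> Result<(DataType, bool), String> {
        match (&self.fun, self.arg_types.as_slice()) {
            (WindowFn::RowNumber, []) => Ok((DataType::UInt64, false)),
            (WindowFn::Avg, [DataType::UInt64 | DataType::Float64]) => {
                Ok((DataType::Float64, true))
            }
            (_, args) => Err(format!("signature mismatch for argument types {:?}", args)),
        }
    }

    pub fn data_type_and_nullable(&self) -> Result<(DataType, bool), String> {
        self.data_type_and_nullable_with_window_function()
    }

    /// Keeps only the data type from the shared computation.
    pub fn get_type(&self) -> Result<DataType, String> {
        self.data_type_and_nullable_with_window_function().map(|(data_type, _)| data_type)
    }

    /// Keeps only the nullability from the shared computation.
    pub fn nullable(&self) -> Result<bool, String> {
        self.data_type_and_nullable_with_window_function().map(|(_, nullable)| nullable)
    }
}
```

Because all three public methods go through one helper, a signature mismatch surfaces the same error regardless of which method the caller uses.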

Comment on lines -162 to -168
/// Return the type of the function given its input types
///
/// See [`WindowUDFImpl::return_type`] for more details.
pub fn return_type(&self, args: &[DataType]) -> Result<DataType> {
    self.inner.return_type(args)
}

Contributor Author

Removed return_type.

Comment on lines -181 to -185
/// Returns if column values are nullable for this window function.
/// Returns the field of the final result of evaluating this window function.
///
/// See [`WindowUDFImpl::nullable`] for more details.
pub fn nullable(&self) -> bool {
    self.inner.nullable()
Contributor Author

Removed nullable.

)
)
})?;
let (_, function_name) = self.qualified_name();
Contributor Author

Use Expr::qualified_name, which also handles:

  • Expr::Column
  • Expr::Alias
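
To illustrate why the name lookup matters, here is a toy model of name resolution. The `Expr` enum below is a hypothetical, heavily reduced stand-in (the qualifier is a plain `Option<String>`, and the `WindowFunction` variant just carries its display string); it is only meant to show that an aliased window expression should report the alias rather than the display name of the inner expression.

```rust
// Hypothetical, reduced expression type (assumption for illustration).
pub enum Expr {
    Column { relation: Option<String>, name: String },
    Alias { expr: Box<Expr>, relation: Option<String>, name: String },
    WindowFunction { display: String },
}

impl Expr {
    /// Returns (optional qualifier, name) for the field this expression produces.
    pub fn qualified_name(&self) -> (Option<String>, String) {
        match self {
            // A column keeps its own qualifier and name.
            Expr::Column { relation, name } => (relation.clone(), name.clone()),
            // An alias overrides whatever name the inner expression would have.
            Expr::Alias { relation, name, .. } => (relation.clone(), name.clone()),
            // Anything else falls back to an unqualified display name.
            Expr::WindowFunction { display } => (None, display.clone()),
        }
    }
}
```

In this model, taking the second element of the tuple, as `let (_, function_name) = self.qualified_name();` does, yields the alias when one is present.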

Comment on lines +719 to +721
WindowFunctionDefinition::WindowUDF(fun) => fun
    .field(WindowUDFFieldArgs::new(input_expr_types, display_name))
    .map(|field| field.data_type().clone()),
Contributor Author

Returns the data type for the UDWF, taken from the field it produces.

@jcsherin jcsherin marked this pull request as ready for review September 17, 2024 14:27
Labels: core (Core DataFusion crate), logical-expr (Logical plan and expressions), optimizer (Optimizer rules), physical-expr (Physical Expressions), proto (Related to proto crate)