Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC / Prototype user defined sql planner might look like #11168

Closed
wants to merge 1 commit into from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jun 28, 2024

I don't intend to merge this PR.

Which issue does this PR close?

Prototype of what the API described in #10534 might look like

Inspired by #11155 from @jayzhan211 and @samuelcolvin on #11137 there

Rationale for this change

I feel like our sql planner needs some way for it to be extended so that we can plan sql queries with operators that are not hard coded into the core engine. There are currently two different approaches

What changes are included in this PR?

Prototype out what a "user defined planner" might look like

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the sql SQL Planner label Jun 28, 2024
@@ -236,11 +237,28 @@ impl PlannerContext {
}
}

/// This trait allows users to customize the behavior of the SQL planner
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to pull the custom logic for planning binary operators into a trait -- then we can extend the sql planner by providing trait instances

I think this is beneficial because:

  1. It avoids having a DataFusion Operator enum that only exists to be transformed later
  2. It avoids the planner having hard coded assumptions about function names (like "array_append")

I think this kind of approach is what might close #10534

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this seems like a fair approach.

It's worth noting that we could do some of this with #11137 - e.g. we could move the array stuff into a ParseCustomOperator, I don't really mind which approach we go with provided the flexibility is there.

@@ -1017,6 +979,71 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
}
}

pub struct ArrayFunctionPlanner {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that this ArrayFunctionPlanner would be in the array_functions crate and thus avoid the implicit dependency on names of functions

Copy link
Contributor

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise LGTM

SqlToRel {
context_provider,
options,
normalizer: IdentNormalizer::new(normalize),
planners: vec![],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just put the array_planner in the vec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that I eventually wanted to move the definition into a different crate datafusion-functions-array

@@ -236,11 +237,28 @@ impl PlannerContext {
}
}

/// This trait allows users to customize the behavior of the SQL planner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this seems like a fair approach.

It's worth noting that we could do some of this with #11137 - e.g. we could move the array stuff into a ParseCustomOperator, I don't really mind which approach we go with provided the flexibility is there.

@alamb
Copy link
Contributor Author

alamb commented Jun 28, 2024

I think this seems like a fair approach.

It's worth noting that we could do some of this with #11137 - e.g. we could move the array stuff into a ParseCustomOperator, I don't really mind which approach we go with provided the flexibility is there.

Indeed -- thanks. Let's see if @jayzhan211 has any thoughts (I think it may be their night)

@jayzhan211
Copy link
Contributor

jayzhan211 commented Jun 29, 2024

It is a great idea overall! There are only two things that I think we could improve on

  1. Avoid clone like what you have mentioned, I don't have a good solution for now too.
  2. Extend it for more kinds of rewrite like FieldAccess plan_field_access

Given the current design, if we want to support plan_field_access we need to add the function in trait, and doesn't look flexible enough if there are many kinds of custom planning

        for planner in self.planners.iter() {
            if let Some(expr) =
                planner.plan_binary_op(op.clone(), left.clone(), right.clone(), schema)?
            {
                return Ok(expr);
            }
            if let Some(expr) =
                planner.plan_indices(expr....)?
            {
                return Ok(expr);
            }
            ...

        }

We could define a more general function and enum that includes all the possible ast node. I'm not sure if this is general enough since these are the only two things that I know are rewritten to function in datafusion right now.

// Node from sqlparser that are possible for rewrite
pub enum AstNode (or Operators) {
    BinaryOperator(sqlparser::ast::BinaryOperator),
    Subscript(sqlparser::ast::Subscript), // For FieldAccess rewrite
 OR GetFieldAccess(GetFieldAccess) //  We can use GetFieldAccess directly, if there is little possible that they need to customize Subscript to GetFieldAccess logic.
}

/// This trait allows users to customize the behavior of the SQL planner
pub trait UserDefinedPlanner {
    fn plan(&self, ast_node: AstNode, exprs: Vec<Expr>, _schema: &DFSchema) -> Result<Option<Expr>> {
        Ok(None)
    }
}

@jayzhan211
Copy link
Contributor

I find a way to avoid clone, returns tuple Result<(Option<PlannedNode>, Option<Expr>)> original_expr and new_expr mutually exclusive

pub struct PlannedNode {
    pub op: sqlparser::ast::BinaryOperator,
    pub left: Expr,
    pub right: Expr,
}

/// This trait allows users to customize the behavior of the SQL planner
pub trait UserDefinedPlanner {
    /// Plan the binary operation between two expressions, return None if not possible
    fn plan_binary_op(
        &self,
        planned_node: PlannedNode,
        _schema: &DFSchema,
    ) -> Result<(Option<PlannedNode>, Option<Expr>)> {
        Ok((Some(planned_node), None))
    }
}
fn build_logical_expr(
        &self,
        op: sqlparser::ast::BinaryOperator,
        left: Expr,
        right: Expr,
        schema: &DFSchema,
    ) -> Result<Expr> {
        let num_planners = self.planners.len();
        // try extension planers
        let mut planned_node = PlannedNode{op, left, right};
        for (i, planner) in self.planners.iter().enumerate() {
            match planner.plan_binary_op(planned_node, schema)? {
                (None, Some(expr)) => return Ok(expr),
                (Some(n), None) if i + 1 < num_planners => planned_node = n,
                (Some(n), None) if i + 1 == num_planners => {
                    return Ok(Expr::BinaryExpr(BinaryExpr::new(
                        Box::new(n.left),
                        self.parse_sql_binary_op(n.op)?,
                        Box::new(n.right),
                    )))
                }
                _ => unreachable!(""),
            }
        }
        unreachable!("")
    }

@alamb
Copy link
Contributor Author

alamb commented Jun 29, 2024

I find a way to avoid clone, returns tuple Result<(Option, Option)> original_expr and new_expr mutually exclusive

That is a good idea. I think we might be able to name the tuple with an explicitly enum, similar to ExprSimplifyResult which might be easier to understand

pub enum ExprSimplifyResult {
/// The function call was simplified to an entirely new Expr
Simplified(Expr),
/// the function call could not be simplified, and the arguments
/// are return unmodified.
Original(Vec<Expr>),
}

@alamb
Copy link
Contributor Author

alamb commented Jun 29, 2024

Extend it for more kinds of rewrite like FieldAccess plan_field_access

Yes, good idea

Given the current design, if we want to support plan_field_access we need to add the function in trait, and doesn't look flexible enough if there are many kinds of custom planning

I agree there is a tradeoff between the number of functions in the trait vs how general this custom planning could be

We could define a more general function and enum that includes all the possible ast node. I'm not sure if this is general enough since these are the only two things that I know are rewritten to function in datafusion right now.

Yeah, I am not sure how much more custom planning would be needed. I think part of most special syntaxes have embedded sub expressions, so SqlToRel will still need code to know how to plan such exprs.

Thus, while less flexible, I think adding a specific function for each type of SQL ast node that we permit customization of would be the easiest to understand and use

@alamb
Copy link
Contributor Author

alamb commented Jun 29, 2024

Here is a proposal for how to proceed:

  1. Merge Rewrite array @> array and array <@ array in sql_expr_to_logical_expr #11155 (so we don't have a third pattern to deal with)
  2. Make a PR with a production ready version of the API in this PR (sql planner extension)

What do you think @jayzhan211 and @samuelcolvin ? If you think this is a reasonable idea, do either if you have the time to do 2? If not, I will find the time to bash it out

@jayzhan211
Copy link
Contributor

jayzhan211 commented Jun 29, 2024

Thus, while less flexible, I think adding a specific function for each type of SQL ast node that we permit customization of would be the easiest to understand and use

I agree, we can start from this, it is easy to extend it in the future if there are more kinds of plan

Make a PR with a production ready version of the API in this PR (sql planner extension)

I can work on this

@samuelcolvin
Copy link
Contributor

This sounds like a good plan.

We (I) should also adapt/restart datafusion-contrib/datafusion-functions-json#22 to work with this API to prove it does what we need before merging.

@alamb
Copy link
Contributor Author

alamb commented Jun 29, 2024

This sounds like a good plan.

We (I) should also adapt/restart datafusion-contrib/datafusion-functions-json#22 to work with this API to prove it does what we need before merging.

I agree -- sounds like a plan. Thank you both! I will make sure to review the code from @jayzhan211 as soon as possible so we can get this done quickly

This is going to be great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants