Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move sql_compound_identifier_to_expr to ExprPlanner #11487

Merged
merged 20 commits into from
Jul 21, 2024

Conversation

dharanad
Copy link
Contributor

@dharanad dharanad commented Jul 16, 2024

Which issue does this PR close?

Part of #11473

Rationale for this change

Part of #11207 moving the planning of compound_identifier to ExprPlanner, this change does not include planning for compound_identifier in outer query

What changes are included in this PR?

  • Moved the planning of compound expression to ExprPlanner
  • Moved testcases in optimizer crate dependent on functions to datafusion/core/tests/optimizer_integration.rs.
  • Added expression planner field to MockContextProvider to fix the breaking testcases.

Are these changes tested?

Existing test cases

cargo test

Are there any user-facing changes?

No

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions labels Jul 16, 2024
@@ -135,16 +138,17 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
let nested_name = nested_names[0].to_string();

let col = Expr::Column(Column::from((qualifier, field)));
if let Some(udf) =
self.context_provider.get_function_meta("get_field")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder could we plan the whole compound identifier, the benefit of plan_* function is able to customize the operator (in this case is compound identifier i.e. a.b.c) and expressions we get.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally i wanted to do that same, but this function uses IdentNormalizer field of SqlToRel struct. I am just thinking a way around & wanted to discuss the same here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the result of self.normalizer.normalize() as the planner arguments, similar to the spirit of others function that we take the result of sql_expr_to_logical_expr as the planner arguments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this revision ? Do you think we aligned here ? There is still more work to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fn sql_compound_identifier_to_expr does many things, handles the logic for variable, compund indentifier & compund indentifier in outer query. In this change we are only supporting compund indentifier & not adding support compund indentifier in outer query.

@github-actions github-actions bot added the optimizer Optimizer rules label Jul 16, 2024
@dharanad dharanad changed the title Move GetField to ExprPlanner Move sql_compound_identifier_to_expr to ExprPlanner Jul 17, 2024
@dharanad dharanad marked this pull request as ready for review July 17, 2024 17:31
@dharanad dharanad requested a review from jayzhan211 July 17, 2024 17:38
@dharanad
Copy link
Contributor Author

Weired that checks are failing, same are passing for me locally. Need to investigate

@github-actions github-actions bot added the core Core DataFusion crate label Jul 18, 2024
@dharanad dharanad requested a review from jayzhan211 July 18, 2024 06:31
@dharanad
Copy link
Contributor Author

@jayzhan211 I apologize for any confusion caused by submitting multiple reviews. I was mistaken in believing I had the latest changes from the main branch. This revision incorporates the most recent updates.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @dharanad and @jayzhan211 -- I think this PR is looking close. I had a few suggestions but I think it is quite close

It will be great to finally complete the user defined extension planner epic 🙏

datafusion-cli/Cargo.lock Outdated Show resolved Hide resolved
datafusion/expr/src/planner.rs Outdated Show resolved Hide resolved
datafusion/expr/src/planner.rs Outdated Show resolved Hide resolved
datafusion/sql/src/expr/identifier.rs Show resolved Hide resolved
datafusion/expr/src/planner.rs Outdated Show resolved Hide resolved
@dharanad dharanad requested a review from alamb July 19, 2024 19:53
@@ -115,6 +117,139 @@ fn concat_ws_literals() -> Result<()> {
Ok(())
}

#[test]
fn anti_join_with_join_filter() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these tests moved? Does that mean we can't plan SQL with identifiers like test.col_int32 without an Expr planner?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the earlier implementation, I also moved plannig of basic SQL like column references to expr planner, we had to move these test cases from optimizer crate to core to avoid optimizer crate dependecy on functions.

Since we have moved that out of planner, ideally i should moved the testcases back. My bad, i missed them. Thanks for catching this. I have fixed this.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dharanad -- other than the one question about why the tests were moved, I think this PR is looking good to me.

@github-actions github-actions bot removed optimizer Optimizer rules core Core DataFusion crate labels Jul 20, 2024
@dharanad dharanad requested a review from alamb July 20, 2024 11:48
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dharanad and @jayzhan211 -- I think this one looks really nice.

@alamb alamb merged commit 36660fe into apache:main Jul 21, 2024
24 checks passed
Lordworms pushed a commit to Lordworms/arrow-datafusion that referenced this pull request Jul 23, 2024
* move get_field to expr planner

* formatting

* formatting

* documentation

* refactor

* documentation & fix's

* move optimizer tests to core

* fix breaking tc's

* cleanup

* fix examples

* formatting

* rm datafusion-functions from optimizer

* update compound identifier

* update planner

* update planner

* formatting

* reverting optimizer tests

* formatting
wiedld pushed a commit to influxdata/arrow-datafusion that referenced this pull request Jul 31, 2024
* move get_field to expr planner

* formatting

* formatting

* documentation

* refactor

* documentation & fix's

* move optimizer tests to core

* fix breaking tc's

* cleanup

* fix examples

* formatting

* rm datafusion-functions from optimizer

* update compound identifier

* update planner

* update planner

* formatting

* reverting optimizer tests

* formatting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants