Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Feb 22, 2021

Rationale:

This is part of a larger effort, described on ARROW-11689. for making improvements to the DataFusion query optimizer easier to write and making it more efficient,.

The idea is that by splitting out the expr traversal code from the code that does the actual rewriting, we will:

  1. Reduce the amount repetitions in optimizer rules to make them easier to implement
  2. Make the actual logic clearer as it is not intertwined in code needed to do recursion
  3. Set the stage for a general PlanRewriter that doesn't have to clone its input, and can modify take their input by value and consume them.

Changes

This PR introduce a ExpressionRewriter, the mutable counterpart to ExpressionVisitor and demonstrates its usefulness by using it in the constant folding algorithm.

Note this also reduces a bunch of copies in the constant folding algorithm.

@github-actions
Copy link

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @houqp and @Dandandan -- as discussed on #9309 (comment) here is a proposal of how to rewrite expressions directly without quite so much copying.

It is part of my larger plan to make rewriting LogicalPlans easier too.

Github kind of mangled the diff in this file, but the core change is that all code for recursing Expr trees that are not relevant to the constant folding is in Expr::rewrite now and no longer in this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an improvement to me NOT NOT #b is the same as b :) I suspect something was not quite right with the recursion previously

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my original implementation did the tree traversal wrong for not expr :P It was doing a preorder traversal, which requires a convergent loop to produce #b in this case. Nice catch.

Copy link
Member

@houqp houqp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks great! Good boilerplate code clean up :)

Comment on lines 122 to 123
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like these manual rewrites are redundant because they should have been invoked during tree traversal before mutate was called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a great call -- I will try and remove them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, i think this rewrite is not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code gets quite a bit cleaner with this improvement @houqp - thank you for the suggestion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my original implementation did the tree traversal wrong for not expr :P It was doing a preorder traversal, which requires a convergent loop to produce #b in this case. Nice catch.

Comment on lines -295 to -298
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor, but I think we can use the pre_visit method to skip traversal for these expressions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it and I could not convince myself that this change this would gain much -- we still have to match on the Expr type so it would just move the list of the variants into another function (in a separate match) which seems to obscure the logic a bit for me

@alamb alamb force-pushed the alamb/expr_rewriter branch from 161122e to 67d35b7 Compare February 24, 2021 23:20
let right = optimize_expr(right, schemas)?;
match op {
Operator::Eq => match (&left, &right) {
impl<'a> ExprRewriter for ConstantRewriter<'a> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With some of @houqp 's comments, this rewrite pass is looking beautiful in my opinion -- it really looks like a rewrite rather than a reconstruction

@alamb alamb requested a review from andygrove February 24, 2021 23:22
@codecov-io
Copy link

Codecov Report

Merging #9545 (67d35b7) into master (5bea624) will increase coverage by 0.13%.
The diff coverage is 91.09%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9545      +/-   ##
==========================================
+ Coverage   82.25%   82.39%   +0.13%     
==========================================
  Files         244      244              
  Lines       55685    56216     +531     
==========================================
+ Hits        45806    46317     +511     
- Misses       9879     9899      +20     
Impacted Files Coverage Δ
...ion-testing/src/bin/arrow-json-integration-test.rs 0.00% <0.00%> (ø)
rust/arrow/src/ipc/writer.rs 87.23% <50.00%> (-0.59%) ⬇️
...datafusion/src/physical_plan/string_expressions.rs 77.00% <82.23%> (+7.37%) ⬆️
rust/datafusion/src/logical_plan/expr.rs 83.07% <90.10%> (+1.94%) ⬆️
rust/parquet/src/arrow/array_reader.rs 77.61% <91.30%> (-0.02%) ⬇️
rust/datafusion/src/physical_plan/functions.rs 85.52% <91.42%> (+11.69%) ⬆️
rust/datafusion/src/optimizer/constant_folding.rs 92.66% <94.11%> (-1.80%) ⬇️
rust/datafusion/tests/sql.rs 98.28% <100.00%> (+0.14%) ⬆️
rust/parquet/src/arrow/arrow_reader.rs 91.91% <100.00%> (+0.34%) ⬆️
rust/parquet/src/schema/types.rs 89.54% <100.00%> (+0.02%) ⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4beb514...67d35b7. Read the comment docs.

Copy link
Member

@houqp houqp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, good work @alamb !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants