ARROW-11710: [Rust][DataFusion] Implement ExpressionRewriter #9545
Branch updated from a82ddd5 to 161122e.
FYI @houqp and @Dandandan: as discussed in a comment on #9309, here is a proposal for rewriting expressions directly, without quite so much copying.
It is part of my larger plan to make rewriting LogicalPlans easier too.
GitHub mangled the diff in this file somewhat, but the core change is that all of the code for recursing through Expr trees that is not relevant to constant folding now lives in Expr::rewrite instead of in this file.
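To make the split concrete, here is a minimal sketch on a toy four-variant Expr (not DataFusion's real Expr, and without its Result return types). rename_manual and RenameColumn are hypothetical passes: the first carries its own recursion over every variant, while the second implements only ExprRewriter::mutate and relies on a shared Expr::rewrite for the post-order traversal.

```rust
// Toy stand-in for DataFusion's Expr (hypothetical, heavily simplified).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
}

// Before: the pass carries its own copy of the recursion, even for
// variants it does not care about (Add, Not, Literal).
pub fn rename_manual(expr: Expr, from: &str, to: &str) -> Expr {
    match expr {
        Expr::Column(name) if name == from => Expr::Column(to.to_string()),
        Expr::Add(l, r) => Expr::Add(
            Box::new(rename_manual(*l, from, to)),
            Box::new(rename_manual(*r, from, to)),
        ),
        Expr::Not(e) => Expr::Not(Box::new(rename_manual(*e, from, to))),
        other => other,
    }
}

// After: a pass only describes how to transform a single node.
pub trait ExprRewriter {
    fn mutate(&mut self, expr: Expr) -> Expr;
}

impl Expr {
    // All recursion lives here, shared by every pass: children are
    // rewritten first, then the rebuilt node is handed to `mutate`.
    pub fn rewrite<R: ExprRewriter>(self, rewriter: &mut R) -> Expr {
        let expr = match self {
            Expr::Add(l, r) => Expr::Add(
                Box::new((*l).rewrite(rewriter)),
                Box::new((*r).rewrite(rewriter)),
            ),
            Expr::Not(e) => Expr::Not(Box::new((*e).rewrite(rewriter))),
            // Column and Literal have no children.
            leaf => leaf,
        };
        rewriter.mutate(expr)
    }
}

// Hypothetical pass: rename one column; no traversal code needed.
pub struct RenameColumn {
    pub from: String,
    pub to: String,
}

impl ExprRewriter for RenameColumn {
    fn mutate(&mut self, expr: Expr) -> Expr {
        match expr {
            Expr::Column(name) if name == self.from => Expr::Column(self.to.clone()),
            other => other,
        }
    }
}
```

With this layout, adding a new Expr variant means updating Expr::rewrite once rather than every pass.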
This seems like an improvement to me: NOT NOT #b is the same as #b :) I suspect something was not quite right with the recursion previously.
Yeah, my original implementation traversed the tree incorrectly for NOT expressions :P It was doing a preorder traversal, which needs a loop that repeats until the result stops changing to produce #b in this case. Nice catch.
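A minimal sketch of the difference, assuming a toy Expr with only Column and Not and a hypothetical simplify_node rule that removes one double negation at a time. The four-NOT input is illustrative, not the test from this PR: one preorder pass leaves NOT NOT #b behind, preorder has to be repeated until the tree stops changing, and a single post-order pass reaches #b.

```rust
// Toy expression type (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Not(Box<Expr>),
}

pub fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

pub fn not(e: Expr) -> Expr {
    Expr::Not(Box::new(e))
}

// One local rule, NOT NOT x => x, applied to a single node.
pub fn simplify_node(e: Expr) -> Expr {
    match e {
        Expr::Not(inner) => match *inner {
            Expr::Not(x) => *x,
            other => Expr::Not(Box::new(other)),
        },
        other => other,
    }
}

// Preorder: apply the rule first, then descend into what is left.
// The node itself is never revisited after its children change.
pub fn rewrite_preorder(e: Expr) -> Expr {
    match simplify_node(e) {
        Expr::Not(child) => not(rewrite_preorder(*child)),
        leaf => leaf,
    }
}

// Post-order: simplify the children first, then apply the rule once.
pub fn rewrite_postorder(e: Expr) -> Expr {
    let rebuilt = match e {
        Expr::Not(child) => not(rewrite_postorder(*child)),
        leaf => leaf,
    };
    simplify_node(rebuilt)
}

// Preorder only reaches the simplest form if repeated to a fixpoint.
pub fn preorder_to_fixpoint(mut e: Expr) -> Expr {
    loop {
        let next = rewrite_preorder(e.clone());
        if next == e {
            return next;
        }
        e = next;
    }
}
```

For NOT NOT NOT NOT #b, rewrite_preorder returns NOT NOT #b, while rewrite_postorder and preorder_to_fixpoint both return #b.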
houqp left a comment
Overall it looks great! Good boilerplate code cleanup :)
It looks like these manual rewrites are redundant, because they should already have been applied to the children during tree traversal, before mutate was called.
That is a great call. I will try to remove them.
Same here; I think this rewrite is not needed.
The code gets quite a bit cleaner with this improvement, @houqp. Thank you for the suggestion!
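Here is a minimal sketch of why the manual calls can go, assuming a toy Expr with only Column, Bool and Eq and a hypothetical ConstantFolder pass (DataFusion's real ConstantRewriter handles many more cases and returns Result). Because Expr::rewrite folds the children before calling mutate, mutate only inspects the node in front of it: in #a = (true = true), the inner comparison has already become true by the time the outer node is seen.

```rust
// Toy stand-in for DataFusion's Expr (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Bool(bool),
    Eq(Box<Expr>, Box<Expr>),
}

pub trait ExprRewriter {
    fn mutate(&mut self, expr: Expr) -> Expr;
}

impl Expr {
    // Post-order traversal: children are rewritten before `mutate` runs.
    pub fn rewrite<R: ExprRewriter>(self, r: &mut R) -> Expr {
        let expr = match self {
            Expr::Eq(l, rhs) => Expr::Eq(
                Box::new((*l).rewrite(r)),
                Box::new((*rhs).rewrite(r)),
            ),
            other => other,
        };
        r.mutate(expr)
    }
}

// Hypothetical constant-folding pass.
pub struct ConstantFolder;

impl ExprRewriter for ConstantFolder {
    fn mutate(&mut self, expr: Expr) -> Expr {
        match expr {
            // No recursive calls on `left` / `right` here: the traversal
            // has already folded them.
            Expr::Eq(left, right) => match (*left, *right) {
                (Expr::Bool(a), Expr::Bool(b)) => Expr::Bool(a == b),
                // x = true simplifies to x (assumes x is boolean-typed).
                (other, Expr::Bool(true)) | (Expr::Bool(true), other) => other,
                (l, r) => Expr::Eq(Box::new(l), Box::new(r)),
            },
            other => other,
        }
    }
}
```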
Minor, but I think we could use the pre_visit method to skip traversal for these expressions.
I thought about it, but I could not convince myself that this change would gain much: we still have to match on the Expr type, so it would just move the list of variants into another function (in a separate match), which to me obscures the logic a bit.
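For reference, a sketch of what the pre_visit approach might look like, under simplified assumptions: pre_visit here returns a plain bool rather than a Result, returning false leaves the whole subtree untouched, and Opaque and UppercaseColumns are hypothetical. It also shows the trade-off described above: the pass still matches on the variant, just in a second method.

```rust
// Toy expression type (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Not(Box<Expr>),
    // Hypothetical node whose arguments a pass wants to leave alone.
    Opaque(Vec<Expr>),
}

pub trait ExprRewriter {
    // Called before any children are visited; `false` skips the subtree.
    fn pre_visit(&mut self, _expr: &Expr) -> bool {
        true
    }
    // Called after all children have been rewritten.
    fn mutate(&mut self, expr: Expr) -> Expr;
}

impl Expr {
    pub fn rewrite<R: ExprRewriter>(self, r: &mut R) -> Expr {
        if !r.pre_visit(&self) {
            // Skipped: neither the children nor this node are mutated.
            return self;
        }
        let expr = match self {
            Expr::Not(e) => Expr::Not(Box::new((*e).rewrite(r))),
            Expr::Opaque(args) => {
                Expr::Opaque(args.into_iter().map(|a| a.rewrite(r)).collect())
            }
            other => other,
        };
        r.mutate(expr)
    }
}

// Hypothetical pass that counts how many nodes it actually mutates.
pub struct UppercaseColumns {
    pub mutations: usize,
}

impl ExprRewriter for UppercaseColumns {
    fn pre_visit(&mut self, expr: &Expr) -> bool {
        // Still a match on the variant, just in a second place.
        !matches!(expr, Expr::Opaque(_))
    }

    fn mutate(&mut self, expr: Expr) -> Expr {
        self.mutations += 1;
        match expr {
            Expr::Column(name) => Expr::Column(name.to_uppercase()),
            other => other,
        }
    }
}
```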
Branch updated from 161122e to 67d35b7.
Diff context for the comment below:

    - let right = optimize_expr(right, schemas)?;
    - match op {
    -     Operator::Eq => match (&left, &right) {
    + impl<'a> ExprRewriter for ConstantRewriter<'a> {
With @houqp's suggestions applied, this rewrite pass looks beautiful in my opinion: it really reads like a rewrite rather than a reconstruction.
Codecov Report

| Coverage diff | master | #9545 | +/- |
|---|---:|---:|---:|
| Coverage | 82.25% | 82.39% | +0.13% |
| Files | 244 | 244 | |
| Lines | 55685 | 56216 | +531 |
| Hits | 45806 | 46317 | +511 |
| Misses | 9879 | 9899 | +20 |
houqp left a comment
Looks great, good work @alamb!
Rationale:
This is part of a larger effort, described in ARROW-11689, to make improvements to the DataFusion query optimizer easier to write and to make the optimizer more efficient.
The idea is that by splitting out the expr traversal code from the code that does the actual rewriting, we will:
... PlanRewriter that doesn't have to clone its input, and can instead take its input by value and consume it.

Changes
This PR introduces an ExpressionRewriter, the mutable counterpart to ExpressionVisitor, and demonstrates its usefulness by using it in the constant folding algorithm.

Note that this also removes a number of copies from the constant folding algorithm.
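A minimal sketch of the visitor/rewriter pairing on a toy Expr, assuming simplified, non-fallible signatures (the real traits return Result and differ in detail). accept borrows the tree, so a visitor such as the hypothetical CountColumns can only inspect it; rewrite consumes the tree, so a rewriter such as the hypothetical QualifyColumns receives owned nodes and moves their contents into the result instead of cloning them.

```rust
// Toy stand-in for DataFusion's Expr (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Not(Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

// Read-only traversal: the visitor only ever sees borrowed nodes.
pub trait ExpressionVisitor {
    fn visit(&mut self, expr: &Expr);
}

// Mutable counterpart: the rewriter takes owned nodes and returns new ones.
pub trait ExprRewriter {
    fn mutate(&mut self, expr: Expr) -> Expr;
}

impl Expr {
    pub fn accept<V: ExpressionVisitor>(&self, v: &mut V) {
        match self {
            Expr::Not(e) => e.accept(v),
            Expr::And(l, r) => {
                l.accept(v);
                r.accept(v);
            }
            Expr::Column(_) => {}
        }
        v.visit(self);
    }

    // Consumes `self`: child nodes are moved into the result, not cloned.
    pub fn rewrite<R: ExprRewriter>(self, r: &mut R) -> Expr {
        let expr = match self {
            Expr::Not(e) => Expr::Not(Box::new((*e).rewrite(r))),
            Expr::And(l, rhs) => Expr::And(
                Box::new((*l).rewrite(r)),
                Box::new((*rhs).rewrite(r)),
            ),
            leaf => leaf,
        };
        r.mutate(expr)
    }
}

// Hypothetical visitor: counts column references.
pub struct CountColumns {
    pub count: usize,
}

impl ExpressionVisitor for CountColumns {
    fn visit(&mut self, expr: &Expr) {
        if let Expr::Column(_) = expr {
            self.count += 1;
        }
    }
}

// Hypothetical rewriter: qualifies every column with a table prefix.
pub struct QualifyColumns {
    pub prefix: String,
}

impl ExprRewriter for QualifyColumns {
    fn mutate(&mut self, expr: Expr) -> Expr {
        match expr {
            Expr::Column(name) => Expr::Column(format!("{}.{}", self.prefix, name)),
            other => other,
        }
    }
}
```

Because rewrite takes self, the caller gives up the original tree; a clone is only needed when the old tree must be kept around.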