-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: RewriteCycle API for short-circuiting optimizer loops #10386
base: main
Are you sure you want to change the base?
feat: RewriteCycle API for short-circuiting optimizer loops #10386
Conversation
cdb389a
to
7330dfe
Compare
ae31462
to
dcc9140
Compare
impl RewriteCycle { | ||
fn new() -> Self { | ||
RewriteCycle { | ||
// use usize::MAX as default to avoid checking null in is_done() comparison |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the reason to avoid checking null because of cost? I'm fine with either way, but curious on the choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think modeling this with num_rewriters: Option<usize>
would make it clearer what is going on and help the compiler check the logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think modeling this with
num_rewriters: Option<usize>
would make it clearer what is going on and help the compiler check the logic
yes I tried to get too fancy here with micro-optimizations when I saw benchmark regressions 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think cycle_length
would also be a more clear name for this variable 🤔 I think it is confusing that it is only initialized after the first iteration
} | ||
|
||
fn is_done(&self) -> bool { | ||
self.consecutive_unchanged_count >= self.num_rewriters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic to stop looping is checking if transformed
meet the num of rewriters, but can we just stop if any of the transformed
is false? I think it is more straightforward and we don't need to have a max_iter
num too.
Idea is something like
loop {
transformed |= rewrite_rule()
if not transformed {
return
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I think there are cases when simplification will do something that allows more constant propagation to proceed so even if transformed=false
is returned another application of a different pass could actually simplify things -- so in this case for example if the const evaluator returns false, simpliy could still return true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so in this case for example if the const evaluator returns false, simpliy could still return true
In this case, transformed is true, so we should run another pass.
I think once we found every rules is not transformed in a pass that means we done the optimization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think once we found every rules is not transformed in a pass that means we done the optimization
Yes, I agree -- I think this is what the counter is attempting to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I think there are cases when simplification will do something that allows more constant propagation to proceed so even if
transformed=false
is returned another application of a different pass could actually simplify things -- so in this case for example if the const evaluator returns false, simpliy could still return true
yes I found this to be true when running tests against this code. an additional constant evaluation produced a new transformation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @erratic-pattern and @jayzhan211
I am going to run the planning benchmarks on this branch using a VM and report performance numbers
impl RewriteCycle { | ||
fn new() -> Self { | ||
RewriteCycle { | ||
// use usize::MAX as default to avoid checking null in is_done() comparison |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think modeling this with num_rewriters: Option<usize>
would make it clearer what is going on and help the compiler check the logic
Here are my measurements on my gcp machine (it does seem to help by 1-2%) Details
|
My second run was consistent: Details
|
Other PR has been merged in: #10358 I think we can merge / rebase this PR now and mark it ready for review |
I will take another look at this and see if I can clean it up a bit more. |
a0b8397
to
4700004
Compare
83de911
to
f8e5c75
Compare
datafusion/common/src/tree_node.rs
Outdated
@@ -503,7 +503,7 @@ pub trait TreeNodeVisitor: Sized { | |||
/// | |||
/// # See Also: | |||
/// * [`TreeNode::visit`] to inspect borrowed `TreeNode`s | |||
pub trait TreeNodeRewriter: Sized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this was a supertrait of Sized
but it can't be if we want trait objects
datafusion/common/src/tree_node.rs
Outdated
@@ -172,7 +172,7 @@ pub trait TreeNode: Sized { | |||
/// TreeNodeRewriter::f_up(ChildNode2) | |||
/// TreeNodeRewriter::f_up(ParentNode) | |||
/// ``` | |||
fn rewrite<R: TreeNodeRewriter<Node = Self>>( | |||
fn rewrite<R: TreeNodeRewriter<Node = Self> + ?Sized>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed in order to be callable with a trait object
We need to figure out how many rewrites are in a cycle. Since there is no Note that the dynamic dispatch API that I deleted from this PR does not have this problem, since we can simply set You can make this code to not have a special first pass if you wanted to make it simpler to understand. If you check This makes the code and the API kind of complicated for something that should be relatively simple. But I found dynamic dispatch and additional conditional checks to have a significant performance regressions when we're operating over a small number of rewriters (only 3 rewriters here), so I wanted to make an API useable with static dispatch. A dynamic dispatch API is more appropriate for the top-level optimizer because:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erratic-pattern -- thank you -- this is really cool
I have some suggestions on testing and not needing to make the parts of the expr simplifier public.
Let me know if you think you have time to do this -- if not maybe I can take a shot at helping
Result, | ||
}; | ||
|
||
/// A builder with methods for executing a "rewrite cycle". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very powerful API -- I can imagine using it for the optimizer rules in general 🤔 -- very neat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very powerful API -- I can imagine using it for the optimizer rules in general 🤔 -- very neat
See 1886002 for an example of adding dynamic dispatch helpers. That code is actually much simpler and can leverage the same internal state logic.
/// individual [TreeNodeRewriter] in the cycle. [RewriteCycleState::rewrite] returns a [RewriteCycleControlFlow] | ||
/// result, indicating whether the loop should break or continue. | ||
/// | ||
/// ```rust |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a neat example, but I think it may overly complicated to try and reuse the simplification logic
What if you wrote your own simple test / example tree node rewriter (especially given this API is generic)
Maybe something that adds 1
to integers but not floats 🤔
I think that would both:
- Avoid having to make stuff in the simplifier pub
- Make it clearer what this API did
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this doctest with a simple example of two rewriters for constant addition/multiplication. I think it is important for the example to show a rewriter that eventually stops making changes, and makes changes that another rewriter depends on.
} | ||
} | ||
|
||
pub type RewriteCycleControlFlow<T> = ControlFlow<Result<T>, T>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you have this API so nicely broken out, I think it would be really neat to add tests for it that demonstrate it works properly in isolation as well as make its api clearer in code
Is this PR ready for the next round of review @erratic-pattern ? Or do you plan to make further changes to it? |
@alamb I will try to update this today or tomorrow. I've been putting this off a bit. |
Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look |
1886002
to
c1ffdfa
Compare
} | ||
|
||
pub type RewriteCycleControlFlow<T> = ControlFlow<Result<T>, T>; | ||
#[cfg(test)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new tests
efb2966
to
3161f14
Compare
Checking this out... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @erratic-pattern and @jayzhan211 -- I think this PR looks really nice.
I have some documentation / naming suggestions, but nothing that I think is required
I also tried out @jayzhan211 's suggestion https://github.com/apache/datafusion/pull/10386/files#r1632502092 and it seems to work well , so I think you should respond to that before we merge this PR
While I didn't see any performance change with this PR, I think it is worthwhile simply for the readability improvements
Performance
I ran the planning benchmark with this brach an in summary there is basically no performance change
logical_plan_tpcds_all 1.00 155.0±1.26ms ? ?/sec 1.00 155.2±1.27ms ? ?/sec
logical_plan_tpch_all 1.00 17.1±0.21ms ? ?/sec 1.00 17.2±0.17ms ? ?/sec
This may be due to the fact we have improved the performance of some of the optimizer passes now so the overhead of running them multiple times is not as great.
Details
++ critcmp main simplifier-static-dispatch
group main simplifier-static-dispatch
----- ---- --------------------------
logical_aggregate_with_join 1.00 1009.0±12.76µs ? ?/sec 1.01 1014.2±13.57µs ? ?/sec
logical_plan_tpcds_all 1.00 155.0±1.26ms ? ?/sec 1.00 155.2±1.27ms ? ?/sec
logical_plan_tpch_all 1.00 17.1±0.21ms ? ?/sec 1.00 17.2±0.17ms ? ?/sec
logical_select_all_from_1000 1.00 19.0±0.12ms ? ?/sec 1.00 19.1±0.17ms ? ?/sec
logical_select_one_from_700 1.01 825.9±8.11µs ? ?/sec 1.00 819.5±8.51µs ? ?/sec
logical_trivial_join_high_numbered_columns 1.00 768.0±11.59µs ? ?/sec 1.06 811.6±41.08µs ? ?/sec
logical_trivial_join_low_numbered_columns 1.00 758.8±23.88µs ? ?/sec 1.00 756.8±10.68µs ? ?/sec
physical_plan_tpcds_all 1.00 1250.3±8.57ms ? ?/sec 1.00 1248.0±4.97ms ? ?/sec
physical_plan_tpch_all 1.00 87.0±1.17ms ? ?/sec 1.00 86.8±1.06ms ? ?/sec
physical_plan_tpch_q1 1.00 4.6±0.06ms ? ?/sec 1.03 4.7±0.07ms ? ?/sec
physical_plan_tpch_q10 1.00 4.1±0.07ms ? ?/sec 1.00 4.1±0.07ms ? ?/sec
physical_plan_tpch_q11 1.00 3.6±0.06ms ? ?/sec 1.01 3.6±0.05ms ? ?/sec
physical_plan_tpch_q12 1.00 2.7±0.03ms ? ?/sec 1.00 2.7±0.04ms ? ?/sec
physical_plan_tpch_q13 1.00 2.0±0.02ms ? ?/sec 1.00 2.0±0.02ms ? ?/sec
physical_plan_tpch_q14 1.00 2.4±0.04ms ? ?/sec 1.00 2.4±0.04ms ? ?/sec
physical_plan_tpch_q16 1.02 3.6±0.07ms ? ?/sec 1.00 3.5±0.06ms ? ?/sec
physical_plan_tpch_q17 1.00 3.4±0.06ms ? ?/sec 1.00 3.4±0.06ms ? ?/sec
physical_plan_tpch_q18 1.01 3.8±0.07ms ? ?/sec 1.00 3.8±0.07ms ? ?/sec
physical_plan_tpch_q19 1.02 5.6±0.08ms ? ?/sec 1.00 5.6±0.07ms ? ?/sec
physical_plan_tpch_q2 1.00 7.4±0.08ms ? ?/sec 1.03 7.6±0.10ms ? ?/sec
physical_plan_tpch_q20 1.02 4.5±0.13ms ? ?/sec 1.00 4.4±0.06ms ? ?/sec
physical_plan_tpch_q21 1.00 6.0±0.08ms ? ?/sec 1.00 6.0±0.10ms ? ?/sec
physical_plan_tpch_q22 1.01 3.3±0.09ms ? ?/sec 1.00 3.2±0.05ms ? ?/sec
physical_plan_tpch_q3 1.00 3.0±0.04ms ? ?/sec 1.01 3.0±0.07ms ? ?/sec
physical_plan_tpch_q4 1.00 2.2±0.02ms ? ?/sec 1.02 2.2±0.04ms ? ?/sec
physical_plan_tpch_q5 1.00 4.2±0.06ms ? ?/sec 1.02 4.3±0.09ms ? ?/sec
physical_plan_tpch_q6 1.00 1449.0±66.64µs ? ?/sec 1.01 1461.8±26.90µs ? ?/sec
physical_plan_tpch_q7 1.00 5.3±0.09ms ? ?/sec 1.01 5.3±0.08ms ? ?/sec
physical_plan_tpch_q8 1.00 6.8±0.09ms ? ?/sec 1.02 6.9±0.09ms ? ?/sec
physical_plan_tpch_q9 1.00 5.2±0.08ms ? ?/sec 1.01 5.2±0.09ms ? ?/sec
physical_select_all_from_1000 1.00 61.3±0.26ms ? ?/sec 1.01 61.7±0.24ms ? ?/sec
physical_select_one_from_700 1.00 3.6±0.04ms ? ?/sec 1.01 3.6±0.03ms ? ?/sec
self.max_cycles | ||
} | ||
|
||
/// Runs a rewrite cycle on the given [TreeNode] using the given callback function to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please define how a "cycle" and "iteration" are related in this function? They are used in the apis below but I didn't see a definition of the terms and how they are related
I think it is that a cycle is composed of several iterations
Alternately, maybe "rewrites" is a better term than "iteration" as this is a "rewrite cycle" 🤔
#[derive(Debug)] | ||
pub struct RewriteCycleState<Node: TreeNode> { | ||
node: Node, | ||
consecutive_unchanged_count: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would help me follow the code better if these fields used the terms "cycle" and "iteration" consistently
Like
consecutive_unchanged_iterations: usize,
iteration_count: usize,
// the number of iterations in each cycle (set after the first cycle)
cycle_length: Option<usize>,
self.consecutive_unchanged_count >= cycle_length | ||
} | ||
|
||
/// Finishes the iteration by consuming the state and returning a [TreeNode] and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this finishes the cycle right (not the iteration?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this finishes the cycle right (not the iteration?)
it finishes all cycles. as in, it runs all of the remaining iterations and terminates the loop. I will reword to clarify
} | ||
// run remaining cycles | ||
match (1..self.max_cycles).try_fold(state, |state, _| f(state)) { | ||
ControlFlow::Break(result) => result?.finish(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this suggestion locally and it seems to work well 👍
ControlFlow::Break(result) => result?.finish(), | |
ControlFlow::Break(result) => return result?.finish(), |
I fixed a clippy lint and merged up from main to get the latest changes into this PR |
Yes that is what I observed as well. I assume the overhead offsets any potential improvement from skipping steps here. I think the value of this API comes from the potential reuseability of the logic for the entire optimizer. It is not ready to do that in its current state, but with some improvements it could be used across the optimizer. |
I will update this PR with review suggestions (maybe) this weekend |
Marking as draft as feedback is incorporated |
d64dbe3
to
f5d6ed4
Compare
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Going to update this soon to get it ready for merging. |
Which issue does this PR close?
Closes #1160.
Rationale for this change
This is a follow up to #10358 with a new approach that should short-circuit earlier. See previous discussion there.
What changes are included in this PR?
Are these changes tested?
yes
Are there any user-facing changes?
no