-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimizer Sanity Checker #24
Optimizer Sanity Checker #24
Conversation
Only includes sort reqs, docs will be added.
let child_sort_req = child_sort_req.as_ref().unwrap(); | ||
if !child | ||
.equivalence_properties() | ||
.ordering_satisfy_requirement(&child_sort_req) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API is the correct API (in response to your email).
let bw = bounded_window_exec("c9", sort_exprs, sort); | ||
let sanity_checker = SanityCheckPlan::new(); | ||
let opts = ConfigOptions::default(); | ||
assert_eq!(sanity_checker.optimize(bw, &opts).is_ok(), true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally you can assert plan of the bw
to make test more explicit. By this way plan will be more visible: Somethinkg like below:
let expected_plan = vec![
// Fill Plan
];
let actual_plan = get_plan_string(&bw);
assert_eq(expected_plan, actual_plan)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added plan checks in the below commit:
nulls_first: false, | ||
}, | ||
)]; | ||
let bw = bounded_window_exec("c9", sort_exprs, source); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing applies here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added plan checks in the below commit:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
// SanityChecker rule checks whether the order and the | ||
// partition requirements of each execution plan is | ||
// satisfied by its children or not. | ||
Arc::new(SanityCheckPlan::new()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can discuss if this rule can be merged with PipelineChecker
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, in addition to doing is_pipeline_friendly
, PipelineChecker
also checks a condition around SymmetricHashJoin
operation. Since PipelineChecker
is more enchanced (to check pipeline friendliness) should we remove the pipeline friendliness check from SanityChecker
? Or to simplify, I can merge PipelineChecker
into SanityChecker
(and remove PipelineChecker
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, we should either
- remove pipeline friendliness check from
SanityChecker
(as this is done in another rule already) - or merge
PipelineChecker
intoSanityChecker
.
However, merging these two rules seems more appropriate. Hence, we should follow this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are merged in the below commit:
plan: Arc<dyn ExecutionPlan>, | ||
) -> Result<Transformed<Arc<dyn ExecutionPlan>>> { | ||
if !plan.execution_mode().pipeline_friendly() { | ||
return plan_err!("Plan {:?} is not pipeline friendly.", plan); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pipeline friendly means the plan can be executed in bounded memory even if the source is unbounded, or the plan has a bounded source. If we want to apply this rule before PipelineChecker
, we need to remove this check, or alternatively we can move this rule after PipelineChecker
.
@mustafasrepo, about the testing strategy: So far I have picked one scenario for order requirements (using Should we extend this to cover all
|
I think, existing test coverage is good. We do not need to use all operators in the tests. However, all the operators in the tests are single child operators. We can also add a test case for an operator with multiple children (such as |
Also, I ran the CI tests. It seems that we have some clippy errors and test errors (I think they are related to pipeline error raised in the Sanity Checker Rule. Since error message is changed expected error message isn't same anymore. Once you merge PipelineChecker rule and Sanity Checker rule these errors will be fixed as far as I can tell). Also, CI tests has additional check whether code follows guidelines. For this check, we have the script If you encounter any issues during these stages, please notify me. I can help you out. |
I have added test using |
Thanks @mustafasrepo, I have fixed clippy errors and some other test cases. However, there is one test case that does not finish: |
These problems do not seem to be related with your work. Hence, I also debugged them. It seems that we have two problems
I sent a commit to fix these problems. Also there are some other tests failing with this check. After examining them, I have found that rule actually help us find invalid plans. In the commit I also fixed them. However, there are still some clippy errors (If you do not have those errors. They come with latest rust updates. You can do |
@@ -675,7 +675,7 @@ impl ExecutionPlan for AggregateExec { | |||
vec![Distribution::UnspecifiedDistribution] | |||
} | |||
AggregateMode::FinalPartitioned | AggregateMode::SinglePartitioned => { | |||
vec![Distribution::HashPartitioned(self.output_group_expr())] | |||
vec![Distribution::HashPartitioned(self.group_by.input_exprs())] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have required input expressions. Rule helped us to catch this bug.
Arc::new(MemTable::try_new(Arc::new(table.schema.clone()), vec![])?), | ||
Arc::new(MemTable::try_new( | ||
Arc::new(table.schema.clone()), | ||
vec![vec![]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the source, we were generating empty partitions (0 partition). This is invalid. Rule helped us to catch this bug.
I have merged SELECT 1 num UNION ALL SELECT 2 num ORDER BY num; This yields the physical plan:
Order requirement for the |
@yfy- It seems that this failure is another case this check helped us catch a bug. I will fix this problem. I will let you know once this problem is solved. |
@yfy- I resolved the problems mentioned above in the PR. We may either merge that PR into your PR, or continue as separate PRs. (We will decide after internal discussion.) With the changes in that PR, all test pass in my local. Hence, for now we can call this PR complete in terms of functionality. However, I will do one last review of this PR tomorrow to make sure we are not missing anything. Kind regards |
@yfy- this PR is ready to go upstream. 🎉 🎉 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor comments to address, but LGTM in general. Great work @yfy-, please coordinate with @mustafasrepo to send/merge upstream after addressing the comments.
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
@yfy- final reviews are addressed now. You can open this PR to upstream repo 🚀 🚀 . |
0c2eb45
into
synnada-ai:feature/optimizer_sanity_checker
…calculations, limit/order/distinct (apache#11756) * Fix unparser derived table with columns include calculations, limit/order/distinct (#24) * compare format output to make sure the two level of projects match * add method to find inner projection that could be nested under limit/order/distinct * use format! for matching in unparser sort optimization too * refactor * use to_string and also put comments in * clippy * fix unparser derived table contains cast (#25) * fix unparser derived table contains cast * remove dbg
Rationale for this change
We extend the existing
PipelineChecker
with additional checks of order and distribution requirements. In a physical plan each plan should be pipeline friendly (previously checked byPipelineChecker
) and each child plan's output order and partitioning should satisfy the parent's respective requirements. We combine these checks into theSanityCheckPlan
proposed with this PR.With the
SanityCheckPlan
, as the optimizer steps change and grow, we ensure that the order and distribution requirements of the final phsyical plan are always satisfied so that it can yield correct results.What changes are included in this PR?
SanityCheckPlan
step as the last step of the physical plan optimizations.SanityCheckPlan
contains the formerPipelineChecker
, that's whyPipelineChecker
is deleted.Are these changes tested?
We use the existing test cases of the former
PipelineChecker
. In addition following cases are added:Using
BoundedWindowAggExec
2 cases: First with the child order requirement is satisfied and another one that is not satisifed.
Using
GlobalLimitExec
2 cases: First with the child distribution requirement is satisfied and another one that is not satisifed.
Using
LocalLimitExec
We check when there are no requirements at all our check passes.
Using
SortMergeJoinExec
3 cases: First case with both children satisify both requirements. Second case, where the second child does not satisfy the order requirement. Finally, a case where the second child does not satisfy distribution requirements.