Fix prune output rules for intersect and except nodes #21343
feilong-liu merged 1 commit into prestodb:master
Nit: the method is called rewriteSetOperationVariableMapping, so instead of changing the method so that it does not prune, can we inline it and exit early where it should be false?
rewriteSetOperationVariableMapping does two things: 1) it prunes variables, and 2) it rewrites Map<Variable, List<Variable>> into ListMultimap<Variable, Variable>. The intersect and except nodes do not need to be pruned. Ideally we would not even need to rewrite to ListMultimap here, but that would require changing rewriteSetOperationSubPlans as well, which seems like too large a change for this fix. After some thought, I found the current approach to need the least code change and to state the intention most directly, i.e. skip pruning for these two nodes.
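To make the two responsibilities concrete, here is a minimal sketch of a flag-controlled mapping rewrite. It is hypothetical and not Presto's code: variables are plain strings, a LinkedHashMap<String, List<String>> stands in for Guava's ListMultimap, and the method name rewriteMapping is made up; only the flag name follows the pruneUnreferencedColumns suggestion from review.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for rewriteSetOperationVariableMapping.
// Strings model variables; LinkedHashMap<String, List<String>> models ListMultimap.
final class SetOperationMappingSketch
{
    private SetOperationMappingSketch() {}

    // Step 1 (optional): drop outputs that the parent does not reference.
    // Step 2: copy each output -> per-source inputs entry into the new mapping.
    static Map<String, List<String>> rewriteMapping(
            Map<String, List<String>> outputToInputs,
            Set<String> referencedByParent,
            boolean pruneUnreferencedColumns)
    {
        Map<String, List<String>> rewritten = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : outputToInputs.entrySet()) {
            // With pruning disabled every output is kept, which is what intersect/except need.
            boolean keep = !pruneUnreferencedColumns || referencedByParent.contains(entry.getKey());
            if (keep) {
                rewritten.put(entry.getKey(), List.copyOf(entry.getValue()));
            }
        }
        return rewritten;
    }
}
```

In this sketch a union would call rewriteMapping(..., true), while intersect and except would pass false so that every output column survives.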
Another way we could do this:
```java
@Override
public PlanNode visitIntersect(IntersectNode node, RewriteContext<Set<VariableReferenceExpression>> context)
{
    // Treat every output of the intersect node as required, in addition to what the parent needs,
    // so that no column participating in the set comparison gets pruned.
    Set<VariableReferenceExpression> expectedInputs = new HashSet<>(context.get());
    expectedInputs.addAll(node.getOutputVariables());
    // Requires rewriteSetOperationVariableMapping to be refactored to accept the Set directly.
    ListMultimap<VariableReferenceExpression, VariableReferenceExpression> rewrittenVariableMapping = rewriteSetOperationVariableMapping(node, expectedInputs);
    ImmutableList<PlanNode> rewrittenSubPlans = rewriteSetOperationSubPlans(node, context, rewrittenVariableMapping);
    return new IntersectNode(node.getSourceLocation(), node.getId(), rewrittenSubPlans, ImmutableList.copyOf(rewrittenVariableMapping.keySet()), fromListMultimap(rewrittenVariableMapping));
}
```
Basically, add the intersect node's outputs to expectedInputs and refactor rewriteSetOperationVariableMapping to take a Set as input. I think this is logically more in line with how this optimizer works.
Not blocking the current approach; just adding a discussion point.
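The alternative can be sketched under the same simplifying assumptions (string variables, a LinkedHashMap in place of ListMultimap, and hypothetical helper names expectedInputs and prune): widening the expected inputs with all of the node's outputs makes an unconditional pruning step keep every column.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: instead of a "do not prune" flag, enlarge the set of
// expected inputs so that the regular pruning step keeps all intersect outputs.
final class IntersectExpectedInputsSketch
{
    private IntersectExpectedInputsSketch() {}

    // Columns required by the parent plus every output of the set operation.
    static Set<String> expectedInputs(Set<String> requiredByParent, List<String> nodeOutputs)
    {
        Set<String> expected = new HashSet<>(requiredByParent);
        expected.addAll(nodeOutputs);
        return expected;
    }

    // Always prunes: keeps only outputs contained in expectedInputs.
    static Map<String, List<String>> prune(Map<String, List<String>> outputToInputs, Set<String> expectedInputs)
    {
        Map<String, List<String>> rewritten = new LinkedHashMap<>();
        outputToInputs.forEach((output, inputs) -> {
            if (expectedInputs.contains(output)) {
                rewritten.put(output, List.copyOf(inputs));
            }
        });
        return rewritten;
    }
}
```

The design trade-off is that pruning logic stays unconditional and the intersect-specific knowledge lives in the visitor, at the cost of changing the helper's signature.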
Force-pushed from aab2119 to a41f4a9.
nit: perhaps call the argument pruneUnreferencedColumns
I guess this bug could lead to correctness issues if we incorrectly prune columns (and thus may collapse rows that aren't duplicates in the extra column but agree on all other columns).
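A small illustration of that failure mode, using in-memory rows rather than Presto plan nodes (the class and helper names are hypothetical): if nation contributes the row (1, 10) and orders contributes (1, 20), intersecting on (k1, k2) and then keeping k1 yields no rows, whereas pruning k2 before the intersect yields k1 = 1.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Rows are modeled as List<Integer>; this only illustrates set semantics.
final class PrunedIntersectDemo
{
    private PrunedIntersectDemo() {}

    // INTERSECT (distinct) over whole rows.
    static Set<List<Integer>> intersect(List<List<Integer>> left, List<List<Integer>> right)
    {
        Set<List<Integer>> result = new LinkedHashSet<>(left); // distinct left rows
        result.retainAll(new LinkedHashSet<>(right));          // keep rows also present on the right
        return result;
    }

    // Keep only the first column (k1) of each row.
    static List<List<Integer>> projectFirstColumn(Iterable<List<Integer>> rows)
    {
        List<List<Integer>> projected = new ArrayList<>();
        for (List<Integer> row : rows) {
            projected.add(List.of(row.get(0)));
        }
        return projected;
    }
}
```

Computing projectFirstColumn(intersect(nation, orders)) gives the correct empty result, while intersect(projectFirstColumn(nation), projectFirstColumn(orders)) wrongly returns the single row (1).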
Force-pushed from a41f4a9 to 3898753.
We already have test cases for them,
but do we have a test that produced wrong results before and is now fixed? These may have been accidentally correct.
This will not fail, as currently the earliest run of
I actually wonder whether we should change intersect and except to use join/left join; aggregations are not very good for optimizations, and joins are better.
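For comparison, the join-based formulation can be sketched as a hash semi-join (INTERSECT) or hash anti-join (EXCEPT) on all columns, followed by DISTINCT. This is a hypothetical in-memory illustration, not a proposed planner rule; a real rewrite would also need null-safe equality (IS NOT DISTINCT FROM) on every column.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of set operations expressed as joins on all columns.
final class SetOpAsJoinSketch
{
    private SetOpAsJoinSketch() {}

    // INTERSECT: probe rows that have a match on the build side (semi-join), then DISTINCT.
    static Set<List<Integer>> intersect(List<List<Integer>> probe, List<List<Integer>> build)
    {
        Set<List<Integer>> buildSide = new HashSet<>(build);
        Set<List<Integer>> result = new LinkedHashSet<>(); // the set provides DISTINCT
        for (List<Integer> row : probe) {
            if (buildSide.contains(row)) {
                result.add(row);
            }
        }
        return result;
    }

    // EXCEPT: probe rows with no match on the build side (anti-join), then DISTINCT.
    static Set<List<Integer>> except(List<List<Integer>> probe, List<List<Integer>> build)
    {
        Set<List<Integer>> buildSide = new HashSet<>(build);
        Set<List<Integer>> result = new LinkedHashSet<>();
        for (List<Integer> row : probe) {
            if (!buildSide.contains(row)) {
                result.add(row);
            }
        }
        return result;
    }
}
```

Note that the join key is still the full row, so the same rule applies: no column of the set operation may be pruned before the join.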
ajaygeorge left a comment
Stamping since it is already reviewed.
We should not prune the output of intersect and except nodes.
Force-pushed from 3898753 to f20be39.
Description
Do not prune the output of Intersect and Except nodes in the prune unused output rule.
Motivation and Context
In Presto, we implement Intersect and Except nodes as union + aggregation in the
ImplementIntersectAndExceptAsUnion rule. For example, the query
SELECT k1 FROM (SELECT nationkey as k1, regionkey as k2 FROM nation intersect SELECT orderkey as k1, custkey as k2 FROM orders) is implemented as a union of aggregations over nation and orders, grouped by k1 and k2, with the counts compared afterwards. The example plan is as follows:
However, the current prune output rule prunes k2 from the output of the intersect node because k2 is not in the final output, which leads to incorrect results: rows that match on k1 but differ on k2 are treated as common to both inputs.
Fortunately, we currently run the prune output rule only after the
ImplementIntersectAndExceptAsUnion rule, by which point intersect and except nodes no longer exist, so we have not hit this bug.
Impact
Fix a potential correctness issue.
Test Plan
Unit test