Fix localexchange properties when spill is enabled#23922
Fix localexchange properties when spill is enabled#23922feilong-liu merged 1 commit intoprestodb:masterfrom
Conversation
78ce74c to
b1adea0
Compare
| List<LocalProperty<VariableReferenceExpression>> localProperties = this.localProperties; | ||
| if (unordered) { | ||
| localProperties = filteredCopy(this.localProperties, property -> !property.isOrderSensitive()); | ||
| ImmutableList.Builder<LocalProperty<VariableReferenceExpression>> newLocalProperties = ImmutableList.builder(); |
There was a problem hiding this comment.
Native side doesn't need any change on the query shape if the spill is enabled. So can we fix in that way? I assume the query shape without spill enabled is correct for native but only when join spill session property is enabled? Thanks!
There was a problem hiding this comment.
To use the same plan, join spill algorithm needs to preserver order i.e. output is the same as when no spill.
| if (unordered) { | ||
| localProperties = filteredCopy(this.localProperties, property -> !property.isOrderSensitive()); | ||
| ImmutableList.Builder<LocalProperty<VariableReferenceExpression>> newLocalProperties = ImmutableList.builder(); | ||
| for (LocalProperty<VariableReferenceExpression> property : this.localProperties) { |
There was a problem hiding this comment.
How do we know G(unique) is the first in this case? and are we able to guarantee that?
There was a problem hiding this comment.
No, we do not need to guarantee this, as which one is earlier depends on other code which produce this property (in this case it's the property generation code for AssignUniqueId) and not related to this part. This code only needs to make sure that properties after removal is still valid
| if (!property.isOrderSensitive()) { | ||
| newLocalProperties.add(property); | ||
| } | ||
| else { | ||
| break; | ||
| } |
There was a problem hiding this comment.
I don't know what these props mean.
Here we are discarding all properties after and including the 1st order sensitive one.
Is this the intent?
There was a problem hiding this comment.
Yes, this is the intent. It also takes me some time to understand this part of logic, the property is hierarchical and if one is invalid, all following will also be invalid
b1adea0 to
0234230
Compare
rschlussel
left a comment
There was a problem hiding this comment.
Can we add a unit test also? Maybe to TestLocalProperties
Also, for anyone else who is confused, want to share a more detailed explanation that I got from conversation with @feilong-liu:
Without spill, we have the properties G(unique), C(custkey), C(totalprice) where unique is the unique key generated by the AssignUniqueId operator. That means that we are grouped by the unique key, and within each group, custkey and totalprice are constant. (if two rows have the same value for unique, they will have the same value for custkey and total price).
when spill is enabled, we assume that the order of rows is not maintained, so we mark it as unordered (https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/PropertyDerivations.java#L394). When this field is set, we were removing any order sensitive properties from the list. In this case G(unique) is considered order sensitive, so we were removing that. However, if you remove the G(unique) property, but keep the C(custkey) and C(totalprice) properties, the planner will think that custkey and totalprice are always constant, which is not true. And that's what resulted in the incorrect plan.
Instead, you need to remove all properties following the order sensitive one because the properties are hierarchical, so the following properties are defined within the context of the previous ones.
0234230 to
4ede454
Compare
Description
This is discovered by a bug, which can be replayed with the following query:
When executing this query in Presto CPP (i.e. native execution is true) and with
set session spill_enabled=true, it will produce a wrong query plan, where the Aggregation on the top is a streaming aggregation which is wrong.After fix with this PR
After debugging, it's due to the fact that when spill is enabled, the order sensitive local properties will be removed. However, the local properties has hierarchy and cannot be simply removed. In this example, the properties are G(unique), C(custkey), C(totalprice) where unique is the unique key generated by the AssignUniqueId operator. Current logic will simply remove the G(unique) property and keeping the rest of the two. This is not right as the rest will not hold true after the first is removed.
This PR fix it by discarding the rest of properties if any previous properties are removed.
Motivation and Context
Bug fix
Impact
Fix potential incorrect plan
Test Plan
Add unit tests
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.