-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Improve explain by ordering predicates #11903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
65e4ce4 to
d92de39
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a test to com.facebook.presto.tests.TestQueryPlanDeterminism that would fail without your change here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kokosing Grzegorz, this functionality requires partitioned tables. TestQueryPlanDeterminism doesn't have any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't TPCH "partitioned"? See orders.orderstatus, part.container and part.type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kokosing That's right. I didn't notice. I couldn't reproduce the issue using these tables though. In general, this has been tricky to reproduce. One repro we have involves a complicated query and a table with 7 partition keys.
d92de39 to
9ffd10b
Compare
de56f0e to
cf20092
Compare
|
@bmc99 Mithun, the code changes look good. Let's improve the commit message a bit. How about this: |
cf20092 to
f1d3d25
Compare
|
I have updated the commit message. |
Partition columns is a connector-specific concept (in particular, just Hive supports it). This engine-level change applies to any column that was pushed down into the connector during query planning. |
|
@martint Martin, thanks for clarifying. What would be a good way to describe this change then? Will this work? |
kokosing
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
% Martin's comments
presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like fixing the symptoms. Do you know what is the root cause of non-determinism?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have considered fixing the predicate order at the place they are added as one of the option. But, I did not see a need for the the predicates to be in order anywhere else except in the explain plan. So, I have decided to fix it a that layer.
Are there any requirements/need to have the predicates ordered throughout? Can you please elaborate as to why you think there is a root issue other than explain plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TableScanNode#getCurrentConstraint is used in multiple places in the code. It could be that predicate order matters in those places (e.g: when converting predicate back to filter expression). There are quite sophisticated methods that process TupleDomain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sopel39 Karol, it feels like exploring what other places might need fixing is beyond the scope of this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's just TupleDomain itself. For instance I found TupleDomain#simplify which uses nondeterministic HashMap collector. Another one is: TupleDomain#intersect (uses HashMap). Another one: TupleDomain#columnWiseUnion (uses HashMap). Another one: TupleDomain#transform. Instead of HashMap we could use LinkedHashMap that preserves order.
There are probably few such places to be fixed. Then explicit sorting in explain should not be needed and plan itself would be deterministic (not just explain).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change doesn't improve plan stability (in actual plan partition columns are still in unspecified order, right?), but explain stability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the change stabilizes the explain; Plan stability is not an issue as it relates to unspecified order of the predicates so the verbiage can be changed to use the word explain instead of plan.
f1d3d25 to
575b709
Compare
presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java
Outdated
Show resolved
Hide resolved
To improve explain plan stability (e.g. make sure explain plans for the same query match) print pushed-down predicates in alphabetical order.
575b709 to
e3d6e5a
Compare
martint
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Karol that this is just fixing a symptom. We should fix the underlying cause, instead.
|
@mbasmanova I think in Presto we rely that immutable collections preserve insertion order quite a lot as it allows for plan determinism. That's why we use Guava immutable collections: |
|
@sopel39 Karol, I didn't know that. Thanks for explaining. |
|
Superseded by #12332 |
Fixes #11444