@bmc99 bmc99 commented Nov 12, 2018

Fixes #11444

@bmc99 bmc99 force-pushed the stable_explain_plan branch 3 times, most recently from 65e4ce4 to d92de39 on November 13, 2018 01:44
Contributor

Can you please add a test to com.facebook.presto.tests.TestQueryPlanDeterminism that would fail without your change here?

Contributor

@kokosing Grzegorz, this functionality requires partitioned tables. TestQueryPlanDeterminism doesn't have any.

Contributor

Isn't TPCH "partitioned"? See orders.orderstatus, part.container and part.type

Contributor

@kokosing That's right. I didn't notice. I couldn't reproduce the issue using these tables though. In general, this has been tricky to reproduce. One repro we have involves a complicated query and a table with 7 partition keys.

@bmc99 bmc99 force-pushed the stable_explain_plan branch from d92de39 to 9ffd10b on November 13, 2018 06:56
@bmc99 bmc99 changed the title from "[WIP] Make query plan stable" to "Improve query plan by ordering partition columns" on Nov 13, 2018
@bmc99 bmc99 force-pushed the stable_explain_plan branch 3 times, most recently from de56f0e to cf20092 on November 13, 2018 07:58
@mbasmanova
Contributor

@bmc99 Mithun, the code changes look good. Let's improve the commit message a bit. How about this:

Print partition columns in order in explain plans

To improve plan stability (e.g. make sure explain plans for the same query match) print 
partition columns for a table in alphabetical order.

@bmc99 bmc99 force-pushed the stable_explain_plan branch from cf20092 to f1d3d25 on November 13, 2018 16:31
@bmc99
Author

bmc99 commented Nov 13, 2018

I have updated the commit message.

@mbasmanova mbasmanova self-assigned this Nov 13, 2018
@martint
Contributor

martint commented Nov 13, 2018

"print partition columns for a table in alphabetical order."

Partition columns are a connector-specific concept (in particular, only Hive supports them). This engine-level change applies to any column that was pushed down into the connector during query planning.

@mbasmanova
Contributor

@martint Martin, thanks for clarifying. What would be a good way to describe this change then? Will this work?

Print pushed-down predicates in order in explain plans

To improve plan stability (e.g. make sure explain plans for the same query match) print
pushed-down predicates in alphabetical order.

Contributor

@kokosing kokosing left a comment

Looks good, modulo Martin's comments.

Contributor

This seems like fixing the symptoms. Do you know what the root cause of the non-determinism is?

Author

I considered fixing the predicate order at the place where the predicates are added as one of the options. But I did not see a need for the predicates to be in order anywhere except in the explain plan, so I decided to fix it at that layer.

Is there any requirement or need to have the predicates ordered throughout? Can you please elaborate on why you think there is a root issue other than the explain plan?

Contributor

TableScanNode#getCurrentConstraint is used in multiple places in the code. It could be that predicate order matters in those places (e.g. when converting a predicate back to a filter expression). There are quite sophisticated methods that process TupleDomain.

Contributor

@sopel39 Karol, it feels like exploring what other places might need fixing is beyond the scope of this change.

Contributor

Maybe it's just TupleDomain itself. For instance, I found TupleDomain#simplify, which uses a nondeterministic HashMap collector. Others are TupleDomain#intersect (uses HashMap), TupleDomain#columnWiseUnion (uses HashMap), and TupleDomain#transform. Instead of HashMap we could use LinkedHashMap, which preserves insertion order.

There are probably a few such places to be fixed. Then explicit sorting in explain should not be needed, and the plan itself would be deterministic (not just explain).
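To illustrate the suggestion above, here is a minimal sketch of collecting a stream into a LinkedHashMap so that iteration order follows encounter order. It does not use Presto's TupleDomain; the class name, method name, and column names are hypothetical and chosen only for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class OrderPreservingCollect {
    // Hypothetical helper: maps each column name to its length.
    // The two-argument Collectors.toMap gives no guarantee about the
    // iteration order of the map it returns. Passing LinkedHashMap::new
    // as the map supplier makes iteration follow encounter order.
    static Map<String, Integer> lengthsInEncounterOrder(List<String> columns) {
        return columns.stream().collect(Collectors.toMap(
                Function.identity(),
                String::length,
                (left, right) -> { throw new IllegalStateException("duplicate column"); },
                LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> columns = List.of("orderstatus", "container", "type");
        // prints [orderstatus, container, type]
        System.out.println(lengthsInEncounterOrder(columns).keySet());
    }
}
```

Whether this alone would make the actual plan deterministic depends on every map along the path preserving order, which this sketch does not attempt to show.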

Contributor

@sopel39 sopel39 Nov 15, 2018

This change doesn't improve plan stability (in the actual plan, partition columns are still in unspecified order, right?); it improves explain stability.

Author

I agree that the change stabilizes the explain output. Plan stability is not an issue as far as the unspecified order of the predicates is concerned, so the wording can be changed to use the word "explain" instead of "plan".

@bmc99 bmc99 changed the title from "Improve query plan by ordering partition columns" to "Improve explain by ordering predicates" on Nov 16, 2018
@bmc99 bmc99 force-pushed the stable_explain_plan branch from f1d3d25 to 575b709 on November 16, 2018 10:07
To improve explain plan stability (e.g. make sure explain plans for the same query match) print
pushed-down predicates in alphabetical order.
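
The approach described in this commit message can be sketched roughly as follows. This is not the code from the pull request: it stands in a plain Map<String, String> (column name to a textual domain) for Presto's real types, and the class name, method name, separator, and sample values are all hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class SortedPredicatePrinter {
    // Hypothetical formatter: sorts entries by column name before printing,
    // so the text does not depend on the iteration order of the input map.
    static String format(Map<String, String> pushedDownPredicates) {
        return pushedDownPredicates.entrySet().stream()
                .sorted(Map.Entry.comparingByKey())
                .map(entry -> entry.getKey() + " :: " + entry.getValue())
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        // Same predicates, inserted in different orders into different map types.
        Map<String, String> first = new LinkedHashMap<>();
        first.put("region", "[us]");
        first.put("ds", "[2018-11-12]");

        Map<String, String> second = new HashMap<>();
        second.put("ds", "[2018-11-12]");
        second.put("region", "[us]");

        // Both lines print: ds :: [2018-11-12], region :: [us]
        System.out.println(format(first));
        System.out.println(format(second));
    }
}
```

As the reviewers point out, sorting at print time only stabilizes the printed text; the order inside the plan itself is left as it was.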
@bmc99 bmc99 force-pushed the stable_explain_plan branch from 575b709 to e3d6e5a on November 16, 2018 18:33
Contributor

@martint martint left a comment

I agree with Karol that this is just fixing a symptom. We should fix the underlying cause, instead.

@mbasmanova
Contributor

@martint @sopel39 Just for my understanding, isn't it fragile to rely on a map to be in a particular order? The interface itself doesn't seem to guarantee that.

@sopel39
Contributor

sopel39 commented Nov 16, 2018

@mbasmanova I think in Presto we rely quite a lot on immutable collections preserving insertion order, as that allows for plan determinism. That's why we use Guava immutable collections: "Except for sorted collections, order is preserved from construction time." (https://github.com/google/guava/wiki/ImmutableCollectionsExplained)
I've fixed equality inference with that assumption too: a23c88c

@mbasmanova
Contributor

@sopel39 Karol, I didn't know that. Thanks for explaining.

@mbasmanova
Contributor

mbasmanova commented Feb 14, 2019

Superseded by #12332

@mbasmanova mbasmanova closed this Feb 14, 2019

7 participants