Add support for preferred insert or create table layouts by sopel39 · Pull Request #2358 · trinodb/trino

sopel39 · 2019-12-28T11:52:29Z

No description provided.

sopel39 · 2019-12-31T11:02:05Z

this is ready to go

findepi · 2020-01-01T19:58:18Z

presto-hive/src/main/java/io/prestosql/plugin/hive/HiveMetadata.java

Apply partitioning by partition columns also when the table is bucketed.

When the table is bucketed, data partitioning on partition columns should be required, but this is outside the scope of this PR.
For example,

Currently: if you have 10 buckets per partition, then we partition data by bucket columns. This means only 10 workers will write data for all partitions.

Better: if you partition data by bucket columns AND partition columns, then still only 10 workers will write data for each partition. However, the set of workers differs between partitions, which improves parallelism.
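The difference between the two strategies can be made concrete with a small simulation. This is a hypothetical sketch, not Trino's actual hashing: it assumes 20 partitions, 10 buckets per partition and 100 workers, and routes each (partition, bucket) pair to worker `floorMod(Objects.hash(...), workerCount)`. Hashing on the bucket column alone reuses the same 10 workers for every partition; hashing on both columns still uses 10 workers per partition, but different partitions land on different workers.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical model: a row with key (partition, bucket) is written by worker
// floorMod(Objects.hash(key...), workerCount). Not Trino's real hash function.
final class WriterSpread {
    private WriterSpread() {}

    // Hash on the bucket column only (the "currently" case).
    static int workerByBucket(int bucket, int workerCount) {
        return Math.floorMod(Objects.hash(bucket), workerCount);
    }

    // Hash on the partition AND bucket columns (the "better" case).
    static int workerByPartitionAndBucket(int partition, int bucket, int workerCount) {
        return Math.floorMod(Objects.hash(partition, bucket), workerCount);
    }

    // Distinct workers that write the rows of a single partition.
    static Set<Integer> workersForPartition(boolean hashPartition, int partition, int buckets, int workerCount) {
        Set<Integer> workers = new HashSet<>();
        for (int bucket = 0; bucket < buckets; bucket++) {
            workers.add(hashPartition
                    ? workerByPartitionAndBucket(partition, bucket, workerCount)
                    : workerByBucket(bucket, workerCount));
        }
        return workers;
    }

    // Distinct workers that write any data, across all partitions.
    static int totalWorkers(boolean hashPartition, int partitions, int buckets, int workerCount) {
        Set<Integer> workers = new HashSet<>();
        for (int partition = 0; partition < partitions; partition++) {
            workers.addAll(workersForPartition(hashPartition, partition, buckets, workerCount));
        }
        return workers.size();
    }

    public static void main(String[] args) {
        int partitions = 20;
        int buckets = 10;
        int workerCount = 100;
        System.out.println("bucket columns only:        " + totalWorkers(false, partitions, buckets, workerCount));
        System.out.println("bucket + partition columns: " + totalWorkers(true, partitions, buckets, workerCount));
    }
}
```

With these numbers the bucket-only strategy uses exactly 10 workers in total (for a single `Integer`, `Objects.hash(bucket)` is `31 + bucket`), while the combined strategy still uses 10 workers per partition but more than 10 workers overall.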

findepi · 2020-01-01T20:00:26Z

This adds use-preferred-write-partitioning configuration / use_preferred_write_partitioning session toggle.
Shouldn't we have a Hive- (catalog-) specific toggle instead?

sopel39 · 2020-01-02T11:26:22Z

> Shouldn't we have a Hive- (catalog-) specific toggle instead?

Some more context on the engine vs. connector approach. While a connector-specific toggle could be a stop-gap here, I don't think it's a good approach. This PR opens a way for the CBO to decide whether to use the preferred insert layout. For example, there could be a very basic CBO rule that chooses the preferred partitioning if the number of groups is greater than some constant. I think such a rule would work well enough in practice, since insert queries are often simpler, so their stats are easier to estimate. This PR is a step toward it. I don't like having too many manual toggles.
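The "very basic CBO rule" described above could look roughly like the following. This is a hypothetical sketch: the class name, the method name, the threshold value, and the choice to keep the default (non-repartitioned) plan when no estimate is available are all assumptions for illustration, not part of this PR or of Trino's API.

```java
import java.util.OptionalDouble;

// Hypothetical rule: prefer the connector's write partitioning when the
// estimated number of groups exceeds a fixed constant. Not a Trino API.
final class PreferredWritePartitioningRule {
    private final double minimumEstimatedGroups;

    PreferredWritePartitioningRule(double minimumEstimatedGroups) {
        if (!(minimumEstimatedGroups >= 0)) {
            throw new IllegalArgumentException("minimumEstimatedGroups must be non-negative");
        }
        this.minimumEstimatedGroups = minimumEstimatedGroups;
    }

    // estimatedGroups: estimated number of distinct values of the preferred
    // partitioning columns (e.g. partitions written by the INSERT), empty if unknown.
    boolean usePreferredPartitioning(OptionalDouble estimatedGroups) {
        if (!estimatedGroups.isPresent()) {
            // Assumption: without statistics, keep the default plan.
            return false;
        }
        // A NaN estimate also compares as false here, so it behaves like "unknown".
        return estimatedGroups.getAsDouble() > minimumEstimatedGroups;
    }

    public static void main(String[] args) {
        PreferredWritePartitioningRule rule = new PreferredWritePartitioningRule(50);
        System.out.println(rule.usePreferredPartitioning(OptionalDouble.of(1_000))); // true
        System.out.println(rule.usePreferredPartitioning(OptionalDouble.of(5)));     // false
        System.out.println(rule.usePreferredPartitioning(OptionalDouble.empty()));   // false
    }
}
```

Keeping the decision in a single engine-side rule, rather than in per-catalog toggles, matches the argument above: the same rule can serve any connector that exposes a preferred layout.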

The SPI needs to change regardless of the approach, because connectors need to be able to return a FIXED_HASH ConnectorPartitioningHandle. This is a system partitioning handle, and reimplementing it in connectors (there are various methods associated with it) doesn't make much sense.

sopel39 · 2020-01-08T09:12:37Z

presto-main/src/main/java/io/prestosql/sql/planner/optimizations/AddLocalExchanges.java

@dain mentioned there was some PR to add support for arbitrary partitioning in local exchanges.

martint · 2020-01-27T14:44:01Z

presto-main/src/main/java/io/prestosql/metadata/NewTableLayout.java

Not sure I understand what the commit message is referring to. What's "evenly partitioning"?

martint · 2020-01-27T14:49:53Z

presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorNewTableLayout.java

This is strange. Why would partitioning be missing when partitioningColumns is present?

If you just care that data is partitioned on columns, but don't care how exactly, then you don't need to specify partitioning

martint · 2020-01-27T15:02:48Z

presto-main/src/main/java/io/prestosql/sql/planner/LogicalPlanner.java

I find the fact that we're looking at getLayout().getPartitioning() to decide whether to call getPartitioning() very confusing, and it's probably a sign of an API design issue.

It'd be cleaner to make getPartitioning() in NewTableLayout return an Optional and just do:

    PartitioningHandle partitioningHandle = writeTableLayout.get()
            .getPartitioning()
            .orElse(FIXED_HASH_DISTRIBUTION);
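A minimal sketch of the API shape discussed in these two threads, assuming a simplified model in which a partitioning handle is just a `String`: the layout carries an `Optional` partitioning, a connector that only cares that data is partitioned on certain columns leaves it empty, and the planner falls back to `FIXED_HASH_DISTRIBUTION` with a single `orElse`. `LayoutSketch`, its factory methods and the `"hive-bucketing"` handle are hypothetical names, not the real `NewTableLayout` or `ConnectorNewTableLayout` classes.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Simplified, hypothetical model; the real classes and handle types are richer.
final class LayoutSketch {
    // Stand-in for the system FIXED_HASH_DISTRIBUTION partitioning handle.
    static final String FIXED_HASH_DISTRIBUTION = "FIXED_HASH_DISTRIBUTION";

    static final class NewTableLayout {
        private final Optional<String> partitioning;
        private final List<String> partitionColumns;

        private NewTableLayout(Optional<String> partitioning, List<String> partitionColumns) {
            this.partitioning = Objects.requireNonNull(partitioning, "partitioning is null");
            this.partitionColumns = List.copyOf(partitionColumns);
        }

        // The connector wants data partitioned on these columns using its own handle.
        static NewTableLayout withPartitioning(String partitioning, List<String> partitionColumns) {
            return new NewTableLayout(Optional.of(partitioning), partitionColumns);
        }

        // The connector only cares that data is partitioned on these columns, not how.
        static NewTableLayout onColumns(List<String> partitionColumns) {
            return new NewTableLayout(Optional.empty(), partitionColumns);
        }

        Optional<String> getPartitioning() {
            return partitioning;
        }

        List<String> getPartitionColumns() {
            return partitionColumns;
        }
    }

    // Planner side: no special casing on the layout, just a fallback handle.
    static String resolvePartitioningHandle(NewTableLayout layout) {
        return layout.getPartitioning().orElse(FIXED_HASH_DISTRIBUTION);
    }

    public static void main(String[] args) {
        NewTableLayout bucketed = NewTableLayout.withPartitioning("hive-bucketing", List.of("ds"));
        NewTableLayout partitionedOnly = NewTableLayout.onColumns(List.of("ds"));
        System.out.println(resolvePartitioningHandle(bucketed));        // hive-bucketing
        System.out.println(resolvePartitioningHandle(partitionedOnly)); // FIXED_HASH_DISTRIBUTION
    }
}
```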

martint · 2020-01-27T15:15:08Z

presto-main/src/main/java/io/prestosql/sql/planner/optimizations/AddLocalExchanges.java

Maybe...

    boolean hasFixedHashDistribution = node.getPartitioningScheme()
            .map(scheme -> scheme.getPartitioning().getHandle())
            .filter(isEqual(FIXED_HASH_DISTRIBUTION))
            .isPresent();

or

    boolean hasFixedHashDistribution = node.getPartitioningScheme()
            .filter(scheme -> scheme.getPartitioning().getHandle().equals(FIXED_HASH_DISTRIBUTION))
            .isPresent();

This can be moved inside the if (getTaskWriterCount(session) > 1) block.
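The two suggested forms are equivalent. The sketch below checks both against hypothetical stand-ins for `PartitioningScheme`, `Partitioning` and the partitioning handle (modeled here as an enum with a made-up `OTHER_DISTRIBUTION` value); only `Optional` and `Predicate.isEqual` are real JDK APIs.

```java
import java.util.List;
import java.util.Optional;

import static java.util.function.Predicate.isEqual;

// Hypothetical stand-ins for the planner types used in the review comment.
final class FixedHashCheck {
    enum Handle { FIXED_HASH_DISTRIBUTION, OTHER_DISTRIBUTION }

    static final class Partitioning {
        private final Handle handle;
        Partitioning(Handle handle) { this.handle = handle; }
        Handle getHandle() { return handle; }
    }

    static final class PartitioningScheme {
        private final Partitioning partitioning;
        PartitioningScheme(Partitioning partitioning) { this.partitioning = partitioning; }
        Partitioning getPartitioning() { return partitioning; }
    }

    // Variant 1: map to the handle, then compare with Predicate.isEqual.
    static boolean viaMap(Optional<PartitioningScheme> partitioningScheme) {
        return partitioningScheme
                .map(scheme -> scheme.getPartitioning().getHandle())
                .filter(isEqual(Handle.FIXED_HASH_DISTRIBUTION))
                .isPresent();
    }

    // Variant 2: filter the scheme directly.
    static boolean viaFilter(Optional<PartitioningScheme> partitioningScheme) {
        return partitioningScheme
                .filter(scheme -> scheme.getPartitioning().getHandle().equals(Handle.FIXED_HASH_DISTRIBUTION))
                .isPresent();
    }

    public static void main(String[] args) {
        Optional<PartitioningScheme> fixedHash = Optional.of(new PartitioningScheme(new Partitioning(Handle.FIXED_HASH_DISTRIBUTION)));
        Optional<PartitioningScheme> other = Optional.of(new PartitioningScheme(new Partitioning(Handle.OTHER_DISTRIBUTION)));
        Optional<PartitioningScheme> none = Optional.empty();
        for (Optional<PartitioningScheme> scheme : List.of(fixedHash, other, none)) {
            // Prints "true true", then "false false", then "false false".
            System.out.println(viaMap(scheme) + " " + viaFilter(scheme));
        }
    }
}
```

Both variants return `false` for an empty scheme, so the check needs no separate `isPresent()` guard.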

sopel39

ac

sopel39 · 2020-01-28T16:03:20Z

presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorNewTableLayout.java

> This is strange. Why would partitioning be missing when partitioningColumns is present?

If you just care that data is partitioned on columns, but don't care how exactly, then you don't need to specify partitioning

sopel39 added the WIP label Dec 28, 2019

cla-bot bot added the cla-signed label Dec 28, 2019

sopel39 mentioned this pull request Dec 28, 2019

Repartition writes across nodes when loading data into partitioned table #304

Closed

sopel39 force-pushed the ks/insert_repartition branch 6 times, most recently from fefa7f0 to 69a4f64 on December 31, 2019 11:01

sopel39 changed the title ~~[WIP] Add support for preferred insert or create table layouts~~ Add support for preferred insert or create table layouts Dec 31, 2019

sopel39 requested a review from martint December 31, 2019 11:01

sopel39 assigned electrum and unassigned electrum Dec 31, 2019

sopel39 requested a review from electrum December 31, 2019 11:01

sopel39 removed the WIP label Dec 31, 2019

sopel39 force-pushed the ks/insert_repartition branch 2 times, most recently from 18c9274 to 7ef8a04 on December 31, 2019 12:25

findepi reviewed Jan 1, 2020

sopel39 assigned martint Jan 3, 2020

sopel39 force-pushed the ks/insert_repartition branch from 7ef8a04 to 3edf486 on January 7, 2020 21:29

sopel39 commented Jan 8, 2020

sopel39 mentioned this pull request Jan 9, 2020

Add 2019 summary post trinodb/trino.io#66

Merged

martint reviewed Jan 27, 2020

Wait for running queries issued during grace period to finish

c177188

sopel39 force-pushed the ks/insert_repartition branch from 3edf486 to 3c65d37 on January 28, 2020 16:28

sopel39 commented Jan 28, 2020

View reviewed changes

Improve comment

07e8dea

sopel39 added 2 commits January 28, 2020 17:30

Use lambda

e519235

Add support for inserts to MockConnectorFactory

6c68031

sopel39 force-pushed the ks/insert_repartition branch from 3c65d37 to 0c629af on January 28, 2020 16:34

martint self-requested a review January 28, 2020 16:52

sopel39 added 6 commits January 29, 2020 14:21

Unnest MockConnectorTableHandle

cdf9c76

Add support for CTAS to MockConnectorFactory

1c793c6

Allow connectors to specify evenly partitioning for create or insert

1832255

Add "use-preferred-write-partitioning" feature config

534ec0f

Add insert and CTAS planning tests

acf6bf4

Add support for preferred layout to Hive connector

835ec62

sopel39 force-pushed the ks/insert_repartition branch from 0c629af to 835ec62 on January 29, 2020 13:46

martint approved these changes Jan 30, 2020

sopel39 merged commit 6dbc220 into trinodb:master Jan 30, 2020

sopel39 deleted the ks/insert_repartition branch January 30, 2020 15:02

This was referenced Jan 30, 2020

Release notes for 330 #2595

Closed

Use stats to automatically determine if connector preferred layout should be used for insert #2741

Open

arhimondr mentioned this pull request Oct 24, 2022

Enable preferred write partitioning for FTE by default #14735

Merged

4 participants