Support shuffle on Hive partition columns before write #14010
wenleix merged 2 commits into prestodb:master
Conversation
Supersedes #13969
cc @mbasmanova, @kaikalur, @aweisberg
CC: @biswapesh |
!isShufflePartitionedColumnsForTableWriteEnabled(session) || table.getPartitionColumns().isEmpty()?
@highker: Wondering if you can take a look at the SPI change?
Thanks @highker for the review. Comments addressed 😃
@wenleix I don't understand this TODO. It seems to me that bucketed tables are handled above and would never reach that code. Would you clarify?
@mbasmanova: Sorry for the confusion; I mean we don't have to use HivePartitionHandle for the shuffle partitioning. I'm thinking about implementing a different HiveShufflePartitioningHandle that distributes the keys more uniformly (the Hive bucket function is not that great :) )
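For illustration only, here is one way a "distribute the keys more uniformly" bucket function could look. Everything below is a hypothetical sketch, not the PR's code: the class name, the getBucket(List<String>) signature, and the FNV-1a/SplitMix64 hashing are assumptions chosen to contrast with Hive's additive, String.hashCode-style bucket hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch: route a row to a writer based on its Hive partition key values,
// using a 64-bit mix rather than Hive's additive string hash, which tends to cluster
// similar-looking keys (e.g. consecutive dates) into nearby buckets.
public final class ShufflePartitionBucketFunction
{
    private final int bucketCount;

    public ShufflePartitionBucketFunction(int bucketCount)
    {
        this.bucketCount = bucketCount;
    }

    public int getBucket(List<String> partitionValues)
    {
        long hash = 0xcbf29ce484222325L;            // FNV-1a offset basis
        for (String value : partitionValues) {
            for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
                hash = (hash ^ b) * 0x100000001b3L; // FNV-1a step per byte
            }
            hash = mix64(hash);                     // extra avalanche per column value
        }
        return (int) Math.floorMod(hash, (long) bucketCount);
    }

    // Finalizer borrowed from the well-known SplitMix64 mix function
    private static long mix64(long z)
    {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    public static void main(String[] args)
    {
        ShufflePartitionBucketFunction function = new ShufflePartitionBucketFunction(8);
        // Rows with identical partition values always land in the same bucket (writer),
        // while nearby dates spread across buckets instead of clustering.
        System.out.println(function.getBucket(List.of("2019-12-01", "US")));
        System.out.println(function.getBucket(List.of("2019-12-01", "US")));
        System.out.println(function.getBucket(List.of("2019-12-02", "US")));
    }
}
```

The only property the table writer actually needs is that two rows with the same partition values map to the same bucket; the mixing step is purely about spreading distinct partition keys evenly across writers.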
mbasmanova left a comment
@wenleix I'd like to get a better understanding of the effects of this change. With shuffle_partitioned_columns_for_table_write_enabled=true, the data will be shuffled on partition columns and the TableWriter operator will run on as many nodes as there are available in the cluster, each node processing a distinct set of partitions. E.g. all data for a single partition will be written by the same node. That node may also write a few other partitions. Within the node, there will be at least 4 threads, but could be more if the node writes data for multiple partitions, up to 100 (TBD: config name). Hence, this property allows writing up to #-nodes x 100 partitions in a single query and avoids making very small files. Is this accurate?
@mbasmanova: Thanks for the review!
Correct.
It's configurable via
That's accurate. Although I am only thinking of writing to at most a few thousand partitions in practice 😃
@wenleix Thanks for explaining. Sounds great! Can't wait to give it a try.
d43d4ce to df27916
Previously, a writing worker would receive rows from all partitions, and thus could write at most hive.max-partitions-per-writers partitions. This session property allows shuffling on the partition columns when writing to partitioned, unbucketed Hive tables. As a result, rows in the same partition will be sent to the same writing worker. This increases the maximum number of partitions written in a single query by a factor of the total number of writing workers.
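To put rough, purely illustrative numbers on that last sentence (the cluster size is assumed; the 100-partitions-per-writer figure is the limit mentioned earlier in the thread): on a 20-node cluster where hive.max-partitions-per-writers caps each writing worker at 100 partitions, shuffling on the partition columns lets a single INSERT address on the order of 20 x 100 = 2,000 partitions, whereas without the shuffle every worker can receive rows from any partition and the whole query stays bounded by the single-worker limit of 100.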