Support shuffle on Hive partition columns before write #14010

Merged
wenleix merged 2 commits into prestodb:master from wenleix:many_parts2
Feb 5, 2020

Conversation

@wenleix (Contributor) commented Jan 24, 2020

Previously, each writing worker would receive rows for all partitions, and thus could write at most hive.max-partitions-per-writers partitions.

This session property allows shuffling on partition columns when writing to partitioned, unbucketed Hive tables. As a result, rows in the same partition are sent to the same writing worker. This increases the maximum number of partitions written in a single query by a factor of the total number of writing workers.

== RELEASE NOTES ==

Hive Changes
* Allow shuffle on partitioned columns when writing to partitioned unbucketed Hive tables. This increases the maximum number of partitions written in a single query by a factor of the total number of writing workers. This behavior must be explicitly enabled via the connector session property `shuffle_partitioned_columns_for_table_write`. #14010
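To make the behavior concrete, here is a minimal sketch (plain Python, not Presto code) of the routing this shuffle enables: every row is assigned to a writing worker by hashing its partition-column value, so all rows of one Hive partition land on exactly one worker. The worker count, the partition column `ds`, and the sample rows are invented for illustration.

```python
import zlib
from collections import defaultdict

WORKERS = 4  # hypothetical number of writing workers

def target_worker(partition_value: str) -> int:
    # Any deterministic hash works; the only requirement is that all
    # rows with the same partition value map to the same worker.
    return zlib.crc32(partition_value.encode("utf-8")) % WORKERS

rows = [
    {"ds": "2020-01-24", "user": "a"},
    {"ds": "2020-01-24", "user": "b"},
    {"ds": "2020-01-25", "user": "c"},
]

shuffled = defaultdict(list)
for row in rows:
    shuffled[target_worker(row["ds"])].append(row)

# Each partition's rows are written by exactly one worker, so a query
# can span (number of workers) x (partitions per worker) partitions.
for worker, assigned in sorted(shuffled.items()):
    print(worker, sorted({r["ds"] for r in assigned}))
```

Without the shuffle, every worker receives rows for every partition and must keep a file writer open per partition, which is what caps a query at hive.max-partitions-per-writers.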

@wenleix changed the title from "Many parts2" to "Support shuffle on Hive partition columns before write" Jan 24, 2020
@wenleix (Contributor, Author) commented Jan 24, 2020

Supersedes #13969

@wenleix (Contributor, Author) commented Jan 24, 2020

cc @mbasmanova , @kaikalur , @aweisberg

@mbasmanova (Contributor) commented:

CC: @biswapesh

@arhimondr (Member) commented:
`!isShufflePartitionedColumnsForTableWriteEnabled(session) || table.getPartitionColumns().isEmpty()`?

@wenleix (Contributor, Author) replied:

@arhimondr : Nice catch!

@wenleix (Contributor, Author) commented Jan 30, 2020

@highker : Wondering if you can take a look at the SPI change?

@highker left a comment:
minor comment

@highker commented:

Can we add javadoc to the interface?

@highker commented:

same here

@wenleix (Contributor, Author) commented Feb 3, 2020

Thanks @highker for the review. Comments addressed 😃

@mbasmanova (Contributor) commented:

@wenleix I don't understand this TODO. It seems to me that bucketed tables are handled above and would never reach that code. Would you clarify?

@wenleix (Contributor, Author) replied:

@mbasmanova : Sorry for the confusion; I meant we don't have to use HivePartitionHandle for the shuffle partitioning. I'm thinking about implementing a different HiveShufflePartitioningHandle that distributes the keys more uniformly (the Hive bucket function is not that great :) )
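An aside on the uniformity point above: a sketch (illustrative only, not Presto's implementation) of why a plain value-modulo-buckets scheme, roughly what Hive's bucket function does for integer keys, can skew badly on structured keys, while a bit-mixing finalizer such as Murmur3's fmix32 spreads them:

```python
def fmix32(h: int) -> int:
    """Murmur3's 32-bit finalizer: avalanches every input bit."""
    h &= 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h

BUCKETS = 32
keys = [i * BUCKETS for i in range(1000)]  # structured keys: all multiples of 32

plain_buckets = {k % BUCKETS for k in keys}         # value mod buckets
mixed_buckets = {fmix32(k) % BUCKETS for k in keys}  # bit-mixed first

print(len(plain_buckets))  # 1 -- every key collapses into bucket 0
print(len(mixed_buckets))  # the keys spread across nearly all 32 buckets
```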

@mbasmanova (Contributor) commented:

@wenleix Same question about this TODO

@mbasmanova (Contributor) left a comment:

@wenleix I'd like to get a better understanding of the effects of this change. With shuffle_partitioned_columns_for_table_write_enabled=true, the data will be shuffled on partition columns and the TableWriter operator will run on as many nodes as are available in the cluster, each node processing a distinct set of partitions. E.g. all data for a single partition will be written by the same node. That node may also write a few other partitions. Within the node, there will be at least 4 threads, but could be more if the node writes data for multiple partitions, up to 100 (TBD: config name). Hence, this property allows writing up to #-nodes x 100 partitions in a single query and avoids making very small files. Is this accurate?

@wenleix (Contributor, Author) commented Feb 4, 2020

Thanks @mbasmanova for the review!

With shuffle_partitioned_columns_for_table_write_enabled=true, the data will be shuffled on partition columns and the TableWriter operator will run on as many nodes as are available in the cluster, each node processing a distinct set of partitions. E.g. all data for a single partition will be written by the same node

Correct.

Within the node, there will be at least 4 threads

It's configurable via task.writer-count.

Hence, this property allows writing up to #-nodes x 100 partitions in a single query and avoids making very small files.

That's accurate, although I expect queries to write at most a few thousand partitions in practice 😃
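The back-of-the-envelope capacity math from this exchange, as a quick sketch (the 100 here is hive.max-partitions-per-writers' default; the node count is made up, and the per-node thread count from task.writer-count is left out since it affects parallelism, not the partition cap):

```python
nodes = 50                        # hypothetical number of writing workers
max_partitions_per_writer = 100   # default of hive.max-partitions-per-writers

# Without the shuffle, every writer sees rows from all partitions, so
# the whole query is capped at this many partitions.
without_shuffle = max_partitions_per_writer

# With the shuffle, each node owns a disjoint set of partitions, so the
# cap scales with the number of nodes.
with_shuffle = nodes * max_partitions_per_writer

print(without_shuffle)  # 100
print(with_shuffle)     # 5000
```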

@mbasmanova (Contributor) commented:

@wenleix Thanks for explaining. Sounds great! Can't wait to give it a try.

@wenleix force-pushed the many_parts2 branch 2 times, most recently from d43d4ce to df27916 on February 4, 2020 07:10
@wenleix wenleix merged commit 5833338 into prestodb:master Feb 5, 2020
@wenleix wenleix deleted the many_parts2 branch February 5, 2020 19:23
@caithagoras caithagoras mentioned this pull request Feb 20, 2020