@gaurav8297 gaurav8297 commented Mar 30, 2023

Description

Benchmarks (6 worker nodes)

1.) Single partition (1.2B rows)

  • Without preferred partitioning: 1:41 mins (cpu: 1.97h)
  • With preferred partitioning (before): 7:54 mins (cpu: 1.97h)
  • With preferred partitioning (after): 1:45 mins (cpu: 2.05h)

2.) 3 partitions (514M rows)

  • Without preferred partitioning: 55.78 mins (cpu: 50.29m, peak mem: 28.0GB)
  • With preferred partitioning (before): 1:59 mins (cpu: 52.43m, peak mem: 16.7GB)
  • With preferred partitioning (after): 51.08 secs (cpu: 52.45m, peak mem: 23.4GB)

3.) 2000+ partitions with almost no skewness (600M rows)

  • Without preferred partitioning: 2:34 mins (cpu: 52.12m, peak mem: 100GB)
  • With preferred partitioning (before): 3:05 mins (cpu: 55.32m, peak mem: 63.5GB)
  • With preferred partitioning (after): 2:55 mins (cpu: 55.20m, peak mem: 70.7GB)
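
To make the before/after comparison easier to read, the wall-clock speedups for the preferred-partitioning runs can be computed directly from the figures above. This is only arithmetic on the reported numbers; the helper `to_seconds` and the dictionary names are illustrative, not part of the PR.

```python
def to_seconds(text):
    """Parse 'M:SS mins' or 'S.SS secs' (the formats used above) into seconds."""
    value, unit = text.split()
    if unit == "secs":
        return float(value)
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + float(seconds)


# (before, after) wall times with preferred partitioning, copied from the list above.
runs = {
    "single partition": ("7:54 mins", "1:45 mins"),
    "3 partitions": ("1:59 mins", "51.08 secs"),
    "2000+ partitions": ("3:05 mins", "2:55 mins"),
}

speedups = {
    name: to_seconds(before) / to_seconds(after)
    for name, (before, after) in runs.items()
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x faster")
```

By this calculation the single-partition case improves roughly 4.5x and the 3-partition case roughly 2.3x, while the 2000+ partition case stays close to unchanged.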

@cla-bot cla-bot bot added the cla-signed label Mar 30, 2023
@gaurav8297 gaurav8297 requested a review from sopel39 March 31, 2023 00:21
@gaurav8297
Member Author

@sopel39 Please take a look at some early benchmarks. I'm looking into how we can scale up faster.

@gaurav8297 gaurav8297 marked this pull request as ready for review April 3, 2023 05:51
@github-actions github-actions bot added delta-lake Delta Lake connector hive Hive connector iceberg Iceberg connector tests:hive labels Apr 11, 2023
@gaurav8297 gaurav8297 requested a review from sopel39 April 18, 2023 11:30
@sopel39 sopel39 left a comment

lgtm % comments

@gaurav8297 gaurav8297 requested a review from sopel39 April 19, 2023 22:49
@sopel39 sopel39 left a comment

lgtm % comments

nit: it might be easier to just have abstract bucketIds and then map bucket to task when taskId is really needed
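
One way to read this suggestion, as a rough sketch only: partitions are tracked purely in terms of abstract bucket ids, and a bucket is translated to a task only at the point where a task id is needed. The modulo mapping, `TASK_COUNT`, and the sample `partition_buckets` below are assumptions made for illustration, not the code under review.

```python
TASK_COUNT = 3  # hypothetical number of writer tasks

# Partitions only know about abstract bucket ids (sample data for illustration).
partition_buckets = {
    "part1": [0, 2, 4],
    "part2": [1, 3, 5],
}


def bucket_to_task(bucket_id, task_count=TASK_COUNT):
    """Resolve a bucket to a task lazily; modulo is an assumed mapping."""
    return bucket_id % task_count


def tasks_for(partition):
    """Derive the tasks for a partition only when task ids are actually needed."""
    return sorted({bucket_to_task(b) for b in partition_buckets[partition]})


print(tasks_for("part1"))
print(tasks_for("part2"))
```

With this shape, the rebalancing logic never has to carry task ids around; only the final routing step does.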

@gaurav8297 gaurav8297 requested a review from sopel39 April 25, 2023 07:58
@sopel39 sopel39 left a comment

lgtm % comments

nit: that is one, but also if node count is 11 and bucket count is 8, then it will effectively mean that each node will write every bucket (so it's effectively round robin from the start).

sopel39 commented Apr 27, 2023

awesome job!

This commit introduces a SkewedPartitionRebalancer
which helps in distributing big or skewed partitions
across available tasks to improve the performance of
partitioned writes.

This rebalancer initializes a set of buckets for
each task based on a given taskBucketCount and then
tries to distribute partitions uniformly across those
buckets. This addresses two problems:

1. Skewness across tasks.
2. Scaling a few big partitions across tasks even if
   there's no skewness among them. This essentially
   speeds up local scaling without much impact on
   overall resource utilization.

Example:

Before: 3 tasks, 3 buckets per task, and 2 skewed partitions
Task1                Task2               Task3
Bucket1 (Part 1)     Bucket1 (Part 2)    Bucket1
Bucket2              Bucket2             Bucket2
Bucket3              Bucket3             Bucket3

After rebalancing:
Task1                Task2               Task3
Bucket1 (Part 1)     Bucket1 (Part 2)    Bucket1 (Part 1)
Bucket2 (Part 2)     Bucket2 (Part 1)    Bucket2 (Part 2)
Bucket3              Bucket3             Bucket3
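
The example above can be approximated with a small greedy sketch. It is not the SkewedPartitionRebalancer implementation: it tracks load per task only, leaves out the per-task buckets, and the names `task_loads`, `rebalance`, and `min_skew` as well as the partition sizes are assumptions chosen for illustration. A partition's data is assumed to split evenly across the tasks it is assigned to.

```python
def task_loads(sizes, assignment, task_count):
    """Estimated load per task, splitting each partition evenly over its tasks."""
    loads = [0.0] * task_count
    for partition, tasks in assignment.items():
        for task in tasks:
            loads[task] += sizes[partition] / len(tasks)
    return loads


def rebalance(sizes, assignment, task_count, min_skew):
    """Greedily extend the largest partition on the busiest task to the idlest task."""
    assignment = {p: list(tasks) for p, tasks in assignment.items()}
    while True:
        loads = task_loads(sizes, assignment, task_count)
        hot = max(range(task_count), key=loads.__getitem__)
        cold = min(range(task_count), key=loads.__getitem__)
        if loads[hot] - loads[cold] < min_skew:
            return assignment  # balanced enough, stop scaling
        candidates = [
            p for p, tasks in assignment.items()
            if hot in tasks and cold not in tasks
        ]
        if not candidates:
            return assignment  # nothing left to move off the busiest task
        biggest = max(candidates, key=lambda p: sizes[p] / len(assignment[p]))
        assignment[biggest].append(cold)


# 3 tasks and 2 equally large partitions, as in the example above (sizes are made up).
sizes = {"part1": 900, "part2": 900}
initial = {"part1": [0], "part2": [1]}
result = rebalance(sizes, initial, task_count=3, min_skew=100)
print(result)
print(task_loads(sizes, result, 3))
```

Under these assumptions both partitions end up written by all three tasks, which mirrors the "After rebalancing" layout. Each loop iteration adds one task to one partition, so the loop always terminates.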

sopel39 commented Apr 27, 2023

There is a relevant failure:

2023-04-27T12:11:53.1783433Z java.lang.IllegalStateException: No catalog handle for partitioning handle: HASH
2023-04-27T12:11:53.1784022Z     at io.trino.sql.planner.NodePartitioningManager.lambda$requiredCatalogHandle$7(NodePartitioningManager.java:312)
2023-04-27T12:11:53.1784552Z     at java.base/java.util.Optional.orElseThrow(Optional.java:403)
2023-04-27T12:11:53.1785233Z     at io.trino.sql.planner.NodePartitioningManager.requiredCatalogHandle(NodePartitioningManager.java:311)
2023-04-27T12:11:53.1785985Z     at io.trino.sql.planner.NodePartitioningManager.getConnectorBucketNodeMap(NodePartitioningManager.java:272)
2023-04-27T12:11:53.1786829Z     at io.trino.operator.output.SkewedPartitionRebalancer.checkCanScalePartitionsRemotely(SkewedPartitionRebalancer.java:110)
2023-04-27T12:11:53.1787536Z     at io.trino.sql.planner.LocalExecutionPlanner.plan(LocalExecutionPlanner.java:577)
2023-04-27T12:11:53.1788183Z     at io.trino.execution.SqlTaskExecutionFactory.create(SqlTaskExecutionFactory.java:84)
2023-04-27T12:11:53.1788698Z     at io.trino.execution.SqlTask.tryCreateSqlTaskExecution(SqlTask.java:536)
2023-04-27T12:11:53.1789143Z     at io.trino.execution.SqlTask.updateTask(SqlTask.java:491)

@gaurav8297
Member Author

Fixed!

@sopel39 sopel39 merged commit e03d410 into trinodb:master Apr 28, 2023
@sopel39 sopel39 mentioned this pull request Apr 28, 2023
@github-actions github-actions bot added this to the 416 milestone Apr 28, 2023
Labels

cla-signed delta-lake Delta Lake connector hive Hive connector iceberg Iceberg connector
