Adaptive task sizing for fault tolerant execution #16719

Merged
losipiuk merged 4 commits into trinodb:master from linzebing:fte-adaptive-task-sizing on Mar 30, 2023

linzebing (Member) commented Mar 24, 2023

Description

This PR adds adaptive task sizing for fault tolerant execution. Specifically:

  • For arbitrary distribution: we start with a small target partition size and gradually increase it.
  • For hash distribution: we now use a smaller target partition size for compute tasks and simply let small tasks coalesce (the number of tasks is bounded by the number of partitions); for write tasks, we adjust the target partition size based on the total amount of input bytes to avoid creating a massive number of tasks.
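
To make the arbitrary-distribution behavior concrete, here is a minimal sketch of a target size that starts at a minimum and is multiplied by a fixed factor after every fixed number of tasks, capped at a maximum. The class name `AdaptiveTargetSize` and its method are hypothetical and not taken from this PR; the actual implementation may differ.

```java
// Hypothetical sketch for illustration only; not the code from this PR.
public class AdaptiveTargetSize
{
    private final long maxBytes;
    private final double growthFactor;
    private final int growthPeriod;

    private long currentBytes;
    private int tasksInCurrentPeriod;

    public AdaptiveTargetSize(long minBytes, long maxBytes, double growthFactor, int growthPeriod)
    {
        if (minBytes <= 0 || maxBytes < minBytes || growthFactor < 1.0 || growthPeriod <= 0) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        this.maxBytes = maxBytes;
        this.growthFactor = growthFactor;
        this.growthPeriod = growthPeriod;
        this.currentBytes = minBytes;
    }

    // Returns the target size for the next task. After every growthPeriod
    // tasks, the target is multiplied by growthFactor and capped at maxBytes.
    public long nextTargetSizeBytes()
    {
        long result = currentBytes;
        tasksInCurrentPeriod++;
        if (tasksInCurrentPeriod == growthPeriod) {
            tasksInCurrentPeriod = 0;
            currentBytes = Math.min(maxBytes, Math.round(currentBytes * growthFactor));
        }
        return result;
    }
}
```

With a growth period of 2 and a factor of 2.0, for example, the first two tasks get the minimum size, the next two get twice that, and so on until the cap is reached.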

This change dramatically improves small query latency on fault-tolerant execution. Preliminary testing on tpcds-sf100 shows 40%+ latency reduction.

Additional context and related issues

Fix #16103

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Section
* Implement adaptive task sizing for fault-tolerant execution, which greatly reduces latency for small queries on an FTE cluster. ({issue}`16103`)

@cla-bot cla-bot bot added the cla-signed label Mar 24, 2023
@linzebing linzebing requested review from arhimondr and losipiuk March 24, 2023 20:18
@linzebing linzebing marked this pull request as ready for review March 24, 2023 20:18
@github-actions github-actions bot added hive Hive connector tests:hive labels Mar 24, 2023
@linzebing linzebing force-pushed the fte-adaptive-task-sizing branch 4 times, most recently from dd49fb4 to f65a27b Compare March 25, 2023 13:39
Member

nit: It would be nice to ensure that this translates to a number of splits that is a multiple of task.max-drivers-per-task here.
One option, maybe good enough, would be to assume that minTargetPartitionSizeInBytes satisfies this requirement and then ensure that the calculated targetPartitionSizeInBytes is a multiple of minTargetPartitionSizeInBytes.

Then we can use an adaptiveGrowthFactor less than 2.0 to get smoother growth. 2.0 is pretty aggressive.

cc: @arhimondr

Contributor

> One option, maybe good enough, would be to assume that minTargetPartitionSizeInBytes satisfies this requirement and then ensure that the calculated targetPartitionSizeInBytes is a multiple of minTargetPartitionSizeInBytes.

I was also thinking about something along these lines: basically, try to round to the closest multiple of minTargetPartitionSizeInBytes.
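
The rounding idea discussed in this thread could look roughly like the sketch below: snap a computed target size to the nearest multiple of the minimum, never going below one multiple. The class and method names are hypothetical, and this is only an illustration of the suggestion, not the change that was made in the PR.

```java
// Hypothetical sketch of the rounding suggestion; not the code from this PR.
public final class TargetSizeRounding
{
    private TargetSizeRounding() {}

    // Rounds targetBytes to the nearest multiple of minBytes (ties round up),
    // returning at least one multiple so the result is never below minBytes.
    public static long roundToMultipleOfMin(long targetBytes, long minBytes)
    {
        if (minBytes <= 0 || targetBytes < 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long multiples = Math.max(1, (targetBytes + minBytes / 2) / minBytes);
        return multiples * minBytes;
    }
}
```

As the later comments in this conversation point out, rounding byte sizes this way does not by itself guarantee that the number of splits per task is a multiple of task.max-drivers-per-task.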

Member Author

Updated

Member

@losipiuk losipiuk left a comment

LGTM % comments.
@arhimondr PTAL

@linzebing linzebing force-pushed the fte-adaptive-task-sizing branch from f65a27b to 60816cb Compare March 29, 2023 02:52
Contributor

Let's discuss what would be good defaults offline

Member Author

Changed defaults to:

    private int faultTolerantExecutionArbitraryDistributionComputeTaskTargetSizeGrowthPeriod = 64;
    private double faultTolerantExecutionArbitraryDistributionComputeTaskTargetSizeGrowthFactor = 1.2;
    private DataSize faultTolerantExecutionArbitraryDistributionComputeTaskTargetSizeMin = DataSize.of(512, MEGABYTE);
    private DataSize faultTolerantExecutionArbitraryDistributionComputeTaskTargetSizeMax = DataSize.of(50, GIGABYTE);

    private int faultTolerantExecutionArbitraryDistributionWriteTaskTargetSizeGrowthPeriod = 64;
    private double faultTolerantExecutionArbitraryDistributionWriteTaskTargetSizeGrowthFactor = 1.2;
    private DataSize faultTolerantExecutionArbitraryDistributionWriteTaskTargetSizeMin = DataSize.of(4, GIGABYTE);
    private DataSize faultTolerantExecutionArbitraryDistributionWriteTaskTargetSizeMax = DataSize.of(50, GIGABYTE);

    private DataSize faultTolerantExecutionHashDistributionComputeTaskTargetSize = DataSize.of(512, MEGABYTE);
    private DataSize faultTolerantExecutionHashDistributionWriteTaskTargetSize = DataSize.of(4, GIGABYTE);
    private int faultTolerantExecutionHashDistributionWriteTaskTargetMaxCount = 2000;

let me know your thoughts
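
For the hash-distribution write path, one plausible reading of how a target size and a maximum task count interact is sketched below: keep the configured target size unless that would produce more tasks than the cap, in which case raise the target just enough to stay under it. The formula, class name, and method name are assumptions made for illustration; the PR text does not spell out the exact calculation.

```java
// Hypothetical sketch; the exact formula used in the PR is an assumption here.
public final class HashWriteTaskSizing
{
    private HashWriteTaskSizing() {}

    // Returns the configured target size, raised if necessary so that
    // totalInputBytes / result does not exceed maxTaskCount.
    public static long effectiveTargetSizeBytes(long totalInputBytes, long targetSizeBytes, int maxTaskCount)
    {
        if (totalInputBytes < 0 || targetSizeBytes <= 0 || maxTaskCount <= 0) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        // Ceiling division: smallest per-task size that fits within the cap.
        long sizeNeededForCap = (totalInputBytes + maxTaskCount - 1) / maxTaskCount;
        return Math.max(targetSizeBytes, sizeNeededForCap);
    }
}
```

Under this assumed formula and the defaults listed above (4 GB target, 2000 tasks), the target would stay at 4 GB until the total input exceeds roughly 8,000 GB, and grow proportionally beyond that.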

@linzebing linzebing force-pushed the fte-adaptive-task-sizing branch from 60816cb to 1f1082d Compare March 30, 2023 03:56
@losipiuk
Member

> One option, maybe good enough, would be to assume that minTargetPartitionSizeInBytes satisfies this requirement and then ensure that the calculated targetPartitionSizeInBytes is a multiple of minTargetPartitionSizeInBytes.

> I was also thinking about something along these lines: basically, try to round to the closest multiple of minTargetPartitionSizeInBytes.

This would not guarantee that the number of splits per task is actually a multiple of task.max-drivers-per-task. Explicitly counting splits is probably better.
Let's address that in a separate PR. @linzebing, you said you would create a tracking issue. Have you created it?

@linzebing
Member Author

linzebing commented Mar 30, 2023

Created #16805 and #16806

@losipiuk losipiuk merged commit f088b73 into trinodb:master Mar 30, 2023
@github-actions github-actions bot added this to the 412 milestone Mar 30, 2023
@linzebing linzebing deleted the fte-adaptive-task-sizing branch March 31, 2023 01:25
@linzebing
Member Author

@jhlodin: can you (or someone else) help me update the documentation for this PR?

Labels

cla-signed hive Hive connector

Development

Successfully merging this pull request may close these issues.

Adjust task sizing properties in FTE

3 participants