@raunaqmorarka
Member

@raunaqmorarka raunaqmorarka commented Sep 1, 2025

Description

When the maximum possible task count is close to the minimum partition count, it is not worth paying the cost of fetching statistics to determine the optimal partition count.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## General
* Improve performance of simple queries in clusters with a small number of nodes. ({issue}`26525`)


Copilot AI left a comment

Pull Request Overview

This PR optimizes performance for simple queries in small clusters by avoiding unnecessary statistics fetching when determining partition counts. The optimization triggers when the maximum possible task count is close to the minimum partition count, eliminating the overhead of statistics collection in scenarios where it provides minimal benefit.

  • Early exit from partition count determination when maxPossiblePartitionCount <= 2 * minPartitionCount
  • Reuse of the calculated maxPossiblePartitionCount value to avoid redundant computations
  • Updated test cases to verify the new behavior and adjust edge case scenarios
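
The early-exit condition in the first bullet can be sketched in isolation. This is a minimal illustration, not the code from `DeterminePartitionCount.java`: the class `PartitionCountShortcut` and the method `canSkipStatistics` are hypothetical names, and only the comparison `maxPossiblePartitionCount <= 2 * minPartitionCount` is taken from the summary above. How Trino derives either value, and what the rule does after exiting early, is not shown here.

```java
// Hypothetical helper illustrating the early-exit check described above.
// Only the comparison itself comes from the review summary; everything
// else (class name, method name, validation) is an assumption.
final class PartitionCountShortcut
{
    private PartitionCountShortcut() {}

    /**
     * Returns true when the maximum possible partition count is within
     * a factor of two of the minimum, so fetching statistics to pick a
     * value in between is unlikely to pay off.
     */
    static boolean canSkipStatistics(int maxPossiblePartitionCount, int minPartitionCount)
    {
        if (maxPossiblePartitionCount <= 0 || minPartitionCount <= 0) {
            throw new IllegalArgumentException("partition counts must be positive");
        }
        // Multiply as long so a large minPartitionCount cannot overflow int
        return maxPossiblePartitionCount <= 2L * minPartitionCount;
    }

    public static void main(String[] args)
    {
        // Small cluster: at most 8 partitions possible, minimum of 4
        System.out.println(canSkipStatistics(8, 4));   // true
        // Larger cluster: at most 100 partitions possible, minimum of 4
        System.out.println(canSkipStatistics(100, 4)); // false
    }
}
```

With a minimum of 4, any maximum up to 8 takes the shortcut, while 9 or more would still go through the statistics-based estimate.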

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| DeterminePartitionCount.java | Adds early exit logic and reuses task count estimation to avoid statistics fetching in small clusters |
| TestDeterminePartitionCount.java | Adds new test case and updates existing tests to verify the optimization behavior |

@raunaqmorarka raunaqmorarka force-pushed the determine-small branch 2 times, most recently from 0a06910 to 9e02434 Compare September 1, 2025 18:55
@github-actions github-actions bot added the delta-lake (Delta Lake connector) and hive (Hive connector) labels Sep 1, 2025
@raunaqmorarka raunaqmorarka force-pushed the determine-small branch 3 times, most recently from 9823ace to b98ba17 Compare September 1, 2025 21:01
…uster

When the max possible tasks count is close to the min partition count, it is
not worth paying the cost of fetching statistics to determine optimal partitions count
@raunaqmorarka
Member Author

/test-with-secrets sha=7ec454205c90180e60e225bcfc929e0f847da5c5

@github-actions

github-actions bot commented Sep 2, 2025

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/17393527142

@raunaqmorarka raunaqmorarka merged commit 0fa7351 into trinodb:master Sep 2, 2025
95 checks passed
@github-actions github-actions bot added this to the 477 milestone Sep 2, 2025
Labels

cla-signed, delta-lake (Delta Lake connector), hive (Hive connector)
