Avoid fetching of statistics from DeterminePartitionCount in small cluster #26525
Pull Request Overview
This PR optimizes performance for simple queries in small clusters by avoiding unnecessary statistics fetching when determining partition counts. The optimization triggers when the maximum possible task count is close to the minimum partition count, eliminating the overhead of statistics collection in scenarios where it provides minimal benefit.
- Early exit from partition count determination when `maxPossiblePartitionCount <= 2 * minPartitionCount`
- Reuse of the calculated `maxPossiblePartitionCount` value to avoid redundant computations
- Updated test cases to verify the new behavior and adjust edge case scenarios
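The early-exit condition above can be sketched as follows. This is a minimal illustration, not the code in `DeterminePartitionCount.java`: the class `PartitionCountSketch`, the method signature, the `statsBasedEstimate` supplier, and the clamping step are all hypothetical stand-ins. Only the comparison `maxPossiblePartitionCount <= 2 * minPartitionCount` comes from this PR. Returning an empty `Optional` to mean "leave the plan unchanged" is likewise an assumption made for the sketch.

```java
import java.util.Optional;
import java.util.function.IntSupplier;

// Hypothetical sketch; names and structure are illustrative only.
final class PartitionCountSketch
{
    private PartitionCountSketch() {}

    static Optional<Integer> determinePartitionCount(
            int maxPossiblePartitionCount,
            int minPartitionCount,
            IntSupplier statsBasedEstimate) // stands in for the costly statistics lookup
    {
        // Early exit (condition taken from the PR): when the upper bound is at most
        // twice the lower bound, there is little room to tune, so skip statistics.
        if (maxPossiblePartitionCount <= 2 * minPartitionCount) {
            return Optional.empty();
        }

        // Only now pay for the statistics-based estimate (assumed behavior),
        // then clamp it into [minPartitionCount, maxPossiblePartitionCount].
        int estimate = statsBasedEstimate.getAsInt();
        int clamped = Math.max(minPartitionCount, Math.min(estimate, maxPossiblePartitionCount));
        return Optional.of(clamped);
    }
}
```

Passing the estimate as a supplier keeps the expensive call lazy, so the early-exit path never triggers it. For example, with `maxPossiblePartitionCount = 8` and `minPartitionCount = 4` the supplier is not invoked at all.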
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| DeterminePartitionCount.java | Adds early exit logic and reuses task count estimation to avoid statistics fetching in small clusters |
| TestDeterminePartitionCount.java | Adds new test case and updates existing tests to verify the optimization behavior |
...trino-main/src/test/java/io/trino/sql/planner/optimizations/TestDeterminePartitionCount.java
…uster
When the max possible task count is close to the min partition count, it is not worth paying the cost of fetching statistics to determine the optimal partition count.
/test-with-secrets sha=7ec454205c90180e60e225bcfc929e0f847da5c5
The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/17393527142
Description
When the max possible task count is close to the min partition count, it is not worth paying the cost of fetching statistics to determine the optimal partition count.
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: