
feat: Add configurable query admission pacing to prevent worker overload #26988

Merged
feilong-liu merged 1 commit into prestodb:master from prashantgolash:export-D88058630
Jan 26, 2026

Conversation

@prashantgolash
Contributor

@prashantgolash prashantgolash commented Jan 20, 2026

Summary:

Introduces global query admission pacing to control the rate at which queries
are admitted for execution. When a cluster experiences sudden bursts of heavy
queries, workers can become overloaded, leading to degraded performance and
potential failures. This feature allows operators to configure a maximum
admission rate to smooth out query submission spikes.

Two new configuration options are added:

  • query-manager.max-query-admissions-per-second: Maximum queries admitted
    per second globally (default: unlimited). Set to a lower value to pace
    query admissions.
  • query-manager.min-running-queries-for-pacing: Minimum running queries
    before pacing is applied (default: 1). Allows bypassing pacing when the
    cluster is idle.

The pacing uses a lock-free atomic approach to avoid deadlocks with resource
group locks while ensuring correctness across multiple root resource groups.

Example configuration for production:

query-manager.max-query-admissions-per-second=10
query-manager.min-running-queries-for-pacing=30

This admits at most 10 queries per second, but only when 30+ queries are
already running.
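The lock-free pacing described above can be sketched as a small stand-alone class. This is a simplified model of the behavior described in this summary, not the actual InternalResourceGroupManager code; the class and field names are assumptions based on the description. A single CAS on the last-admission timestamp decides which caller wins the current admission interval, so no locks are taken.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of the admission pacer described above (hypothetical,
// not the actual Presto code).
public class AdmissionPacer
{
    private final long admissionIntervalNanos;
    private final int minRunningQueriesForPacing;
    private final AtomicLong lastAdmittedNanos;
    private final AtomicInteger runningQueries = new AtomicInteger();

    public AdmissionPacer(int maxAdmissionsPerSecond, int minRunningQueriesForPacing)
    {
        // maxAdmissionsPerSecond == Integer.MAX_VALUE means pacing is disabled
        this.admissionIntervalNanos = (maxAdmissionsPerSecond == Integer.MAX_VALUE)
                ? 0
                : 1_000_000_000L / maxAdmissionsPerSecond;
        this.minRunningQueriesForPacing = minRunningQueriesForPacing;
        // Backdate the last admission so the first paced call is granted
        this.lastAdmittedNanos = new AtomicLong(System.nanoTime() - admissionIntervalNanos);
    }

    public boolean tryAcquireAdmissionSlot()
    {
        // Bypass when pacing is disabled or the cluster is below the activity threshold
        if (admissionIntervalNanos == 0 || runningQueries.get() < minRunningQueriesForPacing) {
            return true;
        }
        long now = System.nanoTime();
        long last = lastAdmittedNanos.get();
        if (now - last < admissionIntervalNanos) {
            return false; // another query was admitted too recently
        }
        // CAS: exactly one concurrent caller claims this interval; losers are denied
        return lastAdmittedNanos.compareAndSet(last, now);
    }

    public void incrementRunningQueries()
    {
        runningQueries.incrementAndGet();
    }

    public void decrementRunningQueries()
    {
        runningQueries.decrementAndGet();
    }
}
```

With the example production settings (10 admissions per second, threshold 30), a burst of submissions while 30+ queries are running is admitted at most once every 100 ms; below the threshold every call returns true immediately.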

Differential Revision: D88058630

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@prashantgolash prashantgolash requested a review from a team as a code owner January 20, 2026 07:14
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Jan 20, 2026
@sourcery-ai
Contributor

sourcery-ai bot commented Jan 20, 2026

Reviewer's Guide

Adds a global, lock-free query admission pacing mechanism wired through InternalResourceGroupManager and InternalResourceGroup, configurable via new QueryManagerConfig settings, with JMX metrics and targeted unit tests to prevent worker overload during query bursts.

Sequence diagram for query admission with global pacing

sequenceDiagram
    actor Client
    participant QueryManager
    participant InternalResourceGroupManager as IRGManager
    participant RootInternalResourceGroup as RootRG
    participant InternalResourceGroup as RG

    Client->>QueryManager: submitQuery()
    QueryManager->>RootRG: run(query)
    RootRG->>RG: run(query)
    activate RG
    RG->>RG: evaluate canRun and queuedQueries.isEmpty()
    alt immediate_start_candidate
        RG->>IRGManager: tryAcquireAdmissionSlot()
        activate IRGManager
        IRGManager->>IRGManager: getTotalRunningQueries()
        alt pacing_disabled_or_below_threshold
            IRGManager-->>RG: true
        else pacing_enforced
            IRGManager->>IRGManager: check interval via CAS
            alt within_rate_limit
                IRGManager-->>RG: true
            else rate_limited
                IRGManager-->>RG: false
            end
        end
        deactivate IRGManager
        alt admission_granted
            RG->>RG: startInBackground(query)
            RG->>IRGManager: incrementRunningQueries()
            RG->>QueryManager: startWaitingForResources()
        else admission_denied
            RG->>RG: enqueueQuery(query)
        end
    else must_queue
        RG->>RG: enqueueQuery(query)
    end
    deactivate RG

    loop scheduling_cycle
        RootRG->>RootRG: processQueuedQueries()
        RootRG->>RG: internalStartNext()
        RG->>IRGManager: tryAcquireAdmissionSlot()
        alt granted
            RG->>RG: dequeue and startInBackground(query)
            RG->>IRGManager: incrementRunningQueries()
        else denied
            RG-->>RootRG: false (retry later)
        end
    end

    note over RG,IRGManager: When query finishes, RG calls IRGManager.decrementRunningQueries()

Class diagram for query admission pacing integration

classDiagram
    class InternalResourceGroupManager {
        <<final>>
        -int maxQueryAdmissionsPerSecond
        -int minRunningQueriesForPacing
        -long queryAdmissionIntervalNanos
        -AtomicLong lastAdmittedQueryNanos
        -AtomicLong totalAdmissionAttempts
        -AtomicLong totalAdmissionsGranted
        -AtomicLong totalAdmissionsDenied
        -AtomicInteger totalRunningQueriesCounter
        +InternalResourceGroupManager(QueryManagerConfig queryManagerConfig, ClusterResourceChecker clusterResourceChecker, ...)
        +boolean tryAcquireAdmissionSlot()
        -int getTotalRunningQueries()
        +void incrementRunningQueries()
        +void decrementRunningQueries()
        +int getMaxQueryAdmissionsPerSecond()
        +long getTotalAdmissionAttempts()
        +long getTotalAdmissionsGranted()
        +long getTotalAdmissionsDenied()
        +int getMinRunningQueriesForPacing()
        +double getAdmissionGrantRate()
        +double getAdmissionDenyRate()
        +long getMillisSinceLastAdmission()
    }

    class InternalResourceGroup {
        -Optional~InternalResourceGroup~ parent
        -BooleanSupplier admissionSlotSupplier
        -Runnable onQueryStarted
        -Runnable onQueryFinished
        -Queue~ManagedQueryExecution~ queuedQueries
        +InternalResourceGroup(Optional~InternalResourceGroup~ parent, String name, BiConsumer~InternalResourceGroup,Boolean~ jmxExportListener, Executor executor, Function~ResourceGroupId,Optional~ResourceGroupRuntimeInfo~ additionalRuntimeInfo, Predicate~InternalResourceGroup~ shouldWaitForResourceManagerUpdate, InternalNodeManager nodeManager, ClusterResourceChecker clusterResourceChecker, BooleanSupplier admissionSlotSupplier, Runnable onQueryStarted, Runnable onQueryFinished)
        +InternalResourceGroup getOrCreateSubGroup(String name, boolean staticSegments)
        +void run(ManagedQueryExecution query)
        -void startInBackground(ManagedQueryExecution query)
        -void queryFinished(ManagedQueryExecution query)
        -boolean internalStartNext()
    }

    class RootInternalResourceGroup {
        +RootInternalResourceGroup(String name, BiConsumer~InternalResourceGroup,Boolean~ jmxExportListener, Executor executor, Function~ResourceGroupId,Optional~ResourceGroupRuntimeInfo~ additionalRuntimeInfo, Predicate~InternalResourceGroup~ shouldWaitForResourceManagerUpdate, InternalNodeManager nodeManager, ClusterResourceChecker clusterResourceChecker, BooleanSupplier admissionSlotSupplier, Runnable onQueryStarted, Runnable onQueryFinished)
        +RootInternalResourceGroup(String name, BiConsumer~InternalResourceGroup,Boolean~ jmxExportListener, Executor executor, Function~ResourceGroupId,Optional~ResourceGroupRuntimeInfo~ additionalRuntimeInfo, Predicate~InternalResourceGroup~ shouldWaitForResourceManagerUpdate, InternalNodeManager nodeManager, ClusterResourceChecker clusterResourceChecker)
        +void processQueuedQueries()
    }

    class QueryManagerConfig {
        -int maxQueryAdmissionsPerSecond
        -int minRunningQueriesForPacing
        +int getMaxQueryAdmissionsPerSecond()
        +QueryManagerConfig setMaxQueryAdmissionsPerSecond(int maxQueryAdmissionsPerSecond)
        +int getMinRunningQueriesForPacing()
        +QueryManagerConfig setMinRunningQueriesForPacing(int minRunningQueriesForPacing)
    }

    InternalResourceGroupManager --> RootInternalResourceGroup : creates
    InternalResourceGroupManager ..> QueryManagerConfig : reads
    InternalResourceGroupManager ..> ClusterResourceChecker
    RootInternalResourceGroup --|> InternalResourceGroup
    RootInternalResourceGroup o--> InternalResourceGroup : subGroups
    InternalResourceGroup ..> ManagedQueryExecution
    InternalResourceGroup ..> InternalNodeManager
    InternalResourceGroup ..> ClusterResourceChecker
    InternalResourceGroup ..> BooleanSupplier
    InternalResourceGroup ..> Runnable

File-Level Changes

Change Details Files
Introduce global query admission rate limiting with lock-free tracking and JMX metrics in InternalResourceGroupManager, and integrate it into resource group creation.
  • Add maxQueryAdmissionsPerSecond, minRunningQueriesForPacing, queryAdmissionIntervalNanos, and atomic counters (lastAdmittedQueryNanos, admission attempts/grants/denies, running queries) to InternalResourceGroupManager, initialized from QueryManagerConfig.
  • Implement tryAcquireAdmissionSlot() as a CAS-based global rate limiter that bypasses pacing when disabled or when total running queries are below the configured threshold, and uses bounded retries under contention.
  • Track global running query count via incrementRunningQueries()/decrementRunningQueries() callbacks (invoked from InternalResourceGroup) instead of scanning resource groups, to avoid lock interactions.
  • Expose pacing configuration and metrics via @Managed JMX methods, including grant/deny rates and millis since last admission.
  • Wire pacing callbacks (tryAcquireAdmissionSlot, incrementRunningQueries, decrementRunningQueries) into RootInternalResourceGroup construction paths for both resource-manager-enabled and legacy modes.
presto-main-base/src/main/java/com/facebook/presto/execution/resourceGroups/InternalResourceGroupManager.java
Propagate admission pacing and running-query tracking into InternalResourceGroup scheduling and execution lifecycle, ensuring queued and immediate-start queries respect the global rate limiter.
  • Extend InternalResourceGroup and RootInternalResourceGroup constructors to accept an admissionSlotSupplier and onQueryStarted/onQueryFinished callbacks, with a default no-op constructor for non-paced usage.
  • In run(), gate immediate-start candidates on admissionSlotSupplier before starting in background; otherwise enqueue the query as before.
  • In startInBackground(), invoke onQueryStarted when a query transitions to running, and in queryFinished(), invoke onQueryFinished when a running query completes, keeping the global counter accurate.
  • In internalStartNext(), check admissionSlotSupplier before dequeuing a queued query; only poll and start when pacing allows, otherwise leave the queue intact and return false.
  • Adjust subgroup scheduling logic so subgroups are re-queued either when they didn’t start a query due to pacing or when they started one and remain eligible, maintaining fair scheduling under pacing.
presto-main-base/src/main/java/com/facebook/presto/execution/resourceGroups/InternalResourceGroup.java
Add configuration knobs for query admission pacing and document them in QueryManagerConfig.
  • Introduce maxQueryAdmissionsPerSecond (default Integer.MAX_VALUE) and minRunningQueriesForPacing (default 30) fields in QueryManagerConfig with appropriate @min constraints.
  • Add @Config-annotated setters with descriptions for query-manager.max-query-admissions-per-second and query-manager.min-running-queries-for-pacing to make pacing configurable.
  • Use these new config fields in InternalResourceGroupManager to drive pacing behavior.
presto-main-base/src/main/java/com/facebook/presto/execution/QueryManagerConfig.java
Add unit tests to validate admission pacing behavior, including rate limits, bypass conditions, and metric tracking.
  • Add tests that verify unlimited pacing (default config) always admits queries without affecting metrics.
  • Add tests for 1 qps and 10 qps configurations that validate admissions succeed only after the appropriate time interval elapses.
  • Add tests to confirm pacing is bypassed when running queries are below the minRunningQueriesForPacing threshold, and that metrics are not updated in this case.
  • Add tests that ensure pacing is enforced when running queries exceed the threshold, including immediate deny behavior on back-to-back calls and correct accounting of attempts/grants/denies.
presto-main-base/src/test/java/com/facebook/presto/execution/resourceGroups/TestInternalResourceGroupManager.java
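The gating pattern described for run() and internalStartNext() — check the global pacer before dequeuing, so a denied query is left queued for the next scheduling cycle — can be sketched with plain collections. This is a hypothetical stand-in, not the actual InternalResourceGroup code; the names follow the descriptions above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for InternalResourceGroup's pacing integration.
public class PacedGroup
{
    private final Queue<String> queuedQueries = new ArrayDeque<>();
    private final List<String> running = new ArrayList<>();
    private final BooleanSupplier admissionSlotSupplier;

    public PacedGroup(BooleanSupplier admissionSlotSupplier)
    {
        this.admissionSlotSupplier = admissionSlotSupplier;
    }

    // run(): even an immediate-start candidate must win an admission slot
    public void run(String query)
    {
        if (queuedQueries.isEmpty() && admissionSlotSupplier.getAsBoolean()) {
            running.add(query); // startInBackground(query) in the real code
        }
        else {
            queuedQueries.add(query); // denied, or must queue behind others
        }
    }

    // internalStartNext(): check the pacer BEFORE polling, so a denied
    // query stays at the head of the queue until the scheduler retries
    public boolean internalStartNext()
    {
        if (queuedQueries.isEmpty() || !admissionSlotSupplier.getAsBoolean()) {
            return false;
        }
        running.add(queuedQueries.poll());
        return true;
    }

    public int runningCount()
    {
        return running.size();
    }

    public int queuedCount()
    {
        return queuedQueries.size();
    }
}
```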

@prashantgolash prashantgolash changed the title Add configurable query admission pacing to prevent worker overload feat: Add configurable query admission pacing to prevent worker overload Jan 20, 2026

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 2 issues and left some high-level feedback:

  • The new pacing tests rely on Thread.sleep to advance time, which can make the suite slow and flaky; consider injecting a clock/ticker into the pacing logic so tests can deterministically control time without real sleeps.
  • The global running query counter in InternalResourceGroupManager is updated via onQueryStarted/onQueryFinished callbacks; it might be safer to guard decrementRunningQueries() against going negative (e.g., with a checkState) to catch any mismatched updates early.
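The guard the reviewer suggests could look like the following. This is a sketch of the proposed fix, not code from the PR; the class name is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the reviewer's suggestion: clamp the counter at zero and surface
// mismatched increment/decrement callbacks instead of silently corrupting
// the pacing threshold.
public class GuardedRunningQueryCounter
{
    private final AtomicInteger totalRunningQueries = new AtomicInteger();

    public void increment()
    {
        totalRunningQueries.incrementAndGet();
    }

    public void decrement()
    {
        // getAndUpdate returns the pre-update value, so we can detect
        // an attempted decrement below zero while still clamping at zero
        int previous = totalRunningQueries.getAndUpdate(value -> Math.max(0, value - 1));
        if (previous == 0) {
            // Unbalanced callbacks: log loudly (or checkState in debug builds)
            System.err.println("decrement() called with no running queries recorded");
        }
    }

    public int get()
    {
        return totalRunningQueries.get();
    }
}
```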
## Individual Comments

### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/execution/resourceGroups/InternalResourceGroupManager.java:229-231` </location>
<code_context>
+    }
+
+    /** Called by InternalResourceGroup when a query finishes execution. */
+    public void decrementRunningQueries()
+    {
+        totalRunningQueriesCounter.decrementAndGet();
     }

</code_context>

<issue_to_address>
**issue (bug_risk):** Running query counter can silently go negative if callbacks become unbalanced

`decrementRunningQueries()` unconditionally decrements the global `totalRunningQueriesCounter`. If `onQueryStarted`/`onQueryFinished` ever become unbalanced (missed increment, double decrement, future changes), this can drive the counter negative and break pacing thresholds. Please add protection (e.g., `updateAndGet(value -> Math.max(0, value - 1))` or at least an assertion/log when it would go below zero) so mis-accounting is detected rather than silently corrupting the global count.
</issue_to_address>

### Comment 2
<location> `presto-main-base/src/test/java/com/facebook/presto/execution/resourceGroups/TestInternalResourceGroupManager.java:172-181` </location>
<code_context>
+    public void testAdmissionPacingAppliedAboveRunningQueryThreshold()
</code_context>

<issue_to_address>
**suggestion (testing):** Missing coverage for how pacing behaves when running queries drop back below the threshold

Current tests only cover behavior strictly below and strictly above `minRunningQueriesForPacing`. There’s no test for the transition back below the threshold (queries finishing, counter decrementing, and pacing turning off). Please add a test that:
- Starts above the threshold and confirms pacing is applied.
- Simulates query completion so the running query count drops below the threshold.
- Verifies subsequent admissions are not rate limited and metrics reflect that pacing is bypassed.
This will exercise `incrementRunningQueries`/`decrementRunningQueries` and the threshold logic over time rather than at a single point.

Suggested implementation:

```java
        InternalResourceGroupManager<ImmutableMap<Object, Object>> manager = new InternalResourceGroupManager<>(
                (poolId, listener) -> {},
                config,

```

To fully implement the requested behavior, please add a new test method that explicitly exercises the transition from *above* the pacing threshold back to *below* it, verifying that pacing is turned off and metrics reflect bypassed pacing.

Place the following test near the other admission pacing tests (e.g., immediately after `testAdmissionPacingAppliedAboveRunningQueryThreshold`):

```java
    @Test
    public void testAdmissionPacingTurnsOffWhenRunningQueriesDropBelowThreshold()
            throws Exception
    {
        // Configure pacing with a threshold of 2 running queries and a low rate
        QueryManagerConfig config = new QueryManagerConfig()
                .setMaxQueryAdmissionsPerSecond(1)  // 1 per second, so pacing should be visible
                .setMinRunningQueriesForPacing(2);  // Threshold of 2 running queries

        InternalResourceGroupManager<ImmutableMap<Object, Object>> manager = new InternalResourceGroupManager<>(
                (poolId, listener) -> {},
                config,
                // keep any additional constructor args consistent with other tests
                /* ... existing arguments ... */);

        // Start at/above the threshold: simulate 2 running queries
        manager.incrementRunningQueries();
        manager.incrementRunningQueries();

        // First admission: we should be paced and metrics should record the attempt
        long start = System.nanoTime();
        assertTrue(manager.tryAcquireAdmissionSlot());
        long firstDurationMs = NANOSECONDS.toMillis(System.nanoTime() - start);

        // At 1 qps, the first call may or may not wait depending on token bucket state,
        // but the key is that metrics are tracking pacing, not bypassing it.
        assertEquals(manager.getTotalAdmissionAttempts(), 1);
        assertEquals(manager.getTotalAdmissionsGranted(), 1);
        assertEquals(manager.getTotalAdmissionsDenied(), 0);

        // Now simulate queries finishing so that we drop below the threshold
        manager.decrementRunningQueries();
        manager.decrementRunningQueries();

        // Sanity: running queries counter should now be below threshold (typically 0)
        // If there's a getter for running queries, assert it; otherwise rely on
        // the behavior below as an indirect check.

        // Subsequent admissions should no longer be paced: they should be granted immediately
        start = System.nanoTime();
        assertTrue(manager.tryAcquireAdmissionSlot());
        long secondDurationMs = NANOSECONDS.toMillis(System.nanoTime() - start);

        // Verify that the second admission was effectively unpaced (close to zero delay).
        // Use a small upper bound to avoid flakiness; adjust threshold to match other tests.
        assertTrue(secondDurationMs < 10, "Admission should not be significantly delayed when below pacing threshold");

        // Metrics should reflect that pacing logic is effectively bypassed once below the threshold
        // (no additional "denied" admissions and attempts/granted increment as admissions succeed).
        assertEquals(manager.getTotalAdmissionsDenied(), 0);
        assertEquals(manager.getTotalAdmissionsGranted(), 2);
        assertEquals(manager.getTotalAdmissionAttempts(), 2);
    }
```

Notes / things to align with the existing codebase:

1. **Constructor arguments**: Replace `/* ... existing arguments ... */` with the same extra arguments used in other tests in this class when constructing `InternalResourceGroupManager`.
2. **Time measurement utilities**: If the test class already imports `java.util.concurrent.TimeUnit.NANOSECONDS` or uses a different helper for timing, keep it consistent (e.g., static import `NANOSECONDS` or use `System.currentTimeMillis()` as other tests do).
3. **Running query getter (optional)**: If `InternalResourceGroupManager` exposes a `getRunningQueries()` or similar method used elsewhere in tests, add an assertion after the decrements:
   ```java
   assertEquals(manager.getRunningQueries(), 0);
   ```
4. **Pacing delay assertion**: If other tests use a shared helper or a different bound (e.g., `< 50` ms) for "non-pacing" admissions, adjust `secondDurationMs < 10` to match that convention to avoid flakiness.
5. **Static imports**: Ensure `NANOSECONDS` is statically imported if not already:
   ```java
   import static java.util.concurrent.TimeUnit.NANOSECONDS;
   ```
   or adjust calls accordingly.
</issue_to_address>



prashantgolash added a commit to prashantgolash/presto that referenced this pull request Jan 20, 2026
…oad (prestodb#26988)

@spershin spershin left a comment


@prashantgolash thank you for working on this; it is an important change to avoid regular spiky cluster overload.

@steveburnett
Contributor

Please add documentation for the new configs query-manager.max-query-admissions-per-second and query-manager.min-running-queries-for-pacing as described in Designing Your Code in CONTRIBUTING.md:

"All new language features, new functions, session and config properties, and major features have documentation added"

prashantgolash added a commit to prashantgolash/presto that referenced this pull request Jan 25, 2026
…oad (prestodb#26988)
@prashantgolash
Contributor Author

> Please add documentation for the new configs query-manager.max-query-admissions-per-second and query-manager.min-running-queries-for-pacing as described in Designing Your Code in CONTRIBUTING.md:
>
> "All new language features, new functions, session and config properties, and major features have documentation added"

Thanks added the documentation.

Contributor

@steveburnett steveburnett left a comment


Thank you for the documentation! Just one nit.

when the number of running queries exceeds the threshold configured by
``query-manager.query-pacing.min-running-queries``.

Set to a lower value (e.g., ``10``) to limit query admission rate during
Contributor


Suggested change
Set to a lower value (e.g., ``10``) to limit query admission rate during
Set to a lower value such as ``10`` to limit query admission rate during

Avoid Latin abbreviations. See the entry for e.g. in the recommended word list of the GitLab documentation style guide for discussion and alternatives.

Contributor Author


Fixed

spershin
spershin previously approved these changes Jan 26, 2026
Contributor

@spershin spershin left a comment


LGTM.
We need to seek approval from a Presto committer now.

The change looks safe to roll out because it is guarded by a config property.
Once it is in the build, we can start canarying in verification and RC clusters (batch).
Once it is in the production build, we can start canarying in one batch cluster and then increase the coverage.

prashantgolash added a commit to prashantgolash/presto that referenced this pull request Jan 26, 2026
Reviewed By: spershin

Differential Revision: D88058630
@prashantgolash prashantgolash force-pushed the export-D88058630 branch 2 times, most recently from 3927f7d to 3e95d6f Compare January 26, 2026 18:42
Contributor

@steveburnett steveburnett left a comment


LGTM! (docs)

Pull updated branch, new local doc build, looks good. Thanks!

@feilong-liu feilong-liu merged commit fede681 into prestodb:master Jan 26, 2026
81 checks passed