
feat: Add configurable query admission pacing to prevent worker overload #26988

Merged
feilong-liu merged 1 commit into prestodb:master from prashantgolash:export-D88058630
Jan 26, 2026

Conversation

@prashantgolash
Contributor

@prashantgolash prashantgolash commented Jan 20, 2026

Summary:

Introduces global query admission pacing to control the rate at which queries
are admitted for execution. When a cluster experiences sudden bursts of heavy
queries, workers can become overloaded, leading to degraded performance and
potential failures. This feature allows operators to configure a maximum
admission rate to smooth out query submission spikes.

Two new configuration options are added:

  • query-manager.max-query-admissions-per-second: Maximum queries admitted
    per second globally (default: unlimited). Set to a lower value to pace
    query admissions.
  • query-manager.min-running-queries-for-pacing: Minimum running queries
    before pacing is applied (default: 1). Allows bypassing pacing when the
    cluster is idle.

The pacing uses a lock-free atomic approach to avoid deadlocks with resource
group locks while ensuring correctness across multiple root resource groups.

Example configuration for production:

query-manager.max-query-admissions-per-second=10
query-manager.min-running-queries-for-pacing=30

This admits at most 10 queries per second, but only when 30+ queries are
already running.
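The lock-free pacing described above can be sketched as a small stand-alone class. This is a simplified model of the behavior described in this summary, not the actual InternalResourceGroupManager code; the class and field names are assumptions based on the description. A single CAS on the last-admission timestamp decides which caller wins the current admission interval, so no locks are taken.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of the admission pacer described above (hypothetical,
// not the actual Presto code).
public class AdmissionPacer
{
    private final long admissionIntervalNanos;
    private final int minRunningQueriesForPacing;
    private final AtomicLong lastAdmittedNanos;
    private final AtomicInteger runningQueries = new AtomicInteger();

    public AdmissionPacer(int maxAdmissionsPerSecond, int minRunningQueriesForPacing)
    {
        // maxAdmissionsPerSecond == Integer.MAX_VALUE means pacing is disabled
        this.admissionIntervalNanos = (maxAdmissionsPerSecond == Integer.MAX_VALUE)
                ? 0
                : 1_000_000_000L / maxAdmissionsPerSecond;
        this.minRunningQueriesForPacing = minRunningQueriesForPacing;
        // Backdate the last admission so the first paced call is granted
        this.lastAdmittedNanos = new AtomicLong(System.nanoTime() - admissionIntervalNanos);
    }

    public boolean tryAcquireAdmissionSlot()
    {
        // Bypass when pacing is disabled or the cluster is below the activity threshold
        if (admissionIntervalNanos == 0 || runningQueries.get() < minRunningQueriesForPacing) {
            return true;
        }
        long now = System.nanoTime();
        long last = lastAdmittedNanos.get();
        if (now - last < admissionIntervalNanos) {
            return false; // another query was admitted too recently
        }
        // CAS: exactly one concurrent caller claims this interval; losers are denied
        return lastAdmittedNanos.compareAndSet(last, now);
    }

    public void incrementRunningQueries()
    {
        runningQueries.incrementAndGet();
    }

    public void decrementRunningQueries()
    {
        runningQueries.decrementAndGet();
    }
}
```

With the example production settings (10 admissions per second, threshold 30), a burst of submissions while 30+ queries are running is admitted at most once every 100 ms; below the threshold every call returns true immediately.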

Differential Revision: D88058630

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@prashantgolash prashantgolash requested a review from a team as a code owner January 20, 2026 07:14
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Jan 20, 2026
@sourcery-ai
Contributor

sourcery-ai bot commented Jan 20, 2026

Reviewer's Guide

Adds a global, lock-free query admission pacing mechanism wired through InternalResourceGroupManager and InternalResourceGroup, configurable via new QueryManagerConfig settings, with JMX metrics and targeted unit tests to prevent worker overload during query bursts.

Sequence diagram for query admission with global pacing

sequenceDiagram
    actor Client
    participant QueryManager
    participant InternalResourceGroupManager as IRGManager
    participant RootInternalResourceGroup as RootRG
    participant InternalResourceGroup as RG

    Client->>QueryManager: submitQuery()
    QueryManager->>RootRG: run(query)
    RootRG->>RG: run(query)
    activate RG
    RG->>RG: evaluate canRun and queuedQueries.isEmpty()
    alt immediate_start_candidate
        RG->>IRGManager: tryAcquireAdmissionSlot()
        activate IRGManager
        IRGManager->>IRGManager: getTotalRunningQueries()
        alt pacing_disabled_or_below_threshold
            IRGManager-->>RG: true
        else pacing_enforced
            IRGManager->>IRGManager: check interval via CAS
            alt within_rate_limit
                IRGManager-->>RG: true
            else rate_limited
                IRGManager-->>RG: false
            end
        end
        deactivate IRGManager
        alt admission_granted
            RG->>RG: startInBackground(query)
            RG->>IRGManager: incrementRunningQueries()
            RG->>QueryManager: startWaitingForResources()
        else admission_denied
            RG->>RG: enqueueQuery(query)
        end
    else must_queue
        RG->>RG: enqueueQuery(query)
    end
    deactivate RG

    loop scheduling_cycle
        RootRG->>RootRG: processQueuedQueries()
        RootRG->>RG: internalStartNext()
        RG->>IRGManager: tryAcquireAdmissionSlot()
        alt granted
            RG->>RG: dequeue and startInBackground(query)
            RG->>IRGManager: incrementRunningQueries()
        else denied
            RG-->>RootRG: false (retry later)
        end
    end

    note over RG,IRGManager: When query finishes, RG calls IRGManager.decrementRunningQueries()

Class diagram for query admission pacing integration

classDiagram
    class InternalResourceGroupManager {
        <<final>>
        -int maxQueryAdmissionsPerSecond
        -int minRunningQueriesForPacing
        -long queryAdmissionIntervalNanos
        -AtomicLong lastAdmittedQueryNanos
        -AtomicLong totalAdmissionAttempts
        -AtomicLong totalAdmissionsGranted
        -AtomicLong totalAdmissionsDenied
        -AtomicInteger totalRunningQueriesCounter
        +InternalResourceGroupManager(QueryManagerConfig queryManagerConfig, ClusterResourceChecker clusterResourceChecker, ...)
        +boolean tryAcquireAdmissionSlot()
        -int getTotalRunningQueries()
        +void incrementRunningQueries()
        +void decrementRunningQueries()
        +int getMaxQueryAdmissionsPerSecond()
        +long getTotalAdmissionAttempts()
        +long getTotalAdmissionsGranted()
        +long getTotalAdmissionsDenied()
        +int getMinRunningQueriesForPacing()
        +double getAdmissionGrantRate()
        +double getAdmissionDenyRate()
        +long getMillisSinceLastAdmission()
    }

    class InternalResourceGroup {
        -Optional~InternalResourceGroup~ parent
        -BooleanSupplier admissionSlotSupplier
        -Runnable onQueryStarted
        -Runnable onQueryFinished
        -Queue~ManagedQueryExecution~ queuedQueries
        +InternalResourceGroup(Optional~InternalResourceGroup~ parent, String name, BiConsumer~InternalResourceGroup,Boolean~ jmxExportListener, Executor executor, Function~ResourceGroupId,Optional~ResourceGroupRuntimeInfo~ additionalRuntimeInfo, Predicate~InternalResourceGroup~ shouldWaitForResourceManagerUpdate, InternalNodeManager nodeManager, ClusterResourceChecker clusterResourceChecker, BooleanSupplier admissionSlotSupplier, Runnable onQueryStarted, Runnable onQueryFinished)
        +InternalResourceGroup getOrCreateSubGroup(String name, boolean staticSegments)
        +void run(ManagedQueryExecution query)
        -void startInBackground(ManagedQueryExecution query)
        -void queryFinished(ManagedQueryExecution query)
        -boolean internalStartNext()
    }

    class RootInternalResourceGroup {
        +RootInternalResourceGroup(String name, BiConsumer~InternalResourceGroup,Boolean~ jmxExportListener, Executor executor, Function~ResourceGroupId,Optional~ResourceGroupRuntimeInfo~ additionalRuntimeInfo, Predicate~InternalResourceGroup~ shouldWaitForResourceManagerUpdate, InternalNodeManager nodeManager, ClusterResourceChecker clusterResourceChecker, BooleanSupplier admissionSlotSupplier, Runnable onQueryStarted, Runnable onQueryFinished)
        +RootInternalResourceGroup(String name, BiConsumer~InternalResourceGroup,Boolean~ jmxExportListener, Executor executor, Function~ResourceGroupId,Optional~ResourceGroupRuntimeInfo~ additionalRuntimeInfo, Predicate~InternalResourceGroup~ shouldWaitForResourceManagerUpdate, InternalNodeManager nodeManager, ClusterResourceChecker clusterResourceChecker)
        +void processQueuedQueries()
    }

    class QueryManagerConfig {
        -int maxQueryAdmissionsPerSecond
        -int minRunningQueriesForPacing
        +int getMaxQueryAdmissionsPerSecond()
        +QueryManagerConfig setMaxQueryAdmissionsPerSecond(int maxQueryAdmissionsPerSecond)
        +int getMinRunningQueriesForPacing()
        +QueryManagerConfig setMinRunningQueriesForPacing(int minRunningQueriesForPacing)
    }

    InternalResourceGroupManager --> RootInternalResourceGroup : creates
    InternalResourceGroupManager ..> QueryManagerConfig : reads
    InternalResourceGroupManager ..> ClusterResourceChecker
    RootInternalResourceGroup --|> InternalResourceGroup
    RootInternalResourceGroup o--> InternalResourceGroup : subGroups
    InternalResourceGroup ..> ManagedQueryExecution
    InternalResourceGroup ..> InternalNodeManager
    InternalResourceGroup ..> ClusterResourceChecker
    InternalResourceGroup ..> BooleanSupplier
    InternalResourceGroup ..> Runnable

File-Level Changes

Change Details Files
Introduce global query admission rate limiting with lock-free tracking and JMX metrics in InternalResourceGroupManager, and integrate it into resource group creation.
  • Add maxQueryAdmissionsPerSecond, minRunningQueriesForPacing, queryAdmissionIntervalNanos, and atomic counters (lastAdmittedQueryNanos, admission attempts/grants/denies, running queries) to InternalResourceGroupManager, initialized from QueryManagerConfig.
  • Implement tryAcquireAdmissionSlot() as a CAS-based global rate limiter that bypasses pacing when disabled or when total running queries are below the configured threshold, and uses bounded retries under contention.
  • Track global running query count via incrementRunningQueries()/decrementRunningQueries() callbacks (invoked from InternalResourceGroup) instead of scanning resource groups, to avoid lock interactions.
  • Expose pacing configuration and metrics via @Managed JMX methods, including grant/deny rates and millis since last admission.
  • Wire pacing callbacks (tryAcquireAdmissionSlot, incrementRunningQueries, decrementRunningQueries) into RootInternalResourceGroup construction paths for both resource-manager-enabled and legacy modes.
presto-main-base/src/main/java/com/facebook/presto/execution/resourceGroups/InternalResourceGroupManager.java
Propagate admission pacing and running-query tracking into InternalResourceGroup scheduling and execution lifecycle, ensuring queued and immediate-start queries respect the global rate limiter.
  • Extend InternalResourceGroup and RootInternalResourceGroup constructors to accept an admissionSlotSupplier and onQueryStarted/onQueryFinished callbacks, with a default no-op constructor for non-paced usage.
  • In run(), gate immediate-start candidates on admissionSlotSupplier before starting in background; otherwise enqueue the query as before.
  • In startInBackground(), invoke onQueryStarted when a query transitions to running, and in queryFinished(), invoke onQueryFinished when a running query completes, keeping the global counter accurate.
  • In internalStartNext(), check admissionSlotSupplier before dequeuing a queued query; only poll and start when pacing allows, otherwise leave the queue intact and return false.
  • Adjust subgroup scheduling logic so subgroups are re-queued either when they didn’t start a query due to pacing or when they started one and remain eligible, maintaining fair scheduling under pacing.
presto-main-base/src/main/java/com/facebook/presto/execution/resourceGroups/InternalResourceGroup.java
Add configuration knobs for query admission pacing and document them in QueryManagerConfig.
  • Introduce maxQueryAdmissionsPerSecond (default Integer.MAX_VALUE) and minRunningQueriesForPacing (default 30) fields in QueryManagerConfig with appropriate @min constraints.
  • Add @Config-annotated setters with descriptions for query-manager.max-query-admissions-per-second and query-manager.min-running-queries-for-pacing to make pacing configurable.
  • Use these new config fields in InternalResourceGroupManager to drive pacing behavior.
presto-main-base/src/main/java/com/facebook/presto/execution/QueryManagerConfig.java
Add unit tests to validate admission pacing behavior, including rate limits, bypass conditions, and metric tracking.
  • Add tests that verify unlimited pacing (default config) always admits queries without affecting metrics.
  • Add tests for 1 qps and 10 qps configurations that validate admissions succeed only after the appropriate time interval elapses.
  • Add tests to confirm pacing is bypassed when running queries are below the minRunningQueriesForPacing threshold, and that metrics are not updated in this case.
  • Add tests that ensure pacing is enforced when running queries exceed the threshold, including immediate deny behavior on back-to-back calls and correct accounting of attempts/grants/denies.
presto-main-base/src/test/java/com/facebook/presto/execution/resourceGroups/TestInternalResourceGroupManager.java
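The gating pattern described for run() and internalStartNext() — check the global pacer before dequeuing, so a denied query is left queued for the next scheduling cycle — can be sketched with plain collections. This is a hypothetical stand-in, not the actual InternalResourceGroup code; the names follow the descriptions above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for InternalResourceGroup's pacing integration.
public class PacedGroup
{
    private final Queue<String> queuedQueries = new ArrayDeque<>();
    private final List<String> running = new ArrayList<>();
    private final BooleanSupplier admissionSlotSupplier;

    public PacedGroup(BooleanSupplier admissionSlotSupplier)
    {
        this.admissionSlotSupplier = admissionSlotSupplier;
    }

    // run(): even an immediate-start candidate must win an admission slot
    public void run(String query)
    {
        if (queuedQueries.isEmpty() && admissionSlotSupplier.getAsBoolean()) {
            running.add(query); // startInBackground(query) in the real code
        }
        else {
            queuedQueries.add(query); // denied, or must queue behind others
        }
    }

    // internalStartNext(): check the pacer BEFORE polling, so a denied
    // query stays at the head of the queue until the scheduler retries
    public boolean internalStartNext()
    {
        if (queuedQueries.isEmpty() || !admissionSlotSupplier.getAsBoolean()) {
            return false;
        }
        running.add(queuedQueries.poll());
        return true;
    }

    public int runningCount()
    {
        return running.size();
    }

    public int queuedCount()
    {
        return queuedQueries.size();
    }
}
```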

@prashantgolash prashantgolash changed the title Add configurable query admission pacing to prevent worker overload feat: Add configurable query admission pacing to prevent worker overload Jan 20, 2026

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 2 issues and left some high-level feedback:

  • The new pacing tests rely on Thread.sleep to advance time, which can make the suite slow and flaky; consider injecting a clock/ticker into the pacing logic so tests can deterministically control time without real sleeps.
  • The global running query counter in InternalResourceGroupManager is updated via onQueryStarted/onQueryFinished callbacks; it might be safer to guard decrementRunningQueries() against going negative (e.g., with a checkState) to catch any mismatched updates early.
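The guard the reviewer suggests could look like the following. This is a sketch of the proposed fix, not code from the PR; the class name is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the reviewer's suggestion: clamp the counter at zero and surface
// mismatched increment/decrement callbacks instead of silently corrupting
// the pacing threshold.
public class GuardedRunningQueryCounter
{
    private final AtomicInteger totalRunningQueries = new AtomicInteger();

    public void increment()
    {
        totalRunningQueries.incrementAndGet();
    }

    public void decrement()
    {
        // getAndUpdate returns the pre-update value, so we can detect
        // an attempted decrement below zero while still clamping at zero
        int previous = totalRunningQueries.getAndUpdate(value -> Math.max(0, value - 1));
        if (previous == 0) {
            // Unbalanced callbacks: log loudly (or checkState in debug builds)
            System.err.println("decrement() called with no running queries recorded");
        }
    }

    public int get()
    {
        return totalRunningQueries.get();
    }
}
```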
## Individual Comments

### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/execution/resourceGroups/InternalResourceGroupManager.java:229-231` </location>
<code_context>
+    }
+
+    /** Called by InternalResourceGroup when a query finishes execution. */
+    public void decrementRunningQueries()
+    {
+        totalRunningQueriesCounter.decrementAndGet();
     }

</code_context>

<issue_to_address>
**issue (bug_risk):** Running query counter can silently go negative if callbacks become unbalanced

`decrementRunningQueries()` unconditionally decrements the global `totalRunningQueriesCounter`. If `onQueryStarted`/`onQueryFinished` ever become unbalanced (missed increment, double decrement, future changes), this can drive the counter negative and break pacing thresholds. Please add protection (e.g., `updateAndGet(value -> Math.max(0, value - 1))` or at least an assertion/log when it would go below zero) so mis-accounting is detected rather than silently corrupting the global count.
</issue_to_address>

### Comment 2
<location> `presto-main-base/src/test/java/com/facebook/presto/execution/resourceGroups/TestInternalResourceGroupManager.java:172-181` </location>
<code_context>
+    public void testAdmissionPacingAppliedAboveRunningQueryThreshold()
</code_context>

<issue_to_address>
**suggestion (testing):** Missing coverage for how pacing behaves when running queries drop back below the threshold

Current tests only cover behavior strictly below and strictly above `minRunningQueriesForPacing`. There’s no test for the transition back below the threshold (queries finishing, counter decrementing, and pacing turning off). Please add a test that:
- Starts above the threshold and confirms pacing is applied.
- Simulates query completion so the running query count drops below the threshold.
- Verifies subsequent admissions are not rate limited and metrics reflect that pacing is bypassed.
This will exercise `incrementRunningQueries`/`decrementRunningQueries` and the threshold logic over time rather than at a single point.

Suggested implementation:

```java
        InternalResourceGroupManager<ImmutableMap<Object, Object>> manager = new InternalResourceGroupManager<>(
                (poolId, listener) -> {},
                config,

```

To fully implement the requested behavior, please add a new test method that explicitly exercises the transition from *above* the pacing threshold back to *below* it, verifying that pacing is turned off and metrics reflect bypassed pacing.

Place the following test near the other admission pacing tests (e.g., immediately after `testAdmissionPacingAppliedAboveRunningQueryThreshold`):

```java
    @Test
    public void testAdmissionPacingTurnsOffWhenRunningQueriesDropBelowThreshold()
            throws Exception
    {
        // Configure pacing with a threshold of 2 running queries and a low rate
        QueryManagerConfig config = new QueryManagerConfig()
                .setMaxQueryAdmissionsPerSecond(1)  // 1 per second, so pacing should be visible
                .setMinRunningQueriesForPacing(2);  // Threshold of 2 running queries

        InternalResourceGroupManager<ImmutableMap<Object, Object>> manager = new InternalResourceGroupManager<>(
                (poolId, listener) -> {},
                config,
                // keep any additional constructor args consistent with other tests
                /* ... existing arguments ... */);

        // Start at/above the threshold: simulate 2 running queries
        manager.incrementRunningQueries();
        manager.incrementRunningQueries();

        // First admission: we should be paced and metrics should record the attempt
        long start = System.nanoTime();
        assertTrue(manager.tryAcquireAdmissionSlot());
        long firstDurationMs = NANOSECONDS.toMillis(System.nanoTime() - start);

        // At 1 qps, the first call may or may not wait depending on token bucket state,
        // but the key is that metrics are tracking pacing, not bypassing it.
        assertEquals(manager.getTotalAdmissionAttempts(), 1);
        assertEquals(manager.getTotalAdmissionsGranted(), 1);
        assertEquals(manager.getTotalAdmissionsDenied(), 0);

        // Now simulate queries finishing so that we drop below the threshold
        manager.decrementRunningQueries();
        manager.decrementRunningQueries();

        // Sanity: running queries counter should now be below threshold (typically 0)
        // If there's a getter for running queries, assert it; otherwise rely on
        // the behavior below as an indirect check.

        // Subsequent admissions should no longer be paced: they should be granted immediately
        start = System.nanoTime();
        assertTrue(manager.tryAcquireAdmissionSlot());
        long secondDurationMs = NANOSECONDS.toMillis(System.nanoTime() - start);

        // Verify that the second admission was effectively unpaced (close to zero delay).
        // Use a small upper bound to avoid flakiness; adjust threshold to match other tests.
        assertTrue(secondDurationMs < 10, "Admission should not be significantly delayed when below pacing threshold");

        // Metrics should reflect that pacing logic is effectively bypassed once below the threshold
        // (no additional "denied" admissions and attempts/granted increment as admissions succeed).
        assertEquals(manager.getTotalAdmissionsDenied(), 0);
        assertEquals(manager.getTotalAdmissionsGranted(), 2);
        assertEquals(manager.getTotalAdmissionAttempts(), 2);
    }
```

Notes / things to align with the existing codebase:

1. **Constructor arguments**: Replace `/* ... existing arguments ... */` with the same extra arguments used in other tests in this class when constructing `InternalResourceGroupManager`.
2. **Time measurement utilities**: If the test class already imports `java.util.concurrent.TimeUnit.NANOSECONDS` or uses a different helper for timing, keep it consistent (e.g., static import `NANOSECONDS` or use `System.currentTimeMillis()` as other tests do).
3. **Running query getter (optional)**: If `InternalResourceGroupManager` exposes a `getRunningQueries()` or similar method used elsewhere in tests, add an assertion after the decrements:
   ```java
   assertEquals(manager.getRunningQueries(), 0);
   ```
4. **Pacing delay assertion**: If other tests use a shared helper or a different bound (e.g., `< 50` ms) for "non-pacing" admissions, adjust `secondDurationMs < 10` to match that convention to avoid flakiness.
5. **Static imports**: Ensure `NANOSECONDS` is statically imported if not already:
   ```java
   import static java.util.concurrent.TimeUnit.NANOSECONDS;
   ```
   or adjust calls accordingly.
</issue_to_address>



prashantgolash added a commit to prashantgolash/presto that referenced this pull request Jan 20, 2026
…oad (prestodb#26988)

@spershin spershin left a comment


@prashantgolash thank you for working on this; it is an important change to avoid regular spiky cluster overload.

@steveburnett
Contributor

Please add documentation for the new configs query-manager.max-query-admissions-per-second and query-manager.min-running-queries-for-pacing as described in Designing Your Code in CONTRIBUTING.md:

"All new language features, new functions, session and config properties, and major features have documentation added"

prashantgolash added a commit to prashantgolash/presto that referenced this pull request Jan 25, 2026
…oad (prestodb#26988)
@prashantgolash
Contributor Author

> Please add documentation for the new configs query-manager.max-query-admissions-per-second and query-manager.min-running-queries-for-pacing as described in Designing Your Code in CONTRIBUTING.md:
>
> "All new language features, new functions, session and config properties, and major features have documentation added"

Thanks added the documentation.

Contributor

@steveburnett steveburnett left a comment


Thank you for the documentation! Just one nit.

when the number of running queries exceeds the threshold configured by
``query-manager.query-pacing.min-running-queries``.

Set to a lower value (e.g., ``10``) to limit query admission rate during
Contributor


Suggested change
Set to a lower value (e.g., ``10``) to limit query admission rate during
Set to a lower value such as ``10`` to limit query admission rate during

Avoid Latin abbreviations. See the entry for e.g. in the recommended word list of the GitLab documentation style guide for discussion and alternatives.

Contributor Author


Fixed

spershin
spershin previously approved these changes Jan 26, 2026
Contributor

@spershin spershin left a comment


LGTM.
We need to seek approval from a Presto committer now.

The change looks safe to roll out because it is guarded by a config property.
Once it is in the build, we can start canarying in verification and RC clusters (batch).
Once it is in the production build, we can start canarying in one batch cluster and then increase the coverage.

prashantgolash added a commit to prashantgolash/presto that referenced this pull request Jan 26, 2026
Reviewed By: spershin

Differential Revision: D88058630
@prashantgolash prashantgolash force-pushed the export-D88058630 branch 2 times, most recently from 3927f7d to 3e95d6f Compare January 26, 2026 18:42
Contributor

@steveburnett steveburnett left a comment


LGTM! (docs)

Pull updated branch, new local doc build, looks good. Thanks!

@feilong-liu feilong-liu merged commit fede681 into prestodb:master Jan 26, 2026
81 checks passed