Skip to content

Conversation

@Swiddis
Copy link
Collaborator

@Swiddis Swiddis commented Nov 25, 2025

Description

Patch to address a symptom of #4842. Currently, the only timeout logic we support is in the underlying DSL, which works if we submit a slow query to OpenSearch but fails if we have exhaustion in any other part of the code. During those failures, the sql-worker thread is frozen and takes 100% CPU until completion, which can be several hours (and potentially bring down cluster nodes).

This PR adds some basic configurable timeout handling, and puts interrupt points in our index enumeration and planning rules.

Related Issues

Related to #4842

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • New Features

    • Configurable PPL query timeout (default 300s) applied to execution and planning, interrupting long-running operations.
  • Documentation

    • Added docs and usage example for plugins.ppl.query.timeout, including default, node scope, and dynamic update.
  • Bug Fixes

    • Enforced timeouts across query execution, planning rules, and index scan iteration to prevent runaway operations and improve robustness.

✏️ Tip: You can customize this high-level summary in your review settings.

@Swiddis Swiddis added infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. PPL Piped processing language calcite calcite migration releated maintenance Improves code quality, but not the product and removed infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. labels Nov 25, 2025
Signed-off-by: Simeon Widdis <[email protected]>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 26, 2025

Walkthrough

Adds a PPL query timeout setting and exposes it via OpenSearchSettings; makes query execution timeout-aware in OpenSearchQueryManager and OpenSearchIndexScan; introduces InterruptibleRelRule and migrates many planner rules to it; updates DI, tests, and documentation.

Changes

Cohort / File(s) Summary
Settings & Registration
common/src/main/java/org/opensearch/sql/common/setting/Settings.java, opensearch/src/main/java/org/opensearch/sql/opensearch/setting/OpenSearchSettings.java
Added enum key PPL_QUERY_TIMEOUT and a new Setting<TimeValue> (plugins.ppl.query.timeout, default 300s, node scope, dynamic); registered the setting and exposed it via pluginSettings().
Documentation
docs/user/ppl/admin/settings.rst
Documented plugins.ppl.query.timeout with purpose, default (300s), scope (node), dynamic update info and example usage.
Query Execution & Timeout Scheduling
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java
Injects Settings; reads PPL timeout per request; schedules a timeout interrupter via ThreadPool/Scheduler, cancels scheduled task on completion, handles interruptions by logging and throwing OpenSearchTimeoutException.
Storage Interruption Handling
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexScan.java
Added interruption checks in hasNext()/next() and throw OpenSearchTimeoutException when thread is interrupted.
Planner Rules Base
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/InterruptibleRelRule.java
New abstract InterruptibleRelRule that finalizes onMatch to pre-check thread interruption and delegates to protected onMatchImpl(...), throwing OpenSearchTimeoutException if interrupted.
Planner Rules Migration
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/...
AggregateIndexScanRule.java, DedupPushdownRule.java, ExpandCollationOnProjectExprRule.java, FilterIndexScanRule.java, LimitIndexScanRule.java, ProjectIndexScanRule.java, RareTopPushdownRule.java, RelevanceFunctionPushdownRule.java, SortAggregateMeasureRule.java, SortExprIndexScanRule.java, SortIndexScanRule.java, SortProjectExprTransposeRule.java
Migrated many rules from RelRule to InterruptibleRelRule; changed public void onMatch(...) to protected void onMatchImpl(...); removed RelRule imports; a few rules include small push-down/transform adjustments.
Dependency Injection & Tests
plugin/src/main/java/org/opensearch/sql/plugin/config/OpenSearchPluginModule.java, opensearch/src/test/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManagerTest.java
Provider queryManager signature and constructor calls updated to accept Settings; tests updated to mock Settings, Scheduler behavior and ScheduledCancellable; ThreadPool scheduling behavior mocked to run immediately in tests.

Sequence Diagram(s)

sequenceDiagram
    participant U as User
    participant QM as OpenSearchQueryManager
    participant TP as ThreadPool/Scheduler
    participant Worker as WorkerThread
    participant Scan as OpenSearchIndexScan

    U->>QM: submit(request)
    QM->>QM: read PPL_QUERY_TIMEOUT from Settings
    QM->>TP: schedule(timeout task -> interrupt Worker) 
    TP-->>QM: ScheduledCancellable

    par Execution vs Timeout
        QM->>Worker: execute query task (Runnable)
        Worker->>Scan: iterate rows / hasNext()/next()
        Scan->>Scan: check Thread.isInterrupted()
        alt interrupted
            Scan-->>Worker: throw OpenSearchTimeoutException
            Worker-->>QM: propagate timeout error
        else
            Scan-->>Worker: yield rows
            Worker-->>QM: complete normally
        end
    and Timeout task
        TP->>Worker: interrupt Worker when timeout expires
    end

    alt completed before timeout
        QM->>TP: cancel ScheduledCancellable
        QM->>U: return results
    else timeout
        QM->>U: return timeout error
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus areas:
    • InterruptibleRelRule: correctness of interruption check and exception wrapping.
    • OpenSearchQueryManager: race conditions between scheduling, interrupting, cancellation, and context propagation.
    • OpenSearchIndexScan: correct propagation of timeout exceptions through iterator semantics.
    • DI and tests: ensure provider signature change is consistently applied and tests reliably simulate scheduler behavior.
    • Spot-check migrated planner rules that added push-down/transform logic.

Poem

🐇 I timed my hop through code and root,

A gentle nudge to stop and reboot.
Rules now pause when clocks tick red,
Threads unwind and carrots spread.
Timeout set — now onward we scoot.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.68% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding timeout support for Calcite queries, which is the core objective across the entire changeset.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/user/ppl/admin/settings.rst (1)

76-110: Documentation follows established format.

The new section documents the setting clearly with default value, scope, and an example. Consider adding a note about whether the timeout can be disabled or set to a very high value for long-running queries, as this was raised in a previous review. This is a concern.

🧹 Nitpick comments (2)
opensearch/src/test/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManagerTest.java (1)

47-78: Test correctly validates the new constructor and timeout integration.

The test properly mocks the Settings dependency with PPL_QUERY_TIMEOUT returning a TimeValue, and the schedule() method returns a mock ScheduledCancellable. The immediate execution pattern is appropriate for unit testing.

Consider adding a test case that verifies the scheduled timeout task is cancelled when the query completes successfully (if the production code has that behavior). This would provide coverage for the timeout cleanup path.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java (1)

59-60: Minor: Redundant null check after assertion.

The assertion on lines 59-60 checks toCollation != null, but line 94 performs another null check on toCollation. Since assertions can be disabled at runtime (with -da flag), the defensive check in handleComplexExpressionsSortedByScan is actually appropriate for production safety. However, if you rely on the check in the helper, consider converting the assert to a proper early-return guard for consistency.

Also applies to: 93-96

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5049a03 and 58bb0d9.

📒 Files selected for processing (20)
  • common/src/main/java/org/opensearch/sql/common/setting/Settings.java (1 hunks)
  • docs/user/ppl/admin/settings.rst (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/AggregateIndexScanRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/FilterIndexScanRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/InterruptibleRelRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/LimitIndexScanRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ProjectIndexScanRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RelevanceFunctionPushdownRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortAggregateMeasureRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java (3 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortIndexScanRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortProjectExprTransposeRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/setting/OpenSearchSettings.java (3 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexScan.java (3 hunks)
  • opensearch/src/test/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManagerTest.java (3 hunks)
  • plugin/src/main/java/org/opensearch/sql/plugin/config/OpenSearchPluginModule.java (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
opensearch/src/test/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManagerTest.java (1)
common/src/main/java/org/opensearch/sql/common/setting/Settings.java (1)
  • Settings (17-105)
🔇 Additional comments (24)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexScan.java (2)

72-76: LGTM - Interrupt check placement is appropriate.

The interrupt check at the start of hasNext() ensures timely response to timeout signals during iteration. Using isInterrupted() (non-clearing) is correct here since you immediately throw an exception.


90-94: LGTM - Consistent interrupt handling in next().

The duplicate check in next() provides defense-in-depth for cases where hasNext() might not be called before next().

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/InterruptibleRelRule.java (1)

40-83: Well-designed base class for interruptible rules.

The Template Method pattern with final onMatch delegating to onMatchImpl is clean. The interrupt check placement ensures all subclasses benefit from timeout support automatically.

common/src/main/java/org/opensearch/sql/common/setting/Settings.java (1)

28-28: LGTM - Setting key follows existing conventions.

The new PPL_QUERY_TIMEOUT key is appropriately placed in the PPL Settings section with consistent naming.

opensearch/src/main/java/org/opensearch/sql/opensearch/setting/OpenSearchSettings.java (2)

374-379: LGTM - Setting registration follows established patterns.

The registration in the constructor with Updater and exposure via pluginSettings() is consistent with other settings in this file.

Also applies to: 654-654


73-78: I'll help you verify the review comment about the PPL_QUERY_TIMEOUT_SETTING. Let me investigate whether the timeout can be disabled and understand the constraints around positiveTimeSetting.
<function_calls>

OpenSearch positiveTimeSetting minimum value validation constraints

</function_calls>

Now let me examine the codebase to see how other timeout settings are configured:
<function_calls>


#!/bin/bash

Search for other timeout settings in the file to understand the pattern

rg -n "timeSetting|TimeValue" opensearch/src/main/java/org/opensearch/sql/opensearch/setting/OpenSearchSettings.java -A 3 -B 1


</function_calls>

<function_calls>


#!/bin/bash

Search for other timeout/time-related settings across the codebase to understand patterns

rg -n "timeValueSeconds|timeValueMinutes|positiveTimeSetting|optionalTimeSetting" --type java -A 2 | head -80


</function_calls>

<function_calls>


#!/bin/bash

Look for how timeouts are used and if there's a disable pattern

rg -n "TimeValue|timeout" opensearch/src/main/java/org/opensearch/sql/opensearch/setting/OpenSearchSettings.java -B 1 -A 3 | head -100


</function_calls>

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/AggregateIndexScanRule.java (1)

42-50: LGTM - Clean migration to InterruptibleRelRule.

The rule now inherits automatic interrupt checking from InterruptibleRelRule. The onMatchImpl implementation preserves all existing logic unchanged.

plugin/src/main/java/org/opensearch/sql/plugin/config/OpenSearchPluginModule.java (1)

83-85: LGTM - Clean dependency injection update.

Adding Settings parameter enables the OpenSearchQueryManager to access the timeout configuration. The change follows the existing DI patterns in this module.

opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (2)

36-41: LGTM - Timeout configuration retrieval.

The timeout value is correctly retrieved from settings and passed to the scheduling mechanism.


51-60: I'll help you verify the review comment about timeout scheduling. Let me investigate the thread pool configuration and how timeout tasks are handled in this codebase.
<function_calls>


#!/bin/bash

Search for ThreadPool and Scheduler references in the executor directory

rg -n "ThreadPool|Scheduler|ScheduledCancellable" --type=java opensearch/src/main/java/org/opensearch/sql/opensearch/executor/ -A 2 -B 2


</function_calls>

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (1)

30-38: LGTM! Consistent migration to interruptible rule pattern.

The changes correctly extend InterruptibleRelRule and rename the entry point to onMatchImpl with protected visibility. This allows the base class to perform interruption checks before delegating to the concrete implementation. The core dedup pushdown logic remains unchanged.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ProjectIndexScanRule.java (1)

29-37: LGTM! Correctly adopts the interruptible rule pattern.

The migration to InterruptibleRelRule and onMatchImpl follows the same pattern as other rules in this PR. The projection pushdown logic in apply() remains intact.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/FilterIndexScanRule.java (1)

20-28: LGTM! Interruptible pattern applied consistently.

The base class migration and onMatchImpl pattern align with the other rule changes in this PR. The filter pushdown behavior is preserved.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/LimitIndexScanRule.java (1)

24-31: LGTM! Consistent with the interruptible rule migration pattern.

The changes follow the same structure as the other rule migrations in this PR. The limit pushdown logic and helper methods remain unaffected.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java (1)

26-33: LGTM! Clean migration to InterruptibleRelRule.

The refactoring correctly extends InterruptibleRelRule and moves the matching logic to the protected onMatchImpl method. This enables the base class to handle interrupt checks before delegating to the actual rule logic.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortIndexScanRule.java (1)

17-36: LGTM! Consistent migration pattern.

The rule correctly extends InterruptibleRelRule and implements onMatchImpl. The sort pushdown logic remains unchanged.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RelevanceFunctionPushdownRule.java (1)

29-52: LGTM! Migration follows established pattern.

The rule correctly extends InterruptibleRelRule with onMatchImpl. The relevance function detection logic and conditional pushdown remain unchanged.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortProjectExprTransposeRule.java (1)

39-47: LGTM! Clean refactoring with unchanged transpose logic.

The migration to InterruptibleRelRule and onMatchImpl is consistent with other rules. The complex collation transformation logic remains intact.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java (1)

45-53: LGTM! Migration to InterruptibleRelRule is correct.

The refactoring follows the established pattern with onMatchImpl as the protected entry point.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortAggregateMeasureRule.java (2)

19-27: LGTM! Consistent migration to InterruptibleRelRule.

The class correctly extends InterruptibleRelRule and implements the protected onMatchImpl template method. The base class handles interruption checks in its onMatch before delegating here.


28-34: Core logic preserved correctly.

The push-down logic is unchanged—retrieving the LogicalSort and CalciteLogicalIndexScan, attempting the push-down, and transforming if successful.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java (3)

29-29: LGTM! Import and class signature updated correctly.

The new OpenSearchRuleConfig import and the extension of InterruptibleRelRule follow the same migration pattern applied across other rules in this PR.

Also applies to: 42-42


48-112: LGTM! Method correctly refactored to onMatchImpl.

The complex sort expression push-down logic is preserved with proper early-return guards:

  1. Checks sort references expressions
  2. Validates expression simplicity
  3. Verifies scan collation compatibility
  4. Extracts and validates sort digests
  5. Handles limit/offset push-down

The multiple early returns keep the logic efficient by avoiding unnecessary work when conditions aren't met.


234-260: LGTM! Config interface properly updated.

The Config interface now extends OpenSearchRuleConfig instead of RelRule.Config, aligning with the InterruptibleRelRule base class requirements and maintaining consistency with other migrated rules.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortProjectExprTransposeRule.java (1)

39-47: LGTM! Consistent with the timeout refactoring pattern.

This file follows the same clean refactoring pattern as the other rules, migrating to InterruptibleRelRule and implementing onMatchImpl. The internal logic is preserved.

The PR summary notes that "Tests not added". Consider adding integration tests to verify that:

  1. Timeout interruption properly stops rule matching during planning
  2. Rules continue to function correctly in the normal (non-timeout) case
  3. Thread interruption is handled gracefully without leaving the planner in an inconsistent state

Do you want assistance generating test cases for the timeout behavior?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58bb0d9 and 88073bd.

📒 Files selected for processing (4)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/AggregateIndexScanRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortProjectExprTransposeRule.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (4)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (2)

33-33: LGTM! Migration to InterruptibleRelRule aligns with timeout support.

The migration from RelRule to InterruptibleRelRule correctly implements the timeout functionality described in the PR objectives. This enables the rule to be interrupted during long-running planning operations.


41-41: LGTM! Method signature change correctly implements the interruptible pattern.

The rename from onMatch to onMatchImpl and visibility change from public to protected correctly follows the pattern where the base class InterruptibleRelRule provides a onMatch method that handles interruption checks before delegating to the implementation.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java (1)

46-53: LGTM! Consistent with the timeout refactoring pattern.

This file follows the same clean refactoring pattern as AggregateIndexScanRule, migrating to InterruptibleRelRule and implementing onMatchImpl. The internal logic is preserved.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/AggregateIndexScanRule.java (1)

42-50: Based on my verification attempts, I need to rewrite this review comment. The original review requests verification of InterruptibleRelRule base class implementation, but extensive searches across GitHub repositories, including opensearch-project and all public code, found no evidence that this class exists. This is a critical issue that prevents completing the requested verification.


Unable to verify base class implementation — InterruptibleRelRule not found in public repositories.

The review requests verification that InterruptibleRelRule properly implements interrupt checking before delegating to onMatchImpl. However, searches across opensearch-project GitHub repositories, the broader GitHub codebase, and Apache Calcite (the query planning framework used) found no evidence that InterruptibleRelRule exists.

This base class either:

  • Is newly introduced in this PR and not yet available for inspection
  • Exists only in a private/internal branch
  • May be incorrectly named in the review

Action required: Verify that InterruptibleRelRule exists and provide access to its implementation for review, or clarify the actual base class name if this is a refactoring error.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (1)

70-75: Exception type check may miss direct InterruptedException.

The check e.getCause() instanceof InterruptedException only examines the immediate cause. If the exception itself is an InterruptedException (not wrapped), or if the InterruptedException is nested deeper, it won't be detected.

Consider a more robust check:

-               if (Thread.interrupted() || e.getCause() instanceof InterruptedException) {
+               if (Thread.interrupted()
+                   || e instanceof InterruptedException
+                   || e.getCause() instanceof InterruptedException) {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 88073bd and 9c4e3d2.

📒 Files selected for processing (1)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (1)
common/src/main/java/org/opensearch/sql/common/setting/Settings.java (1)
  • Settings (17-105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (2)

84-90: Context propagation looks correct.

The withCurrentContext method properly captures and restores the Log4j ThreadContext for cross-thread execution.


37-38: Based on my research, I've found that the OpenSearch SQL documentation indicates that PPL_QUERY_TIMEOUT has a default value of 1 minute (60 seconds). However, I cannot definitively verify the null-safety behavior of the Settings.getSettingValue() method without direct access to the codebase.

The concern raised in the review comment is theoretically valid—if getSettingValue() can return null for any reason (misconfiguration, missing setting, etc.), this would cause issues. However, based on standard OpenSearch Settings patterns, settings with defaults are typically guaranteed to return non-null values through their defaultValue function.

Since I cannot conclusively verify the actual implementation of Settings.getSettingValue() or confirm whether null checks are necessary:


Verify that Settings.getSettingValue(Settings.Key.PPL_QUERY_TIMEOUT) is guaranteed to return a non-null TimeValue.

The review comment raises a valid concern about potential null pointer exceptions. While OpenSearch documentation confirms that PPL_QUERY_TIMEOUT has a 1-minute default, the null-safety guarantee of getSettingValue() cannot be verified without reviewing the Settings implementation. Confirm whether this method can return null or if the default is always guaranteed.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (2)

27-38: Clarify semantics of PPL_QUERY_TIMEOUT (null/zero/disabled).

submit() always retrieves PPL_QUERY_TIMEOUT and schedules a timeout with it. If this setting can be null, negative, or 0 to mean “no timeout”, threadPool.schedule(..., timeout, ...) will either NPE or cause immediate interrupts.

Consider explicitly handling “disabled” or invalid values here, e.g. by skipping the timeout scheduling when the configured value is null or non‑positive, or by enforcing a sane lower bound.


62-78: Make interrupt detection more robust by also handling InterruptedException directly.

The timeout path currently treats a timeout as:

if (Thread.interrupted() || e.getCause() instanceof InterruptedException) {
  ...
}

This misses the case where task.run() throws an InterruptedException directly (rather than wrapping it as a cause). In that case, the interruption would not be recognized as a timeout and would be rethrown as‑is.

You can make this more robust with a small change:

-                // Special-case handling of timeout-related interruptions
-                if (Thread.interrupted() || e.getCause() instanceof InterruptedException) {
+                // Special-case handling of timeout-related interruptions
+                if (Thread.interrupted()
+                    || e instanceof InterruptedException
+                    || e.getCause() instanceof InterruptedException) {
                   LOG.error("Query was interrupted due to timeout after {}", timeout);
                   throw new OpenSearchTimeoutException(
                       "Query execution timed out after " + timeout);
                 }

This preserves the existing interrupt‑clearing behavior while correctly classifying direct InterruptedException throws as timeouts.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c4e3d2 and 277da90.

📒 Files selected for processing (1)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (1)

43-61: Timeout scheduling strategy looks solid.

Using a wrapped task with withCurrentContext(...) and scheduling the timeout interrupt on ThreadPool.Names.GENERIC while the query itself runs on the sql-worker pool addresses the original starvation problem and keeps the timeout trigger independent of worker exhaustion.

This structure is reasonable and aligns with the PR’s stated goals.

@Swiddis Swiddis requested a review from LantaoJin December 1, 2025 18:37
@Swiddis Swiddis added the v3.4.0 label Dec 1, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (1)

71-76: Consider deeper cause chain inspection for wrapped InterruptedException.

The current check only inspects the direct cause. If InterruptedException is wrapped multiple levels deep (e.g., RuntimeException(SomeException(InterruptedException))), it won't be detected as a timeout. This is a minor edge case, but for robustness you could add a helper to traverse the cause chain:

private boolean hasCause(Throwable t, Class<? extends Throwable> causeType) {
    while (t != null) {
        if (causeType.isInstance(t)) return true;
        t = t.getCause();
    }
    return false;
}

Then use: Thread.interrupted() || hasCause(e, InterruptedException.class)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 277da90 and a058b02.

📒 Files selected for processing (1)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (4)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (4)

10-21: Imports are appropriate for the added functionality.

The new imports align with the timeout handling requirements.


27-31: LGTM!

Logger and Settings fields are appropriate additions for the timeout functionality.


44-83: Well-structured timeout handling with past review concerns addressed.

Good implementation:

  • Timeout task correctly scheduled on ThreadPool.Names.GENERIC to avoid worker pool exhaustion issues.
  • Interrupt flag properly cleared after successful completion to prevent leakage to subsequent pooled tasks.
  • Wrapped task preserves thread context.

37-39: Verify that PPL_QUERY_TIMEOUT has a default value configured.

If settings.getSettingValue(Settings.Key.PPL_QUERY_TIMEOUT) returns null, this could cause issues when passed to the scheduler. Check that the setting is registered with a sensible default in OpenSearchSettings and that the schedule() method handles null values safely.

Copy link
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could u add IT to verify the exception throw correctly?

@Swiddis
Copy link
Collaborator Author

Swiddis commented Dec 2, 2025

Could u add IT to verify the exception throw correctly?

Per side-channel discussion, let's fast-follow this

@Swiddis Swiddis requested a review from penghuo December 2, 2025 18:51

/** PPL Settings. */
PPL_ENABLED("plugins.ppl.enabled"),
PPL_QUERY_TIMEOUT("plugins.ppl.query.timeout"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this impact both SQL and PPL queries or only PPL?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only implemented it with Calcite in mind, but I think it applies to both (up to whatever hits OSQueryManager)

Comment on lines +52 to +61
Scheduler.ScheduledCancellable timeoutTask =
threadPool.schedule(
() -> {
LOG.warn(
"Query execution timed out after {}. Interrupting execution thread.",
timeout);
executionThread.interrupt();
},
timeout,
ThreadPool.Names.GENERIC);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can check later if ScheduledExecutorService inside ThreadPool already provides scheduling task with timeout.

@Swiddis Swiddis merged commit 679d8ca into opensearch-project:main Dec 2, 2025
37 checks passed
@Swiddis Swiddis deleted the feature/ppl-timeouts branch December 2, 2025 19:46
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4857-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 679d8ca6415cb81a56ec79bd499e5647d57f79d1
# Push it to GitHub
git push --set-upstream origin backport/backport-4857-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4857-to-2.19-dev.

Swiddis added a commit to Swiddis/sql that referenced this pull request Dec 3, 2025
Swiddis added a commit that referenced this pull request Dec 5, 2025
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Dec 8, 2025
asifabashar pushed a commit to asifabashar/sql that referenced this pull request Dec 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev backport-failed backport-manually Filed a PR to backport manually. calcite calcite migration releated maintenance Improves code quality, but not the product PPL Piped processing language v3.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants