-
Notifications
You must be signed in to change notification settings - Fork 180
Support timeouts for Calcite queries #4857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Simeon Widdis <[email protected]>
Signed-off-by: Simeon Widdis <[email protected]>
Signed-off-by: Simeon Widdis <[email protected]>
Signed-off-by: Simeon Widdis <[email protected]>
WalkthroughAdds a PPL query timeout setting and exposes it via OpenSearchSettings; makes query execution timeout-aware in OpenSearchQueryManager and OpenSearchIndexScan; introduces InterruptibleRelRule and migrates many planner rules to it; updates DI, tests, and documentation. Changes
Sequence Diagram(s)sequenceDiagram
participant U as User
participant QM as OpenSearchQueryManager
participant TP as ThreadPool/Scheduler
participant Worker as WorkerThread
participant Scan as OpenSearchIndexScan
U->>QM: submit(request)
QM->>QM: read PPL_QUERY_TIMEOUT from Settings
QM->>TP: schedule(timeout task -> interrupt Worker)
TP-->>QM: ScheduledCancellable
par Execution vs Timeout
QM->>Worker: execute query task (Runnable)
Worker->>Scan: iterate rows / hasNext()/next()
Scan->>Scan: check Thread.isInterrupted()
alt interrupted
Scan-->>Worker: throw OpenSearchTimeoutException
Worker-->>QM: propagate timeout error
else
Scan-->>Worker: yield rows
Worker-->>QM: complete normally
end
and Timeout task
TP->>Worker: interrupt Worker when timeout expires
end
alt completed before timeout
QM->>TP: cancel ScheduledCancellable
QM->>U: return results
else timeout
QM->>U: return timeout error
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
docs/user/ppl/admin/settings.rst (1)
76-110: Documentation follows established format.The new section documents the setting clearly with default value, scope, and an example. Consider adding a note about whether the timeout can be disabled or set to a very high value for long-running queries, as this was raised in a previous review. This is a concern.
🧹 Nitpick comments (2)
opensearch/src/test/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManagerTest.java (1)
47-78: Test correctly validates the new constructor and timeout integration.The test properly mocks the
Settingsdependency withPPL_QUERY_TIMEOUTreturning aTimeValue, and theschedule()method returns a mockScheduledCancellable. The immediate execution pattern is appropriate for unit testing.Consider adding a test case that verifies the scheduled timeout task is cancelled when the query completes successfully (if the production code has that behavior). This would provide coverage for the timeout cleanup path.
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java (1)
59-60: Minor: Redundant null check after assertion.The assertion on lines 59-60 checks
toCollation != null, but line 94 performs another null check ontoCollation. Since assertions can be disabled at runtime (with-daflag), the defensive check inhandleComplexExpressionsSortedByScanis actually appropriate for production safety. However, if you rely on the check in the helper, consider converting the assert to a proper early-return guard for consistency.Also applies to: 93-96
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
common/src/main/java/org/opensearch/sql/common/setting/Settings.java(1 hunks)docs/user/ppl/admin/settings.rst(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/AggregateIndexScanRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/FilterIndexScanRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/InterruptibleRelRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/LimitIndexScanRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ProjectIndexScanRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RelevanceFunctionPushdownRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortAggregateMeasureRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java(3 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortIndexScanRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortProjectExprTransposeRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/setting/OpenSearchSettings.java(3 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexScan.java(3 hunks)opensearch/src/test/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManagerTest.java(3 hunks)plugin/src/main/java/org/opensearch/sql/plugin/config/OpenSearchPluginModule.java(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
opensearch/src/test/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManagerTest.java (1)
common/src/main/java/org/opensearch/sql/common/setting/Settings.java (1)
Settings(17-105)
🔇 Additional comments (24)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexScan.java (2)
72-76: LGTM - Interrupt check placement is appropriate.The interrupt check at the start of
hasNext()ensures timely response to timeout signals during iteration. UsingisInterrupted()(non-clearing) is correct here since you immediately throw an exception.
90-94: LGTM - Consistent interrupt handling in next().The duplicate check in
next()provides defense-in-depth for cases wherehasNext()might not be called beforenext().opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/InterruptibleRelRule.java (1)
40-83: Well-designed base class for interruptible rules.The Template Method pattern with
final onMatchdelegating toonMatchImplis clean. The interrupt check placement ensures all subclasses benefit from timeout support automatically.common/src/main/java/org/opensearch/sql/common/setting/Settings.java (1)
28-28: LGTM - Setting key follows existing conventions.The new
PPL_QUERY_TIMEOUTkey is appropriately placed in the PPL Settings section with consistent naming.opensearch/src/main/java/org/opensearch/sql/opensearch/setting/OpenSearchSettings.java (2)
374-379: LGTM - Setting registration follows established patterns.The registration in the constructor with
Updaterand exposure viapluginSettings()is consistent with other settings in this file.Also applies to: 654-654
73-78: I'll help you verify the review comment about the PPL_QUERY_TIMEOUT_SETTING. Let me investigate whether the timeout can be disabled and understand the constraints aroundpositiveTimeSetting.
<function_calls>
OpenSearch positiveTimeSetting minimum value validation constraints
</function_calls>Now let me examine the codebase to see how other timeout settings are configured:
<function_calls>
#!/bin/bashSearch for other timeout settings in the file to understand the pattern
rg -n "timeSetting|TimeValue" opensearch/src/main/java/org/opensearch/sql/opensearch/setting/OpenSearchSettings.java -A 3 -B 1
</function_calls><function_calls>
#!/bin/bashSearch for other timeout/time-related settings across the codebase to understand patterns
rg -n "timeValueSeconds|timeValueMinutes|positiveTimeSetting|optionalTimeSetting" --type java -A 2 | head -80
</function_calls><function_calls>
#!/bin/bashLook for how timeouts are used and if there's a disable pattern
rg -n "TimeValue|timeout" opensearch/src/main/java/org/opensearch/sql/opensearch/setting/OpenSearchSettings.java -B 1 -A 3 | head -100
</function_calls>opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/AggregateIndexScanRule.java (1)
42-50: LGTM - Clean migration to InterruptibleRelRule.The rule now inherits automatic interrupt checking from
InterruptibleRelRule. TheonMatchImplimplementation preserves all existing logic unchanged.plugin/src/main/java/org/opensearch/sql/plugin/config/OpenSearchPluginModule.java (1)
83-85: LGTM - Clean dependency injection update.Adding
Settingsparameter enables theOpenSearchQueryManagerto access the timeout configuration. The change follows the existing DI patterns in this module.opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (2)
36-41: LGTM - Timeout configuration retrieval.The timeout value is correctly retrieved from settings and passed to the scheduling mechanism.
51-60: I'll help you verify the review comment about timeout scheduling. Let me investigate the thread pool configuration and how timeout tasks are handled in this codebase.
<function_calls>
#!/bin/bashSearch for ThreadPool and Scheduler references in the executor directory
rg -n "ThreadPool|Scheduler|ScheduledCancellable" --type=java opensearch/src/main/java/org/opensearch/sql/opensearch/executor/ -A 2 -B 2
</function_calls>opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (1)
30-38: LGTM! Consistent migration to interruptible rule pattern.The changes correctly extend
InterruptibleRelRuleand rename the entry point toonMatchImplwithprotectedvisibility. This allows the base class to perform interruption checks before delegating to the concrete implementation. The core dedup pushdown logic remains unchanged.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ProjectIndexScanRule.java (1)
29-37: LGTM! Correctly adopts the interruptible rule pattern.The migration to
InterruptibleRelRuleandonMatchImplfollows the same pattern as other rules in this PR. The projection pushdown logic inapply()remains intact.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/FilterIndexScanRule.java (1)
20-28: LGTM! Interruptible pattern applied consistently.The base class migration and
onMatchImplpattern align with the other rule changes in this PR. The filter pushdown behavior is preserved.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/LimitIndexScanRule.java (1)
24-31: LGTM! Consistent with the interruptible rule migration pattern.The changes follow the same structure as the other rule migrations in this PR. The limit pushdown logic and helper methods remain unaffected.
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java (1)
26-33: LGTM! Clean migration to InterruptibleRelRule.The refactoring correctly extends
InterruptibleRelRuleand moves the matching logic to the protectedonMatchImplmethod. This enables the base class to handle interrupt checks before delegating to the actual rule logic.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortIndexScanRule.java (1)
17-36: LGTM! Consistent migration pattern.The rule correctly extends
InterruptibleRelRuleand implementsonMatchImpl. The sort pushdown logic remains unchanged.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RelevanceFunctionPushdownRule.java (1)
29-52: LGTM! Migration follows established pattern.The rule correctly extends
InterruptibleRelRulewithonMatchImpl. The relevance function detection logic and conditional pushdown remain unchanged.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortProjectExprTransposeRule.java (1)
39-47: LGTM! Clean refactoring with unchanged transpose logic.The migration to
InterruptibleRelRuleandonMatchImplis consistent with other rules. The complex collation transformation logic remains intact.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java (1)
45-53: LGTM! Migration to InterruptibleRelRule is correct.The refactoring follows the established pattern with
onMatchImplas the protected entry point.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortAggregateMeasureRule.java (2)
19-27: LGTM! Consistent migration to InterruptibleRelRule.The class correctly extends
InterruptibleRelRuleand implements the protectedonMatchImpltemplate method. The base class handles interruption checks in itsonMatchbefore delegating here.
28-34: Core logic preserved correctly.The push-down logic is unchanged—retrieving the
LogicalSortandCalciteLogicalIndexScan, attempting the push-down, and transforming if successful.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java (3)
29-29: LGTM! Import and class signature updated correctly.The new
OpenSearchRuleConfigimport and the extension ofInterruptibleRelRulefollow the same migration pattern applied across other rules in this PR.Also applies to: 42-42
48-112: LGTM! Method correctly refactored to onMatchImpl.The complex sort expression push-down logic is preserved with proper early-return guards:
- Checks sort references expressions
- Validates expression simplicity
- Verifies scan collation compatibility
- Extracts and validates sort digests
- Handles limit/offset push-down
The multiple early returns keep the logic efficient by avoiding unnecessary work when conditions aren't met.
234-260: LGTM! Config interface properly updated.The
Configinterface now extendsOpenSearchRuleConfiginstead ofRelRule.Config, aligning with theInterruptibleRelRulebase class requirements and maintaining consistency with other migrated rules.
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortProjectExprTransposeRule.java (1)
39-47: LGTM! Consistent with the timeout refactoring pattern.This file follows the same clean refactoring pattern as the other rules, migrating to
InterruptibleRelRuleand implementingonMatchImpl. The internal logic is preserved.The PR summary notes that "Tests not added". Consider adding integration tests to verify that:
- Timeout interruption properly stops rule matching during planning
- Rules continue to function correctly in the normal (non-timeout) case
- Thread interruption is handled gracefully without leaving the planner in an inconsistent state
Do you want assistance generating test cases for the timeout behavior?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/AggregateIndexScanRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortProjectExprTransposeRule.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (21, doc)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, unit)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (4)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (2)
33-33: LGTM! Migration to InterruptibleRelRule aligns with timeout support.The migration from
RelRuletoInterruptibleRelRulecorrectly implements the timeout functionality described in the PR objectives. This enables the rule to be interrupted during long-running planning operations.
41-41: LGTM! Method signature change correctly implements the interruptible pattern.The rename from
onMatchtoonMatchImpland visibility change frompublictoprotectedcorrectly follows the pattern where the base classInterruptibleRelRuleprovides aonMatchmethod that handles interruption checks before delegating to the implementation.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java (1)
46-53: LGTM! Consistent with the timeout refactoring pattern.This file follows the same clean refactoring pattern as
AggregateIndexScanRule, migrating toInterruptibleRelRuleand implementingonMatchImpl. The internal logic is preserved.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/AggregateIndexScanRule.java (1)
42-50: Based on my verification attempts, I need to rewrite this review comment. The original review requests verification ofInterruptibleRelRulebase class implementation, but extensive searches across GitHub repositories, including opensearch-project and all public code, found no evidence that this class exists. This is a critical issue that prevents completing the requested verification.
Unable to verify base class implementation — InterruptibleRelRule not found in public repositories.
The review requests verification that
InterruptibleRelRuleproperly implements interrupt checking before delegating toonMatchImpl. However, searches across opensearch-project GitHub repositories, the broader GitHub codebase, and Apache Calcite (the query planning framework used) found no evidence thatInterruptibleRelRuleexists.This base class either:
- Is newly introduced in this PR and not yet available for inspection
- Exists only in a private/internal branch
- May be incorrectly named in the review
Action required: Verify that
InterruptibleRelRuleexists and provide access to its implementation for review, or clarify the actual base class name if this is a refactoring error.
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
Show resolved
Hide resolved
Signed-off-by: Simeon Widdis <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (1)
70-75: Exception type check may miss direct InterruptedException.The check
e.getCause() instanceof InterruptedExceptiononly examines the immediate cause. If the exception itself is anInterruptedException(not wrapped), or if theInterruptedExceptionis nested deeper, it won't be detected.Consider a more robust check:
- if (Thread.interrupted() || e.getCause() instanceof InterruptedException) { + if (Thread.interrupted() + || e instanceof InterruptedException + || e.getCause() instanceof InterruptedException) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (1)
common/src/main/java/org/opensearch/sql/common/setting/Settings.java (1)
Settings(17-105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (21, unit)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (25, integration)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (2)
84-90: Context propagation looks correct.The
withCurrentContextmethod properly captures and restores the Log4jThreadContextfor cross-thread execution.
37-38: Based on my research, I've found that the OpenSearch SQL documentation indicates thatPPL_QUERY_TIMEOUThas a default value of 1 minute (60 seconds). However, I cannot definitively verify the null-safety behavior of theSettings.getSettingValue()method without direct access to the codebase.The concern raised in the review comment is theoretically valid—if
getSettingValue()can return null for any reason (misconfiguration, missing setting, etc.), this would cause issues. However, based on standard OpenSearch Settings patterns, settings with defaults are typically guaranteed to return non-null values through theirdefaultValuefunction.Since I cannot conclusively verify the actual implementation of
Settings.getSettingValue()or confirm whether null checks are necessary:
Verify that
Settings.getSettingValue(Settings.Key.PPL_QUERY_TIMEOUT)is guaranteed to return a non-nullTimeValue.The review comment raises a valid concern about potential null pointer exceptions. While OpenSearch documentation confirms that
PPL_QUERY_TIMEOUThas a 1-minute default, the null-safety guarantee ofgetSettingValue()cannot be verified without reviewing the Settings implementation. Confirm whether this method can return null or if the default is always guaranteed.
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Simeon Widdis <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (2)
27-38: Clarify semantics ofPPL_QUERY_TIMEOUT(null/zero/disabled).
submit()always retrievesPPL_QUERY_TIMEOUTand schedules a timeout with it. If this setting can benull, negative, or0to mean “no timeout”,threadPool.schedule(..., timeout, ...)will either NPE or cause immediate interrupts.Consider explicitly handling “disabled” or invalid values here, e.g. by skipping the timeout scheduling when the configured value is
nullor non‑positive, or by enforcing a sane lower bound.
62-78: Make interrupt detection more robust by also handlingInterruptedExceptiondirectly.The timeout path currently treats a timeout as:
if (Thread.interrupted() || e.getCause() instanceof InterruptedException) { ... }This misses the case where
task.run()throws anInterruptedExceptiondirectly (rather than wrapping it as a cause). In that case, the interruption would not be recognized as a timeout and would be rethrown as‑is.You can make this more robust with a small change:
- // Special-case handling of timeout-related interruptions - if (Thread.interrupted() || e.getCause() instanceof InterruptedException) { + // Special-case handling of timeout-related interruptions + if (Thread.interrupted() + || e instanceof InterruptedException + || e.getCause() instanceof InterruptedException) { LOG.error("Query was interrupted due to timeout after {}", timeout); throw new OpenSearchTimeoutException( "Query execution timed out after " + timeout); }This preserves the existing interrupt‑clearing behavior while correctly classifying direct
InterruptedExceptionthrows as timeouts.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-linux (21, doc)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (1)
43-61: Timeout scheduling strategy looks solid.Using a wrapped task with
withCurrentContext(...)and scheduling the timeout interrupt onThreadPool.Names.GENERICwhile the query itself runs on thesql-workerpool addresses the original starvation problem and keeps the timeout trigger independent of worker exhaustion.This structure is reasonable and aligns with the PR’s stated goals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (1)
71-76: Consider deeper cause chain inspection for wrapped InterruptedException.The current check only inspects the direct cause. If
InterruptedExceptionis wrapped multiple levels deep (e.g.,RuntimeException(SomeException(InterruptedException))), it won't be detected as a timeout. This is a minor edge case, but for robustness you could add a helper to traverse the cause chain:private boolean hasCause(Throwable t, Class<? extends Throwable> causeType) { while (t != null) { if (causeType.isInstance(t)) return true; t = t.getCause(); } return false; }Then use:
Thread.interrupted() || hasCause(e, InterruptedException.class)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (25, unit)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (21, unit)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: WhiteSource Security Check
🔇 Additional comments (4)
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchQueryManager.java (4)
10-21: Imports are appropriate for the added functionality.The new imports align with the timeout handling requirements.
27-31: LGTM!Logger and Settings fields are appropriate additions for the timeout functionality.
44-83: Well-structured timeout handling with past review concerns addressed.Good implementation:
- Timeout task correctly scheduled on
ThreadPool.Names.GENERICto avoid worker pool exhaustion issues.- Interrupt flag properly cleared after successful completion to prevent leakage to subsequent pooled tasks.
- Wrapped task preserves thread context.
37-39: Verify thatPPL_QUERY_TIMEOUThas a default value configured.If
settings.getSettingValue(Settings.Key.PPL_QUERY_TIMEOUT)returnsnull, this could cause issues when passed to the scheduler. Check that the setting is registered with a sensible default inOpenSearchSettingsand that theschedule()method handles null values safely.
penghuo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could u add IT to verify the exception throw correctly?
Per side-channel discussion, let's fast-follow this |
|
|
||
| /** PPL Settings. */ | ||
| PPL_ENABLED("plugins.ppl.enabled"), | ||
| PPL_QUERY_TIMEOUT("plugins.ppl.query.timeout"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this impact both SQL and PPL queries or only PPL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only implemented it with Calcite in mind, but I think it applies to both (up to whatever hits OSQueryManager)
| Scheduler.ScheduledCancellable timeoutTask = | ||
| threadPool.schedule( | ||
| () -> { | ||
| LOG.warn( | ||
| "Query execution timed out after {}. Interrupting execution thread.", | ||
| timeout); | ||
| executionThread.interrupt(); | ||
| }, | ||
| timeout, | ||
| ThreadPool.Names.GENERIC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can check later if ScheduledExecutorService inside ThreadPool already provides scheduling task with timeout.
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4857-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 679d8ca6415cb81a56ec79bd499e5647d57f79d1
# Push it to GitHub
git push --set-upstream origin backport/backport-4857-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-devThen, create a pull request where the |
Description
Patch to address a symptom of #4842. Currently, the only timeout logic we support is in the underlying DSL, which works if we submit a slow query to OpenSearch but fails if we have exhaustion in any other part of the code. During those failures, the sql-worker thread is frozen and takes 100% CPU until completion, which can be several hours (and potentially bring down cluster nodes).
This PR adds some basic configurable timeout handling, and puts interrupt points in our index enumeration and planning rules.
Related Issues
Related to #4842
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
New Features
Documentation
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.