Skip to content

Conversation

@kaushalmahi12
Copy link
Contributor

Description

This change introduces cancellation checks for early termination in FetchPhase during shard level reduce and co-ordinator level reduce.

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Kaushal Kumar <[email protected]>
Signed-off-by: Kaushal Kumar <[email protected]>
@kaushalmahi12 kaushalmahi12 changed the title add coordinator cancellation check in FetchPhase add cancellation check in FetchPhase for Agg reductions Jul 3, 2025
@kkhatua
Copy link
Member

kkhatua commented Jul 3, 2025

Follow up PR to #18426 and covering the FetchPhase

Tagging @jainankitk , @jed326 and @sgup432 for review

@github-actions
Copy link
Contributor

github-actions bot commented Jul 3, 2025

❌ Gradle check result for 4093508: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kaushalmahi12
Copy link
Contributor Author

Looking into test failures

Signed-off-by: Kaushal Kumar <[email protected]>
@github-actions
Copy link
Contributor

github-actions bot commented Jul 3, 2025

✅ Gradle check result for e610d4a: SUCCESS

@codecov
Copy link

codecov bot commented Jul 3, 2025

Codecov Report

❌ Patch coverage is 93.87755% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.74%. Comparing base (223f9fd) to head (e610d4a).
⚠️ Report is 384 commits behind head on main.

Files with missing lines Patch % Lines
...pensearch/action/search/TransportSearchAction.java 50.00% 2 Missing ⚠️
...pensearch/action/search/SearchPhaseController.java 90.90% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18673      +/-   ##
============================================
+ Coverage     72.72%   72.74%   +0.02%     
- Complexity    68409    68439      +30     
============================================
  Files          5566     5568       +2     
  Lines        314292   314369      +77     
  Branches      45579    45586       +7     
============================================
+ Hits         228554   228691     +137     
+ Misses        67153    67105      -48     
+ Partials      18585    18573      -12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@sgup432 sgup432 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we are changing all the desired method signatures to pass the cancellation checks. Doesn't look a nice way to do it.

Instead, I was thinking whether we can create a generic framework(or maybe reuse existing) where we can pass this isTaskCancelled in a more efficient way without change method signatures?
Like maybe storing this info in ThreadLocal(or reusing ThreadContext) where every thread has this info and which can also be propagated along? And we can directly call a static method and fetch that info.

This way we could use this framework for later usecases as well to extend the soft cancellation checks.

@kaushalmahi12
Copy link
Contributor Author

kaushalmahi12 commented Jul 4, 2025

Thanks @sgup432 ! It is a valid suggestion and I agree that the propagation of task cancellation is not very clean but even to pass the ThreadContext via ThreadPool still requires propagating threadPool. In addition to this threadContext is exposed to the whole lifecycle of search request it becomes very hard to maintain the validity of the state since any new change can stash or restore it.

I feel maybe we need to give it more thought to refactor the query lifecycle in such a way that it is less intrusive to access the request state.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Aug 4, 2025
Copy link
Contributor

@jainankitk jainankitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kaushalmahi12 for raising this PR. While the functionality itself of being able to effectively cancel the long running search tasks is really useful, we should look into better ways of achieving the same to also ensure the backward compatibility across versions.

@jainankitk
Copy link
Contributor

@kaushalmahi12 - Can we move this PR into draft state until we resume active work on it?

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added stalled Issues that have stalled and removed stalled Issues that have stalled labels Sep 5, 2025
@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stalled Issues that have stalled

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants