
Fix AsyncSearchErrorTraceIT_ testAsyncSearchFailingQueryErrorTraceFalse #137078

Merged
drempapis merged 1 commit into elastic:main from drempapis:fix/AsyncSearchErrorTraceI_testAsyncSearchFailingQueryErrorTraceFalse
Oct 24, 2025

Conversation

@drempapis
Contributor

@drempapis drempapis commented Oct 24, 2025

The exception is thrown while attempting to acquire a shard lock. The key stack frames show that it occurs during the after-test checks (assertAfterTest), when ESIntegTestCase runs its “assert after test” cleanup. At that point, it’s trying to lock shard 0 of the .async-search index, but the shard is still in the starting state because the async task hasn’t fully finished. As a result, the lock attempt times out after 5 seconds.

Caused by: org.elasticsearch.env.ShardLockObtainFailedException: [.async-search][0]: obtaining shard lock for [InternalTestCluster assert after test] timed out after [5000ms]; this shard lock is still held by a different instance of the shard and has been in state [starting shard] for [5.3s/5310ms]
    at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:895)
    at org.elasticsearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:2617)
    at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:612)
    at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2620)
    at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
    at java.lang.reflect.Method.invoke(Method.java:565)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
...
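
The timeout above can be reproduced in miniature. The following is only an analogy, not Elasticsearch's shard-lock implementation: a plain java.util.concurrent ReentrantLock stands in for the shard lock, a background thread stands in for the shard that is still starting, and the class and method names (ShardLockTimeoutSketch, tryAcquireWhileHeld) are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Analogy only: a ReentrantLock plays the role of the shard lock.
public class ShardLockTimeoutSketch {

    // Returns true if the caller obtained the lock within timeoutMillis while
    // another thread (the "starting shard") holds it for holdMillis.
    static boolean tryAcquireWhileHeld(long holdMillis, long timeoutMillis) {
        ReentrantLock shardLock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            shardLock.lock();
            try {
                held.countDown();          // signal that the lock is now held
                Thread.sleep(holdMillis);  // simulate the shard still starting
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                shardLock.unlock();
            }
        });
        holder.start();

        try {
            held.await(); // make sure the holder owns the lock before we try
            boolean acquired = shardLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                shardLock.unlock();
            }
            holder.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the lock", e);
        }
    }

    public static void main(String[] args) {
        // Holder outlives the timeout: the cleanup-style acquisition fails.
        System.out.println(tryAcquireWhileHeld(500, 50));
        // Holder releases well within the timeout: the acquisition succeeds.
        System.out.println(tryAcquireWhileHeld(50, 5000));
    }
}
```

The point of the sketch is that the cleanup step only fails when the holder is still busy at the deadline, which is why making the test wait for the async search to finish removes the failure.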

The REST response returns is_running as a Boolean, but the test compares it to the String "true". Because Boolean.equals(..) returns false for any argument that is not a Boolean, the condition is never true, so the wait/poll step is skipped and the test executes the assertion immediately. This creates a race condition: the async search may not have finished propagating results between the data and coordinating nodes, so ErrorTraceHelper.assertStackTraceCleared(..) occasionally runs too early and fails.

 if (createAsyncResponseEntity.get("is_running").equals("true")) 
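
A minimal, self-contained sketch of the mismatch and one possible correction, assuming the parsed response is a Map<String, Object> whose is_running value is a Boolean. The class IsRunningCheck, its methods, and the Supplier-based polling helper are hypothetical illustrations and are not the code changed in this PR.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class IsRunningCheck {

    // Mirrors the faulty check: Boolean.equals(String) is always false.
    static boolean isRunningBuggy(Map<String, Object> response) {
        return response.get("is_running").equals("true");
    }

    // One possible fix: compare against Boolean.TRUE (also safe if the key is missing).
    static boolean isRunning(Map<String, Object> response) {
        return Boolean.TRUE.equals(response.get("is_running"));
    }

    // Hypothetical polling helper: keeps fetching the status until the search
    // reports it is no longer running or maxAttempts fetches have been made.
    // A real test would normally wait or back off between attempts.
    static Map<String, Object> pollUntilDone(Supplier<Map<String, Object>> fetchStatus, int maxAttempts) {
        Map<String, Object> status = fetchStatus.get();
        for (int attempt = 1; attempt < maxAttempts && isRunning(status); attempt++) {
            status = fetchStatus.get();
        }
        return status;
    }

    public static void main(String[] args) {
        Map<String, Object> running = Map.of("is_running", true);
        Map<String, Object> done = Map.of("is_running", false);

        System.out.println(isRunningBuggy(running)); // false: the poll step would be skipped
        System.out.println(isRunning(running));      // true: the poll step would run

        // Simulated sequence of status responses: running, running, done.
        Iterator<Map<String, Object>> responses = List.of(running, running, done).iterator();
        Map<String, Object> last = pollUntilDone(responses::next, 10);
        System.out.println(isRunning(last));         // false: safe to assert now
    }
}
```

With the buggy comparison the loop condition would never hold, so the first response would be treated as final even while the search is still running.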

Closes #136986 #136691

@drempapis drempapis requested a review from benchaplin October 24, 2025 07:36
@drempapis drempapis added the >test, Team:Search Foundations, :Search Foundations/Search, and v9.2.0 labels Oct 24, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@drempapis drempapis added the backport and auto-backport labels and removed the backport label Oct 24, 2025
Contributor

@benchaplin benchaplin left a comment

Ha, nice catch! Resolves #133010 as well?

@benchaplin
Contributor

Hm, another note - I wrote these tests completely wrong! The assertions work in request handlers, meaning they should be set BEFORE the async search starts. I'll open a PR once you merge @drempapis...

@drempapis
Contributor Author

drempapis commented Oct 24, 2025

Ha, nice catch! Resolves #133010 as well?

I’ve updated the test method testAsyncSearchFailingQueryErrorTraceDefault in this PR to address the same issue.

@drempapis drempapis merged commit 558650b into elastic:main Oct 24, 2025
34 checks passed
@elasticsearchmachine
Collaborator

💚 Backport successful

Branch: 9.2

Labels

auto-backport, :Search Foundations/Search, Team:Search Foundations, >test, v9.2.0, v9.3.0

Development

Successfully merging this pull request may close these issues.

[CI] AsyncSearchErrorTraceIT testDataNodeLogsStackTrace failing

3 participants