Fix circuit breaker leak in percolator query construction #144827

Merged
drempapis merged 21 commits into elastic:main from drempapis:fix/cb_leak_percolator
Mar 26, 2026

@drempapis
Contributor

Problem

PR #142150 added CB accounting for automaton-based queries (wildcard, regexp, fuzzy, prefix), but introduced a leak for percolator workloads (#144748).

For each percolator document, createStore() creates a shallow copy of the outer SearchExecutionContext. The copy shares the same physical CircuitBreaker but has its own fresh queryConstructionMemoryUsed counter. When toQuery() reserves CB bytes on the copy, the outer context's release (called at request end) drains its own counter, which is always zero. The copy's bytes are never freed.
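
The leak described above can be modelled in a few lines. This is a minimal sketch, not Elasticsearch code: SketchBreaker, SketchContext, LeakSketch, and shallowCopy() are hypothetical stand-ins, and only the method names addCircuitBreakerMemory and releaseQueryConstructionMemory are taken from this PR.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the shared physical circuit breaker.
final class SketchBreaker {
    private final AtomicLong used = new AtomicLong();

    void add(long bytes) { used.addAndGet(bytes); }

    long used() { return used.get(); }
}

// Hypothetical stand-in for SearchExecutionContext: shared breaker, private counter.
final class SketchContext {
    private final SketchBreaker breaker;
    private long queryConstructionMemoryUsed = 0;

    SketchContext(SketchBreaker breaker) { this.breaker = breaker; }

    // Shallow copy: same breaker instance, fresh counter starting at zero.
    SketchContext shallowCopy() { return new SketchContext(breaker); }

    void addCircuitBreakerMemory(long bytes) {
        breaker.add(bytes);
        queryConstructionMemoryUsed += bytes;
    }

    // Releases only what this context's own counter has recorded.
    void releaseQueryConstructionMemory() {
        breaker.add(-queryConstructionMemoryUsed);
        queryConstructionMemoryUsed = 0;
    }
}

final class LeakSketch {
    // Returns the bytes still held on the breaker after the outer context releases.
    static long leakedBytes(int documents, long bytesPerQuery) {
        SketchBreaker breaker = new SketchBreaker();
        SketchContext outer = new SketchContext(breaker);
        for (int i = 0; i < documents; i++) {
            SketchContext copy = outer.shallowCopy();
            copy.addCircuitBreakerMemory(bytesPerQuery); // charged on the copy's counter
        }
        outer.releaseQueryConstructionMemory(); // outer counter is zero, so nothing is freed
        return breaker.used();
    }

    public static void main(String[] args) {
        System.out.println(leakedBytes(5, 1000)); // prints 5000
    }
}
```

In this model the held bytes grow linearly with the number of documents, which matches the steadily increasing estimates in the breaker output later in this description.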

Fix

Add a try/finally in createStore() that calls percolateShardContext.releaseQueryConstructionMemory() after each document's query is built. CB protection during construction is unaffected.
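
A minimal sketch of the try/finally release described above, assuming a simplified model rather than the real classes: PerDocumentReleaseSketch, CopyContext, and buildQuery are hypothetical names, and the AtomicLong stands in for the shared breaker. The point it illustrates is that the finally block frees the copy's bytes even when query construction throws.

```java
import java.util.concurrent.atomic.AtomicLong;

final class PerDocumentReleaseSketch {
    // Hypothetical per-document context: shares the breaker total, owns its counter.
    static final class CopyContext {
        private final AtomicLong breakerUsed;
        private long reserved;

        CopyContext(AtomicLong breakerUsed) { this.breakerUsed = breakerUsed; }

        void addCircuitBreakerMemory(long bytes) {
            breakerUsed.addAndGet(bytes);
            reserved += bytes;
        }

        void releaseQueryConstructionMemory() {
            breakerUsed.addAndGet(-reserved);
            reserved = 0;
        }
    }

    // Stand-in for toQuery(): reserves bytes first, then optionally fails.
    static String buildQuery(CopyContext ctx, long bytes, boolean fail) {
        ctx.addCircuitBreakerMemory(bytes);
        if (fail) {
            throw new IllegalStateException("query construction failed");
        }
        return "query";
    }

    // Returns the breaker usage after building one query per document.
    static long usedAfter(int documents, long bytesPerQuery, boolean failLast) {
        AtomicLong breakerUsed = new AtomicLong();
        for (int i = 0; i < documents; i++) {
            CopyContext copy = new CopyContext(breakerUsed);
            try {
                buildQuery(copy, bytesPerQuery, failLast && i == documents - 1);
            } catch (IllegalStateException e) {
                // Swallowed only so the sketch can report the final usage.
            } finally {
                copy.releaseQueryConstructionMemory(); // runs on success and on failure
            }
        }
        return breakerUsed.get();
    }

    public static void main(String[] args) {
        System.out.println(usedAfter(5, 1000, false)); // prints 0
        System.out.println(usedAfter(5, 1000, true));  // prints 0
    }
}
```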

I used the repository https://github.com/NikolajLeischner/elasticsearch-9.3.2-circuit-breaker-bug to validate the behavior after applying the changes. Prior to the changes, the leak is reproducible.

Request circuit breaker: ZLWG9Iu4SmaOIFADk4DjZw request 153.5mb 49.6kb 0

Request circuit breaker: ZLWG9Iu4SmaOIFADk4DjZw request 153.5mb 99.3kb 0

Request circuit breaker: ZLWG9Iu4SmaOIFADk4DjZw request 153.5mb 148.9kb 0

Request circuit breaker: ZLWG9Iu4SmaOIFADk4DjZw request 153.5mb 198.6kb 0

Request circuit breaker: ZLWG9Iu4SmaOIFADk4DjZw request 153.5mb 248.3kb 0

However, when using a Docker image built locally from the latest snapshot version, the leak no longer occurs.

Request circuit breaker: ZLWG9Iu4SmaOIFADk4DjZw request 153.5mb 0b 0

Request circuit breaker: ZLWG9Iu4SmaOIFADk4DjZw request 153.5mb 0b 0

Request circuit breaker: ZLWG9Iu4SmaOIFADk4DjZw request 153.5mb 0b 0

Request circuit breaker: ZLWG9Iu4SmaOIFADk4DjZw request 153.5mb 0b 0

@drempapis added the >bug, auto-backport, Team:Search Foundations, :Search Foundations/Search, v8.19.0, v9.2.0, v9.3.0, and v9.4.0 labels on Mar 24, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine
Collaborator

Hi @drempapis, I've created a changelog YAML for you.

@drempapis drempapis requested a review from spinscale March 24, 2026 08:52
.put("indices.breaker.request.overhead", "1.0")
.build();
ClusterSettings clusterSettings = new ClusterSettings(breakerSettings, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS);
CircuitBreaker circuitBreaker = new HierarchyCircuitBreakerService(
Contributor

Could you use the circuit breaker from the ESTestCase class again? It automatically tracks whether the memory is freed at the end of the test. That might also make sense in the other unit test here. Could the two tests be unified, or are they too different?

Contributor Author

Done, both tests now use newLimitedBreaker(ByteSizeValue.ofMb(100)) from ESTestCase.

The two tests are kept separate: one verifies round-trip serialization with TermQueryBuilder, while the other requires automaton-based queries to exercise the CB code path.

Contributor

@spinscale left a comment

LGTM. Do you see any other use cases where a similar pattern could have gone unnoticed?

return queryBuilder.toQuery(percolateShardContext);
} finally {
percolateShardContext.releaseQueryConstructionMemory();
}
Member

@drempapis, the same pattern exists in PhraseSuggester:138, where the circuit breaker accumulates bytes from all iterations, even though only one query is alive at any given time.

Member

Percolator queries are unique in that they are executed in multiple threads. More than one query can be alive at any time.

Contributor Author

@reungn, thanks for spotting that.

In PhraseSuggester, I wrapped the toQuery() call in try/finally with releaseQueryConstructionMemory() after each correction iteration. Since only one query is alive at a time there (sequential loop), releasing per iteration is safe.

For ExpressionQueryList, the same try/finally pattern applies. It is called once per row in an ESQL lookup join, sequentially. The built query is used and completed before the next row starts.

For InnerHitContextBuilder, the query is stored inside InnerHitsContext and remains live throughout the fetch phase, so releasing immediately after toQuery() would be too early. The outer context is already registered with SearchService.addReleasable, which handles cleanup.
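
To illustrate why releasing per iteration matters in a sequential loop, here is a small sketch that compares peak breaker usage with and without a per-iteration release. It is a simplified model under stated assumptions: SequentialReleaseSketch and run are hypothetical names, and plain long counters replace the real breaker and context.

```java
final class SequentialReleaseSketch {
    // Returns {peak usage, usage after the final release} for a sequential loop
    // in which each built query is consumed before the next one is constructed.
    static long[] run(long[] queryBytes, boolean releasePerIteration) {
        long used = 0;     // bytes currently held on the breaker
        long reserved = 0; // bytes recorded by the context since the last release
        long peak = 0;
        for (long bytes : queryBytes) {
            used += bytes;
            reserved += bytes;
            peak = Math.max(peak, used);
            // The query would be executed and discarded here.
            if (releasePerIteration) {
                used -= reserved;
                reserved = 0;
            }
        }
        used -= reserved; // request-end release of whatever is still recorded
        return new long[] { peak, used };
    }

    public static void main(String[] args) {
        long[] sizes = { 100, 200, 300 };
        System.out.println(run(sizes, true)[0]);  // prints 300
        System.out.println(run(sizes, false)[0]); // prints 600
    }
}
```

Both variants end at zero, but without the per-iteration release the breaker accumulates bytes for queries that are no longer alive. The same reasoning does not apply to a query that stays live after construction, as in the InnerHitContextBuilder case.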

@benwtrent benwtrent requested a review from davidkyle March 24, 2026 11:03
queryBuilder = Rewriteable.rewrite(queryBuilder, percolateShardContext);
return queryBuilder.toQuery(percolateShardContext);
} finally {
percolateShardContext.releaseQueryConstructionMemory();
Member

@davidkyle Mar 25, 2026

Doesn't this release the memory too early while it is still in use?

PercolateQueryBuilder::newPercolateSearchContext() overrides some methods on the new SearchExecutionContext that are specific to percolate. If the created context delegated calls to addCircuitBreakerMemory, getQueryConstructionMemoryUsed and releaseQueryConstructionMemory to the source context, the copy would no longer be updated separately and you could rely on the usual release mechanism.

Contributor Author

Thank you @davidkyle.

Instead of releasing in a finally block, newPercolateSearchContext() now overrides addCircuitBreakerMemory, getQueryConstructionMemoryUsed, and releaseQueryConstructionMemory as suggested, to delegate back to the source context. This means all CB accounting flows through the outer request-scoped context, which SearchService already releases via addReleasable at the end of the request, correctly covering the full lifetime of all concurrent queries.
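
The delegation approach can be sketched as follows. This is an illustration under stated assumptions, not the merged code: OuterContextSketch, DelegationSketch, and newPercolateContext are hypothetical names, and only the three overridden method names come from this thread.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the request-scoped context.
class OuterContextSketch {
    final AtomicLong breakerUsed; // shared breaker total
    private final AtomicLong queryConstructionMemoryUsed = new AtomicLong();

    OuterContextSketch(AtomicLong breakerUsed) { this.breakerUsed = breakerUsed; }

    void addCircuitBreakerMemory(long bytes) {
        breakerUsed.addAndGet(bytes);
        queryConstructionMemoryUsed.addAndGet(bytes);
    }

    long getQueryConstructionMemoryUsed() { return queryConstructionMemoryUsed.get(); }

    void releaseQueryConstructionMemory() {
        breakerUsed.addAndGet(-queryConstructionMemoryUsed.getAndSet(0));
    }
}

final class DelegationSketch {
    // Builds a per-document copy whose accounting methods forward to the source,
    // so the copy's own counter is never used.
    static OuterContextSketch newPercolateContext(OuterContextSketch source) {
        return new OuterContextSketch(source.breakerUsed) {
            @Override
            void addCircuitBreakerMemory(long bytes) { source.addCircuitBreakerMemory(bytes); }

            @Override
            long getQueryConstructionMemoryUsed() { return source.getQueryConstructionMemoryUsed(); }

            @Override
            void releaseQueryConstructionMemory() { source.releaseQueryConstructionMemory(); }
        };
    }

    // Returns {bytes held before release, bytes tracked by the outer context, bytes held after release}.
    static long[] run(int documents, long bytesPerQuery) {
        AtomicLong breaker = new AtomicLong();
        OuterContextSketch outer = new OuterContextSketch(breaker);
        for (int i = 0; i < documents; i++) {
            newPercolateContext(outer).addCircuitBreakerMemory(bytesPerQuery);
        }
        long held = breaker.get();
        long tracked = outer.getQueryConstructionMemoryUsed();
        outer.releaseQueryConstructionMemory(); // the single request-end release
        return new long[] { held, tracked, breaker.get() };
    }

    public static void main(String[] args) {
        long[] result = run(5, 1000);
        System.out.println(result[0] + " " + result[1] + " " + result[2]); // prints 5000 5000 0
    }
}
```

The bytes stay reserved while the queries are alive and are freed once, at the point where the outer context is released.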

Member

Contributor Author

Those are tests covering the examined classes that can be found in the comment #144827 (comment)

@davidkyle
Member

I think the leak could also occur at

executionContext = PercolateQueryBuilder.newPercolateSearchContext(executionContext, isMapUnmappedFieldAsText());
where a copy of the context is made and that is passed to toQuery()

@drempapis
Contributor Author

@davidkyle In PercolatorFieldMapper, the SearchExecutionContext supplied to MapperService is the indexing-time context, created in IndexService with circuitBreaker = null. addCircuitBreakerMemory() is therefore a no-op: no bytes are ever charged to the CB on this path, so there is nothing to leak.
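
A minimal sketch of the no-op behaviour described above. NullBreakerSketch is a hypothetical class, and the exact form of the null check in the real addCircuitBreakerMemory() is an assumption. The sketch only shows why a context built without a breaker cannot accumulate bytes.

```java
import java.util.concurrent.atomic.AtomicLong;

final class NullBreakerSketch {
    private final AtomicLong breakerUsed; // null models a context created without a breaker
    private long queryConstructionMemoryUsed;

    NullBreakerSketch(AtomicLong breakerUsed) { this.breakerUsed = breakerUsed; }

    void addCircuitBreakerMemory(long bytes) {
        if (breakerUsed == null) {
            return; // assumed no-op: nothing is charged, so nothing can leak
        }
        breakerUsed.addAndGet(bytes);
        queryConstructionMemoryUsed += bytes;
    }

    long getQueryConstructionMemoryUsed() { return queryConstructionMemoryUsed; }

    public static void main(String[] args) {
        NullBreakerSketch indexingTime = new NullBreakerSketch(null);
        indexingTime.addCircuitBreakerMemory(1000);
        System.out.println(indexingTime.getQueryConstructionMemoryUsed()); // prints 0
    }
}
```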

Member

@davidkyle left a comment

LGTM

@drempapis drempapis merged commit b36fdbc into elastic:main Mar 26, 2026
36 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts
9.2 Commit could not be cherrypicked due to conflicts
9.3 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 144827

drempapis added a commit to drempapis/elasticsearch that referenced this pull request Mar 26, 2026
…4827)

(cherry picked from commit b36fdbc)

# Conflicts:
#	modules/percolator/src/test/java/org/elasticsearch/percolator/QueryBuilderStoreTests.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryList.java
drempapis added a commit to drempapis/elasticsearch that referenced this pull request Mar 26, 2026
…4827)

(cherry picked from commit b36fdbc)

# Conflicts:
#	modules/percolator/src/test/java/org/elasticsearch/percolator/QueryBuilderStoreTests.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryList.java
drempapis added a commit to drempapis/elasticsearch that referenced this pull request Mar 26, 2026
…4827)

(cherry picked from commit b36fdbc)

# Conflicts:
#	modules/percolator/src/test/java/org/elasticsearch/percolator/QueryBuilderStoreTests.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryList.java
@drempapis
Contributor Author

💚 All backports created successfully

Status Branch Result
9.3
9.2
8.19

Questions ?

Please refer to the Backport tool documentation

szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 26, 2026
* upstream/main: (146 commits)
  Revert "[Native] Gradle-related tweaks to improve handling of the simdvec native library (elastic#144539)"
  Fix ArrayIndexOutOfBoundsException in fetch phase with partial results (elastic#144385)
  ESQL: Correctly manage NULL data type for SUM (elastic#144942)
  [ESQL] Fixes GroupedTopNBenchmark not executing (elastic#144944)
  Fix reader context leak when query response serialization fails (elastic#144708)
  Validate individual offset values in BULK_OFFSETS bounds checks (elastic#144643)
  Merge main21 source set into main in simdvec (elastic#144921)
  [TEST] Unmute TsidExtractingIdFieldMapperTests (elastic#144848)
  [Native] Gradle-related tweaks to improve handling of the simdvec native library (elastic#144539)
  Fix `ThreadedActionListenerTests#testRejectionHandling` (elastic#144795)
  Add new DLM Frozen Tier Transition execution plugin and service (elastic#144595)
  Prometheus: execute query_range via parsed EsqlStatement plan (elastic#144416)
  Investigate `testBulkIndexingRequestSplitting` failure (elastic#144766)
  Add test utility for wrapping directories in FilterDirectory layer (elastic#143563)
  Fix ES|QL decay tests with negative scale (elastic#144657)
  Fix circuit breaker leak in percolator query construction (elastic#144827)
  Use XPerFieldDocValuesFormat in AbstractTSDBSyntheticIdCodec (elastic#144744)
  [DOCS] Document how reindex work in CPS (elastic#144016)
  Fix Int4 vector library tests failing on Java 21 (elastic#144830)
  [DiskBBQ] Fix index sorting on flush (elastic#144938)
  ...
elasticsearchmachine pushed a commit that referenced this pull request Mar 26, 2026
) (#144998)

* Fix circuit breaker leak in percolator query construction (#144827)

(cherry picked from commit b36fdbc)

# Conflicts:
#	modules/percolator/src/test/java/org/elasticsearch/percolator/QueryBuilderStoreTests.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryList.java

* revert code

* Delete x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryListCircuitBreakerTests.java

* Update QueryBuilderStoreTests.java

* Rename variable for search execution context

* update after review
seanzatzdev pushed a commit to seanzatzdev/elasticsearch that referenced this pull request Mar 26, 2026
elasticsearchmachine pushed a commit that referenced this pull request Mar 26, 2026
) (#144999)

* Fix circuit breaker leak in percolator query construction (#144827)

(cherry picked from commit b36fdbc)

# Conflicts:
#	modules/percolator/src/test/java/org/elasticsearch/percolator/QueryBuilderStoreTests.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryList.java

* [CI] Auto commit changes from spotless

* Update ExpressionQueryList.java

* Delete x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryListCircuitBreakerTests.java

* update code

* update

* update

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
seanzatzdev pushed a commit to seanzatzdev/elasticsearch that referenced this pull request Mar 27, 2026
elasticsearchmachine pushed a commit that referenced this pull request Mar 27, 2026
…4827) (#145000)

* Fix circuit breaker leak in percolator query construction (#144827)

(cherry picked from commit b36fdbc)

# Conflicts:
#	modules/percolator/src/test/java/org/elasticsearch/percolator/QueryBuilderStoreTests.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryList.java

* [CI] Auto commit changes from spotless

* Delete x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/enrich/ExpressionQueryListCircuitBreakerTests.java

* update

* Remove accidental .claude files

* restore files

* update code

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
mamazzol pushed a commit to mamazzol/elasticsearch that referenced this pull request Mar 30, 2026