Implement chunked fetch streaming with circuit breaker integration #139124
drempapis merged 447 commits into elastic:main
Conversation
Resolved review threads (outdated) on:
- server/src/main/java/org/elasticsearch/search/fetch/FetchPhaseDocsIterator.java
- ...rc/main/java/org/elasticsearch/search/fetch/chunk/TransportFetchPhaseCoordinationAction.java
@elasticmachine run elasticsearch-ci/part-2
DaveCTurner left a comment:
Couple of thoughts about the blocking of threads.
Resolved review threads (outdated) on:
- server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java
- server/src/main/java/org/elasticsearch/search/fetch/FetchPhaseDocsIterator.java
@elasticmachine run elasticsearch-ci/part-1
@elasticmachine run elasticsearch-ci/part-2
…rch into chunked_fetch_phase
Referenced line in FetchPhaseDocsIterator.java:

    nextChunk = queue.poll(100, TimeUnit.MILLISECONDS);
Ok this is definitely better because at least it's only blocking a thread while fetching the docs locally, but now we need two threads.
I'm partly to blame for suggesting a ThrottledIterator here. That would have worked if we could have moved the fetch-from-disk process between threads but it doesn't fit here given the single-threadedness constraint. I think instead we need a new ThrottledTaskRunner("fetch", maxInFlightChunks, EsExecutors.DIRECT_EXECUTOR_SERVICE) to manage the queue.
@DaveCTurner thank you for the feedback!
I want to make sure I understand correctly. When you say to use ThrottledTaskRunner with DIRECT_EXECUTOR_SERVICE, do you mean:
- Eliminate the producer-consumer pattern entirely and have the Lucene thread enqueue send tasks directly to the ThrottledTaskRunner, which runs them inline when under capacity, or
- Keep the producer-consumer pattern, but replace the ThrottledIterator with a ThrottledTaskRunner on the consumer side?
I've updated the implementation to use ThrottledTaskRunner.
The iterateAsync method now uses a single ThrottledTaskRunner("fetch", maxInFlightChunks, DIRECT_EXECUTOR_SERVICE) to manage chunk sends:
- The calling thread fetches documents sequentially, serializes hits into chunks, and enqueues send tasks directly to the ThrottledTaskRunner.
- Tasks run immediately on the calling thread via DIRECT_EXECUTOR_SERVICE when fewer than maxInFlightChunks are in flight.
- When at the limit, tasks queue internally until ACK callbacks signal completion, which triggers the queued tasks.
This is better than the custom producer/consumer implementation: no thread blocks waiting for network I/O, the producer thread is freed immediately after enqueueing, and memory usage is throttled by the circuit breaker on the data nodes to protect against OOM.
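The throttling behaviour described above can be sketched with a minimal, self-contained stand-in (MiniThrottledRunner and its methods are hypothetical illustrations; Elasticsearch's real ThrottledTaskRunner takes an ActionListener<Releasable> and an Executor, and this sketch only mimics its direct-executor semantics):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical minimal stand-in for a throttled task runner: at most
// maxInFlight tasks run concurrently; tasks beyond the limit queue until a
// running task is released (here, the ACK callback for a sent chunk).
class MiniThrottledRunner {
    private final int maxInFlight;
    private int running = 0;
    private final Queue<Runnable> queued = new ArrayDeque<>();

    MiniThrottledRunner(int maxInFlight) { this.maxInFlight = maxInFlight; }

    // Enqueue a task; with a direct executor it runs inline when under capacity.
    synchronized void enqueueTask(Runnable task) {
        if (running < maxInFlight) {
            running++;
            task.run();          // direct executor: runs on the calling thread
        } else {
            queued.add(task);    // over capacity: hold until a release() arrives
        }
    }

    // Called from the ACK callback when a chunk send completes.
    synchronized void release() {
        Runnable next = queued.poll();
        if (next != null) {
            next.run();          // freed capacity is immediately reused
        } else {
            running--;
        }
    }
}

public class ThrottleDemo {
    public static void main(String[] args) {
        MiniThrottledRunner runner = new MiniThrottledRunner(2);
        List<String> log = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            final int chunk = i;
            runner.enqueueTask(() -> log.add("send chunk " + chunk));
        }
        // Only the first two chunks are "in flight"; the rest are queued.
        System.out.println(log);
        runner.release();        // ACK for chunk 0 triggers queued chunk 2
        runner.release();        // ACK for chunk 1 triggers queued chunk 3
        System.out.println(log);
    }
}
```

Note how no thread ever blocks in this scheme: enqueueTask either runs the task inline or parks it in a plain queue, in contrast to the earlier queue.poll(100, TimeUnit.MILLISECONDS) consumer, which held a thread while waiting.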
@elasticmachine run elasticsearch-ci/part-2
Buildkite benchmark this with noaa-3n-2g please
Buildkite benchmark this with esql please
Buildkite benchmark this with geoshape please
💔 Build Failed
This build attempts two geoshape benchmarks to evaluate the performance impact of this PR. To estimate benchmark completion time, inspect previous nightly runs.
In the current implementation, when Elasticsearch executes a search query that returns a large number of documents, the fetch phase retrieves the actual document content from each shard, which can lead to significant memory pressure on data nodes.
This PR implements chunked streaming for the fetch phase to reduce memory pressure when handling large result sets. Instead of accumulating all search hits in memory on the data node before sending them to the coordinator, hits are streamed in configurable chunks (default: 256 KB) as they are produced. Memory usage is bounded by circuit breakers on both the data and coordinator nodes.
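As a rough illustration of the chunked streaming described above, the following sketch splits a serialized-hits buffer into fixed-size chunks so each network send stays within the chunk budget (the chunk helper and sizes here are hypothetical, not the PR's actual code; the real implementation streams chunks as hits are produced rather than buffering everything first):

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: split serialized hits into chunks of at most CHUNK_SIZE bytes.
public class ChunkDemo {
    static final int CHUNK_SIZE = 256 * 1024; // default chunk size per the PR description

    static List<byte[]> chunk(byte[] serializedHits) {
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < serializedHits.length; offset += CHUNK_SIZE) {
            int len = Math.min(CHUNK_SIZE, serializedHits.length - offset);
            byte[] c = new byte[len];
            System.arraycopy(serializedHits, offset, c, 0, len);
            chunks.add(c); // each chunk is sent (and ACKed) independently
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] hits = new byte[600 * 1024]; // e.g. 600 KB of serialized hits
        List<byte[]> chunks = chunk(hits);
        System.out.println(chunks.size());        // 256 KB + 256 KB + 88 KB
        System.out.println(chunks.get(2).length); // size of the final partial chunk
    }
}
```

Because each chunk is accounted against the circuit breaker before it is sent and released on ACK, the peak memory held for a response is bounded by maxInFlightChunks times the chunk size rather than by the full result set.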
How OOM is Prevented on the Data Node
How OOM is Prevented on the Coordinator Node
Flow Diagram
The implementation follows the paradigm of TransportRepositoryVerifyIntegrityCoordinationAction, but it streams only between the coordinator and the data nodes.