Defer circuit breaker release until transport write completes#143136

Merged
drempapis merged 70 commits into elastic:main from drempapis:fix/release-breaker-after-send-response
Mar 19, 2026

@drempapis
Contributor

@drempapis drempapis commented Feb 26, 2026

Problem

When a search response is ready, SearchService serializes it and hands it to Netty for writing to the network. The problem is that Netty's writeAndFlush queues the bytes into an internal buffer and returns; the data hasn't left the JVM yet.

In the current implementation, the circuit breaker reservation is released right after this synchronous return. Under normal conditions, the gap is tiny, but when the network is slow, or many responses are in flight at the same time, bytes pile up in Netty's write queue while the breaker thinks the memory is free. This lets the node accept more work than it can handle, potentially leading to OOM.

The breaker tracks the search hit objects, not the serialized bytes. Those objects are freed (via decRef) shortly after serialization. However, a serialized copy of that same data still sits in Netty's buffer, consuming heap until the write finishes.
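To make the gap concrete, here is a minimal sketch of the early-release pattern. Everything in it is hypothetical: `ToyBreaker` and `ToyWriteQueue` are stand-ins invented for illustration, not the Elasticsearch `CircuitBreaker` or Netty APIs. It only shows how the breaker can read zero while serialized copies are still queued on the heap.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy breaker: a plain counter of reserved bytes (hypothetical, not the Elasticsearch API). */
final class ToyBreaker {
    private long used;

    void reserve(long bytes) { used += bytes; }
    void release(long bytes) { used -= bytes; }
    long used() { return used; }
}

/** Toy stand-in for an outbound write queue: payloads stay on the heap until drained. */
final class ToyWriteQueue {
    private final Deque<byte[]> pending = new ArrayDeque<>();

    /** Returns immediately, like an asynchronous write call; nothing has been sent yet. */
    void enqueue(byte[] payload) { pending.add(payload); }

    long pendingBytes() {
        long total = 0;
        for (byte[] payload : pending) {
            total += payload.length;
        }
        return total;
    }
}

final class BreakerGapDemo {
    /** Sends responses with the early-release pattern; returns {breaker bytes, queued bytes}. */
    static long[] sendWithEarlyRelease(int responses, int bytesEach) {
        ToyBreaker breaker = new ToyBreaker();
        ToyWriteQueue queue = new ToyWriteQueue();
        for (int i = 0; i < responses; i++) {
            breaker.reserve(bytesEach);          // reservation for the response objects
            queue.enqueue(new byte[bytesEach]);  // serialized copy handed to the transport
            breaker.release(bytesEach);          // released as soon as enqueue returns
        }
        return new long[] { breaker.used(), queue.pendingBytes() };
    }

    public static void main(String[] args) {
        long[] result = sendWithEarlyRelease(3, 1024);
        System.out.println("breaker=" + result[0] + " queued=" + result[1]);
    }
}
```

With three 1024-byte responses and a queue that never drains, the toy breaker reports 0 bytes while 3072 bytes remain queued.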

What changed

  1. The fetch phase builds the search hits and reserves circuit-breaker bytes for them.
  2. On the network path, the response is serialized into a circuit-breaker-aware byte stream. At this point, both the response objects and the serialized bytes are on the heap and tracked by the breaker.
  3. As soon as serialization completes, the search layer releases the response object bytes in a finally block. The heavy Java objects are no longer needed and can be garbage-collected.
  4. The serialized bytes remain tracked on the breaker while they sit in Netty's write queue. When Netty finishes writing and drops its reference, the pages are freed and the breaker is decremented.
  5. On the direct (same-node) path, no serialization happens: the response is forwarded as-is, and the search layer releases the response object bytes immediately after hand-off.
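The network-path steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `CountingBreaker`, `Transport`, and `sendResponse` are hypothetical names, the "hits" are modelled as a plain `String`, and the write-completion callback stands in for Netty dropping its reference to the serialized pages.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical breaker: a thread-safe byte counter, not the Elasticsearch CircuitBreaker. */
final class CountingBreaker {
    private final AtomicLong used = new AtomicLong();

    void add(long bytes) { used.addAndGet(bytes); }
    long used() { return used.get(); }
}

final class DeferredReleaseSketch {
    /** Hypothetical transport: queues the bytes and runs onComplete once the write has finished. */
    interface Transport {
        void write(byte[] bytes, Runnable onComplete);
    }

    /**
     * Assumes the caller already reserved hitsBytes on the breaker (step 1).
     * The object reservation is released in the finally block (step 3); the
     * serialized bytes stay reserved until the transport reports completion (step 4).
     */
    static void sendResponse(CountingBreaker breaker, String hits, long hitsBytes, Transport transport) {
        try {
            byte[] serialized = hits.getBytes(StandardCharsets.UTF_8);
            breaker.add(serialized.length);                 // step 2: serialized bytes are tracked
            boolean handedOff = false;
            try {
                transport.write(serialized, () -> breaker.add(-serialized.length)); // step 4
                handedOff = true;
            } finally {
                if (handedOff == false) {
                    breaker.add(-serialized.length);        // write was rejected: nothing is queued
                }
            }
        } finally {
            breaker.add(-hitsBytes);                        // step 3: objects no longer needed
        }
    }
}
```

In this sketch the breaker briefly holds both reservations during serialization, then only the serialized bytes until the completion callback runs.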

@elasticsearchmachine added the v9.4.0 and needs:triage labels on Feb 26, 2026
@drempapis added the >non-issue, Team:Search Foundations, and :Search Foundations/Search labels and removed the needs:triage label on Feb 26, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@drempapis
Contributor Author

@DaveCTurner Could you share your thoughts on the approach we took here? Before his vacation, Andrei mentioned that you had discussed another possible solution, but I didn’t get many details. If this approach isn’t the right one, could you guide me on how you think we should move forward?

Member

@DaveCTurner DaveCTurner left a comment

Ah no we don't want to do this. Outbound transport messages already have this lifecycle, let's not add another one.

The exact behaviour depends on whether you're using a BytesTransportResponse or not. With a BytesTransportResponse the content is retained via ref-counting, with a decRef() once the message is sent. With other responses the message content is copied into the final network message before sendResponse returns, so there's no need to retain anything after that point.
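As a rough illustration of the ref-counted lifecycle described here, the sketch below uses a hypothetical `RefCountedBytes` class. It is not `BytesTransportResponse` or the Elasticsearch `RefCounted` interface; the only point is that the release action runs when the last holder calls `decRef()`, for example once the message has been sent.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical ref-counted payload; runs onRelease when the last reference is dropped. */
final class RefCountedBytes {
    private final AtomicInteger refs = new AtomicInteger(1); // the creator holds the first reference
    private final byte[] bytes;
    private final Runnable onRelease;

    RefCountedBytes(byte[] bytes, Runnable onRelease) {
        this.bytes = bytes;
        this.onRelease = onRelease;
    }

    int length() { return bytes.length; }

    /** Takes an extra reference; fails if the payload was already released. */
    void incRef() {
        int current;
        do {
            current = refs.get();
            if (current <= 0) {
                throw new IllegalStateException("already released");
            }
        } while (refs.compareAndSet(current, current + 1) == false);
    }

    /** Drops one reference; returns true if this call released the payload. */
    boolean decRef() {
        int remaining = refs.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("decRef called too many times");
        }
        if (remaining == 0) {
            onRelease.run(); // e.g. return the bytes to a breaker or page pool
            return true;
        }
        return false;
    }
}
```

A sender would typically call `incRef()` before queuing the payload and `decRef()` in the write-completion callback, so the creator can drop its own reference right after hand-off without freeing the bytes early.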

@DaveCTurner
Member

Sorry I forgot to highlight the impact on CBs before hitting "send".

If the messages are large enough to warrant CB integration then they should be using a BytesTransportResponse. If you don't then we are doubling(!) the actual memory usage by serializing it to bytes during sendResponse(), but those additional bytes are not tracked in a CB.

Contributor

@spinscale spinscale left a comment

Only left minor nits, that I'll leave up to you.

Contributor

@andreidan andreidan left a comment

LGTM, thanks Dimitris 🚀

@drempapis drempapis merged commit 9bf7d24 into elastic:main Mar 19, 2026
36 checks passed
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Mar 20, 2026
…c#143136)

michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
…c#143136)

Labels

>non-issue, :Search Foundations/Search, Team:Search Foundations, v9.4.0

5 participants