Generalize `testClientCancellation` test by DaveCTurner · Pull Request #143586 · elastic/elasticsearch

DaveCTurner · 2026-03-04T14:21:12Z

In #143252 we saw a case where a chunked response appears to have
computed far too many chunks at once, with some indication that it might
relate to client cancellation. We already have a test for this case, but
it's a little weak. This commit reworks the test to use a Netty HTTP
client, which gives much deeper control over the client's behaviour: the
client can close the connection with either a FIN or a RST, and can do so
after starting to receive the response body. The test also randomizes the
chunk size and allows for a delay to simulate computation happening while
yielding the chunk.
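For readers unfamiliar with the FIN/RST distinction, here is a minimal sketch using plain `java.net` sockets rather than the Netty client this PR uses (the PR's client code is not reproduced here). The class and method names are hypothetical. The only mechanism it relies on is that enabling `SO_LINGER` with a zero timeout makes `close()` abort the connection with a RST instead of the usual FIN.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration, not code from the PR.
final class CloseModeSketch {

    /** Closes gracefully (FIN) or abortively (RST, via SO_LINGER with a zero timeout). */
    static void close(Socket socket, boolean graceful) throws IOException {
        if (graceful == false) {
            // Zero linger makes close() discard unsent data and reset the connection.
            socket.setSoLinger(true, 0);
        }
        socket.close();
    }

    /** Reports what the accepting side sees on its next read after the client closes. */
    static String observeClose(boolean graceful) {
        try (
            ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
            Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
            Socket accepted = server.accept()
        ) {
            close(client, graceful);
            try {
                return accepted.getInputStream().read() == -1 ? "end-of-stream" : "data";
            } catch (IOException e) {
                return "reset"; // an RST typically surfaces as a failed read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("graceful: " + observeClose(true));
        System.out.println("abortive: " + observeClose(false));
    }
}
```

A graceful close is seen by the peer as end-of-stream, whereas an abortive close typically shows up as a connection-reset error, so a server may take different code paths in the two cases. That is presumably why the test randomizes between them.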

elasticsearchmachine · 2026-03-04T14:21:39Z

Pinging @elastic/es-distributed (Team:Distributed)

inespot

Looks really good! A couple of questions/clarifications before I +1.

inespot · 2026-03-05T22:45:52Z

...tty4/src/internalClusterTest/java/org/elasticsearch/http/netty4/Netty4ChunkedEncodingIT.java

+                                        ctx.close();
+                                    }
+                                } else {
+                                    assertThat(msg, instanceOf(HttpResponse.class));


Is this else branch reachable in practice? I also don't think I fully understand why we are checking that the message is an instance of org.elasticsearch.http.HttpResponse in that case. But I might be missing a piece of the puzzle.

Yes, we get a single HttpResponse instance representing the response headers, followed by a sequence of HttpContent objects representing the body. Try it: add an assert false here and run the test :)

My bad, I thought this was the io.netty.handler.codec.http.HttpResponse branch. This indeed cannot be an org.elasticsearch.http.HttpResponse.

inespot · 2026-03-05T23:00:53Z

...tty4/src/internalClusterTest/java/org/elasticsearch/http/netty4/Netty4ChunkedEncodingIT.java

+            releasables.add(() -> eventLoopGroup.shutdownGracefully(0, 0, TimeUnit.SECONDS).awaitUninterruptibly());
+
+            final var gracefulClose = randomBoolean();
+            final var chunkSizeBytes = between(1, ByteSizeUnit.KB.toIntBytes(512));


Is the max intentionally Netty4WriteThrottlingHandler.MAX_BYTES_PER_WRITE * 2, so that a chunk can potentially trigger multiple write slices?
If so, could we make this explicit by using the constant?
If not, is this value intentional or arbitrary?

It's kind of arbitrary: the case in #143252 happened to have chunks that looked to be in the 300KiB-450KiB range, so I rounded that up to the next power of two. But yes, Netty4WriteThrottlingHandler.MAX_BYTES_PER_WRITE * 2 seems like a reasonable way to express the limit as well. In practice this isn't a max: we flush a write every time we have queued up at least MAX_BYTES_PER_WRITE bytes, but we almost always overflow that because we only hint to encodeChunk to stop at that point; we do not enforce it.
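The "hint, not a max" behaviour can be illustrated with a toy model. This is a sketch only, not the Elasticsearch implementation: `SoftLimitSketch`, `flushSizes` and `SOFT_LIMIT_BYTES` are hypothetical names, and the 256 KiB value is just what the `MAX_BYTES_PER_WRITE * 2` remark above implies for a 512 KiB bound. The model queues whole chunks and flushes once at least the limit has been queued, so each flush may overshoot the limit.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a soft write limit; not Elasticsearch or Netty code.
final class SoftLimitSketch {

    // Assumed stand-in for Netty4WriteThrottlingHandler.MAX_BYTES_PER_WRITE.
    static final int SOFT_LIMIT_BYTES = 256 * 1024;

    /** Queues whole chunks, flushing once at least softLimit bytes are queued; returns each flush size. */
    static List<Integer> flushSizes(int chunkSize, int chunkCount, int softLimit) {
        List<Integer> flushes = new ArrayList<>();
        int queued = 0;
        for (int i = 0; i < chunkCount; i++) {
            queued += chunkSize; // a chunk is never split, so it can overshoot the limit
            if (queued >= softLimit) {
                flushes.add(queued);
                queued = 0;
            }
        }
        if (queued > 0) {
            flushes.add(queued); // final partial flush
        }
        return flushes;
    }

    public static void main(String[] args) {
        // Three 300 KiB chunks: every flush exceeds the 256 KiB soft limit.
        System.out.println(flushSizes(300 * 1024, 3, SOFT_LIMIT_BYTES)); // [307200, 307200, 307200]
        // Five 100 KiB chunks: the first flush happens only after the third chunk.
        System.out.println(flushSizes(100 * 1024, 5, SOFT_LIMIT_BYTES)); // [307200, 204800]
    }
}
```

In this model a flush equals the limit exactly only when the chunk size divides it evenly, which matches the remark that the limit is almost always overflowed.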

inespot · 2026-03-05T23:29:54Z

...tty4/src/internalClusterTest/java/org/elasticsearch/http/netty4/Netty4ChunkedEncodingIT.java

                            @Override
                            public BytesReference next() {
-                                return CHUNK;
+                                logger.info("--> yielding chunk of size [{}]", chunkSize);


I expect this will be quite noisy!
Is that intentional? Is the idea to keep this info log in case we manage to capture the situation described in #143252, so we can check whether the node continues processing chunks after the client has closed?

We stop after at most 5 chunk lengths (plus some buffering), so this is fine. Unless we hit the infinite loop, in which case the noise is a feature, not a bug.

inespot · 2026-03-06T01:22:11Z

...tty4/src/internalClusterTest/java/org/elasticsearch/http/netty4/Netty4ChunkedEncodingIT.java

+        final var releasables = new ArrayList<Releasable>(4);
+
+        try {
+            releasables.add(withResourceTracker());


So, to verify my understanding: the current hypothesis for #143252 is that when the client drops the connection, the channel becomes inactive but not yet fully closed, and something in the Netty pipeline starts silently discarding writes. Those writes never reach the ChannelOutboundBuffer, so the buffer stays empty, isWritable() keeps returning true, and the Netty4HttpPipeliningHandler.doFlush() loop keeps encoding chunks that go nowhere.
If this test manages to capture that, we would then see a timeout because sendChunksResponse never releases the ref that the resource tracker is waiting for.
Is that all correct?

Yes exactly.
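The hypothesis can be sketched with a toy model. This is a simplified illustration under stated assumptions, not Netty's `Channel` or `ChannelOutboundBuffer`, and not the real `doFlush()` loop: `ToyChannel`, `encodeWhileWritable` and the 64 KiB high-water mark are hypothetical. It shows why a channel that silently drops writes never applies backpressure to a loop that keeps encoding while the channel reports itself writable.

```java
// Toy model of the hypothesis; names and numbers are illustrative only.
final class DiscardedWritesSketch {

    /** Hypothetical channel whose writability depends on how many bytes are buffered. */
    static final class ToyChannel {
        private final int highWaterMark;
        private final boolean discardWrites;
        private int bufferedBytes;

        ToyChannel(int highWaterMark, boolean discardWrites) {
            this.highWaterMark = highWaterMark;
            this.discardWrites = discardWrites;
        }

        void write(int bytes) {
            if (discardWrites == false) {
                bufferedBytes += bytes; // only writes that are kept count towards backpressure
            }
        }

        boolean isWritable() {
            return bufferedBytes < highWaterMark;
        }
    }

    /** Encodes chunks while the channel is writable; maxChunks caps the loop so the sketch terminates. */
    static int encodeWhileWritable(ToyChannel channel, int chunkSize, int maxChunks) {
        int encoded = 0;
        while (encoded < maxChunks && channel.isWritable()) {
            channel.write(chunkSize);
            encoded++;
        }
        return encoded;
    }

    public static void main(String[] args) {
        // Healthy channel: the buffer fills up and the loop pauses after 4 chunks.
        System.out.println(encodeWhileWritable(new ToyChannel(64 * 1024, false), 16 * 1024, 1000)); // 4
        // Discarding channel: the buffer never fills, so only the cap stops the loop.
        System.out.println(encodeWhileWritable(new ToyChannel(64 * 1024, true), 16 * 1024, 1000)); // 1000
    }
}
```

Without the artificial cap, the second case would never stop, which corresponds to the timeout described above.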

inespot

Lgtm!

DaveCTurner requested review from eranweiss-elastic and mhl-b March 4, 2026 14:21

DaveCTurner added the >test, :Distributed/Network, and v9.4.0 labels Mar 4, 2026

elasticsearchmachine added the Team:Distributed label Mar 4, 2026

DaveCTurner added 3 commits March 4, 2026 14:24

Revert log

f07b0e7

Merge branch 'main' into 2026/03/04/chunked-response-closed-channel-t…

e711387

Merge branch 'main' into 2026/03/04/chunked-response-closed-channel-t…

27f315a

inespot self-requested a review March 5, 2026 17:07

inespot reviewed Mar 6, 2026

DaveCTurner added 2 commits March 6, 2026 11:00

Merge branch 'main' into 2026/03/04/chunked-response-closed-channel-t…

fa76c24

less magic

59cdea9

DaveCTurner requested a review from inespot March 6, 2026 11:04

elasticsearchmachine and others added 3 commits March 6, 2026 11:11

[CI] Auto commit changes from spotless

bb87973

HttpResponse confusion

e630d91

Assert no more possibilities

f66a308

inespot approved these changes Mar 6, 2026

DaveCTurner enabled auto-merge (squash) March 6, 2026 13:50

Merge branch 'main' into 2026/03/04/chunked-response-closed-channel-t…

9e91d26

DaveCTurner merged commit 29b5062 into elastic:main Mar 6, 2026
37 checks passed

prwhelan mentioned this pull request Mar 6, 2026

[ML] Wait for cluster state in test #143767

Merged

prwhelan mentioned this pull request Mar 9, 2026

[Transform] Disable PIT for CPS #143876

Closed

DaveCTurner merged 10 commits into elastic:main from DaveCTurner:2026/03/04/chunked-response-closed-channel-tests

3 participants
