Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -643,10 +643,10 @@ public void close() {

@Override
public void accept(RestChannel channel) {
localRefs.mustIncRef();
client.execute(TYPE, new Request(), new RestActionListener<>(channel) {
@Override
protected void processResponse(Response response) {
localRefs.mustIncRef();
Copy link
Copy Markdown
Contributor Author

@nicktindall nicktindall Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at a failed build scan with the logging turned on, it became apparent that the unbalanced sequence of events went like (most recent first):

  1. BaseRestHandler closing the RestChannelConsumer (decRef)
    • at Netty4ChunkedContinuationsIT:641
  2. Closing the resource tracker (decRef)
    • at Netty4ChunkedContinuationsIT:324
  3. RestChannelConsumer.accept (incRef)
    • at Netty4ChunkedContinuationsIT:646
  4. prepareRequest (incRef)
    • at Netty4ChunkedContinuationsIT:637
  5. created (implicit incRef)
    • at Netty4ChunkedContinuationsIT.java:322

Which suggests to me the following sequence of events:

  1. RestChannelConsumer.accept is called (and mustIncRef is called)
  2. TransportInfiniteContinuationsAction.doExecute is called and schedules the return of the chunked response
  3. The request is cancelled, the channel is closed
  4. The chunked response is returned, RestActionListener#onResponse fails in the call to #ensureOpen() and ends up calling RestActionListener#onFailure instead of RestActionListener#processResponse so the decRef on line 655 is never called, and we have our inbalance.

Options for fixing

If we move the mustIncRef such that it only occurs when we receive the InfiniteContinuationsPlugin.Response from the action (as above), the test shouldn't be susceptible to such a failure, but it does indicate that the Response object can be created but never closed. I'm not sure if this is an actual bug. Response type parameter doesn't have any type constraints in RestActionListener so it's not necessarily Releasable. I'm not clear on what the contract is here and whether RestActionListener has a responsibility to try and close the Response when it fails due to #ensureOpen() throwing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new positioning of mustIncRef is OK, because instead of

"if we accepted the request" -> "then we close the chunked response"

we now assert

"if we created the chunked response" -> "then we close the chunked response"

But I may have missed something.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other plugins (YieldsContinuationsPlugin at least) seem to have the same issue, but I guess it doesn't trigger because we wait for the request to complete when calling those.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job on tracking this down. I can confirm the test can fail with following diff

--- a/modules/transport-netty4/src/internalClusterTest/java/org/elasticsearch/http/netty4/Netty4ChunkedContinuationsIT.java
+++ b/modules/transport-netty4/src/internalClusterTest/java/org/elasticsearch/http/netty4/Netty4ChunkedContinuationsIT.java
@@ -645,6 +645,12 @@ public class Netty4ChunkedContinuationsIT extends ESNetty4IntegTestCase {
                             public void accept(RestChannel channel) {
                                 localRefs.mustIncRef();
                                 client.execute(TYPE, new Request(), new RestActionListener<>(channel) {
+                                    @Override
+                                    protected void ensureOpen() {
+                                        safeSleep(500);
+                                        super.ensureOpen();
+                                    }
+
                                     @Override
                                     protected void processResponse(Response response) {
                                         channel.sendResponse(RestResponse.chunked(RestStatus.OK, response.getResponseBodyPart(), () -> {

I think this is the right fix we are doing the same thing in other RestActionListener implementation. 👍

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, the analysis looks right to me. However I would still prefer that we didn't just leak the response listener in this case, that's not how production code should behave, and we should be able to assert that it's always completed. I opened #110309 with an alternative fix which I would prefer.

channel.sendResponse(RestResponse.chunked(RestStatus.OK, response.getResponseBodyPart(), () -> {
// cancellation notification only happens while processing a continuation, not while computing
// the next one; prompt cancellation requires use of something like RestCancellableNodeClient
Expand Down
3 changes: 0 additions & 3 deletions muted-tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -73,9 +73,6 @@ tests:
- class: org.elasticsearch.xpack.transform.transforms.TransformIndexerTests
method: testMaxPageSearchSizeIsResetToConfiguredValue
issue: https://github.com/elastic/elasticsearch/issues/109844
- class: org.elasticsearch.http.netty4.Netty4ChunkedContinuationsIT
method: testClientCancellation
issue: https://github.com/elastic/elasticsearch/issues/109866
- class: org.elasticsearch.index.store.FsDirectoryFactoryTests
method: testStoreDirectory
issue: https://github.com/elastic/elasticsearch/issues/110210
Expand Down