Skip to content

Fix race in HTTP response shutdown handling#105306

Merged
DaveCTurner merged 3 commits intoelastic:mainfrom
DaveCTurner:2024/02/08/http-response-after-close-redux
Feb 9, 2024
Merged

Fix race in HTTP response shutdown handling#105306
DaveCTurner merged 3 commits intoelastic:mainfrom
DaveCTurner:2024/02/08/http-response-after-close-redux

Conversation

@DaveCTurner
Copy link
Copy Markdown
Member

Similar to #97301, the fix in #105293 was still not quite correct: we
could in principle shut down the transport after checking isOpen() but
before sending the message. Applying the same fix as for the transport
layer here.

Similar to elastic#97301, the fix in elastic#105293 was still not quite correct: we
could in principle shut down the transport after checking `isOpen()` but
before sending the message. Applying the same fix as for the transport
layer here.
@DaveCTurner DaveCTurner added >bug :Distributed/Network Http and internode communication implementations v8.13.0 v8.12.2 labels Feb 8, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Feb 8, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @DaveCTurner, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Copy Markdown
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +39 to 41
if (channel.eventLoop().isShutdown()) {
wrapped.onFailure(new TransportException("Cannot send HTTP response, event loop is shutting down."));
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the event loop stops after this check, I assume we can rely on Netty to release the response?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah shutting down the event loop involves draining the executor's queue. If we get here and the event loop isn't shut down then we know the work was enqueued and thus will run.

@DaveCTurner DaveCTurner added auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) auto-backport-and-merge labels Feb 9, 2024
@DaveCTurner DaveCTurner merged commit 2615aa0 into elastic:main Feb 9, 2024
@DaveCTurner DaveCTurner deleted the 2024/02/08/http-response-after-close-redux branch February 9, 2024 08:43
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Feb 9, 2024
Similar to elastic#97301, the fix in elastic#105293 was still not quite correct: we
could in principle shut down the transport after checking `isOpen()` but
before sending the message. Applying the same fix as for the transport
layer here.
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

💚 Backport successful

Status Branch Result
8.12

elasticsearchmachine pushed a commit that referenced this pull request Feb 9, 2024
Similar to #97301, the fix in #105293 was still not quite correct: we
could in principle shut down the transport after checking `isOpen()` but
before sending the message. Applying the same fix as for the transport
layer here.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Feb 14, 2024
We could still be manipulating a network message when the event loop
shuts down, causing us to close the message while it's still in use.
This is at best going to be a little surprising to the caller, and at
worst could be an outright use-after-free bug.

This commit moves the double-check for a leaked promise to happen
strictly after the event loop has fully terminated, so that we can be
sure we've finished using it by this point.

Relates elastic#105306, elastic#97301
elasticsearchmachine pushed a commit that referenced this pull request Feb 15, 2024
We could still be manipulating a network message when the event loop
shuts down, causing us to close the message while it's still in use.
This is at best going to be a little surprising to the caller, and at
worst could be an outright use-after-free bug.

This commit moves the double-check for a leaked promise to happen
strictly after the event loop has fully terminated, so that we can be
sure we've finished using it by this point.

Relates #105306, #97301
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Feb 15, 2024
We could still be manipulating a network message when the event loop
shuts down, causing us to close the message while it's still in use.
This is at best going to be a little surprising to the caller, and at
worst could be an outright use-after-free bug.

This commit moves the double-check for a leaked promise to happen
strictly after the event loop has fully terminated, so that we can be
sure we've finished using it by this point.

Relates elastic#105306, elastic#97301
elasticsearchmachine pushed a commit that referenced this pull request Feb 15, 2024
We could still be manipulating a network message when the event loop
shuts down, causing us to close the message while it's still in use.
This is at best going to be a little surprising to the caller, and at
worst could be an outright use-after-free bug.

This commit moves the double-check for a leaked promise to happen
strictly after the event loop has fully terminated, so that we can be
sure we've finished using it by this point.

Relates #105306, #97301
@DaveCTurner DaveCTurner restored the 2024/02/08/http-response-after-close-redux branch June 17, 2024 06:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug :Distributed/Network Http and internode communication implementations Team:Distributed Meta label for distributed team. v8.12.2 v8.13.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants