Skip to content

Conversation

@tommcdon
Copy link
Member

@tommcdon tommcdon commented Jun 3, 2024

Backport of #102413 to release/6.0

If the call to CreateNamedPipe fails here:

ipc->pipe = CreateNamedPipeA (
ipc->pipe_name, // pipe name
creationFlags,
PIPE_TYPE_BYTE | PIPE_WAIT | PIPE_REJECT_REMOTE_CLIENTS, // message type pipe, message-read and blocking mode
PIPE_UNLIMITED_INSTANCES, // max. instances
out_buffer_size, // output buffer size
in_buffer_size, // input buffer size
0, // default client time-out
NULL); // default security attribute
DS_EXIT_BLOCKING_PAL_SECTION;
if (ipc->pipe == INVALID_HANDLE_VALUE) {
if (callback)
callback ("Failed to create an instance of a named pipe.", GetLastError());
ep_raise_error ();
}

We will end up with an invalid handle for the pipe and the overlapped IO event. There currently is no code that tries to reset the connection and we will repeatedly try to poll an invalid handle, leading to one core being pegged as the wait fails due to an invalid handle and we keep waiting.

This PR makes it so we will call ds_ipc_listen when we run in to an error, which will reconnect to the named pipe. It also adds a delay when we detect an error, so if there are undetected cases that cause the same issue we will at least not consume the whole core we are running on.

Customer Impact

  • Customer reported
  • [] Found internally

The IPC server can fully consume a CPU core and prevent incoming connections.

Regression

  • [] Yes
  • No

Testing

Fix was verified in the customer's environment that it prevents the CPU issue

Risk

Low to Medium - The fix itself is small and reasonable, but this area of the code is hard to reason about all possible code paths due to the async nature of the named pipe APIs.

@jeffschwMSFT jeffschwMSFT added the Servicing-consider Issue for next servicing release review label Jun 3, 2024
Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. we will take for consideration in 6.0.x

@tommcdon
Copy link
Member Author

tommcdon commented Jun 4, 2024

All failures are known or infra related

@tommcdon tommcdon merged commit 3f22874 into dotnet:release/6.0-staging Jun 4, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jul 5, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-Tracing-coreclr Servicing-approved Approved for servicing release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants