
Conversation


@lateralusX lateralusX commented Mar 5, 2021

Add support for TCP/IP in ReversedDiagnosticServer so it can be used by Mono runtime platforms that can't use the existing named pipe or Unix domain socket support (like iOS/Android). By default, all existing tooling won't include support for running the ReversedDiagnosticServer with TCP/IP enabled, meaning it won't be affected by this change.

Added a new tool, dotnet-dsrouter, used to connect existing IPC-based tooling with TCP-based runtimes.

The router implements the reverse-connect protocol (client-server mode): the diagnostic tooling acts as an IPC server and the runtime acts as a TCP client, so the router sets up a TCP listener and connects to the tooling using an IPC client (named pipe/Unix domain socket). The tool currently accepts a couple of arguments:

dotnet-dsrouter client-server --ipc-client [IPCPath] --tcp-server [host:port] --runtime-timeout --verbose

client-server command enables the IPC client, TCP server mode.
ipc-client option sets the IPC path to connect to, normally the path used when launching the diagnostic tool with the --diagnostic-port argument.
tcp-server option sets the TCP address(es) to bind and listen on.
runtime-timeout option makes the router shut down if the runtime doesn't connect before the timeout (in seconds). The default timeout is infinite: the router will wait indefinitely for the runtime to connect.
verbose option enables verbose logging in the router.

tcp-server can be any supported IPv4/IPv6 address, including *, which binds all interfaces and, if the machine supports IPv6, enables dual mode, binding all IPv4 and IPv6 interfaces. It is also possible to restrict binding: any IPv4 interface using 0.0.0.0, any IPv6 interface using [::], IPv4 loopback using 127.0.0.1, IPv6 loopback using [::1], a host name or localhost (either binds only the first resolved interface), or any specific IPv4 or IPv6 interface.
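The dual-mode behavior behind * can be illustrated with a plain socket. Below is a minimal Python sketch (an illustration only, not the router's actual C# implementation): an IPv6 listener with IPV6_V6ONLY disabled accepts both IPv4 and IPv6 clients.

```python
import socket

# Dual-mode listener: an IPv6 socket that also accepts IPv4 clients
# when IPV6_V6ONLY is disabled (roughly what "*" enables in dsrouter).
listener = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
listener.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
listener.bind(("::", 0))   # all interfaces, dynamically assigned port
listener.listen()

# An IPv4 client can reach the IPv6 listener via the loopback interface;
# it shows up on the listener as an IPv4-mapped IPv6 address (::ffff:...).
port = listener.getsockname()[1]
client = socket.create_connection(("127.0.0.1", port), timeout=5)
conn, addr = listener.accept()
client.close()
conn.close()
listener.close()
```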

The router also implements a server-server mode where diagnostic tools act as IPC clients and the runtime acts as a TCP client, so the router is set up with a front-end IPC server and a back-end TCP server.

dotnet-dsrouter server-server --ipc-server [IPCPath] --tcp-server [host:port] --runtime-timeout --verbose

Most arguments are the same as above:

server-server command enables the IPC server, TCP server router mode.
ipc-server option sets the IPC path used by the IPC server. If excluded or empty, the IPC server will use the default diagnostic IPC server path naming.

ipc-server will default to the same naming schema as used by the native runtime IPC listener; tcp-server will default to 127.0.0.1:0, where port 0 will be bound to a dynamically assigned port.
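Conceptually, in both modes the router is a byte pump between a tool-facing connection and a runtime-facing connection. Here is a minimal Python sketch of that idea (the `pump`/`route` names are invented for illustration; this is not the actual dsrouter code):

```python
import socket
import threading

def pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes one way until the source closes, then half-close the sink."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)
    try:
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass  # peer may already be gone

def route(front: socket.socket, back: socket.socket) -> None:
    """Forward traffic in both directions between two established connections."""
    t = threading.Thread(target=pump, args=(back, front), daemon=True)
    t.start()
    pump(front, back)
    t.join()
```

The real router additionally has to accept/establish both connections in the right order (IPC client vs. IPC server, TCP server) depending on the mode, but the steady-state forwarding reduces to this bidirectional copy.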

TCP/IP support in the Diagnostic Server is added through the following PR:

dotnet/runtime#48154

@lateralusX lateralusX requested a review from a team as a code owner March 5, 2021 17:25
@lateralusX

//CC @josalem

@josalem josalem self-requested a review March 5, 2021 19:49
@lateralusX

Hold off on the review; I will add at least one of the proxy configurations to this PR, the named pipe/Unix domain socket client -> TCP/IP listener, meaning the runtime can work in a reverse-connect scenario against tools set up in reverse-connect scenarios. This includes adding a new tool representing the proxy; I'm planning on using dotnet-dsproxy as the tool name, where ds stands for Diagnostic Server.

@jander-msft jander-msft self-requested a review March 9, 2021 04:21
@jander-msft

When closing down tools like dotnet-counters, DisposeAsync signals _disposalSource in ReversedDiagnosticServer, which in turn cancels any pending accept calls. But since that won't cancel the complete server (just the pending request), the code will re-create a new instance of the server, the runtime will re-connect to that instance, and the new instance will in turn be shut down immediately, causing unnecessary connects and noise. Since we are closing down the tool and the complete ReversedDiagnosticServer, we could add a separate cancellation token detecting complete shutdown instead of cancelling just the accept call. A complete shutdown will also cancel any pending accept calls, but won't re-create the server, preventing unnecessary re-connects/disconnects in the runtime.

Can you be more specific about how 'code will re-create a new instance of the server'? It feels like you are adding a lot of CancellationToken passing via constructors (a pattern that I've never seen in any code) to fix an issue in, maybe, dotnet-counters that really should be fixed there rather than forcing it down into the lower primitives. In my opinion, if something is finished with the ReversedDiagnosticsServer, just call DisposeAsync on it and don't recreate it. No need for more CancellationTokens and combining them into new sources when disposing the object should do.


lateralusX commented Mar 9, 2021

I did it this way since it was close to how things currently work, adding the ability to see the difference between cancelling a pending AcceptAsync call and cancelling the complete server transport. The code is in IpcServerTransport.cs. The problem is that ReversedDiagnosticServer won't have access to the server transport instance (just the Task), so when ReversedDiagnosticServer runs its DisposeAsync it will only cancel the pending request, but won't explicitly dispose the server transport. That means the server transport's cancellation token will never be cancelled, so it can't see the difference between a shutdown and a cancelled AcceptAsync request. The additional token makes it possible to detect why AcceptAsync was cancelled (if cancelled at all) without changing too much of the existing logic.

This could of course be solved in different ways by altering how ReversedDiagnosticServer handles the listener task, making sure we have access to the server transport instance from ReversedDiagnosticServer::DisposeAsync. One thing to keep in mind is that the dispose of ReversedDiagnosticServer is async, so we can't dispose the server transport until the listener task is complete, and at that point we have already passed the code path recreating the server transport. The use of the _disposalSource token to indicate a shutdown of the server enabled the implementation of ReversedDiagnosticServer::DisposeAsync to work as is, while the internals of AcceptAsync could still detect the difference between a server shutdown (don't re-create the server transport) and a cancelled AcceptAsync or other errors (re-create the server transport).

One option could be to add a flag to the server transport indicating that a shutdown is pending; the code could then check that flag instead of the cancellation token to decide whether it should re-create the server transport. To make that work we also need access to the server transport instance inside ReversedDiagnosticServer, or at least to hook up setting that flag before cancelling _disposalSource in DisposeAsync. Let me try that as well and see how it works.

@lateralusX

The problem is in ReversedDiagnosticServer, so it applies to all tools supporting the --diagnostic-port argument.


lateralusX commented Mar 9, 2021

In my opinion, if something is finished with the ReversedDiagnosticsServer, just call DisposeAsync on it and don't recreate it. No need for more CancellationTokens and combining them into new sources when disposing the object should do.

That's the pattern used even with this fix; the additional token is internal to ReversedDiagnosticsServer, so the client still just calls DisposeAsync. But currently that will re-create the internal server instance, causing a lot of noise before it finally comes to rest, and this fix mitigates that issue. As said above, though, if we get access to the server transport instance, we could probably solve the same thing using a flag set in DisposeAsync just before cancelling _disposalSource, or, if we believe it is safe (it will dispose objects still in use, so it could lead to a race), by just disposing the server transport in DisposeAsync instead of relying on the server transport to react to cancellation of _disposalSource.

@jander-msft

I did it this way since it was close to how things currently work, adding the ability to see the difference between cancelling a pending AcceptAsync call and cancelling the complete server transport. The code is in IpcServerTransport.cs. The problem is that ReversedDiagnosticServer won't have access to the server transport instance (just the Task), so when ReversedDiagnosticServer runs its DisposeAsync it will only cancel the pending request, but won't explicitly dispose the server transport. That means the server transport's cancellation token will never be cancelled, so it can't see the difference between a shutdown and a cancelled AcceptAsync request. The additional token makes it possible to detect why AcceptAsync was cancelled (if cancelled at all) without changing too much of the existing logic.

I believe the simplest way to solve this is to dispose the transport before the AcceptAsync call can observe the cancellation. This way, the transport will run its _cancellation.Cancel() and cause the AcceptAsync method to cancel before that method can observe that its token parameter is cancelled. Thus, the server stream or socket will not be recreated, because _cancellation.IsCancellationRequested is true. See jander-msft@30c5a7d for this change.

I've created a PR that should fix this issue separately from your TCP/IP addition: #2064
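The dispose-before-cancel ordering discussed above can be modeled abstractly. The following is a hypothetical Python sketch (the class and method names are invented and do not come from IpcServerTransport): the accept loop re-creates the server only when the transport itself has not been disposed.

```python
import threading

class FakeTransport:
    """Invented stand-in for a server transport: tracks its own disposal."""

    def __init__(self):
        self._cancelled = threading.Event()
        self.created = 1          # server instances created so far

    def dispose(self):
        # Mirrors the idea of calling _cancellation.Cancel() during
        # disposal: mark the whole transport as shutting down first.
        self._cancelled.set()

    def on_accept_cancelled(self):
        # Mirrors the idea behind the accept loop: only re-create the
        # server stream/socket when the transport is still alive.
        if not self._cancelled.is_set():
            self.created += 1     # lone cancelled accept: re-create server
        return self.created
```

With this ordering, a transport disposed before the accept observes cancellation never re-creates its server, while a cancelled accept on a live transport still does.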


lateralusX commented Mar 10, 2021

I was a little hesitant to take that route initially, since it disposes an object that is still in use in async operations, but if you believe it's OK and won't cause too problematic side effects, I'm totally fine backing out that part from this PR and rebasing on top of what you did in #2064.

@lateralusX lateralusX force-pushed the lateralusX/add-eventpipe-tcp-support branch from aee4d48 to 41ccb6c Compare March 11, 2021 12:45
@lateralusX

Excluded the fix for not recreating ReversedDiagnosticServer on shutdown in favour of #2064. Added a new tool, dotnet-dsproxy, enabling IPC <-> TCP proxying so tools can speak with remote runtimes (like mobile platforms). The proxy currently implements an IPC client, TCP server proxy, supporting the reverse-connect scenario for both tooling and runtime. Going forward, the tool should be expanded with IPC server, TCP server support, meaning tooling will act as an IPC client and the runtime as a TCP client.

Base automatically changed from master to main March 12, 2021 05:48
@lateralusX

//CC @josalem You can now start doing the review of this PR.

@shirhatti

@lateralusX Just FYI @josalem is out this week


@josalem josalem left a comment


First round of comments. I'm going to take a deeper look at the routing code still.


@jander-msft jander-msft left a comment


Initial comments; I'll take another pass when these and John's comments are addressed.

@lateralusX

@josalem, @jander-msft, it would be great if we could make progress on this PR and get it merged (since it will be needed for end-to-end testing of EventPipe on mobile). I believe all comments so far in this PR have been addressed.

@lateralusX lateralusX force-pushed the lateralusX/add-eventpipe-tcp-support branch from df54ec4 to 184ca11 Compare April 12, 2021 12:28

@josalem josalem left a comment


I think this is in a good state to iterate on for preview status. That being said, @hoyosjs is stabilizing the repo for a release. Let's wait a day and check in tomorrow before we merge.


lateralusX commented Apr 13, 2021

@josalem, looks like I don't have the right permissions to merge.

@lateralusX lateralusX force-pushed the lateralusX/add-eventpipe-tcp-support branch from cb84681 to ff7d474 Compare April 15, 2021 18:31
@josalem josalem merged commit 24cfdb2 into dotnet:main Apr 20, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Jan 16, 2024