Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read loop hangs after server stop #108

Open
bastiandoetsch opened this issue Sep 11, 2023 · 7 comments
Open

Read loop hangs after server stop #108

bastiandoetsch opened this issue Sep 11, 2023 · 7 comments

Comments

@bastiandoetsch
Copy link

We have encountered the situation, that our go executable does not return from WaitStatus.

Apparently, the goroutine to receive messages on the channel does not terminate after the server.stop, and thus the Wait does not return.

What kind of additional information would you need to analyse this in depth?

You can reproduce this running the snyk-ls executable, available at https://github.com/snyk/snyk-ls (we do that from an e2e test, located here).

Please let me know if I can assist further.

@creachadair
Copy link
Owner

Thanks for the report. From my initial skim of the code, it looks as if there is a handler that is trying to call Stop on the server, and my guess is that is causing a deadlock.

In general it won't be safe for a handler to stop its own server: while Stop holds the internal mutex to clean up the server state, it will exclude the reader from acquiring it while it shuts down. That in turn will cause the wait logic to never complete.

If you'd like a handler to stop its server, you'll probably want to have it signal that a shutdown was requested, and do it from another goroutine after the handler exits.

@creachadair
Copy link
Owner

I poked at this a bit more, and another thing I noticed is that this language server appears to be using stdio to communicate (that is normal). Here, however, instead of the client closing its channel to the server, it sends the server a request to shut itself down.

When the server stops, it calls Close on its channel, which closes its write side (in this case stdout), but it does not have access to close its read side (in this case stdin).

If the client ignores this and leaves its write side open, the server will continue to wait, which is I think what you're seeing. Probably the simplest way to address this in your integration test is to either (a) make the client stop when it gets EOF from the server, or (b) make the shutdown process on the server side explicitly close its own stdin. E.g., perhaps adding here:

os.Stdin.Close()

I see from this that there is a documentation bug for Channel.Close, which asserts no further records may be sent or received. The latter is not (always) true, and I should fix that.

creachadair added a commit that referenced this issue Sep 15, 2023
The comment incorrectly stated that Close affects both sends and receives, when
in fact it only affects sends. The receives affected by Close are at the other
end of the channel pair.

Updates #108.
@bastiandoetsch
Copy link
Author

Sorry for the late reply - I'll try out closing stdin and see what happens :).

@bastiandoetsch
Copy link
Author

Closing stdin unfortunately doesn't fix the issue. Tried stopping the server within a go routine outside of the handler, but it still hangs.

func exit(srv *jrpc2.Server, c *config.Config) jrpc2.Handler {
	return handler.New(func(_ context.Context) (any, error) {
		logger := c.Logger().With().Str("method", "Exit").Logger()
		logger.Info().Msg("ENTERING")
		logger.Info().Msg("Flushing error reporting...")
		di.ErrorReporter().FlushErrorReporting()
		go func() {
			// wait for the handler to quit, then stop outside of handler the server
			time.Sleep(time.Second)
			logger.Info().Msg("Stopping server...")
			_ = os.Stdin.Close()
			srv.Stop()
		}()
		return nil, nil
	})
}

@creachadair
Copy link
Owner

I don't have a good theory about what's happening in this case. Is this the code that drives the server to the hang you are seeing? I don't see anything in that code that obviously calls your "exit" method, nor closes the pipe. Is there some other implicit plumbing that is doing so?

@bastiandoetsch
Copy link
Author

bastiandoetsch commented Sep 18, 2023

Yeah, we removed shutdown and exit, as it was not working. But I tested with shutdown and exit and the process didn't quit. Locally the end of the test looks like this:

for (let i = 0; i < 45; i++) {
      console.debug('Waiting for diagnostics...');
      if (diagnosticCount > 0) {
        break;
      }
      await sleep(1000);
}
await connection.sendRequest('shutdown');
await connection.sendNotification('exit');
// cli.kill(9);

@creachadair
Copy link
Owner

I attempted to build a minimal repro but it does not in fact reproduce what you're seeing. But perhaps it will give you a basis to isolate the conditions that are different in a way we can turn into a test?

To use:

unzip repro.sh
cd repro
make
./tcli

repro.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants