
dap: rework auto-resume logic when connection is closed in multi-client mode #2958

Open
polinasok opened this issue Apr 5, 2022 · 9 comments

Comments

@polinasok
Collaborator

polinasok commented Apr 5, 2022

If a client connection is closed while a running command is in progress, that command is not interrupted, so the auto-resume goroutine continues as if nothing has happened. The auto-resume code therefore needs to handle any stops it encounters after that correctly.

The auto-resumer might get interrupted by the following events:

  1. A halt from a new connection on entry. The auto-resumer should log the stop reason and not resume.
  2. A breakpoint. (These are not cleared on disconnect.) The auto-resumer should log the stop reason and not resume. A user can connect afterwards to inspect state.
  3. An unrecoverable error. The auto-resumer should log the problem and not resume. The user can connect afterwards to inspect state.
  4. A logpoint. The auto-resumer can log the logpoint via stderr and resume execution, unless another session issues a halt (see 1) while the logpoint is being processed.
  5. Server shutdown (e.g. by a kill command). The auto-resumer should log the halt and not resume.
  6. (There cannot be a halt as part of setting breakpoints, normally another reason to auto-resume, because 1 would happen first.)
  • Problem: We can have a potential conflict between 1 & 4. If the halt arrives while the debugger is running, Debugger.StopReason() will reflect it. But if it happens while the auto-resumer is processing another stop, the reason will be lost. This is because our halt-tracking mechanism is per Session, so when a halt is triggered from a new Session, the paused auto-resumer in the old Session has no way of detecting it.
    • Solution 1: Yet another vote for pushing this state into the Debugger object, which is shared between Sessions.
    • Solution 2: Another option is to clear logpoints as part of connection closure, so there is never a reason to auto-resume from a dying Session again.
    • Solution 3: Or we can treat any interruption, even on a logpoint, as a reason to quit resuming once the connection is closed. That's the current behavior (from the PR).
  • Problem: Debugger pointer gets reset as part of connection closure, so it is no longer possible to check stop reasons or resume.
    • Solution 1: See if we can avoid clearing s.debugger.
    • Solution 2: Save a copy of the pointer in the auto-resumer.
    • Solution 3: If we never auto-resume, just set halt-requested; the debugger's stop reason won't be needed.
  • Problem: Current checks for a closed connection can race. The disconnect might arrive after we have confirmed that the connection is open but before the logpoint is processed, so the logpoint would be handled as if the connection were still open.
  • Problem: Because the disconnect code can be called more than once, the current way of keeping track might run into a channel double-closure issue.
  • Note: keeping track of the closed connection also allows us to avoid sending any events to it. Those errors are not fatal, but can look unnecessarily alarming in the server logs.
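As a rough illustration of Solution 1 (shared halt tracking) and of guarding against the channel double-closure problem above, the state could live on a Debugger-like object shared by all sessions. The type and method names below mirror the discussion but are purely hypothetical, not Delve's actual API:

```go
package main

import (
	"fmt"
	"sync"
)

// Debugger is an illustrative stand-in for the shared object the discussion
// proposes. Because the halt flag lives here rather than on a Session, a halt
// issued from a new session is visible to the paused auto-resumer in the old one.
type Debugger struct {
	mu            sync.Mutex
	haltRequested bool
}

// RequestHalt records a halt request from any session.
func (d *Debugger) RequestHalt() {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.haltRequested = true
}

// ConsumeHalt reports and clears a pending halt atomically, so the
// auto-resumer cannot miss a halt that arrived while it was busy
// processing another stop (the conflict between points 1 and 4 above).
func (d *Debugger) ConsumeHalt() bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	h := d.haltRequested
	d.haltRequested = false
	return h
}

// Session owns a done channel; closeOnce guards against the channel
// double-closure noted above, since disconnect code can run more than once.
type Session struct {
	closeOnce sync.Once
	done      chan struct{}
}

// Disconnect is safe to call repeatedly: sync.Once ensures the channel
// is closed exactly once.
func (s *Session) Disconnect() {
	s.closeOnce.Do(func() { close(s.done) })
}

func main() {
	d := &Debugger{}
	d.RequestHalt()
	fmt.Println(d.ConsumeHalt()) // true: halt observed exactly once
	fmt.Println(d.ConsumeHalt()) // false: already consumed

	s := &Session{done: make(chan struct{})}
	s.Disconnect()
	s.Disconnect() // no panic: second close is suppressed
	fmt.Println("disconnect idempotent")
}
```

The key design point is that ConsumeHalt both reads and clears the flag under one lock acquisition, so there is no window in which a concurrent halt can be observed twice or lost.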
@aarzilli
Member

aarzilli commented Apr 6, 2022

Two things:

  • logpoints could be reimplemented as [breaklet callbacks](https://github.com/go-delve/delve/blob/1669711c0e7c95eea04f7b2ce71ce53f980babc5/pkg/proc/breakpoints.go#L117), which would simplify the auto-resume logic a lot, since it would only have to support stops induced by setting breakpoints
  • does it make sense to keep logpoints after a client has disconnected?
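To sketch the breaklet-callback idea in isolation: each breaklet carries a callback that decides whether the target should actually suspend, so a logpoint breaklet can emit its message and vote "keep running", and the auto-resume loop never sees a logpoint stop at all. The types below are hypothetical stand-ins, not Delve's real proc.Breaklet:

```go
package main

import "fmt"

// Breaklet is an illustrative stand-in for a per-breakpoint action.
// Callback returns true if the target should suspend.
type Breaklet struct {
	Kind     string
	Callback func() bool
}

// Breakpoint aggregates breaklets set at one address.
type Breakpoint struct {
	Breaklets []Breaklet
}

// ShouldStop runs every breaklet's callback; the target suspends only
// if at least one breaklet votes to stop. A logpoint breaklet does its
// logging inside the callback and always returns false.
func (bp *Breakpoint) ShouldStop() bool {
	stop := false
	for _, b := range bp.Breaklets {
		if b.Callback() {
			stop = true
		}
	}
	return stop
}

func main() {
	logpoint := Breaklet{
		Kind:     "logpoint",
		Callback: func() bool { fmt.Println("logpoint: x hit"); return false },
	}
	bp := &Breakpoint{Breaklets: []Breaklet{logpoint}}
	fmt.Println("stop:", bp.ShouldStop()) // stop: false — execution continues
}
```

With this shape, the only stops the auto-resumer must handle are the ones deliberately induced while setting breakpoints, as the comment above suggests.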

@polinasok
Collaborator Author

polinasok commented Apr 6, 2022

  • does it make sense to keep logpoints after a client has disconnected?

I have been wondering about that myself. If a dap client reconnects, all breakpoints will be cleared and reset anyway. But an rpc client would inherit them, wouldn't it? And if the user has logging, the logpoints would log to stderr with no client and just the server running. So there are some small reasons, but no strong ones. We could go either way and should pick whatever is easiest, I think. There is also a difference between treating them as regular breakpoints with no client (i.e. stopping auto-resume until a client reconnects) and actually clearing them, letting the process just run uninterrupted.

@hyangah
Contributor

hyangah commented Apr 9, 2022

does it make sense to keep logpoints after a client has disconnected?

I have been wondering about that myself. If a dap client reconnects, all breakpoints will be cleared and reset anyway. But an rpc client would inherit them, won't it?

Shouldn't we clear all breakpoints if a client disconnects (at least when a user asks to resume the target before disconnecting)?

I couldn't find a convincing use case of leaving breakpoints (or logpoints) after the client is disconnected yet.

If one wants to leave an app running with breakpoints/logpoints overnight in the hope of catching a rare event, couldn't the person leave the debugging session active overnight too?

Interestingly, in GDB remote debugging, I think the following behavior effectively causes all breakpoints to be cleared by the time a user of the terminal-based CLI detaches/disconnects from the remote GDB server.

The GDB command to set breakpoints, break does not immediately cause a RSP interaction. GDB only actually sets breakpoints immediately before execution (for example by a continue or step command) and immediately clears them when a breakpoint is hit. This minimizes the risk of a program being left with breakpoints inserted, for example when a serial link fails.

Separately, I found that exit -c not clearing breakpoints is unintuitive* and can be dangerous.

[*]: In my case, my expectation of dlv connect is somewhat similar to dlv attach (but the remote version). When I disconnect and quit my dlv CLI with exit -c, my intention is "I am done debugging. Clean up my mess and resume normal processing", even though behind the scenes my program can still be ptraced by the headless dlv (accept-multiclient), unlike dlv attach.

@hyangah
Contributor

hyangah commented Jul 27, 2022

In golang/vscode-go#2368 a user wants to clear all the breakpoints and resume when disconnecting. I agree with the user; this is a regression compared to the legacy adapter.

Reading the original issue again, I am afraid the scope of this issue is broader than changing the disconnect behavior around breakpoints/logpoints handling. Do you want us to open a narrower issue (change the disconnect behavior to clear all breakpoints) and fix the issue?

@polinasok
Collaborator Author

Yes, the scope of this issue was intended to be broader - it was meant to sort out all the race conditions caused by the complexity of disconnecting and reconnecting while managing different types of breakpoints. If we simply clear all breakpoints, this becomes very simple. So the prerequisite question was: are the surviving breakpoints valuable enough to deal with the complexity?

There are definitely users that don't want their server to get unexpectedly stuck on some breakpoint after they are done debugging. The UI has a toggle to deactivate/remove all breakpoints in one click, but a user would have to remember to use it. On the other hand, there could be users who lose their connection due to flakiness and might appreciate a seamless reconnect. Or maybe they are chasing a rare condition and want things to keep running until it is hit, then connect to investigate more.

We had a VC discussion about this a couple of months ago. From the notes:

  • Decision on logpoints: clear the logpoints - nowhere useful to log anyway and solves complexities of auto-resuming between sessions
  • Decision on breakpoints: pros and cons for clearing/disabling them. The best solution would be to offer both as a prompt at disconnect and/or as a flag to set with dlv cli and launch.json in vscode.
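As a rough illustration, the launch.json side of the flag discussed above might look like the fragment below. The attribute name clearBreakpointsOnDisconnect is hypothetical, not an implemented option (the comment is JSONC-style, which VS Code's launch.json accepts):

```json
{
    "name": "Connect to headless dlv",
    "type": "go",
    "request": "attach",
    "mode": "remote",
    "port": 2345,
    // Hypothetical attribute illustrating the proposed opt-in behavior.
    "clearBreakpointsOnDisconnect": true
}
```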

And that's why this issue is still open :) In our attempt to maybe agree to simplify this, we ended up agreeing on more work, not less (both dealing with races while having breakpoints AND adding flags to clear them). Happy to revisit :)

@briandealwis What would CloudCode users prefer?

@hyangah
Contributor

hyangah commented Aug 2, 2022

From CloudCode's use case here (#2772 (comment)) I think the good default for CloudCode is to clear all the breakpoints + continue. "Continue" is already the default, and "Clearing all breakpoints" is another missing feature.

Decision on breakpoints: pros and cons for clearing/disabling them. The best solution would be to offer both as a prompt at disconnect and/or as a flag to set with dlv cli and launch.json in vscode.

What I remember from the discussion is that nobody came up with convincing, strong reasons to keep the breakpoints (and I still see no answer to my question #2958 (comment)). The decision was made based on an assumption that this could be implemented without too much burden/time, or that this issue was under active work. Since that assumption is no longer true, why not rescope and focus on #3083?

According to what @OrBin reported in golang/vscode-go#2368, GoLand clears breakpoints before tearing down the session. The GDB protocol clears breakpoints by design. Why not just follow suit?

there could be users who loose connection due to flakiness, who might appreciate a seamless reconnect.

I don't think this is worth all the complexities.

Or maybe they are chasing a rare condition and want things to keep running until it is hit, then connect to investigate more.

Users can keep the debug session running.

@briandealwis
Contributor

I hesitate to make a statement on behalf of all Cloud Code users, but like @hyangah, I can’t think of a compelling reason to retain breakpoints on detach. I think it would be unlikely for a user to want to set breakpoints, detach, and then re-attach later in the hopes of finding some session paused in a breakpoint, as it would lead to hangs in each process hitting the breakpoint.

@polinasok
Collaborator Author

@briandealwis Thanks for the input.

We have never come up with any particularly strong reasons for keeping the breakpoints. It's just been the historical behavior that was copied over when we ported the legacy adapter, which in turn was probably modeled after the CLI. Such historical behaviors tend to stick because there is a chance that somebody depends on them, so nobody revisits the status quo until it causes friction. Interestingly, I can't find any user complaints about this behavior in the old issues, but we have one now (golang/vscode-go#2368). Clearing breakpoints also simplifies this issue considerably. So the status quo has been challenged, and there are good reasons to do it the other way. If some user pops in and complains that we took away their favorite feature, we can ask them to use the CLI.

@rosti-il

Hello. This bug report is more than two years old and still in the discussion stage, with no progress since August 2022. What is stopping you from deciding how to implement this obvious feature and making remote debugging logically correct? I came to Go development from Java, where disconnecting the debugger from a remote process disables all breakpoints there and lets that remote process continue running.
