Add fork after authentication for tsh ssh#54696
Conversation
|
whats the best way to test this? |
|
@avatus The simplest way is to do something like |
|
Reviewers, the tests are fixed and this is now ready for review. |
|
I'd like to wait to merge until you have another backender look at this. Thank you! |
There was a problem hiding this comment.
This does not work when per-session mfa is required. See the attached demo. There is never a prompt for MFA and to the user the connection appears to hang forever.
Screen.Recording.2025-05-16.at.9.43.45.AM.mov
Update: It looks like the prompt can eventually get triggered by the user pressing enter.
Screen.Recording.2025-05-16.at.9.48.20.AM.mov
espadolini
left a comment
There was a problem hiding this comment.
Should we have the child process exit if the parent is killed before the detach point, or is that not really necessary?
| signalR, signalW, err := os.Pipe() | ||
| if err != nil { | ||
| return nil, trace.Wrap(err) | ||
| } | ||
| signalFd := addSignalFdToChild(cmd, signalW) | ||
|
|
||
| cmd.Args = append(cmd.Args, params.GetArgs(signalFd)...) | ||
| cmd.Stdin = params.Stdin | ||
| cmd.Stdout = params.Stdout | ||
| cmd.Stderr = params.Stderr | ||
|
|
||
| return &forkAuthCmd{ | ||
| Cmd: cmd, | ||
| disownSignal: signalR, | ||
| }, nil | ||
| } |
There was a problem hiding this comment.
In the windows case nothing keeps signalW alive in here, so its finalizer is liable to close it before the child is spawned.
There was a problem hiding this comment.
This is still an issue AFAICT. In UNIX there's a []*os.File in exec.Cmd which keeps the signalW object alive, but in windows we just have a []uintptr and the garbage collector is liable to trigger the execution of the finalizer of signalW at any point after line 73 (or even right before addSignalFdToChild returns).
There was a problem hiding this comment.
Is this not fixed by unsetting the finalizer here?
teleport/lib/client/reexec_windows.go
Line 41 in 45bd0da
There was a problem hiding this comment.
I missed that, but that's... pretty bad? We're now just leaking the read side signal pipe (admittedly we're also doing that on the unix case now that I look at it further, it's just technically not a leak because the file object is still in the exec.Cmd) and we're also never going to unblock on the read if the process dies, since we are holding open the other end of the pipe on our end.
*os.File is also not documented (and thus not guaranteed) to use a finalizer, it could be using a cleanup, it could be using special runtime magic that we can't interact with - and the finalizer on *os.File is a last resort to try to avoid file descriptor leaks in misbehaving code, not something that correct code should ever deal with.
A proper way to do this would be to carry both sides of the pipe in forkAuthCmd (if we have to use a dedicated object for this) and then close the write side immediately after Run.
| go func() { | ||
| // The child process will close the pipe when it has authenticated | ||
| // and is ready to be disowned. | ||
| _, err := cmd.disownSignal.Read(make([]byte, 1)) |
There was a problem hiding this comment.
Is this guaranteed to return 1, nil if the child writes a byte and closes the pipe? Is 1, io.EOF a possibility? Is 0, nil a possibility, technically?
There was a problem hiding this comment.
I would consider those cases to be equivalent for our purposes, as we really just the Read to finish. They are now handled.
| if stdin, ok := cf.Stdin().(io.ReadCloser); ok { | ||
| errors = append(errors, stdin.Close()) | ||
| } |
There was a problem hiding this comment.
At this level we should not be dealing in go objects, we should open /dev/null and clone it over fd 0 - never close the stdio file descriptors, always replace them with something like /dev/null or newly opened files are liable to end up in there (I'm not sure how things works in windows tho, I'm afraid).
There was a problem hiding this comment.
Do we need a setsid() call or something, too?
There was a problem hiding this comment.
I'm curious: is it because reading from a closed stdin is going to fail, or is there some other caveat I'm not aware of?
There was a problem hiding this comment.
Newly opened files are opened in the lowest file descriptor, and lots of things sort of assume that 0, 1 and 2 are always the stdio fds, so if you close one of those three and then open something else in some other thread you might end up with data going where it's not supposed to go.
There was a problem hiding this comment.
Oh, wow, that's a good thing to know. Thanks!
There was a problem hiding this comment.
I ended up just replacing tc.Stdin with /dev/null, as that's what the ssh session will use, and at the point the child calls OnAuthenticate we won't be in the middle of any reads.
| defer cancel() | ||
|
|
||
| defer cmd.signalR.Close() | ||
| defer cmd.killW.Close() | ||
| defer cmd.killW.Write([]byte{0x00}) |
There was a problem hiding this comment.
Would there be any benefit to lumping all of these into a single defer func () { }?
| disownReady <- err | ||
| if err == nil { | ||
| disownReady <- nil | ||
| return | ||
| } |
There was a problem hiding this comment.
Is sending nil if no error is returned of any use? The disownReady channel is only consumed a single time below and we will have already wrote nil to the channel above.
| disownReady <- err | |
| if err == nil { | |
| disownReady <- nil | |
| return | |
| } | |
| disownReady <- err |
| assert.EventuallyWithT(t, func(collect *assert.CollectT) { | ||
| assert.Equal(collect, "stdout: hello\n", stdout.String()) | ||
| assert.Equal(collect, "stderr: hello\n", stderr.String()) |
There was a problem hiding this comment.
Suggestion: shadow t here so it can't accidentally be used instead of collect?
| assert.EventuallyWithT(t, func(collect *assert.CollectT) { | |
| assert.Equal(collect, "stdout: hello\n", stdout.String()) | |
| assert.Equal(collect, "stderr: hello\n", stderr.String()) | |
| assert.EventuallyWithT(t, func(t *assert.CollectT) { | |
| assert.Equal(t, "stdout: hello\n", stdout.String()) | |
| assert.Equal(t, "stderr: hello\n", stderr.String()) |
| require.EventuallyWithT(t, func(collect *assert.CollectT) { | ||
| assert.Equal(collect, "stdout: hello\n", stdout.String()) | ||
| assert.Equal(collect, "stderr: hello\n", stderr.String()) |
There was a problem hiding this comment.
Same suggestion as above.
| require.EventuallyWithT(t, func(collect *assert.CollectT) { | |
| assert.Equal(collect, "stdout: hello\n", stdout.String()) | |
| assert.Equal(collect, "stderr: hello\n", stderr.String()) | |
| require.EventuallyWithT(t, func(t *assert.CollectT) { | |
| assert.Equal(t, "stdout: hello\n", stdout.String()) | |
| assert.Equal(t, "stderr: hello\n", stderr.String()) |
| assert.EventuallyWithT(t, func(collect *assert.CollectT) { | ||
| assert.Equal(collect, "stdout: hello\n", stdout.String()) | ||
| assert.Equal(collect, "stderr: hello\n", stderr.String()) | ||
| }, time.Second, 10*time.Millisecond) |
There was a problem hiding this comment.
Maybe consider bumping these up to something that seems a bit unreasonable to prevent CI from flaking? Same for below.
| }, time.Second, 10*time.Millisecond) | |
| }, 10*time.Second, 100*time.Millisecond) |
| return assert.EventuallyWithT(t, func(collect *assert.CollectT) { | ||
| assert.FileExists(collect, testFile) |
There was a problem hiding this comment.
Same suggestion as above.
| return assert.EventuallyWithT(t, func(collect *assert.CollectT) { | |
| assert.FileExists(collect, testFile) | |
| return assert.EventuallyWithT(t, func(t *assert.CollectT) { | |
| assert.FileExists(t, testFile) |
c6cc90e to
f62e8e3
Compare
| for _, file := range cmd.ExtraFiles { | ||
| if err := file.Close(); err != nil { | ||
| return trace.Wrap(err) | ||
| } | ||
| } |
There was a problem hiding this comment.
This should close cmd.signalW and cmd.killR rather than the ExtraFiles, ExtraFiles is empty on windows.
| _, err := cmd.signalR.Read(make([]byte, 1)) | ||
| disownReady <- err |
There was a problem hiding this comment.
Protection against 1, io.EOF, 0, nil and other unsavory situations.
| _, err := cmd.signalR.Read(make([]byte, 1)) | |
| disownReady <- err | |
| n, err := cmd.signalR.Read(make([]byte, 1)) | |
| if n > 0 { | |
| disownReady <- nil | |
| } else if err == nil { | |
| // this should be impossible according to the io.Reader contract | |
| disownReady <- io.UnexpectedEOF | |
| } else { | |
| disownReady <- err | |
| } |
| return trace.Wrap(waitErr) | ||
| case <-time.After(3 * time.Second): | ||
| if killErr := cmd.Process.Kill(); killErr != nil { | ||
| return trace.Wrap(killErr) |
There was a problem hiding this comment.
Can't Kill fail if the process was already finished? It also seems weird to leave a goroutine blocked on Wait but if the killing has failed for reasons out of our control there's not much we can truly do.
Perhaps we can give it a couple more seconds to wait for the exit status from Wait() here? In general once we've spawned the goroutine we should do our best not to exit the function before the goroutine is finishing.
| if err := syscall.Dup2(int(devNull.Fd()), int(os.Stdin.Fd())); err != nil { | ||
| return nil, trace.Wrap(err) | ||
| } |
There was a problem hiding this comment.
Fd() has an unfortunate side effect on some files, turning them blocking (and stdin's fd is always 0 syscall.Stdin so we don't need to Fd() it). This was also leaking the new devNull file on early errorful returns.
| if err := syscall.Dup2(int(devNull.Fd()), int(os.Stdin.Fd())); err != nil { | |
| return nil, trace.Wrap(err) | |
| } | |
| rc, err := devNull.SyscallConn() | |
| if err != nil { | |
| _ = devNull.Close() | |
| return nil, trace.Wrap(err) | |
| } | |
| var dupErr error | |
| if ctrlErr := rc.Control(func(fd uintptr) { | |
| dupErr = syscall.Dup2(int(fd), syscall.Stdin) | |
| // stdin is not O_CLOEXEC after dup2 but thankfully the three stdio | |
| // file descriptors must be not O_CLOEXEC anyway, so we can avoid | |
| // a linux-specific implementation or syscall.ForkLock shenanigans | |
| }); ctrlErr != nil { | |
| _ = devNull.Close() | |
| return nil, trace.Wrap(ctrlErr) | |
| } | |
| if dupErr != nil { | |
| _ = devNull.Close() | |
| return nil, trace.Wrap(err) | |
| } |
| return trace.Wrap(err) | ||
| } | ||
| return trace.Wrap(ctx.Err()) |
There was a problem hiding this comment.
These two returns drop cmd without calling its Wait, which is liable to panic due to the finalizer guard.
| func onSSH(cf *CLIConf) error { | ||
| // Handle fork after authentication. | ||
| var disownSignal *os.File | ||
| forkAuthSuccessful := &atomic.Bool{} |
There was a problem hiding this comment.
| forkAuthSuccessful := &atomic.Bool{} | |
| var forkAuthSuccessful atomic.Bool |
espadolini
left a comment
There was a problem hiding this comment.
I would urge you to look at runForkAuthenticateChild again and see if you can simplify it in such a way that it doesn't lead to various resources getting leaked depending on which exit condition we happen to hit.
| } | ||
|
|
||
| func runForkAuthenticateChild(ctx context.Context, cmd *forkAuthCmd) error { | ||
| defer func() { |
There was a problem hiding this comment.
| defer func() { | |
| defer func() { | |
| cmd.killR.Close() | |
| cmd.signalW.Close() |
| if err := cmd.Process.Kill(); err != nil && !strings.Contains(err.Error(), "os: process already released") { | ||
| return trace.Wrap(err) | ||
| } | ||
| return trace.Wrap(cmd.Process.Release()) |
There was a problem hiding this comment.
Is this right? We're waiting on it, if we release it we might not actually see the process exit, will we? Won't Release return an error if we've already successfully waited the process?
| case <-ctx.Done(): | ||
| return trace.Wrap(cmd.killProcess()) |
There was a problem hiding this comment.
This is the only occurrence of ctx in this - what happens if the context actually expires? Won't it result in a zombie process, if the process doesn't cooperate?
espadolini
left a comment
There was a problem hiding this comment.
Please undo the "Rewrite to use signals instead of pipes" commit, nothing was ever made simpler by using more asynchronous signals rather than pipes.
| proc, err := os.FindProcess(ppid) | ||
| if err != nil { | ||
| return trace.Wrap(err) | ||
| } | ||
| return trace.Wrap(proc.Signal(childReadySignal)) |
There was a problem hiding this comment.
This is racy, the pid of the parent could've been reused by the time we hit this.
8a4c722 to
bdedde4
Compare
bdedde4 to
07df1f6
Compare
This change adds fork after authentication support to tsh ssh.
07df1f6 to
b6a1075
Compare
This change adds fork after authentication support to tsh ssh.
This change adds fork after authentication support to tsh ssh.
This change adds fork after authentication support to tsh ssh.
* Add fork after authentication for tsh ssh (#54696) This change adds fork after authentication support to tsh ssh. * tsh: Add wrapper for syscall.Dup2 for linux/arm64 (#55925) * tsh: Add wrapper for syscall.Dup2 for linux/arm64 Add a wrapper for `syscall.Dup2()` as linux ARM64 does not have that syscall. On that platform, `syscall.Dup3()` needs to be used instead. Fixes: 57c909d * Implement dup2 with syscall.Dup3 on all linux platforms * Add explicit "unix" constraint to dup2_unix.go to ensure Windows is excluded --------- Co-authored-by: Cam Hutchison <camh@goteleport.com>
* Add fork after authentication for tsh ssh (#54696) This change adds fork after authentication support to tsh ssh. * tsh: Add wrapper for syscall.Dup2 for linux/arm64 (#55925) * tsh: Add wrapper for syscall.Dup2 for linux/arm64 Add a wrapper for `syscall.Dup2()` as linux ARM64 does not have that syscall. On that platform, `syscall.Dup3()` needs to be used instead. Fixes: 57c909d * Implement dup2 with syscall.Dup3 on all linux platforms * Add explicit "unix" constraint to dup2_unix.go to ensure Windows is excluded --------- Co-authored-by: Cam Hutchison <camh@goteleport.com>
This change adds the ability for
tshto fork after authentication (tsh ssh -f), analogous tossh -f. After authentication completes (including access requests, per-session MFA, etc.), the foregroundtshprocess returns while the actual ssh session continues in a background process. The child process may not read from the original stdin once disowned.Resolves #52255.
Changelog: Added fork after authentication to
tsh ssh