
Add fork after authentication for tsh ssh #54696

Merged
atburke merged 1 commit into master from atburke/ssh-fork-auth-early on Jun 18, 2025

Conversation

@atburke
Contributor

@atburke atburke commented May 10, 2025

This change adds the ability for tsh to fork after authentication (tsh ssh -f), analogous to ssh -f. After authentication completes (including access requests, per-session MFA, etc.), the foreground tsh process returns while the actual ssh session continues in a background process. The child process may not read from the original stdin once disowned.

Resolves #52255.

Changelog: Added fork after authentication to tsh ssh

@atburke atburke requested a review from rosstimothy May 10, 2025 00:15
@github-actions github-actions Bot requested a review from avatus May 10, 2025 00:15
@github-actions github-actions Bot added the tsh tsh - Teleport's command line tool for logging into nodes running Teleport. label May 10, 2025
@github-actions github-actions Bot requested a review from kopiczko May 10, 2025 00:15
@avatus
Contributor

avatus commented May 12, 2025

What's the best way to test this?

@atburke
Contributor Author

atburke commented May 12, 2025

@avatus The simplest way is to do something like tsh ssh -f <node> "sleep 3 && echo test", so you can see that tsh exits and then test is written afterwards.

@atburke atburke marked this pull request as draft May 13, 2025 20:02
@atburke atburke marked this pull request as ready for review May 14, 2025 16:49
@github-actions github-actions Bot requested review from bl-nero and timothyb89 May 14, 2025 16:50
@atburke
Contributor Author

atburke commented May 14, 2025

Reviewers, the tests are fixed and this is now ready for review.

Comment thread tool/tsh/common/tsh.go Outdated
Contributor

@avatus avatus left a comment

It works for me.

@avatus
Contributor

avatus commented May 15, 2025

I'd like to wait to merge until you have another backender look at this. Thank you!

@rosstimothy rosstimothy requested a review from espadolini May 15, 2025 19:24
@atburke atburke requested review from rosstimothy and removed request for espadolini May 15, 2025 19:24
@rosstimothy rosstimothy requested a review from espadolini May 16, 2025 13:43
Contributor

@rosstimothy rosstimothy left a comment

This does not work when per-session MFA is required. See the attached demo: there is never a prompt for MFA, and to the user the connection appears to hang forever.

Screen.Recording.2025-05-16.at.9.43.45.AM.mov

Update: It looks like the prompt can eventually get triggered by the user pressing enter.

Screen.Recording.2025-05-16.at.9.48.20.AM.mov

@public-teleport-github-review-bot public-teleport-github-review-bot Bot removed the request for review from espadolini May 16, 2025 13:45
Comment thread lib/client/reexec.go Outdated
Comment thread lib/client/reexec.go Outdated
Comment thread tool/tsh/common/tsh.go Outdated
Comment thread lib/client/reexec.go Outdated
Contributor

@espadolini espadolini left a comment

Should we have the child process exit if the parent is killed before the detach point, or is that not really necessary?

Comment thread lib/client/reexec.go Outdated
Comment thread lib/client/reexec.go Outdated
Comment on lines +69 to +91
	signalR, signalW, err := os.Pipe()
	if err != nil {
		return nil, trace.Wrap(err)
	}
	signalFd := addSignalFdToChild(cmd, signalW)

	cmd.Args = append(cmd.Args, params.GetArgs(signalFd)...)
	cmd.Stdin = params.Stdin
	cmd.Stdout = params.Stdout
	cmd.Stderr = params.Stderr

	return &forkAuthCmd{
		Cmd:          cmd,
		disownSignal: signalR,
	}, nil
}
Contributor

In the Windows case nothing keeps signalW alive in here, so its finalizer is liable to close it before the child is spawned.

Contributor

This is still an issue AFAICT. On Unix there's a []*os.File in exec.Cmd which keeps the signalW object alive, but on Windows we just have a []uintptr, and the garbage collector is liable to run signalW's finalizer at any point after line 73 (or even right before addSignalFdToChild returns).

Contributor Author

Is this not fixed by unsetting the finalizer here?

runtime.SetFinalizer(signal, nil)

Contributor

I missed that, but that's... pretty bad? We're now just leaking the read side of the signal pipe (admittedly we're also doing that in the Unix case now that I look at it further; it's just technically not a leak because the file object is still in the exec.Cmd), and we're also never going to unblock on the read if the process dies, since we are holding open the other end of the pipe on our side.

*os.File is also not documented (and thus not guaranteed) to use a finalizer, it could be using a cleanup, it could be using special runtime magic that we can't interact with - and the finalizer on *os.File is a last resort to try to avoid file descriptor leaks in misbehaving code, not something that correct code should ever deal with.

A proper way to do this would be to carry both sides of the pipe in forkAuthCmd (if we have to use a dedicated object for this) and then close the write side immediately after Run.

Comment thread lib/client/reexec.go Outdated
Comment thread lib/client/reexec.go Outdated
go func() {
// The child process will close the pipe when it has authenticated
// and is ready to be disowned.
_, err := cmd.disownSignal.Read(make([]byte, 1))
Contributor

Is this guaranteed to return 1, nil if the child writes a byte and closes the pipe? Is 1, io.EOF a possibility? Is 0, nil a possibility, technically?

Contributor Author

I would consider those cases to be equivalent for our purposes, as we really just need the Read to finish. They are now handled.

Comment thread lib/client/reexec.go Outdated
Comment thread lib/client/reexec.go Outdated
Comment thread lib/client/reexec_linux.go
Comment thread tool/tsh/common/tsh.go Outdated
Comment thread tool/tsh/common/tsh.go
Comment thread tool/tsh/common/tsh.go Outdated
Comment on lines +4100 to +4102
if stdin, ok := cf.Stdin().(io.ReadCloser); ok {
errors = append(errors, stdin.Close())
}
Contributor

At this level we should not be dealing in Go objects; we should open /dev/null and clone it over fd 0. Never close the stdio file descriptors: always replace them with something like /dev/null, or newly opened files are liable to end up in their place (I'm not sure how things work on Windows, though, I'm afraid).

Contributor

Do we need a setsid() call or something, too?

Contributor

I'm curious: is it because reading from a closed stdin is going to fail, or is there some other caveat I'm not aware of?

Contributor

Newly opened files are assigned the lowest available file descriptor, and lots of things assume that 0, 1 and 2 are always the stdio fds, so if you close one of those three and then open something else in some other thread, you might end up with data going where it's not supposed to go.
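A small Unix-only illustration of that rule, using a hypothetical helper: `open(2)` returns the lowest descriptor number not currently in use, so a descriptor that has just been closed is handed straight back to the next open. The same would happen to fd 0, 1, or 2 if one of them were closed; the sketch uses a throwaway descriptor instead of touching real stdio, and assumes no other thread opens files in between.

```go
//go:build unix

package main

import "syscall"

// reusedFd (hypothetical) opens /dev/null, closes it, opens it again, and
// returns both descriptor numbers.
func reusedFd() (first, second int, err error) {
	first, err = syscall.Open("/dev/null", syscall.O_RDONLY, 0)
	if err != nil {
		return 0, 0, err
	}
	if err = syscall.Close(first); err != nil {
		return 0, 0, err
	}
	// With no concurrent opens, the lowest free number is the one just closed.
	second, err = syscall.Open("/dev/null", syscall.O_RDONLY, 0)
	if err != nil {
		return 0, 0, err
	}
	return first, second, syscall.Close(second)
}
```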

Contributor

Oh, wow, that's a good thing to know. Thanks!

Contributor Author

I ended up just replacing tc.Stdin with /dev/null, as that's what the ssh session will use, and at the point the child calls OnAuthenticate we won't be in the middle of any reads.

Comment thread lib/client/reexec.go Outdated
Comment on lines +95 to +99
defer cancel()

defer cmd.signalR.Close()
defer cmd.killW.Close()
defer cmd.killW.Write([]byte{0x00})
Contributor

Would there be any benefit to lumping all of these into a single defer func () { }?
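For reference, deferred calls run last-in, first-out, so in the excerpt above the byte is written to `killW` before `killW` is closed, and `signalR` is closed last. A single deferred closure runs the same steps in reading order, which makes that ordering explicit. A minimal sketch that records the order (both functions are hypothetical):

```go
package main

// stackedDefers mirrors the three separate defers in the excerpt; each step
// appends its name to log when it runs.
func stackedDefers(log *[]string) {
	step := func(name string) func() {
		return func() { *log = append(*log, name) }
	}
	defer step("close signalR")()
	defer step("close killW")()
	defer step("write killW")()
}

// singleDefer performs the same cleanup in one deferred closure.
func singleDefer(log *[]string) {
	defer func() {
		*log = append(*log, "write killW")
		*log = append(*log, "close killW")
		*log = append(*log, "close signalR")
	}()
}
```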

Comment thread lib/client/reexec.go Outdated
Comment on lines +106 to +123
disownReady <- err
if err == nil {
disownReady <- nil
return
}
Contributor

Is sending nil if no error is returned of any use? The disownReady channel is only consumed once below, and we will have already written nil to the channel above.

Suggested change
disownReady <- err
if err == nil {
disownReady <- nil
return
}
disownReady <- err

Comment thread lib/client/reexec_test.go Outdated
Comment on lines +107 to +109
assert.EventuallyWithT(t, func(collect *assert.CollectT) {
assert.Equal(collect, "stdout: hello\n", stdout.String())
assert.Equal(collect, "stderr: hello\n", stderr.String())
Contributor

Suggestion: shadow t here so it can't accidentally be used instead of collect?

Suggested change
assert.EventuallyWithT(t, func(collect *assert.CollectT) {
assert.Equal(collect, "stdout: hello\n", stdout.String())
assert.Equal(collect, "stderr: hello\n", stderr.String())
assert.EventuallyWithT(t, func(t *assert.CollectT) {
assert.Equal(t, "stdout: hello\n", stdout.String())
assert.Equal(t, "stderr: hello\n", stderr.String())

Comment thread lib/client/reexec_test.go Outdated
Comment on lines +181 to +183
require.EventuallyWithT(t, func(collect *assert.CollectT) {
assert.Equal(collect, "stdout: hello\n", stdout.String())
assert.Equal(collect, "stderr: hello\n", stderr.String())
Contributor

Same suggestion as above.

Suggested change
require.EventuallyWithT(t, func(collect *assert.CollectT) {
assert.Equal(collect, "stdout: hello\n", stdout.String())
assert.Equal(collect, "stderr: hello\n", stderr.String())
require.EventuallyWithT(t, func(t *assert.CollectT) {
assert.Equal(t, "stdout: hello\n", stdout.String())
assert.Equal(t, "stderr: hello\n", stderr.String())

Comment thread lib/client/reexec_test.go Outdated
assert.EventuallyWithT(t, func(collect *assert.CollectT) {
assert.Equal(collect, "stdout: hello\n", stdout.String())
assert.Equal(collect, "stderr: hello\n", stderr.String())
}, time.Second, 10*time.Millisecond)
Contributor

Maybe consider bumping these up to something that seems a bit unreasonable to prevent CI from flaking? Same for below.

Suggested change
}, time.Second, 10*time.Millisecond)
}, 10*time.Second, 100*time.Millisecond)

Comment on lines +7516 to +7540
return assert.EventuallyWithT(t, func(collect *assert.CollectT) {
assert.FileExists(collect, testFile)
Contributor

Same suggestion as above.

Suggested change
return assert.EventuallyWithT(t, func(collect *assert.CollectT) {
assert.FileExists(collect, testFile)
return assert.EventuallyWithT(t, func(t *assert.CollectT) {
assert.FileExists(t, testFile)

@atburke atburke force-pushed the atburke/ssh-fork-auth-early branch from c6cc90e to f62e8e3 Compare June 11, 2025 23:25
Comment thread lib/client/reexec.go Outdated
Comment on lines +113 to +117
for _, file := range cmd.ExtraFiles {
if err := file.Close(); err != nil {
return trace.Wrap(err)
}
}
Contributor

This should close cmd.signalW and cmd.killR rather than the ExtraFiles; ExtraFiles is empty on Windows.

Comment thread lib/client/reexec.go Outdated
Comment on lines +106 to +107
_, err := cmd.signalR.Read(make([]byte, 1))
disownReady <- err
Contributor

Protection against 1, io.EOF, 0, nil and other unsavory situations.

Suggested change
_, err := cmd.signalR.Read(make([]byte, 1))
disownReady <- err
n, err := cmd.signalR.Read(make([]byte, 1))
if n > 0 {
	disownReady <- nil
} else if err == nil {
	// this should be impossible according to the io.Reader contract
	disownReady <- io.ErrUnexpectedEOF
} else {
	disownReady <- err
}

Comment thread lib/client/reexec.go Outdated
return trace.Wrap(waitErr)
case <-time.After(3 * time.Second):
if killErr := cmd.Process.Kill(); killErr != nil {
return trace.Wrap(killErr)
Contributor

Can't Kill fail if the process was already finished? It also seems weird to leave a goroutine blocked on Wait but if the killing has failed for reasons out of our control there's not much we can truly do.

Perhaps we can give it a couple more seconds to wait for the exit status from Wait() here? In general, once we've spawned the goroutine we should do our best not to exit the function before the goroutine has finished.

Comment thread tool/tsh/common/reexec_unix.go Outdated
Comment on lines +47 to +66
if err := syscall.Dup2(int(devNull.Fd()), int(os.Stdin.Fd())); err != nil {
return nil, trace.Wrap(err)
}
Contributor

@espadolini espadolini Jun 12, 2025

Fd() has an unfortunate side effect on some files: it switches them to blocking mode. (Stdin's fd is always 0, i.e. syscall.Stdin, so we don't need to call Fd() on it.) This was also leaking the new devNull file on early error returns.

Suggested change
if err := syscall.Dup2(int(devNull.Fd()), int(os.Stdin.Fd())); err != nil {
	return nil, trace.Wrap(err)
}
rc, err := devNull.SyscallConn()
if err != nil {
	_ = devNull.Close()
	return nil, trace.Wrap(err)
}
var dupErr error
if ctrlErr := rc.Control(func(fd uintptr) {
	dupErr = syscall.Dup2(int(fd), syscall.Stdin)
	// stdin is not O_CLOEXEC after dup2 but thankfully the three stdio
	// file descriptors must be not O_CLOEXEC anyway, so we can avoid
	// a linux-specific implementation or syscall.ForkLock shenanigans
}); ctrlErr != nil {
	_ = devNull.Close()
	return nil, trace.Wrap(ctrlErr)
}
if dupErr != nil {
	_ = devNull.Close()
	return nil, trace.Wrap(dupErr)
}

Comment thread lib/client/reexec.go Outdated
Comment on lines +141 to +143
return trace.Wrap(err)
}
return trace.Wrap(ctx.Err())
Contributor

These two returns drop cmd without calling its Wait, which is liable to panic due to the finalizer guard.

Comment thread tool/tsh/common/tsh.go Outdated
func onSSH(cf *CLIConf) error {
// Handle fork after authentication.
var disownSignal *os.File
forkAuthSuccessful := &atomic.Bool{}
Contributor

Suggested change
forkAuthSuccessful := &atomic.Bool{}
var forkAuthSuccessful atomic.Bool

Contributor

@espadolini espadolini left a comment

I would urge you to look at runForkAuthenticateChild again and see if you can simplify it in such a way that it doesn't lead to various resources getting leaked depending on which exit condition we happen to hit.

Comment thread lib/client/reexec.go
}

func runForkAuthenticateChild(ctx context.Context, cmd *forkAuthCmd) error {
defer func() {
Contributor

Suggested change
defer func() {
defer func() {
cmd.killR.Close()
cmd.signalW.Close()

Comment thread lib/client/reexec.go Outdated
if err := cmd.Process.Kill(); err != nil && !strings.Contains(err.Error(), "os: process already released") {
return trace.Wrap(err)
}
return trace.Wrap(cmd.Process.Release())
Contributor

Is this right? We're waiting on it, if we release it we might not actually see the process exit, will we? Won't Release return an error if we've already successfully waited the process?

Comment thread lib/client/reexec.go Outdated
Comment on lines +162 to +163
case <-ctx.Done():
return trace.Wrap(cmd.killProcess())
Contributor

This is the only occurrence of ctx in this - what happens if the context actually expires? Won't it result in a zombie process, if the process doesn't cooperate?

Contributor

@espadolini espadolini left a comment

Please undo the "Rewrite to use signals instead of pipes" commit, nothing was ever made simpler by using more asynchronous signals rather than pipes.

Comment thread lib/client/reexec/reexec.go Outdated
Comment on lines +80 to +84
proc, err := os.FindProcess(ppid)
if err != nil {
return trace.Wrap(err)
}
return trace.Wrap(proc.Signal(childReadySignal))
Contributor

This is racy, the pid of the parent could've been reused by the time we hit this.

@atburke atburke force-pushed the atburke/ssh-fork-auth-early branch from 8a4c722 to bdedde4 Compare June 18, 2025 19:37
@atburke atburke enabled auto-merge June 18, 2025 19:37
@atburke atburke force-pushed the atburke/ssh-fork-auth-early branch from bdedde4 to 07df1f6 Compare June 18, 2025 20:02
This change adds fork after authentication support
to tsh ssh.
@atburke atburke force-pushed the atburke/ssh-fork-auth-early branch from 07df1f6 to b6a1075 Compare June 18, 2025 21:29
@atburke atburke added this pull request to the merge queue Jun 18, 2025
Merged via the queue into master with commit 57c909d Jun 18, 2025
39 checks passed
@atburke atburke deleted the atburke/ssh-fork-auth-early branch June 18, 2025 22:15
@backport-bot-workflows
Contributor

@atburke See the table below for backport results.

Branch       Result
branch/v16   Failed
branch/v17   Failed

atburke added a commit that referenced this pull request Jun 18, 2025
This change adds fork after authentication support
to tsh ssh.
atburke added a commit that referenced this pull request Jun 18, 2025
This change adds fork after authentication support
to tsh ssh.
atburke added a commit that referenced this pull request Jun 19, 2025
This change adds fork after authentication support
to tsh ssh.
github-merge-queue Bot pushed a commit that referenced this pull request Jun 24, 2025
* Add fork after authentication for tsh ssh (#54696)

This change adds fork after authentication support
to tsh ssh.

* tsh: Add wrapper for syscall.Dup2 for linux/arm64 (#55925)

* tsh: Add wrapper for syscall.Dup2 for linux/arm64

Add a wrapper for `syscall.Dup2()` as linux ARM64 does not have that
syscall. On that platform, `syscall.Dup3()` needs to be used instead.

Fixes: 57c909d

* Implement dup2 with syscall.Dup3 on all linux platforms

* Add explicit "unix" constraint to dup2_unix.go to ensure Windows is excluded

---------

Co-authored-by: Cam Hutchison <camh@goteleport.com>
github-merge-queue Bot pushed a commit that referenced this pull request Jun 26, 2025
* Add fork after authentication for tsh ssh (#54696)

This change adds fork after authentication support
to tsh ssh.

* tsh: Add wrapper for syscall.Dup2 for linux/arm64 (#55925)

* tsh: Add wrapper for syscall.Dup2 for linux/arm64

Add a wrapper for `syscall.Dup2()` as linux ARM64 does not have that
syscall. On that platform, `syscall.Dup3()` needs to be used instead.

Fixes: 57c909d

* Implement dup2 with syscall.Dup3 on all linux platforms

* Add explicit "unix" constraint to dup2_unix.go to ensure Windows is excluded

---------

Co-authored-by: Cam Hutchison <camh@goteleport.com>

Labels

backport/branch/v17 backport/branch/v18 size/lg tsh tsh - Teleport's command line tool for logging into nodes running Teleport.

Development

Successfully merging this pull request may close these issues.

Add ForkAfterAuthentication support to tsh ssh

5 participants