Add fork after authentication for tsh ssh by atburke · Pull Request #54696 · gravitational/teleport

atburke · 2025-05-10T00:15:14Z

This change adds the ability for tsh to fork after authentication (tsh ssh -f), analogous to ssh -f. After authentication completes (including access requests, per-session MFA, etc.), the foreground tsh process returns while the actual ssh session continues in a background process. The child process may not read from the original stdin once disowned.

Resolves #52255.

Changelog: Added fork after authentication to tsh ssh

avatus · 2025-05-12T21:13:21Z

What's the best way to test this?

atburke · 2025-05-12T22:25:36Z

@avatus The simplest way is to do something like tsh ssh -f <node> "sleep 3 && echo test", so you can see that tsh exits and then test is written afterwards.

atburke · 2025-05-14T16:55:53Z

Reviewers, the tests are fixed and this is now ready for review.

avatus

it works for me

avatus · 2025-05-15T18:31:31Z

I'd like to wait to merge until you have another backender look at this. Thank you!

rosstimothy

This does not work when per-session MFA is required. See the attached demo. There is never a prompt for MFA, and to the user the connection appears to hang forever.

Screen.Recording.2025-05-16.at.9.43.45.AM.mov

Update: It looks like the prompt can eventually get triggered by the user pressing enter.

Screen.Recording.2025-05-16.at.9.48.20.AM.mov

espadolini

Should we have the child process exit if the parent is killed before the detach point, or is that not really necessary?

espadolini · 2025-05-19T17:41:10Z

+	signalR, signalW, err := os.Pipe()
+	if err != nil {
+		return nil, trace.Wrap(err)
+	}
+	signalFd := addSignalFdToChild(cmd, signalW)
+
+	cmd.Args = append(cmd.Args, params.GetArgs(signalFd)...)
+	cmd.Stdin = params.Stdin
+	cmd.Stdout = params.Stdout
+	cmd.Stderr = params.Stderr
+
+	return &forkAuthCmd{
+		Cmd:          cmd,
+		disownSignal: signalR,
+	}, nil
+}


In the windows case nothing keeps signalW alive in here, so its finalizer is liable to close it before the child is spawned.

This is still an issue AFAICT. In UNIX there's a []*os.File in exec.Cmd which keeps the signalW object alive, but in windows we just have a []uintptr and the garbage collector is liable to trigger the execution of the finalizer of signalW at any point after line 73 (or even right before addSignalFdToChild returns).

Is this not fixed by unsetting the finalizer here?

teleport/lib/client/reexec_windows.go

Line 41 in 45bd0da

runtime.SetFinalizer(signal, nil)

I missed that, but that's... pretty bad? We're now just leaking the read side signal pipe (admittedly we're also doing that on the unix case now that I look at it further, it's just technically not a leak because the file object is still in the exec.Cmd) and we're also never going to unblock on the read if the process dies, since we are holding open the other end of the pipe on our end.

*os.File is also not documented (and thus not guaranteed) to use a finalizer, it could be using a cleanup, it could be using special runtime magic that we can't interact with - and the finalizer on *os.File is a last resort to try to avoid file descriptor leaks in misbehaving code, not something that correct code should ever deal with.

A proper way to do this would be to carry both sides of the pipe in forkAuthCmd (if we have to use a dedicated object for this) and then close the write side immediately after Run.
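As a rough illustration of the suggestion above, here is a minimal Unix-only sketch in which one struct owns both ends of the pipe and the parent closes its copy of the write end right after starting the child. The names `signalPipeCmd`, `newSignalPipeCmd`, `startAndAwaitSignal`, and `childSignalled` are hypothetical and not taken from the PR, and the child is a plain `sh` script standing in for the re-executed tsh process.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"os/exec"
)

// signalPipeCmd is a hypothetical stand-in for forkAuthCmd: it owns both
// ends of the pipe, so neither end depends on a finalizer to stay open.
type signalPipeCmd struct {
	*exec.Cmd
	signalR *os.File // the parent reads the "ready" byte from here
	signalW *os.File // inherited by the child; closed in the parent after Start
}

func newSignalPipeCmd(name string, args ...string) (*signalPipeCmd, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return nil, err
	}
	cmd := exec.Command(name, args...)
	// Unix only: the first extra file becomes fd 3 in the child.
	cmd.ExtraFiles = []*os.File{w}
	return &signalPipeCmd{Cmd: cmd, signalR: r, signalW: w}, nil
}

// startAndAwaitSignal starts the child, drops the parent's copy of the write
// end, and blocks until the child writes a byte or all write ends are closed.
func (c *signalPipeCmd) startAndAwaitSignal() (int, error) {
	defer c.signalR.Close()
	if err := c.Start(); err != nil {
		c.signalW.Close()
		return 0, err
	}
	// After this Close the child holds the only write end, so a child that
	// dies early yields io.EOF here instead of a read that blocks forever.
	c.signalW.Close()
	return c.signalR.Read(make([]byte, 1))
}

// childSignalled runs script under sh and reports whether it wrote a byte
// to fd 3 before exiting.
func childSignalled(script string) (bool, error) {
	c, err := newSignalPipeCmd("sh", "-c", script)
	if err != nil {
		return false, err
	}
	n, readErr := c.startAndAwaitSignal()
	_ = c.Wait() // always reap the child
	if readErr != nil && readErr != io.EOF {
		return false, readErr
	}
	return n > 0, nil
}

func main() {
	fmt.Println(childSignalled("printf x >&3")) // child signals readiness
	fmt.Println(childSignalled("exit 1"))       // child dies without signalling
}
```

Because the parent no longer holds a write end after `Start`, the read doubles as a liveness check on the child.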

espadolini · 2025-05-19T17:58:41Z

+	go func() {
+		// The child process will close the pipe when it has authenticated
+		// and is ready to be disowned.
+		_, err := cmd.disownSignal.Read(make([]byte, 1))


Is this guaranteed to return (1, nil) if the child writes a byte and closes the pipe? Is (1, io.EOF) a possibility? Is (0, nil) a possibility, technically?

I would consider those cases to be equivalent for our purposes, as we really just need the Read to finish. They are now handled.

espadolini · 2025-05-19T18:57:38Z

+					if stdin, ok := cf.Stdin().(io.ReadCloser); ok {
+						errors = append(errors, stdin.Close())
+					}


At this level we should not be dealing in Go objects; we should open /dev/null and clone it over fd 0. Never close the stdio file descriptors: always replace them with something like /dev/null, or newly opened files are liable to end up in there (I'm not sure how things work on Windows, though, I'm afraid).

Do we need a setsid() call or something, too?

I'm curious: is it because reading from a closed stdin is going to fail, or is there some other caveat I'm not aware of?

Newly opened files are opened in the lowest file descriptor, and lots of things sort of assume that 0, 1 and 2 are always the stdio fds, so if you close one of those three and then open something else in some other thread you might end up with data going where it's not supposed to go.
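The lowest-descriptor rule described above can be observed with a small Unix-only sketch. The helper name `lowestFdReused` is hypothetical; the sketch deliberately uses scratch descriptors rather than touching 0, 1, or 2, and it assumes no other thread opens a file between the close and the next open.

```go
package main

import (
	"fmt"
	"syscall"
)

// lowestFdReused opens two descriptors, closes the first, and reports
// whether the next open lands on the number that was just freed.
func lowestFdReused() (bool, error) {
	first, err := syscall.Open("/dev/null", syscall.O_RDONLY, 0)
	if err != nil {
		return false, err
	}
	second, err := syscall.Open("/dev/null", syscall.O_RDONLY, 0)
	if err != nil {
		syscall.Close(first)
		return false, err
	}
	defer syscall.Close(second)

	// Free the lower of the two descriptors.
	if err := syscall.Close(first); err != nil {
		return false, err
	}

	// POSIX hands out the lowest unused descriptor, so this open is
	// expected to reuse the number that first just released.
	third, err := syscall.Open("/dev/null", syscall.O_RDONLY, 0)
	if err != nil {
		return false, err
	}
	defer syscall.Close(third)

	return third == first, nil
}

func main() {
	reused, err := lowestFdReused()
	fmt.Println("freed descriptor reused:", reused, "err:", err)
}
```

If the freed descriptor had been 0, the unrelated file opened next would silently become the process's stdin, which is why replacing stdio descriptors is safer than closing them.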

Oh, wow, that's a good thing to know. Thanks!

I ended up just replacing tc.Stdin with /dev/null, as that's what the ssh session will use, and at the point the child calls OnAuthenticate we won't be in the middle of any reads.

rosstimothy · 2025-06-11T19:05:58Z

+	defer cancel()
+
+	defer cmd.signalR.Close()
+	defer cmd.killW.Close()
+	defer cmd.killW.Write([]byte{0x00})


Would there be any benefit to lumping all of these into a single defer func () { }?

rosstimothy · 2025-06-11T19:08:26Z

+		disownReady <- err
+		if err == nil {
+			disownReady <- nil
+			return
+		}


Is sending nil if no error is returned of any use? The disownReady channel is only consumed a single time below, and we will have already written nil to the channel above.

Suggested change

-		disownReady <- err
-		if err == nil {
-			disownReady <- nil
-			return
-		}
+		disownReady <- err
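To see why the second send is at best redundant, here is a small sketch. It assumes the channel has capacity 1 and is received from once; the thread does not show how disownReady is created, so that is an assumption, and `trySend` and `demoSends` are hypothetical helpers.

```go
package main

import "fmt"

// trySend reports whether err could be sent on ch without blocking.
func trySend(ch chan<- error, err error) bool {
	select {
	case ch <- err:
		return true
	default:
		return false
	}
}

// demoSends attempts two sends on a channel that nobody has read from yet.
func demoSends() (first, second bool) {
	disownReady := make(chan error, 1) // assumed capacity; not shown in the PR
	first = trySend(disownReady, nil)  // fills the single buffer slot
	second = trySend(disownReady, nil) // a plain send here would block
	return first, second
}

func main() {
	first, second := demoSends()
	fmt.Println(first, second)
}
```

With an ordinary blocking send, the goroutine would stay parked on the second send until a second receive happened, which with a single consumer means never.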

rosstimothy · 2025-06-11T19:10:20Z

+		assert.EventuallyWithT(t, func(collect *assert.CollectT) {
+			assert.Equal(collect, "stdout: hello\n", stdout.String())
+			assert.Equal(collect, "stderr: hello\n", stderr.String())


Suggestion: shadow t here so it can't accidentally be used instead of collect?

Suggested change

-		assert.EventuallyWithT(t, func(collect *assert.CollectT) {
-			assert.Equal(collect, "stdout: hello\n", stdout.String())
-			assert.Equal(collect, "stderr: hello\n", stderr.String())
+		assert.EventuallyWithT(t, func(t *assert.CollectT) {
+			assert.Equal(t, "stdout: hello\n", stdout.String())
+			assert.Equal(t, "stderr: hello\n", stderr.String())

rosstimothy · 2025-06-11T19:10:48Z

+		require.EventuallyWithT(t, func(collect *assert.CollectT) {
+			assert.Equal(collect, "stdout: hello\n", stdout.String())
+			assert.Equal(collect, "stderr: hello\n", stderr.String())


Same suggestion as above.

Suggested change

-		require.EventuallyWithT(t, func(collect *assert.CollectT) {
-			assert.Equal(collect, "stdout: hello\n", stdout.String())
-			assert.Equal(collect, "stderr: hello\n", stderr.String())
+		require.EventuallyWithT(t, func(t *assert.CollectT) {
+			assert.Equal(t, "stdout: hello\n", stdout.String())
+			assert.Equal(t, "stderr: hello\n", stderr.String())

rosstimothy · 2025-06-11T19:11:53Z

+		assert.EventuallyWithT(t, func(collect *assert.CollectT) {
+			assert.Equal(collect, "stdout: hello\n", stdout.String())
+			assert.Equal(collect, "stderr: hello\n", stderr.String())
+		}, time.Second, 10*time.Millisecond)


Maybe consider bumping these up to something that seems a bit unreasonable to prevent CI from flaking? Same for below.

Suggested change

-		}, time.Second, 10*time.Millisecond)
+		}, 10*time.Second, 100*time.Millisecond)

rosstimothy · 2025-06-11T19:13:43Z

+				return assert.EventuallyWithT(t, func(collect *assert.CollectT) {
+					assert.FileExists(collect, testFile)


Same suggestion as above.

Suggested change

-				return assert.EventuallyWithT(t, func(collect *assert.CollectT) {
-					assert.FileExists(collect, testFile)
+				return assert.EventuallyWithT(t, func(t *assert.CollectT) {
+					assert.FileExists(t, testFile)

espadolini · 2025-06-12T09:57:02Z

+	for _, file := range cmd.ExtraFiles {
+		if err := file.Close(); err != nil {
+			return trace.Wrap(err)
+		}
+	}


This should close cmd.signalW and cmd.killR rather than the ExtraFiles, ExtraFiles is empty on windows.

espadolini · 2025-06-12T09:58:19Z

+		_, err := cmd.signalR.Read(make([]byte, 1))
+		disownReady <- err


Protection against (1, io.EOF), (0, nil), and other unsavory situations.

Suggested change

-		_, err := cmd.signalR.Read(make([]byte, 1))
-		disownReady <- err
+		n, err := cmd.signalR.Read(make([]byte, 1))
+		if n > 0 {
+			disownReady <- nil
+		} else if err == nil {
+			// this should be impossible according to the io.Reader contract
+			disownReady <- io.ErrUnexpectedEOF
+		} else {
+			disownReady <- err
+		}
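The three outcomes of a one-byte Read can be pulled out into a small function and exercised without a real pipe. This is only a sketch of the classification in the suggestion above: `awaitSignalByte`, `fixedReader`, and `errPipeBroken` are hypothetical names, and the standard library sentinel used for the "impossible" case is `io.ErrUnexpectedEOF`.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// fixedReader is a test double that always returns the same (n, err) pair.
type fixedReader struct {
	n   int
	err error
}

func (f fixedReader) Read(p []byte) (int, error) { return f.n, f.err }

// errPipeBroken is a hypothetical error standing in for a failed pipe read.
var errPipeBroken = errors.New("pipe broken")

// awaitSignalByte treats any read that delivered a byte as success,
// regardless of the error returned alongside it.
func awaitSignalByte(r io.Reader) error {
	n, err := r.Read(make([]byte, 1))
	switch {
	case n > 0:
		// (1, nil) and (1, io.EOF) both mean the child signalled.
		return nil
	case err == nil:
		// (0, nil) with a non-empty buffer is discouraged by the
		// io.Reader contract; report it rather than treat it as success.
		return io.ErrUnexpectedEOF
	default:
		// (0, io.EOF) or any other error: the child never signalled.
		return err
	}
}

func main() {
	fmt.Println(awaitSignalByte(fixedReader{n: 1}))
	fmt.Println(awaitSignalByte(fixedReader{n: 1, err: io.EOF}))
	fmt.Println(awaitSignalByte(fixedReader{}))
	fmt.Println(awaitSignalByte(fixedReader{err: errPipeBroken}))
}
```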

espadolini · 2025-06-12T10:03:14Z

+			return trace.Wrap(waitErr)
+		case <-time.After(3 * time.Second):
+			if killErr := cmd.Process.Kill(); killErr != nil {
+				return trace.Wrap(killErr)


Can't Kill fail if the process was already finished? It also seems weird to leave a goroutine blocked on Wait but if the killing has failed for reasons out of our control there's not much we can truly do.

Perhaps we can give it a couple more seconds to wait for the exit status from Wait() here? In general, once we've spawned the goroutine we should do our best not to exit the function before the goroutine has finished.
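One way to structure this, sketched under the assumption of a Unix `sh` and with hypothetical names (`waitOrKill`, `runWithGrace`, `demoGrace`), is to tolerate `os.ErrProcessDone` from Kill and to keep collecting the result of Wait after killing, so the waiting goroutine is normally finished before the function returns.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// demoGrace is an arbitrary grace period chosen for this sketch.
const demoGrace = 200 * time.Millisecond

// waitOrKill waits for a started command, kills it after grace, and then
// keeps waiting for the exit status so the child is reaped.
func waitOrKill(cmd *exec.Cmd, grace time.Duration) error {
	done := make(chan error, 1) // buffered so the goroutine can always finish
	go func() { done <- cmd.Wait() }()

	select {
	case err := <-done:
		return err
	case <-time.After(grace):
	}

	killErr := cmd.Process.Kill()
	if killErr != nil && !errors.Is(killErr, os.ErrProcessDone) {
		// Kill failed for a reason other than "already exited"; give Wait
		// one more grace period before giving up on the child.
		select {
		case err := <-done:
			return err
		case <-time.After(grace):
			return fmt.Errorf("child still running after failed kill: %w", killErr)
		}
	}
	// Either the kill was delivered or the process had already exited, so
	// Wait is expected to return promptly.
	return <-done
}

// runWithGrace starts script under sh and applies waitOrKill to it.
func runWithGrace(script string, grace time.Duration) error {
	cmd := exec.Command("sh", "-c", script)
	if err := cmd.Start(); err != nil {
		return err
	}
	return waitOrKill(cmd, grace)
}

func main() {
	fmt.Println("fast child:", runWithGrace("exit 0", demoGrace))
	fmt.Println("slow child:", runWithGrace("exec sleep 30", demoGrace))
}
```

The only path that returns while the goroutine is still blocked is the one where Kill itself failed and the second grace period also ran out, which matches the "not much we can truly do" case.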

espadolini · 2025-06-12T10:16:30Z

+	if err := syscall.Dup2(int(devNull.Fd()), int(os.Stdin.Fd())); err != nil {
+		return nil, trace.Wrap(err)
+	}


Fd() has an unfortunate side effect on some files, turning them blocking (and stdin's fd is always 0, i.e. syscall.Stdin, so we don't need to call Fd() on it). This was also leaking the new devNull file on early error returns.

Suggested change

-	if err := syscall.Dup2(int(devNull.Fd()), int(os.Stdin.Fd())); err != nil {
-		return nil, trace.Wrap(err)
-	}
+	rc, err := devNull.SyscallConn()
+	if err != nil {
+		_ = devNull.Close()
+		return nil, trace.Wrap(err)
+	}
+	var dupErr error
+	if ctrlErr := rc.Control(func(fd uintptr) {
+		dupErr = syscall.Dup2(int(fd), syscall.Stdin)
+		// stdin is not O_CLOEXEC after dup2 but thankfully the three stdio
+		// file descriptors must be not O_CLOEXEC anyway, so we can avoid
+		// a linux-specific implementation or syscall.ForkLock shenanigans
+	}); ctrlErr != nil {
+		_ = devNull.Close()
+		return nil, trace.Wrap(ctrlErr)
+	}
+	if dupErr != nil {
+		_ = devNull.Close()
+		return nil, trace.Wrap(dupErr)
+	}
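The SyscallConn/Control pattern used in this suggestion can be isolated into a helper that keeps the two error paths apart: the error from Control itself and the error produced inside the callback. The sketch below uses hypothetical names (`withRawFd`, `probeDevNull`, `errCallback`) and only records the descriptor instead of calling dup2, since `syscall.Dup2` is not available on every platform (linux/arm64, as noted in the follow-up commit).

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errCallback is a hypothetical error used to show callback error propagation.
var errCallback = errors.New("callback failed")

// withRawFd runs fn with f's descriptor without calling f.Fd(), so the file
// keeps its current blocking mode.
func withRawFd(f *os.File, fn func(fd uintptr) error) error {
	rc, err := f.SyscallConn()
	if err != nil {
		return err
	}
	var fnErr error
	if ctrlErr := rc.Control(func(fd uintptr) {
		fnErr = fn(fd)
	}); ctrlErr != nil {
		// Control itself failed (for example, the file is already closed).
		return ctrlErr
	}
	// Control succeeded; report whatever the callback returned.
	return fnErr
}

// probeDevNull reports whether the callback ran and which error came back
// when the callback itself fails.
func probeDevNull() (called bool, propagated error, err error) {
	devNull, err := os.Open(os.DevNull)
	if err != nil {
		return false, nil, err
	}
	defer devNull.Close() // a single deferred Close covers every return path

	if err := withRawFd(devNull, func(uintptr) error {
		called = true
		return nil
	}); err != nil {
		return called, nil, err
	}
	propagated = withRawFd(devNull, func(uintptr) error { return errCallback })
	return called, propagated, nil
}

func main() {
	fmt.Println(probeDevNull())
}
```

Returning the callback's own error, rather than the outer `err` that is already known to be nil, is what keeps a failed dup from being reported as success.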

espadolini · 2025-06-12T10:19:48Z

+			return trace.Wrap(err)
+		}
+		return trace.Wrap(ctx.Err())


These two returns drop cmd without calling its Wait, which is liable to panic due to the finalizer guard.

espadolini · 2025-06-12T10:23:03Z

 func onSSH(cf *CLIConf) error {
+	// Handle fork after authentication.
+	var disownSignal *os.File
+	forkAuthSuccessful := &atomic.Bool{}


Suggested change

-	forkAuthSuccessful := &atomic.Bool{}
+	var forkAuthSuccessful atomic.Bool

espadolini

I would urge you to look at runForkAuthenticateChild again and see if you can simplify it in such a way that it doesn't lead to various resources getting leaked depending on which exit condition we happen to hit.

espadolini · 2025-06-13T11:43:46Z

+}
+
+func runForkAuthenticateChild(ctx context.Context, cmd *forkAuthCmd) error {
+	defer func() {


Suggested change

-	defer func() {
+	defer func() {
+		cmd.killR.Close()
+		cmd.signalW.Close()

espadolini · 2025-06-13T11:45:59Z

+	if err := cmd.Process.Kill(); err != nil && !strings.Contains(err.Error(), "os: process already released") {
+		return trace.Wrap(err)
+	}
+	return trace.Wrap(cmd.Process.Release())


Is this right? We're waiting on it, if we release it we might not actually see the process exit, will we? Won't Release return an error if we've already successfully waited the process?

espadolini · 2025-06-13T11:48:59Z

+	case <-ctx.Done():
+		return trace.Wrap(cmd.killProcess())


This is the only occurrence of ctx in this - what happens if the context actually expires? Won't it result in a zombie process, if the process doesn't cooperate?

espadolini

Please undo the "Rewrite to use signals instead of pipes" commit, nothing was ever made simpler by using more asynchronous signals rather than pipes.

espadolini · 2025-06-17T09:36:44Z

+	proc, err := os.FindProcess(ppid)
+	if err != nil {
+		return trace.Wrap(err)
+	}
+	return trace.Wrap(proc.Signal(childReadySignal))


This is racy, the pid of the parent could've been reused by the time we hit this.

This change adds fork after authentication support to tsh ssh.

backport-bot-workflows · 2025-06-18T22:17:01Z

@atburke See the table below for backport results.

Branch	Result
branch/v16	Failed
branch/v17	Failed

* Add fork after authentication for tsh ssh (#54696)

  This change adds fork after authentication support to tsh ssh.

* tsh: Add wrapper for syscall.Dup2 for linux/arm64 (#55925)

  * tsh: Add wrapper for syscall.Dup2 for linux/arm64

    Add a wrapper for `syscall.Dup2()` as linux ARM64 does not have that syscall. On that platform, `syscall.Dup3()` needs to be used instead.

    Fixes: 57c909d

  * Implement dup2 with syscall.Dup3 on all linux platforms
  * Add explicit "unix" constraint to dup2_unix.go to ensure Windows is excluded

---------

Co-authored-by: Cam Hutchison <camh@goteleport.com>

atburke requested a review from rosstimothy May 10, 2025 00:15

atburke added backport/branch/v16 backport/branch/v17 labels May 10, 2025

github-actions Bot added the size/lg label May 10, 2025

github-actions Bot requested a review from avatus May 10, 2025 00:15

github-actions Bot added the tsh tsh - Teleport's command line tool for logging into nodes running Teleport. label May 10, 2025

github-actions Bot requested a review from kopiczko May 10, 2025 00:15

atburke marked this pull request as draft May 13, 2025 20:02

atburke marked this pull request as ready for review May 14, 2025 16:49

github-actions Bot requested review from bl-nero and timothyb89 May 14, 2025 16:50

bl-nero approved these changes May 15, 2025

Comment thread tool/tsh/common/tsh.go Outdated

avatus approved these changes May 15, 2025

public-teleport-github-review-bot Bot removed request for kopiczko, rosstimothy and timothyb89 May 15, 2025 18:31

rosstimothy requested a review from espadolini May 15, 2025 19:24

atburke requested review from rosstimothy and removed request for espadolini May 15, 2025 19:24

rosstimothy requested a review from espadolini May 16, 2025 13:43

rosstimothy requested changes May 16, 2025

public-teleport-github-review-bot Bot removed the request for review from espadolini May 16, 2025 13:45

espadolini reviewed May 16, 2025

Comment thread lib/client/reexec.go Outdated

Comment thread lib/client/reexec.go Outdated

Comment thread tool/tsh/common/tsh.go Outdated

Comment thread lib/client/reexec.go Outdated

espadolini reviewed May 19, 2025

rosstimothy requested review from espadolini and rosstimothy May 26, 2025 17:11

rosstimothy requested review from avatus, bl-nero, espadolini and rosstimothy June 11, 2025 17:01

rosstimothy reviewed Jun 11, 2025

atburke force-pushed the atburke/ssh-fork-auth-early branch from c6cc90e to f62e8e3 Compare June 11, 2025 23:25

espadolini reviewed Jun 12, 2025

espadolini reviewed Jun 13, 2025

espadolini requested changes Jun 17, 2025

espadolini approved these changes Jun 18, 2025

rosstimothy approved these changes Jun 18, 2025

atburke force-pushed the atburke/ssh-fork-auth-early branch from 8a4c722 to bdedde4 Compare June 18, 2025 19:37

atburke enabled auto-merge June 18, 2025 19:37

atburke force-pushed the atburke/ssh-fork-auth-early branch from bdedde4 to 07df1f6 Compare June 18, 2025 20:02

Add fork after authentication for tsh ssh

b6a1075

This change adds fork after authentication support to tsh ssh.

atburke force-pushed the atburke/ssh-fork-auth-early branch from 07df1f6 to b6a1075 Compare June 18, 2025 21:29

atburke added this pull request to the merge queue Jun 18, 2025

Merged via the queue into master with commit 57c909d Jun 18, 2025
39 checks passed

atburke deleted the atburke/ssh-fork-auth-early branch June 18, 2025 22:15

rosstimothy added the backport/branch/v18 label Jun 18, 2025

atburke added a commit that referenced this pull request Jun 18, 2025

Add fork after authentication for tsh ssh (#54696)

0099523

This change adds fork after authentication support to tsh ssh.

atburke added a commit that referenced this pull request Jun 18, 2025

Add fork after authentication for tsh ssh (#54696)

318b601

This change adds fork after authentication support to tsh ssh.

This was referenced Jun 18, 2025

[v18] Add fork after authentication for tsh ssh #55893

Merged

[v17] Add fork after authentication for tsh ssh #55894

Merged

atburke added a commit that referenced this pull request Jun 19, 2025

Add fork after authentication for tsh ssh (#54696)

3fc2c2c

This change adds fork after authentication support to tsh ssh.

camscale mentioned this pull request Jun 20, 2025

tsh: Add wrapper for syscall.Dup2 for linux/arm64 #55925

Merged
