Skip to content

SSH: prefer using runuser instead of login#4873

Closed
afonsofrancof wants to merge 1 commit intonetbirdio:mainfrom
afonsofrancof:feature/debian-ssh
Closed

SSH: prefer using runuser instead of login#4873
afonsofrancof wants to merge 1 commit intonetbirdio:mainfrom
afonsofrancof:feature/debian-ssh

Conversation

@afonsofrancof
Copy link
Copy Markdown

@afonsofrancof afonsofrancof commented Nov 28, 2025

Describe your changes

Some distros use util-linux's login binary, which doesn't work with NetBird's SSH.
Debian, for example, started using this binary recently in Debian 13, which broke NetBird's SSH.
runuser is the recommended utility for this use case and is now tried first on all Linux systems.
If runuser doesn't exist, it falls back to the standard login method.
I also removed the special handling for arch-linux, since it just uses runuser for that distro now.

Issue ticket number and link

#4869

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Nov 28, 2025

Walkthrough

Renames createPtyLoginCommand to createUserSwitchPtyCommand and updates its call site. getLoginCmd now prefers Linux's runuser (falling back to login) on Linux, and handles Darwin/BSD-like systems by locating login or returning an error. Minor comment and import adjustments.

Changes

Cohort / File(s) Summary
SSH user-switching (single file)
client/ssh/server/userswitching_unix.go
Renamed createPtyLoginCommandcreateUserSwitchPtyCommand and updated invocation in createPtyCommand; changed getLoginCmd to try runuser first on Linux and fall back to login (with adjusted args/handling); consolidated non-Linux login lookup and error paths; minor comment/import updates.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Verify all call sites updated and the project builds.
  • Review Linux detection and runuser vs login selection and argument formation.
  • Check non-Linux login lookup behavior and error messages.

Possibly related issues

Suggested reviewers

  • pascal-fischer

Poem

🐇 I hopped through code with nimble paws,

I renamed a helper without a pause,
On Linux I listen for runuser's call,
Else login steps up to handle all,
Quiet switches — a rabbit's small applause.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title check ✅ Passed The title clearly describes the main change: preferring runuser over login for SSH operations on Linux systems, which directly matches the core implementation change.
Description check ✅ Passed PR description follows the template structure with all required sections completed: changes described clearly, issue ticket referenced, checklist item selected, documentation decision marked, and CLA confirmation included.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Nov 28, 2025

CLA assistant check
All committers have signed the CLA.

@afonsofrancof afonsofrancof changed the title Fix ssh for debian machines Fix ssh for Debian 13 machines Nov 28, 2025
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aca0398 and ec53438.

📒 Files selected for processing (1)
  • client/ssh/server/userswitching_unix.go (5 hunks)
🔇 Additional comments (4)
client/ssh/server/userswitching_unix.go (4)

16-16: LGTM!

The strings import is correctly added to support the new isDebian() function.


63-64: LGTM!

The function rename from createPtyLoginCommand to createUserSwitchPtyCommand is more accurate, as the function now handles multiple user-switching mechanisms (both login and runuser).


222-222: LGTM!

The call site is correctly updated to use the renamed function createUserSwitchPtyCommand.


91-97: runuser -l is the correct approach for Debian user switching and provides proper login environment initialization.

The web search confirms that runuser -l on Debian (via util-linux) correctly:

  • Simulates a full login session for the target user
  • Initializes required environment variables (HOME, SHELL, USER, LOGNAME, PATH)
  • Changes to the user's home directory
  • Sets the shell as a login shell (argv[0] = "-")
  • Uses PAM for proper authentication/authorization

The implementation is correct. runuser is standard on Debian systems (provided by the util-linux package), and the fallback to login if runuser is unavailable is appropriate defensive programming.

Comment thread client/ssh/server/userswitching_unix.go Outdated
Comment thread client/ssh/server/userswitching_unix.go Outdated
Comment on lines +112 to +124
cmd := exec.Command("lsb_release", "-is")
distro, err := cmd.Output()
if err != nil {
return false
}

if strings.TrimSpace(string(distro)) != "Debian" {
return false
}

return true
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, understand the file structure and current implementation
echo "=== Examining userswitching_unix.go ==="
wc -l client/ssh/server/userswitching_unix.go
echo ""
echo "=== Reading lines 100-130 for full context ==="
sed -n '100,130p' client/ssh/server/userswitching_unix.go

Repository: netbirdio/netbird

Length of output: 178


🌐 Web query:

lsb_release availability minimal Debian installation base-files

💡 Result:

Short answer: lsb_release is provided by the lsb-release package (not by base-files) and is not guaranteed in a minimal Debian install — base-files supplies /etc/os-release, which lsb_release reads, but recent testing/unstable base-files releases omit VERSION_* fields so lsb_release may show "No LSB modules are available" or lack full version info. Install lsb-release to get the command: apt install lsb-release. [1][2][3]

Sources:

  • lsb-release package details. [1]
  • base-files package (provides /etc/os-release). [2]
  • Debian bug reports explaining missing VERSION_* in base-files causing lsb_release to lack version info. [3]

🌐 Web query:

/etc/os-release format ID field Debian systemd specification

💡 Result:

The ID field must be a lower-case identifier (no spaces) containing only characters 0–9, a–z, ".", "_" and "-" (suitable for scripts/filenames). If unset it defaults to "linux". Examples: ID=debian, ID=fedora. [1][2]

Sources: systemd os-release man page (ID field) — Debian/manpages and Mankier. [1] (manpages.debian.org) [2] (mankier.com)

Citations:


🌐 Web query:

which Linux distributions have /etc/os-release vs lsb_release

💡 Result:

Short answer:

  • /etc/os-release (the freedesktop/systemd standard) is present on essentially all modern distributions (Debian, Ubuntu, Fedora, RHEL/CentOS/Alma/Rocky, openSUSE/SUSE, Arch/Manjaro, Alpine, many others). [1][2]
  • lsb_release is a separate utility (from the lsb-release package); it is not a filesystem-standard and may or may not be installed by default. Debian/Ubuntu and most major distros provide an lsb-release package; lightweight images (some Alpine, minimal containers) often do not install it by default. If present, lsb_release typically reads /etc/os-release. [3][4]

Notes/sources:

  • os-release spec (freedesktop/systemd). [1]
  • community summaries and comparisons (/etc/lsb-release vs /etc/os-release). [2]
  • Alpine package listing showing /etc/os-release provided by alpine-release. [3]
  • Debian/Fedora package pages for lsb-release (shows lsb_release is a package that may use os-release as data source). [4]

🌐 Web query:

Go os.ReadFile function import location package

💡 Result:

Use the standard library os package: import "os". os.ReadFile was added in Go 1.16 (io/ioutil.ReadFile is deprecated). Example: data, err := os.ReadFile("file.txt"). [1][2]

Sources:
[1] pkg.go.dev/os (ReadFile documentation).
[2] pkg.go.dev/io/ioutil (ReadFile deprecated; forwards to os.ReadFile).


🌐 Web query:

Go strings.Split strings.HasPrefix strings.TrimPrefix functions

💡 Result:

Here are concise descriptions, behaviors, edge cases, and short examples for strings.Split, strings.HasPrefix, and strings.TrimPrefix in Go (from the official strings package docs).

  1. strings.Split
  • What it does: returns a slice of substrings of s separated by sep; equivalent to SplitN(s, sep, -1).
  • Important edge cases: if sep == "" it splits after each UTF‑8 sequence; if sep is non‑empty and not found it returns []string{s}; if both s and sep are empty it returns an empty slice. (pkg.go.dev)
  • Example:
    fmt.Println(strings.Split("a,b,c", ",")) // ["a","b","c"]
  1. strings.HasPrefix
  • What it does: reports whether s begins with prefix (returns bool). (pkg.go.dev)
  • Example:
    fmt.Println(strings.HasPrefix("hello", "he")) // true
  1. strings.TrimPrefix
  • What it does: returns s with the provided prefix removed if s starts with that prefix; returns s unchanged if the prefix is not present. (pkg.go.dev)
  • Example:
    fmt.Println(strings.TrimPrefix("prefix_value", "prefix_")) // "value"

Reference: Go strings package documentation. (pkg.go.dev)

If you want, I can show more examples (SplitN, SplitAfter, behavior with Unicode, or code snippets you can run).

Citations:


Perfect. The web search confirms all the technical details:

  1. os.ReadFile() is from the standard os package (available since Go 1.16)
  2. All string functions used (Split, HasPrefix, TrimPrefix, Trim) are from the standard strings package
  3. The proposed implementation logic is sound:
    • Correctly splits by newlines
    • Properly extracts the ID field
    • Handles both quoted and unquoted formats in /etc/os-release
    • Compares to lowercase "debian" per the freedesktop specification

The review comment is factually accurate and the proposed solution is technically correct. The suggestion to use /etc/os-release instead of lsb_release is a legitimate improvement for minimal Debian installations where the lsb-release package may not be installed.


Use /etc/os-release for better compatibility on minimal Debian systems.

The current implementation depends on lsb_release, which requires the optional lsb-release package and is not guaranteed on minimal Debian installations. The /etc/os-release file, provided by base-files, is universally available on all modern Linux distributions including minimal Debian installs.

Apply this diff to use /etc/os-release instead:

-// isDebian checks if the system is running Debian
+// isDebian checks if the system is running Debian
 func (s *Server) isDebian() bool {
-	cmd := exec.Command("lsb_release", "-is")
-	distro, err := cmd.Output()
+	data, err := os.ReadFile("/etc/os-release")
 	if err != nil {
 		return false
 	}
-
-	if strings.TrimSpace(string(distro)) != "Debian" {
-		return false
+	
+	for _, line := range strings.Split(string(data), "\n") {
+		if strings.HasPrefix(line, "ID=") {
+			id := strings.TrimPrefix(line, "ID=")
+			id = strings.Trim(id, "\"")
+			return id == "debian"
+		}
 	}
-
-	return true
+	
+	return false
 }
🤖 Prompt for AI Agents
In client/ssh/server/userswitching_unix.go around lines 111 to 123, replace the
lsb_release-based detection with reading /etc/os-release: read the file using
os.ReadFile, split by newlines, find the line that starts with "ID=", strip the
"ID=" prefix and any surrounding quotes, normalize to lowercase with
strings.Trim/strings.ToLower, then return true only if it equals "debian"; on
any read/parse error return false. Ensure you import os and use only stdlib
strings functions.

@afonsofrancof
Copy link
Copy Markdown
Author

afonsofrancof commented Nov 28, 2025

Will fix these implementation details

@afonsofrancof
Copy link
Copy Markdown
Author

Changed it :)

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
client/ssh/server/userswitching_unix.go (1)

91-97: Fix comment formatting.

Missing space after // in the comment. The Debian detection logic using /etc/debian_version is simple and appropriate, with good fallback behavior if runuser is unavailable.

Apply this diff:

-		//Special handling for Debian
+		// Special handling for Debian
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec53438 and b88bf95.

📒 Files selected for processing (1)
  • client/ssh/server/userswitching_unix.go (4 hunks)
🔇 Additional comments (3)
client/ssh/server/userswitching_unix.go (3)

63-64: LGTM! Function rename improves clarity.

The rename from createPtyLoginCommand to createUserSwitchPtyCommand better reflects the function's purpose of switching users rather than just using the login command.


207-207: LGTM! Call site correctly updated.

The function call has been properly updated to match the renamed function.


16-16: I encountered an issue accessing the repository to verify the strings import usage. Since I cannot execute the verification script, I cannot definitively determine whether the import is used elsewhere in the file.

Based on the limitations, here is the rewritten review comment:


Verify that the strings import is used in the file.

The strings package is imported but doesn't appear to be used in any of the visible code changes. If it's not used elsewhere in the file, remove it to avoid unused imports.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
client/ssh/server/userswitching_unix.go (1)

62-74: Update comment on createUserSwitchPtyCommand to reflect runuser usage

On Line 62 the comment says this creates a Pty command “using login”, but on Debian you now return runuser from getLoginCmd. This is a small doc/maintenance mismatch and can confuse future readers.

Consider something like:

-// createUserSwitchPtyCommand creates a Pty command using login for privileged processes
+// createUserSwitchPtyCommand creates a Pty command for privileged user switching
+// using the system login tool (e.g. runuser on Debian-based systems).
🧹 Nitpick comments (1)
client/ssh/server/userswitching_unix.go (1)

88-101: Align Debian runuser handling with the stated fallback behavior

The Debian branch (Lines 90–96) correctly prefers runuser -l <username> when /etc/debian_version exists and runuser is on PATH, which matches the intent of the PR.

However, exec.LookPath("login") is called before the switch (Lines 78–81). This means that on a Debian(-based) system where login is not installed but runuser is, getLoginCmd will fail before it ever reaches the Debian runuser logic. That slightly contradicts the description “use runuser, and if it’s not present, fall back to login”.

If you want the behavior to truly be “prefer runuser, then fall back to login” on Debian, consider restructuring getLoginCmd roughly like this (conceptually):

-func (s *Server) getLoginCmd(username string, remoteAddr net.Addr) (string, []string, error) {
-	loginPath, err := exec.LookPath("login")
-	if err != nil {
-		return "", nil, fmt.Errorf("login command not available: %w", err)
-	}
-
-	addrPort, err := netip.ParseAddrPort(remoteAddr.String())
+func (s *Server) getLoginCmd(username string, remoteAddr net.Addr) (string, []string, error) {
+	addrPort, err := netip.ParseAddrPort(remoteAddr.String())
 	if err != nil {
 		return "", nil, fmt.Errorf("parse remote address: %w", err)
 	}
 
 	switch runtime.GOOS {
 	case "linux":
-		//Special handling for Debian
-		if s.fileExists("/etc/debian_version") {
-			runuserPath, err := exec.LookPath("runuser")
-			if err == nil {
-				return runuserPath, []string{"-l", username}, nil
-			}
-		}
+		// Special handling for Debian and Debian-based systems: prefer runuser if available.
+		if s.fileExists("/etc/debian_version") {
+			if runuserPath, err := exec.LookPath("runuser"); err == nil {
+				return runuserPath, []string{"-l", username}, nil
+			}
+		}
+
+		loginPath, err := exec.LookPath("login")
+		if err != nil {
+			return "", nil, fmt.Errorf("login command not available: %w", err)
+		}
 
 		// Special handling for Arch Linux without /etc/pam.d/remote
 		if s.fileExists("/etc/arch-release") && !s.fileExists("/etc/pam.d/remote") {
 			return loginPath, []string{"-f", username, "-p"}, nil
 		}
 		return loginPath, []string{"-f", username, "-h", addrPort.Addr().String(), "-p"}, nil
 	case "darwin", "freebsd", "openbsd", "netbsd", "dragonfly":
+		loginPath, err := exec.LookPath("login")
+		if err != nil {
+			return "", nil, fmt.Errorf("login command not available: %w", err)
+		}
 		return loginPath, []string{"-fp", "-h", addrPort.Addr().String(), username}, nil

This keeps existing behavior for non-Debian hosts, but on Debian(-based) systems it no longer requires login to be present in cases where runuser is sufficient.

Also, tiny nit: add a space after // in the Debian comment for consistency with the rest of the file.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b88bf95 and a80cb98.

📒 Files selected for processing (1)
  • client/ssh/server/userswitching_unix.go (3 hunks)
🔇 Additional comments (1)
client/ssh/server/userswitching_unix.go (1)

195-207: Rename to createUserSwitchPtyCommand looks good

The change in Line 206 to call createUserSwitchPtyCommand instead of the old createPtyLoginCommand is consistent with the new naming and keeps the control flow intact (fallback vs. user-switching path). No functional concerns here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a80cb98 and 4b61258.

📒 Files selected for processing (1)
  • client/ssh/server/userswitching_unix.go (5 hunks)
🔇 Additional comments (4)
client/ssh/server/userswitching_unix.go (4)

16-16: LGTM!

The strings import is necessary for the new strings.Contains() call in isUtilLinuxLogin().


63-64: LGTM!

The rename from createPtyLoginCommand to createUserSwitchPtyCommand improves clarity and better describes the function's purpose.


218-218: LGTM!

The call site correctly uses the renamed createUserSwitchPtyCommand function.


91-97: I'm unable to access the repository to verify the call frequency concern due to clone failures. Without being able to inspect the codebase, I cannot definitively verify:

  1. How many times getLoginCmd() is called per SSH session
  2. Whether isUtilLinuxLogin() is called in loops
  3. The actual performance impact of repeated login -V executions

Based on the available information, here is my rewritten review comment:


Verify whether getLoginCmd() is called multiple times per SSH session before optimizing.

The util-linux detection and runuser fallback logic is sound and handles graceful degradation correctly. However, if getLoginCmd() is invoked multiple times per SSH session, the repeated execution of login -V inside isUtilLinuxLogin() could add measurable latency. Consider profiling the SSH session flow to determine if caching the detection result is warranted.


Comment thread client/ssh/server/userswitching_unix.go Outdated
Comment on lines +111 to +120
cmd := exec.Command("login", "-V")
output, err := cmd.CombinedOutput()
if err != nil {
return false
}
// util-linux login includes "util-linux" in version output
// Example: "login from util-linux 2.41"
return strings.Contains(string(output), "util-linux")
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add timeout context to prevent potential hangs.

The function executes an external command without a timeout. If login -V hangs or is unusually slow, it will block SSH session establishment indefinitely.

Apply this diff to add a timeout:

 func (s *Server) isUtilLinuxLogin() bool {
-	cmd := exec.Command("login", "-V")
+	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
+	defer cancel()
+	
+	cmd := exec.CommandContext(ctx, "login", "-V")
 	output, err := cmd.CombinedOutput()
 	if err != nil {
 		return false
 	}
 	// util-linux login includes "util-linux" in version output
 	// Example: "login from util-linux 2.41"
 	return strings.Contains(string(output), "util-linux")
 }

You'll also need to add these imports at the top of the file:

 import (
+	"context"
 	"errors"
 	"fmt"
 	"net"
 	"net/netip"
 	"os"
 	"os/exec"
 	"os/user"
 	"regexp"
 	"runtime"
 	"strconv"
 	"strings"
+	"time"

 	"github.com/gliderlabs/ssh"
 	log "github.com/sirupsen/logrus"
 )

Consider caching the detection result for better performance.

The login implementation doesn't change during the process lifetime. If this function is called multiple times, caching the result would eliminate redundant executions.

Apply this diff to cache the result:

+var (
+	utilLinuxLoginOnce   sync.Once
+	utilLinuxLoginResult bool
+)
+
 func (s *Server) isUtilLinuxLogin() bool {
+	utilLinuxLoginOnce.Do(func() {
+		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
+		defer cancel()
+		
+		cmd := exec.CommandContext(ctx, "login", "-V")
+		output, err := cmd.CombinedOutput()
+		if err != nil {
+			utilLinuxLoginResult = false
+			return
+		}
+		// util-linux login includes "util-linux" in version output
+		// Example: "login from util-linux 2.41"
+		utilLinuxLoginResult = strings.Contains(string(output), "util-linux")
+	})
+	return utilLinuxLoginResult
-	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
-	defer cancel()
-	
-	cmd := exec.CommandContext(ctx, "login", "-V")
-	output, err := cmd.CombinedOutput()
-	if err != nil {
-		return false
-	}
-	// util-linux login includes "util-linux" in version output
-	// Example: "login from util-linux 2.41"
-	return strings.Contains(string(output), "util-linux")
 }

You'll also need to add the sync import:

 import (
 	"context"
 	"errors"
 	"fmt"
 	"net"
 	"net/netip"
 	"os"
 	"os/exec"
 	"os/user"
 	"regexp"
 	"runtime"
 	"strconv"
 	"strings"
+	"sync"
 	"time"

 	"github.com/gliderlabs/ssh"
 	log "github.com/sirupsen/logrus"
 )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (s *Server) isUtilLinuxLogin() bool {
cmd := exec.Command("login", "-V")
output, err := cmd.CombinedOutput()
if err != nil {
return false
}
// util-linux login includes "util-linux" in version output
// Example: "login from util-linux 2.41"
return strings.Contains(string(output), "util-linux")
}
func (s *Server) isUtilLinuxLogin() bool {
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()
cmd := exec.CommandContext(ctx, "login", "-V")
output, err := cmd.CombinedOutput()
if err != nil {
return false
}
// util-linux login includes "util-linux" in version output
// Example: "login from util-linux 2.41"
return strings.Contains(string(output), "util-linux")
}
🤖 Prompt for AI Agents
In client/ssh/server/userswitching_unix.go around lines 110-119, the call to
exec.Command("login", "-V") can hang because it has no timeout and the detection
result is recomputed on every call; fix by creating a context with timeout
(e.g., context.WithTimeout(ctx, 2*time.Second)), run the command with
exec.CommandContext, defer cancel, and handle context deadline errors by
returning false on timeout; additionally, cache the boolean result using a
package-level variable and sync.Once (or a mutex and a cached value) so the
detection runs only once per process; add the imports "context", "time", and
"sync" at the top of the file.

@afonsofrancof afonsofrancof changed the title Fix ssh for Debian 13 machines Fix ssh for distros that use utils-linux's login (Debian 13, Ubuntu 25.10, etc) Nov 29, 2025
@afonsofrancof afonsofrancof changed the title Fix ssh for distros that use utils-linux's login (Debian 13, Ubuntu 25.10, etc) Fix ssh for distros that use util-linux's login (Debian 13, Ubuntu 25.10, etc) Nov 29, 2025
fix(ssh): Remove unused import

feat(ssh): Use `runuser` on any distro that uses login from util-linux
@alexmoras
Copy link
Copy Markdown

This PR appears to have fixed the issue.

I started my testing on an Ubuntu 24.04 LTS VM which was using shadow-login. SSH with the current production build worked fine. I upgraded the VM to 25.10 and confirmed that it was using util-linux. Production client no longer worked.

I have built the client from this PR and the issues appear to have been resolved, it now SSH's straight away with zero problems.

I have also tested this PR on a 24.04 server with shadow-login and there do not appear to be any regressions caused.

@afonsofrancof afonsofrancof changed the title Fix ssh for distros that use util-linux's login (Debian 13, Ubuntu 25.10, etc) SSH: prefer using runuser instead of login Nov 29, 2025
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
client/ssh/server/userswitching_unix.go (2)

62-74: Update comment to reflect that we may use runuser or login, not just login.

The comment still says “creates a Pty command using login”, but getLoginCmd can return either runuser or login on Unix. It’s worth updating the wording to avoid confusion for future maintainers.

-// createUserSwitchPtyCommand creates a Pty command using login for privileged processes
+// createUserSwitchPtyCommand creates a Pty command using the platform's user-switching
+// mechanism (e.g. runuser/login) for privileged processes

77-109: Make remote address parsing best-effort so user switching still works if parsing fails.

netip.ParseAddrPort(remoteAddr.String()) is currently a hard precondition for all platforms, even though the parsed host is only used to feed -h into login. If for any reason remoteAddr.String() is not in AddrPort form, user switching will fail entirely—even on Linux where the preferred runuser path doesn’t need the host.

Consider treating host extraction as best-effort and only using it when available, rather than returning an error:

-func (s *Server) getLoginCmd(username string, remoteAddr net.Addr) (string, []string, error) {
-	addrPort, err := netip.ParseAddrPort(remoteAddr.String())
-	if err != nil {
-		return "", nil, fmt.Errorf("parse remote address: %w", err)
-	}
-
-	switch runtime.GOOS {
+func (s *Server) getLoginCmd(username string, remoteAddr net.Addr) (string, []string, error) {
+	var host string
+	if remoteAddr != nil {
+		if addrPort, err := netip.ParseAddrPort(remoteAddr.String()); err == nil {
+			host = addrPort.Addr().String()
+		} else {
+			log.Warnf("failed to parse remote address %q: %v", remoteAddr.String(), err)
+		}
+	}
+
+	switch runtime.GOOS {
 	case "linux":
-		// Try runuser first
+		// Try runuser first
 		runuserPath, err := exec.LookPath("runuser")
 		if err == nil {
 			return runuserPath, []string{"-l", username}, nil
 		}
 
-		// Fall back to login
+		// Fall back to login
 		loginPath, err := exec.LookPath("login")
 		if err == nil {
-			return loginPath, []string{"-f", username, "-h", addrPort.Addr().String(), "-p"}, nil
+			args := []string{"-f", username}
+			if host != "" {
+				args = append(args, "-h", host)
+			}
+			args = append(args, "-p")
+			return loginPath, args, nil
 		}
 
 		return "", nil, fmt.Errorf("neither runuser nor login commands available: %w", err)
 
 	case "darwin", "freebsd", "openbsd", "netbsd", "dragonfly":
 		loginPath, err := exec.LookPath("login")
 		if err == nil {
-			return loginPath, []string{"-fp", "-h", addrPort.Addr().String(), username}, nil
+			args := []string{"-fp"}
+			if host != "" {
+				args = append(args, "-h", host)
+			}
+			args = append(args, username)
+			return loginPath, args, nil
 		}
 
 		return "", nil, fmt.Errorf("login command not available: %w", err)

This keeps the existing behaviour when parsing succeeds, but avoids failing the entire SSH session setup when only the remote host decoration is unavailable.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b61258 and fb7a5eb.

📒 Files selected for processing (1)
  • client/ssh/server/userswitching_unix.go (3 hunks)
🔇 Additional comments (1)
client/ssh/server/userswitching_unix.go (1)

199-210: LGTM: call-site rename matches the more generic user-switching helper.

The switch from createPtyLoginCommand to createUserSwitchPtyCommand at the call site aligns with the new helper name and the broader “runuser or login” behaviour, without changing the decision logic around UsedFallback.

@afonsofrancof
Copy link
Copy Markdown
Author

afonsofrancof commented Nov 29, 2025

@alexmoras I changed the PR to always use runuser if available on linux systems.
runuser is the recommended utility for this use case, not login.
If runuser isn't available, login is used.

I know this works on recent versions, but could you try it on your Ubuntu 24.04 again, just to make sure there are no regressions? Sorry for all the trouble. (I squashed some commits, so you might have to reclone or delete the commits and then pull from my branch)

@afonsofrancof
Copy link
Copy Markdown
Author

I can confirm that it doesn't have any regression on Debian 12 (which uses shadow-login)

@afonsofrancof afonsofrancof deleted the feature/debian-ssh branch November 29, 2025 15:58
@afonsofrancof afonsofrancof restored the feature/debian-ssh branch November 29, 2025 15:59
@afonsofrancof
Copy link
Copy Markdown
Author

I changed the branch name and it closed the PR... perfect 😅

@afonsofrancof afonsofrancof reopened this Nov 29, 2025
@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
1 New issue
1 New Code Smells (required ≤ 0)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@alexmoras
Copy link
Copy Markdown

Likewise, no regression introduced on 24.04 LTS or Debian 11 (using shadow-login) for me.

@afonsofrancof
Copy link
Copy Markdown
Author

Nice, it should be ready to merge then

@lixmal
Copy link
Copy Markdown
Collaborator

lixmal commented Dec 1, 2025

Hi @afonsofrancof, thanks for the PR!

Unfortunately, it has one flaw: runuser as the default loses some audit trail and bypasses pam. E.g., loginuid is not set, so all commands will look like they were run by root/netbird (check w/who when logged in through netbird w/ and w/o the PR).

The proper way should be to try login first and fall back if it fails (ideally to su, which should be more widespread,pam, not exclusive to Linux).
This is not a straightforward task, as there is ambiguity in exit codes (did login's execle fail or did the spawned shell fail?).

So we need an intermediate process that can check righ after a (manual, not through the Go stdlib) exec and fall back to something else. Fortunately we already have the executor code in place, so this mostly needs some careful refactoring.

@afonsofrancof
Copy link
Copy Markdown
Author

afonsofrancof commented Dec 1, 2025

Hey @lixmal !
Thanks for the heads up, I didn't know about the loginuid thing.
From what I researched su is indeed a good option

Is there anyone working on this ATM? Could I implement this?

Thanks :)

@lixmal
Copy link
Copy Markdown
Collaborator

lixmal commented Dec 1, 2025

Nobody works on it right now; that's more like a mid-term fix. As a short-term fix, omitting the -h flag should work (basically remove the arch check)

@afonsofrancof
Copy link
Copy Markdown
Author

afonsofrancof commented Dec 1, 2025

That won't work for distros that use util-linux's login binary, like debian 13 and Ubuntu 25.10.

Login does not allow that behavior, so it can't be used at all in those cases

@lixmal
Copy link
Copy Markdown
Collaborator

lixmal commented Dec 1, 2025

I have tested it on Debian 13, and it works for me. Although it seems there is a race condition that fails something like 4 of 5 attempts, at least for me.

@afonsofrancof
Copy link
Copy Markdown
Author

afonsofrancof commented Dec 2, 2025

@lixmal It never works for me, and another user has also reported it #4869
On my debian 12 machine it works every time ( it uses shadow-login ) but they changed it to util-linux on 13, so it doesn't work there at all for me.

@alexmoras
Copy link
Copy Markdown

As above, exact same behaviour for me. Any system running shadow-login works fine but I can never get a system with util-linux to work.

@alexmoras
Copy link
Copy Markdown

Testing with su doesn't seem to amend the loginuid either. From some cursory research, the only way to change this is by utilising the full PAM stack or by having the CAP_AUDIT_CONTROL or CAP_AUDIT_WRITE capabilities to write to /process/self/loginuid. The latter option is a significant security consideration and the prior will take considerable work.

Given that this is affecting so many users, would it be a beneficial temporary measure to place the runuser variant of the login implementation behind a feature flag? That way, users have to intentionally enable it and would only be aware of the feature flag after reading a disclaimer regarding security. It could be something like --enable-ssh-modern-compatibility. I don't understand the authentication system well enough to know whether this would expose a system to unacceptable levels of risk, or whether it is something that could be tolerated in the interim - advice from someone more knowledgeable would be brilliant.

@afonsofrancof
Copy link
Copy Markdown
Author

afonsofrancof commented Dec 2, 2025

@alexmoras I tried it with su as well, and it was better, because it actually showed a session in the w or who command, while runuser didn't.

I am trying to understand and work on the solution that lixmal suggested, but it will take a bit of time.
From what I understood he wants to first try login, and if the command itself fails, fallback to su, and if that fails, fallback to manual privilege dropping (setting uid, gid, etc)

I agree that this should probably be shipped, as it fixes a lot of people's workflows, and we could keep working on a more permanent fix meanwhile.

EDIT: It should probably be shipped with su and not with runuser

@afonsofrancof
Copy link
Copy Markdown
Author

A better fix was implemented in #4900.
Closing this as there is no need for it anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants