
[client,management] Rewrite the SSH feature #4015

Merged
lixmal merged 113 commits into main from ssh-rewrite
Nov 17, 2025

Conversation

@lixmal
Collaborator

@lixmal lixmal commented Jun 19, 2025

Describe your changes

NetBird SSH Client

  • Port forwarding
  • Windows support
  • Non-interactive commands
  • Single command execution over SSH

SSH Server

  • SFTP
  • Port forwarding (without user switching)
  • PTY (interactive, non-interactive)
  • Non-PTY (commands)
  • Windows support
  • JWT auth (user identity instead of machine identity, can be turned off)

New Flags

# server
--enable-ssh-local-port-forwarding
--enable-ssh-remote-port-forwarding
--enable-ssh-root
--enable-ssh-sftp
--disable-ssh-auth

# client
--ssh-jwt-cache-ttl

UI

  • Better organization

Changes

  • Default port changed to 22022
  • Redirect port 22 to 22022 when the SSH server is enabled
  • Remove implicit OpenSSH firewall port
  • Management now passes its JWKS config to peers that have the SSH server enabled
  • SSH server peers verify incoming clients' JWTs and enforce a max token age
  • SSH clients request JWTs from the IdP and send them to authenticate with remote SSH peers
  • Add netbird ssh detect command to detect whether the remote peer is running the NetBird SSH server
  • Add netbird ssh proxy command for native ssh and sftp clients. The proxy requests JWTs just like netbird ssh, passes them for authentication, and bridges the connection between the native client and the remote server.
  • Add JWT capability to the WASM SSH client

Auth Flows

  1. NetBird SSH Flow (netbird ssh <peer>)
flowchart TD
    A[User: netbird ssh &lt;peer&gt;] --> B[Connect to peer:22]
    B --> C[Detect Server Type]
    C --> D[Send 'netbird-detect' request]

    D --> E{Server Response}

    E -->|No NetBird identifier| F[Regular SSH Server]
    E -->|NetBird + JWT required| G[NetBird with JWT Auth]
    E -->|NetBird + No JWT| H[NetBird without JWT]

    F --> I[Standard SSH Connection]
    H --> I

    G --> J[Request JWT from NetBird daemon]

    J --> J1{Check JWT cache}
    J1 -->|Cache valid| J2[Use cached JWT token]
    J1 -->|No cache/expired| J3[OIDC flow:<br/>User authorizes via IDP callback]
    J3 --> J4[Receive & cache JWT token]
    J4 --> K[Connect to peer SSH server]
    J2 --> K

    K --> M[Send JWT authentication request]
    M --> N{JWT Valid?}

    N -->|No| O[Connection Rejected]
    N -->|Yes| P[SSH Session Established]
    I --> P

    P --> Q[Interactive Shell / Execute Command / Port Forwarding]

    style G fill:#f57c00,color:#fff
    style J fill:#1976d2,color:#fff
    style J1 fill:#1976d2,color:#fff
    style J3 fill:#e65100,color:#fff
    style P fill:#388e3c,color:#fff
  2. Native SSH Flow (e.g. OpenSSH client)
flowchart TD
    A[User: ssh &lt;peer&gt;] --> B[OpenSSH loads config:<br/>/etc/ssh/ssh_config.d/99-netbird.conf]
    B --> D{Host matches NetBird pattern?}

    D -->|No| E[Standard SSH connection]

    D -->|Yes| F[Run detection check:<br/>netbird ssh detect &lt;peer&gt; 22]
    F --> G{Is NetBird SSH server?}

    G -->|No| E

    G -->|Yes - JWT required| H[Activate ProxyCommand:<br/>netbird ssh proxy &lt;peer&gt; 22]

    H --> I[Local SSH Proxy Started]
    I --> J[OpenSSH connects to proxy via stdio]

    J --> K[Proxy requests JWT from daemon]

    K --> K1{Check JWT cache}
    K1 -->|Cache valid| K2[Use cached JWT token]
    K1 -->|No cache/expired| K3[OIDC flow:<br/>User authorizes via IDP callback]
    K3 --> K4[Receive & cache JWT token]
    K4 --> L[Proxy connects to peer SSH server]
    K2 --> L

    L --> M[Proxy sends JWT authentication]

    M --> N{JWT Valid?}
    N -->|No| O[Connection Rejected]
    N -->|Yes| P[Proxy establishes session]

    P --> Q[Bidirectional forwarding:<br/>OpenSSH ↔ Proxy ↔ Peer SSH Server]

    Q --> R[User interacts with remote shell]

    style H fill:#f57c00,color:#fff
    style I fill:#1976d2,color:#fff
    style K fill:#1976d2,color:#fff
    style K1 fill:#1976d2,color:#fff
    style K3 fill:#e65100,color:#fff
    style Q fill:#388e3c,color:#fff

    classDef proxyBox fill:#1565c0,stroke:#0d47a1,stroke-width:2px,color:#fff
    class I,K,L,M,P,Q proxyBox

Issue ticket number and link

Fixes #4759 #4672 #4456 #4039 #3985 #2498 #4457

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)
  • Extended the README / documentation, if necessary

Documentation

Select exactly one:


By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Summary by CodeRabbit

  • New Features
    • Complete SSH support: embedded SSH server and client (JWT-backed auth), SFTP, local/remote port forwarding, SSH proxy, host-key retrieval, session listing, SSH client config generation, server detection, and UI/WASM controls including SSH JWT cache TTL.
  • Bug Fixes
    • License check made quieter and more robust by suppressing noisy errors during scanning.

@lixmal lixmal marked this pull request as ready for review June 20, 2025 15:25
Copilot AI review requested due to automatic review settings June 20, 2025 15:25

This comment was marked as outdated.

@lixmal lixmal marked this pull request as draft June 22, 2025 14:49
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
client/ssh/server/userswitching_unix.go (1)

188-200: Improve error context by checking privilegeResult.Error.

Good job adding the nil check for privilegeResult.User (this resolves the previous review concern). However, the error message can be improved by also checking privilegeResult.Error, which contains the reason for denial according to the PrivilegeCheckResult documentation.

Apply this diff to provide better error context:

 func (s *Server) createPtyCommand(privilegeResult PrivilegeCheckResult, ptyReq ssh.Pty, session ssh.Session) (*exec.Cmd, error) {
 	localUser := privilegeResult.User
 	if localUser == nil {
+		if privilegeResult.Error != nil {
+			return nil, fmt.Errorf("privilege check failed: %w", privilegeResult.Error)
+		}
 		return nil, errors.New("no user in privilege result")
 	}

 	if privilegeResult.UsedFallback {
 		return s.createDirectPtyCommand(session, localUser, ptyReq), nil
 	}

 	return s.createPtyLoginCommand(localUser, ptyReq, session)
 }
client/ssh/server/command_execution_windows.go (1)

71-124: Profile handle must be unloaded to prevent resource leak.

LoadUserProfileW mounts the user's registry hive and returns a handle in profile.hProfile that must be released with UnloadUserProfileW. Without this cleanup, each SSH session leaks registry handles and mounted hives, eventually exhausting resources and blocking subsequent logons. This is a critical operational issue that will degrade service over time.

Apply this diff to properly release the profile handle:

 	userenv := windows.NewLazySystemDLL("userenv.dll")
 	loadUserProfileW := userenv.NewProc("LoadUserProfileW")
+	unloadUserProfileW := userenv.NewProc("UnloadUserProfileW")

 	ret, _, err := loadUserProfileW.Call(
 		uintptr(userToken),
 		uintptr(unsafe.Pointer(&profile)),
 	)

 	if ret == 0 {
 		return "", fmt.Errorf("LoadUserProfileW: %w", err)
 	}

+	if profile.hProfile != 0 {
+		defer func() {
+			if ret, _, unloadErr := unloadUserProfileW.Call(uintptr(userToken), uintptr(profile.hProfile)); ret == 0 {
+				log.Debugf("UnloadUserProfileW failed: %v", unloadErr)
+			}
+		}()
+	}
+
 	if profile.lpProfilePath == nil {
 		return "", fmt.Errorf("LoadUserProfileW returned null profile path")
 	}

 	profilePath := windows.UTF16PtrToString(profile.lpProfilePath)
 	return profilePath, nil
🧹 Nitpick comments (2)
client/ssh/server/userswitching_unix.go (1)

131-153: Consider reusing the existing user.User object to avoid redundant lookup.

getSupplementaryGroups performs a fresh user.Lookup call, but the caller parseUserCredentials already has a *user.User object that was passed in. You can eliminate the redundant lookup by changing the signature to accept *user.User directly and calling localUser.GroupIds().

Apply this diff to eliminate the redundant lookup:

-func (s *Server) getSupplementaryGroups(username string) ([]uint32, error) {
-	u, err := user.Lookup(username)
-	if err != nil {
-		return nil, fmt.Errorf("lookup user %s: %w", username, err)
-	}
-
-	groupIDStrings, err := u.GroupIds()
+func (s *Server) getSupplementaryGroups(u *user.User) ([]uint32, error) {
+	groupIDStrings, err := u.GroupIds()
 	if err != nil {
-		return nil, fmt.Errorf("get group IDs for user %s: %w", username, err)
+		return nil, fmt.Errorf("get group IDs for user %s: %w", u.Username, err)
 	}

 	groups := make([]uint32, len(groupIDStrings))
 	for i, gidStr := range groupIDStrings {
 		gid64, err := strconv.ParseUint(gidStr, 10, 32)
 		if err != nil {
-			return nil, fmt.Errorf("invalid group ID %s for user %s: %w", gidStr, username, err)
+			return nil, fmt.Errorf("invalid group ID %s for user %s: %w", gidStr, u.Username, err)
 		}
 		groups[i] = uint32(gid64)
 	}

 	return groups, nil
 }

And update the caller in parseUserCredentials:

-	groups, err := s.getSupplementaryGroups(localUser.Username)
+	groups, err := s.getSupplementaryGroups(localUser)
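
One side effect of decoupling the conversion from the lookup is that the parsing step can be exercised without a real system user. The sketch below isolates that step in a hypothetical helper, `parseGroupIDs`, which is not a function from the PR. It relies on `strconv.ParseUint` with a bit size of 32, which rejects values above the `uint32` range so the final conversion cannot truncate.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseGroupIDs converts decimal group ID strings (as returned by
// (*user.User).GroupIds) into uint32 values. Hypothetical helper for illustration.
func parseGroupIDs(ids []string) ([]uint32, error) {
	groups := make([]uint32, len(ids))
	for i, s := range ids {
		// bitSize 32 makes ParseUint fail on anything that does not fit in uint32.
		gid, err := strconv.ParseUint(s, 10, 32)
		if err != nil {
			return nil, fmt.Errorf("invalid group ID %q: %w", s, err)
		}
		groups[i] = uint32(gid)
	}
	return groups, nil
}

func main() {
	fmt.Println(parseGroupIDs([]string{"0", "1000", "65534"}))
	fmt.Println(parseGroupIDs([]string{"4294967296"})) // one past the uint32 maximum
}
```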
client/ssh/server/command_execution.go (1)

78-88: Consider skipping su when UsedFallback is true.

When privilegeResult.UsedFallback is true, the comment in user_utils.go indicates "all privilege checks have already been performed and no additional privilege dropping or root checks are needed - the current user is the target." The current code still attempts createSuCommand first, which will immediately fail and fall back to the executor. For semantic consistency and to avoid the unnecessary su attempt, consider checking UsedFallback upfront and going directly to the executor path when true.

Apply this diff for more semantically correct behavior:

 	// Try su first for system integration (PAM/audit) when privileged
-	cmd, err := s.createSuCommand(session, localUser, hasPty)
-	if err != nil || privilegeResult.UsedFallback {
+	if privilegeResult.UsedFallback {
+		log.Debugf("using fallback user, skipping su and using executor")
+		cmd, cleanup, err := s.createExecutorCommand(session, localUser, hasPty)
+		if err != nil {
+			return nil, nil, fmt.Errorf("create command with privileges: %w", err)
+		}
+		cmd.Env = s.prepareCommandEnv(localUser, session)
+		return cmd, cleanup, nil
+	}
+
+	cmd, err := s.createSuCommand(session, localUser, hasPty)
+	if err != nil {
 		log.Debugf("su command failed, falling back to executor: %v", err)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f6d415 and 0812992.

📒 Files selected for processing (4)
  • client/cmd/ssh.go (1 hunks)
  • client/ssh/server/command_execution.go (1 hunks)
  • client/ssh/server/command_execution_windows.go (1 hunks)
  • client/ssh/server/userswitching_unix.go (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-11-13T00:29:53.220Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/cmd/ssh_exec_unix.go:53-74
Timestamp: 2025-11-13T00:29:53.220Z
Learning: In client/ssh/server/executor_unix.go, the method ExecuteWithPrivilegeDrop(ctx context.Context, config ExecutorConfig) has a void return type (no error return). It handles failures by exiting the process directly with appropriate exit codes rather than returning errors to the caller.

Applied to files:

  • client/ssh/server/command_execution.go
  • client/cmd/ssh.go
  • client/ssh/server/command_execution_windows.go
  • client/ssh/server/userswitching_unix.go
📚 Learning: 2025-11-14T13:05:31.708Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/ssh/server/userswitching_windows.go:89-139
Timestamp: 2025-11-14T13:05:31.708Z
Learning: In client/ssh/server/executor_windows.go, the WindowsExecutorConfig struct's Pty, PtyWidth, and PtyHeight fields are intentionally left unused for now and will be implemented in a future update.

Applied to files:

  • client/ssh/server/command_execution_windows.go
  • client/ssh/server/userswitching_unix.go
📚 Learning: 2025-11-14T11:11:50.779Z
Learnt from: lixmal
Repo: netbirdio/netbird PR: 4015
File: client/ssh/server/server_test.go:396-406
Timestamp: 2025-11-14T11:11:50.779Z
Learning: On Windows, the NetBird SSH server only supports PowerShell as the shell (powershell.exe or pwsh.exe). cmd.exe and other shells are not supported due to parsing quirks and complexity.

Applied to files:

  • client/ssh/server/command_execution_windows.go
🧬 Code graph analysis (4)
client/ssh/server/command_execution.go (2)
client/ssh/server/server.go (1)
  • Server (115-142)
client/ssh/server/user_utils.go (1)
  • PrivilegeCheckResult (50-66)
client/cmd/ssh.go (6)
client/ssh/server/server.go (2)
  • DefaultSSHPort (31-31)
  • New (170-184)
client/cmd/root.go (1)
  • SetFlagsFromEnvVars (198-219)
util/log.go (2)
  • FindFirstLogPath (77-84)
  • InitLog (32-74)
client/ssh/proxy/proxy.go (1)
  • New (46-61)
client/ssh/client/client.go (3)
  • Dial (284-311)
  • DialOptions (275-281)
  • Client (37-44)
client/ssh/detection/detection.go (4)
  • ServerTypeRegular (31-31)
  • Dialer (35-37)
  • Timeout (23-23)
  • DetectSSHServerType (59-99)
client/ssh/server/command_execution_windows.go (5)
client/ssh/server/server.go (1)
  • Server (115-142)
client/ssh/server/executor_windows.go (1)
  • NewPrivilegeDropper (45-47)
client/ssh/server/executor_unix.go (1)
  • NewPrivilegeDropper (41-43)
client/ssh/server/user_utils.go (1)
  • PrivilegeCheckResult (50-66)
client/ssh/server/winpty/conpty.go (3)
  • PtyConfig (36-42)
  • UserConfig (45-48)
  • ExecutePtyWithUserToken (59-71)
client/ssh/server/userswitching_unix.go (4)
client/ssh/server/server.go (2)
  • New (170-184)
  • Server (115-142)
client/ssh/server/executor_windows.go (1)
  • NewPrivilegeDropper (45-47)
client/ssh/server/executor_unix.go (2)
  • NewPrivilegeDropper (41-43)
  • ExecutorConfig (27-35)
client/ssh/server/user_utils.go (1)
  • PrivilegeCheckResult (50-66)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: Management / Benchmark (amd64, postgres)
  • GitHub Check: Management / Integration (amd64, postgres)
  • GitHub Check: Management / Unit (amd64, mysql)
  • GitHub Check: Management / Unit (amd64, sqlite)
  • GitHub Check: Management / Benchmark (API) (amd64, sqlite)
  • GitHub Check: Management / Benchmark (API) (amd64, postgres)
  • GitHub Check: Relay / Unit (386)
  • GitHub Check: Management / Benchmark (amd64, sqlite)
  • GitHub Check: Client / Unit (386)
  • GitHub Check: Management / Unit (amd64, postgres)
  • GitHub Check: Relay / Unit (amd64, -race)
  • GitHub Check: Client / Unit (amd64)
  • GitHub Check: Client (Docker) / Unit
  • GitHub Check: Android / Build
  • GitHub Check: release
  • GitHub Check: release_ui_darwin
  • GitHub Check: Client / Unit
  • GitHub Check: JS / Lint
  • GitHub Check: Linux
  • GitHub Check: Client / Unit
  • GitHub Check: Darwin
  • GitHub Check: Windows
  • GitHub Check: Client / Unit
🔇 Additional comments (9)
client/ssh/server/userswitching_unix.go (5)

25-60: LGTM! Comprehensive username validation.

The validation logic follows POSIX standards with appropriate security checks (path components, numeric usernames) and practical limits.
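
The reviewed validation code is not shown in this comment, so the following is only a sketch of what such checks commonly look like, not the PR's implementation. It assumes the POSIX portable character set for user names (letters, digits, period, underscore, hyphen, with no leading hyphen), rejects path components and fully numeric names, and uses a 32-character limit that is purely an assumption. `validateUsername` and `maxUsernameLen` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// maxUsernameLen is an assumed practical limit, not a value taken from the PR.
const maxUsernameLen = 32

// validateUsername applies POSIX-style portability rules plus a few safety checks.
func validateUsername(name string) error {
	if name == "" {
		return errors.New("username is empty")
	}
	if len(name) > maxUsernameLen {
		return fmt.Errorf("username longer than %d bytes", maxUsernameLen)
	}
	if name == "." || name == ".." {
		return errors.New("username is a path component")
	}
	if name[0] == '-' {
		return errors.New("username starts with a hyphen")
	}
	allDigits := true
	for _, r := range name {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r == '.', r == '_', r == '-':
			allDigits = false
		case r >= '0' && r <= '9':
			// digits are allowed, but not as the only content (checked below)
		default:
			return fmt.Errorf("username contains invalid character %q", r)
		}
	}
	if allDigits {
		return errors.New("username is fully numeric and could be confused with a UID")
	}
	return nil
}

func main() {
	for _, n := range []string{"alice", "svc_user-1", "../etc", "1000", "-rf"} {
		fmt.Printf("%q: %v\n", n, validateUsername(n))
	}
}
```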


76-100: LGTM! Platform-specific login command construction is correct.

The platform-specific handling (including the Arch Linux PAM edge case) is well-implemented with proper error handling.


62-74: LGTM! PTY command creation logic is correct.

Both createPtyLoginCommand and createDirectPtyCommand properly handle their respective privilege paths with appropriate environment and working directory setup.

Also applies to: 202-214


216-233: LGTM! Environment preparation logic is sound.

The environment assembly properly combines user, SSH, and terminal environments with appropriate filtering through acceptEnv for security.


155-181: All helper functions are properly defined in the codebase.

Verification confirms that getUserShell, prepareUserEnv, prepareSSHEnv, and acceptEnv are all defined in client/ssh/server/shell.go and are accessible within the same package. No issues found.

client/cmd/ssh.go (2)

390-408: Flag bindings correctly implemented.

Both short and long flag forms now properly bind to the same struct fields using IntVar/StringVar. For example, -p and --port both update flags.Port, -u and --user both update flags.Username, etc. This resolves the previous issue where long-form flags were not taking effect.
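
The binding pattern described here can be illustrated with the standard library's `flag` package, which may differ from the parser the PR actually uses. `sshFlags` and `parseFlags` are hypothetical names. The default port 22022 comes from the PR description. Registering two names against the same field means either spelling updates the same value.

```go
package main

import (
	"flag"
	"fmt"
)

// sshFlags is a hypothetical, reduced version of the command's flag struct.
type sshFlags struct {
	Port     int
	Username string
}

// parseFlags binds the short and long forms of each flag to the same field.
func parseFlags(args []string) (sshFlags, error) {
	var f sshFlags
	fs := flag.NewFlagSet("ssh", flag.ContinueOnError)
	fs.IntVar(&f.Port, "p", 22022, "remote SSH port (shorthand)")
	fs.IntVar(&f.Port, "port", 22022, "remote SSH port")
	fs.StringVar(&f.Username, "u", "", "remote user (shorthand)")
	fs.StringVar(&f.Username, "user", "", "remote user")
	if err := fs.Parse(args); err != nil {
		return sshFlags{}, err
	}
	return f, nil
}

func main() {
	short, _ := parseFlags([]string{"-p", "2222", "-u", "alice"})
	long, _ := parseFlags([]string{"--port", "2222", "--user", "alice"})
	fmt.Println(short == long) // both spellings populate the same fields
}
```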


156-175: Error handling properly refactored.

The goroutine now sends errors through errCh instead of calling os.Exit directly, allowing proper cleanup via deferred functions and context cancellation. The main select handles the error channel, signal interrupts, and context completion correctly.

client/ssh/server/command_execution.go (1)

109-110: Stderr properly wired to separate stream.

The code now correctly routes stderr to session.Stderr() instead of collapsing both stdout and stderr to the session stream. This allows SSH clients and tooling to distinguish between the two streams.

client/ssh/server/command_execution_windows.go (1)

16-58: Command execution flow looks well-structured.

The Windows-specific command execution properly handles user environment retrieval, token management with cleanup via defer, and graceful fallback when profile loading fails. The flow correctly propagates the environment through to command preparation.

@lixmal lixmal merged commit d71a827 into main Nov 17, 2025
41 of 46 checks passed
@lixmal lixmal deleted the ssh-rewrite branch November 17, 2025 16:10
Development

Successfully merging this pull request may close these issues.

Unexpected SSH access to routing peer despite non-existent policy

4 participants