
Coordinate session ID with Node in Proxy recording mode#59850

Merged
Joerger merged 26 commits into master from joerger/proxy-coordinate-session-id
Nov 13, 2025

Conversation

@Joerger Joerger commented Oct 2, 2025

Changelog: Fix a bug in Proxy recording mode where Teleport Node sessions would result in duplicate audit events with a different session ID.

Details

Overview

When Proxy recording mode is enabled, Teleport Node sessions consist of two actual SSH server sessions - one in an ephemeral Proxy Forwarding Node, and one on the Teleport Node. Currently, these two sessions are handled separately and mistakenly share the same responsibilities, resulting in duplicate audit events and session trackers with mismatched session IDs.

This PR makes the following changes to fix the above issue:

  • Coordinate the session ID between the Teleport Node and the Proxy Forwarding Node sessions.
  • Properly delegate session recording responsibilities to the forwarding Node:
    • recording the session
    • emitting audit events
    • creating the session tracker backend resource
  • Add "session-id-query-v2@goteleport.com" global server request. See the flow and backwards compatibility sections below for details.

Session ID coordination flow

In v17+ servers, the current session ID is sent to the client after a session is started. However, a session is not "started" until the shell/exec request is handled. By this time, the Forwarding Node has already created its own session with a unique ID and made decisions about whether to create a session tracker, emit events, etc.

sequenceDiagram
  participant Client
  participant Proxy
  participant Forwarding Node
  participant Teleport Node

  Client ->> Proxy: ssh Dial
  Proxy ->> Forwarding Node: Create new ephemeral Forwarding Node and forward connection
  Client ->> Forwarding Node: ssh "session" channel request
  Forwarding Node ->> Teleport Node: ssh "session" channel request
  Teleport Node ->> Forwarding Node: Accept session channel
  Forwarding Node ->> Client: Accept session channel
  Client ->> Forwarding Node: ssh "shell" channel request
  Note over Forwarding Node: Start session (set ID, create tracker, start recorder, emit events)
  Forwarding Node ->> Client: send "current-session-id@goteleport.com"
  Forwarding Node ->> Teleport Node: ssh "shell" channel request
  Note over Teleport Node: Start session (set ID, create tracker, start terminal, emit events)
  Teleport Node ->> Forwarding Node: send "current-session-id@goteleport.com"
  Client <<->> Teleport Node: Forwarded shell session success

In the new flow, the session ID is set and sent to the client as soon as the "session" channel is handled. This allows the Forwarding Node to save the session ID and use it when the session is started. Now that the session ID is shared by both server sessions, session responsibilities are properly delegated without any mismatches or duplicates.

Note that in the new flow, the Forwarding Node does not accept the session channel until it receives the current session ID from the Teleport Node. To decide whether the session ID will actually be sent, the Forwarding Node first sends the session-id-query-v2@goteleport.com request. It only waits for the session ID if the Teleport Node replies to that request; otherwise it sets a new session ID itself.

sequenceDiagram
  participant Client
  participant Proxy
  participant Forwarding Node
  participant Teleport Node

  Client ->> Proxy: ssh Dial
  Proxy ->> Forwarding Node: Create new ephemeral Forwarding Node and forward connection
  Client ->> Forwarding Node: ssh "session" channel request
  Forwarding Node ->> Teleport Node: ssh "session" channel request
  Teleport Node ->> Forwarding Node: Accept session channel and set session ID
  Forwarding Node ->> Teleport Node: send "session-id-query-v2@goteleport.com" request
  Note over Forwarding Node: If the above query returns true, wait for the current session ID. Otherwise, set new session ID
  Teleport Node ->> Forwarding Node: send "current-session-id@goteleport.com"
  Forwarding Node ->> Client: Accept session channel
  Forwarding Node ->> Client: send "current-session-id@goteleport.com"
  Client ->> Forwarding Node: ssh "shell" channel request
  Note over Forwarding Node: Start session (start terminal, start recorder, emit events)
  Forwarding Node ->> Teleport Node: ssh "shell" channel request
  Note over Teleport Node: Start session (create tracker, start terminal)
  Client <<->> Teleport Node: Forwarded shell session success
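
The decision the Forwarding Node makes in the new flow can be sketched in Go as follows. This is a minimal illustration, not the code from this PR: `chooseSessionID`, its parameters, and the timeout are all hypothetical names and assumptions. It shows the two branches described above: wait for the ID from the Teleport Node when the query succeeded, or fall back to a locally generated ID.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// chooseSessionID is a hypothetical helper (not Teleport's implementation).
//   - nodeSupportsQuery: result of the "session-id-query-v2@goteleport.com" request.
//   - idFromNode: delivers the ID carried by "current-session-id@goteleport.com".
//   - timeout: assumed upper bound on how long to wait for the node.
//   - newID: generates a fallback session ID.
func chooseSessionID(nodeSupportsQuery bool, idFromNode <-chan string, timeout time.Duration, newID func() string) (string, error) {
	if !nodeSupportsQuery {
		// Outdated Teleport Node: keep the previous behavior and use our own ID.
		return newID(), nil
	}
	select {
	case id := <-idFromNode:
		if id == "" {
			return "", errors.New("node sent an empty session ID")
		}
		return id, nil
	case <-time.After(timeout):
		// Fail so the caller can reject the client's session channel
		// instead of starting a session with a mismatched ID.
		return "", fmt.Errorf("no session ID received from node within %s", timeout)
	}
}

func main() {
	newID := func() string { return "id-from-proxy" }

	fromNode := make(chan string, 1)
	fromNode <- "id-from-node"

	id, err := chooseSessionID(true, fromNode, time.Second, newID)
	fmt.Println(id, err)

	id, err = chooseSessionID(false, nil, time.Second, newID)
	fmt.Println(id, err)
}
```

Returning an error on timeout, rather than silently falling back, reflects the later commit note that the forwarder can reject client connections if preparing the session ID fails; the actual handling in the PR may differ.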

Backwards Compatibility

This change is backwards compatible in the sense that if either the Node or Proxy is outdated, every session responsibility will still be delegated to at least one of the two services. However, the outdated component may still result in duplicate and mismatched audit events and session trackers:

  • An outdated Node will not send the session ID in time for the Forwarding Node to use it. As a result, the Proxy will create its own session ID and its own session tracker. The node will continue to emit audit events with the mismatched session ID.
  • An outdated Proxy will not use the session ID sent by the Teleport Node. It will also continue to create a session tracker, resulting in a duplicate session tracker that doesn't match the Teleport Node session ID. However, only the proxy will emit audit events and these events will match the session ID of the session recording.

To enable the above, this PR introduces the "session-id-query-v2@goteleport.com" global server request. The Forwarding Node sends this request to determine whether it should wait for the Teleport Node to send the current session ID. If the Teleport Node does not reply to this request (old Teleport Node), the Forwarding Node will continue with its own session ID like it does currently.
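
As a rough sketch of this fallback, assuming a connection type whose `SendRequest` method has the same shape as `ssh.Conn` in golang.org/x/crypto/ssh. The `globalRequester` interface, `nodeWillSendSessionID`, and `fakeNode` below are hypothetical and exist only for illustration. Both a rejected request and a failed request are treated as "the node will not send the ID".

```go
package main

import (
	"errors"
	"fmt"
)

const sessionIDQueryV2 = "session-id-query-v2@goteleport.com"

// globalRequester is a local stand-in interface (an assumption) that mirrors
// the SendRequest shape of ssh.Conn in golang.org/x/crypto/ssh.
type globalRequester interface {
	SendRequest(name string, wantReply bool, payload []byte) (bool, []byte, error)
}

// nodeWillSendSessionID reports whether the Forwarding Node should wait for
// "current-session-id@goteleport.com". Any failure or negative reply means
// the node is treated as outdated and the forwarder keeps its own ID.
func nodeWillSendSessionID(conn globalRequester) bool {
	ok, _, err := conn.SendRequest(sessionIDQueryV2, true, nil)
	return err == nil && ok
}

// errClosed is a sample transport error used by the fake below.
var errClosed = errors.New("connection closed")

// fakeNode is a test double standing in for a Teleport Node connection.
type fakeNode struct {
	supported map[string]bool // request names this node answers with true
	err       error           // if set, every request fails with this error
}

func (f fakeNode) SendRequest(name string, _ bool, _ []byte) (bool, []byte, error) {
	if f.err != nil {
		return false, nil, f.err
	}
	return f.supported[name], nil, nil
}

func main() {
	updated := fakeNode{supported: map[string]bool{sessionIDQueryV2: true}}
	outdated := fakeNode{}
	broken := fakeNode{err: errClosed}

	fmt.Println(nodeWillSendSessionID(updated))  // true
	fmt.Println(nodeWillSendSessionID(outdated)) // false
	fmt.Println(nodeWillSendSessionID(broken))   // false
}
```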

Note that the original "session-id-query@goteleport.com" has been marked for deletion in v20, and "session-id-query-v2@goteleport.com" is marked for deletion in v21. If extended backwards compatibility is a concern (> 1 major version), servers can continue to reply for an additional major version, even though no v19+/v20+ client (respectively) will send the query.

Manual Tests

Last run: (292d064)

  • Proxy recording mode:
    • Teleport Node
      • Start a session and use teleport status to get the session ID. It should appear in the session list (e.g. tsh session ls).
      • End the session. Observe that session start, end, data, and leave events are emitted with the same session ID.
        • No duplicate events.
        • These events should have the forwarded_by and recording_mode: proxy fields.
    • Agentless Node
      • Start a session and use teleport status to get the session ID. It should appear in the session list (e.g. tsh session ls).
      • End the session. Observe that session start, end, data, and leave events are emitted with the same session ID.
        • These events should have the forwarded_by and recording_mode: proxy fields.
  • Node recording mode:
    • Teleport Node
      • Start a session and use teleport status to get the session ID. It should appear in the session list (e.g. tsh session ls).
      • End the session. Observe that session start, end, data, and leave events are emitted with the same session ID.
        • These events should not have the forwarded_by and recording_mode: proxy fields.
    • Agentless Node
      • Start a session and use teleport status to get the session ID. It should appear in the session list (e.g. tsh session ls).
      • End the session. Observe that session start, end, data, and leave events are emitted with the same session ID.
        • These events should have the forwarded_by and recording_mode: proxy fields.
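
The "same session ID, no duplicates" checks above could be partly automated with something like the following sketch. It assumes audit events are available as one JSON object per line with `event` and `sid` fields, as in the `session.data` log line quoted later in this conversation. The event names `session.start` and `session.end` are assumed by analogy with `session.data`, and `checkSessionEvents` is a hypothetical helper, not part of Teleport.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// auditEvent holds only the two fields this check needs.
type auditEvent struct {
	Type string `json:"event"`
	SID  string `json:"sid"`
}

// checkSessionEvents scans newline-delimited JSON events for a single session
// and returns a list of problems: more than one distinct session ID, or more
// than one start/end event.
func checkSessionEvents(jsonLines string) ([]string, error) {
	sids := map[string]bool{}
	counts := map[string]int{}
	for _, line := range strings.Split(jsonLines, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		var ev auditEvent
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return nil, fmt.Errorf("parsing %q: %w", line, err)
		}
		sids[ev.SID] = true
		counts[ev.Type]++
	}

	var problems []string
	if len(sids) > 1 {
		problems = append(problems, fmt.Sprintf("%d distinct session IDs", len(sids)))
	}
	// Assumed event names; only events expected exactly once are checked.
	for _, typ := range []string{"session.start", "session.end"} {
		if counts[typ] > 1 {
			problems = append(problems, fmt.Sprintf("%s emitted %d times", typ, counts[typ]))
		}
	}
	return problems, nil
}

func main() {
	// Sample data resembling the pre-fix behavior: a second end event
	// carrying a different session ID.
	events := `{"event":"session.start","sid":"a1"}
{"event":"session.data","sid":"a1"}
{"event":"session.end","sid":"a1"}
{"event":"session.end","sid":"b2"}`

	problems, err := checkSessionEvents(events)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range problems {
		fmt.Println(p)
	}
}
```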

Backwards compatibility:

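| Proxy | Node | Mismatched ID w/ tracker | Duplicate events |
|-------|------|--------------------------|------------------|
| new   | new  | no                       | no               |
| new   | old  | no                       | yes              |
| old   | new  | yes                      | no               |
| old   | old  | yes                      | yes              |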
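
Read as a truth table, the matrix above reduces to two independent conditions: a mismatched tracker ID appears exactly when the Proxy is old, and duplicate events appear exactly when the Node is old. A small Go sketch of that reading, with hypothetical names (`outcome`, `expectedOutcome`) that are not from the PR:

```go
package main

import "fmt"

// outcome describes what a tester should expect to observe for one
// Proxy/Node version combination.
type outcome struct {
	MismatchedTracker bool // session tracker with a mismatched session ID
	DuplicateEvents   bool // duplicate audit events
}

// expectedOutcome encodes the compatibility matrix: each symptom depends on
// only one of the two components being outdated.
func expectedOutcome(proxyUpdated, nodeUpdated bool) outcome {
	return outcome{
		MismatchedTracker: !proxyUpdated,
		DuplicateEvents:   !nodeUpdated,
	}
}

func main() {
	for _, proxyUpdated := range []bool{true, false} {
		for _, nodeUpdated := range []bool{true, false} {
			o := expectedOutcome(proxyUpdated, nodeUpdated)
			fmt.Printf("proxy updated=%t node updated=%t -> mismatched tracker=%t duplicate events=%t\n",
				proxyUpdated, nodeUpdated, o.MismatchedTracker, o.DuplicateEvents)
		}
	}
}
```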

Depends on #59206

Fixes #42263, #20063

@Joerger Joerger requested a review from rosstimothy October 2, 2025 22:46
@Joerger Joerger marked this pull request as ready for review October 2, 2025 22:46
@Joerger Joerger force-pushed the joerger/proxy-coordinate-session-id branch 4 times, most recently from 648a58f to 43e9859 Compare October 3, 2025 17:54
Base automatically changed from joerger/ssh-session-req-data to master October 4, 2025 00:05
@Joerger Joerger force-pushed the joerger/proxy-coordinate-session-id branch from 43e9859 to 9330a20 Compare October 6, 2025 17:25
@Joerger Joerger requested a review from rosstimothy October 9, 2025 23:50

@rosstimothy rosstimothy left a comment

Are the changes introduced in this PR causing the integration test failures?

2025-10-10T00:07:19.3723034Z {"timestamp":"2025-10-09T23:57:48Z","level":"warning","caller":"forward/sshserver.go:1208","message":"Failed to query session ID from target node. Ensure the targeted Teleport Node is upgraded to v19.0.0+ to avoid duplicate events due to mismatched session IDs.","component":"node:forward","src_addr":"127.0.0.1:40788","dst_addr":"127.0.0.1:40529"}
2025-10-10T00:07:19.3723581Z {"timestamp":"2025-10-09T23:57:48Z","level":"warning","caller":"forward/sshserver.go:1218","message":"Remote session open failed","component":"node:forward","src_addr":"127.0.0.1:40788","dst_addr":"127.0.0.1:40529","error":"EOF"}
2025-10-10T00:07:19.3725697Z {"timestamp":"2025-10-09T23:57:48Z","level":"info","caller":"events/emitter.go:287","message":"emitting audit event","event_type":"session.data","fields":{"addr.local":"127.0.0.1:40529","addr.remote":"127.0.0.1:40788","code":"T2006I","ei":2147483646,"event":"session.data","forwarded_by":"00000000-0000-0000-0000-000000000000","login":"ci","namespace":"default","private_key_policy":"none","rx":3722,"server_addr":"127.0.0.1:40529","server_hostname":"e77bb0694b97","server_id":"00000000-0000-0000-0000-000000000000","server_sub_kind":"teleport","server_version":"19.0.0-dev","sid":"","time":"2025-10-09T23:57:48.583Z","trace.component":"audit","tx":4240,"uid":"6c7225c6-8514-4d16-878c-380857132eaa","user":"ci","user_cluster_name":"local-site","user_kind":1,"user_roles":["devs"],"user_traits":{"testing":["integration"]}}}
2025-10-10T00:07:19.3726448Z {"timestamp":"2025-10-09T23:57:48Z","level":"debug","caller":"forward/sshserver.go:863","message":"Closing forwarding server connection and releasing resources","component":"node:forward","src_addr":"127.0.0.1:40788","dst_addr":"127.0.0.1:40529","server_addr":"127.0.0.1:40529"}
2025-10-10T00:07:19.3726532Z     integration_test.go:2190: 
2025-10-10T00:07:19.3726714Z         	Error Trace:	/__w/teleport/teleport/integration/integration_test.go:2190
2025-10-10T00:07:19.3726869Z         	            				/__w/teleport/teleport/integration/integration_test.go:2072
2025-10-10T00:07:19.3726943Z         	Error:      	Async error
2025-10-10T00:07:19.3727197Z         	Test:       	TestIntegrations/Disconnection/concurrent_connection_limits_exceeded_proxy_recording
2025-10-10T00:07:19.3727570Z         	Messages:   	Expected error to contain "administratively prohibited", got: ssh: rejected: connect failed (remote session open failed: EOF)

@Joerger Joerger force-pushed the joerger/proxy-coordinate-session-id branch from 0f0f18c to e465a78 Compare October 17, 2025 01:44
Joerger commented Oct 17, 2025

> Are the changes introduced in this PR causing the integration test failures?

Yeah, it turned out to be a pretty obscure issue related to #18658, since we now send the session-id-query-v2@goteleport.com global request before the session channel request for proxy forwarded sessions. We'll see whether that fixes all of the test failures or just one.

@Joerger Joerger force-pushed the joerger/proxy-coordinate-session-id branch from e465a78 to 08e6e5d Compare October 17, 2025 18:37
@rosstimothy

> Yeah, it turned out to be a pretty obscure issue related to #18658, since we now send the session-id-query-v2@goteleport.com global request before the session channel request for proxy forwarded sessions. We'll see whether that fixes all of the test failures or just one.

It looks like multiple tests are still failing.

Joerger commented Oct 20, 2025

> > Yeah, it turned out to be a pretty obscure issue related to #18658, since we now send the session-id-query-v2@goteleport.com global request before the session channel request for proxy forwarded sessions. We'll see whether that fixes all of the test failures or just one.
>
> It looks like multiple tests are still failing.

Yes, there are some changes to leaf cluster behavior. I'm tweaking the behavior and tests, then I'll update the description and request re-review.

@Joerger Joerger force-pushed the joerger/proxy-coordinate-session-id branch 2 times, most recently from 62016c1 to 292d064 Compare November 6, 2025 20:40
@Joerger Joerger force-pushed the joerger/proxy-coordinate-session-id branch from e86b682 to 835a996 Compare November 7, 2025 01:51
@Joerger Joerger requested a review from rosstimothy November 7, 2025 02:40
Joerger commented Nov 7, 2025

@rosstimothy @nklaassen @marcoandredinis Sorry for letting this one go stale, but all the tests are finally passing. I ended up not making changes to leaf clusters and instead prioritizing existing behavior.

Joerger commented Nov 11, 2025

@nklaassen @marcoandredinis Friendly ping to review

edit: I am investigating the integration test flake but I don't expect it'll require any big changes.

Joerger and others added 7 commits November 11, 2025 12:34
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
…ession ID from node. This way, the forwarder can reject client connections if there is an issue preparing the session ID (impossible join sessions).
@Joerger Joerger force-pushed the joerger/proxy-coordinate-session-id branch from 0cbc380 to 0a1a7b8 Compare November 11, 2025 20:34
@Joerger Joerger mentioned this pull request Nov 11, 2025
36 tasks

@marcoandredinis marcoandredinis left a comment

I did a small test and it seems to work fine.
Running with master, it generated two Session Ended events, but with your branch, it only generated one.

Image

@Joerger Joerger force-pushed the joerger/proxy-coordinate-session-id branch from a883a98 to 1d5961c Compare November 12, 2025 18:05
@public-teleport-github-review-bot public-teleport-github-review-bot bot removed the request for review from nklaassen November 12, 2025 18:12
@Joerger Joerger added this pull request to the merge queue Nov 13, 2025
Merged via the queue into master with commit b8ffed9 Nov 13, 2025
42 checks passed
@Joerger Joerger deleted the joerger/proxy-coordinate-session-id branch November 13, 2025 01:35
@backport-bot-workflows

@Joerger See the table below for backport results.

| Branch     | Result |
|------------|--------|
| branch/v18 | Failed |

Joerger added a commit that referenced this pull request Nov 17, 2025
* Generalize PrepareToReceiveSessionID.

* Initialize session ID in the connection context and update it from node current-session-id request.

* Add session-id-query-v2@goteleport.com request and ensure new session ID is correctly set in proxy recording mode during the channel request.

* Replace PrepareToReceiveSessionID with simpler in-place logic.

* Don't emit session events or tracker when proxy forwarding to a Teleport Node.

* Fix missing session tracker for outdated Teleport Node.

* Remove extra major version grace period.

* Update integration test.

* Cleanup current session ID handling and fix failing tests.

* Fix tests.

* Address comments.

* Restructure currentSessionID handling.

* Set newSessionID in test server context.

* Fix integration test.

* Fix AuditOn integration test.

* Address comment on channel close.

* Track session on forwarding node.

* Fix web shutdown.

* Fix nil pointer dereference in test.

* Fix test flake.

* Fix nil pointer in test.

* Fix test flake.

* Update lib/srv/ctx.go

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* Forwarding Node accepts client connection after receiving preparing session ID from node. This way, the forwarder can reject client connections if there is an issue preparing the session ID (impossible join sessions).

* Remove check for session.data event which may not be emitted in time for the test.

* Address comments.

---------

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
Joerger added a commit that referenced this pull request Nov 17, 2025
Joerger added a commit that referenced this pull request Nov 24, 2025
github-merge-queue bot pushed a commit that referenced this pull request Dec 3, 2025
* Fix discrepancies between Node and Proxy recording modes. (#58707)

* Replace flaky test with more straightforward event metadata test. (#59610)

* Make SSH session client provide session params upfront rather than with synchronous `envs@goteleport.com` requests (#59206)

* Coordinate session ID with Node in Proxy recording mode (#59850)

* Generalize PrepareToReceiveSessionID.

* Initialize session ID in the connection context and update it from node current-session-id request.

* Add session-id-query-v2@goteleport.com request and ensure new session ID is correctly set in proxy recording mode during the channel request.

* Replace PrepareToReceiveSessionID with simpler in-place logic.

* Don't emit session events or tracker when proxy forwarding to a Teleport Node.

* Fix missing session tracker for outdated Teleport Node.

* Remove extra major version grace period.

* Update integration test.

* Cleanup current session ID handling and fix failing tests.

* Fix tests.

* Address comments.

* Restructure currentSessionID handling.

* Set newSessionID in test server context.

* Fix integration test.

* Fix AuditOn integration test.

* Address comment on channel close.

* Track session on forwarding node.

* Fix web shutdown.

* Fix nil pointer dereference in test.

* Fix test flake.

* Fix nil pointer in test.

* Fix test flake.

* Update lib/srv/ctx.go

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* Forwarding Node accepts client connection after receiving preparing session ID from node. This way, the forwarder can reject client connections if there is an issue preparing the session ID (impossible join sessions).

* Remove check for session.data event which may not be emitted in time for the test.

* Address comments.

---------

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

---------

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Duplicated SSH session recordings

3 participants