Coordinate session ID with Node in Proxy recording mode#59850
Coordinate session ID with Node in Proxy recording mode#59850
Conversation
648a58f to
43e9859
Compare
43e9859 to
9330a20
Compare
rosstimothy
left a comment
There was a problem hiding this comment.
Are the changes introduced in this PR causing the integration test failures?
2025-10-10T00:07:19.3723034Z {"timestamp":"2025-10-09T23:57:48Z","level":"warning","caller":"forward/sshserver.go:1208","message":"Failed to query session ID from target node. Ensure the targeted Teleport Node is upgraded to v19.0.0+ to avoid duplicate events due to mismatched session IDs.","component":"node:forward","src_addr":"127.0.0.1:40788","dst_addr":"127.0.0.1:40529"}
2025-10-10T00:07:19.3723581Z {"timestamp":"2025-10-09T23:57:48Z","level":"warning","caller":"forward/sshserver.go:1218","message":"Remote session open failed","component":"node:forward","src_addr":"127.0.0.1:40788","dst_addr":"127.0.0.1:40529","error":"EOF"}
2025-10-10T00:07:19.3725697Z {"timestamp":"2025-10-09T23:57:48Z","level":"info","caller":"events/emitter.go:287","message":"emitting audit event","event_type":"session.data","fields":{"addr.local":"127.0.0.1:40529","addr.remote":"127.0.0.1:40788","code":"T2006I","ei":2147483646,"event":"session.data","forwarded_by":"00000000-0000-0000-0000-000000000000","login":"ci","namespace":"default","private_key_policy":"none","rx":3722,"server_addr":"127.0.0.1:40529","server_hostname":"e77bb0694b97","server_id":"00000000-0000-0000-0000-000000000000","server_sub_kind":"teleport","server_version":"19.0.0-dev","sid":"","time":"2025-10-09T23:57:48.583Z","trace.component":"audit","tx":4240,"uid":"6c7225c6-8514-4d16-878c-380857132eaa","user":"ci","user_cluster_name":"local-site","user_kind":1,"user_roles":["devs"],"user_traits":{"testing":["integration"]}}}
2025-10-10T00:07:19.3726448Z {"timestamp":"2025-10-09T23:57:48Z","level":"debug","caller":"forward/sshserver.go:863","message":"Closing forwarding server connection and releasing resources","component":"node:forward","src_addr":"127.0.0.1:40788","dst_addr":"127.0.0.1:40529","server_addr":"127.0.0.1:40529"}
2025-10-10T00:07:19.3726532Z integration_test.go:2190:
2025-10-10T00:07:19.3726714Z Error Trace: /__w/teleport/teleport/integration/integration_test.go:2190
2025-10-10T00:07:19.3726869Z /__w/teleport/teleport/integration/integration_test.go:2072
2025-10-10T00:07:19.3726943Z Error: Async error
2025-10-10T00:07:19.3727197Z Test: TestIntegrations/Disconnection/concurrent_connection_limits_exceeded_proxy_recording
2025-10-10T00:07:19.3727570Z Messages: Expected error to contain "administratively prohibited", got: ssh: rejected: connect failed (remote session open failed: EOF)
0f0f18c to
e465a78
Compare
Yeah, it turned out to be a pretty obscure issue related to #18658, since we now send the |
e465a78 to
08e6e5d
Compare
It looks like multiple tests are still failing. |
Yes, there is some changes to leaf cluster behavior, working on tweaking the behavior / tests and then I'll update the comment description and request re-review. |
62016c1 to
292d064
Compare
e86b682 to
835a996
Compare
|
@rosstimothy @nklaassen @marcoandredinis Sorry for letting this one go stale, but all the tests are finally passing. I ended up not making changes to leaf clusters and instead prioritizing existing behavior. |
|
@nklaassen @marcoandredinis Friendly ping to review edit: I am investigating the integration test flake but I don't expect it'll require any big changes. |
…de current-session-id request.
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
…ession ID from node. This way, the forwarder can reject client connections if there is an issue preparing the session ID (impossible join sessions).
0cbc380 to
0a1a7b8
Compare
a883a98 to
1d5961c
Compare
* Generalize PrepareToReceiveSessionID. * Initialize session ID in the connection context and update it from node current-session-id request. * Add session-id-query-v2@goteleport.com request and ensure new session ID is correctly set in proxy recording mode during the channel request. * Replace PrepareToReceiveSessionID with simpler in-place logic. * Don't emit session events or tracker when proxy forwarding to a Teleport Node. * Fix missing session tracker for outdated Teleport Node. * Remove extra major version grace period. * Update integration test. * Cleanup current session ID handling and fix failing tests. * Fix tests. * Address comments. * Restructure currentSessionID handling. * Set newSessionID in test server context. * Fix integration test. * Fix AuditOn integration test. * Address comment on channel close. * Track session on forwarding node. * Fix web shutdown. * Fix nil pointer dereference in test. * Fix test flake. * Fix nil pointer in test. * Fix test flake. * Update lib/srv/ctx.go Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com> * Forwarding Node accepts client connection after receiving preparing session ID from node. This way, the forwarder can reject client connections if there is an issue preparing the session ID (impossible join sessions). * Remove check for session.data event which may not be emitted in time for the test. * Address comments. --------- Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
* Generalize PrepareToReceiveSessionID. * Initialize session ID in the connection context and update it from node current-session-id request. * Add session-id-query-v2@goteleport.com request and ensure new session ID is correctly set in proxy recording mode during the channel request. * Replace PrepareToReceiveSessionID with simpler in-place logic. * Don't emit session events or tracker when proxy forwarding to a Teleport Node. * Fix missing session tracker for outdated Teleport Node. * Remove extra major version grace period. * Update integration test. * Cleanup current session ID handling and fix failing tests. * Fix tests. * Address comments. * Restructure currentSessionID handling. * Set newSessionID in test server context. * Fix integration test. * Fix AuditOn integration test. * Address comment on channel close. * Track session on forwarding node. * Fix web shutdown. * Fix nil pointer dereference in test. * Fix test flake. * Fix nil pointer in test. * Fix test flake. * Update lib/srv/ctx.go Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com> * Forwarding Node accepts client connection after receiving preparing session ID from node. This way, the forwarder can reject client connections if there is an issue preparing the session ID (impossible join sessions). * Remove check for session.data event which may not be emitted in time for the test. * Address comments. --------- Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
* Generalize PrepareToReceiveSessionID. * Initialize session ID in the connection context and update it from node current-session-id request. * Add session-id-query-v2@goteleport.com request and ensure new session ID is correctly set in proxy recording mode during the channel request. * Replace PrepareToReceiveSessionID with simpler in-place logic. * Don't emit session events or tracker when proxy forwarding to a Teleport Node. * Fix missing session tracker for outdated Teleport Node. * Remove extra major version grace period. * Update integration test. * Cleanup current session ID handling and fix failing tests. * Fix tests. * Address comments. * Restructure currentSessionID handling. * Set newSessionID in test server context. * Fix integration test. * Fix AuditOn integration test. * Address comment on channel close. * Track session on forwarding node. * Fix web shutdown. * Fix nil pointer dereference in test. * Fix test flake. * Fix nil pointer in test. * Fix test flake. * Update lib/srv/ctx.go Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com> * Forwarding Node accepts client connection after receiving preparing session ID from node. This way, the forwarder can reject client connections if there is an issue preparing the session ID (impossible join sessions). * Remove check for session.data event which may not be emitted in time for the test. * Address comments. --------- Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
* Fix discrepancies between Node and Proxy recording modes. (#58707) * Replace flaky test with more straightforward event metadata test. (#59610) * Make SSH session client provide session params upfront rather than with synchronous `envs@goteleport.com` requests (#59206) * Coordinate session ID with Node in Proxy recording mode (#59850) * Generalize PrepareToReceiveSessionID. * Initialize session ID in the connection context and update it from node current-session-id request. * Add session-id-query-v2@goteleport.com request and ensure new session ID is correctly set in proxy recording mode during the channel request. * Replace PrepareToReceiveSessionID with simpler in-place logic. * Don't emit session events or tracker when proxy forwarding to a Teleport Node. * Fix missing session tracker for outdated Teleport Node. * Remove extra major version grace period. * Update integration test. * Cleanup current session ID handling and fix failing tests. * Fix tests. * Address comments. * Restructure currentSessionID handling. * Set newSessionID in test server context. * Fix integration test. * Fix AuditOn integration test. * Address comment on channel close. * Track session on forwarding node. * Fix web shutdown. * Fix nil pointer dereference in test. * Fix test flake. * Fix nil pointer in test. * Fix test flake. * Update lib/srv/ctx.go Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com> * Forwarding Node accepts client connection after receiving preparing session ID from node. This way, the forwarder can reject client connections if there is an issue preparing the session ID (impossible join sessions). * Remove check for session.data event which may not be emitted in time for the test. * Address comments. --------- Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com> --------- Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

Changelog: Fix a bug in Proxy recording mode where Teleport Node sessions would result in duplicate audit events with a different session ID.
Details
Overview
When Proxy recording mode is enabled, Teleport Node sessions consist of two actual SSH server sessions - one in an ephemeral Proxy Forwarding Node, and one on the Teleport Node. Currently, these two sessions are handled separately and mistakenly share the same responsibilities, resulting in duplicate audit events and session trackers with mismatched session IDs.
This PR introduces two primary changes to fix the above issue:
Session ID coordination flow
In v17+ servers, the current session ID is sent to the client after a session is started. However, a session is not "started" until the shell/exec request is handled. By this time, the Forwarding Node has already created its own session with a unique ID and made decisions about whether to create a session tracker, emit events, etc.
In the new flow, the session ID is set and sent to the client as soon as the "session" channel is handled. This allows the Forwarding Node to save the session ID and use it when the session is started. Now that the session ID is shared by both server sessions, session responsibilities are properly delegated without any mismatches or duplicates.
Note that in the new flow, the Forwarding Node does not accept the session channel until it receives the current session ID from the Teleport Node. In order to determine whether the session ID will be sent, the Forwarding Node must determine whether the current session ID will actually be sent. only waits for the session ID to be sent if it receives a reply with the
session-id-query-v2@goteleport.comrequest.Backwards Compatibility
This change is backwards compatible in the sense that if either the Node or Proxy is outdated, every session responsibility will still be delegated to at least one of the two services. However, the outdated component may still result in duplicate and mismatched audit events and session trackers:
To enable the above, this PR introduces the "session-id-query-v2@goteleport.com" global server request. The Forwarding Node sends this request to determine whether it should wait for the Teleport Node to send the current session ID. If the Teleport Node does not reply to this request (old Teleport Node), the Forwarding Node will continue with its own session ID like it does currently.
Note that the original "session-id-query@goteleport.com" has been marked for deletion in v20, and "session-id-query-v2@goteleport.com" is marked for deletion in v21. If extended backwards compatibility is a concern (> 1 major version), servers can continue to reply for an additional major version or even though no v19+/v20+ client (respectively) will send the query.
Manual Tests
Last run: (292d064)
teleport statusto get the session ID. it should appear in the session list (e.g.tsh session ls).start,end,data, andleaveevents are emitted with the same session ID.forwarded_byandrecording_mode: proxyfields.teleport statusto get the session ID. it should appear in the session list (e.g.tsh session ls).start,end,data, andleaveevents are emitted with the same session ID.forwarded_byandrecording_mode: proxyfields.teleport statusto get the session ID. it should appear in the session list (e.g.tsh session ls).start,end,data, , andleaveevents are emitted with the same session ID.forwarded_byandrecording_mode: proxyfields.teleport statusto get the session ID. it should appear in the session list (e.g.tsh session ls).start,end,data, andleaveevents are emitted with the same session ID.forwarded_byandrecording_mode: proxyfields.Backwards compatibility:
Depends on #59206
Fixes #42263, #20063