[v17] fix(app): prevent premature removal of app servers from session cache #59955
Merged
tigrato merged 2 commits into branch/v17 on Oct 6, 2025
When a user accesses an app and a new session (i.e. an app server certificate) is created, the Teleport proxy caches a `session` object for the entire lifetime of that certificate. The session context includes the list of app servers available at the time the session is created. This list is static and is used to dial the target app service via reverse tunnel.

This is already suboptimal, because new app servers (app services with a different `host_id`) that join later are ignored. The static list exists to provide some affinity, so that the same proxy forwards a user's requests to the same app service.

The core issue lies in a piece of code that this PR removes. That code prunes app servers from the session list if the proxy fails to connect to them. This is problematic because an app server that goes offline only temporarily (e.g. it restarted or dropped its connection) is removed permanently. Even if it later reconnects to the cluster, the session no longer includes it, and once the list is empty all future requests fail.

This issue is especially harmful for machine clients that automatically retry on disconnection. As retries continue, the session removes every server, leading to a permanent failure state. Since the session is cached for the full certificate duration, the only workaround is to manually restart the Teleport proxy to refresh the session state.

This PR removes the session pruning logic as a temporary fix, allowing clients to retry dialing previously failed app servers if they reconnect during the session.

Signed-off-by: Tiago Silva <tiago.silva@goteleport.com>
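The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not Teleport's code: `appServer`, `session`, `dialPruning`, `dialKeeping`, and `errDialFailed` are hypothetical names invented for this example, and a boolean `online` flag stands in for a real reverse-tunnel dial. Only the error text `no application servers remaining to connect` is taken from the changelog entry.

```go
package main

import (
	"errors"
	"fmt"
)

// appServer is a hypothetical stand-in for an app service instance.
// online simulates whether a reverse-tunnel dial would succeed.
type appServer struct {
	hostID string
	online bool
}

// session is a hypothetical stand-in for the proxy's cached app session.
// servers is populated once, when the session is created.
type session struct {
	servers []*appServer
}

// Error text taken from the changelog entry of this PR.
var errNoServers = errors.New("no application servers remaining to connect")

// errDialFailed is an assumed error for the non-pruning variant.
var errDialFailed = errors.New("failed to dial any app server")

// dialPruning models the removed behavior: a server that cannot be
// reached is dropped from the session list and never tried again.
func (s *session) dialPruning() (string, error) {
	for len(s.servers) > 0 {
		srv := s.servers[0]
		if srv.online {
			return srv.hostID, nil
		}
		s.servers = s.servers[1:] // permanent removal
	}
	return "", errNoServers
}

// dialKeeping models the behavior after the fix: unreachable servers
// stay in the list, so a later attempt can reach them again.
func (s *session) dialKeeping() (string, error) {
	for _, srv := range s.servers {
		if srv.online {
			return srv.hostID, nil
		}
	}
	return "", errDialFailed
}

func main() {
	srv := &appServer{hostID: "host-a", online: false}
	pruned := &session{servers: []*appServer{srv}}
	kept := &session{servers: []*appServer{srv}}

	// First attempt while the server is temporarily offline: both fail.
	_, errPruned := pruned.dialPruning()
	_, errKept := kept.dialKeeping()
	fmt.Println("offline:", errPruned, "|", errKept)

	// The server reconnects; only the non-pruning session recovers.
	srv.online = true
	_, errPruned = pruned.dialPruning()
	host, errKept := kept.dialKeeping()
	fmt.Println("pruning session:", errPruned)
	fmt.Println("keeping session:", host, errKept)
}
```

In this sketch the pruning session stays broken after the server comes back, because its list was emptied by the first failed attempt, while the non-pruning session succeeds on retry. This mirrors why a retrying machine client could only recover after a proxy restart.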
Tener
approved these changes
Oct 6, 2025
boxofrad
approved these changes
Oct 6, 2025
Backport #59900 to branch/v17
changelog: Fixed issue where temporarily unreachable app servers were permanently removed from session cache, causing persistent connection failures: `no application servers remaining to connect`.