Support latency based routing for agents with multiple replicas #51879

rosstimothy · 2025-02-05T16:20:31Z

It is not uncommon for customers to deploy multiple database, app, and desktop agents that front the same resource in order to improve availability. Today, connections to these agents are routed by choosing a random agent from the available pool until a connection is successfully established or all options are exhausted. While this works and helps spread the load, it can result in suboptimal network paths being chosen.
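For illustration, the random strategy described above can be sketched roughly as follows. This is a minimal sketch, not Teleport's actual code: `Agent`, `dialFunc`, `dialIfReachable`, and `dialRandom` are hypothetical names invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// Agent is a hypothetical stand-in for one replica fronting a resource.
type Agent struct {
	Name      string
	Reachable bool
}

// dialFunc is a hypothetical dialer; a real one would return a connection.
type dialFunc func(a Agent) error

// dialIfReachable is a fake dialer used only for illustration.
func dialIfReachable(a Agent) error {
	if !a.Reachable {
		return errors.New("connection refused")
	}
	return nil
}

// dialRandom tries agents in random order and returns the first one that
// accepts a connection, or an error once every option is exhausted.
func dialRandom(agents []Agent, dial dialFunc) (Agent, error) {
	if len(agents) == 0 {
		return Agent{}, errors.New("no agents available")
	}
	// Shuffle a copy so the caller's slice is left untouched.
	candidates := append([]Agent(nil), agents...)
	rand.Shuffle(len(candidates), func(i, j int) {
		candidates[i], candidates[j] = candidates[j], candidates[i]
	})
	var errs []error
	for _, a := range candidates {
		if err := dial(a); err != nil {
			errs = append(errs, fmt.Errorf("dialing %s: %w", a.Name, err))
			continue
		}
		return a, nil
	}
	return Agent{}, errors.Join(errs...)
}
```

Nothing in this loop looks at where an agent is located, which is why a distant replica is as likely to be picked as a nearby one.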

The request is to provide an opt-in way to configure a cluster to prefer a routing strategy that uses latency as a heuristic instead of the random dialing that happens by default. While this sounds simple in theory, connecting to a resource requires several hops before reaching the destination. We would need to consider the latency between peer proxies, over the reverse tunnel connection between the proxy and the agent, and between the agent and the underlying resource.
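To make the multi-hop concern concrete, here is a small sketch that scores a route by the sum of its hop latencies. It assumes each hop's latency is already known and that the hops simply add up; how to measure them is the open question. `Route`, `lowestLatency`, and `ms` are hypothetical names.

```go
package main

import "time"

// Route is a hypothetical record of per-hop latency for one agent replica.
type Route struct {
	Agent           string
	PeerProxy       time.Duration // between peer proxies
	ReverseTunnel   time.Duration // between the proxy and the agent
	AgentToResource time.Duration // between the agent and the resource
}

// Total is the end-to-end estimate: the sum of all three hops.
func (r Route) Total() time.Duration {
	return r.PeerProxy + r.ReverseTunnel + r.AgentToResource
}

// lowestLatency returns the route with the smallest total latency.
// The boolean is false when there are no routes to choose from.
func lowestLatency(routes []Route) (Route, bool) {
	if len(routes) == 0 {
		return Route{}, false
	}
	best := routes[0]
	for _, r := range routes[1:] {
		if r.Total() < best.Total() {
			best = r
		}
	}
	return best, true
}

// ms converts a millisecond count to a time.Duration.
func ms(n int) time.Duration {
	return time.Duration(n) * time.Millisecond
}
```

With made-up numbers, a route with hops of 40 ms, 1 ms, and 60 ms loses to one with hops of 5 ms, 20 ms, and 3 ms, even though the first has the faster reverse tunnel. Optimizing a single hop can pick the wrong replica.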

There are a few shortcuts that could be taken to avoid such complexity. Instead of choosing at random, the current routing strategy could prefer agents that are running in the same process as the proxy, i.e. a proxy and an agent with the same host UUID. This doesn't guarantee the lowest-latency route though, as it only considers one hop in the connection path. The crudest and most naive way to approximate latency-based routing would be to attempt to open connections to all replicas and choose the connection that responds first.
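The "dial everything and keep the first responder" idea could look roughly like the sketch below. All names (`Replica`, `dialer`, `raceDial`, `fakeDialer`, `defaultRaceTimeout`) are hypothetical, and the fake dialer only simulates latency. A real implementation would more likely accept the caller's `context.Context` than a bare timeout.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"time"
)

// Replica is a hypothetical stand-in for one agent replica.
type Replica struct{ Name string }

// dialer is a hypothetical dial function; cancelling ctx must abort the dial.
type dialer func(ctx context.Context, r Replica) (io.Closer, error)

// defaultRaceTimeout is an arbitrary bound chosen for this sketch.
const defaultRaceTimeout = 2 * time.Second

type dialResult struct {
	replica Replica
	conn    io.Closer
	err     error
}

// raceDial dials every replica concurrently and returns the first connection
// that is established. Slower dials are cancelled and, if they still
// complete, their connections are closed.
func raceDial(timeout time.Duration, replicas []Replica, dial dialer) (Replica, io.Closer, error) {
	if len(replicas) == 0 {
		return Replica{}, nil, errors.New("no replicas available")
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // aborts dials still in flight once we return

	// Buffered so that late dialers never block on send.
	results := make(chan dialResult, len(replicas))
	for _, r := range replicas {
		go func(r Replica) {
			conn, err := dial(ctx, r)
			results <- dialResult{replica: r, conn: conn, err: err}
		}(r)
	}

	var errs []error
	for pending := len(replicas); pending > 0; pending-- {
		res := <-results
		if res.err != nil {
			errs = append(errs, fmt.Errorf("dialing %s: %w", res.replica.Name, res.err))
			continue
		}
		// Drain the losers in the background and close any late connection.
		go func(late int) {
			for ; late > 0; late-- {
				if r := <-results; r.err == nil {
					r.conn.Close()
				}
			}
		}(pending - 1)
		return res.replica, res.conn, nil
	}
	return Replica{}, nil, errors.Join(errs...)
}

// fakeConn is a no-op connection used only for illustration.
type fakeConn struct{}

func (fakeConn) Close() error { return nil }

// fakeDialer simulates connect latency in milliseconds per replica name;
// replicas missing from the map fail immediately.
func fakeDialer(latencyMs map[string]int) dialer {
	return func(ctx context.Context, r Replica) (io.Closer, error) {
		ms, ok := latencyMs[r.Name]
		if !ok {
			return nil, errors.New("connection refused")
		}
		select {
		case <-time.After(time.Duration(ms) * time.Millisecond):
			return fakeConn{}, nil
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
}
```

Two costs are worth keeping in mind. Every connection attempt fans out to all replicas, so the load-spreading benefit of random dialing is reduced. The race also only measures how quickly a connection is established, not the latency of the full path to the resource.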

Related to #40905.

Add an opt-in env var that tells the proxy to prefer a desktop service running in-process as an alternative to selecting a service at random. This can benefit self-hosted deployments where the proxy and desktop agent run in the same process by preventing the proxy from selecting an agent that may be in another geographic region over one that's available on the same host. This is a crude and specialized alternative to a more general latency-based dialing strategy (which is still something we'd like to do in the future).

Updates #51879
Updates #40905
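A rough sketch of what such an opt-in preference might look like is shown below. The environment variable name and every identifier here are hypothetical and are not taken from the actual change. The sketch assumes the candidate list has already been randomized, and that "in-process" can be detected by comparing host IDs.

```go
package main

import (
	"os"
	"strconv"
)

// preferLocalEnvVar is a hypothetical name chosen for this sketch; it is not
// the variable introduced by the real change.
const preferLocalEnvVar = "EXAMPLE_PREFER_LOCAL_DESKTOP_SERVICE"

// DesktopService is a hypothetical stand-in for a registered desktop service.
type DesktopService struct {
	Name   string
	HostID string
}

// preferLocalEnabled reports whether the opt-in variable is set to a true
// value. Unset or unparsable values leave the default behavior in place.
func preferLocalEnabled() bool {
	enabled, err := strconv.ParseBool(os.Getenv(preferLocalEnvVar))
	return err == nil && enabled
}

// orderCandidates returns the order in which services should be dialed.
// When preferLocal is true, services sharing the proxy's host ID move to the
// front; the relative order of everything else is preserved.
func orderCandidates(proxyHostID string, candidates []DesktopService, preferLocal bool) []DesktopService {
	ordered := make([]DesktopService, 0, len(candidates))
	if !preferLocal {
		return append(ordered, candidates...)
	}
	var remote []DesktopService
	for _, c := range candidates {
		if c.HostID == proxyHostID {
			ordered = append(ordered, c)
		} else {
			remote = append(remote, c)
		}
	}
	return append(ordered, remote...)
}
```

A caller would pass `preferLocalEnabled()` as the last argument, so clusters that do not set the variable keep the existing random order.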

rosstimothy added the feature-request label (used for new features in Teleport; improvements to existing features should use #enhancements) Feb 5, 2025

zmb3 mentioned this issue Feb 5, 2025

Allow the proxy to prefer an in-process desktop service #51881

Open
