
Support latency based routing for agents with multiple replicas #51879

Open
rosstimothy opened this issue Feb 5, 2025 · 0 comments
Labels
feature-request Used for new features in Teleport, improvements to current should be #enhancements


@rosstimothy
Contributor

It is not uncommon for customers to deploy multiple database, app, and desktop agents that front the same resource in order to improve availability. When connecting to these agents today, all routing is performed by trying randomly chosen agents from the available pool until a connection succeeds or all options are exhausted. While this works and helps spread the load, it can result in suboptimal network paths being chosen.

The request is to provide an opt-in way for clusters to prefer a different routing strategy, one that uses latency as a heuristic instead of the random dialing that happens by default. While this sounds simple in theory, connecting to a resource requires several hops before reaching the destination. We would need to consider the latency between peer proxies, across the reverse tunnel between the proxy and the agent, and between the agent and the underlying resource.

There are a few shortcuts that could be taken to avoid such complexity. Instead of choosing at random, the current routing strategy could prefer agents running in the same process, i.e., a proxy and an agent with the same host UUID. This doesn't guarantee the lowest-latency route, though, as it only considers one hop in the connection path. The crudest and most naive approach would be to attempt connections to all replicas and choose the one that responds first.

Related to #40905.

@rosstimothy rosstimothy added the feature-request Used for new features in Teleport, improvements to current should be #enhancements label Feb 5, 2025
zmb3 added a commit that referenced this issue Feb 5, 2025
Add an opt-in env var that tells the proxy to prefer a desktop
service running in-process as an alternative to selecting a service
at random. This can benefit self-hosted deployments where the proxy
and desktop agent run in the same process by preventing the proxy
from selecting an agent that may be in another geo over one that's
available on the same host.

This is a crude and specialized alternative to a more general
latency-based dialing strategy (which is still something we'd
like to do in the future).

Updates #51879
Updates #40905