diff --git a/docs/config.json b/docs/config.json index ec6bba174cb9f..5c5fdf52774f6 100644 --- a/docs/config.json +++ b/docs/config.json @@ -204,6 +204,14 @@ "enterprise" ] }, + { + "title": "AWS Multi-region Proxy Deployment", + "slug": "/deploy-a-cluster/deployments/aws-gslb-proxy-peering-ha-deployment/", + "forScopes": [ + "oss", + "enterprise" + ] + }, { "title": "GCP", "slug": "/deploy-a-cluster/deployments/gcp/", diff --git a/docs/cspell.json b/docs/cspell.json index 49618c40edfb4..5df80dc237eda 100644 --- a/docs/cspell.json +++ b/docs/cspell.json @@ -66,6 +66,8 @@ "Gbps", "Goland", "Grafana's", + "GSLB", + "gslb", "Gtczk", "HSTS", "Hqlo", diff --git a/docs/img/deploy-a-cluster/aws-gslb-proxy-peering-ha-deployment.png b/docs/img/deploy-a-cluster/aws-gslb-proxy-peering-ha-deployment.png new file mode 100644 index 0000000000000..e43817f5d6ca4 Binary files /dev/null and b/docs/img/deploy-a-cluster/aws-gslb-proxy-peering-ha-deployment.png differ diff --git a/docs/pages/contributing/documentation/how-to-contribute.mdx b/docs/pages/contributing/documentation/how-to-contribute.mdx index e411d9bf5b1c1..f86f4822846e8 100644 --- a/docs/pages/contributing/documentation/how-to-contribute.mdx +++ b/docs/pages/contributing/documentation/how-to-contribute.mdx @@ -93,8 +93,7 @@ the most recent version of our documentation are reflected for the versions of Teleport we currently support. -You can find our list of currently supported versions in the FAQ: -https://goteleport.com/docs/faq/#which-version-of-teleport-is-supported +You can find our list of currently supported versions in the [FAQ](../../faq.mdx#which-version-of-teleport-is-supported). 
There are many ways to create a backport, and we will illustrate three common
diff --git a/docs/pages/database-access/troubleshooting.mdx b/docs/pages/database-access/troubleshooting.mdx
index 7e80c8ada2d90..74492671a30f6 100644
--- a/docs/pages/database-access/troubleshooting.mdx
+++ b/docs/pages/database-access/troubleshooting.mdx
@@ -142,12 +142,12 @@
 access see the [RBAC](../database-access/rbac.mdx) documentation.
 
 When TLS Routing is disabled, the Teleport Proxy Service returns `8.0.0-Teleport` as the MySQL server version. In some cases, like connecting with a GUI client, this can result in an `Unknown system variable 'query_cache_size'` error that indicates that MySQL capabilities were not properly negotiated between the MySQL client and server.
-One way to solve this issue is to [use the TLS Routing feature](https://goteleport.com/docs/management/operations/tls-routing/), where the Teleport Proxy Service propagates the correct MySQL server version via TLS Routing extensions.
+One way to solve this issue is to [use the TLS Routing feature](../management/operations/tls-routing.mdx), where the Teleport Proxy Service propagates the correct MySQL server version via TLS Routing extensions.
 
-If migration to TLS Routing is not possible, another way to bypass this error is to use the [Teleport local proxy command](https://goteleport.com/docs/connect-your-client/gui-clients/#get-connection-information), which allows you to establish a TLS Routing connection to the Teleport Proxy Service even if TLS Routing was not enabled on the Teleport cluster.
+If migration to TLS Routing is not possible, another way to bypass this error is to use the [Teleport local proxy command](../connect-your-client/gui-clients.mdx#get-connection-information), which allows you to establish a TLS Routing connection to the Teleport Proxy Service even if TLS Routing was not enabled on the Teleport cluster.
 Another possibility is to overwrite the default MySQL server version (`8.0.0-Teleport`) returned by the Teleport Proxy Service. To do this, assign the `mysql_server_version` field in the `proxy_service` configuration block on your Teleport Proxy Service instances:
 
 ```yaml
 proxy_service:
   mysql_server_version: "8.0.4"
-```
\ No newline at end of file
+```
diff --git a/docs/pages/deploy-a-cluster/deployments/aws-gslb-proxy-peering-ha-deployment.mdx b/docs/pages/deploy-a-cluster/deployments/aws-gslb-proxy-peering-ha-deployment.mdx
new file mode 100644
index 0000000000000..90991a74912aa
--- /dev/null
+++ b/docs/pages/deploy-a-cluster/deployments/aws-gslb-proxy-peering-ha-deployment.mdx
@@ -0,0 +1,247 @@
+---
+title: "AWS Multi-Region High Availability Deployment Guide"
+description: "Deploying a high-availability Teleport cluster using Proxy Peering and Route 53 to create global server load balancing."
+---
+
+This deployment architecture features two important design decisions:
+
+1. AWS Route 53 latency-based routing is used for global server load balancing
+   ([GSLB](https://www.cloudflare.com/learning/cdn/glossary/global-server-load-balancing-gslb/)).
+   This allows for efficient distribution of traffic across resources that are globally distributed.
+2. Teleport's [Proxy Peering](../../architecture/proxy-peering.mdx) is used to reduce the total number of tunnel connections in the Teleport cluster.
+
+This deployment architecture isn't recommended for use cases where your users or resources are
+clustered in a single region, or for Managed Service Providers that need to provide separate clusters
+to customers.
+
+This architecture is best suited for globally distributed resources and end users who prefer a single point of
+entry while also ensuring minimal latency when accessing connected resources.
+
+## Key deployment components
+
+- Deployed exclusively in the AWS ecosystem
+- High-availability Auto Scaling group of Auth Service instances that must remain in a single region
+- High-availability Auto Scaling group of Proxy Service instances deployed across multiple regions
+- [AWS Route 53 latency-based routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-latency.html)
+- [GSLB](https://www.cloudflare.com/learning/cdn/glossary/global-server-load-balancing-gslb/)
+- [Teleport TLS Routing](../../architecture/tls-routing.mdx) to reduce the number of ports needed to use Teleport
+- [Teleport Proxy Peering](../../architecture/proxy-peering.mdx) to reduce the number of resource connections
+- [AWS Network Load Balancing](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html)
+- [AWS DynamoDB](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html) for cluster state storage
+- [AWS S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) for session recording storage
+
+## Advantages of this deployment architecture
+
+- Eliminates the complexity and cost of maintaining multiple Teleport clusters across multiple regions.
+- Uses the lowest-latency path to connect users to resources.
+- Provides a highly resilient, redundant HA architecture for Teleport that can quickly
+  scale with an organization's needs.
+- All required Teleport components can be provisioned within the AWS ecosystem.
+- Using load balancers for the Proxy and Auth Services allows for increased availability
+  during Teleport cluster upgrades.
+
+## Disadvantages of this deployment architecture
+
+- Because Auth Service instances are limited to a single region, there is a higher likelihood
+  of decreased availability during an AWS regional outage.
+- More technically complex to deploy than a single-region Teleport cluster.
+
+![Diagram showing this Teleport
+architecture](../../../img/deploy-a-cluster/aws-gslb-proxy-peering-ha-deployment.png)
+
+## AWS Network Load Balancer (NLB)
+
+AWS NLBs are required for this highly available deployment architecture.
+The NLB forwards traffic from users and services to an available Teleport Proxy Service instance. The NLB must not
+terminate TLS, and must transparently forward the TCP traffic it receives.
+In other words, this must be a Layer 4 load balancer, not a Layer 7
+(e.g., HTTP) load balancer.
+
+Cross-zone load balancing is required for the Auth and Proxy Service NLB configurations to route
+traffic across multiple Availability Zones. This improves resiliency against localized AWS zone outages.
+
+### Configure the Proxy Service NLBs
+
+Configure each load balancer to forward traffic from the following ports to the
+corresponding port on an available Teleport instance.
+
+| Port | Description |
+| - | - |
+| `443` | ALPN port for TLS Routing, HTTPS connections to authenticate `tsh` users into the cluster, and to serve Teleport's Web UI |
+
+### Configure the Auth Service NLB
+
+Configure the load balancer to forward traffic from the following ports to the
+corresponding port on an available Teleport instance.
+
+Proxy Service instances must have network access to the Auth Service NLB. You can accomplish this
+in the Route 53 GSLB architecture using [VPC Peering](https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html)
+or [Transit Gateways](https://docs.aws.amazon.com/vpc/latest/tgw/what-is-transit-gateway.html).
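The cross-zone load balancing requirement described above is an attribute of each NLB. As a sketch, it can be enabled with the AWS CLI attribute-file format (the file name and load balancer ARN are placeholders):

```json
[
  {
    "Key": "load_balancing.cross_zone.enabled",
    "Value": "true"
  }
]
```

You would pass this file to `aws elbv2 modify-load-balancer-attributes --load-balancer-arn <arn> --attributes file://cross-zone.json` for each Auth and Proxy Service NLB.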
+
+Internal NLB Auth Service ports:
+
+| Port | Description |
+| - | - |
+| `3025` | TLS port used by the Auth Service to serve its API to Proxy Service instances in a cluster |
+
+## TLS credential provisioning
+
+High-availability Teleport deployments require a system to fetch TLS
+credentials from a certificate authority like Let's Encrypt, AWS Certificate
+Manager, DigiCert, or a trusted internal authority. The system must then
+provision Teleport Proxy Service instances with these credentials and renew them
+periodically.
+
+For high-availability deployments that use Let's Encrypt to supply TLS
+credentials to Teleport instances running behind a load balancer, you need
+to use the [ACME
+DNS-01](https://letsencrypt.org/docs/challenge-types/#dns-01-challenge)
+challenge to demonstrate domain name ownership to Let's Encrypt. In this
+challenge, your TLS credential provisioning system creates a DNS TXT record with
+a value expected by Let's Encrypt.
+
+## Global Server Load Balancing with Route 53
+
+Use [latency-based routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-latency.html)
+in a public hosted zone to ensure that traffic from Teleport
+resources is routed to the lowest-latency Proxy Service NLB, based on the region of
+the VPC the resource connects from.
+
+To create GSLB routing, create a CNAME record for each region where you have VPCs containing Teleport-connected resources.
+We recommend adding a wildcard record for every region if you plan to
+register applications with Teleport.
+
+Set the following values for each CNAME record:
+
+- **Value:** The domain name of the NLB to which traffic from Teleport resources in that region (for example, `example-region-1`) should be routed
+- **Routing policy:** Latency
+- **Region:** The AWS region from which traffic should be routed to the NLB listed in **Value**
+- **Health Check ID:** We recommend setting this so that traffic is always routed to a healthy NLB
+
+Example hosted zone using AWS Route 53 latency-based routing to create GSLB:
+
+### Root GSLB record for Teleport
+
+| Record name | Type | Value |
+|---|---|---|
+| `*.teleport.example.com` | CNAME | AWS Route 53 nameservers |
+
+### Teleport Proxy Service DNS records for GSLB
+
+| Record name | Type | Routing policy | Region | Value |
+|---|---|---|---|---|
+| `teleport.example.com` | CNAME | Latency | us-west-1 | `elb.us-west-1.amazonaws.com` |
+| `*.teleport.example.com` | CNAME | Latency | us-west-1 | `elb.us-west-1.amazonaws.com` |
+| `teleport.example.com` | CNAME | Latency | eu-central-1 | `elb.eu-central-1.amazonaws.com` |
+| `*.teleport.example.com` | CNAME | Latency | eu-central-1 | `elb.eu-central-1.amazonaws.com` |
+
+If you are using Let's Encrypt to provide TLS credentials to your Teleport
+instances, the TLS credential system we mentioned earlier needs permissions to
+manage Route 53 DNS records in order to satisfy Let's Encrypt's DNS-01 challenge.
+
+### Teleport resource agent configuration for GSLB
+
+To facilitate latency-based routing, resource agents must be configured to point `proxy_server` to
+the GSLB domain configured in Route 53, **not** the address of a specific Proxy Service NLB.
+
+For example:
+
+```yaml
+version: v3
+teleport:
+  nodename: ssh-node
+  ...
+  proxy_server: teleport.example.com:443
+  ...
+ssh_service:
+  enabled: yes
+  ...
+```
+
+Review the [configuration reference](../../reference/config.mdx) page for
+additional settings.
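The latency records in the tables above can also be created through the Route 53 API. The following is a sketch of a change batch for one region; the TTL is illustrative, and the record values are the examples used in this guide:

```json
{
  "Comment": "Latency-based CNAME for the us-west-1 Proxy Service NLB",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "teleport.example.com",
        "Type": "CNAME",
        "SetIdentifier": "us-west-1",
        "Region": "us-west-1",
        "TTL": 60,
        "ResourceRecords": [{ "Value": "elb.us-west-1.amazonaws.com" }]
      }
    }
  ]
}
```

You would apply a change batch like this with `aws route53 change-resource-record-sets --hosted-zone-id <zone-id> --change-batch file://record.json`, repeating it with a distinct `SetIdentifier` and `Region` for each regional NLB.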
+
+## Configure Proxy Peering
+
+In this deployment architecture, [Proxy Peering](../../architecture/proxy-peering.mdx) is used to reduce the number of connections made from
+resources to Proxy Service instances in the Teleport cluster.
+
+This guide covers the Proxy Peering settings necessary for deploying an HA Teleport cluster that routes resource
+traffic with GSLB.
+
+### Auth Service Proxy Peering configuration
+
+The Teleport Auth Service must be configured to use the `proxy_peering` tunnel strategy, as shown in the example below:
+
+```yaml
+auth_service:
+  ...
+  tunnel_strategy:
+    type: proxy_peering
+    agent_connection_count: 2
+```
+
+See the [Auth Service configuration reference](../../reference/config.mdx#auth-service)
+for additional settings.
+
+### Proxy Service Proxy Peering configuration
+
+Proxy Service instances must advertise a peer address so that proxy peers can establish connections to each other.
+The ports exposed on each Teleport Proxy Service instance depend on whether you route Proxy Peering traffic over
+the public internet.
+
+If Proxy Peering traffic is routed over the public internet, expose the following ports:
+
+| Port | Description |
+| - | - |
+| `443` | ALPN port for TLS Routing, HTTPS connections to authenticate `tsh` users into the cluster, and to serve Teleport's Web UI |
+| `3021` | Proxy Peering gRPC stream |
+
+If Proxy Peering traffic remains on a private network, only the following port needs to be exposed publicly:
+
+| Port | Description |
+| - | - |
+| `443` | ALPN port for TLS Routing, HTTPS connections to authenticate `tsh` users into the cluster, and to serve Teleport's Web UI |
+
+Set `peer_public_addr` to the address of each specific Proxy Service instance. This is the recommended
+method, providing the lowest-latency and most reliable connections.
+
+```yaml
+version: v3
+teleport:
+  ...
+proxy_service:
+  ...
+  peer_public_addr: teleport-proxy-eu-west-1-host1.example.com:3021
+  ...
+```
+
+Set `agent_connection_count` on the Auth Service to a value of at least 2 to decrease
+the likelihood of agents being unavailable.
+
+See the [Proxy Service configuration reference](../../reference/config.mdx#proxy-service)
+for additional settings.
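Putting these settings together, a per-region Proxy Service configuration might look like the following sketch. The `auth_server` address and hostnames are illustrative placeholders for your environment, and `peer_listen_addr` is shown with the default peering port:

```yaml
version: v3
teleport:
  # Address of the internal Auth Service NLB (illustrative)
  auth_server: auth-nlb.example.internal:3025
proxy_service:
  enabled: yes
  # GSLB domain resolved by Route 53 latency-based routing
  public_addr: teleport.example.com:443
  # Address this instance listens on for peer proxy connections
  peer_listen_addr: 0.0.0.0:3021
  # Instance-specific address advertised to peer proxies
  peer_public_addr: teleport-proxy-eu-west-1-host1.example.com:3021
```

Each Proxy Service instance in each region uses the same `public_addr` but its own `peer_public_addr`.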