diff --git a/docs/config.json b/docs/config.json
index 932d9b103881b..3ed52368a422b 100644
--- a/docs/config.json
+++ b/docs/config.json
@@ -211,6 +211,14 @@
             "enterprise"
           ]
         },
+        {
+          "title": "AWS Multi-Region Proxy Deployment",
+          "slug": "/deploy-a-cluster/deployments/aws-gslb-proxy-peering-ha-deployment/",
+          "forScopes": [
+            "oss",
+            "enterprise"
+          ]
+        },
         {
           "title": "GCP",
           "slug": "/deploy-a-cluster/deployments/gcp/",
diff --git a/docs/cspell.json b/docs/cspell.json
index b047373c4a2d9..7db3e2dd8e576 100644
--- a/docs/cspell.json
+++ b/docs/cspell.json
@@ -66,6 +66,8 @@
     "Gbps",
     "Goland",
     "Grafana's",
+    "gslb",
+    "GSLB",
     "Gtczk",
     "HSTS",
     "Hqlo",
diff --git a/docs/img/deploy-a-cluster/aws-gslb-proxy-peering-ha-deployment.png b/docs/img/deploy-a-cluster/aws-gslb-proxy-peering-ha-deployment.png
new file mode 100644
index 0000000000000..e43817f5d6ca4
Binary files /dev/null and b/docs/img/deploy-a-cluster/aws-gslb-proxy-peering-ha-deployment.png differ
diff --git a/docs/pages/deploy-a-cluster/deployments.mdx b/docs/pages/deploy-a-cluster/deployments.mdx
index d8d45b3a0fe5b..246bae8c7c811 100644
--- a/docs/pages/deploy-a-cluster/deployments.mdx
+++ b/docs/pages/deploy-a-cluster/deployments.mdx
@@ -7,6 +7,11 @@ layout: tocless-doc
 These guides show you how to set up a full self-hosted Teleport deployment on
 the platform of your choice.
 
-- [AWS Terraform](./deployments/aws-terraform.mdx): Deploy HA Teleport with Terraform Provider on AWS.
+- [AWS Terraform](./deployments/aws-terraform.mdx): Deploy HA Teleport with
+  Terraform Provider on AWS.
+- [AWS Multi-Region Proxy
+  Deployment](./deployments/aws-gslb-proxy-peering-ha-deployment.mdx): Deploy HA
+  Teleport with Proxy Service instances in multiple regions for low-latency
+  access.
 - [GCP](./deployments/gcp.mdx): Deploy HA Teleport on GCP.
 - [IBM Cloud](./deployments/ibm.mdx): Deploy HA Teleport on IBM cloud.
diff --git a/docs/pages/deploy-a-cluster/deployments/aws-gslb-proxy-peering-ha-deployment.mdx b/docs/pages/deploy-a-cluster/deployments/aws-gslb-proxy-peering-ha-deployment.mdx
new file mode 100644
index 0000000000000..8c59ebaf45099
--- /dev/null
+++ b/docs/pages/deploy-a-cluster/deployments/aws-gslb-proxy-peering-ha-deployment.mdx
@@ -0,0 +1,247 @@
+---
+title: "AWS Multi-Region High Availability Deployment Guide"
+description: "Deploying a high-availability Teleport cluster using Proxy Peering and Route 53 to provide global server load balancing."
+---
+
+This deployment architecture features two important design decisions:
+
+1. AWS Route 53 latency-based routing is used for global server load balancing
+   ([GSLB](https://www.cloudflare.com/learning/cdn/glossary/global-server-load-balancing-gslb/)).
+   This distributes traffic efficiently across globally distributed resources.
+2. Teleport's [Proxy Peering](../../architecture/proxy-peering.mdx) is used to reduce the total number of tunnel connections in the Teleport cluster.
+
+This deployment architecture isn't recommended for use cases where your users or resources are
+clustered in a single region, or for Managed Service Providers that need to provide separate clusters
+to customers.
+
+This architecture is best suited for globally distributed resources and end users who prefer a single point of
+entry with minimal latency when accessing connected resources.
+
+## Key deployment components
+
+- Deployed exclusively in the AWS ecosystem
+- High-availability Auto Scaling group of Auth Service instances that must remain in a single region
+- High-availability Auto Scaling groups of Proxy Service instances deployed across multiple regions
+- [AWS Route 53 latency-based routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-latency.html)
+- [GSLB](https://www.cloudflare.com/learning/cdn/glossary/global-server-load-balancing-gslb/)
+- [Teleport TLS Routing](https://goteleport.com/docs/architecture/tls-routing/) to reduce the number of ports needed to use Teleport
+- [Teleport Proxy Peering](https://goteleport.com/docs/architecture/proxy-peering/) to reduce the number of resource connections
+- [AWS Network Load Balancing](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html)
+- [AWS DynamoDB](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html) for cluster state storage
+- [AWS S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) for session recording storage
+
+## Advantages of this deployment architecture
+
+- Eliminates the complexity and cost of maintaining multiple Teleport clusters across multiple regions.
+- Uses the lowest-latency path to connect users to resources.
+- Provides a highly resilient, redundant HA architecture for Teleport that can quickly
+  scale with an organization's needs.
+- All required Teleport components can be provisioned within the AWS ecosystem.
+- Using load balancers for the Proxy and Auth Services allows for increased availability
+  during Teleport cluster upgrades.
+
+## Disadvantages of this deployment architecture
+
+- Because Teleport Auth Service instances are limited to a single region, an AWS regional
+  outage is more likely to reduce availability.
+- More technically complex to deploy than a single-region Teleport cluster.
+
+![Diagram showing this Teleport
+architecture](../../../img/deploy-a-cluster/aws-gslb-proxy-peering-ha-deployment.png)
+
+
+## AWS Network Load Balancer (NLB)
+AWS NLBs are required for this highly available deployment architecture.
+The NLB forwards traffic from users and services to an available Teleport Proxy Service instance. The NLB must not
+terminate TLS, and must transparently forward the TCP traffic it receives.
+In other words, it must be a Layer 4 load balancer, not a Layer 7
+(e.g., HTTP) load balancer.
+
+
+Cross-zone load balancing is required for the Auth Service and Proxy Service NLB configurations to route
+traffic across multiple zones. Doing this improves resiliency against localized AWS zone outages.
+
+
+### Configure the Proxy Service NLBs
+
+Configure the load balancer to forward traffic from the following ports on the
+load balancer to the corresponding port on an available Teleport instance.
+
+
+
+| Port | Description |
+| - | - |
+| `443` | ALPN port for TLS Routing, HTTPS connections to authenticate `tsh` users into the cluster, and to serve Teleport's Web UI |
+
+
+
+### Configure the Auth Service NLB
+
+Configure the load balancer to forward traffic from the following ports on the
+load balancer to the corresponding port on an available Teleport instance.
+
+
+Proxies must have network access to the Auth Service NLB. You can accomplish this
+in the Route 53 GSLB architecture using [VPC Peering](https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html)
+or [Transit Gateways](https://docs.aws.amazon.com/vpc/latest/tgw/what-is-transit-gateway.html).
+
+
+Internal NLB Auth Service ports
+
+| Port | Description |
+| - | - |
+| `3025` | TLS port used by the Auth Service to serve its API to Proxies in a cluster |
+
+## TLS credential provisioning
+
+High-availability Teleport deployments require a system to fetch TLS
+credentials from a certificate authority like Let's Encrypt, AWS Certificate
+Manager, DigiCert, or a trusted internal authority. The system must then
+provision Teleport Proxy Service instances with these credentials and renew them
+periodically.
+
+For high-availability deployments that use Let's Encrypt to supply TLS
+credentials to Teleport instances running behind a load balancer, you need
+to use the [ACME
+DNS-01](https://letsencrypt.org/docs/challenge-types/#dns-01-challenge)
+challenge to demonstrate domain name ownership to Let's Encrypt. In this
+challenge, your TLS credential provisioning system creates a DNS TXT record with
+a value expected by Let's Encrypt.
+
+## Global Server Load Balancing with Route 53
+
+[Latency-based routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-latency.html)
+in a public hosted zone must be used to ensure that traffic from Teleport
+resources is routed to the Proxy Service NLB with the lowest-latency path, based on the region of
+the VPC the resource is connecting from.
+
+To create GSLB routing, create a CNAME record for each region where you have VPCs containing Teleport-connected resources.
+We recommend adding a wildcard record for every region if you plan to
+register applications with Teleport.
+
+The following CNAME record values need to be set:
+- **Value:** The domain name of the NLB that should receive traffic from Teleport resources located in `example-region-1`
+- **Routing policy:** Latency
+- **Region:** The AWS region from which traffic should be routed to the NLB listed in **Value**
+- **Health Check ID:** We recommend setting this so that traffic is always routed to a healthy NLB
+
+Example hosted zone using AWS Route 53 latency-based routing to create GSLB:
+
+### Root GSLB record for Teleport
+
+|Record name|Type|Value|
+|---|---|---|
+|`*.teleport.example.com`|CNAME|AWS Route 53 nameservers|
+
+### Teleport Proxy DNS records for GSLB
+
+|Record name|Type|Routing Policy|Region|Value|
+|---|---|---|---|---|
+|`teleport.example.com`|CNAME|Latency|us-west-1|`elb.us-west-1.amazonaws.com`|
+|`*.teleport.example.com`|CNAME|Latency|us-west-1|`elb.us-west-1.amazonaws.com`|
+|`teleport.example.com`|CNAME|Latency|eu-central-1|`elb.eu-central-1.amazonaws.com`|
+|`*.teleport.example.com`|CNAME|Latency|eu-central-1|`elb.eu-central-1.amazonaws.com`|
+
+
+
+If you are using Let's Encrypt to provide TLS credentials to your Teleport
+instances, the TLS credential system we mentioned earlier needs permissions to
+manage Route 53 DNS records in order to satisfy Let's Encrypt's DNS-01 challenge.
+
+
+
+### Teleport resource agent configuration for GSLB
+
+To facilitate latency-based routing, resource agents must be configured to point `proxy_server` to
+the GSLB domain configured in Route 53, **not** the address of a specific Proxy Service NLB.
+
+For example:
+
+```
+version: v3
+teleport:
+  nodename: ssh-node
+  ...
+  proxy_server: teleport.example.com:443
+  ...
+ssh_service:
+  enabled: yes
+  ...
+```
+Review the [configuration reference](https://goteleport.com/docs/reference/config/) page for
+additional settings.
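+
+The latency records listed in the Route 53 tables earlier in this guide could also be
+declared as infrastructure code. The following is a minimal sketch, not a tested
+template, of one such record as an AWS CloudFormation `AWS::Route53::RecordSet`
+resource. The logical ID, hosted zone ID, `SetIdentifier`, TTL, and health check ID
+are placeholder assumptions, and the sketch assumes `teleport.example.com` is not
+the apex of the hosted zone, since a CNAME record cannot be created at a zone apex:
+
+```yaml
+Resources:
+  # Hypothetical logical ID; use one resource per region and record name.
+  TeleportProxyUsWest1:
+    Type: AWS::Route53::RecordSet
+    Properties:
+      HostedZoneId: Z0000000EXAMPLE   # placeholder hosted zone ID
+      Name: teleport.example.com
+      Type: CNAME
+      TTL: "60"                       # assumed value; tune for your failover needs
+      SetIdentifier: teleport-proxy-us-west-1   # must be unique per latency record
+      Region: us-west-1               # enables latency-based routing for this record
+      HealthCheckId: 00000000-0000-0000-0000-000000000000   # placeholder health check
+      ResourceRecords:
+        - elb.us-west-1.amazonaws.com   # placeholder NLB domain name from the table
+```
+
+A matching resource with `Region: eu-central-1` and that region's NLB domain name
+would cover the second region, and the wildcard records would follow the same pattern
+with `Name: "*.teleport.example.com"`.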
+
+## Configure Proxy Peering
+
+In this deployment architecture, [Proxy Peering](https://goteleport.com/docs/architecture/proxy-peering/) is used to restrict the number of connections made from
+resources to proxies in the Teleport cluster.
+
+This guide covers the Proxy Peering settings needed to deploy an HA Teleport cluster that routes resource
+traffic with GSLB.
+
+### Auth Service Proxy Peering configuration
+
+The Teleport Auth Service must be configured to use the `proxy_peering` tunnel strategy as shown in the example below:
+
+```
+auth_service:
+  ...
+  tunnel_strategy:
+    type: proxy_peering
+    agent_connection_count: 2
+```
+See the [Auth Service configuration](https://goteleport.com/docs/reference/config/#auth-service) reference page
+for additional settings.
+
+### Proxy Service Proxy Peering configuration
+
+Proxies must advertise a peer address so that proxy peers can establish connections to each other.
+The ports exposed on the Teleport Proxy Service instances depend on whether you route Proxy Peering traffic over
+the public internet:
+
+
+
+
+| Port | Description |
+| - | - |
+| `443` | ALPN port for TLS Routing, HTTPS connections to authenticate `tsh` users into the cluster, and to serve Teleport's Web UI |
+| `3021` | Proxy Peering gRPC Stream |
+
+
+
+
+| Port | Description |
+| - | - |
+| `443` | ALPN port for TLS Routing, HTTPS connections to authenticate `tsh` users into the cluster, and to serve Teleport's Web UI |
+
+
+
+
+Set `peer_public_addr` to the specific name of that proxy. This is the recommended
+method for the lowest latency and the most reliable connection.
+
+```
+version: v3
+teleport:
+  ...
+proxy_service:
+  ...
+  peer_public_addr: teleport-proxy-eu-west-1-host1.example.com:3021
+  ...
+```
+
+`agent_connection_count` on the Auth Service should be set to a value of 2 or greater to decrease
+the likelihood of agents being unavailable.
+
+
+See the [Proxy Service configuration](https://goteleport.com/docs/reference/config/#proxy-service) reference page
+for additional settings.
diff --git a/docs/pages/deploy-a-cluster/introduction.mdx b/docs/pages/deploy-a-cluster/introduction.mdx
index d5c6ca001cf88..a0eadd748b1db 100644
--- a/docs/pages/deploy-a-cluster/introduction.mdx
+++ b/docs/pages/deploy-a-cluster/introduction.mdx
@@ -28,5 +28,6 @@ In these guides, we will show you how to deploy a high-availability, VM-based
 Teleport cluster on your cloud:
 
 - [AWS with Terraform](./deployments/aws-terraform.mdx)
+- [AWS Multi-Region Proxy Deployment](./deployments/aws-gslb-proxy-peering-ha-deployment.mdx)
 - [Google Cloud](./deployments/gcp.mdx)
 - [IBM Cloud](./deployments/ibm.mdx)