docs: Conceptual docs for Load Balancing and Rate Limiting #6088
Merged
12 commits
af83c20 created draft of load balancing and rate limiting docs (melsal13)
f6ac2a2 fixed broken link in load balancing docs (melsal13)
c3bda3a Update site/content/en/latest/concepts/introduction/load-balancing.md (melsal13)
a43f4ab Merge branch 'main' into docs-conceptual-lb-rl (melsal13)
8b24152 Update site/content/en/latest/concepts/introduction/load-balancing.md (melsal13)
7a2afcb Update site/content/en/latest/concepts/introduction/load-balancing.md (melsal13)
31405c8 Update site/content/en/latest/concepts/introduction/rate-limiting.md (melsal13)
40d9e7e Update site/content/en/latest/concepts/introduction/rate-limiting.md (melsal13)
de43531 Update site/content/en/latest/concepts/introduction/rate-limiting.md (melsal13)
2d4b7ca Merge branch 'main' into docs-conceptual-lb-rl (melsal13)
8548e0e added suggested changes to v1.4 (melsal13)
9abb1e1 Merge branch 'main' into docs-conceptual-lb-rl (melsal13)
70 changes: 70 additions & 0 deletions
site/content/en/latest/concepts/introduction/load-balancing.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,70 @@ | ||
| --- | ||
| title: "Load Balancing" | ||
| --- | ||
|
|
||
| ## Overview | ||
|
|
||
| Load balancing distributes incoming requests across multiple backend services to improve availability, responsiveness, and scalability. Instead of directing all traffic to a single backend, which can cause slowdowns or outages, load balancing spreads the load across multiple instances, helping your applications stay fast and reliable under pressure. | ||
|
|
||
| ## Use Cases | ||
|
|
||
| Use load balancing to: | ||
|
|
||
| - Handle high traffic by distributing it across multiple service instances | ||
| - Keep services available even if one or more backends go down | ||
| - Improve response time by routing to less busy or closer backends | ||
|
|
||
| ## Load Balancing in Envoy Gateway | ||
|
|
||
| Envoy Gateway supports several load balancing strategies that determine how traffic is distributed across backend services. You configure a strategy with the `BackendTrafficPolicy` resource and attach it to `Gateway`, `HTTPRoute`, or `GRPCRoute` resources. A policy can reference its targets directly with the `targetRefs` field or select them dynamically with the `targetSelectors` field, which matches resources by Kubernetes labels. | ||
|
|
||
| **Supported load balancing types:** | ||
| - **Round Robin** – Cycles through the available backends in order, sending each new request to the next one | ||
| - **Random** – Picks a backend at random for each request | ||
| - **Least Request** – Sends each request to the backend with the fewest active requests | ||
| - **Consistent Hash** – Picks the backend from a hash of a request attribute (e.g., client IP or a header), so repeat requests with the same attribute tend to reach the same backend (useful for session affinity) | ||
|
|
||
| If no load balancing strategy is specified, Envoy Gateway uses **Least Request** by default. | ||
|
|
||
| ## Example: Round Robin Load Balancing | ||
|
|
||
| This example shows how to apply the Round Robin strategy using a `BackendTrafficPolicy` that targets a specific `HTTPRoute`: | ||
|
|
||
| ```yaml | ||
| apiVersion: gateway.envoyproxy.io/v1alpha1 | ||
| kind: BackendTrafficPolicy | ||
| metadata: | ||
| name: round-robin-policy | ||
| namespace: default | ||
| spec: | ||
| targetRefs: | ||
| - group: gateway.networking.k8s.io | ||
| kind: HTTPRoute | ||
| name: round-robin-route | ||
| loadBalancer: | ||
| type: RoundRobin | ||
| --- | ||
| apiVersion: gateway.networking.k8s.io/v1 | ||
| kind: HTTPRoute | ||
| metadata: | ||
| name: round-robin-route | ||
| namespace: default | ||
| spec: | ||
| parentRefs: | ||
| - name: eg | ||
| hostnames: | ||
| - "www.example.com" | ||
| rules: | ||
| - matches: | ||
| - path: | ||
| type: PathPrefix | ||
| value: /round | ||
| backendRefs: | ||
| - name: backend | ||
| port: 3000 | ||
| ``` | ||
| In this setup, requests matching the `/round` path prefix are distributed in turn across all available instances of the backend service. For example, if there are four replicas of the backend service, each one should receive roughly 25% of the requests. | ||
|
|
||
| ## Related Resources | ||
| - [BackendTrafficPolicy](../introduction/gateway_api_extensions/backend-traffic-policy.md) | ||
| - [Task: Load Balancing](../../tasks/traffic/load-balancing.md) | ||
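
The Consistent Hash strategy described in the page above can be configured in the same way as the Round Robin example. The following is a minimal sketch, assuming the `consistentHash` block with `type: SourceIP` from the `v1alpha1` `BackendTrafficPolicy` API. The policy name `source-ip-policy` and the route name `source-ip-route` are hypothetical, and the field names should be checked against the API reference for the Envoy Gateway version in use.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: source-ip-policy        # hypothetical name
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: source-ip-route     # hypothetical route; must exist in the same namespace
  loadBalancer:
    type: ConsistentHash
    consistentHash:
      type: SourceIP            # hash on the client IP so one client tends to reach one backend
```

Hashing on the source IP is only one option. Hashing on a header instead would suit clients that share an IP address behind a proxy or NAT.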
107 changes: 107 additions & 0 deletions
site/content/en/latest/concepts/introduction/rate-limiting.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,107 @@ | ||
| --- | ||
| title: "Rate Limiting" | ||
| --- | ||
|
|
||
| ## Overview | ||
|
|
||
| Rate limiting is a technique for controlling the number of incoming requests over a defined period. It can serve business purposes, such as enforcing agreed usage quotas, or protect the stability of a system by preventing overload and mitigating threats such as Denial of Service attacks. | ||
|
|
||
| ## Use Cases | ||
|
|
||
| Rate limiting is commonly used to: | ||
|
|
||
| - **Prevent Overload:** Protect internal systems like databases from excessive traffic. | ||
| - **Enhance Security:** Block or limit abusive behavior such as brute-force attempts or DDoS attacks. | ||
| - **Ensure Fair Usage:** Enforce quotas and prevent resource hogging by individual clients. | ||
| - **Implement Entitlements:** Define API usage limits based on user identity or role. | ||
|
|
||
| ## Rate Limiting in Envoy Gateway | ||
|
|
||
| Envoy Gateway supports two types of rate limiting: | ||
|
|
||
| - **Global Rate Limiting:** Shared limits across all Envoy instances. | ||
| - **Local Rate Limiting:** Independent limits per Envoy instance. | ||
|
|
||
| Rate limiting is configured through the `BackendTrafficPolicy` custom resource. You define rate-limiting rules and apply them to `HTTPRoute`, `GRPCRoute`, or `Gateway` resources. A policy can reference its targets directly with the `targetRefs` field or select them dynamically with the `targetSelectors` field, which matches resources by Kubernetes labels. | ||
|
|
||
| {{% alert title="Note" color="primary" %}} | ||
| Rate limits are applied per route, even if the `BackendTrafficPolicy` targets a `Gateway`. For example, if the limit is 100 requests per second and a `Gateway` has 3 routes, each route has its own bucket of 100 requests per second. | ||
| {{% /alert %}} | ||
|
|
||
| --- | ||
|
|
||
| ## Global Rate Limiting | ||
|
|
||
| Global rate limiting ensures a consistent request limit across the entire Envoy fleet. This is ideal for shared resources or distributed environments where coordinated enforcement is critical. | ||
|
|
||
| Global limits are enforced via Envoy’s external Rate Limit Service, which is automatically deployed and managed by the Envoy Gateway system. The Rate Limit Service requires a datastore component (commonly Redis). When a request is received, Envoy sends a descriptor to this external service to determine if the request should be allowed. | ||
|
|
||
| **Benefits of global limits:** | ||
|
|
||
| - Centralized control across instances | ||
| - Fair sharing of backend capacity | ||
| - Limits that stay the same as Envoy replicas are added or removed during autoscaling | ||
|
|
||
| ### Example | ||
|
|
||
| ```yaml | ||
| apiVersion: gateway.envoyproxy.io/v1alpha1 | ||
| kind: BackendTrafficPolicy | ||
| metadata: | ||
| name: global-ratelimit | ||
| spec: | ||
| targetRefs: | ||
| - group: gateway.networking.k8s.io | ||
| kind: HTTPRoute | ||
| name: my-api | ||
| rateLimit: | ||
| type: Global | ||
| global: | ||
| rules: | ||
| - limit: | ||
| requests: 100 | ||
| unit: Minute | ||
|
|
||
| ``` | ||
| This configuration limits the `my-api` route to 100 requests per minute in total, counted across all Envoy instances. If there are multiple replicas of Envoy, they all draw on the same shared limit. | ||
|
|
||
| --- | ||
|
|
||
| ## Local Rate Limiting | ||
|
|
||
|
|
||
| Local rate limiting applies limits independently within each Envoy Proxy instance. It does not rely on external services, which makes it lightweight and efficient, especially for blocking abusive traffic early. | ||
|
|
||
| **Benefits of local limits:** | ||
|
|
||
| - Lightweight and does not require an external rate limit service | ||
| - Fast enforcement with rate limiting at the edge | ||
| - Effective as a first line of defense against traffic bursts | ||
|
|
||
| ### Example | ||
|
|
||
| ```yaml | ||
| apiVersion: gateway.envoyproxy.io/v1alpha1 | ||
| kind: BackendTrafficPolicy | ||
| metadata: | ||
| name: local-ratelimit | ||
| spec: | ||
| targetRefs: | ||
| - group: gateway.networking.k8s.io | ||
| kind: HTTPRoute | ||
| name: my-api | ||
| rateLimit: | ||
| type: Local | ||
| local: | ||
| rules: | ||
| - limit: | ||
| requests: 50 | ||
| unit: Minute | ||
|
|
||
| ``` | ||
| This configuration limits traffic for the `my-api` route to 50 requests per minute per Envoy instance. If there are two Envoy replicas, up to 100 requests per minute may be allowed in total (50 per replica). | ||
|
|
||
| ## Related Resources | ||
| - [BackendTrafficPolicy](../introduction/gateway_api_extensions/backend-traffic-policy.md) | ||
| - [Task: Global Rate Limit](../../tasks/traffic/global-rate-limit.md) | ||
| - [Task: Local Rate Limit](../../tasks/traffic/local-rate-limit.md) |
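
The "Ensure Fair Usage" and "Implement Entitlements" use cases in the page above usually need a separate bucket per client rather than one bucket for the whole route. The following is a minimal sketch, assuming the `clientSelectors` field and the `Distinct` header match type from the `v1alpha1` API. The policy name `per-user-ratelimit` and the header `x-user-id` are hypothetical, and the selector fields should be confirmed in the API reference before use.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: per-user-ratelimit      # hypothetical name
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: my-api
  rateLimit:
    type: Global
    global:
      rules:
        - clientSelectors:
            - headers:
                - name: x-user-id   # hypothetical header that identifies the caller
                  type: Distinct    # assumed: each distinct header value gets its own bucket
          limit:
            requests: 100
            unit: Minute
```

Under these assumptions, each distinct value of `x-user-id` is limited to 100 requests per minute, shared across all Envoy instances.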
70 changes: 70 additions & 0 deletions
site/content/en/v1.4/concepts/introduction/load-balancing.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,70 @@ | ||
| --- | ||
| title: "Load Balancing" | ||
| --- | ||
|
|
||
| ## Overview | ||
|
|
||
| Load balancing distributes incoming requests across multiple backend services to improve availability, responsiveness, and scalability. Instead of directing all traffic to a single backend, which can cause slowdowns or outages, load balancing spreads the load across multiple instances, helping your applications stay fast and reliable under pressure. | ||
|
|
||
| ## Use Cases | ||
|
|
||
| Use load balancing to: | ||
|
|
||
| - Handle high traffic by distributing it across multiple service instances | ||
| - Keep services available even if one or more backends go down | ||
| - Improve response time by routing to less busy or closer backends | ||
|
|
||
| ## Load Balancing in Envoy Gateway | ||
|
|
||
| Envoy Gateway supports several load balancing strategies that determine how traffic is distributed across backend services. You configure a strategy with the `BackendTrafficPolicy` resource and attach it to `Gateway`, `HTTPRoute`, or `GRPCRoute` resources. A policy can reference its targets directly with the `targetRefs` field or select them dynamically with the `targetSelectors` field, which matches resources by Kubernetes labels. | ||
|
|
||
| **Supported load balancing types:** | ||
| - **Round Robin** – Cycles through the available backends in order, sending each new request to the next one | ||
| - **Random** – Picks a backend at random for each request | ||
| - **Least Request** – Sends each request to the backend with the fewest active requests | ||
| - **Consistent Hash** – Picks the backend from a hash of a request attribute (e.g., client IP or a header), so repeat requests with the same attribute tend to reach the same backend (useful for session affinity) | ||
|
|
||
| If no load balancing strategy is specified, Envoy Gateway uses **Least Request** by default. | ||
|
|
||
| ## Example: Round Robin Load Balancing | ||
|
|
||
| This example shows how to apply the Round Robin strategy using a `BackendTrafficPolicy` that targets a specific `HTTPRoute`: | ||
|
|
||
| ```yaml | ||
| apiVersion: gateway.envoyproxy.io/v1alpha1 | ||
| kind: BackendTrafficPolicy | ||
| metadata: | ||
| name: round-robin-policy | ||
| namespace: default | ||
| spec: | ||
| targetRefs: | ||
| - group: gateway.networking.k8s.io | ||
| kind: HTTPRoute | ||
| name: round-robin-route | ||
| loadBalancer: | ||
| type: RoundRobin | ||
| --- | ||
| apiVersion: gateway.networking.k8s.io/v1 | ||
| kind: HTTPRoute | ||
| metadata: | ||
| name: round-robin-route | ||
| namespace: default | ||
| spec: | ||
| parentRefs: | ||
| - name: eg | ||
| hostnames: | ||
| - "www.example.com" | ||
| rules: | ||
| - matches: | ||
| - path: | ||
| type: PathPrefix | ||
| value: /round | ||
| backendRefs: | ||
| - name: backend | ||
| port: 3000 | ||
| ``` | ||
| In this setup, requests matching the `/round` path prefix are distributed in turn across all available instances of the backend service. For example, if there are four replicas of the backend service, each one should receive roughly 25% of the requests. | ||
|
|
||
| ## Related Resources | ||
| - [BackendTrafficPolicy](../introduction/gateway_api_extensions/backend-traffic-policy.md) | ||
| - [Task: Load Balancing](../../tasks/traffic/load-balancing.md) |
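
The page above says a policy can also select its targets by label through `targetSelectors` instead of naming them in `targetRefs`. The following is a minimal sketch of that form, assuming the selector accepts `group`, `kind`, and `matchLabels`. The policy name and the `lb-strategy: round-robin` label are hypothetical, and the selector shape should be verified against the API reference for this release.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: round-robin-by-label    # hypothetical name
  namespace: default
spec:
  targetSelectors:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      matchLabels:
        lb-strategy: round-robin   # hypothetical label carried by the routes to select
  loadBalancer:
    type: RoundRobin
```

With this form, labeling a new `HTTPRoute` in the same namespace would bring it under the policy without editing the policy itself.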
107 changes: 107 additions & 0 deletions
site/content/en/v1.4/concepts/introduction/rate-limiting.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,107 @@ | ||
| --- | ||
| title: "Rate Limiting" | ||
| --- | ||
|
|
||
| ## Overview | ||
|
|
||
| Rate limiting is a technique for controlling the number of incoming requests over a defined period. It can serve business purposes, such as enforcing agreed usage quotas, or protect the stability of a system by preventing overload and mitigating threats such as Denial of Service attacks. | ||
|
|
||
| ## Use Cases | ||
|
|
||
| Rate limiting is commonly used to: | ||
|
|
||
| - **Prevent Overload:** Protect internal systems like databases from excessive traffic. | ||
| - **Enhance Security:** Block or limit abusive behavior such as brute-force attempts or DDoS attacks. | ||
| - **Ensure Fair Usage:** Enforce quotas and prevent resource hogging by individual clients. | ||
| - **Implement Entitlements:** Define API usage limits based on user identity or role. | ||
|
|
||
| ## Rate Limiting in Envoy Gateway | ||
|
|
||
| Envoy Gateway supports two types of rate limiting: | ||
|
|
||
| - **Global Rate Limiting:** Shared limits across all Envoy instances. | ||
| - **Local Rate Limiting:** Independent limits per Envoy instance. | ||
|
|
||
| Rate limiting is configured through the `BackendTrafficPolicy` custom resource. You define rate-limiting rules and apply them to `HTTPRoute`, `GRPCRoute`, or `Gateway` resources. A policy can reference its targets directly with the `targetRefs` field or select them dynamically with the `targetSelectors` field, which matches resources by Kubernetes labels. | ||
|
|
||
| {{% alert title="Note" color="primary" %}} | ||
| Rate limits are applied per route, even if the `BackendTrafficPolicy` targets a `Gateway`. For example, if the limit is 100 requests per second and a `Gateway` has 3 routes, each route has its own bucket of 100 requests per second. | ||
| {{% /alert %}} | ||
|
|
||
| --- | ||
|
|
||
| ## Global Rate Limiting | ||
|
|
||
| Global rate limiting ensures a consistent request limit across the entire Envoy fleet. This is ideal for shared resources or distributed environments where coordinated enforcement is critical. | ||
|
|
||
| Global limits are enforced via Envoy’s external Rate Limit Service, which is automatically deployed and managed by the Envoy Gateway system. The Rate Limit Service requires a datastore component (commonly Redis). When a request is received, Envoy sends a descriptor to this external service to determine if the request should be allowed. | ||
|
|
||
| **Benefits of global limits:** | ||
|
|
||
| - Centralized control across instances | ||
| - Fair sharing of backend capacity | ||
| - Limits that stay the same as Envoy replicas are added or removed during autoscaling | ||
|
|
||
| ### Example | ||
|
|
||
| ```yaml | ||
| apiVersion: gateway.envoyproxy.io/v1alpha1 | ||
| kind: BackendTrafficPolicy | ||
| metadata: | ||
| name: global-ratelimit | ||
| spec: | ||
| targetRefs: | ||
| - group: gateway.networking.k8s.io | ||
| kind: HTTPRoute | ||
| name: my-api | ||
| rateLimit: | ||
| type: Global | ||
| global: | ||
| rules: | ||
| - limit: | ||
| requests: 100 | ||
| unit: Minute | ||
|
|
||
| ``` | ||
| This configuration limits the `my-api` route to 100 requests per minute in total, counted across all Envoy instances. If there are multiple replicas of Envoy, they all draw on the same shared limit. | ||
|
|
||
| --- | ||
|
|
||
| ## Local Rate Limiting | ||
|
|
||
|
|
||
| Local rate limiting applies limits independently within each Envoy Proxy instance. It does not rely on external services, which makes it lightweight and efficient, especially for blocking abusive traffic early. | ||
|
|
||
| **Benefits of local limits:** | ||
|
|
||
| - Lightweight and does not require an external rate limit service | ||
| - Fast enforcement with rate limiting at the edge | ||
| - Effective as a first line of defense against traffic bursts | ||
|
|
||
| ### Example | ||
|
|
||
| ```yaml | ||
| apiVersion: gateway.envoyproxy.io/v1alpha1 | ||
| kind: BackendTrafficPolicy | ||
| metadata: | ||
| name: local-ratelimit | ||
| spec: | ||
| targetRefs: | ||
| - group: gateway.networking.k8s.io | ||
| kind: HTTPRoute | ||
| name: my-api | ||
| rateLimit: | ||
| type: Local | ||
| local: | ||
| rules: | ||
| - limit: | ||
| requests: 50 | ||
| unit: Minute | ||
|
|
||
| ``` | ||
| This configuration limits traffic for the `my-api` route to 50 requests per minute per Envoy instance. If there are two Envoy replicas, up to 100 requests per minute may be allowed in total (50 per replica). | ||
|
|
||
| ## Related Resources | ||
| - [BackendTrafficPolicy](../introduction/gateway_api_extensions/backend-traffic-policy.md) | ||
| - [Task: Global Rate Limit](../../tasks/traffic/global-rate-limit.md) | ||
| - [Task: Local Rate Limit](../../tasks/traffic/local-rate-limit.md) |
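
The note in the page above, that limits are applied per route even when the policy targets a `Gateway`, may be easier to follow with a concrete policy. The following is a minimal sketch, assuming a `Gateway` named `eg` (the name used as a parent in the load-balancing example) and a hypothetical policy name `gateway-ratelimit`.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: gateway-ratelimit       # hypothetical name
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: eg                  # assumed Gateway name
  rateLimit:
    type: Global
    global:
      rules:
        - limit:
            requests: 100
            unit: Second
```

According to the note, if three routes are attached to `eg`, each route gets its own bucket of 100 requests per second, rather than the three routes sharing a single bucket of 100.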