-
Notifications
You must be signed in to change notification settings - Fork 4.6k
Troubleshoot service to service comms #16385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
e8466bf
Troubleshoot service to service comms
boruszak f919a6e
adjustments
boruszak e552764
breaking fix
boruszak 5c40ba5
api-docs breaking fix
boruszak 51b6f50
Links added to CLI pages
boruszak 00ec4e5
Update website/content/docs/troubleshoot/troubleshoot-services.mdx
boruszak 1405ede
Update website/content/docs/troubleshoot/troubleshoot-services.mdx
boruszak 42e93d5
Update website/content/docs/troubleshoot/troubleshoot-services.mdx
boruszak f867d2e
nav re-ordering
boruszak 9482271
Edits recommended in code review
boruszak 10afcab
Merge branch 'main' into docs/service-troubleshooting-tool
boruszak File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
150 changes: 150 additions & 0 deletions
150
website/content/docs/troubleshoot/troubleshoot-services.mdx
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,150 @@ | ||
| --- | ||
| layout: docs | ||
| page_title: Service-to-service troubleshooting overview | ||
| description: >- | ||
| Consul includes a built-in tool for troubleshooting communication between services in a service mesh. Learn how to use the `consul troubleshoot` command to validate communication between upstream and downstream Envoy proxies on VM and Kubernetes deployments. | ||
| --- | ||
|
|
||
| # Service-to-service troubleshooting overview | ||
|
|
||
| This topic provides an overview of Consul’s built-in service-to-service troubleshooting capabilities. When communication between an upstream service and a downstream service in a service mesh fails, you can run the `consul troubleshoot` command to initiate a series of automated validation tests. | ||
|
|
||
| For more information, refer to the [`consul troubleshoot` CLI documentation](/consul/commands/troubleshoot) or the [`consul-k8s troubleshoot` CLI reference](/consul/docs/k8s/k8s-cli#troubleshoot). | ||
|
|
||
| ## Introduction | ||
|
|
||
| When communication between upstream and downstream services in a service mesh fails, you can diagnose the cause manually with one or more of Consul’s built-in features, including [health check queries](/consul/docs/discovery/checks), [the UI topology view](/consul/docs/connect/observability/ui-visualization), and [agent telemetry metrics](/consul/docs/agent/telemetry#metrics-reference). | ||
|
|
||
| The `consul troubleshoot` command performs several checks in sequence that enable you to discover issues that impede service-to-service communication. The process systematically queries the [Envoy administration interface API](https://www.envoyproxy.io/docs/envoy/latest/operations/admin) and the Consul API to determine the cause of the communication failure. | ||
|
|
||
| The troubleshooting command validates service-to-service communication by checking for the following common issues: | ||
|
|
||
| - Upstream service does not exist | ||
| - One or both hosts are unhealthy | ||
| - A filter affects the upstream service | ||
| - The CA has expired mTLS certificates | ||
| - The services have expired mTLS certificates | ||
|
|
||
| Consul outputs the results of these validation checks to the terminal along with suggested actions to resolve the service communication failure. When it detects rejected configurations or connection failures, Consul also outputs Envoy metrics for services. | ||
|
|
||
| ### Envoy proxies in a service mesh | ||
|
|
||
| Consul validates communication in a service mesh by checking the Envoy proxies that are deployed as sidecars for the upstream and downstream services. As a result, troubleshooting requires that [Consul’s service mesh features are enabled](/consul/docs/connect/configuration). | ||
|
|
||
| For more information about using Envoy proxies with Consul, refer to [Envoy proxy configuration for service mesh](/consul/docs/connect/proxies/envoy). | ||
|
|
||
| ## Requirements | ||
|
|
||
| - Consul v1.15 or later. | ||
| - For Kubernetes, the `consul-k8s` CLI must be installed. | ||
|
|
||
| ### Technical constraints | ||
|
|
||
| When troubleshooting service-to-service communication issues, be aware of the following constraints: | ||
|
|
||
| - The troubleshooting tool does not check service intentions. For more information about intentions, including precedence and match order, refer to [service mesh intentions](/consul/docs/connect/intentions). | ||
| - The troubleshooting tool validates one direct connection between a downstream service and an upstream service. You must run the `consul troubleshoot` command with the Envoy ID for an individual upstream service. It does support validating multiple connections simultaneously. | ||
| - The troubleshooting tool only validates Envoy configurations for sidecar proxies. As a result, the troubleshooting tool does not validate Envoy configurations on upstream proxies such as mesh gateways and terminating gateways. | ||
|
|
||
| ## Usage | ||
|
|
||
| Using the service-to-service troubleshooting tool is a two-step process: | ||
|
|
||
| 1. Find the identifier for the upstream service. | ||
| 1. Use the upstream’s identifier to validate communication. | ||
|
|
||
| In deployments without transparent proxies, the identifier is the _Envoy ID for the upstream service’s sidecar proxy_. If you use transparent proxies, the identifier is the _upstream service’s IP address_. For more information about using transparent proxies, refer to [Enable transparent proxy mode](/consul/docs/connect/transparent-proxy). | ||
|
|
||
| ### Troubleshoot on VMs | ||
|
|
||
| To troubleshoot service-to-service communication issues in deployments that use VMs or bare-metal servers: | ||
|
|
||
| 1. Run the `consul troubleshoot upstreams` command to retrieve the upstream information for the service that is experiencing communication failures. Depending on your network’s configuration, the upstream information is either an Envoy ID or an IP address. | ||
|
|
||
| ```shell-session | ||
| $ consul troubleshoot upstreams | ||
| ==> Upstreams (explicit upstreams only) (0) | ||
| ==> Upstreams IPs (transparent proxy only) (1) | ||
| [10.4.6.160 240.0.0.3] true map[backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul] | ||
| If you cannot find the upstream address or cluster for a transparent proxy upstream: | ||
| - Check intentions: Tproxy upstreams are configured based on intentions. Make sure you have configured intentions to allow traffic to your upstream. | ||
| - To check that the right cluster is being dialed, run a DNS lookup for the upstream you are dialing. For example, run `dig backend.svc.consul` to return the IP address for the `backend` service. If the address you get from that is missing from the upstream IPs, it means that your proxy may be misconfigured. | ||
| ``` | ||
|
|
||
| 1. Run the `consul troubleshoot proxy` command and specify the Envoy ID or IP address with the `-upstream-ip` flag to identify the proxy you want to perform the troubleshooting process on. The following example uses the upstream IP to validate communication with the upstream service `backend`: | ||
|
|
||
| ```shell-session | ||
| $ consul troubleshoot proxy -upstream-ip 10.4.6.160 | ||
| ==> Validation | ||
| ✓ Certificates are valid | ||
| ✓ Envoy has 0 rejected configurations | ||
| ✓ Envoy has detected 0 connection failure(s) | ||
| ✓ Listener for upstream "backend" found | ||
| ✓ Route for upstream "backend" found | ||
| ✓ Cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found | ||
| ✓ Healthy endpoints for cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found | ||
| ✓ Cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found | ||
| ! No healthy endpoints for cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found | ||
| -> Check that your upstream service is healthy and running | ||
| -> Check that your upstream service is registered with Consul | ||
| -> Check that the upstream proxy is healthy and running | ||
| -> If you are explicitly configuring upstreams, ensure the name of the upstream is correct | ||
| ``` | ||
|
|
||
| In the example output, troubleshooting upstream communication reveals that the `backend` service has two service instances running in datacenter `dc1`. One of the services is healthy, but Consul cannot detect healthy endpoints for the second service instance. This information appears in the following lines of the example: | ||
|
|
||
| ```text hideClipboard | ||
| ✓ Cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found | ||
| ✓ Healthy endpoints for cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found | ||
| ✓ Cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found | ||
| ! No healthy endpoints for cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found | ||
| ``` | ||
|
|
||
| The output from the troubleshooting process identifies service instances according to their [Consul DNS address](/consul/docs/discovery/dns#standard-lookup). Use the DNS information for failing services to diagnose the specific issues affecting the service instance. | ||
boruszak marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| For more information, refer to the [`consul troubleshoot` CLI documentation](/consul/commands/troubleshoot). | ||
|
|
||
| ### Troubleshoot on Kubernetes | ||
|
|
||
| To troubleshoot service-to-service communication issues in deployments that use Kubernetes, retrieve the upstream information for the pod that is experiencing communication failures and use the upstream information to identify the proxy you want to perform the troubleshooting process on. | ||
|
|
||
| 1. Run the `consul-k8s troubleshoot upstreams` command and specify the pod ID with the `-pod` flag to retrieve upstream information. Depending on your network’s configuration, the upstream information is either an Envoy ID or an IP address. The following example displays all transparent proxy upstreams in Consul service mesh from the given pod. | ||
|
|
||
| ```shell-session | ||
| $ consul-k8s troubleshoot upstreams -pod frontend-767ccfc8f9-6f6gx | ||
| ==> Upstreams (explicit upstreams only) (0) | ||
| ==> Upstreams IPs (transparent proxy only) (1) | ||
| [10.4.6.160 240.0.0.3] true map[backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul] | ||
| If you cannot find the upstream address or cluster for a transparent proxy upstream: | ||
| - Check intentions: Tproxy upstreams are configured based on intentions. Make sure you have configured intentions to allow traffic to your upstream. | ||
| - To check that the right cluster is being dialed, run a DNS lookup for the upstream you are dialing. For example, run `dig backend.svc.consul` to return the IP address for the `backend` service. If the address you get from that is missing from the upstream IPs, it means that your proxy may be misconfigured. | ||
| ``` | ||
|
|
||
| 1. Run the `consul-k8s troubleshoot proxy` command and specify the pod ID and upstream IP address to identify the proxy you want to troubleshoot. The following example uses the upstream IP to validate communication with the upstream service `backend`: | ||
|
|
||
| ```shell-session | ||
| $ consul-k8s troubleshoot proxy -pod frontend-767ccfc8f9-6f6gx -upstream-ip 10.4.6.160 | ||
| ==> Validation | ||
| ✓ certificates are valid | ||
| ✓ Envoy has 0 rejected configurations | ||
| ✓ Envoy has detected 0 connection failure(s) | ||
| ✓ listener for upstream "backend" found | ||
| ✓ route for upstream "backend" found | ||
| ✓ cluster "backend.default.dc1.internal..consul" for upstream "backend" found | ||
| ✓ healthy endpoints for cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found | ||
| ✓ cluster "backend2.default.dc1.internal..consul" for upstream "backend" found | ||
| ! no healthy endpoints for cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found | ||
| ``` | ||
|
|
||
| In the example output, troubleshooting upstream communication reveals that the `backend` service has two clusters in datacenter `dc1`. One of the clusters returns healthy endpoints, but Consul cannot detect healthy endpoints for the second cluster. This information appears in the following lines of the example: | ||
|
|
||
| ```text hideClipboard | ||
| ✓ cluster "backend.default.dc1.internal..consul" for upstream "backend" found | ||
| ✓ healthy endpoints for cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found | ||
| ✓ cluster "backend2.default.dc1.internal..consul" for upstream "backend" found | ||
| ! no healthy endpoints for cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found | ||
| ``` | ||
|
|
||
| The output from the troubleshooting process identifies service instances according to their [Consul DNS address](/consul/docs/k8s/dns). Use the DNS information for failing services to diagnose the specific issues affecting the service instance. | ||
boruszak marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| For more information, refer to the [`consul-k8s troubleshoot` CLI reference](/consul/docs/k8s/k8s-cli#troubleshoot). | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.