Fix linkerd mc check failing in the presence of lots of mirrored services (#10893)

The "all mirror services have endpoints" check can fail when there are
many mirrored services: for each service we query the Kubernetes API for
its endpoints, and all of those calls reuse the same Go context, which
eventually reaches its deadline.

To fix, we create a new context object per call.

## Repro

First patch `check.go` to introduce a sleep in order to simulate network
latency:

```diff
diff --git a/multicluster/cmd/check.go b/multicluster/cmd/check.go
index b2b4158bf..f3083f436 100644
--- a/multicluster/cmd/check.go
+++ b/multicluster/cmd/check.go
@@ -627,6 +627,7 @@ func (hc *healthChecker) checkIfMirrorServicesHaveEndpoints(ctx context.Context)
        for _, svc := range mirrorServices.Items {
                // Check if there is a relevant end-point
                endpoint, err := hc.KubeAPIClient().CoreV1().Endpoints(svc.Namespace).Get(ctx, svc.Name, metav1.GetOptions{})
+               time.Sleep(1 * time.Second)
                if err != nil || len(endpoint.Subsets) == 0 {
                        servicesWithNoEndpoints = append(servicesWithNoEndpoints, fmt.Sprintf("%s.%s mirrored from cluster [%s]", svc.Name, svc.Namespace, svc.Labels[k8s.RemoteClusterNameLabel]))
                }
```

Then run the `multicluster` integration tests to set up a multicluster
scenario, and then create lots of mirrored services:

```bash
$ bin/docker-build

# adjust the binary path to your own arch
$ bin/tests --name multicluster --skip-cluster-delete $PWD/target/cli/linux-amd64/linkerd

# we are currently in the target cluster context
$ k create ns testing

# create pod
$ k -n testing run nginx --image=nginx --restart=Never

# create 50 services pointing to it, flagged to be mirrored
$ for i in {1..50}; do k -n testing expose po nginx --port 80 --name "nginx-$i" -l mirror.linkerd.io/exported=true; done

# switch to the source cluster
$ k config use-context k3d-source

# this will trigger the creation of the mirrored services, wait till the
# 50 are created
$ k create ns testing

$ bin/go-run cli mc check --verbose
github.com/linkerd/linkerd2/multicluster/cmd
github.com/linkerd/linkerd2/cli/cmd
linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
        * target
√ remote cluster access credentials are valid
        * target
√ clusters share trust anchors
        * target
√ service mirror controller has required permissions
        * target
√ service mirror controllers are running
        * target
DEBU[0000] Starting port forward to https://0.0.0.0:34201/api/v1/namespaces/linkerd-multicluster/pods/linkerd-service-mirror-target-7c4496869f-6xsp4/portforward?timeout=30s 39327:9999
DEBU[0000] Port forward initialised
√ probe services able to communicate with all gateway mirrors
        * target
DEBU[0031] error retrieving Endpoints: client rate limiter Wait returned an error: context deadline exceeded
DEBU[0032] error retrieving Endpoints: client rate limiter Wait returned an error: context deadline exceeded
DEBU[0033] error retrieving Endpoints: client rate limiter Wait returned an error: context deadline exceeded
DEBU[0034] error retrieving Endpoints: client rate limiter Wait returned an error: context deadline exceeded
DEBU[0035] error retrieving Endpoints: client rate limiter Wait returned an error: context deadline exceeded
DEBU[0036] error retrieving Endpoints: client rate limiter Wait returned an error: context deadline exceeded
DEBU[0037] error retrieving Endpoints: client rate limiter Wait returned an error: context deadline exceeded
```
alpeb authored and hawkw committed Aug 8, 2023
1 parent da70f77 commit fb84f36
Showing 1 changed file with 5 additions and 1 deletion.

```diff
--- a/multicluster/cmd/check.go
+++ b/multicluster/cmd/check.go
@@ -19,6 +19,7 @@ import (
        "github.com/linkerd/linkerd2/pkg/tls"
        "github.com/linkerd/linkerd2/pkg/version"
        "github.com/prometheus/common/expfmt"
+       log "github.com/sirupsen/logrus"
        "github.com/spf13/cobra"
        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@@ -625,9 +626,12 @@ func (hc *healthChecker) checkIfMirrorServicesHaveEndpoints(ctx context.Context)
                return err
        }
        for _, svc := range mirrorServices.Items {
-               // Check if there is a relevant end-point
+               // have to use a new ctx for each call, otherwise we risk reaching the original context deadline
+               ctx, cancel := context.WithTimeout(context.Background(), healthcheck.RequestTimeout)
+               defer cancel()
                endpoint, err := hc.KubeAPIClient().CoreV1().Endpoints(svc.Namespace).Get(ctx, svc.Name, metav1.GetOptions{})
                if err != nil || len(endpoint.Subsets) == 0 {
+                       log.Debugf("error retrieving Endpoints: %s", err)
                        servicesWithNoEndpoints = append(servicesWithNoEndpoints, fmt.Sprintf("%s.%s mirrored from cluster [%s]", svc.Name, svc.Namespace, svc.Labels[k8s.RemoteClusterNameLabel]))
                }
        }
```
