Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix
linkerd mc check
failing in the presence of lots of mirrored se…
…rvices (#10893) The "all mirror services have endpoints" check can fail in the presence of lots of mirrored services because for each service we query the kube api for its endpoints, and those calls reuse the same golang context, which ends up reaching its deadline. To fix, we create a new context object per call. ## Repro First patch `check.go` to introduce a sleep in order to simulate network latency: ```diff diff --git a/multicluster/cmd/check.go b/multicluster/cmd/check.go index b2b4158bf..f3083f436 100644 --- a/multicluster/cmd/check.go +++ b/multicluster/cmd/check.go @@ -627,6 +627,7 @@ func (hc *healthChecker) checkIfMirrorServicesHaveEndpoints(ctx context.Context) for _, svc := range mirrorServices.Items { // Check if there is a relevant end-point endpoint, err := hc.KubeAPIClient().CoreV1().Endpoints(svc.Namespace).Get(ctx, svc.Name, metav1.GetOptions{}) + time.Sleep(1 * time.Second) if err != nil || len(endpoint.Subsets) == 0 { servicesWithNoEndpoints = append(servicesWithNoEndpoints, fmt.Sprintf("%s.%s mirrored from cluster [%s]", svc.Name, svc.Namespace, svc.Labels[k8s.RemoteClusterNameLabel])) } ``` Then run the `multicluster` integration tests to setup a multicluster scenario, and then create lots of mirrored services! ```bash $ bin/docker-build # accommodate to your own arch $ bin/tests --name multicluster --skip-cluster-delete $PWD/target/cli/linux-amd64/linkerd # we are currently in the target cluster context $ k create ns testing # create pod $ k -n testing run nginx --image=nginx --restart=Never # create 50 services pointing to it, flagged to be mirrored $ for i in {1..50}; do k -n testing expose po nginx --port 80 --name "nginx-$i" -l mirror.linkerd.io/exported=true; done # switch to the source cluster $ k config use-context k3d-source # this will trigger the creation of the mirrored services, wait till the # 50 are created $ k create ns testing $ bin/go-run cli mc check --verbose github.com/linkerd/linkerd2/multicluster/cmd github.com/linkerd/linkerd2/cli/cmd linkerd-multicluster -------------------- √ Link CRD exists √ Link resources are valid * target √ remote cluster access credentials are valid * target √ clusters share trust anchors * target √ service mirror controller has required permissions * target √ service mirror controllers are running * target DEBU[0000] Starting port forward to https://0.0.0.0:34201/api/v1/namespaces/linkerd-multicluster/pods/linkerd-service-mirror-target-7c4496869f-6xsp4/portforward?timeout=30s 39327:9999 DEBU[0000] Port forward initialised √ probe services able to communicate with all gateway mirrors * target DEBU[0031] error retrieving Endpoints: client rate limiter Wait returned an error: context deadline exceeded DEBU[0032] error retrieving Endpoints: client rate limiter Wait returned an error: context deadline exceeded DEBU[0033] error retrieving Endpoints: client rate limiter Wait returned an error: context deadline exceeded DEBU[0034] error retrieving Endpoints: client rate limiter Wait returned an error: context deadline exceeded DEBU[0035] error retrieving Endpoints: client rate limiter Wait returned an error: context deadline exceeded DEBU[0036] error retrieving Endpoints: client rate limiter Wait returned an error: context deadline exceeded DEBU[0037] error retrieving Endpoints: client rate limiter Wait returned an error: context deadline exceeded ```
- Loading branch information