linkerd fails to find Lease component for service-mirror #11509

Closed · hawkw opened this issue Oct 19, 2023 · Discussed in #11503 · 0 comments · Fixed by #11629 or #11642

Comments

hawkw (Contributor) commented Oct 19, 2023

Discussed in #11503

Originally posted by lboutin-mwm October 18, 2023
Hello,

I use the linkerd multicluster plugin to enable communication between some of my clusters, and I deploy it using its Helm chart.

I recently upgraded the stack to stable-2.14.1 (from stable-2.14.0), and one of my clusters is now unable to find the Lease created when linking the two clusters.

This is the result of `linkerd multicluster check` on my central cluster:

linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
        * ml-demos-us
        * ml-staging-us
√ remote cluster access credentials are valid
        * ml-demos-us
        * ml-staging-us
√ clusters share trust anchors
        * ml-demos-us
        * ml-staging-us
√ service mirror controller has required permissions
        * ml-demos-us
        * ml-staging-us
√ service mirror controllers are running
        * ml-demos-us
        * ml-staging-us
√ probe services able to communicate with all gateway mirrors
        * ml-demos-us
        * ml-staging-us
√ all mirror services have endpoints
√ all mirror services are part of a Link
√ multicluster extension proxies are healthy
√ multicluster extension proxies are up-to-date
√ multicluster extension proxies and cli versions match

Status check results are √

and this is the result of the same command on my faulty cluster:

linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
        * tools-eu
√ remote cluster access credentials are valid
        * tools-eu
√ clusters share trust anchors
        * tools-eu
√ service mirror controller has required permissions
        * tools-eu
√ service mirror controllers are running
        * tools-eu
× probe services able to communicate with all gateway mirrors
        failed to get the service-mirror component Lease for target cluster tools-eu: leases.coordination.k8s.io "service-mirror-write-tools-eu" not found
    see https://linkerd.io/2.14/checks/#l5d-multicluster-gateways-endpoints for hints
√ all mirror services have endpoints
√ all mirror services are part of a Link
√ multicluster extension proxies are healthy
√ multicluster extension proxies are up-to-date
√ multicluster extension proxies and cli versions match

Status check results are ×

For some reason it seems to work one way and not the other.
The lease does in fact exist:

kctl get leases -n linkerd-multicluster
NAME                            HOLDER                                             AGE
service-mirror-write-tools-eu   linkerd-service-mirror-tools-eu-7854857ffc-j6c8k   44m

and I can get its description just fine:

Name:         service-mirror-write-tools-eu
Namespace:    linkerd-multicluster
Labels:       <none>
Annotations:  <none>
API Version:  coordination.k8s.io/v1
Kind:         Lease
Metadata:
  Creation Timestamp:  2023-10-18T15:28:29Z
  Managed Fields:
    API Version:  coordination.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:acquireTime:
        f:holderIdentity:
        f:leaseDurationSeconds:
        f:leaseTransitions:
        f:renewTime:
    Manager:         controller
    Operation:       Update
    Time:            2023-10-18T16:14:20Z
  Resource Version:  178043801
  UID:               99550f50-0d88-4076-a7fb-1abcc07c7386
Spec:
  Acquire Time:            2023-10-18T15:28:29.396070Z
  Holder Identity:         linkerd-service-mirror-tools-eu-7854857ffc-j6c8k
  Lease Duration Seconds:  30
  Lease Transitions:       0
  Renew Time:              2023-10-18T16:14:20.345628Z
Events:                    <none>

All of the aforementioned clusters are running in GKE on Kubernetes version 1.25.12-gke.500.

These are the commands I used to link both clusters:
linkerd --context=tools-eu multicluster link --cluster-name tools-eu | kubectl --context=ml-demos-us apply -f -
linkerd --context=ml-demos-us multicluster link --cluster-name ml-demos-us | kubectl --context=tools-eu apply -f -

Only one of the three clusters connected to the central one is affected, and there was no issue with the earlier 2.14.0 version of linkerd.
Mirroring and the connection between the clusters also appear to be working, so this may only be an error in the check command.
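
For context, a check like this generally boils down to a namespaced GET of the Lease through the Kubernetes coordination API. The sketch below is purely illustrative (it is not Linkerd's actual check code): only the namespace and Lease name come from the report above, and it shows why querying the wrong namespace would surface exactly the `not found` error in the check output.

```go
// Illustrative only: roughly what a CLI check has to do to find the
// service-mirror Lease. The namespace and Lease name are taken from the
// report above; everything else is assumed.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a clientset from the local kubeconfig.
	cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
		clientcmd.NewDefaultClientConfigLoadingRules(),
		&clientcmd.ConfigOverrides{},
	).ClientConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// The Lease is looked up by name in a specific namespace. If the check
	// resolves the wrong namespace (e.g. via a mislabelled extension
	// namespace), the API server returns a NotFound error like the one in
	// the check output, even though the Lease exists in linkerd-multicluster.
	lease, err := client.CoordinationV1().Leases("linkerd-multicluster").
		Get(context.Background(), "service-mirror-write-tools-eu", metav1.GetOptions{})
	if err != nil {
		log.Fatalf("failed to get the service-mirror component Lease: %v", err)
	}

	holder := ""
	if lease.Spec.HolderIdentity != nil {
		holder = *lease.Spec.HolderIdentity
	}
	fmt.Printf("Lease %s is held by %q\n", lease.Name, holder)
}
```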

mateiidavid added a commit that referenced this issue Nov 17, 2023
Linkerd's extension model requires that each namespace that "owns" an
extension be labelled with the extension name. Core extensions in
particular strictly follow this pattern. For example, the namespace that
viz is installed in would be labelled with `linkerd.io/extension=viz`.

This label is used by the CLI in many different places: in checks, in
uninstalls, and so on. Whenever the same label value appears on more than
one namespace (e.g. two namespaces are registered as the owner of "viz"),
we introduce undefined behaviour: extension checks or uninstalls may or
may not work correctly. These issues are not straightforward to debug,
and the misconfiguration can be introduced for a variety of reasons.

This change adds a new "core" category (`linkerd-extension-checks`) and
a new checker that asserts all extension namespaces are configured
properly. There are two reasons why this has been made a "core" check:

* Extensions may have their own health checking library, and it is hard
  to share a common abstraction here without duplicating the logic. For
  example, viz imports the healthchecking package, whereas the
  multicluster extension has its own. A dedicated core check will work
  better with all extensions that opt in to using linkerd's extension
  label.
* Being part of the core checks means this is going to run before any of
  the other extension checks do, which might improve visibility.

The change is straightforward: if the same extension value is used for
the label key on more than one namespace in the cluster, the check issues
a warning that lists the namespaces on which the label key and value pair
exists.

This should be followed up with a docs change.

Closes #11509

Signed-off-by: Matei David <[email protected]>
mateiidavid added a commit that referenced this issue Nov 20, 2023
* Introduce a new check for extension namespace configuration

Closes #11509

Signed-off-by: Matei David <[email protected]>
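
A rough sketch of the duplicate-label detection the commit message describes (illustrative only; the function name and structure are assumed and are not taken from the Linkerd codebase):

```go
// Sketch of a check that warns when the same linkerd.io/extension label
// value is present on more than one namespace. Illustrative only.
package check

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func duplicateExtensionNamespaces(ctx context.Context, client kubernetes.Interface) []string {
	var warnings []string

	// Only namespaces carrying the extension label are relevant.
	nsList, err := client.CoreV1().Namespaces().List(ctx, metav1.ListOptions{
		LabelSelector: "linkerd.io/extension",
	})
	if err != nil {
		return []string{fmt.Sprintf("failed to list namespaces: %v", err)}
	}

	// Group namespaces by extension value; any value owned by more than one
	// namespace is flagged, along with the namespaces it appears on.
	owners := map[string][]string{}
	for _, ns := range nsList.Items {
		ext := ns.Labels["linkerd.io/extension"]
		owners[ext] = append(owners[ext], ns.Name)
	}
	for ext, namespaces := range owners {
		if len(namespaces) > 1 {
			warnings = append(warnings, fmt.Sprintf(
				"label linkerd.io/extension=%s is present on multiple namespaces: %v",
				ext, namespaces))
		}
	}
	return warnings
}
```

In the release below, this surfaces as a warning under the new `linkerd-extension-checks` category of `linkerd check`.
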
hawkw mentioned this issue Nov 22, 2023
hawkw added a commit that referenced this issue Nov 22, 2023
## edge-23.11.4

This edge release introduces support for native sidecar containers, which
enter beta in Kubernetes 1.29. This improves the startup and shutdown
ordering of the proxy relative to other containers, fixing the
long-standing shutdown issue with injected `Job`s. Furthermore, traffic
from other `initContainer`s can now be proxied by Linkerd.

In addition, this edge release includes Helm chart improvements and
improvements to the multicluster extension.

* Added a new `config.alpha.linkerd.io/proxy-enable-native-sidecar`
  annotation and `Proxy.NativeSidecar` Helm option that causes the proxy
  container to run as an init-container (thanks @teejaded!) (#11465;
  fixes #11461)
* Fixed broken affinity rules for the multicluster `service-mirror` when
  running in HA mode (#11609; fixes #11603)
* Added a new check to `linkerd check` that ensures all extension
  namespaces are configured properly (#11629; fixes #11509)
* Updated the Prometheus Docker image used by the `linkerd-viz`
  extension to v2.48.0, resolving a number of CVEs in older Prometheus
  versions (#11633)
* Added `nodeAffinity` to `deployment` templates in the `linkerd-viz`
  and `linkerd-jaeger` Helm charts (thanks @naing2victor!) (#11464;
  fixes #10680)
github-actions bot locked as resolved and limited conversation to collaborators Dec 21, 2023