Make service mirror controller per target cluster #4710
Conversation
@@ -912,7 +912,7 @@ func (rcsw *RemoteClusterServiceWatcher) affectedMirroredServicesForGatewayUpdat
 	affectedServices := []*corev1.Service{}
 	for _, srv := range services {
 		ver, ok := srv.Annotations[consts.RemoteGatewayResourceVersionAnnotation]
-		if ok && ver != latestResourceVersion {
+		if !ok || ver != latestResourceVersion {
Why this change?
It's unrelated to this change, but I believe it's a bug that was blocking my testing.
I had a situation where a mirror service was created before the gateway service (or during a period where the gateway service had been deleted). This caused the mirror service to be created without the RemoteGatewayResourceVersionAnnotation. When the gateway service was later created, that event did not trigger an update on the mirror service because the mirror service did not have the annotation (ok was false here). This seems like the wrong behavior: if the mirror service is missing this annotation, it should accept all gateway update events instead of rejecting them.
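For illustration, a minimal standalone sketch of the corrected predicate (a hypothetical helper, not the linkerd code; the annotation key string is a placeholder for consts.RemoteGatewayResourceVersionAnnotation):

// affectedByGatewayUpdate reports whether a mirror service should be refreshed
// for a gateway whose current resourceVersion is `latest`. A service that is
// missing the resource-version annotation is treated as out of date rather
// than skipped.
func affectedByGatewayUpdate(annotations map[string]string, latest string) bool {
	ver, ok := annotations["remote-gateway-resource-version"] // placeholder key
	// Before the fix this read `ok && ver != latest`, which silently rejected
	// every gateway update for services without the annotation.
	return !ok || ver != latest
}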
	Version:  "v1alpha1",
	Resource: "links",
}
linkClient := k8sAPI.DynamicClient.Resource(gvr).Namespace(*namespace)
I wonder why not add that to the api go client the same way we are doing it for TrafficSplits, etc. Why use the DynamicClient?
We avoid using the code-generated client to have more flexible control over the deserialization and so that we can build in backwards compatibility if we later want to change the schema of the Link custom resource.
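As a rough sketch of what that looks like (assuming the dynamic client and unstructured helpers from client-go/apimachinery, with the context-free call signatures used elsewhere in this PR; the helper name is hypothetical):

package link

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

var linkGVR = schema.GroupVersionResource{
	Group:    "multicluster.linkerd.io",
	Version:  "v1alpha1",
	Resource: "links",
}

// getLinkSpec fetches a Link as unstructured data and returns its spec map.
// Callers read only the fields they understand, so a future schema change in
// the Link CRD can be handled here instead of breaking a generated client's
// deserialization.
func getLinkSpec(client dynamic.Interface, namespace, name string) (map[string]interface{}, error) {
	u, err := client.Resource(linkGVR).Namespace(namespace).Get(name, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	spec, _, err := unstructured.NestedMap(u.Object, "spec")
	return spec, err
}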
	restartClusterWatcher(link, *namespace, creds, controllerK8sAPI, *requeueLimit, *repairPeriod)
case watch.Deleted:
	log.Infof("Link %s deleted", linkName)
	// TODO: should we delete all mirror resources?
I would assume so
If we can tie all mirror services to this link why wouldn't we?
This looks like it is going to be a great simplification. Just left a few questions. I assume you are not deleting the whole config watcher code to avoid making the diff large?
This is a TODO, see the PR description.
Update: Gateway metrics and the `linkerd mc gateways` command now work. This involved further refactoring:
Update: the multicluster checks have been updated as described in #4705
Update: I have manually merged #4740 into this branch to pick up a fix for a bug that was preventing the destination controller from starting in clusters that don't support EndpointSlices. With that fix in place, I have tested this end-to-end with a simple multicluster + traffic shift workflow. Since this now has feature parity with what is on main, I'd like the next step to be getting this PR merged. Adding service selectors, an unlink command, and an uninstall command can be done as follow-ups. Your assistance testing and reviewing this PR is greatly appreciated. It may be helpful to run through the test cases outlined in #4387 to check for any regressions in behavior.
Some notes on the test cases:
I'm still doing some behavior testing right now and need to look through the actual code changes, but I'm having some trouble getting this to work as expected.
On two separate clusters, I've installed Linkerd and installed the multicluster resources. They both pass linkerd check --multicluster.
I then installed the link in one cluster:
linkerd --context k3d-y mc link --cluster-name cluster-y | kubectl --context k3d-x apply -f -
and can see that it was created properly with:
kubectl --context k3d-x get -n linkerd-multicluster links
The output I see for gateways is:
❯ bin/linkerd --context k3d-x mc gateways
CLUSTER ALIVE NUM_SVC LATENCY_P50 LATENCY_P95 LATENCY_P99
This output is the same when I have a properly labeled service in the k3d-y cluster as well, and I don't see services mirrored.
I thought that by typing this out I would come across my own user error, but following the UX changes explained in the RFC, I think this should be correct?
This is looking good! I left some minor comments, but the overall implementation change makes sense.
We've DM'd about it a little and I'm still working through some issues on k3d, but I do have this working now with a GKE cluster and things work as expected.
// Check if there is a relevant end-point
endpoint, err := hc.kubeAPI.CoreV1().Endpoints(svc.Namespace).Get(svc.Name, metav1.GetOptions{})
if err != nil || len(endpoint.Subsets) == 0 {
	servicesWithNoEndpoints = append(servicesWithNoEndpoints, fmt.Sprintf("%s.%s mirrored from cluster [%s] (gateway: [%s/%s])", svc.Name, svc.Namespace, svc.Labels[k8s.RemoteClusterNameLabel], svc.Labels[k8s.RemoteGatewayNsLabel], svc.Labels[k8s.RemoteGatewayNameLabel]))
It may be worth making this an error instead of a warning. I ran into the issue above where I assumed check was passing (with some warnings, including this one) and didn't look much into it.
It turns out the lack of endpoints was because the cache was waiting to sync, and that was hanging due to an incorrect kubeconfig in the secret. Had this been an error, it would have been more obvious to me as the user that something about my setup was incorrect.
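A sketch of what a hard-failing variant could look like (a hypothetical helper, not the linkerd healthcheck API; the client call style follows the snippet quoted above, and the label key is the one visible in the mirror service annotations later in this thread):

package healthcheck

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// checkMirrorEndpoints returns an error (rather than a warning) when a mirror
// service has no ready endpoints, so problems like a bad kubeconfig in the
// Link secret fail the check outright.
func checkMirrorEndpoints(api kubernetes.Interface, svc corev1.Service) error {
	ep, err := api.CoreV1().Endpoints(svc.Namespace).Get(svc.Name, metav1.GetOptions{})
	if err != nil {
		return fmt.Errorf("failed to fetch endpoints for %s.%s: %s", svc.Name, svc.Namespace, err)
	}
	if len(ep.Subsets) == 0 {
		return fmt.Errorf("%s.%s mirrored from cluster [%s] has no endpoints",
			svc.Name, svc.Namespace, svc.Labels["mirror.linkerd.io/cluster-name"])
	}
	return nil
}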
// GetLink fetches a Link object from Kubernetes by name/namespace.
func GetLink(client dynamic.Interface, namespace, name string) (Link, error) {
Looks like this is not used?
 matchLabels := map[string]string{
 	consts.MirroredResourceLabel:  "true",
-	consts.RemoteClusterNameLabel: rcsw.clusterName,
+	consts.RemoteClusterNameLabel: rcsw.link.TargetClusterName,
Looks like this can just be matchLabels := rcsw.getMirroredServiceLabels()
} else {
	// Exists so we should update it.
	_, err = rcsw.localAPIClient.Client.CoreV1().Endpoints(ep.Namespace).Update(ep)
	if err != nil {
		return err
	}
}
return nil
Suggested change (drop the else block so the update happens in the main flow):
// Exists so we should update it.
_, err = rcsw.localAPIClient.Client.CoreV1().Endpoints(ep.Namespace).Update(ep)
if err != nil {
	return err
}
return nil
This looks good to me, although I think it can be simplified further. I left some minor comments. My main point is that we can verify whether our gateway has everything it needs at link time. Since we are not updating the gateway at all now, we can do that. Currently, linking with a cluster whose gateway has not yet gotten an external IP results in an invalid state which I believe cannot be repaired easily:
Spec:
Cluster Credentials Secret: cluster-credentials-west
Gateway Address:
Gateway Identity: linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local
Gateway Port: 4143
Probe Spec:
Path: /health
Period: 3s
Port: 4181
Target Cluster Domain: cluster.local
Target Cluster Linkerd Namespace: linkerd
Target Cluster Name: west
Events: <none>
linkerd-multicluster
--------------------
√ Link CRD exists
√ Link resources are valid
* west
√ service mirror controller has required permissions
* west
√ service mirror controllers are running
* west
√ remote cluster access credentials are valid
* west
√ clusters share trust anchors
* west
‼ all gateway mirrors are healthy
probe-gateway-west.linkerd-multicluster mirrored from cluster [west] has no endpoints
see https://linkerd.io/checks/#l5d-multicluster-gateways-endpoints for hints
if len(gatewayAddress) > 0 {
	endpointsToCreate.Subsets = []corev1.EndpointSubset{
		{
			Addresses: gatewayAddress,
			Ports:     rcsw.getEndpointsPorts(ev.service),
		},
I wonder whether this can be a bit less fragile. It seems that when doing link we do not verify that the gateway has ready external addresses. Is it correct to say that if we link a cluster with a gateway that has no external addresses, this state will not be updated even when the addresses become ready? Can we do a check in link to verify that there is at least one external IP ready? I think that this is a nice constraint to put in place.
this is a great idea
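A possible shape for that link-time check (a sketch under the assumption that `linkerd mc link` has a typed client for the target cluster; the function name is hypothetical, not the actual implementation):

package cmd

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// checkGatewayHasAddress fails the link operation early if the gateway Service
// in the target cluster has no external address yet, instead of writing a Link
// with an empty Gateway Address that is never repaired.
func checkGatewayHasAddress(api kubernetes.Interface, gatewayNs, gatewayName string) error {
	svc, err := api.CoreV1().Services(gatewayNs).Get(gatewayName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	for _, ingress := range svc.Status.LoadBalancer.Ingress {
		if ingress.IP != "" || ingress.Hostname != "" {
			return nil
		}
	}
	return fmt.Errorf("gateway %s/%s has no external address yet; retry the link once the LoadBalancer is provisioned", gatewayNs, gatewayName)
}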
gatewayProbePort: 4181
namespace: linkerd-multicluster
logLevel: info
serviceMirrorRetryLimit: 3
I wonder if it is worth getting rid of this retry mechanism altogether. Frankly, in practice I have not seen it pull its weight. I think this was a bit of over-engineering on my end. WDYT?
This refactor mostly didn't touch the event processing loop. We can think about removing it, but I don't think that change has to be tied to this refactor.
if err != nil {
	return nil, err
}
if rcsw.link.GatewayIdentity != "" {
Can we check in link and assume here that the gateway identity is present, so we always set the annotation?
good idea
Nit: both readmes for …
Hiya, wanted to give a hand so I started going through some of the test cases @adleong posted (based on #4387). Here are my results. I'm not sure if I tested anything beyond the scope of this PR, but I noticed some undefined behaviour which might be a bit beside the point. Anyway, thought I'd share it. Tested on 2 GKE clusters.
I thought this was a bit odd, so I went and checked the service mirror logs in west (the service mirror pod was in CrashLoopBackOff):
The weirder thing is that after the pod came up again:
I'm not sure whether this is a problem or whether I tested the wrong thing, but the behaviour was interesting enough for me to include here. It's possible I deleted/misconfigured something and this happened. Tl;dr: everything works as expected on 2 GKE clusters except breaking the gateway, which was very broken in my case, to say the least.
@Matei207 Good catch on breaking the gateway. The service mirror controller crashing definitely shouldn't happen; I'll look into it. As for the double delete, if you look closely you can see that the first delete is deleting the service and the second one is deleting its endpoints. But since the endpoints delete fails with a NotFound, I suspect that the endpoints object must get automatically deleted when its service does. So I should be able to remove the endpoints deletion!
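A sketch of that simplified cleanup path (an assumption about how it could look, not the committed change): delete only the mirror Service and treat a NotFound as already-deleted.

package servicemirror

import (
	kerrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteMirrorService removes a mirror service. Kubernetes removes the
// Endpoints object that backs a Service when the Service itself is deleted,
// so no explicit Endpoints deletion is needed.
func deleteMirrorService(api kubernetes.Interface, namespace, name string) error {
	err := api.CoreV1().Services(namespace).Delete(name, &metav1.DeleteOptions{})
	if err != nil && !kerrors.IsNotFound(err) {
		return err
	}
	return nil
}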
@Matei207 I wasn't able to reproduce the crash that you describe. If you can still reproduce this behavior, would you mind grabbing the log output from the service mirror controller that contains the crash? You can use the …
@adleong that's interesting, I thought I'd for sure have some issues reproducing it, but the same still happens. As a matter of fact, the service mirror component keeps restarting and the logs aren't much help. To give you a bit of context: last time, I installed after simply building the CLI off this branch (the images got tagged with the commit and the images were found); today I wasn't so lucky, so I built the images myself and pushed them to my public docker repo (so I can pull them in my GKE clusters). I built the binaries and installed with them:
NAME READY STATUS RESTARTS AGE
linkerd-controller-576c987fc4-p6pjv 2/2 Running 0 62m
linkerd-destination-59cb4545f5-rqdl9 2/2 Running 0 62m
linkerd-grafana-5ccb4884cc-qt88l 2/2 Running 0 62m
linkerd-identity-66d8cbc8d-mrmvv 2/2 Running 0 62m
linkerd-prometheus-5749478bbd-7w4hn 2/2 Running 0 62m
linkerd-proxy-injector-666f55898c-x7kbz 2/2 Running 0 62m
linkerd-sp-validator-787875c5c6-2dgss 2/2 Running 0 62m
linkerd-tap-5fccf46cf5-26nvb 2/2 Running 0 62m
linkerd-web-7bf5f9778f-25dxv 2/2 Running 0 62m when I linked the two clusters, I had to edit the
❯ kgp
NAME READY STATUS RESTARTS AGE
linkerd-gateway-7b7fd866d6-b4p9j 2/2 Running 0 17m
linkerd-service-mirror-east-595c85697-n44wn 2/2 Running 1 2m11s
# 1 restart for linkerd service mirror 2 mins after changing the img repository
# same behaviour as yesterday
I thought that was odd and tried getting logs:
I remembered my GKE clusters use the default config, so I thought maybe it was a resource failure and my pods were crashing because the system was saturated:
❯ k top no
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
gke-west-1-default-pool-d0db0902-7p8d 117m 6% 928Mi 32%
gke-west-1-default-pool-d0db0902-lr34 212m 10% 1081Mi 38%
gke-west-1-default-pool-d0db0902-ng7b 185m 9% 948Mi 33%
By this point, my pod had restarted twice:
❯ kgp
NAME READY STATUS RESTARTS AGE
linkerd-gateway-7b7fd866d6-b4p9j 2/2 Running 0 20m
linkerd-service-mirror-east-595c85697-n44wn 2/2 Running 2 4m36s
I thought I'd use the …
# logs for service mirror, didn't copy over cmd
I0721 13:05:06.427788 1 round_trippers.go:443] PUT https://10.126.0.1:443/api/v1/namespaces/test/endpoints/podinfo-east 200 OK in 3 milliseconds
I0721 13:05:06.427811 1 round_trippers.go:449] Response Headers:
I0721 13:05:06.427816 1 round_trippers.go:452] Audit-Id: 8c861ea3-2a79-4887-95b7-599f4443cec6
I0721 13:05:06.427820 1 round_trippers.go:452] Content-Type: application/json
I0721 13:05:06.427824 1 round_trippers.go:452] Content-Length: 729
I0721 13:05:06.427827 1 round_trippers.go:452] Date: Tue, 21 Jul 2020 13:05:06 GMT
I0721 13:05:06.427858 1 request.go:1017] Response Body: {"kind":"Endpoints","apiVersion":"v1","metadata":{"name":"podinfo-east","namespace":"test","selfLink":"/api/v1/namespaces/test/endpoints/podinfo-east","uid":"8a04c295-cb52-11ea-b4bb-42010a8a01c1","resourceVersion":"46632","creationTimestamp":"2020-07-21T13:03:15Z","labels":{"mirror.linkerd.io/cluster-name":"east","mirror.linkerd.io/mirrored-service":"true"},"annotations":{"mirror.linkerd.io/remote-gateway-identity":"linkerd-gateway.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local","mirror.linkerd.io/remote-svc-fq-name":"podinfo.test.svc.cluster.local"}},"subsets":[{"addresses":[{"ip":"34.74.94.207"}],"ports":[{"name":"http","port":4143,"protocol":"TCP"},{"name":"grpc","port":4143,"protocol":"TCP"}]}]}
time="2020-07-21T13:05:08Z" level=debug msg="Gateway is healthy" probe-key=east
time="2020-07-21T13:05:12Z" level=debug msg="Gateway is healthy" probe-key=east
time="2020-07-21T13:05:15Z" level=debug msg="Gateway is healthy" probe-key=east
time="2020-07-21T13:05:18Z" level=debug msg="Gateway is healthy" probe-key=east
time="2020-07-21T13:05:21Z" level=debug msg="Gateway is healthy" probe-key=east
time="2020-07-21T13:05:24Z" level=debug msg="Gateway is healthy" probe-key=east
time="2020-07-21T13:05:27Z" level=debug msg="Gateway is healthy" probe-key=east
time="2020-07-21T13:05:31Z" level=debug msg="Gateway is healthy" probe-key=east
❯ k logs linkerd-service-mirror-east-595c85697-n44wn linkerd-proxy -p
Error from server (BadRequest): previous terminated container "linkerd-proxy" in pod "linkerd-service-mirror-east-595c85697-n44wn" not found
❯ k logs linkerd-service-mirror-east-595c85697-n44wn linkerd-proxy -f
time="2020-07-21T13:02:05Z" level=info msg="running version git-474b0839"
[ 0.13569131s] INFO linkerd2_proxy: Admin interface on 0.0.0.0:4191
[ 0.13597942s] INFO linkerd2_proxy: Inbound interface on 0.0.0.0:4143
[ 0.13603820s] INFO linkerd2_proxy: Outbound interface on 127.0.0.1:4140
[ 0.13607948s] INFO linkerd2_proxy: Tap interface on 0.0.0.0:4190
[ 0.13612487s] INFO linkerd2_proxy: Local identity is linkerd-service-mirror-east.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local
[ 0.13621721s] INFO linkerd2_proxy: Identity verified via linkerd-identity.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.13627490s] INFO linkerd2_proxy: Destinations resolved via linkerd-dst.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
[ 0.14513511s] INFO outbound: linkerd2_app_outbound: Serving addr=127.0.0.1:4140
[ 0.14594018s] INFO inbound: linkerd2_app_inbound: Serving addr=0.0.0.0:4143
[ 1.105313965s] INFO daemon:identity: linkerd2_app: Certified identity: linkerd-service-mirror-east.linkerd-multicluster.serviceaccount.identity.linkerd.cluster.local
[ 214.164237265s] WARN inbound:accept{peer.addr=10.60.1.27:55198}:source{target.addr=10.60.1.33:9999}: linkerd2_app_core::errors: Failed to proxy request: error trying to connect: Connection refused (os error 111)
[ 344.164643442s] WARN inbound:accept{peer.addr=10.60.1.27:56328}:source{target.addr=10.60.1.33:9999}: linkerd2_app_core::errors: Failed to proxy request: error trying to connect: Connection refused (os error 111)
[ 354.164349554s] WARN inbound:accept{peer.addr=10.60.1.27:56416}:source{target.addr=10.60.1.33:9999}: linkerd2_app_core::errors: Failed to proxy request: error trying to connect: Connection refused (os error 111)
Thought I'd set … and …
I've managed to trigger a crash in the service mirror controller:
I triggered this by manually editing a mirror service endpoints resource (Endpoints/voting-svc-gke) and deleting the annotations section. Then I edited the exported service in the target cluster to trigger an update, which crashed the service mirror controller. I have no idea if this is the same issue that @Matei207 is seeing, but I'll put together a fix for this.
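Whatever the exact stack trace turns out to be, the scenario above points at code that assumes the annotation map exists. As a hedged illustration only (the actual fix may differ), a guard like this avoids the classic "assignment to entry in nil map" panic when the annotations section has been stripped by hand:

package servicemirror

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// setAnnotation writes an annotation defensively: reading from a nil map is
// safe in Go, but writing to one panics, and a hand-edited Endpoints resource
// can arrive with its annotations section removed entirely.
func setAnnotation(meta *metav1.ObjectMeta, key, value string) {
	if meta.Annotations == nil {
		meta.Annotations = map[string]string{}
	}
	meta.Annotations[key] = value
}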
cli/cmd/multicluster.go
cmd.Flags().StringVar(&opts.clusterName, "cluster-name", "", "Cluster name")
cmd.Flags().StringVar(&opts.apiServerAddress, "api-server-address", "", "The api server address of the target cluster")
cmd.Flags().StringVar(&opts.serviceAccountName, "service-account-name", defaultServiceAccountName, "The name of the service account associated with the credentials")
cmd.Flags().StringVar(&opts.controlPlaneVersion, "control-plane-version", opts.controlPlaneVersion, "(Development) Tag to be used for the control plane component images")
cmd.Flags().StringVar(&opts.gatewayName, "gateway-name", defaultGatewayName, "The name of the gateway service")
cmd.Flags().StringVar(&opts.gatewayNamespace, "gateway-namespace", defaultMulticlusterNamespace, "The namespace of the gateway service")
cmd.Flags().Uint32Var(&opts.serviceMirrorRetryLimit, "service-mirror-retry-limit", opts.serviceMirrorRetryLimit, "The number of times a failed update from the target cluster is allowed to be retried")
cmd.Flags().StringVar(&opts.logLevel, "log-level", opts.logLevel, "Log level for the Multicluster components")
I think we want to include dockerRegistry as well.
Second this, simply because in my tests I had to manually edit the deployment to include my own registry (the install command ships with --registry).
I tried to replicate the problem that @Matei207 is hitting. Could not. I wonder whether this has anything to do with changing the deployment of the mirror to have a different image. This seems highly unlikely though... @Matei207 are you seeing any k8s events that are useful on the deployment/pod once it goes into a crash loop? Also, it seems this happens while the …
@adleong @zaharidichev no events other than the CrashLoopBackOff. I have built the latest commit and my service mirror controller is still crashing. It does seem to happen when …
I haven't reproduced the error talked about above, but this otherwise looks good to me
// We need to issue a RepairEndpoints immediately to populate the gateway
// mirror endpoints.
ev := RepairEndpoints{}
rcsw.eventsQueue.Add(&ev)
This fixes the issue of 503s that I was seeing 👍
I talked to @Matei207 and it seems like my latest commit, which restarts watches that have completed, fixes the crashes. 🎉
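For context, a sketch of that restart behaviour (hedged: the handler is hypothetical and the real change may be structured differently). A Kubernetes watch can end normally when the API server closes the stream, so the controller re-establishes the Link watch instead of treating a closed channel as fatal:

package main

import (
	"time"

	log "github.com/sirupsen/logrus"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/dynamic"
)

// watchLink keeps a watch open on the Link resource, restarting it whenever
// the server closes the stream. handleLinkEvent is a hypothetical callback.
func watchLink(linkClient dynamic.ResourceInterface, linkName string, handleLinkEvent func(watch.Event)) {
	for {
		w, err := linkClient.Watch(metav1.ListOptions{})
		if err != nil {
			log.Errorf("failed to watch Link %s: %s", linkName, err)
			time.Sleep(time.Second)
			continue
		}
		for event := range w.ResultChan() {
			handleLinkEvent(event)
		}
		log.Infof("watch on Link %s ended; restarting", linkName)
	}
}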
LGTM
This PR moves the service mirror controller from `linkerd mc install` to `linkerd mc link`, as described in linkerd/rfc#31. For fuller context, please see that RFC.

Basic multicluster functionality works here, including:

* `linkerd mc install` installs the Link CRD but not any service mirror controllers
* `linkerd mc link` creates a Link resource and installs a service mirror controller which uses that Link
* The service mirror controller creates and manages mirror services, a gateway mirror, and their endpoints.
* The `linkerd mc gateways` command lists all linked target clusters, their liveness, and probe latencies.
* The `linkerd check` multicluster checks have been updated for the new architecture. Several checks have been rendered obsolete by the new architecture and have been removed.

The following are known issues requiring further work:

* The service mirror controller uses the existing `mirror.linkerd.io/gateway-name` and `mirror.linkerd.io/gateway-ns` annotations to select which services to mirror; it does not yet support configuring a label selector.
* An unlink command is needed for removing multicluster links: see linkerd#4707
* An mc uninstall command is needed for uninstalling the multicluster addon: see linkerd#4708

Signed-off-by: Alex Leong <[email protected]>