
Added Anti Affinity when HA is configured #2893

Merged
ihcsim merged 8 commits into linkerd:master from Pothulapati:ha
Jul 18, 2019

Conversation

@Pothulapati
Contributor

@Pothulapati Pothulapati commented Jun 4, 2019

Fixes #1895

This PR adds anti-affinity rules to the proxy-injector, sp-validator, linkerd-controller, and tap deployments.

The idea is to make the anti-affinity rules on both kubernetes.io/hostname and failure-domain.beta.kubernetes.io/zone preferred when only the --ha flag is configured.

If --required-host-anti-affinity is also configured along with --ha, then the kubernetes.io/hostname anti-affinity becomes required, while failure-domain.beta.kubernetes.io/zone remains preferred.
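To make the two flavors concrete, here is an illustrative sketch of the pod spec fields involved; the label key and component value shown are placeholders (the chart templates them via {{.ControllerComponentLabel}}):

```yaml
# Illustrative sketch only -- label key/value are placeholders.
affinity:
  podAntiAffinity:
    # Always added in HA mode: a soft preference to spread replicas across zones.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: linkerd.io/control-plane-component
            operator: In
            values:
            - controller
        topologyKey: failure-domain.beta.kubernetes.io/zone
    # Only with --required-host-anti-affinity: a hard one-replica-per-node rule.
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: linkerd.io/control-plane-component
          operator: In
          values:
          - controller
      topologyKey: kubernetes.io/hostname
```

The preferred zone rule is a hint the scheduler may ignore; the required hostname rule leaves co-located replicas unschedulable.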

@ihcsim @alpeb @grampelberg @olix0r

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
@wmorgan
Member

wmorgan commented Jun 4, 2019

Thanks @Pothulapati!

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
@ihcsim
Contributor

ihcsim commented Jul 16, 2019

@grampelberg The changes in this PR, by default, add the pod anti-affinity hostname and zone constraints when in HA mode. The k8s docs recommend not adding these constraints for clusters with several hundred nodes. Shall we make these new pod anti-affinity rules optional?

@grampelberg
Contributor

@ihcsim my vote is to leave them with the assumption that most clusters are under a few hundred nodes.

The --ha flag has been getting more and more cumbersome as issues like this come up. If we go back to the beginning, the product goal is to provide a sane set of defaults for most users. So far, I think we're doing an alright job there. Unfortunately, as we've rolled a lot of these settings up into the sane defaults, very advanced users can't toggle them on and off. With the current state of install, it is pretty tough to add new flags as the whole thing just gets more complex.

With the Helm chart coming out, WDYT about making all these types of configuration individual settings in the chart, having install --ha install sane defaults, and providing some example values.yaml files that set the same default configurations?

@ihcsim
Contributor

ihcsim commented Jul 17, 2019

@grampelberg Agreed with keeping a set of sane defaults, and letting users use Helm (or whatever tools) to handle more advanced use cases (like defining required vs. preferred affinity/anti-affinity rules, etc.). And yes, we can definitely come up with an example values.yaml for configurable fields like weight, topologyKey, etc., as part of the bigger Linkerd chart task.
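A hypothetical sketch of what such a values.yaml section could look like (every field name here is made up for illustration, not an actual chart schema):

```yaml
# Hypothetical values.yaml fragment -- field names are illustrative only.
highAvailability: true
controllerAntiAffinity:
  hostname:
    mode: required        # "required" or "preferred"
  zone:
    mode: preferred
    weight: 100
    topologyKey: failure-domain.beta.kubernetes.io/zone
```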

Following #1895 (comment), how do you feel about omitting the --required-host-anti-affinity flag entirely and just keeping the preferred anti-affinity defaults (on both hostname and zone)? I feel required is too restrictive, and we don't want the HA install experience to fail due to cluster topology outside of our control.

@ihcsim
Contributor

ihcsim commented Jul 17, 2019

@Pothulapati This is working great on my GKE regional cluster! Can you update the identity.yaml template to include the anti-affinity rules as well?

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
@grampelberg
Contributor

@ihcsim I’d actually go the other way: omit --required-host-anti-affinity and make required the default. Here’s my thinking, WDYT?

  • If I’m running real HA, I want there to be hard separation. If there isn’t, I want it to fail fast and tell me.
  • For clusters that can’t handle required, the deploy will immediately fail and users will get to choose whether they want required with extra nodes or preferred and use something like kustomize to edit.
  • Defaulting to preferred feels like we’re introducing two types of HA, “kinda” HA and “real” HA.

Obviously, this will all get better when advanced users can tweak to their heart’s content. I’m thinking in the meantime we should have the most restrictive HA possible to surface any possible issues up front and fail fast.

@Pothulapati
Contributor Author

Pothulapati commented Jul 17, 2019

@grampelberg that approach feels right. Making the anti-affinity requirements fixed by default is fine, but it would not allow users to get the other HA features, like better resource limits and requests, without also satisfying the anti-affinity requirements. But I guess that should be okay; advanced users can always use kustomize to tweak them.

Now, should we make both zone and host required? Host should be okay, but I don't think zone would work, as users may not have a cross-zone cluster.

@ihcsim
Contributor

ihcsim commented Jul 17, 2019

Defaulting to preferred feels like we’re introducing two types of HA, “kinda” HA and “real” HA.

👍 Agreed.

@grampelberg @Pothulapati So to summarize,

  • Remove the --required-host-anti-affinity option
  • By default, HA mode uses required hostname anti-affinity and preferred zone anti-affinity (there will almost always be more replicas than zones, so required zone anti-affinity won't work.)

If the required hostname anti-affinity is causing an installation failure, linkerd check actually detects the problem and outputs the reason:

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ control plane replica sets are ready
× no unschedulable pods
    linkerd-controller-7d9bdd85b8-2sk7q: 0/6 nodes are available: 1 Insufficient cpu, 5 node(s) didn't match pod affinity/anti-affinity, 5 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-controller-7d9bdd85b8-74vsr: 0/6 nodes are available: 1 Insufficient cpu, 5 node(s) didn't match pod affinity/anti-affinity, 5 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-controller-7d9bdd85b8-97lds: 0/6 nodes are available: 1 Insufficient cpu, 5 node(s) didn't match pod affinity/anti-affinity, 5 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-controller-7d9bdd85b8-9lj9v: 0/6 nodes are available: 1 Insufficient cpu, 5 node(s) didn't match pod affinity/anti-affinity, 5 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-controller-7d9bdd85b8-9sx8d: 0/6 nodes are available: 1 Insufficient cpu, 5 node(s) didn't match pod affinity/anti-affinity, 5 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-controller-7d9bdd85b8-kbzwn: 0/6 nodes are available: 1 Insufficient cpu, 5 node(s) didn't match pod affinity/anti-affinity, 5 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-controller-7d9bdd85b8-lcgww: 0/6 nodes are available: 1 Insufficient cpu, 5 node(s) didn't match pod affinity/anti-affinity, 5 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-controller-7d9bdd85b8-zmsn4: 0/6 nodes are available: 1 Insufficient cpu, 5 node(s) didn't match pod affinity/anti-affinity, 5 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-proxy-injector-76c4f5c7d9-98gm8: 0/6 nodes are available: 6 node(s) didn't match pod affinity/anti-affinity, 6 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-proxy-injector-76c4f5c7d9-w7jwc: 0/6 nodes are available: 6 node(s) didn't match pod affinity/anti-affinity, 6 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-sp-validator-6bc6cc666b-4k5zs: 0/6 nodes are available: 6 node(s) didn't match pod affinity/anti-affinity, 6 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-sp-validator-6bc6cc666b-qvkdb: 0/6 nodes are available: 6 node(s) didn't match pod affinity/anti-affinity, 6 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-tap-8688fdf4f-c5tgb: 0/6 nodes are available: 6 node(s) didn't match pod affinity/anti-affinity, 6 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-tap-8688fdf4f-wmlnz: 0/6 nodes are available: 6 node(s) didn't match pod affinity/anti-affinity, 6 node(s) didn't satisfy existing pods anti-affinity rules.
    see https://linkerd.io/checks/#l5d-existence-unschedulable-pods for hints

@grampelberg
Contributor

@Pothulapati let’s not do zone, I think you’re right, that’ll be an advanced option for folks who know they want it.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
@Pothulapati
Contributor Author

Updated the PR to have zone anti-affinity as preferred and host anti-affinity as required by default.

@ihcsim
Contributor

ihcsim commented Jul 17, 2019

This is awesome!

✗ k get no -L failure-domain.beta.kubernetes.io/zone
NAME                                         STATUS   ROLES    AGE   VERSION         ZONE
gke-isim-dev-ha-default-pool-4b003e42-1m9b   Ready    <none>   22h   v1.13.7-gke.8   us-east1-b
gke-isim-dev-ha-default-pool-4b003e42-p1ps   Ready    <none>   22h   v1.13.7-gke.8   us-east1-b
gke-isim-dev-ha-default-pool-560160bf-7sl6   Ready    <none>   20h   v1.13.7-gke.8   us-east1-c
gke-isim-dev-ha-default-pool-560160bf-cnzh   Ready    <none>   22h   v1.13.7-gke.8   us-east1-c
gke-isim-dev-ha-default-pool-f4da19f4-4n3c   Ready    <none>   22h   v1.13.7-gke.8   us-east1-d
gke-isim-dev-ha-default-pool-f4da19f4-g79p   Ready    <none>   22h   v1.13.7-gke.8   us-east1-d

✗ k -n linkerd get po -owide
NAME                                      READY   STATUS    RESTARTS   AGE   IP           NODE                                         NOMINATED NODE   READINESS GATES
linkerd-controller-5b5765b845-7cnpz       3/3     Running   0          18m   10.60.1.27   gke-isim-dev-ha-default-pool-4b003e42-p1ps   <none>           <none> # us-east1-b
linkerd-controller-5b5765b845-j6ss9       3/3     Running   0          18m   10.60.3.20   gke-isim-dev-ha-default-pool-560160bf-cnzh   <none>           <none> # us-east1-c
linkerd-controller-5b5765b845-r92md       3/3     Running   0          18m   10.60.4.23   gke-isim-dev-ha-default-pool-f4da19f4-4n3c   <none>           <none> # us-east1-d
linkerd-grafana-7df55df848-888ck          2/2     Running   0          12m   10.60.1.30   gke-isim-dev-ha-default-pool-4b003e42-p1ps   <none>           <none> 
linkerd-identity-74cf6f446f-7r57m         2/2     Running   0          18m   10.60.4.22   gke-isim-dev-ha-default-pool-f4da19f4-4n3c   <none>           <none> # us-east1-d
linkerd-identity-74cf6f446f-b9gfc         2/2     Running   0          18m   10.60.1.26   gke-isim-dev-ha-default-pool-4b003e42-p1ps   <none>           <none> #us-east1-b 
linkerd-identity-74cf6f446f-fbw5w         2/2     Running   0          18m   10.60.2.20   gke-isim-dev-ha-default-pool-560160bf-7sl6   <none>           <none> # us-east1-c
linkerd-prometheus-7bcc6c5b66-rv7n4       2/2     Running   3          18m   10.60.0.17   gke-isim-dev-ha-default-pool-4b003e42-1m9b   <none>           <none>
linkerd-proxy-injector-746bfbb494-8w8qk   2/2     Running   0          18m   10.60.3.21   gke-isim-dev-ha-default-pool-560160bf-cnzh   <none>           <none> # us-east1-c
linkerd-proxy-injector-746bfbb494-s42qm   2/2     Running   0          18m   10.60.4.24   gke-isim-dev-ha-default-pool-f4da19f4-4n3c   <none>           <none> # us-east1-d
linkerd-proxy-injector-746bfbb494-wlxxf   2/2     Running   0          18m   10.60.1.28   gke-isim-dev-ha-default-pool-4b003e42-p1ps   <none>           <none> # us-east1-b
linkerd-sp-validator-6947dff89c-4nqrs     2/2     Running   0          18m   10.60.0.18   gke-isim-dev-ha-default-pool-4b003e42-1m9b   <none>           <none> # us-east1-b
linkerd-sp-validator-6947dff89c-64vv9     2/2     Running   0          18m   10.60.2.23   gke-isim-dev-ha-default-pool-560160bf-7sl6   <none>           <none> # us-east1-c
linkerd-sp-validator-6947dff89c-cdd4q     2/2     Running   0          18m   10.60.5.20   gke-isim-dev-ha-default-pool-f4da19f4-g79p   <none>           <none> # us-east1-d
linkerd-tap-5d7745b8c8-n9xwj              2/2     Running   0          18m   10.60.5.21   gke-isim-dev-ha-default-pool-f4da19f4-g79p   <none>           <none> # us-east1-d
linkerd-tap-5d7745b8c8-q9rtx              2/2     Running   0          18m   10.60.3.22   gke-isim-dev-ha-default-pool-560160bf-cnzh   <none>           <none>  # us-east1-c
linkerd-tap-5d7745b8c8-rjznc              2/2     Running   0          18m   10.60.1.29   gke-isim-dev-ha-default-pool-4b003e42-p1ps   <none>           <none>  # us-east1-b
linkerd-web-7cd4bf9d7-cd4bv               2/2     Running   0          18m   10.60.2.21   gke-isim-dev-ha-default-pool-560160bf-7sl6   <none>           <none>

If the scheduler can't satisfy the anti-affinity hostname rule, linkerd check will output an error:

✗ k -n linkerd scale deploy/linkerd-controller --replicas=10
deployment.extensions/linkerd-controller scaled
✗ linkerd check
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ control plane replica sets are ready
× no unschedulable pods
    linkerd-controller-5b5765b845-9nq7d: 0/6 nodes are available: 6 node(s) didn't match pod affinity/anti-affinity, 6 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-controller-5b5765b845-ncztn: 0/6 nodes are available: 6 node(s) didn't match pod affinity/anti-affinity, 6 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-controller-5b5765b845-tgdlp: 0/6 nodes are available: 6 node(s) didn't match pod affinity/anti-affinity, 6 node(s) didn't satisfy existing pods anti-affinity rules.
    linkerd-controller-5b5765b845-xshbc: 0/6 nodes are available: 6 node(s) didn't match pod affinity/anti-affinity, 6 node(s) didn't satisfy existing pods anti-affinity rules.
    see https://linkerd.io/checks/#l5d-existence-unschedulable-pods for hints

@ihcsim
Contributor

ihcsim commented Jul 17, 2019

@Pothulapati In anticipation of the upcoming Linkerd Helm chart work, can I trouble you to extract the common anti-affinity rules into a partial template? I tested the following on my GKE cluster, and everything works:

diff --git a/chart/templates/_affinity.yaml b/chart/templates/_affinity.yaml
new file mode 100644
index 00000000..dd11873b
--- /dev/null
+++ b/chart/templates/_affinity.yaml
@@ -0,0 +1,22 @@
+{{- define "pod-affinity" }}
+affinity:
+  podAntiAffinity:
+    preferredDuringSchedulingIgnoredDuringExecution:
+    - weight: 100
+      podAffinityTerm:
+        labelSelector:
+          matchExpressions:
+          - key: {{ .Label }}
+            operator: In
+            values:
+            - {{ .Component }}
+        topologyKey: failure-domain.beta.kubernetes.io/zone
+    requiredDuringSchedulingIgnoredDuringExecution:
+    - labelSelector:
+        matchExpressions:
+        - key: {{ .Label }}
+          operator: In
+          values:
+          - {{ .Component }}
+      topologyKey: kubernetes.io/hostname
+{{- end }}
diff --git a/chart/templates/controller.yaml b/chart/templates/controller.yaml
index 960d3c3d..980ac140 100644
--- a/chart/templates/controller.yaml
+++ b/chart/templates/controller.yaml
@@ -131,26 +131,8 @@ spec:
       - name: config
         configMap:
           name: linkerd-config
-{{- if .HighAvailability}}
-      affinity:
-        podAntiAffinity:
-          preferredDuringSchedulingIgnoredDuringExecution:
-            - weight: 100
-              podAffinityTerm:
-                labelSelector:
-                  matchExpressions:
-                    - key: {{.ControllerComponentLabel}}
-                      operator: In
-                      values:
-                        - controller
-                topologyKey: failure-domain.beta.kubernetes.io/zone
-          requiredDuringSchedulingIgnoredDuringExecution:
-            - labelSelector:
-                matchExpressions:
-                  - key: {{.ControllerComponentLabel}}
-                    operator: In
-                    values:
-                      - controller
-              topologyKey: kubernetes.io/hostname
-{{ end }}
+      {{- if .HighAvailability }}
+      {{- $local := dict "Label" .ControllerComponentLabel "Component" "controller" }}
+      {{- include "pod-affinity" $local | nindent 6 }}
+      {{- end }}
 {{end -}}
diff --git a/chart/templates/identity.yaml b/chart/templates/identity.yaml
index dbf274a2..1c035f1c 100644
--- a/chart/templates/identity.yaml
+++ b/chart/templates/identity.yaml
@@ -102,27 +102,9 @@ spec:
       - name: identity-issuer
         secret:
           secretName: linkerd-identity-issuer
-{{- if .HighAvailability}}
-      affinity:
-        podAntiAffinity:
-          preferredDuringSchedulingIgnoredDuringExecution:
-            - weight: 100
-              podAffinityTerm:
-                labelSelector:
-                  matchExpressions:
-                    - key: {{.ControllerComponentLabel}}
-                      operator: In
-                      values:
-                        - identity
-                topologyKey: failure-domain.beta.kubernetes.io/zone
-          requiredDuringSchedulingIgnoredDuringExecution:
-            - labelSelector:
-                matchExpressions:
-                  - key: {{.ControllerComponentLabel}}
-                    operator: In
-                    values:
-                      - identity
-              topologyKey: kubernetes.io/hostname
-{{ end }}
+      {{- if .HighAvailability }}
+      {{- $local := dict "Label" .ControllerComponentLabel "Component" "identity" }}
+      {{- include "pod-affinity" $local | nindent 6 }}
+      {{- end }}
 {{end -}}
 {{end -}}
diff --git a/chart/templates/proxy_injector.yaml b/chart/templates/proxy_injector.yaml
index 465a9157..9da1481f 100644
--- a/chart/templates/proxy_injector.yaml
+++ b/chart/templates/proxy_injector.yaml
@@ -67,29 +67,10 @@ spec:
       - name: tls
         secret:
           secretName: linkerd-proxy-injector-tls
-{{- if .HighAvailability}}
-      affinity:
-        podAntiAffinity:
-          preferredDuringSchedulingIgnoredDuringExecution:
-            - weight: 100
-              podAffinityTerm:
-                labelSelector:
-                  matchExpressions:
-                    - key: {{.ControllerComponentLabel}}
-                      operator: In
-                      values:
-                        - proxy-injector
-                topologyKey: failure-domain.beta.kubernetes.io/zone
-          requiredDuringSchedulingIgnoredDuringExecution:
-            - labelSelector:
-                matchExpressions:
-                  - key: {{.ControllerComponentLabel}}
-                    operator: In
-                    values:
-                      - proxy-injector
-              topologyKey: kubernetes.io/hostname
-{{ end }}
-
+      {{- if .HighAvailability}}
+      {{- $local := dict "Label" .ControllerComponentLabel "Component" "proxy-injector" }}
+      {{- include "pod-affinity" $local | nindent 6 }}
+      {{- end }}
 ---
 kind: Service
 apiVersion: v1
diff --git a/chart/templates/sp_validator.yaml b/chart/templates/sp_validator.yaml
index 18f2545a..dc36daa0 100644
--- a/chart/templates/sp_validator.yaml
+++ b/chart/templates/sp_validator.yaml
@@ -81,26 +81,8 @@ spec:
       - name: tls
         secret:
           secretName: linkerd-sp-validator-tls
-{{- if .HighAvailability}}
-      affinity:
-          podAntiAffinity:
-              preferredDuringSchedulingIgnoredDuringExecution:
-                  - weight: 100
-                    podAffinityTerm:
-                        labelSelector:
-                            matchExpressions:
-                                - key: {{.ControllerComponentLabel}}
-                                  operator: In
-                                  values:
-                                      - sp-validator
-                        topologyKey: failure-domain.beta.kubernetes.io/zone
-              requiredDuringSchedulingIgnoredDuringExecution:
-                  - labelSelector:
-                        matchExpressions:
-                            - key: {{.ControllerComponentLabel}}
-                              operator: In
-                              values:
-                                  - sp-validator
-                    topologyKey: kubernetes.io/hostname
-{{ end }}
+      {{- if .HighAvailability}}
+      {{- $local := dict "Label" .ControllerComponentLabel "Component" "sp-validator" }}
+      {{- include "pod-affinity" $local | nindent 6 }}
+      {{- end }}
 {{end -}}
diff --git a/chart/templates/tap.yaml b/chart/templates/tap.yaml
index 1b231f25..ab608b6f 100644
--- a/chart/templates/tap.yaml
+++ b/chart/templates/tap.yaml
@@ -71,26 +71,8 @@ spec:
         {{ end -}}
         securityContext:
           runAsUser: {{.ControllerUID}}
-{{- if .HighAvailability}}
-      affinity:
-        podAntiAffinity:
-          preferredDuringSchedulingIgnoredDuringExecution:
-            - weight: 100
-              podAffinityTerm:
-                labelSelector:
-                  matchExpressions:
-                    - key: {{.ControllerComponentLabel}}
-                      operator: In
-                      values:
-                        - tap
-                topologyKey: failure-domain.beta.kubernetes.io/zone
-          requiredDuringSchedulingIgnoredDuringExecution:
-            - labelSelector:
-                matchExpressions:
-                  - key: {{.ControllerComponentLabel}}
-                    operator: In
-                    values:
-                      - tap
-              topologyKey: kubernetes.io/hostname
-{{ end }}
+      {{- if .HighAvailability}}
+      {{- $local := dict "Label" .ControllerComponentLabel "Component" "tap" }}
+      {{- include "pod-affinity" $local | nindent 6 }}
+      {{- end }}
 {{end -}}
diff --git a/cli/cmd/install.go b/cli/cmd/install.go
index 9d6adfba..c5ac4d3d 100644
--- a/cli/cmd/install.go
+++ b/cli/cmd/install.go
@@ -703,6 +703,7 @@ func (values *installValues) render(w io.Writer, configs *pb.All) error {
 	if values.stage == "" || values.stage == controlPlaneStage {
 		files = append(files, []*chartutil.BufferedFile{
 			{Name: "templates/_resources.yaml"},
+			{Name: "templates/_affinity.yaml"},
 			{Name: "templates/config.yaml"},
 			{Name: "templates/identity.yaml"},
 			{Name: "templates/controller.yaml"},

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
@Pothulapati
Contributor Author

@ihcsim Updated the PR accordingly.


@ihcsim ihcsim left a comment


👍 🥇

@ihcsim ihcsim merged commit fcec1cf into linkerd:master Jul 18, 2019
@Pothulapati Pothulapati deleted the ha branch July 18, 2019 17:13
grampelberg pushed a commit to linkerd/website that referenced this pull request Sep 10, 2019
The current HA documentation is outdated with respect to anti-affinity rules for critical pods as of the 2.5.0 stable release.

Document the impact of linkerd/linkerd2#2893 on anti-affinity rules for control plane pods.

Related to #400

Signed-off-by: Christian Nicolai <cn@mycrobase.de>

Development

Successfully merging this pull request may close these issues.

Add anti-affinity settings for controller pod when --ha flag is present

4 participants