This repository has been archived by the owner on Jul 3, 2021. It is now read-only.

Adding/removing a master node to the cluster is kicking off a whole cluster restart (canary deployment) #344

Open
lgunta2018 opened this issue Sep 17, 2018 · 7 comments

lgunta2018 commented Sep 17, 2018

What happened:
Adding/removing a master node to the cluster kicks off a restart of the whole cluster (canary deployment)

What you expected to happen:
Adding/removing a master node should not kick off a whole-cluster restart

How to reproduce it (as minimally and precisely as possible):

Steps:

  1. Create a cluster with 3 master and 3 worker nodes
  2. Update the number of master node instances to 2 in the cfcr.yml file (see the snippet below)
  3. Run the bosh deploy command; the diff and task log show the whole cluster restarting
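
For reference, the only manifest change in step 2 is the master instance count. A minimal sketch of the relevant excerpt of manifests/cfcr.yml (abridged; all other keys left as deployed):

    instance_groups:
    - name: master
      instances: 2   # scaled down from 3; this is the only edit
      # (azs, jobs, networks, etc. unchanged and omitted here)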

runlog:

Removing a master node:

bosh deploy -d cfcr ${KD}/manifests/cfcr.yml -o ${KD}/manifests/ops-files/iaas/aws/cloud-provider.yml -o cfcr-ops.yml -l <(bbl outputs)
Using environment 'https://10.0.0.6:25555' as client 'admin'

Using deployment 'cfcr'

Release 'cfcr-etcd/1.5.0' already exists.

Release 'bpm/0.12.3' already exists.

[scrambled excerpt of kubo-deployment/manifests/cfcr.yml (addons, apply-addons errand, and certificate variables sections, including an editor status line "297L, 7524C") accidentally captured in the paste; elided]

Release 'bosh-dns/1.8.0' already exists.

Release 'docker/32.0.0' already exists.

instance_groups:
- name: master
-   instances: 3
+   instances: 2

Continue? [yN]: y

Task 120

Task 120 | 23:47:07 | Preparing deployment: Preparing deployment (00:00:07)
Task 120 | 23:47:39 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 120 | 23:47:39 | Deleting unneeded instances master: master/a02c36fb-a6d0-4ca9-8178-8a929087d32e (2) (00:00:43)
Task 120 | 23:48:22 | Updating instance master: master/ba1fbbda-5080-4094-b8c8-671d9abb34a6 (0) (canary) (00:01:05)
Task 120 | 23:49:27 | Updating instance master: master/c5dd8b42-ee9e-4607-ad44-152144a7eebf (1) (00:01:22)
Task 120 | 23:50:49 | Updating instance worker: worker/43ac3278-5a5b-4a9b-a782-e3b52254f98d (0) (canary) (00:00:33)
Task 120 | 23:51:22 | Updating instance worker: worker/41823365-c567-4acc-a2c3-d3df897ee8b3 (1) (00:00:35)
Task 120 | 23:51:57 | Updating instance worker: worker/598eec97-2305-4eac-a784-2fee04d6121b (2) (00:00:41)

The same behavior is observed when adding a master node back:

→ bosh deploy -d cfcr ${KD}/manifests/cfcr.yml -o ${KD}/manifests/ops-files/iaas/aws/cloud-provider.yml -o cfcr-ops.yml -l <(bbl outputs)
Using environment 'https://10.0.0.6:25555' as client 'admin'

Using deployment 'cfcr'

Release 'bpm/0.12.3' already exists.

Release 'docker/32.0.0' already exists.

Release 'cfcr-etcd/1.5.0' already exists.

Release 'bosh-dns/1.8.0' already exists.

instance_groups:
- name: master
-   instances: 2
+   instances: 3

Continue? [yN]: y

Task 234

Task 234 | 00:10:30 | Preparing deployment: Preparing deployment (00:00:06)
Task 234 | 00:11:07 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 234 | 00:11:07 | Creating missing vms: master/09e9e7e6-7e1f-4d51-b591-1c0f1e8ca12d (2) (00:01:27)
Task 234 | 00:12:34 | Updating instance master: master/ba1fbbda-5080-4094-b8c8-671d9abb34a6 (0) (canary) (00:01:07)
Task 234 | 00:13:41 | Updating instance master: master/c5dd8b42-ee9e-4607-ad44-152144a7eebf (1) (00:01:05)
Task 234 | 00:14:46 | Updating instance master: master/09e9e7e6-7e1f-4d51-b591-1c0f1e8ca12d (2) (00:01:27)
Task 234 | 00:16:13 | Updating instance worker: worker/43ac3278-5a5b-4a9b-a782-e3b52254f98d (0) (canary) (00:00:33)
Task 234 | 00:16:46 | Updating instance worker: worker/41823365-c567-4acc-a2c3-d3df897ee8b3 (1) (00:00:42)
Task 234 | 00:17:28 | Updating instance worker: worker/598eec97-2305-4eac-a784-2fee04d6121b (2) (00:00:35)

Task 234 Started Sat Sep 15 00:10:30 UTC 2018
Task 234 Finished Sat Sep 15 00:18:03 UTC 2018
Task 234 Duration 00:07:33
Task 234 done

Anything else we need to know?:
Adding worker nodes works fine; it does not restart the whole cluster.
kubo-deployment: v0.21.0

Environment:

  • Deployment Info (bosh -d <deployment> deployment):
    bosh deploy -d cfcr ${KD}/manifests/cfcr.yml -o ${KD}/manifests/ops-files/iaas/aws/cloud-provider.yml -o cfcr-ops.yml -l <(bbl outputs)

Name  Release(s)       Stemcell(s)                                    Config(s)        Team(s)
cfcr  bosh-dns/1.8.0   bosh-aws-xen-hvm-ubuntu-xenial-go_agent/97.16  1 cloud/default  -
      bpm/0.12.3                                                      2 runtime/dns
      cfcr-etcd/1.5.0
      docker/32.0.0
      kubo/0.21.0

  • Environment Info (bosh -e <environment> environment):
Name      bosh-cfcr-lgunta
    UUID      58cd4562-b14b-4310-aaa3-bd1582250f34
    Version   267.5.0 (00000000)
    CPI       aws_cpi
    Features  compiled_package_cache: disabled
              config_server: enabled
              dns: disabled
              snapshots: disabled
    User      admin

  • Kubernetes version (kubectl version):

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-10T11:44:36Z", GoVersion:"go1.11", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:08:19Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider (e.g. aws, gcp, vsphere):
    AWS
@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/160542355

The labels on this github issue will be updated when the story is started.

@alex-slynko
Member

Hi @lgunta2018

This is intended behaviour right now with the current way CFCR is deployed.
Each master is colocated with an etcd node.
Each worker uses flannel networking that connects to etcd, and flannel is reconfigured each time an etcd node is added or removed.
The BOSH team is considering improving the flow so that only a single job gets restarted, but this is not in their short- or mid-term plans.
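
To make the coupling concrete, here is a simplified sketch of the master instance group from cfcr.yml (heavily abridged, with job and property lists shortened) showing etcd colocated with the control-plane jobs:

    instance_groups:
    - name: master
      instances: 3
      jobs:
      - name: kube-apiserver     # control-plane jobs from the kubo release
        release: kubo
      - name: etcd               # etcd from the cfcr-etcd release, on every master VM
        release: cfcr-etcd
      # (remaining jobs, azs, networks, etc. omitted)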

Out of curiosity, what was the reason to scale masters down to 2 VMs?

@lgunta2018
Author

lgunta2018 commented Sep 17, 2018

Hi @alex-slynko
Thanks for the quick update. This was just to see how the cluster reacts when we remove a master node. So, if we lose a master node due to some issue, does that mean the whole cluster restarts to bring up a new master in place of the existing one?
Why do we need to restart the whole cluster when we add a master node to the cluster?

@lgunta2018
Author

lgunta2018 commented Sep 17, 2018

Hey @alex-slynko ,
Thank you for the response. However, we would like the apps deployed on the workers not to be restarted while handling a master node scale down/up. Our expectation was that, since this is a control-plane update, it shouldn't need to restart other parts of the cluster, like the workers. Would it be possible to externalize etcd from the master nodes (create an etcd cluster behind a load balancer), and would that solve this issue of a rolling restart across the cluster? Not all of our workloads are cloud-native apps; many have stateful sessions that we would like to keep from restarting unless necessary, to reduce downtime on those apps.

@youreddy
Contributor

@lgunta2018,

if we lose the master node due to some issue, it means it restart the whole cluster to bring a new master node in place of the existing master node?

It depends. If the BOSH resurrector has noticed that one of your master VMs has gone missing, it will recreate that VM without touching the workers. Also, say your manifest states master instances: 3 but one of the masters is failing; in that case BOSH hasn't seen a change to the instance count, so it won't re-template the jobs or touch the workers.
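
For example (a sketch, assuming the director's health monitor has the resurrector plugin enabled):

    # Enable automatic recreation of missing VMs, director-wide:
    bosh update-resurrection on

    # After the resurrector replaces a lost master, only that VM is recreated;
    # the manifest is unchanged, so worker jobs are not re-templated:
    bosh -d cfcr instances --details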

Why do we need to restart the whole cluster when we adding a master node to the cluster?

Basically what Oleksandr said above. flanneld runs on every VM in the cluster. The flannel job consumes the etcd BOSH links and iterates through the list of etcd instances. So every time the number of etcd nodes changes (i.e. the number of masters, since they're colocated), the flannel job gets re-templated on every VM and BOSH updates all the instances.
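
As an illustration only (not the actual kubo-release template), an ERB job template that consumes an etcd link typically renders every instance address, so any change in the instance count changes the rendered file on every consuming VM:

    <%# hypothetical flanneld config template consuming the etcd link %>
    <% endpoints = link('etcd').instances.map { |i| "https://#{i.address}:2379" } %>
    FLANNELD_ETCD_ENDPOINTS=<%= endpoints.join(',') %>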

Would it be possible to externalize the etcd from the master nodes (create etcd cluster with LB) and would that solve this issue of a rolling restart across the cluster?

Externalizing etcd would be a larger architectural change and may not necessarily solve the problem. We might be able to solve this in a simpler way by not iterating through the list of etcd links and instead configuring flanneld to use the etcd BOSH DNS entry. We discussed this approach this morning, but we need to spike it out because there are probably more dependencies than just those etcd links alone. There's a spike in our public tracker if you want to follow progress.
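
The idea would be roughly this (a sketch, assuming the link-level BOSH DNS address is available, e.g. with use_dns_addresses enabled), so the rendered value no longer depends on the instance count:

    <%# hypothetical alternative: one stable DNS name for the whole link %>
    FLANNELD_ETCD_ENDPOINTS=https://<%= link('etcd').address %>:2379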

@alex-slynko
Member

Hi @lgunta2018

  1. If you want to test how the cluster reacts to removing a master node, you need to use the delete-vm or stop command (see the sketch after this list). I wrote a tiny blog post where I tried to explain the difference.
  2. We created a spike to investigate this further, but I can't guarantee we will work on it or fix it soon. It might require a big redesign of CFCR or some architectural BOSH changes. We might prioritize it if we see business value in improving it.
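
A minimal sketch of the difference (the instance ID is taken from the runlog above; the VM CID is hypothetical):

    # Simulate sudden VM loss; the resurrector will recreate just this VM:
    bosh -d cfcr delete-vm i-0123456789abcdef0

    # Deliberately take an instance down; --hard deletes the VM but keeps its
    # persistent disk, and the resurrector will not bring it back:
    bosh -d cfcr stop master/ba1fbbda-5080-4094-b8c8-671d9abb34a6 --hard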

There are two workarounds for this that I can think of, but we haven't tried them (a sketch of the second one follows):

  • create an external load balancer and use manual links to pass traffic through this load balancer.
  • split the deployment into two, control plane and workers, and use cross-deployment links.
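
A minimal sketch of the cross-deployment variant, assuming the control plane lives in a separate deployment (the deployment name is hypothetical, and the etcd link names follow the discussion above; the provider must mark the link shared):

    # in the control-plane deployment's manifest:
    instance_groups:
    - name: master
      jobs:
      - name: etcd
        release: cfcr-etcd
        provides:
          etcd: {shared: true}

    # in the workers deployment's manifest:
    instance_groups:
    - name: worker
      jobs:
      - name: flanneld
        release: kubo
        consumes:
          etcd:
            from: etcd
            deployment: cfcr-control-plane   # hypothetical deployment name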

Feel free to ask more questions in Slack channel or here.

@lgunta2018
Author

Thanks, @youreddy and @alex-slynko, for looking into this issue. I will try your workarounds to solve this problem for now, but this would be a very useful feature in my case, and I hope we will see some traction in this regard. I will let you know if the workarounds do not work for me.
