On Google Cloud, multi-controller setups only have 1 controller #54

Closed
dghubble opened this issue Nov 8, 2017 · 0 comments


Bug

Environment

  • Platform: google-cloud
  • OS: container-linux

Problem

Google Cloud network load balancers map a single regional IP to a target pool of health-checked nodes. Because of a Google NLB bug, a request sent to that IP from a node in the target pool is always routed back to the node itself, even if its health checks are failing.

As a result, launching a multi-controller cluster (i.e. controller_count = 3) creates 3 controllers and runs bootkube start on the first. The other 2 controllers can never connect to the bootstrapped controller, because the network load balancer routes their requests back to themselves. This happens even if you write a proper health check based on apiserver availability on each node. In effect, you will only ever see the first controller in kubectl get nodes.
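
The failure mode can be illustrated with a toy model. This is only a sketch, not Google's implementation: the `route` function, the node names, and the health map are hypothetical, and the code encodes nothing beyond the behavior described above, where a request to the load balancer IP from a pool member is delivered back to that member regardless of health.

```python
def route(source, target_pool, healthy):
    """Return the node that receives a request sent to the load balancer IP.

    Toy model of the behavior described in this issue, not Google's code.
    """
    # Hairpin case: a pool member talking to the load balancer IP always
    # reaches itself, even when its own health check is failing.
    if source in target_pool:
        return source
    # Clients outside the pool are sent to the first healthy member, if any.
    for node in target_pool:
        if healthy.get(node, False):
            return node
    return None


# Hypothetical node names; only the bootstrapped controller is healthy.
pool = ["controller-0", "controller-1", "controller-2"]
health = {"controller-0": True, "controller-1": False, "controller-2": False}

for src in ["laptop"] + pool:
    print(f"{src} -> {route(src, pool, health)}")
```

In this model a client outside the pool reaches controller-0, while controller-1 and controller-2 are routed to themselves, where no apiserver is running yet, so they never register.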

Workarounds

There are several workarounds, but the tradeoffs are poor.

  1. Kubernetes requires a single DNS FQDN, so create a DNS record under that name for each controller. This is effectively the same round-robin DNS setup used on platforms that don't support load balancing. Bleh.
  2. SSH to each additional controller and temporarily add an /etc/hosts record that points it directly at the 0th controller so it can register and bootstrap itself. Then remove the record. Manual.
  3. Use a Google Cloud global TCP load balancer, instance group, etc. This creates a lot more infrastructure, slows down provisioning, introduces timeouts to kubectl logs and exec commands, and isn't ideal. You can check the google-load-balancing branch, but note that I don't expect to merge it; it's below the bar.
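
For workaround 2, the add-then-remove step could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the helper names, the marker comment, the IP 10.0.0.2, and the name cluster.example.com are all hypothetical placeholders, and the functions only transform hosts-file text. Reading and writing /etc/hosts itself (which needs root on the controller) is left out.

```python
MARKER = "# temporary-bootstrap"  # hypothetical tag used to find the line later


def add_hosts_entry(hosts_text, ip, fqdn, marker=MARKER):
    """Return hosts_text with a marked line mapping fqdn to ip appended."""
    lines = hosts_text.splitlines()
    lines.append(f"{ip} {fqdn} {marker}")
    return "\n".join(lines) + "\n"


def remove_hosts_entry(hosts_text, marker=MARKER):
    """Return hosts_text without any line that ends with the marker."""
    kept = [
        line for line in hosts_text.splitlines()
        if not line.rstrip().endswith(marker)
    ]
    return "\n".join(kept) + "\n" if kept else ""


# Placeholder values: the 0th controller's IP and the cluster's FQDN.
original = "127.0.0.1 localhost\n"
patched = add_hosts_entry(original, "10.0.0.2", "cluster.example.com")
restored = remove_hosts_entry(patched)
print(patched, end="")
```

Tagging the temporary line with a marker means the cleanup step removes exactly what was added and leaves the rest of the file as it was.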

Recommendation

For now, I recommend folks keep deploying single-controller clusters on Google Cloud.

This only affects Google Cloud. Multi-controller setups on all other platforms are supported.
