Google Cloud network load balancers map a single regional IP to a target pool of health-checked nodes. Due to a Google NLB bug, requests sent from a load-balanced node to the NLB address are always routed back to the node itself, even when that node's health checks are failing.
As a result, launching a multi-controller cluster (i.e. `controller_count = 3`) creates 3 controllers and runs `bootkube start` on the first, but the other 2 controllers can never connect to the bootstrapped controller: the network load balancer routes their requests back to themselves, even with a proper health check based on apiserver availability on each node. In effect, you will only ever see the first controller in `kubectl get nodes`.
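A quick way to see the symptom from one of the additional controllers is to probe the NLB address on the apiserver port. This is a sketch; `NLB_IP` is a placeholder for your load balancer's regional IP, not something from this issue:

```shell
# Placeholder: substitute your NLB's regional IP address.
NLB_IP=192.0.2.1

if curl -ks --max-time 5 "https://${NLB_IP}:443/healthz" >/dev/null; then
  RESULT="reachable"
else
  # On a backend node the NLB hairpins the request to the node itself,
  # so this branch is taken until the local apiserver is running.
  RESULT="hairpinned-or-unreachable"
fi
echo "$RESULT"
```

Run from an outside machine, the same probe reaches the bootstrapped controller and prints `reachable`, which is what makes the hairpin behavior easy to miss.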
Workarounds
There are several workarounds, but the tradeoffs are poor.
Since Kubernetes requires a single DNS FQDN anyway, create DNS records for each controller. This is effectively the same round-robin DNS setup used on platforms that don't support load balancing. Bleh.
SSH to each additional controller and temporarily add an /etc/hosts record pointing it directly at the 0th controller so it can register and bootstrap itself. Then remove the record. Manual.
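The manual workaround above can be sketched as follows, run as root on each extra controller. The FQDN and IP are placeholders, and `HOSTS_FILE` is parameterized only so the snippet can be tried safely against a copy instead of the real /etc/hosts:

```shell
# Placeholders: your Kubernetes API DNS name and controller 0's internal IP.
API_FQDN=cluster.example.com
CONTROLLER0_IP=10.0.0.10
HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}

# Temporarily pin the API FQDN to the 0th controller, bypassing the NLB.
echo "${CONTROLLER0_IP} ${API_FQDN}" >> "$HOSTS_FILE"

# ... wait here for the node to register with the bootstrapped controller ...

# Remove the override so traffic flows through the load balancer again.
# Write through the file instead of renaming, so this also works where
# /etc/hosts is a bind mount (e.g. inside containers).
grep -v "^${CONTROLLER0_IP} ${API_FQDN}\$" "$HOSTS_FILE" > /tmp/hosts.new
cat /tmp/hosts.new > "$HOSTS_FILE"
rm /tmp/hosts.new
```

The point is only to get the extra controllers past bootstrap; once they have registered, traffic should go back through the load balancer address.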
Use a Google Cloud global TCP load balancer, instance group, etc. This creates a lot more infrastructure, slows down provisioning, introduces timeouts to `kubectl logs` and `kubectl exec` commands, and isn't ideal. You can check the google-load-balancing branch, but note that I don't expect to merge it; it's below the bar.
Recommendation
For now, I recommend folks keep deploying single controller clusters on Google Cloud.
This only affects Google Cloud. Multi-controller setups on all other platforms are supported.