Google Cloud network load balancers map a single regional IP to a target pool of health-checked nodes. Due to a Google NLB bug, requests sent from a load-balanced node to the NLB address are always routed back to the node itself, even when that node's health checks are failing.
As a result, launching a multi-controller cluster (i.e. `controller_count = 3`) creates 3 controllers and runs `bootkube start` on the first, but the other 2 controllers can never connect to the bootstrapped controller: the network load balancer routes their requests back to themselves, even with a proper health check based on apiserver availability on each node. In effect, you will only ever see the first controller in `kubectl get nodes`.
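A quick way to see the symptom from one of the additional controllers is to probe the NLB address on the apiserver port. This is a sketch; `NLB_IP` is a placeholder for your load balancer's regional IP, not something from this issue:

```shell
# Placeholder: substitute your NLB's regional IP address.
NLB_IP=192.0.2.1

if curl -ks --max-time 5 "https://${NLB_IP}:443/healthz" >/dev/null; then
  RESULT="reachable"
else
  # On a backend node the NLB hairpins the request to the node itself,
  # so this branch is taken until the local apiserver is running.
  RESULT="hairpinned-or-unreachable"
fi
echo "$RESULT"
```

Run from an outside machine, the same probe reaches the bootstrapped controller and prints `reachable`, which is what makes the hairpin behavior easy to miss.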
Workarounds
There are several workarounds, but the tradeoffs are poor.
Since Kubernetes requires a single DNS FQDN anyway, create DNS records for each controller. This is effectively the same round-robin DNS setup used on platforms that don't support load balancing. Bleh.
SSH to each additional controller and temporarily add an /etc/hosts record pointing it directly at the 0th controller so it can register and bootstrap itself. Then remove the record. Manual.
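The manual workaround above can be sketched as follows, run as root on each extra controller. The FQDN and IP are placeholders, and `HOSTS_FILE` is parameterized only so the snippet can be tried safely against a copy instead of the real /etc/hosts:

```shell
# Placeholders: your Kubernetes API DNS name and controller 0's internal IP.
API_FQDN=cluster.example.com
CONTROLLER0_IP=10.0.0.10
HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}

# Temporarily pin the API FQDN to the 0th controller, bypassing the NLB.
echo "${CONTROLLER0_IP} ${API_FQDN}" >> "$HOSTS_FILE"

# ... wait here for the node to register with the bootstrapped controller ...

# Remove the override so traffic flows through the load balancer again.
# Write through the file instead of renaming, so this also works where
# /etc/hosts is a bind mount (e.g. inside containers).
grep -v "^${CONTROLLER0_IP} ${API_FQDN}\$" "$HOSTS_FILE" > /tmp/hosts.new
cat /tmp/hosts.new > "$HOSTS_FILE"
rm /tmp/hosts.new
```

The point is only to get the extra controllers past bootstrap; once they have registered, traffic should go back through the load balancer address.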
Use a Google Cloud global TCP load balancer, instance group, etc. This creates a lot more infrastructure, slows down provisioning, introduces timeouts to `kubectl logs` and `kubectl exec` commands, and isn't ideal. You can check the google-load-balancing branch, but note that I don't expect to merge it; it's below the bar.
Recommendation
For now, I recommend folks keep deploying single controller clusters on Google Cloud.
This only affects Google Cloud. Multi-controller setups on all other platforms are supported.