Bug 1868158: gcp, azure: Handle azure vips similar to GCP #2011
@@ -0,0 +1,75 @@
# apiserver-watcher

## Background

Some cloud provider load balancers need special handling for hairpin scenarios.
Because default OpenShift installations are "self-driving", i.e. the control
plane is hosted as part of the cluster, we rely on hairpinning extensively.
```
+---------------+
|               |      +-----------------+
| +---------+   |      |                 |
| | kubelet +---------->   layer-3       |
| +---------+   |      | load balancer   |
|               |   +--+                 |
| +---------+   |   |  +-----------------+
| |apiserver+<------+
| +---------+   |
|               |
+---------------+
```

We have iptables workarounds to fix these scenarios, but they need to know when
the local apiserver is up or down. Hence, the apiserver-watcher.
### GCP

The Google Cloud load balancer is an unusual L3LB: it doesn't do DNAT. Instead, it
just redirects traffic to backends and preserves the VIP as the destination IP.

So, an agent exists on the node. It programs the node (either via iptables or
routing tables) to accept traffic destined for the VIP. However, this has a
problem: all hairpin traffic to the balanced service is *always* handled by that
backend, even if it is down or otherwise out of rotation.

We want to withdraw the internal API service from google-routes redirection when
it's down, or else the node (i.e. kubelet) loses access to the apiserver VIP
and becomes unmanageable.

See `templates/master/00-master/gcp/files/opt-libexec-openshift-gcp-routes-sh.yaml`

### Azure

The Azure L3LB does do DNAT, which presents a different problem: we can never
reply to hairpinned traffic. The problem looks something like this:

```
TCP SYN       master-1 -> vip       outgoing
(load balancing happens)
TCP SYN       master-1 -> master-1  incoming

(server socket accepts, reply generated)
TCP SYN,ACK   master-1 -> master-1
```

This last packet is dropped, because the client socket is expecting a SYN,ACK with
a source IP of the VIP, not master-1.

So, when the apiserver is up, we want to direct all local traffic to ourselves.
When it is down, we would like it to go over the load balancer.

See `templates/master/00-master/azure/files/opt-libexec-openshift-azure-routes-sh.yaml`
## Functionality

The apiserver-watcher is installed on all masters and monitors the local
apiserver's `/readyz` endpoint.
When `/readyz` fails, the watcher writes `/run/cloud-routes/$VIP.down`, which
tells the provider-specific service to update its iptables rules. When the check
succeeds again, it writes `$VIP.up`.

Separately, a provider-specific process watches that directory and updates
iptables rules accordingly.
@@ -4,15 +4,14 @@ import (
	"crypto/tls"
	"flag"
	"fmt"
+	"io/ioutil"
	"net"
	"net/http"
	"net/url"
	"os"
-	"os/exec"
	"os/signal"
	"path"
	"sync"
-	"syscall"
	"time"

	health "github.com/InVisionApp/go-health"
@@ -26,41 +25,31 @@ import (
var (
	runCmd = &cobra.Command{
		Use:   "run",
-		Short: "Runs the gcp-routes-controller",
+		Short: "Runs the apiserver-watcher",
		Long:  "",
		RunE:  runRunCmd,
	}

	runOpts struct {
-		gcpRoutesService string
-		rootMount        string
-		healthCheckURL   string
-		vip              string
+		rootMount      string
+		healthCheckURL string
+		vip            string
	}
)

// downFileDir is the directory in which gcp-routes will look for a flag-file that
// indicates the route to the VIP should be withdrawn.
-const downFileDir = "/run/gcp-routes"
+const downFileDir = "/run/cloud-routes"

func init() {
	rootCmd.AddCommand(runCmd)
-	runCmd.PersistentFlags().StringVar(&runOpts.gcpRoutesService, "gcp-routes-service", "openshift-gcp-routes.service", "The name for the service controlling gcp routes on host")
	runCmd.PersistentFlags().StringVar(&runOpts.rootMount, "root-mount", "/rootfs", "where the nodes root filesystem is mounted for writing down files or chrooting.")
	runCmd.PersistentFlags().StringVar(&runOpts.healthCheckURL, "health-check-url", "", "HTTP(s) URL for the health check")
	runCmd.PersistentFlags().StringVar(&runOpts.vip, "vip", "", "The VIP to remove if the health check fails. Determined from URL if not provided")
}

-type downMode int
-
-const (
-	modeStopService = iota
-	modeDownFile
-)
-
type handler struct {
-	mode downMode
-	vip  string
+	vip string
}

func runRunCmd(cmd *cobra.Command, args []string) error {
@@ -104,6 +93,7 @@ func runRunCmd(cmd *cobra.Command, args []string) error {
	// as a backend in the load-balancer, and add routes before we've been
	// re-added.
	// see openshift/installer/data/data/gcp/network/lb-private.tf
+	// see openshift/installer/data/data/azure/vnet/internal-lb.tf
	tracker := &healthTracker{
		state: unknownTrackerState,
		ErrCh: errCh,
@@ -130,7 +120,7 @@ func runRunCmd(cmd *cobra.Command, args []string) error {
	signal.Notify(c, os.Interrupt)
	go func() {
		for sig := range c {
-			glog.Infof("Signal %s received: shutting down gcp routes service", sig)
+			glog.Infof("Signal %s received: treating service as down", sig)
			if err := handler.onFailure(); err != nil {
				glog.Infof("Failed to mark service down on signal: %s", err)
			}
@@ -151,83 +141,61 @@ func runRunCmd(cmd *cobra.Command, args []string) error {
func newHandler(uri *url.URL) (*handler, error) {
	h := handler{}

-	// determine mode: if /run/gcp-routes exists, we can use the downfile mode
-	realPath := path.Join(runOpts.rootMount, downFileDir)
-	fi, err := os.Stat(realPath)
-	if err == nil && fi.IsDir() {
-		glog.Infof("%s exists, starting in downfile mode", realPath)
-		h.mode = modeDownFile
-	} else {
-		glog.Infof("%s not accessible, will stop gcp-routes.service on health failure", realPath)
-		h.mode = modeStopService
-	}
-
-	// if StopService mode and rootfs specified, chroot
-	if h.mode == modeStopService && runOpts.rootMount != "" {
-		glog.Infof(`Calling chroot("%s")`, runOpts.rootMount)
-		if err := syscall.Chroot(runOpts.rootMount); err != nil {
-			return nil, fmt.Errorf("unable to chroot to %s: %s", runOpts.rootMount, err)
-		}
-
-		glog.V(2).Infof("Moving to / inside the chroot")
-		if err := os.Chdir("/"); err != nil {
-			return nil, fmt.Errorf("unable to change directory to /: %s", err)
-		}
-	}
-
-	// otherwise, resolve vip
-	if h.mode == modeDownFile {
-		if runOpts.vip != "" {
-			h.vip = runOpts.vip
-		} else {
-			addrs, err := net.LookupHost(uri.Hostname())
-			if err != nil {
-				return nil, fmt.Errorf("failed to lookup host %s: %v", uri.Hostname(), err)
-			}
-			if len(addrs) != 1 {
-				return nil, fmt.Errorf("hostname %s has %d addresses, expected 1 - aborting", uri.Hostname(), len(addrs))
-			}
-			h.vip = addrs[0]
-			glog.Infof("Using VIP %s", h.vip)
-		}
+	if runOpts.vip != "" {
+		h.vip = runOpts.vip
+	} else {
+		addrs, err := net.LookupHost(uri.Hostname())
+		if err != nil {
+			return nil, fmt.Errorf("failed to lookup host %s: %v", uri.Hostname(), err)
+		}
+		if len(addrs) != 1 {
+			return nil, fmt.Errorf("hostname %s has %d addresses, expected 1 - aborting", uri.Hostname(), len(addrs))
+		}
+		h.vip = addrs[0]
+		glog.Infof("Using VIP %s", h.vip)
	}

	return &h, nil
}

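The VIP fallback added above (take the health-check URL's hostname and require that it resolve to exactly one address) can be exercised in isolation. `resolveVip` is a hypothetical helper name for illustration; the logic follows the diff:

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// resolveVip extracts the hostname from a health-check URL and resolves it,
// insisting on exactly one address, mirroring the newHandler fallback path.
func resolveVip(rawURL string) (string, error) {
	uri, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	addrs, err := net.LookupHost(uri.Hostname())
	if err != nil {
		return "", fmt.Errorf("failed to lookup host %s: %v", uri.Hostname(), err)
	}
	if len(addrs) != 1 {
		return "", fmt.Errorf("hostname %s has %d addresses, expected 1", uri.Hostname(), len(addrs))
	}
	return addrs[0], nil
}
```

Requiring a single address keeps the down-file naming unambiguous: one VIP, one `$VIP.down`/`$VIP.up` pair.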
// onFailure: either stop the routes service, or write downfile
func (h *handler) onFailure() error {
-	if h.mode == modeDownFile {
-		downFile := path.Join(runOpts.rootMount, downFileDir, fmt.Sprintf("%s.down", h.vip))
-		fp, err := os.OpenFile(downFile, os.O_CREATE, 0644)
-		if err != nil {
-			return fmt.Errorf("failed to create downfile (%s): %v", downFile, err)
-		}
-		_ = fp.Close()
-		glog.Infof("healthcheck failed, created downfile %s", downFile)
-	} else {
-		if err := exec.Command("systemctl", "stop", runOpts.gcpRoutesService).Run(); err != nil {
-			return fmt.Errorf("Failed to terminate gcp routes service %v", err)
-		}
-		glog.Infof("healthcheck failed, stopped %s", runOpts.gcpRoutesService)
+	if err := writeVipStateFile(h.vip, "down"); err != nil {
+		return err
	}
+	glog.Infof("healthcheck failed, created downfile %s.down", h.vip)
+	if err := removeVipStateFile(h.vip, "up"); err != nil {
+		return err
+	}
	return nil
}

// onSuccess: either start routes service, or remove down file
func (h *handler) onSuccess() error {
-	if h.mode == modeDownFile {
-		downFile := path.Join(runOpts.rootMount, downFileDir, fmt.Sprintf("%s.down", h.vip))
-		err := os.Remove(downFile)
-		if err != nil && !os.IsNotExist(err) {
-			return fmt.Errorf("failed to remove downfile (%s): %v", downFile, err)
-		}
-		glog.Infof("healthcheck succeeded, removed downfile %s", downFile)
-	} else {
-		if err := exec.Command("systemctl", "start", runOpts.gcpRoutesService).Run(); err != nil {
-			return fmt.Errorf("Failed to terminate gcp routes service %v", err)
-		}
-		glog.Infof("healthcheck succeeded, started %s", runOpts.gcpRoutesService)
+	if err := removeVipStateFile(h.vip, "down"); err != nil {
+		return err
	}
+	glog.Infof("healthcheck succeeded, removed downfile %s.down", h.vip)
+	if err := writeVipStateFile(h.vip, "up"); err != nil {
+		return err
+	}
	return nil
}

+func writeVipStateFile(vip, state string) error {
+	file := path.Join(runOpts.rootMount, downFileDir, fmt.Sprintf("%s.%s", vip, state))
+	err := ioutil.WriteFile(file, nil, 0644)
+	if err != nil {
+		return fmt.Errorf("failed to create file (%s): %v", file, err)
+	}
+	return nil
+}

+func removeVipStateFile(vip, state string) error {
+	file := path.Join(runOpts.rootMount, downFileDir, fmt.Sprintf("%s.%s", vip, state))
+	err := os.Remove(file)
+	if err != nil && !os.IsNotExist(err) {
+		return fmt.Errorf("failed to remove file (%s): %v", file, err)
+	}
+	return nil
+}

This file was deleted.
@@ -1,17 +1,17 @@
mode: 0644
-path: "/etc/kubernetes/manifests/gcp-routes-controller.yaml"
+path: "/etc/kubernetes/manifests/apiserver-watcher.yaml"
contents:
  inline: |
    apiVersion: v1
    kind: Pod
    metadata:
-      name: gcp-routes-controller
+      name: apiserver-watcher
      namespace: kube-system
    spec:
      containers:
-      - name: gcp-routes-controller
-        image: "{{.Images.gcpRoutesControllerKey}}"
-        command: ["gcp-routes-controller"]
+      - name: apiserver-watcher
+        image: "{{.Images.apiServerWatcherKey}}"
+        command: ["apiserver-watcher"]
        args:
        - "run"
        - "--health-check-url={{.Infra.Status.APIServerInternalURL}}/readyz"
ascii figures 😍