4 changes: 2 additions & 2 deletions Makefile
@@ -1,5 +1,5 @@
MCO_COMPONENTS = daemon controller server operator
EXTRA_COMPONENTS = gcp-routes-controller
EXTRA_COMPONENTS = apiserver-watcher
ALL_COMPONENTS = $(patsubst %,machine-config-%,$(MCO_COMPONENTS)) $(EXTRA_COMPONENTS)
PREFIX ?= /usr
GO111MODULE?=on
@@ -29,7 +29,7 @@ clean:

# Build machine configs. Intended to be called via another target.
# Example:
# make _build-gcp-routes-controller
# make _build-machine-config-operator
_build-%:
WHAT=$* hack/build-go.sh

75 changes: 75 additions & 0 deletions cmd/apiserver-watcher/README.md
@@ -0,0 +1,75 @@
# apiserver-watcher

## Background

Some cloud provider load balancers need special handling for hairpin scenarios.
Because default OpenShift installations are "self-driving", i.e. the control
plane is hosted as part of the cluster, we rely on hairpinning extensively.


```
+---------------+
| | +-----------------+
| +---------+ | | |
| | kubelet +-------------> layer-3 |
| +---------+ | | load balancer |
| | +----+ |
| +---------+ | | +-----------------+
| |apiserver+<-------+
| +---------+ |
| |
+---------------+
```

Contributor: ascii figures 😍


We have iptables workarounds to fix these scenarios, but they need to know when
the local apiserver is up or down. Hence, the apiserver-watcher.

### GCP

The Google Cloud load balancer is an L3 LB with a quirk: it doesn't do DNAT. Instead, it
forwards traffic to the backends and preserves the VIP as the destination IP.

So an agent runs on each node and programs it (via iptables or routing tables) to
accept traffic destined for the VIP. This has a problem, though: hairpin traffic
to the balanced service is *always* handled by that backend, even if it is down
or otherwise out of rotation.

We want to withdraw the internal API service from gcp-routes redirection when
it's down; otherwise the node (i.e. kubelet) loses access to the apiserver VIP
and becomes unmanageable.


See `templates/master/00-master/gcp/files/opt-libexec-openshift-gcp-routes-sh.yaml`
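
To make the contract concrete, here is a minimal sketch of the consumer side: a
provider-specific agent deciding from the downfile whether to keep accepting VIP
traffic. This is illustrative only; the polling loop and the VIP are hypothetical,
and the real GCP agent is the shell script referenced above.

```
package main

import (
	"log"
	"os"
	"path"
	"time"
)

// stateDir is where apiserver-watcher drops "<vip>.down" / "<vip>.up" files.
const stateDir = "/run/cloud-routes"

// vipWanted reports whether the node should still accept traffic for the VIP:
// true as long as no "<vip>.down" file exists.
func vipWanted(vip string) bool {
	_, err := os.Stat(path.Join(stateDir, vip+".down"))
	return os.IsNotExist(err)
}

func main() {
	vip := "10.0.0.2" // placeholder VIP, not taken from this PR
	for {
		if vipWanted(vip) {
			// Here the real agent would (re)install the VIP redirect.
			log.Printf("apiserver healthy, serving %s locally", vip)
		} else {
			// Here the real agent would withdraw the VIP redirect so that
			// traffic falls back to the load balancer.
			log.Printf("downfile present, withdrawing %s", vip)
		}
		time.Sleep(5 * time.Second)
	}
}
```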

### Azure

Contributor: It'd be nice to also add some platform-specific descriptions of how the service operates on each platform, so it's clearer how differences are handled.

Contributor (author): Not sure what you mean exactly; apiserver-watcher is identical on Azure and GCP. I did add pointers to the cloud-provider-specific scripts, so maybe that's helpful?


The Azure L3 LB does do DNAT, which presents a different problem: we can never
successfully reply to hairpinned traffic. The exchange looks something like this:

```
TCP SYN master1 -> vip outgoing
(load balance happens)
TCP SYN master1 -> master1 incoming

(server socket accepts, reply generated)
TCP SYN, ACK master1 -> master1
```

This last packet is dropped, because the client socket is expecting a SYN,ACK with
a source IP of the VIP, not master1.
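
For illustration only: from a master without the workaround, a hairpinned client
never completes the handshake and simply times out. A minimal Go probe (placeholder
VIP and port, not taken from this PR) makes the symptom visible:

```
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Placeholder VIP:port. Without the azure-routes workaround, the SYN,ACK
	// comes back with the node's own IP as its source, the client kernel
	// cannot match it to the connection, and the dial eventually times out.
	conn, err := net.DialTimeout("tcp", "10.0.0.4:6443", 3*time.Second)
	if err != nil {
		fmt.Println("hairpin dial failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("hairpin dial succeeded (workaround in place)")
}
```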

So, when the local apiserver is up, we want to direct all local traffic for the VIP
back to ourselves. When it is down, we want that traffic to go over the load balancer instead.

See `templates/master/00-master/azure/files/opt-libexec-openshift-azure-routes-sh.yaml`

## Functionality

The apiserver-watcher is installed on all masters and monitors the local
apiserver's /readyz endpoint.

When /readyz fails, the watcher writes `/run/cloud-routes/$VIP.down`; when the
apiserver is healthy again, it writes `$VIP.up`.

Separately, a provider-specific process watches that directory and updates
iptables rules as necessary.
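
A condensed, hypothetical sketch of that loop follows. The VIP, URL, and polling
interval are placeholders; the real implementation (cmd/apiserver-watcher/run.go in
this PR) uses the go-health library, takes the URL and VIP from flags, and also
removes the opposite state file on each transition.

```
package main

import (
	"crypto/tls"
	"log"
	"net/http"
	"os"
	"path"
	"time"
)

// stateDir is where the provider-specific routes service looks for state files.
const stateDir = "/run/cloud-routes"

// markState creates an empty file such as /run/cloud-routes/10.0.0.2.down.
func markState(vip, state string) error {
	return os.WriteFile(path.Join(stateDir, vip+"."+state), nil, 0644)
}

func main() {
	vip := "10.0.0.2"                                    // placeholder VIP
	readyz := "https://api-int.example.test:6443/readyz" // placeholder URL
	client := &http.Client{
		Timeout: 5 * time.Second,
		// Certificate verification is skipped in this sketch because only
		// reachability of the endpoint matters here.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	for {
		state := "down"
		if resp, err := client.Get(readyz); err == nil {
			if resp.StatusCode == http.StatusOK {
				state = "up"
			}
			resp.Body.Close()
		}
		if err := markState(vip, state); err != nil {
			log.Printf("failed to write state file: %v", err)
		}
		time.Sleep(5 * time.Second)
	}
}
```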
@@ -8,13 +8,13 @@ import (
)

const (
componentName = "gcp-routes-controller"
componentName = "apisever-watcher"
)

var (
rootCmd = &cobra.Command{
Use: componentName,
Short: "Controls the gcp-routes.service on RHCOS hosts based on health checks",
Short: "Monitors the local apiserver and writes cloud-routes downfiles",
Long: "",
SilenceErrors: true,
SilenceUsage: true,
128 changes: 48 additions & 80 deletions cmd/gcp-routes-controller/run.go → cmd/apiserver-watcher/run.go
@@ -4,15 +4,14 @@ import (
"crypto/tls"
"flag"
"fmt"
"io/ioutil"
"net"
"net/http"
"net/url"
"os"
"os/exec"
"os/signal"
"path"
"sync"
"syscall"
"time"

health "github.com/InVisionApp/go-health"
@@ -26,41 +25,31 @@ import (
var (
runCmd = &cobra.Command{
Use: "run",
Short: "Runs the gcp-routes-controller",
Short: "Runs the apiserver-watcher",
Long: "",
RunE: runRunCmd,
}

runOpts struct {
gcpRoutesService string
rootMount string
healthCheckURL string
vip string
rootMount string
healthCheckURL string
vip string
}
)

// downFileDir is the directory in which gcp-routes will look for a flag-file that
// indicates the route to the VIP should be withdrawn.
const downFileDir = "/run/gcp-routes"
const downFileDir = "/run/cloud-routes"

func init() {
rootCmd.AddCommand(runCmd)
runCmd.PersistentFlags().StringVar(&runOpts.gcpRoutesService, "gcp-routes-service", "openshift-gcp-routes.service", "The name for the service controlling gcp routes on host")
runCmd.PersistentFlags().StringVar(&runOpts.rootMount, "root-mount", "/rootfs", "where the nodes root filesystem is mounted for writing down files or chrooting.")
runCmd.PersistentFlags().StringVar(&runOpts.healthCheckURL, "health-check-url", "", "HTTP(s) URL for the health check")
runCmd.PersistentFlags().StringVar(&runOpts.vip, "vip", "", "The VIP to remove if the health check fails. Determined from URL if not provided")
}

type downMode int

const (
modeStopService = iota
modeDownFile
)

type handler struct {
mode downMode
vip string
vip string
}

func runRunCmd(cmd *cobra.Command, args []string) error {
@@ -104,6 +93,7 @@ func runRunCmd(cmd *cobra.Command, args []string) error {
// as a backend in the load-balancer, and add routes before we've been
// re-added.
// see openshift/installer/data/data/gcp/network/lb-private.tf
// see openshift/installer/data/data/azure/vnet/internal-lb.tf
tracker := &healthTracker{
state: unknownTrackerState,
ErrCh: errCh,
@@ -130,7 +120,7 @@ func runRunCmd(cmd *cobra.Command, args []string) error {
signal.Notify(c, os.Interrupt)
go func() {
for sig := range c {
glog.Infof("Signal %s received: shutting down gcp routes service", sig)
glog.Infof("Signal %s received: treating service as down", sig)
if err := handler.onFailure(); err != nil {
glog.Infof("Failed to mark service down on signal: %s", err)
}
@@ -151,83 +141,61 @@ func runRunCmd(cmd *cobra.Command, args []string) error {
func newHandler(uri *url.URL) (*handler, error) {
h := handler{}

// determine mode: if /run/gcp-routes exists, we can us the downfile mode
realPath := path.Join(runOpts.rootMount, downFileDir)
fi, err := os.Stat(realPath)
if err == nil && fi.IsDir() {
glog.Infof("%s exists, starting in downfile mode", realPath)
h.mode = modeDownFile
if runOpts.vip != "" {
h.vip = runOpts.vip
} else {
glog.Infof("%s not accessible, will stop gcp-routes.service on health failure", realPath)
h.mode = modeStopService
}

// if StopService mode and rootfs specified, chroot
if h.mode == modeStopService && runOpts.rootMount != "" {
glog.Infof(`Calling chroot("%s")`, runOpts.rootMount)
if err := syscall.Chroot(runOpts.rootMount); err != nil {
return nil, fmt.Errorf("unable to chroot to %s: %s", runOpts.rootMount, err)
}

glog.V(2).Infof("Moving to / inside the chroot")
if err := os.Chdir("/"); err != nil {
return nil, fmt.Errorf("unable to change directory to /: %s", err)
addrs, err := net.LookupHost(uri.Hostname())
if err != nil {
return nil, fmt.Errorf("failed to lookup host %s: %v", uri.Hostname(), err)
}
}

// otherwise, resolve vip
if h.mode == modeDownFile {
if runOpts.vip != "" {
h.vip = runOpts.vip
} else {
addrs, err := net.LookupHost(uri.Hostname())
if err != nil {
return nil, fmt.Errorf("failed to lookup host %s: %v", uri.Hostname(), err)
}
if len(addrs) != 1 {
return nil, fmt.Errorf("hostname %s has %d addresses, expected 1 - aborting", uri.Hostname(), len(addrs))
}
h.vip = addrs[0]
glog.Infof("Using VIP %s", h.vip)
if len(addrs) != 1 {
return nil, fmt.Errorf("hostname %s has %d addresses, expected 1 - aborting", uri.Hostname(), len(addrs))

Contributor: when will this happen?

Contributor (author): (This isn't a behavior change, just removing some dead code and a corresponding re-indentation.) It could happen if we somehow switch to RRDNS.

}
h.vip = addrs[0]
glog.Infof("Using VIP %s", h.vip)
}

return &h, nil
}

// onFailure: either stop the routes service, or write downfile
func (h *handler) onFailure() error {
if h.mode == modeDownFile {
downFile := path.Join(runOpts.rootMount, downFileDir, fmt.Sprintf("%s.down", h.vip))
fp, err := os.OpenFile(downFile, os.O_CREATE, 0644)
if err != nil {
return fmt.Errorf("failed to create downfile (%s): %v", downFile, err)
}
_ = fp.Close()
glog.Infof("healthcheck failed, created downfile %s", downFile)
} else {
if err := exec.Command("systemctl", "stop", runOpts.gcpRoutesService).Run(); err != nil {
return fmt.Errorf("Failed to terminate gcp routes service %v", err)
}
glog.Infof("healthcheck failed, stopped %s", runOpts.gcpRoutesService)
if err := writeVipStateFile(h.vip, "down"); err != nil {
return err
}
glog.Infof("healthcheck failed, created downfile %s.down", h.vip)
if err := removeVipStateFile(h.vip, "up"); err != nil {
return err
}
return nil
}

// onSuccess: either start routes service, or remove down file
func (h *handler) onSuccess() error {
if h.mode == modeDownFile {
downFile := path.Join(runOpts.rootMount, downFileDir, fmt.Sprintf("%s.down", h.vip))
err := os.Remove(downFile)
if err != nil && !os.IsNotExist(err) {
return fmt.Errorf("failed to remove downfile (%s): %v", downFile, err)
}
glog.Infof("healthcheck succeeded, removed downfile %s", downFile)
} else {
if err := exec.Command("systemctl", "start", runOpts.gcpRoutesService).Run(); err != nil {
return fmt.Errorf("Failed to terminate gcp routes service %v", err)
}
glog.Infof("healthcheck succeeded, started %s", runOpts.gcpRoutesService)
if err := removeVipStateFile(h.vip, "down"); err != nil {
return err
}
glog.Infof("healthcheck succeeded, removed downfile %s.down", h.vip)
if err := writeVipStateFile(h.vip, "up"); err != nil {
return err
}
return nil
}

func writeVipStateFile(vip, state string) error {
file := path.Join(runOpts.rootMount, downFileDir, fmt.Sprintf("%s.%s", vip, state))
err := ioutil.WriteFile(file, nil, 0644)
if err != nil {
return fmt.Errorf("failed to create file (%s): %v", file, err)
}
return nil
}

func removeVipStateFile(vip, state string) error {
file := path.Join(runOpts.rootMount, downFileDir, fmt.Sprintf("%s.%s", vip, state))
err := os.Remove(file)
if err != nil && !os.IsNotExist(err) {
return fmt.Errorf("failed to remove file (%s): %v", file, err)
}
return nil
}
File renamed without changes.
23 changes: 0 additions & 23 deletions cmd/gcp-routes-controller/README.md

This file was deleted.

4 changes: 2 additions & 2 deletions pkg/controller/template/constants.go
@@ -4,8 +4,8 @@ const (
// MachineConfigOperatorKey is our own image used by e.g. machine-config-daemon-pull.service
MachineConfigOperatorKey string = "machineConfigOperator"

// GCPRoutesControllerKey is the key that references the gcp-routes-controller image in the controller
GCPRoutesControllerKey string = "gcpRoutesControllerKey"
// APIServerWatcherKey is the key that references the apiserver-watcher image
APIServerWatcherKey string = "apiServerWatcherKey"

// InfraImageKey is the key that references the infra image in the controller for crio.conf
InfraImageKey string = "infraImageKey"
2 changes: 1 addition & 1 deletion pkg/operator/bootstrap.go
@@ -134,7 +134,7 @@ func RenderBootstrap(
spec.Images = map[string]string{
templatectrl.MachineConfigOperatorKey: imgs.MachineConfigOperator,

templatectrl.GCPRoutesControllerKey: imgs.MachineConfigOperator,
templatectrl.APIServerWatcherKey: imgs.MachineConfigOperator,
templatectrl.InfraImageKey: imgs.InfraImage,
templatectrl.KeepalivedKey: imgs.Keepalived,
templatectrl.CorednsKey: imgs.Coredns,
2 changes: 1 addition & 1 deletion pkg/operator/sync.go
@@ -270,7 +270,7 @@ func (optr *Operator) syncRenderConfig(_ *renderConfig) error {
spec.Images = map[string]string{
templatectrl.MachineConfigOperatorKey: imgs.MachineConfigOperator,

templatectrl.GCPRoutesControllerKey: imgs.MachineConfigOperator,
templatectrl.APIServerWatcherKey: imgs.MachineConfigOperator,
templatectrl.InfraImageKey: imgs.InfraImage,
templatectrl.KeepalivedKey: imgs.Keepalived,
templatectrl.CorednsKey: imgs.Coredns,
@@ -1,17 +1,17 @@
mode: 0644
path: "/etc/kubernetes/manifests/gcp-routes-controller.yaml"
path: "/etc/kubernetes/manifests/apiserver-watcher.yaml"

Contributor @sttts (Aug 21, 2020): you rely on a fresh base image (i.e. reboot) to remove the old static pod?

Member: We always reboot for config changes today, yes. Though this gets into #1190, and in fact, due to the way the MCO works today, there will be a window where both are running, unfortunately. We probably need to change the new code to at least detect the case where the old static pod exists and exit.

Contributor (author): I could also keep it as the same filename; the filename definitely doesn't matter.

Contributor (author): The old static pod doesn't matter; it writes to /run/gcp-routes while the new one is /run/cloud-routes, so they can happily coexist (and should, until the service is swapped).

contents:
inline: |
apiVersion: v1
kind: Pod
metadata:
name: gcp-routes-controller
name: apiserver-watcher
namespace: kube-system
spec:
containers:
- name: gcp-routes-controller
image: "{{.Images.gcpRoutesControllerKey}}"
command: ["gcp-routes-controller"]
- name: apiserver-watcher
image: "{{.Images.apiServerWatcherKey}}"
command: ["apiserver-watcher"]
args:
- "run"
- "--health-check-url={{.Infra.Status.APIServerInternalURL}}/readyz"