Pull request (Closed) · Changes from all commits · 28 commits
Commits (all by RobertKrawitz):

- 959631e: Add etcd-quorum-guard manifests, doc, and (currently disabled) test. (Apr 9, 2019)
- 56171c7: Build test (Apr 9, 2019)
- 23ac78f: Add doc to test. (Apr 10, 2019)
- ccb0b81: Additional changes (Apr 11, 2019)
- c46bd7b: Wait for deployment to roll out (Apr 12, 2019)
- 8c71809: Add etcd-quorum-guard-image to image stream. (Apr 12, 2019)
- 7a2be5c: Remove explicit reference to the image; use the templated (Apr 12, 2019)
- 1073a6d: Ratcheting changes, per runcom (Apr 12, 2019)
- 4a80ef5: Fix some typos (Apr 12, 2019)
- 515fb7b: Another round of reviews 04/13 (Apr 15, 2019)
- 68d9e41: Remove eqg from the bootstrap (Apr 15, 2019)
- b3915b4: Fix image name (Apr 15, 2019)
- e51dea4: Temporarily turn off the sync in case that's got an error. (Apr 15, 2019)
- 2888f72: Fix name of disruption budget .yaml file (Apr 15, 2019)
- 4f031c9: Clean up naming (Apr 15, 2019)
- 10fb2fb: Enable e2e (Apr 16, 2019)
- f8b8f16: Try (again) to make the quorum guard sync. (Apr 16, 2019)
- d25a3b9: Turn off sync so we can see more detailed log files in a (Apr 16, 2019)
- 149ea8a: Remove apparently unneeded dummy image. (Apr 16, 2019)
- 13c86ec: Remove debugging code from quorum guard (Apr 16, 2019)
- b1a8279: Another attempt to sync (Apr 16, 2019)
- fe855a0: Review comments from Abhinav (Apr 17, 2019)
- fea9fad: Try putting eqg in mco namespace (Apr 18, 2019)
- ce3acd6: Apply #623 (Apr 18, 2019)
- da7a6cf: Turn sync off again to see what the quorum guard does. (Apr 23, 2019)
- b594807: Fix the rest of the namespace names in eqg test (Apr 23, 2019)
- 901e42a: Try use kube-system namespace and full powered cert (Apr 24, 2019)
- 0882043: Try using jedi cert in openshift-machine-config-operator namespace (Apr 24, 2019)
4 changes: 4 additions & 0 deletions cmd/machine-config-operator/bootstrap.go
@@ -42,6 +42,7 @@ var (
		infraImage           string
		kubeClientAgentImage string
		destinationDir       string
		etcdQuorumGuardImage string
	}
)

@@ -66,6 +67,8 @@ func init() {
	bootstrapCmd.MarkFlagRequired("etcd-image")
	bootstrapCmd.PersistentFlags().StringVar(&bootstrapOpts.setupEtcdEnvImage, "setup-etcd-env-image", "", "Image for Setup etcd Environment.")
	bootstrapCmd.MarkFlagRequired("setup-etcd-env-image")
	bootstrapCmd.PersistentFlags().StringVar(&bootstrapOpts.etcdQuorumGuardImage, "etcd-quorum-guard-image", "registry.svc.ci.openshift.org/openshift/origin-v4.0:base", "Image for etcd Quorum Guard.")
	bootstrapCmd.MarkFlagRequired("etcd-quorum-guard-image")
	bootstrapCmd.PersistentFlags().StringVar(&bootstrapOpts.kubeClientAgentImage, "kube-client-agent-image", "", "Image for Kube Client Agent.")
	bootstrapCmd.MarkFlagRequired("kube-client-agent-image")
	bootstrapCmd.PersistentFlags().StringVar(&bootstrapOpts.infraImage, "infra-image", "", "Image for Infra Containers.")
@@ -90,6 +93,7 @@ func runBootstrapCmd(cmd *cobra.Command, args []string) {
		MachineOSContent:     bootstrapOpts.oscontentImage,
		Etcd:                 bootstrapOpts.etcdImage,
		SetupEtcdEnv:         bootstrapOpts.setupEtcdEnvImage,
		EtcdQuorumGuardImage: bootstrapOpts.etcdQuorumGuardImage,
		InfraImage:           bootstrapOpts.infraImage,
		KubeClientAgent:      bootstrapOpts.kubeClientAgentImage,
	}
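The flag wiring above follows cobra's StringVar-plus-MarkFlagRequired idiom: register a string destination with a default, then check it after parsing. The same shape with the standard library's `flag` package, as a sketch (not the PR's code; `parseImages` is a hypothetical helper):

```go
package main

import (
	"flag"
	"fmt"
)

// parseImages mirrors bootstrapOpts: the image flag gets a StringVar
// with a default, and "required" flags are verified after parsing.
func parseImages(args []string) (etcdQuorumGuardImage string, err error) {
	fs := flag.NewFlagSet("bootstrap", flag.ContinueOnError)
	fs.StringVar(&etcdQuorumGuardImage, "etcd-quorum-guard-image",
		"registry.svc.ci.openshift.org/openshift/origin-v4.0:base",
		"Image for etcd Quorum Guard.")
	if err = fs.Parse(args); err != nil {
		return "", err
	}
	if etcdQuorumGuardImage == "" {
		return "", fmt.Errorf("--etcd-quorum-guard-image is required")
	}
	return etcdQuorumGuardImage, nil
}

func main() {
	img, _ := parseImages([]string{"--etcd-quorum-guard-image", "example.test/eqg:latest"})
	fmt.Println(img)
	img, _ = parseImages(nil) // no args: falls back to the default
	fmt.Println(img)
}
```

Note that marking a flag required while also giving it a non-empty default (as the diff does) means the required check can never fire; the sketch keeps both to match the PR's shape.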
1 change: 1 addition & 0 deletions cmd/machine-config-operator/start.go
@@ -65,6 +65,7 @@ func runStartCmd(cmd *cobra.Command, args []string) {
		ctrlctx.KubeNamespacedInformerFactory.Rbac().V1().ClusterRoles(),
		ctrlctx.KubeNamespacedInformerFactory.Rbac().V1().ClusterRoleBindings(),
		ctrlctx.KubeNamespacedInformerFactory.Core().V1().ConfigMaps(),
		ctrlctx.KubeNamespacedInformerFactory.Policy().V1beta1().PodDisruptionBudgets(),
		ctrlctx.KubeInformerFactory.Core().V1().ConfigMaps(),
		ctrlctx.ConfigInformerFactory.Config().V1().Infrastructures(),
		ctrlctx.ConfigInformerFactory.Config().V1().Networks(),
36 changes: 36 additions & 0 deletions docs/etcd-quorum-guard.md
@@ -0,0 +1,36 @@
# etcd Quorum Guard

The etcd Quorum Guard ensures that quorum is maintained for etcd for
[OpenShift](https://openshift.io/).

For the etcd cluster to remain usable, quorum, a majority of all etcd
members, must be maintained. For example, an etcd cluster with
3 members (i.e. a 3-master deployment) must have at least 2 healthy
etcd members to meet the quorum requirement.
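The majority rule above is a one-line computation; a minimal sketch in plain Go (illustrative, not part of this PR):

```go
package main

import "fmt"

// quorum returns the minimum number of healthy members an etcd
// cluster of size n needs to keep accepting writes: a strict majority.
func quorum(n int) int {
	return n/2 + 1
}

// faultTolerance returns how many members can fail before quorum is lost.
func faultTolerance(n int) int {
	return n - quorum(n)
}

func main() {
	for _, n := range []int{1, 3, 5} {
		fmt.Printf("cluster of %d: quorum=%d, tolerates %d failure(s)\n",
			n, quorum(n), faultTolerance(n))
	}
}
```

For the 3-master case this PR targets, quorum is 2, so exactly one member may be down at a time.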

There are situations where 2 etcd members could be down at once:

* a master has gone offline and the MachineConfig Controller (MCC)
  tries to roll out a new MachineConfig (MC) by rebooting masters
* the MCC is doing a MachineConfig rollout and doesn't wait for the
  etcd on the previous master to become healthy again before rebooting
  the next master

The etcd Quorum Guard ensures that a drain on a master is not allowed
to proceed if the reboot of the master would cause etcd quorum loss.
It is implemented as a deployment, with one pod per master node.

The etcd Quorum Guard checks the health of etcd by querying the health
endpoint of etcd; if etcd reports itself unhealthy or is not present,
the quorum guard reports itself not ready. A pod disruption budget
allows at most one quorum guard pod (and hence one etcd member) to be
unhealthy or missing. If one etcd is already unhealthy or missing,
the disruption budget acts as a drain gate, disallowing an attempt to
drain another node.

This drain gate cannot protect against a second node failing due to,
for example, hardware failure; it can only protect against an attempt
to drain the node in preparation for taking it down.
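The drain-gate arithmetic of a `maxUnavailable: 1` budget can be sketched as follows (`allowedDisruptions` is a hypothetical helper mirroring how Kubernetes evaluates the budget, not its actual code):

```go
package main

import "fmt"

// allowedDisruptions mirrors PodDisruptionBudget evaluation with
// maxUnavailable: voluntary evictions are permitted only while fewer
// than maxUnavailable pods are already down.
func allowedDisruptions(desired, healthy, maxUnavailable int) int {
	unavailable := desired - healthy
	if a := maxUnavailable - unavailable; a > 0 {
		return a
	}
	return 0
}

func main() {
	// All three quorum-guard pods healthy: one node may be drained.
	fmt.Println(allowedDisruptions(3, 3, 1)) // 1
	// One pod already unhealthy: the drain gate closes.
	fmt.Println(allowedDisruptions(3, 2, 1)) // 0
}
```

This is why a drain attempt blocks rather than proceeding when one etcd member is already unhealthy.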

There is no user or administrator action necessary or available for
the etcd Quorum Guard.
@@ -12,5 +12,6 @@ data:
"etcd": "registry.svc.ci.openshift.org/openshift/origin-v4.0:etcd",
"setupEtcdEnv": "registry.svc.ci.openshift.org/openshift/origin-v4.0:setup-etcd-environment",
"infraImage": "quay.io/openshift/origin-pod:v4.0",
"kubeClientAgentImage": "registry.svc.ci.openshift.org/openshift/origin-v4.0:kube-client-agent"
"kubeClientAgentImage": "registry.svc.ci.openshift.org/openshift/origin-v4.0:kube-client-agent",
"etcdQuorumGuardImage": "registry.svc.ci.openshift.org/openshift/origin-v4.0:base"
}
4 changes: 4 additions & 0 deletions install/image-references
@@ -27,6 +27,10 @@ spec:
    from:
      kind: DockerImage
      name: quay.io/openshift/origin-pod:v4.0
  - name: etcd-quorum-guard
    from:
      kind: DockerImage
      name: registry.svc.ci.openshift.org/openshift/origin-v4.0:base
> Contributor Author: https://github.com/openshift/etcd-quorum-guard/ is actually now dead code (at least until/unless we revive it as a full-fledged operator).

  - name: setup-etcd-environment
    from:
      kind: DockerImage
91 changes: 91 additions & 0 deletions lib/resourceapply/core.go
@@ -46,3 +46,94 @@ func ApplySecret(client coreclientv1.SecretsGetter, required *corev1.Secret) (*corev1.Secret, bool, error) {
	actual, err := client.Secrets(required.Namespace).Update(existing)
	return actual, true, err
}

// ApplyConfigMap merges objectmeta, requires data
func ApplyConfigMap(client coreclientv1.ConfigMapsGetter, required *corev1.ConfigMap) (*corev1.ConfigMap, bool, error) {
	existing, err := client.ConfigMaps(required.Namespace).Get(required.Name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		actual, err := client.ConfigMaps(required.Namespace).Create(required)
		return actual, true, err
	}
	if err != nil {
		return nil, false, err
	}

	modified := resourcemerge.BoolPtr(false)
	existingCopy := existing.DeepCopy()

	resourcemerge.EnsureObjectMeta(modified, &existingCopy.ObjectMeta, required.ObjectMeta)

	var modifiedKeys []string
	for existingCopyKey, existingCopyValue := range existingCopy.Data {
		if requiredValue, ok := required.Data[existingCopyKey]; !ok || (existingCopyValue != requiredValue) {
			modifiedKeys = append(modifiedKeys, "data."+existingCopyKey)
		}
	}
	for requiredKey := range required.Data {
		if _, ok := existingCopy.Data[requiredKey]; !ok {
			modifiedKeys = append(modifiedKeys, "data."+requiredKey)
		}
	}

	dataSame := len(modifiedKeys) == 0
	if dataSame && !*modified {
		return existingCopy, false, nil
	}
	existingCopy.Data = required.Data

	actual, err := client.ConfigMaps(required.Namespace).Update(existingCopy)

	return actual, true, err
}
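The create-or-update ("apply") shape used above, with its `(actual, modified, error)` return convention, can be shown against a plain map, independent of client-go (illustrative only; `applyEntry` is a made-up helper):

```go
package main

import "fmt"

// applyEntry upserts key into store and reports whether anything
// changed, mirroring ApplyConfigMap: create when absent, update when
// different, no-op when already in the desired state.
func applyEntry(store map[string]string, key, value string) (string, bool) {
	existing, found := store[key]
	if !found {
		store[key] = value // "create"
		return value, true
	}
	if existing == value {
		return existing, false // already in desired state
	}
	store[key] = value // "update"
	return value, true
}

func main() {
	store := map[string]string{}
	_, changed := applyEntry(store, "config", "v1")
	fmt.Println(changed) // true: created
	_, changed = applyEntry(store, "config", "v1")
	fmt.Println(changed) // false: no-op
	_, changed = applyEntry(store, "config", "v2")
	fmt.Println(changed) // true: updated
}
```

Reporting "modified" lets callers skip a needless Update call when the cluster already matches the desired state.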

// SyncConfigMap syncs a ConfigMap from the source namespace/name to the
// target namespace/name, deleting the target when the source is absent.
func SyncConfigMap(client coreclientv1.ConfigMapsGetter, sourceNamespace, sourceName, targetNamespace, targetName string, ownerRefs []metav1.OwnerReference) (*corev1.ConfigMap, bool, error) {
	source, err := client.ConfigMaps(sourceNamespace).Get(sourceName, metav1.GetOptions{})
	sourceCopy := source.DeepCopy()

	switch {
	case apierrors.IsNotFound(err):
		deleteErr := client.ConfigMaps(targetNamespace).Delete(targetName, nil)
		if apierrors.IsNotFound(deleteErr) {
			return nil, false, nil
		}
		if deleteErr == nil {
			return nil, true, nil
		}
		return nil, false, deleteErr
	case err != nil:
		return nil, false, err
	default:
		sourceCopy.Namespace = targetNamespace
		sourceCopy.Name = targetName
		sourceCopy.ResourceVersion = ""
		sourceCopy.OwnerReferences = ownerRefs
		return ApplyConfigMap(client, sourceCopy)
	}
}

// SyncSecret syncs a Secret from the source namespace/name to the
// target namespace/name, deleting the target when the source is absent.
func SyncSecret(client coreclientv1.SecretsGetter, sourceNamespace, sourceName, targetNamespace, targetName string, ownerRefs []metav1.OwnerReference) (*corev1.Secret, bool, error) {
	source, err := client.Secrets(sourceNamespace).Get(sourceName, metav1.GetOptions{})
	sourceCopy := source.DeepCopy()

	switch {
	case apierrors.IsNotFound(err):
		deleteErr := client.Secrets(targetNamespace).Delete(targetName, nil)
		if apierrors.IsNotFound(deleteErr) {
			return nil, false, nil
		}
		if deleteErr == nil {
			return nil, true, nil
		}
		return nil, false, deleteErr
	case err != nil:
		return nil, false, err
	default:
		sourceCopy.Namespace = targetNamespace
		sourceCopy.Name = targetName
		sourceCopy.ResourceVersion = ""
		sourceCopy.OwnerReferences = ownerRefs
		return ApplySecret(client, sourceCopy)
	}
}
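Both sync helpers share one semantic: copy when the source exists, delete the target when it does not, and report whether anything changed. That contract can be sketched with in-memory maps (illustrative; `syncEntry` is not the PR's API):

```go
package main

import "fmt"

// syncEntry copies src[key] into dst[key]; when the source entry is
// absent it deletes the target instead, mirroring the delete branch of
// SyncConfigMap/SyncSecret. It returns whether dst changed.
func syncEntry(src, dst map[string]string, key string) bool {
	v, ok := src[key]
	if !ok {
		if _, had := dst[key]; !had {
			return false // nothing to delete: no change
		}
		delete(dst, key)
		return true
	}
	if dst[key] == v {
		return false // already in sync
	}
	dst[key] = v
	return true
}

func main() {
	src := map[string]string{"cert": "PEM"}
	dst := map[string]string{}
	fmt.Println(syncEntry(src, dst, "cert")) // true: copied
	delete(src, "cert")
	fmt.Println(syncEntry(src, dst, "cert")) // true: target deleted
	fmt.Println(syncEntry(src, dst, "cert")) // false: already absent
}
```

The delete-on-missing-source branch is what lets an operator mirror a cert Secret into its own namespace and have the mirror disappear when the original is removed.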
30 changes: 30 additions & 0 deletions lib/resourceapply/policy.go
@@ -0,0 +1,30 @@
package resourceapply

import (
	"github.com/openshift/machine-config-operator/lib/resourcemerge"
	policyv1 "k8s.io/api/policy/v1beta1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	policyclientv1 "k8s.io/client-go/kubernetes/typed/policy/v1beta1"
)

// ApplyPodDisruptionBudget applies the required podDisruptionBudget to the cluster.
func ApplyPodDisruptionBudget(client policyclientv1.PodDisruptionBudgetsGetter, required *policyv1.PodDisruptionBudget) (*policyv1.PodDisruptionBudget, bool, error) {
	existing, err := client.PodDisruptionBudgets(required.Namespace).Get(required.Name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		actual, err := client.PodDisruptionBudgets(required.Namespace).Create(required)
		return actual, true, err
	}
	if err != nil {
		return nil, false, err
	}

	modified := resourcemerge.BoolPtr(false)
	resourcemerge.EnsurePodDisruptionBudget(modified, existing, *required)
	if !*modified {
		return existing, false, nil
	}

	actual, err := client.PodDisruptionBudgets(required.Namespace).Update(existing)
	return actual, true, err
}
16 changes: 16 additions & 0 deletions lib/resourcemerge/policy.go
@@ -0,0 +1,16 @@
package resourcemerge

import (
	policyv1 "k8s.io/api/policy/v1beta1"
	"k8s.io/apimachinery/pkg/api/equality"
)

// EnsurePodDisruptionBudget ensures that the existing matches the required.
// modified is set to true when existing had to be updated with required.
func EnsurePodDisruptionBudget(modified *bool, existing *policyv1.PodDisruptionBudget, required policyv1.PodDisruptionBudget) {
	EnsureObjectMeta(modified, &existing.ObjectMeta, required.ObjectMeta)
	if !equality.Semantic.DeepEqual(existing.Spec, required.Spec) {
		*modified = true
		existing.Spec = required.Spec
	}
}
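The ensure pattern above (overwrite only on semantic difference, recording whether anything changed) looks like this in miniature, using the standard library's `reflect.DeepEqual` in place of `equality.Semantic` (a sketch with a made-up `spec` type, not the k8s API):

```go
package main

import (
	"fmt"
	"reflect"
)

type spec struct {
	MaxUnavailable int
	Labels         map[string]string
}

// ensureSpec copies required into existing only when they differ,
// setting *modified so callers can skip a no-op Update.
func ensureSpec(modified *bool, existing *spec, required spec) {
	if !reflect.DeepEqual(*existing, required) {
		*modified = true
		*existing = required
	}
}

func main() {
	existing := spec{MaxUnavailable: 2}
	required := spec{MaxUnavailable: 1}
	modified := false
	ensureSpec(&modified, &existing, required)
	fmt.Println(modified, existing.MaxUnavailable) // true 1

	modified = false
	ensureSpec(&modified, &existing, required)
	fmt.Println(modified) // false: already converged
}
```

Passing `modified` as a pointer lets several Ensure* calls accumulate into a single "did anything change" flag, which is exactly how ApplyPodDisruptionBudget decides whether to issue an Update.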
27 changes: 27 additions & 0 deletions lib/resourceread/policy.go
@@ -0,0 +1,27 @@
package resourceread

import (
	policyv1 "k8s.io/api/policy/v1beta1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/serializer"
)

var (
	policyScheme = runtime.NewScheme()
	policyCodecs = serializer.NewCodecFactory(policyScheme)
)

func init() {
	if err := policyv1.AddToScheme(policyScheme); err != nil {
		panic(err)
	}
}

// ReadPodDisruptionBudgetV1OrDie reads podDisruptionBudget object from bytes. Panics on error.
func ReadPodDisruptionBudgetV1OrDie(objBytes []byte) *policyv1.PodDisruptionBudget {
	requiredObj, err := runtime.Decode(policyCodecs.UniversalDecoder(policyv1.SchemeGroupVersion), objBytes)
	if err != nil {
		panic(err)
	}
	return requiredObj.(*policyv1.PodDisruptionBudget)
}
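The OrDie decode pattern (panic on a malformed manifest) is appropriate here because the bytes are assets compiled into the binary: a decode failure is a build bug, not a runtime condition. In miniature with `encoding/json` (sketch only; the `pdb` struct is a stand-in, not the real API type):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type pdb struct {
	Kind           string `json:"kind"`
	MaxUnavailable int    `json:"maxUnavailable"`
}

// readPDBOrDie decodes an embedded manifest and panics on error,
// mirroring ReadPodDisruptionBudgetV1OrDie.
func readPDBOrDie(objBytes []byte) *pdb {
	out := &pdb{}
	if err := json.Unmarshal(objBytes, out); err != nil {
		panic(err)
	}
	return out
}

func main() {
	p := readPDBOrDie([]byte(`{"kind":"PodDisruptionBudget","maxUnavailable":1}`))
	fmt.Println(p.Kind, p.MaxUnavailable) // PodDisruptionBudget 1
}
```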
101 changes: 101 additions & 0 deletions manifests/etcdquorumguard/deployment.yaml
@@ -0,0 +1,101 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: etcd-quorum-guard
  namespace: {{.TargetNamespace}}
spec:
  replicas: 3
> Contributor: I think it should be possible to teach mco to scale up / decide the replica count based on number of master node.
>
> Contributor Author: Agreed beyond 4.1; for 4.1, it has been decided to only support 3 masters.

  selector:
    matchLabels:
      k8s-app: etcd-quorum-guard
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: etcd-quorum-guard
        k8s-app: etcd-quorum-guard
    spec:
      hostNetwork: true
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - "etcd-quorum-guard"
            topologyKey: kubernetes.io/hostname
      nodeSelector:
        node-role.kubernetes.io/master: ""
      priorityClassName: "system-cluster-critical"
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
        operator: Exists
      - key: node.kubernetes.io/memory-pressure
        effect: NoSchedule
        operator: Exists
      - key: node.kubernetes.io/disk-pressure
        effect: NoSchedule
        operator: Exists
      - key: node.kubernetes.io/not-ready
        effect: NoExecute
        operator: Exists
      - key: node.kubernetes.io/unreachable
        effect: NoExecute
        operator: Exists
      - key: node.kubernetes.io/unschedulable
        effect: NoExecute
        operator: Exists
      - key: node-role.kubernetes.io/etcd
        operator: Exists
        effect: NoSchedule
      containers:
      - image: "{{.Images.EtcdQuorumGuardImage}}"
        imagePullPolicy: IfNotPresent
        name: etcd-quorum-guard-container
        volumeMounts:
        - mountPath: /mnt/kube
          name: kubecerts
> Contributor: nit: this should be called etcdcerts
>
> Contributor Author: OK

        command:
        - "/bin/sh"
        args:
        - "-c"
        - |
          declare -r croot=/mnt/kube
          set -x
          declare -r health_endpoint="https://127.0.0.1:2379/health"
          declare -r cert="$(find $croot -name 'system:etcd-peer*.crt' -print -quit)"
          declare -r key="${cert%.crt}.key"
          declare -r cacert="$croot/ca.crt"
          ls -lR "$croot"
          ls -lRL "$croot"
          while : ; do date; curl --max-time 2 --cert "${cert//:/\:}" --key "$key" --cacert "$cacert" "$health_endpoint"; sleep 5; done
        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - |
              declare -r croot=/mnt/kube
              declare -r health_endpoint="https://127.0.0.1:2379/health"
              declare -r cert="$(find $croot -name 'system:etcd-peer*.crt' -print -quit)"
              declare -r key="${cert%.crt}.key"
              declare -r cacert="$croot/ca.crt"
              [[ -z $cert || -z $key ]] && exit 1
              curl --max-time 2 --silent --cert "${cert//:/\:}" --key "$key" --cacert "$cacert" "$health_endpoint" | grep '{ *"health" *: *"true" *}'
> Contributor: /cc @hexfusion
> please use the metrics client certs that were created to connect to etcd
>
> Contributor Author: Where are those certs located?
>
> hexfusion (Apr 9, 2019): You could get crt/key with something like
>
>     oc -n openshift-config get secrets etcd-metric-client -o yaml
>
> ca
>
>     oc get configmap -n openshift-config etcd-metric-serving-ca -o yaml
>
> hexfusion (Apr 9, 2019): you can use etcd proxy for /health with these certs. port 9979 vs 2379
>
> Contributor Author: I can do that inside the pod?
>
> Contributor Author: I agree with the rationale; my question is how to get the appropriate cert.
>
> Contributor: working on this now
>
> Contributor Author: Note that the etcd-quorum-guard proper does not have any Go code in it; it's simply (right now) a static deployment and disruption budget, with the lone pod being a trivial script.
>
> Contributor: with #623 you should be able to mount the resources and then consume in your bash as local files. Something like.
>
>         volumeMounts:
>           - mountPath: "/etc/ssl/certs/etcd"
>             name: etcd-metric-client
>             readOnly: true
>       volumes:
>         - name: etcd-metric-client
>           secret:
>             secretName: etcd-metric-client
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            cpu: 10m
            memory: 5Mi
      volumes:
      - name: kubecerts
        hostPath:
          path: /etc/kubernetes/static-pod-resources/etcd-member
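The readiness probe's `grep` is just a health-body check on etcd's `/health` response. The same check expressed as a small Go function, exercised against a stand-in HTTP server (the real probe talks TLS with client certs to `https://127.0.0.1:2379/health`; `healthy` is an illustrative helper, not code from this PR):

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// healthy reports whether an etcd /health response body indicates
// health, tolerating the same whitespace the probe's grep accepts.
func healthy(body []byte) bool {
	var resp struct {
		Health string `json:"health"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return false
	}
	return resp.Health == "true"
}

func main() {
	// Stand-in for etcd's health endpoint.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, `{ "health" : "true" }`)
	}))
	defer srv.Close()

	res, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	body, _ := io.ReadAll(res.Body)
	res.Body.Close()
	fmt.Println(healthy(body)) // true
}
```

When this check fails, the quorum guard pod goes not-ready, which consumes the disruption budget's single allowed unavailability and closes the drain gate.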
10 changes: 10 additions & 0 deletions manifests/etcdquorumguard/disruption-budget.yaml
@@ -0,0 +1,10 @@
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  namespace: {{.TargetNamespace}}
  name: etcd-quorum-guard
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: etcd-quorum-guard
3 changes: 3 additions & 0 deletions pkg/controller/template/constants.go
@@ -12,4 +12,7 @@ const (

	// KubeClientAgentImageKey is the key that references the kube-client-agent image in the controller
	KubeClientAgentImageKey string = "kubeClientAgentImage"

	// EtcdQuorumGuardImageKey is the key that references the etcd-quorum-guard image
	EtcdQuorumGuardImageKey string = "etcdQuorumGuardImage"
)
@@ -15,3 +15,4 @@ spec:
  setupEtcdEnv: image/setupEtcdEnv:1
  infraImage: image/infraImage:1
  kubeClientAgentImage: image/kubeClientAgentImage:1
  etcdQuorumGuardImage: image/etcdQuorumGuardImage:1
@@ -15,3 +15,4 @@ spec:
  setupEtcdEnv: image/setupEtcdEnv:1
  infraImage: image/infraImage:1
  kubeClientAgentImage: image/kubeClientAgentImage:1
  etcdQuorumGuardImage: image/etcdQuorumGuardImage:1
@@ -15,3 +15,4 @@ spec:
  setupEtcdEnv: image/setupEtcdEnv:1
  infraImage: image/infraImage:1
  kubeClientAgentImage: image/kubeClientAgentImage:1
  etcdQuorumGuardImage: image/etcdQuorumGuardImage:1