18 commits
eea2092
pkg/cvo/sync_worker: Generalize CancelError to ContextError
wking May 28, 2020
adf8fd6
pkg/cvo/sync_worker: Do not treat "All errors were context errors..."…
wking May 28, 2020
99f6ba5
Merge pull request #378 from openshift-cherrypick-robot/cherry-pick-3…
openshift-merge-robot Jun 3, 2020
202578a
Expand supported set of probe field mutations
ironcladlou Jun 15, 2020
40ec7e4
Merge pull request #389 from openshift-cherrypick-robot/cherry-pick-3…
openshift-merge-robot Jun 19, 2020
2326bcc
Bug 1855577: Updating the golang.org/x/text version to v0.3.3
LalatenduMohanty Jul 14, 2020
dc662b7
pkg/cvo: Set NoDesiredImage reason when desired.Image is empty
wking May 27, 2020
f00e20f
pkg/cvo/status: Raise Operator leveling grace-period to 20 minutes
wking Jul 31, 2020
b0f92e5
Merge pull request #427 from wking/raise-operator-leveling-timeout-4.5
openshift-merge-robot Aug 19, 2020
9713dc5
Merge pull request #409 from openshift-cherrypick-robot/cherry-pick-4…
openshift-merge-robot Aug 20, 2020
55ff603
pkg/start: Drop the internal EnableMetrics
wking Apr 15, 2020
d257c32
pkg/cvo/metrics: Graceful server shutdown
wking Apr 15, 2020
f8774c0
pkg/start: Register metrics directly
wking Apr 15, 2020
d8ca134
pkg/cvo/egress: Pull HTTPS/Proxy egress into separate file
wking Apr 21, 2020
905b305
pkg/start: Release leader lease on graceful shutdown
wking Aug 3, 2020
c8af639
pkg/start/start_integration_test: Do not assume "deleted" for ConfigM…
wking Aug 5, 2020
c8f99b2
pkg/start: Fill in deferred HandleCrash
wking Aug 6, 2020
a42bfb7
cmd/start: Include the version in the outgoing log line
wking Aug 25, 2020
1 change: 1 addition & 0 deletions bootstrap/bootstrap-pod.yaml
@@ -37,6 +37,7 @@ spec:
fieldRef:
fieldPath: spec.nodeName
hostNetwork: true
terminationGracePeriodSeconds: 130
volumes:
- name: kubeconfig
hostPath:
7 changes: 5 additions & 2 deletions cmd/start.go
@@ -1,6 +1,8 @@
package main

import (
"context"

"github.com/spf13/cobra"
"k8s.io/klog"

@@ -16,11 +18,12 @@ func init() {
Long: "",
Run: func(cmd *cobra.Command, args []string) {
// To help debugging, immediately log version
klog.Infof("%s", version.String)
klog.Info(version.String)

if err := opts.Run(); err != nil {
if err := opts.Run(context.Background()); err != nil {
klog.Fatalf("error: %v", err)
}
klog.Infof("Graceful shutdown complete for %s.", version.String)
},
}

8 changes: 4 additions & 4 deletions docs/user/reconciliation.md
@@ -93,22 +93,22 @@ So the graph nodes are all parallelized with the by-number ordering flattened ou

For the usual reconciliation loop (neither an upgrade between releases nor a fresh install), the flattened graph is also randomly permuted to avoid hanging on ordering bugs.

## Synchronizing the graph
## Reconciling the graph

The cluster-version operator spawns worker goroutines that walk the graph, pushing manifests in their queue.
For each manifest in the node, the worker synchronizes the cluster with the manifest using a resource builder.
For each manifest in the node, the worker reconciles the cluster with the manifest using a resource builder.
On error (or timeout), the worker abandons the manifest, graph node, and any dependencies of that graph node.
On success, the worker proceeds to the next manifest in the graph node.

## Resource builders

Resource builders synchronize the cluster with a manifest from the release image.
Resource builders reconcile a cluster object with a manifest from the release image.
The general approach is to generate a merged manifest combining critical spec properties from the release-image manifest with data from a preexisting in-cluster object, if any.
If the merged manifest differs from the in-cluster object, the merged manifest is pushed back into the cluster.
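
As a rough illustration of this merge-then-apply flow, here is a minimal Go sketch. It is not the CVO's actual code: `Object`, `Client`, and `reconcileManifest` are hypothetical names, and the real builders work with typed or unstructured Kubernetes objects through the lib/resourcemerge helpers.

```go
package sketch

import (
	"context"
	"reflect"
)

// Object stands in for an arbitrary cluster object.
type Object map[string]interface{}

// Client is a hypothetical, minimal client interface used only for this sketch.
type Client interface {
	Get(ctx context.Context, name string) (obj Object, found bool, err error)
	Create(ctx context.Context, obj Object) error
	Update(ctx context.Context, obj Object) error
}

// reconcileManifest fetches the existing object (if any), overlays the critical
// properties from the release-image manifest, and pushes the result back only
// when something actually changed.
func reconcileManifest(ctx context.Context, c Client, name string, required Object) error {
	existing, found, err := c.Get(ctx, name)
	if err != nil {
		return err
	}
	if !found {
		return c.Create(ctx, required)
	}
	merged := Object{}
	for k, v := range existing { // start from the preexisting in-cluster object
		merged[k] = v
	}
	for k, v := range required { // overlay the release-image manifest's properties
		merged[k] = v
	}
	if reflect.DeepEqual(merged, existing) {
		return nil // already reconciled; nothing to push
	}
	return c.Update(ctx, merged)
}
```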

Some types have additional logic, as described in the following subsections.
Note that this logic only applies to manifests included in the release image itself.
For example, only [ClusterOperator](../dev/clusteroperator.md) from the release image will have the blocking logic described [below](#clusteroperator); if an admin or secondary operator pushed a ClusterOperator object, it would not impact the cluster-version operator's graph synchronization.
For example, only [ClusterOperator](../dev/clusteroperator.md) from the release image will have the blocking logic described [below](#clusteroperator); if an admin or secondary operator pushed a ClusterOperator object, it would not impact the cluster-version operator's graph reconciliation.

### ClusterOperator

22 changes: 22 additions & 0 deletions docs/user/status.md
@@ -3,6 +3,27 @@
[The ClusterVersion object](../dev/clusterversion.md) sets `conditions` describing the state of the cluster-version operator (CVO).
This document describes those conditions and, where appropriate, suggests possible mitigations.

## Failing

When `Failing` is True, the CVO is failing to reconcile the cluster with the desired release image.
In all cases, the impact on the cluster will be that dependent nodes in [the manifest graph](reconciliation.md#manifest-graph) may not be [reconciled](reconciliation.md#reconciling-the-graph).
Note that the graph [may be flattened](reconciliation.md#manifest-graph), in which case there are no dependent nodes.

Most reconciliation errors will result in `Failing=True`, although [`ClusterOperatorNotAvailable`](#clusteroperatornotavailable) has special handling.

### NoDesiredImage

The CVO has not been given a release image to reconcile.

If this happens it is a CVO coding error, because clearing [`desiredUpdate`][api-desired-update] should return you to the current CVO's release image.

### ClusterOperatorNotAvailable

`ClusterOperatorNotAvailable` (or the consolidated `ClusterOperatorsNotAvailable`) is set when the CVO fails to retrieve the ClusterOperator from the cluster or when the retrieved ClusterOperator does not satisfy [the reconciliation conditions](reconciliation.md#clusteroperator).

Unlike most manifest-reconciliation failures, this error does not immediately result in `Failing=True`.
Under some conditions during installs and updates, the CVO will treat this condition as a `Progressing=True` condition and give the operator up to twenty minutes to level before reporting `Failing=True`.
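
A small Go sketch may help picture that grace-period decision; it is an illustration only (the `shouldReportFailing` name, signature, and reason strings are assumptions, not the CVO's real logic):

```go
package sketch

import "time"

// shouldReportFailing decides whether a reconciliation error should surface as
// Failing=True. ClusterOperatorNotAvailable-style errors get a leveling grace
// period while an install or update is in flight; everything else fails immediately.
func shouldReportFailing(reason string, updateStarted, now time.Time) bool {
	const operatorLevelingGracePeriod = 20 * time.Minute

	switch reason {
	case "ClusterOperatorNotAvailable", "ClusterOperatorsNotAvailable":
		return now.Sub(updateStarted) > operatorLevelingGracePeriod
	default:
		return true
	}
}
```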

## RetrievedUpdates

When `RetrievedUpdates` is `True`, the CVO is successfully retrieving updates, which is good.
@@ -107,5 +128,6 @@ If this error occurs because you forced an update to a release that is not in an
If this happens it is a CVO coding error.
There is no mitigation short of updating to a new release image with a fixed CVO.

[api-desired-update]: https://github.com/openshift/api/blob/34f54f12813aaed8822bb5bc56e97cbbfa92171d/config/v1/types_cluster_version.go#L40-L54
[channels]: https://docs.openshift.com/container-platform/4.3/updating/updating-cluster-between-minor.html#understanding-upgrade-channels_updating-cluster-between-minor
[Cincinnati]: https://github.com/openshift/cincinnati/blob/master/docs/design/openshift.md
2 changes: 2 additions & 0 deletions go.mod
@@ -2,6 +2,8 @@ module github.com/openshift/cluster-version-operator

go 1.13

replace golang.org/x/text => golang.org/x/text v0.3.3

require (
github.com/blang/semver v3.5.0+incompatible
github.com/davecgh/go-spew v1.1.1
7 changes: 2 additions & 5 deletions go.sum
@@ -397,11 +397,8 @@ golang.org/x/sys v0.0.0-20190801041406-cbf593c0f2f3/go.mod h1:h1NjWce9XRLGQEsW7w
golang.org/x/sys v0.0.0-20190826190057-c7b8b68b1456/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20191003212358-c178f38b412c h1:6Zx7DRlKXf79yfxuQ/7GqV3w2y7aDsk6bGg0MzF5RVU=
golang.org/x/sys v0.0.0-20191003212358-c178f38b412c/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/text v0.0.0-20160726164857-2910a502d2bf/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.1-0.20180807135948-17ff2d5776d2/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.2 h1:tW2bmiBqwgJj/UpqtC8EpXEZVYOwU0yG4iWbprSVAcs=
golang.org/x/text v0.3.2/go.mod h1:bEr9sfX3Q8Zfm5fL9x+3itogRgK3+ptLWKqgva+5dAk=
golang.org/x/text v0.3.3 h1:cokOdA+Jmi5PJGXLlLllQSgYigAEfHXJAERHVMaCc2k=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/time v0.0.0-20180412165947-fbb02b2291d2/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
golang.org/x/time v0.0.0-20181108054448-85acf8d2951c/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
golang.org/x/time v0.0.0-20190308202827-9d24e82272b4/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
@@ -52,6 +52,7 @@ spec:
nodeSelector:
node-role.kubernetes.io/master: ""
priorityClassName: "system-cluster-critical"
terminationGracePeriodSeconds: 130
tolerations:
- key: "node-role.kubernetes.io/master"
operator: Exists
4 changes: 4 additions & 0 deletions lib/resourcemerge/core.go
@@ -164,6 +164,10 @@ func ensureProbePtr(modified *bool, existing **corev1.Probe, required *corev1.Pr

func ensureProbe(modified *bool, existing *corev1.Probe, required corev1.Probe) {
setInt32(modified, &existing.InitialDelaySeconds, required.InitialDelaySeconds)
setInt32(modified, &existing.TimeoutSeconds, required.TimeoutSeconds)
setInt32(modified, &existing.PeriodSeconds, required.PeriodSeconds)
setInt32(modified, &existing.SuccessThreshold, required.SuccessThreshold)
setInt32(modified, &existing.FailureThreshold, required.FailureThreshold)

ensureProbeHandler(modified, &existing.Handler, required.Handler)
}
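
`ensureProbe` leans on a `setInt32` helper that is not shown in this diff. A plausible sketch, assuming the helper simply copies the required value and records the change (the real helper lives elsewhere in lib/resourcemerge and may differ):

```go
package resourcemerge

// setInt32 overwrites *existing with required and flips *modified when they differ.
func setInt32(modified *bool, existing *int32, required int32) {
	if *existing != required {
		*existing = required
		*modified = true
	}
}
```
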
92 changes: 92 additions & 0 deletions lib/resourcemerge/core_test.go
@@ -359,6 +359,98 @@ func TestEnsurePodSpec(t *testing.T) {
},
},
},
{
name: "modify container readiness probe",
existing: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "test",
ReadinessProbe: &corev1.Probe{
InitialDelaySeconds: 1,
TimeoutSeconds: 2,
PeriodSeconds: 3,
SuccessThreshold: 4,
FailureThreshold: 5,
},
},
},
},
input: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "test",
ReadinessProbe: &corev1.Probe{
InitialDelaySeconds: 7,
TimeoutSeconds: 8,
PeriodSeconds: 9,
SuccessThreshold: 10,
FailureThreshold: 11,
},
},
},
},
expectedModified: true,
expected: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "test",
ReadinessProbe: &corev1.Probe{
InitialDelaySeconds: 7,
TimeoutSeconds: 8,
PeriodSeconds: 9,
SuccessThreshold: 10,
FailureThreshold: 11,
},
},
},
},
},
{
name: "modify container liveness probe",
existing: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "test",
LivenessProbe: &corev1.Probe{
InitialDelaySeconds: 1,
TimeoutSeconds: 2,
PeriodSeconds: 3,
SuccessThreshold: 4,
FailureThreshold: 5,
},
},
},
},
input: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "test",
LivenessProbe: &corev1.Probe{
InitialDelaySeconds: 7,
TimeoutSeconds: 8,
PeriodSeconds: 9,
SuccessThreshold: 10,
FailureThreshold: 11,
},
},
},
},
expectedModified: true,
expected: corev1.PodSpec{
Containers: []corev1.Container{
{
Name: "test",
LivenessProbe: &corev1.Probe{
InitialDelaySeconds: 7,
TimeoutSeconds: 8,
PeriodSeconds: 9,
SuccessThreshold: 10,
FailureThreshold: 11,
},
},
},
},
},
}

for _, test := range tests {
7 changes: 4 additions & 3 deletions pkg/autoupdate/autoupdate.go
@@ -87,23 +87,24 @@ func New(
}

// Run runs the autoupdate controller.
func (ctrl *Controller) Run(workers int, stopCh <-chan struct{}) {
func (ctrl *Controller) Run(workers int, stopCh <-chan struct{}) error {
defer utilruntime.HandleCrash()
defer ctrl.queue.ShutDown()

klog.Info("Starting AutoUpdateController")
defer klog.Info("Shutting down AutoUpdateController")

if !cache.WaitForCacheSync(stopCh, ctrl.cacheSynced...) {
klog.Info("Caches never synchronized")
return
return fmt.Errorf("caches never synchronized")
}

for i := 0; i < workers; i++ {
// FIXME: actually wait until these complete if the Context is canceled. And possibly add utilruntime.HandleCrash.
go wait.Until(ctrl.worker, time.Second, stopCh)
}

<-stopCh
return nil
}

func (ctrl *Controller) eventHandler() cache.ResourceEventHandler {
53 changes: 0 additions & 53 deletions pkg/cvo/availableupdates.go
@@ -2,7 +2,6 @@ package cvo

import (
"crypto/tls"
"crypto/x509"
"fmt"
"net/url"
"runtime"
@@ -11,7 +10,6 @@ import (
"github.com/blang/semver"
"github.com/google/uuid"
"k8s.io/apimachinery/pkg/api/equality"
"k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/klog"

@@ -197,54 +195,3 @@ func calculateAvailableUpdatesStatus(clusterID string, proxyURL *url.URL, tlsCon
LastTransitionTime: metav1.Now(),
}
}

// getHTTPSProxyURL returns a url.URL object for the configured
// https proxy only. It can be nil if does not exist or there is an error.
func (optr *Operator) getHTTPSProxyURL() (*url.URL, string, error) {
proxy, err := optr.proxyLister.Get("cluster")

if errors.IsNotFound(err) {
return nil, "", nil
}
if err != nil {
return nil, "", err
}

if &proxy.Spec != nil {
if proxy.Spec.HTTPSProxy != "" {
proxyURL, err := url.Parse(proxy.Spec.HTTPSProxy)
if err != nil {
return nil, "", err
}
return proxyURL, proxy.Spec.TrustedCA.Name, nil
}
}
return nil, "", nil
}

func (optr *Operator) getTLSConfig(cmNameRef string) (*tls.Config, error) {
cm, err := optr.cmConfigLister.Get(cmNameRef)

if err != nil {
return nil, err
}

certPool, _ := x509.SystemCertPool()
if certPool == nil {
certPool = x509.NewCertPool()
}

if cm.Data["ca-bundle.crt"] != "" {
if ok := certPool.AppendCertsFromPEM([]byte(cm.Data["ca-bundle.crt"])); !ok {
return nil, fmt.Errorf("unable to add ca-bundle.crt certificates")
}
} else {
return nil, nil
}

config := &tls.Config{
RootCAs: certPool,
}

return config, nil
}
19 changes: 8 additions & 11 deletions pkg/cvo/cvo.go
@@ -169,7 +169,6 @@ func New(
proxyInformer configinformersv1.ProxyInformer,
client clientset.Interface,
kubeClient kubernetes.Interface,
enableMetrics bool,
exclude string,
) *Operator {
eventBroadcaster := record.NewBroadcaster()
@@ -214,11 +213,6 @@
// make sure this is initialized after all the listers are initialized
optr.upgradeableChecks = optr.defaultUpgradeableChecks()

if enableMetrics {
if err := optr.registerMetrics(coInformer.Informer()); err != nil {
panic(err)
}
}
return optr
}

@@ -321,8 +315,7 @@ func loadConfigMapVerifierDataFromUpdate(update *payload.Update, clientBuilder v
}

// Run runs the cluster version operator until stopCh is completed. Workers is ignored for now.
func (optr *Operator) Run(ctx context.Context, workers int) {
defer utilruntime.HandleCrash()
func (optr *Operator) Run(ctx context.Context, workers int) error {
defer optr.queue.ShutDown()
stopCh := ctx.Done()
workerStopCh := make(chan struct{})
@@ -331,8 +324,7 @@ func (optr *Operator) Run(ctx context.Context, workers int) {
defer klog.Info("Shutting down ClusterVersionOperator")

if !cache.WaitForCacheSync(stopCh, optr.cacheSynced...) {
klog.Info("Caches never synchronized")
return
return fmt.Errorf("caches never synchronized: %w", ctx.Err())
}

// trigger the first cluster version reconcile always
@@ -361,6 +353,8 @@ func (optr *Operator) Run(ctx context.Context, workers int) {
// stop the queue, then wait for the worker to exit
optr.queue.ShutDown()
<-workerStopCh

return nil
}

func (optr *Operator) queueKey() string {
Expand Down Expand Up @@ -472,7 +466,10 @@ func (optr *Operator) sync(key string) error {
// handle the case of a misconfigured CVO by doing nothing
if len(desired.Image) == 0 {
return optr.syncStatus(original, config, &SyncWorkerStatus{
Failure: fmt.Errorf("No configured operator version, unable to update cluster"),
Failure: &payload.UpdateError{
Reason: "NoDesiredImage",
Message: "No configured operator version, unable to update cluster",
},
}, errs)
}
