Add CVO sync from payload #10
Conversation
15d564a to 7883428
lib/manifest.go
Outdated
if m == nil {
	return errors.New("Manifest: UnmarshalJSON on nil pointer")
}
if !bytes.Equal(in, []byte("null")) {
nit: why do we need to special-case this? Won't it be handled by Decode()?
Updated it and added a comment; look here.
func (m *Manifest) Object() metav1.Object { return m.obj }

const (
	rootDirKey = "000"
@abhinavdahiya Could you comment why we need this?
done
lib/manifest.go
Outdated
// returns map
// 000: [manifest0, manifest1]
// 00_subdir0: [manifest0, manifest1]
// 00_subdir0: [manifest0, manifest1]
this might be 01_subdir1?
👍
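To illustrate the map shape discussed above, here is a hedged sketch of grouping manifest paths by their parent directory, with root-level files collected under the `000` key. The helper name `loadManifestMap` is hypothetical; the PR's real LoadManifests reads files from disk and returns Manifest values, not strings:

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// loadManifestMap groups manifest paths by their immediate parent
// directory, using "000" for files in the root so that a consumer
// sorting by key processes root manifests first.
// (Hypothetical helper illustrating the layout under discussion.)
func loadManifestMap(paths []string) map[string][]string {
	const rootDirKey = "000"
	out := map[string][]string{}
	for _, p := range paths {
		key := filepath.Base(filepath.Dir(p))
		if key == "." { // file lives directly in the root dir
			key = rootDirKey
		}
		out[key] = append(out[key], p)
	}
	return out
}

func main() {
	m := loadManifestMap([]string{
		"job-migration.yaml",
		"00_subdir0/manifest0.yaml",
		"01_subdir1/manifest1.yaml",
	})
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // [000 00_subdir0 01_subdir1]
}
```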
yifan-gu
left a comment
some nits, otherwise lgtm.
Nice work @abhinavdahiya !
1df370e to ccd9097
* adds syncUpdatePayloadContents that ensures <prefix>/<version> contains the <version> update payload
* adds syncUpdatePayload that uses lib.resourcebuilder to generically apply the update payload in order.
// passed to LoadManifests
// It is set to `000` to give it more priority if the actor sorts
// based on keys.
rootDirKey = "000"
Alternatively we could just force the root dir to be a real dir and ignore files in the root (treat as not manifests). The only content we had in there in the payload right now was the image mapping and the Cincinnati file.
If the update payload is

/cincinnati.json
/images.json
/manifests/
    ...

then the "root" here means the manifests dir.
Making manifests in /manifests (or "root" in the code) gives us a way to add things like job-migrations with higher priority than any operators.
+1 on keeping the manifests dir, seems pretty clean to me and not much overhead.
@yifan-gu @smarterclayton any updates?

/lgtm

[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: abhinavdahiya, yifan-gu. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing

I'm going to test the prototype release build tool against this image today.
…ow?" We've had 'if updated' guards around waitFor*Completion since the library landed in 2d334c2 (lib: add resource builder that allows Do on any lib.Manifest, 2018-08-20, openshift#10). But, only waiting when 'updated' is true is a weak block, because if/when we fail to complete, Task.Run will back off and call builder.Apply again. That new Apply will see the already-updated object, set 'updated' false, and not wait. So whether we block or not is orthogonal to 'updated'; nobody cares about whether the most recent update happened in this builder.Apply, this sync cycle, or a previous cycle.

We don't even care all that much about whether the Deployment, DaemonSet, CustomResourceDefinition, or Job succeeded. Most feedback is going to come from the ClusterOperator, so with this commit we continue past the resource wait-for unless the resource is really hurting, in which case we fail immediately (inside builder.Apply, Task.Run will still hit us a few times) to bubble that up. In situations where we don't see anything too terrible going on, we'll continue on past and later block on ClusterOperator not being ready.

Other changes in this commit:

* I've pulled 'b.modifier(crd)' out of the switch, because that should happen regardless of the CRD version.
* I've added an error for unrecognized CRD versions, because that will be easier to debug than trying to figure out why a CRD manifest is being silently ignored by the CVO.
* There's no object status for CRDs or DaemonSets that marks "we are really hurting". The v1.18.0 Kubernetes CRD and DaemonSet controllers do not set any conditions in their operand status (although the API for those conditions exists [1,2]). With this commit, we have very minimal wait logic for either. Sufficiently unhealthy DaemonSets should be reported on via their associated ClusterOperator, and sufficiently unhealthy CRDs should be reported on when we fail to push any custom resources consuming them (Task.Run retries will give the API server time to ready itself after accepting a CRD update before the CVO fails its sync cycle).
* deploymentBuilder and daemonsetBuilder grow mode properties. They had been using 'actual.Generation > 1' as a proxy for "post-install" since 14fab0b (add generic 2-way merge handler for random types, 2018-09-27, openshift#26), but generation 1 is just "we haven't changed the object since it was created", not "we're installing a fresh cluster". For example, a new Deployment or DaemonSet could be added as part of a cluster update, and we don't want the special install-time "we don't care about specific manifest failures" handling then.

[1]: https://github.com/kubernetes/api/blob/v0.18.0/apps/v1/types.go#L586-L590
[2]: https://github.com/kubernetes/apiextensions-apiserver/blob/v0.18.0/pkg/apis/apiextensions/types.go#L319-L320
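The "fail immediately only when the resource is really hurting" idea from the commit message above can be sketched as a condition check. This is a hypothetical, self-contained illustration using local stand-in types; the real code inspects appsv1.Deployment status conditions, and the helper name is not from the PR:

```go
package main

import "fmt"

// condition is a minimal stand-in for an appsv1.DeploymentCondition.
type condition struct {
	Type   string
	Status string
	Reason string
}

// checkDeploymentHealth mirrors the commit's approach: rather than
// blocking until the rollout completes, fail only when the Deployment
// reports it is clearly hurting (ReplicaFailure=True or
// Progressing=False), and otherwise let the sync continue so that
// ClusterOperator status provides the real feedback.
// (Hypothetical helper; names are illustrative.)
func checkDeploymentHealth(conds []condition) error {
	for _, c := range conds {
		if c.Type == "ReplicaFailure" && c.Status == "True" {
			return fmt.Errorf("deployment is failing replicas: %s", c.Reason)
		}
		if c.Type == "Progressing" && c.Status == "False" {
			return fmt.Errorf("deployment is not progressing: %s", c.Reason)
		}
	}
	return nil // nothing obviously bad; continue the sync cycle
}

func main() {
	healthy := []condition{{Type: "Progressing", Status: "True", Reason: "NewReplicaSetAvailable"}}
	fmt.Println(checkDeploymentHealth(healthy) == nil) // true

	hurting := []condition{{Type: "ReplicaFailure", Status: "True", Reason: "FailedCreate"}}
	fmt.Println(checkDeploymentHealth(hurting)) // deployment is failing replicas: FailedCreate
}
```

Task.Run's retry loop then repeats the whole Apply, so a transiently unhappy Deployment still gets several chances before the sync cycle fails.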
…ow?" We've had 'if updated' guards around waitFor*Completion since the library landed in 2d334c2 (lib: add resource builder that allows Do on any lib.Manifest, 2018-08-20, openshift#10). But, only waiting when 'updated' is true is a weak block, because if/when we fail to complete, Task.Run will back off and call builder.Apply again. That new Apply will see the already-updated object, set 'updated' false, and not wait. So whether we block or not is orthogonal to 'updated'; nobody cares about whether the most recent update happened in this builder.Apply, this sync cycle, or a previous cycle.

We don't even care all that much about whether the Deployment, DaemonSet, CustomResourceDefinition, or Job succeeded. Most feedback is going to come from the ClusterOperator, so with this commit we continue past the resource wait-for unless the resource is really hurting, in which case we fail immediately (inside builder.Apply, Task.Run will still hit us a few times) to bubble that up. In situations where we don't see anything too terrible going on, we'll continue on past and later block on ClusterOperator not being ready.

The "unknown state" Deployment logging has changed a bit. I'd initially dropped it, but Jack suggested keeping it to make identifying broken-Deployment-controller and similar situations easier [1]. Previously it was logged when we weren't happy with updatedReplicas and unavailableReplicas, nothing obviously bad was happening, and we were not Progressing=True. We no longer check updatedReplicas or unavailableReplicas, so now it's just "nothing obviously bad is happening, but that may just be because the Deployment controller isn't giving us any of the conditions we look at to judge badness". It's possible that we should also check for "when we do have those conditions, the values are either True or False, not some unexpected key". But I'm leaving that alone for now.

There's no object status for CRDs or DaemonSets that marks "we are really hurting". The v1.18.0 Kubernetes CRD and DaemonSet controllers do not set any conditions in their operand status (although the API for those conditions exists [2,3]). With this commit, we have very minimal wait logic for either. Sufficiently unhealthy DaemonSets should be reported on via their associated ClusterOperator, and sufficiently unhealthy CRDs should be reported on when we fail to push any custom resources consuming them (Task.Run retries will give the API server time to ready itself after accepting a CRD update before the CVO fails its sync cycle).

We still need the public WaitForJobCompletion, because fetchUpdatePayloadToDir uses it to wait on the release download.

Also expand "iff" -> "if and only if" while I'm touching that line, at Jack's suggestion [4].

[1]: openshift#400 (comment)
[2]: https://github.com/kubernetes/api/blob/v0.18.0/apps/v1/types.go#L586-L590
[3]: https://github.com/kubernetes/apiextensions-apiserver/blob/v0.18.0/pkg/apis/apiextensions/types.go#L319-L320
[4]: openshift#400 (comment)
This wasn't covered when ClusterRole reconciliation landed in 697cbf6 (lib: update resource{read,merge,apply} to add new objects, 2018-08-20, openshift#10), but the property existed even back then:

    $ git grep -A20 'type ClusterRole struct' 697cbf6 | grep '[-][[:space:]]AggregationRule'
    697cbf6:vendor/k8s.io/api/rbac/v1/types.go-      AggregationRule *AggregationRule `json:"aggregationRule,omitempty" protobuf:"bytes,3,opt,name=aggregationRule"`
    697cbf6:vendor/k8s.io/api/rbac/v1alpha1/types.go-        AggregationRule *AggregationRule `json:"aggregationRule,omitempty" protobuf:"bytes,3,opt,name=aggregationRule"`
    697cbf6:vendor/k8s.io/api/rbac/v1beta1/types.go- AggregationRule *AggregationRule `json:"aggregationRule,omitempty" protobuf:"bytes,3,opt,name=aggregationRule"`

and now folks want to use aggregationRule for something [1].

[1]: openshift/machine-api-operator#795

    // EnsureRoleBinding ensures that the existing matches the required.

    diff --git a/lib/resourcemerge/rbacv1beta1.go b/lib/resourcemerge/rbacv1beta1.go
    index d9e84d5..fa3cf17 100644
    --- a/lib/resourcemerge/rbacv1beta1.go
    +++ b/lib/resourcemerge/rbacv1beta1.go
    @@ -27,6 +27,10 @@ func EnsureClusterRolev1beta1(modified *bool, existing *rbacv1beta1.ClusterRole,
     		*modified = true
     		existing.Rules = required.Rules
     	}
    +	if !equality.Semantic.DeepEqual(existing.AggregationRule, required.AggregationRule) {
    +		*modified = true
    +		existing.AggregationRule = required.AggregationRule
    +	}
     }

     // EnsureRoleBindingv1beta1 ensures that the existing matches the required.
/assign @crawford @yifan-gu