Add CVO sync from payload #10
Conversation
15d564a to 7883428
lib/manifest.go
Outdated
if m == nil {
	return errors.New("Manifest: UnmarshalJSON on nil pointer")
}
if !bytes.Equal(in, []byte("null")) {
nit: why do we need to special-case this? Won't it be handled by Decode()?
Updated it and added a comment; look here.
func (m *Manifest) Object() metav1.Object { return m.obj }

const (
	rootDirKey = "000"
@abhinavdahiya Could you comment why we need this?
done
lib/manifest.go
Outdated
// returns map
// 000: [manifest0, manifest1]
// 00_subdir0: [manifest0, manifest1]
// 00_subdir0: [manifest0, manifest1]
this might be 01_subdir1?
👍
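To illustrate the map shape discussed above, here is a hedged sketch of grouping manifest paths by their parent directory, with root-level files collected under the `000` key. The helper name `loadManifestMap` is hypothetical; the PR's real LoadManifests reads files from disk and returns Manifest values, not strings:

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// loadManifestMap groups manifest paths by their immediate parent
// directory, using "000" for files in the root so that a consumer
// sorting by key processes root manifests first.
// (Hypothetical helper illustrating the layout under discussion.)
func loadManifestMap(paths []string) map[string][]string {
	const rootDirKey = "000"
	out := map[string][]string{}
	for _, p := range paths {
		key := filepath.Base(filepath.Dir(p))
		if key == "." { // file lives directly in the root dir
			key = rootDirKey
		}
		out[key] = append(out[key], p)
	}
	return out
}

func main() {
	m := loadManifestMap([]string{
		"job-migration.yaml",
		"00_subdir0/manifest0.yaml",
		"01_subdir1/manifest1.yaml",
	})
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // [000 00_subdir0 01_subdir1]
}
```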
yifan-gu
left a comment
some nits, otherwise lgtm.
Nice work @abhinavdahiya !
1df370e to ccd9097
* adds syncUpdatePayloadContents that ensures <prefix>/<version> contains the <version> update payload
* adds syncUpdatePayload that uses lib.resourcebuilder to generically apply the update payload in order.
// passed to LoadManifests
// It is set to `000` to give it more priority if the actor sorts
// based on keys.
rootDirKey = "000"
Alternatively we could just force the root dir to be a real dir and ignore files in the root (treat as not manifests). The only content we had in there in the payload right now was the image mapping and the Cincinnati file.
If the update payload is

/cincinnati.json
/images.json
/manifests/
    ...

then the "root" here means the manifests dir.
Making manifests in /manifests (or "root" in the code) gives us a way to add things like job-migrations with higher priority than any operators.
+1 on keeping the manifests dir, seems pretty clean to me and not much overhead.
@yifan-gu @smarterclayton any updates?

/lgtm

[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: abhinavdahiya, yifan-gu. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing

I'm going to test the prototype release build tool against this image today.
…ow?" We've had 'if updated' guards around waitFor*Completion since the library landed in 2d334c2 (lib: add resource builder that allows Do on any lib.Manifest, 2018-08-20, openshift#10). But, only waiting when 'updated' is true is a weak block, because if/when we fail to complete, Task.Run will back off and call builder.Apply again. That new Apply will see the already-updated object, set 'updated' false, and not wait. So whether we block or not is orthogonal to 'updated'; nobody cares about whether the most recent update happened in this builder.Apply, this sync cycle, or a previous cycle.

We don't even care all that much about whether the Deployment, DaemonSet, CustomResourceDefinition, or Job succeeded. Most feedback is going to come from the ClusterOperator, so with this commit we continue past the resource wait-for unless the resource is really hurting, in which case we fail immediately (inside builder.Apply, Task.Run will still hit us a few times) to bubble that up. In situations where we don't see anything too terrible going on, we'll continue on past and later block on ClusterOperator not being ready.

Other changes in this commit:

* I've pulled 'b.modifier(crd)' out of the switch, because that should happen regardless of the CRD version.
* I've added an error for unrecognized CRD versions, because that will be easier to debug than trying to figure out why a CRD manifest is being silently ignored by the CVO.
* There's no object status for CRDs or DaemonSets that marks "we are really hurting". The v1.18.0 Kubernetes CRD and DaemonSet controllers do not set any conditions in their operand status (although the API for those conditions exists [1,2]). With this commit, we have very minimal wait logic for either. Sufficiently unhealthy DaemonSets should be reported on via their associated ClusterOperator, and sufficiently unhealthy CRDs should be reported on when we fail to push any custom resources consuming them (Task.Run retries will give the API server time to ready itself after accepting a CRD update before the CVO fails its sync cycle).
* deploymentBuilder and daemonsetBuilder grow mode properties. They had been using 'actual.Generation > 1' as a proxy for "post-install" since 14fab0b (add generic 2-way merge handler for random types, 2018-09-27, openshift#26), but generation 1 is just "we haven't changed the object since it was created", not "we're installing a fresh cluster". For example, a new Deployment or DaemonSet could be added as part of a cluster update, and we don't want the special install-time "we don't care about specific manifest failures" handling then.

[1]: https://github.com/kubernetes/api/blob/v0.18.0/apps/v1/types.go#L586-L590
[2]: https://github.com/kubernetes/apiextensions-apiserver/blob/v0.18.0/pkg/apis/apiextensions/types.go#L319-L320
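The "fail immediately only when the resource is really hurting" idea from the commit message above can be sketched as a condition check. This is a hypothetical, self-contained illustration using local stand-in types; the real code inspects appsv1.Deployment status conditions, and the helper name is not from the PR:

```go
package main

import "fmt"

// condition is a minimal stand-in for an appsv1.DeploymentCondition.
type condition struct {
	Type   string
	Status string
	Reason string
}

// checkDeploymentHealth mirrors the commit's approach: rather than
// blocking until the rollout completes, fail only when the Deployment
// reports it is clearly hurting (ReplicaFailure=True or
// Progressing=False), and otherwise let the sync continue so that
// ClusterOperator status provides the real feedback.
// (Hypothetical helper; names are illustrative.)
func checkDeploymentHealth(conds []condition) error {
	for _, c := range conds {
		if c.Type == "ReplicaFailure" && c.Status == "True" {
			return fmt.Errorf("deployment is failing replicas: %s", c.Reason)
		}
		if c.Type == "Progressing" && c.Status == "False" {
			return fmt.Errorf("deployment is not progressing: %s", c.Reason)
		}
	}
	return nil // nothing obviously bad; continue the sync cycle
}

func main() {
	healthy := []condition{{Type: "Progressing", Status: "True", Reason: "NewReplicaSetAvailable"}}
	fmt.Println(checkDeploymentHealth(healthy) == nil) // true

	hurting := []condition{{Type: "ReplicaFailure", Status: "True", Reason: "FailedCreate"}}
	fmt.Println(checkDeploymentHealth(hurting)) // deployment is failing replicas: FailedCreate
}
```

Task.Run's retry loop then repeats the whole Apply, so a transiently unhappy Deployment still gets several chances before the sync cycle fails.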
…ow?" We've had 'if updated' guards around waitFor*Completion since the library landed in 2d334c2 (lib: add resource builder that allows Do on any lib.Manifest, 2018-08-20, openshift#10). But, only waiting when 'updated' is true is a weak block, because if/when we fail to complete, Task.Run will back off and call builder.Apply again. That new Apply will see the already-updated object, set 'updated' false, and not wait. So whether we block or not is orthogonal to 'updated'; nobody cares about whether the most recent update happened in this builder.Apply, this sync cycle, or a previous cycle.

We don't even care all that much about whether the Deployment, DaemonSet, CustomResourceDefinition, or Job succeeded. Most feedback is going to come from the ClusterOperator, so with this commit we continue past the resource wait-for unless the resource is really hurting, in which case we fail immediately (inside builder.Apply, Task.Run will still hit us a few times) to bubble that up. In situations where we don't see anything too terrible going on, we'll continue on past and later block on ClusterOperator not being ready.

The "unknown state" Deployment logging has changed a bit. I'd initially dropped it, but Jack suggested keeping it to make identifying broken-Deployment-controller and similar situations easier [1]. Previously it was logged when we weren't happy with updatedReplicas and unavailableReplicas, nothing obviously bad was happening, and we were not Progressing=True. We no longer check updatedReplicas or unavailableReplicas, so now it's just "nothing obviously bad is happening, but that may just be because the Deployment controller isn't giving us any of the conditions we look at to judge badness". It's possible that we should also check for "when we do have those conditions, the values are either True or False, not some unexpected key". But I'm leaving that alone for now.

There's no object status for CRDs or DaemonSets that marks "we are really hurting". The v1.18.0 Kubernetes CRD and DaemonSet controllers do not set any conditions in their operand status (although the API for those conditions exists [2,3]). With this commit, we have very minimal wait logic for either. Sufficiently unhealthy DaemonSets should be reported on via their associated ClusterOperator, and sufficiently unhealthy CRDs should be reported on when we fail to push any custom resources consuming them (Task.Run retries will give the API server time to ready itself after accepting a CRD update before the CVO fails its sync cycle).

We still need the public WaitForJobCompletion, because fetchUpdatePayloadToDir uses it to wait on the release download.

Also expand "iff" -> "if and only if" while I'm touching that line, at Jack's suggestion [4].

[1]: openshift#400 (comment)
[2]: https://github.com/kubernetes/api/blob/v0.18.0/apps/v1/types.go#L586-L590
[3]: https://github.com/kubernetes/apiextensions-apiserver/blob/v0.18.0/pkg/apis/apiextensions/types.go#L319-L320
[4]: openshift#400 (comment)
This wasn't covered when ClusterRole reconciliation landed in 697cbf6 (lib: update resource{read,merge,apply} to add new objects, 2018-08-20, openshift#10), but the property existed even back then:

    $ git grep -A20 'type ClusterRole struct' 697cbf6 | grep '[-][[:space:]]AggregationRule'
    697cbf6:vendor/k8s.io/api/rbac/v1/types.go-      AggregationRule *AggregationRule `json:"aggregationRule,omitempty" protobuf:"bytes,3,opt,name=aggregationRule"`
    697cbf6:vendor/k8s.io/api/rbac/v1alpha1/types.go-        AggregationRule *AggregationRule `json:"aggregationRule,omitempty" protobuf:"bytes,3,opt,name=aggregationRule"`
    697cbf6:vendor/k8s.io/api/rbac/v1beta1/types.go- AggregationRule *AggregationRule `json:"aggregationRule,omitempty" protobuf:"bytes,3,opt,name=aggregationRule"`

and now folks want to use aggregationRule for something [1].

[1]: openshift/machine-api-operator#795

    // EnsureRoleBinding ensures that the existing matches the required.

    diff --git a/lib/resourcemerge/rbacv1beta1.go b/lib/resourcemerge/rbacv1beta1.go
    index d9e84d5..fa3cf17 100644
    --- a/lib/resourcemerge/rbacv1beta1.go
    +++ b/lib/resourcemerge/rbacv1beta1.go
    @@ -27,6 +27,10 @@ func EnsureClusterRolev1beta1(modified *bool, existing *rbacv1beta1.ClusterRole,
     		*modified = true
     		existing.Rules = required.Rules
     	}
    +	if !equality.Semantic.DeepEqual(existing.AggregationRule, required.AggregationRule) {
    +		*modified = true
    +		existing.AggregationRule = required.AggregationRule
    +	}
     }

     // EnsureRoleBindingv1beta1 ensures that the existing matches the required.
/assign @crawford @yifan-gu