Add Validating/MutatingWebhookConfiguration structured Apply #836
Add Validating/MutatingWebhookConfiguration structured Apply #836openshift-merge-robot merged 2 commits intoopenshift:masterfrom
Conversation
|
/retest |
| } | ||
| } | ||
|
|
||
| // Explcitily specify type for requiredUnstr to get object meta accessor |
| // usntructured will be merged with the existing usntructured and an update performed if the | ||
| // usntructured spec and metadata differ from the previously required spec and metadata. | ||
| // For further detail, check the top-level comment. | ||
| func applyUnstructured(resourceClient dynamic.ResourceInterface, path string, recorder events.Recorder, |
There was a problem hiding this comment.
doesn't this ask for hot loops? Apply in a generic way does not really make sense IMO.
There was a problem hiding this comment.
@sttts So you think this needs an option to forceApply, same as it is for Deployments and DaemonSets? Or it should repeat the attempt couple of times before failing?
There was a problem hiding this comment.
@Danil-Grigorev Can you please build this into the MAO and try it on a real cluster to see how it behaves?
I don't think it should hot loop, it's based on the ApplyDeployment logic, so if that doesn't hot loop I think this shouldn't? If the metadata and spec that we input doesn't change, then the spec hash annotation shouldn't change and we should short circuit on line 42 before we attempt to update again. @sttts Is there something in this in particular that makes you think it would hot loop?
There was a problem hiding this comment.
This func looks generic. But it is wrong in most cases. It does not consider things like list-type markers (compare https://github.com/kubernetes/enhancements/blob/6a1ac61fbfd64d49e13fae3a8d773dd4ebda53c8/keps/sig-api-machinery/0006-apply.md). It might be fine for one specific resource. But having this helper here encourages simple, but wrong code.
There was a problem hiding this comment.
Once this PR passes e2e tests, we would have the proof that hot loops do not occur in current implementation: openshift/machine-api-operator#642
There was a problem hiding this comment.
@Danil-Grigorev please check #836 (comment). This is a blocker for the PR.
0efbe2b to
7e2669b
Compare
cc20589 to
a9808ab
Compare
|
|
||
| // This is based on resourcemerge.EnsureObjectMeta but uses the metav1.Object interface instead | ||
| // TODO: Update this to use resourcemerge.EnsureObjectMeta or update resourcemerge.EnsureObjectMeta to use the interface | ||
| func ensureObjectMeta(existing, required metav1.Object) (bool, error) { |
There was a problem hiding this comment.
what do we do for other resources? I looks wrong to have this here in a generic way, but not use it for other resources as well.
There was a problem hiding this comment.
Other resources use
library-go/pkg/operator/resource/resourcemerge/object_merger.go
Lines 10 to 17 in 100bf3f
metav1.Object interface, even though it could.
We could update the public implementation so we only have one version but that would be a breaking change to the method since it accepts a *metav1.ObjectMeta and a metav1.ObjectMeta, the latter not satisfying the interface
|
Why is this using unstructured after all an not a typed client as we do for all other resources? /hold |
17ea56f to
a9808ab
Compare
For background on why we are proposing this change: We would like to be able to reconfigure the MWC/VWC resources in the future so looked at how we could do this. For now, we looked at bringing the management of the MWC/VWC into the MAO with our other resources. Other resources being the Deployment and DaemonSet we manage. They use the library-go We then thought it would be good to have the same for MWC and VWC. However, the logic for The algorithm we have in By copying the CABundles specifically in the MWC/VWC implementation, we can get around the problem that an external party needs to modify an otherwise managed object, but because we do that before the spec hash is calculated, we are safe for future updates. There is prior art for using unstructured in this package as While I agree that ideally we would use server side apply, I didn't think that was ready for general consumption yet? This logic replicates what we have already with other resources that have a spec, if the spec has changed, we overwrite it completely. CVO does this today for many resources already, and this just mimics that, so I don't think we are making things worse really? Long term, it is my belief that CVO should be able to manage MWC and VWC resources and handle the update to the CABundle (at the moment it uses a generic apply which just replaces the spec verabatim if there's a difference), if there is a better way to approach this problem and solve the issue for the wider Openshift community then I'm happy to help implement it |
|
FWIW, I've manually tested this code which is being used in openshift/machine-api-operator#642 this morning and it is working pretty much as expected, there was one minor discrepancy but that was within the MAO repo and I believe it has now been fixed. I did not observe any hot loops, as I modified the resource spec it got put back and kept the CABundle as desired. |
I was not on the ball this morning, ignore that previous comment, it was completely wrong 🤦 Too much context switching right now |
|
/retest |
|
@JoelSpeed I understand your motiviation to add webhook support here. And I agree it makes sense. But for this very resourceapply package it is harmful to have a generic implementation. All generic implementations are wrong if defaulting happens in the apiserver. This even applies to specific implementations if upstream adds defaults and we forget to update our apply funcs. Hence, we have to be super careful with the logic here for each single resource type, otherwise we end up in hot loops. tl/dr: Not having too many tempting generic helpers is a feature of this package, not code smell. Maybe we can have some common logic like for ObjectMeta if that makes sense. Am not against that. |
Ok, this makes sense. How would you suggest we approach this then? Should we rewrite this to be structured and specific for the MWC/VWC, would specific implementations for these be acceptable for this package? This would then allow us to add CVO support for MWC/VWC, do you think that would be accepted? Or should we just keep our own implementation and manage the webhooks from MAO long term?
How do you mean for that? I don't follow. |
You had some logic about ObjectMeta which looked generic. If it is and matches what we do everywhere, that could make sense. But this should go to another independent PR. |
Yes. Similarly as for the other types. Be careful with the defaults that kube applies. You can find the defaulters here: You have to manually replicate that logic. And yes, this is super tedious and error-prone, and for many of the supported types here we did not do that in a complete way, i.e. we lack applying defaults risking hotloops. That's why we must be so careful in this package. Manual apply requires this kind of deep knowledge. This is the reason for server side apply upstream that we can eventually use some day. |
|
Ok, thanks for the guidance @sttts! I will work with @Danil-Grigorev over the next few days to re-approach this |
a9808ab to
0f3ca2b
Compare
JoelSpeed
left a comment
There was a problem hiding this comment.
Looks good. I am slightly concerned that we aren't accounting for the defaulting as suggested by @sttts, I think if we take the defaulting functions linked below and copy them here, we can run our required object through that before we start comparing it, then we shouldn't end up conflicting and hot looping with the defaulted fields in existing object.
We should look around this package to see how other implementations are handling this
| if !equality.Semantic.DeepEqual(existing.Webhooks, required.Webhooks) { | ||
| *modified = true | ||
| existing.Webhooks = required.Webhooks | ||
| } |
There was a problem hiding this comment.
Is there any defaulting within the Webhooks that we need to account for here? We could end up in a hot loop if there's server side defaulting that we are missing
| if !equality.Semantic.DeepEqual(existing.Webhooks, required.Webhooks) { | ||
| *modified = true | ||
| existing.Webhooks = required.Webhooks | ||
| } |
There was a problem hiding this comment.
Is there any defaulting within the Webhooks that we need to account for here? We could end up in a hot loop if there's server side defaulting that we are missing
There was a problem hiding this comment.
Having to call defaulting in here is the reason for possible hot loops. If we pull server implementation for webhooks, someone would have to suffer a burden of maintaining it in sync with upstream, or otherwise, any defaulting implementation change would be a 100% cause for hot looping. The location @sttts pointed at, is an internal kubenetes/kubernetes package. If it is not exposed at https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/api/admissionregistration/v1, I don't think it is wise to use it and especially, copying that. I'm not sure that the strict rules for kubernetes api changes are applied there, so the probability of described situation is higher. Let's leave it to the server-side apply, as a future improvement.
Right now I could agree that if you immediately apply the updated resource, returned from ApplyMutatingWebhookConfiguration or ApplyValidatingWebhookConfigurationonce again, it would not be filtered by the logic, and Update call will still happen. Even if the resource is technically the same, from the algorithmic perspective you provided a different input, and the change should occur.
I'm adding the missing comment from resourceapply/apps.go from where the implementation was inspired, so the explanation will be preserved:
library-go/pkg/operator/resource/resourceapply/apps.go
Lines 21 to 51 in 3b4f06f
There was a problem hiding this comment.
While I agree with that comment for the deployment implementation, this does not match with the logic that you have in this implementation.
We are checking the Webhooks for equality here, we aren't solely relying on the spec hash. There is no way that the two sets of webhooks will be equal unless the input has already been defaulted, so this will currently cause a hot loop.
We have two options, we copy the logic from the K8s package (definitely do not import it), or we remove the lines I've highlighted here and solely rely on the spec hash annotation for detecting differences.
someone would have to suffer a burden of maintaining it in sync with upstream, or otherwise, any defaulting implementation change would be a 100% cause for hot looping.
This has already been discussed, this is what Stefan was saying about in his previous comments. We have to be careful when adding to this package because if we don't get the defaulting right, we will hot loop. He's already said it's annoying because we have to keep them up to date. See #836 (comment)
There was a problem hiding this comment.
Now I see it, thanks. Updated.
jsafrane
left a comment
There was a problem hiding this comment.
I tested ApplyValidatingWebhookConfiguration and it works as expected. I got one Generation bumped by caBundle injector, but after that things are pretty stable.
BTW, most (all?) my comments apply interchangeably both to ApplyMutatingWebhookConfiguration and ApplyValidatingWebhookConfiguration
| return nil, false, fmt.Errorf("Unexpected nil instead of an object") | ||
| } | ||
| required := requiredOriginal.DeepCopy() | ||
|
|
There was a problem hiding this comment.
Please add specHashAnnotation to the object, it will force update the webhook config when a new operator starts with a new version of the config.
See how ApplyDeployment does it. Hash webhooks instead of spec, of course.
There was a problem hiding this comment.
This is one thing which I don't think we should do. That mix of hashing and generation is what was causing the necessity of using server-side apply. If you really require to see the full populated spec of the resource, as the server returns it, you need to copy defaulting methods from the server. Otherwise, you have to merge the new spec with the previous one (which we could avoid). In any case, this brings chaos in the approach and make it actually difficult to maintain the state with other operators doing changes on the webhooks with the mentioned annotation.
If you'd like to force redeployment of the resource, the expected generation should be set to 0. This is also replicated on operator startup, as the actual cache for the generation will be empty.
I'll add a test case that will provide an example.
There was a problem hiding this comment.
Hash and generation serve different purposes.
- Generation (in
existing) signalizes that something outside of the operator has modified the WebhookConfiguration and the operator should overwrite these changes. - Hash of
requireddetects changes inrequired- e.g. when a new operator with say newwebhook[].timeoutstarts, it should update existing WebhookConfiguration. It does not hashexistingwith defaulted values and caBundle and whatnot! The hash should be very very stable, unless the operator really needs a differentrequiredthan it provided previously. We do the same for Deployments / DaemonSets and they don't mess up the generation at all.
There was a problem hiding this comment.
There are corner cases. Let's say we just created a new webhookConfiguration and didn't use some of the default values which are populated by the server, instead we set the same on the resource ourselves for "better visibility". Then after some time, we remove those - from the server perspective those resources are absolutely equal, so it should not do anything. But it has to apply a new version just because of the hash.
There was a problem hiding this comment.
IMO that's expected and OK. Hash is not a perfect solution, it may lead to unnecessary updates. But the update will be just one (to get the hash into annotation), not an endless loop of updates.
There was a problem hiding this comment.
This just seems like an unnecessary complication. Generation covers this use-case too, and we won't have to copy fields with changed certificates from https://github.com/openshift/service-ca-operator to calculate the hash correctly on each reconcile. The code is less error-prone IMO when we rely on a single source of truth, and generation provides a stable one.
There was a problem hiding this comment.
No, Generation does not cover this use case at all. And these are not a corner case either.
- Say, you have operator v4.7 that deploys ValidatingWebhookConfiguration with 1 webhook. It gets applied and gets
Generation: 2and the operator getscr.Status.Generations[..].LastGeneration: 2. - In operator v4.8, you add a new webhook, so the array has now 2 items.
cr.Statushas stillGenerations[..].LastGeneration: 2- ValidatingWebhookConfiguration has
Generation: 2, which matches the expected one.
-> The current code won't update the object at all! There will be still only one webhook[]. It will introduce hard to debug issues (trust me, been there...). Ad that's why we added hash of spec to Deployment+DaemonSet and we should add them to *WebhookConfiguration too.
There was a problem hiding this comment.
In this case, this is managed on a higher level. There is surely at least one restart for the operator, so the leader for handling this state will change. The approach is different from the typical one, but it is still the same from a functional perspective. We just store this "hash" approximation in the controller cache, which makes more sense, as the operator's responsibility is to apply his version of the resource at least once, but not more than that. In the case of the stored hash on the resource, it is worse - the new operator may not ever put his version of resource even on startup if someone would sneak in the correct spec hash, but pass a different spec content while the operator was down/upgrading etc..
There was a problem hiding this comment.
as the operator's responsibility is to apply his version of the resource at least once,
Why an operator must apply it at least once? Apply*WebhookConfig can see that everything is fine (using this hash) and nothing is needed.
Anyway, I can apply the has in the operator, but I'd bet some users of Apply* will get it wrong.
| result.Error = fmt.Errorf("missing kubeClient") | ||
| } | ||
| result.Result, result.Changed, result.Error = ApplyValidatingWebhookConfiguration(clients.kubeClient.AdmissionregistrationV1(), recorder, t) | ||
| result.Result, result.Changed, result.Error = ApplyValidatingWebhookConfiguration(clients.kubeClient.AdmissionregistrationV1(), recorder, t, 0) |
There was a problem hiding this comment.
Since they need expectedGeneration now, should we remove them from ApplyDirectly? It will keep overwriting the existing object and that's not the behavior we want to support.
There was a problem hiding this comment.
This behavior is equal to other ApplyDirectly methods
There was a problem hiding this comment.
CSIDriver does not have Generation and can't be (easily) compared, direct apply is the only way. Anyway, I am not going to use ApplyDirectly / StaticController, so feel free to ignore this comment.
| // and ignore clientConfig.caBundle changes on "inject-cabundle" label | ||
| if required.GetAnnotations() != nil && required.GetAnnotations()[genericCABundleInjectorAnnotation] != "" { | ||
| copyMutatingWebhookCABundle(existing, required) | ||
| } |
There was a problem hiding this comment.
can't we check in requiredOriginal whether the CA is set and keep the one on the server if that's not the case? I don't like a dependency on service-ca here. It can be other actors setting the CA.
There was a problem hiding this comment.
I changed it and then checked the operator behavior... It forces the caBundle, so will revert the change: https://github.com/openshift/service-ca-operator/blob/master/pkg/controller/cabundleinjector/mutatingwebhook.go#L48
03f39c4 to
e94d5be
Compare
| // and an update performed if the validatingwebhookconfiguration spec and metadata differ from | ||
| // the previously required spec and metadata based on generation change. | ||
| // | ||
| // expectedGeneration set to 0 disables generation tracking, forcing redeployment logic. |
There was a problem hiding this comment.
what is "forcing redeployment logic" ?
There was a problem hiding this comment.
This signature breaks every other user of the func. Do we want that? ApplyValidatingWebhookConfigurationWithGeneration would avoid that.
There was a problem hiding this comment.
Let's break consumers IMO. According to @jsafrane there is no other consumer so.
There was a problem hiding this comment.
This is used in this PR in ApplyDirectly. Maybe it should not be here in the first place, I'm not sure about a use-case either. But this comment is nothing new, just highlighting the side-effect of the chosen algorithm on Deployment/DaemonSet apply logic too. For some reason, they use forceRollout, but it is not needed -
| toWrite.Webhooks = required.Webhooks | ||
|
|
||
| actual, err := client.MutatingWebhookConfigurations().Update(context.TODO(), required, metav1.UpdateOptions{}) | ||
| klog.V(4).Infof("ValidatingWebhookConfiguration %q changes: %v", required.GetNamespace()+"/"+required.GetName(), JSONPatchNoError(existing, toWrite)) |
There was a problem hiding this comment.
there was a reason for if klog.V(4).Enabled() {
There was a problem hiding this comment.
It should be the same, according to docs:
// Thus, one may write either
// if glog.V(2).Enabled() { klog.Info("log this") }
// or
// klog.V(2).Info("log this")
// The second form is shorter but the first is cheaper if logging is off because it does
// not evaluate its arguments.| } | ||
|
|
||
| // copyValidatingWebhookCABundle populates webhooks[].clientConfig.caBundle fields from existing resource | ||
| func copyValidatingWebhookCABundle(existing, required *admissionregistrationv1.ValidatingWebhookConfiguration) { |
| toWrite := existingCopy // shallow copy so the code reads easier | ||
| // Providing upgrade compatibility with service-ca-bundle operator | ||
| // and ignore clientConfig.caBundle changes on "inject-cabundle" label | ||
| if required.GetAnnotations() != nil && required.GetAnnotations()[genericCABundleInjectorAnnotation] != "" { |
There was a problem hiding this comment.
I had a previous comment about this which seem to be gone: why don't we check for a CA cert in required, and keep the existing one if there isn't one enforced. I don't like this dependency on service-ca here. There might be different actors creating CA bundles.
There was a problem hiding this comment.
Sorry, I hoped it was unresolved. If we do that and the operator just does its job, then we will get to a hot loop situation with such an approach - https://github.com/openshift/library-go/pull/836/files#r533587142 as it enforces its certificate over the one present.
e94d5be to
606abe9
Compare
Matching existing implementation for apps.Deployment, the generation management is extended with support for MutatingWebhookConfigurations and ValidatingWebhookConfiguration - Implementation relies on usage of copyMutatingWebhookCABundle and copyValidatingWebhookCABundle for compatibility with service-ca-operator and "inject-cabundle" annotation - ApplyValidatingWebhooksConfiguration and ApplyMutatingWebhooksConfiguration tracks generation changes between resources, reducing the number of unwanted updates, and returns updated: true only when the Update call increased resource generation - meaning the change was registered.
606abe9 to
352bc2d
Compare
|
/retest |
| existingMutatingWebhooks[webhook.Name] = webhook | ||
| } | ||
|
|
||
| webhooks := make([]admissionregistrationv1.MutatingWebhook, len(to.Webhooks)) |
There was a problem hiding this comment.
why a new map? Just change the existing one.
There was a problem hiding this comment.
for i, wh := range to.Webhooks {
if existing, ok := fromMap[wh.Name]; ok && wh.ClientConfig.CABundle == nil {
to.Webhooks[i].ClientConfig.CABundle = existing.ClientConfig.CABundle
}
}
|
Up to #836 (comment) lgtm. @jsafrane can you do a final review? I will approve when lgtm'ed. |
…nly if is not set
a9ca3b2 to
13d8844
Compare
|
/lgtm |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Danil-Grigorev, jsafrane, sttts The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This PR adds
Applymethod in the library forValidatingWebhookConfigurationsandMutatingWebhookConfigurations, to efficiently update existing resources, and preserve the functionality ofservice-ca-bundleinject-cabundleannotations.The functionality relies solely on generation changes done by the server. To track generation changes and prevent unnecessary polling on resource updates, the helper generation caching is introduced. It should be used similarly to:
This PR is tested with e2e coverage in MAO with openshift/machine-api-operator#642
Fixes #953