Skip to content

Bug 1870158: Use mergo to merge webhooks instead of custom apply#707

Merged
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
JoelSpeed:mergo-webhooks
Sep 26, 2020
Merged

Bug 1870158: Use mergo to merge webhooks instead of custom apply#707
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
JoelSpeed:mergo-webhooks

Conversation

@JoelSpeed
Copy link
Contributor

This replaces our custom apply logic with a simpler merging algorithm which is implemented using mergo.

The idea behind this is that we should be able to preserve things like the CABundle which is set outside the scope of this reconcilliation, but overwrite the webhooks to a know expected configuration whenever we reconcile and find a difference.

I will update this description once I have tested this on a real cluster, hopefully today, but I think this should be ok as it is.
I have one concern which is that we are using Update to update the resource, this could cause the webhooks to always be increasing their generation as we will always call update even if it's a no-op. An alternative which would reduce API calls is to switch to using a patch, If the testing reveals that this is an issue, I will investigate switching to patching which should allow us to not send an api call whenever there is nothing to be done

This replaces our custom apply logic with a simpler merging algorithm 
which is implemented using mergo.

The idea behind this is that we should be able to preserve things like 
the CABundle which is set outside the scope of this reconcilliation, but 
overwrite the webhooks to a know expected configuration whenever we 
reconcile and find a difference
@JoelSpeed
Copy link
Contributor Author

I've done some manual testing of this and am happy with the way it's working, I can update fields and it puts everything back to the way we are expecting it to. If I modify a field that isn't managed, it doesn't replace it, this is expected and desired. Anything we deem critical to the configuration, should be managed by the operator

Copy link
Contributor

@michaelgugino michaelgugino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we're doing merges. We know what field we want to persist. We should scrape any info we want to persist (ca certs) from the old copy if present and inject into the new copy, and we should just apply the object definition.

Copy link

@Danil-Grigorev Danil-Grigorev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly like this implementation for its simplicity, but I have concerns about its flexibility, and the fact that we go away from both "spec hashing" and generation comparison, when objectively the last one with the generation comparison will allow us to simplify the code even more, and delegate the task of resource comparison to the server with all defaulting, etc.. If we are ok with some things being preserved just because we don't set them... Then it is fine, I guess

}

// The webhook already exists, so merge the existing fields with the desired fields
if err := mergo.Merge(expected, current); err != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it going to preserve custom labels we don't set on from the current resource? Are we ok with that? cc @michaelgugino

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it will, custom labels and annotations or even finalizers should someone wish to add one, will be preserved. I think this is desirable. We should try to be nice and allow users to modify things if they so desire, just as long as it doesn't affect the operation of our webhooks. Which in this case, I don't think it will.

withCABundle := defaultConfiguration.DeepCopy()
for i, webhook := range withCABundle.Webhooks {
webhook.ClientConfig.CABundle = []byte("test")
webhook.TimeoutSeconds = pointer.Int32Ptr(10)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be there. If someone changes default timeout onto 10 seconds, we should bring it back to the one we want to see there (even if it is equal, but we don't set it, we need to apply)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the timeout affect the operation of the webhook? If you think having the timeout as a particular value is critical to the operation of the webhooks then we should be forcing it. I'm currently leaning towards we don't really care about this value (hence using it in the tests), but am open to discussing it if you feel strongly about this.

@JoelSpeed
Copy link
Contributor Author

/retest

@JoelSpeed
Copy link
Contributor Author

I don't understand why we're doing merges. We know what field we want to persist. We should scrape any info we want to persist (ca certs) from the old copy if present and inject into the new copy, and we should just apply the object definition.

We discussed this yesterday in the arch call so would be good to know if you've changed your mind at all, but to summarise my thoughts on this:

  • We currently have a two way merging algorithm for Deployments and DaemonSets, we are trying to keep behaviour consistent with these
  • By merging, we allow users to update certain things that they may want to, and are valid to update, for example labels and annotations (as long as they don't conflict with what we are setting) are valid for a user to update, and as such, we will want to merge these in with our desired values
  • There are options in the MWC/VWC which are defaulted by the API server when created, we don't have opinions on certain values here, (eg the timeout) so it doesn't matter to us (it won't break the webhooks) if a user wants to modify this, so do we want to force that to a certain value or shall we just keep whatever is on the API? If the latter, this can be achieved by merging
  • Any value which is critical to the functioning of the MWC/VWC should be specified in our desired version, and these values are persisted by this algorithm so users won't be able to overwrite them. (Eg we only want MWC to apply to CREATE operations, this algorithm forces that, while (IIRC) the existing algorithm would allow users to append items to this list)
  • By merging, we could, in the future, reduce the API calls we are making by switching to a patch, this would allow us to be "proper" and determine if there are actually any changes to be made rather than just blindly sending updates to the API to determine if there were any changes or not, without merging we couldn't do this.
  • Merging allows us to preserve the values that we want to be set externally (eg CA Bundle) without having to worry about every individual value (eg if we wanted some other component to set another field at a later date, so long as it doesn't conflict with our "defaults", we don't need to change the code here)

I mostly like this implementation for its simplicity, but I have concerns about its flexibility, and the fact that we go away from both "spec hashing" and generation comparison, when objectively the last one with the generation comparison will allow us to simplify the code even more, and delegate the task of resource comparison to the server with all defaulting, etc.. If we are ok with some things being preserved just because we don't set them... Then it is fine, I guess

I think being more prescriptive in this case is actually beneficial, for example, in the current merge algorithm, users can (IIRC) add extra verbs to the rules and the algorithm will keep that, or they can add extra webhooks to the list and these will be preserved, I would argue that this is not desirable behaviour and that we should be dropping these extra items that the users are adding.

We already have generation comparison, though this PR is removing it. I don't think having just generation comparison is a viable solution here. With or without generation comparison, we still need some way to merge the current and desired specifications. All generation comparison gets us (which we could include with this method) is that we short circuit earlier if we notice that the generation hasn't changed since the last time we observed it.

I'm on the fence about how much we should or shouldn't be relying on the server to do the defaulting, bear in mind that if everyone takes the approach of throw it at the API server and have it handle all of the logic, the API server in a cluster will become much busier than it may need to be. Long term if we can handle some of this logic ourself and determine that actually we have nothing to patch, we can do some of the work and take this load off of the API server, spreading the workload out and theoretically making the entire system more stable.

@enxebre
Copy link
Member

enxebre commented Sep 25, 2020

/approve

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 25, 2020
@JoelSpeed JoelSpeed changed the title Use mergo to merge webhooks instead of custom apply Bug 1870158: Use mergo to merge webhooks instead of custom apply Sep 25, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Sep 25, 2020
@openshift-ci-robot
Copy link
Contributor

@JoelSpeed: This pull request references Bugzilla bug 1870158, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
Details

In response to this:

Bug 1870158: Use mergo to merge webhooks instead of custom apply

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Sep 25, 2020
@Danil-Grigorev
Copy link

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 25, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

5 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot
Copy link
Contributor

@JoelSpeed: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-azure 760561c link /test e2e-azure
ci/prow/e2e-gcp 760561c link /test e2e-gcp

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit b7d860f into openshift:master Sep 26, 2020
@openshift-ci-robot
Copy link
Contributor

@JoelSpeed: All pull requests linked via external trackers have merged:

Bugzilla bug 1870158 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1870158: Use mergo to merge webhooks instead of custom apply

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants