Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ConfigurationPolicy event driven by default #274

Merged

Conversation

mprahl
Copy link
Member

@mprahl mprahl commented Jul 8, 2024

The user can still set evaluationInterval to go back to the legacy
polling behavior.

This also removes the code to remove the legacy finalizer on the Deployment and the common_related_objects metric.

Relates:
https://issues.redhat.com/browse/ACM-11666

This also involved updating client-go and related packages.

Signed-off-by: mprahl <[email protected]>
@mprahl mprahl marked this pull request as draft July 8, 2024 17:25
@mprahl mprahl changed the title WIP: Make ConfigurationPolicy event driven by default Make ConfigurationPolicy event driven by default Jul 8, 2024
@mprahl mprahl force-pushed the watch-based branch 5 times, most recently from 043a724 to f50a70c Compare July 9, 2024 18:13
@mprahl mprahl marked this pull request as ready for review July 9, 2024 19:07
@mprahl mprahl requested review from dhaiducek and JustinKuli July 9, 2024 19:08
@mprahl
Copy link
Member Author

mprahl commented Jul 9, 2024

/hold for reviews

@@ -84,29 +85,34 @@ func (t Target) String() string {
}

// EvaluationInterval configures the minimum elapsed time before a configuration policy is
// reevaluated. If the policy spec is changed, or if the list of namespaces selected by the policy
// reevaluated. This defaults to "watch" to leverage Kubernetes API watches instead of polling the Kubernetes API
Copy link
Contributor

@yiraeChristineKim yiraeChristineKim Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question. When it is watch, how does policy detect and change the status when the resource is created by configuration policy is changed, created or deleted (ex: change status from compliant -> noncompliant)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any query using the DynamicWatcher client or from a template function executed in TemplateResolver causes a watch for that ConfigurationPolicy and triggers a reconcile of the ConfigurationPolicy when any of those objects change. Those reconciles are triggered from the additional source added in the SetupWithManager method.

}

func (r *NamespaceSelectorReconciler) update(name string, sel namespaceSelection) {
func splitKey(key string) (string, string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this return (namespace, name).. ? Can we write return value in function description (comment)

Copy link
Member

@JustinKuli JustinKuli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly small-ish things, I think it's good overall (and of course the passing tests are a great sign).

This should be really nice for performance, likely reacting more quickly to changes in the cluster, while also doing fewer repetitive and wasted computations.

Comment on lines +130 to +135
func (e EvaluationInterval) IsWatchForCompliant() bool {
return e.Compliant == "" || e.Compliant == "watch"
}

func (e EvaluationInterval) IsWatchForNonCompliant() bool {
return e.NonCompliant == "" || e.NonCompliant == "watch"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't fully realized that you'd be able to set the watch mode separately for compliant and noncompliant - this could be really handy if the watches are potentially a performance concern.

@@ -282,7 +296,7 @@ type ConfigurationPolicySpec struct {

// ComplianceState reports the observed status from the definitions of the policy.
//
// +kubebuilder:validation:Enum=Compliant;Pending;NonCompliant;Terminating
// +kubebuilder:validation:Enum=Compliant;Pending;NonCompliant;Terminating;UnknownCompliancy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? I'd rather not have another option for it if possible... I didn't see where it was used elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JustinKuli there was a bug in the code during development and the status update was failing because UnknownCompliancy was not a valid value. I just added it here since I thought it was an oversight since it is a value the status update code can set.

Is it your preference that I remove it? It should not be possible to get this value in ConfigurationPolicy but I wondered if having a status of UnknownCompliancy is better than the status update failing if a bug is introduced.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep it for completeness. Do you mean that it can be set by status-sync or something else in the framework?

If we were making it from scratch, my preference would just be to have an empty value imply an unknown compliance, as opposed to a special value. But if other things already use UnknownCompliancy, then it's not worth changing everywhere, and it should at least be mentioned here for user clarity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just remove it.

main.go Outdated Show resolved Hide resolved
main.go Outdated
if err = nsSelReconciler.SetupWithManager(nsSelMgr); err != nil {
log.Error(err, "Unable to create controller", "controller", "NamespaceSelector")
os.Exit(1)
}

discoveryClient := discovery.NewDiscoveryClientForConfigOrDie(targetK8sConfig)

serverVersion, err := discoveryClient.ServerVersion()
var err error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly unnecessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this since it doesn't overwrite the err value from the above scope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it shadows the old err, instead of overwriting it? Maybe this is another case where it just needs a different name.

But this is pretty minor - we can keep it as-is if you really prefer.

main.go Outdated
Comment on lines 676 to 679
&opts.maxEvalFrequencySeconds,
"max-evaluation-frequency",
10,
"The status update frequency (in seconds) of a mutation policy",
"The number of seconds before a policy is eligible for reevaluation (throttles frequently evaluated policies)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The combination of "max" and "frequency" here is confusing - is 1 eval / 10 seconds the fastest or slowest it would go? The description sounds more like a minimum amount of time between reconciles.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After getting Dale's opinion, I'm changing this to evaluation-backoff.

controllers/configurationpolicy_controller.go Outdated Show resolved Hide resolved
r.discoveryInfo.apiGroups = apiGroups
log.Error(
err,
"The policy has an invalid evaluation interval; defaulting to watch",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But hasn't it already removed the watches? In this case does it constantly requeue immediately?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Fixing.

controllers/configurationpolicy_controller.go Outdated Show resolved Hide resolved
)
if err != nil {
log.V(2).Error(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what .V(2).Error will do combined... do you have an example output?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it changed the log level but still display the stack trace but that's not documented behavior from what I can tell, so I'll change it to an info log with the error message as a key/value.

controllers/configurationpolicy_controller.go Outdated Show resolved Hide resolved
@mprahl mprahl force-pushed the watch-based branch 2 times, most recently from 1f922b0 to a4e7979 Compare July 11, 2024 14:13
Copy link
Member

@dhaiducek dhaiducek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beginnings of a review--I have some feedback on the CRD descriptions 🙂

api/v1/configurationpolicy_types.go Outdated Show resolved Hide resolved
api/v1/configurationpolicy_types.go Outdated Show resolved Hide resolved
api/v1/configurationpolicy_types.go Outdated Show resolved Hide resolved
api/v1/configurationpolicy_types.go Outdated Show resolved Hide resolved
mprahl added 2 commits July 12, 2024 10:23
The user can still set `evaluationInterval` to go back to the legacy
polling behavior.

This also removes the code to remove the legacy finalizer on the
deployment since it has been multiple releases and there isn't an ideal
place to run this code.

Relates:
https://issues.redhat.com/browse/ACM-11666

Signed-off-by: mprahl <[email protected]>
The metric was not concurrency safe due to writing slices without locks
and so would provide unreliable results. This was more apparent with
event driven ConfigurationPolicy with --max-evaluation-frequency=1 set.

Since the metric is undocumented and it's not known to be used, let's
remove this technical debt rather than fix it.

Signed-off-by: mprahl <[email protected]>
@mprahl mprahl requested a review from dhaiducek July 12, 2024 14:28
@mprahl
Copy link
Member Author

mprahl commented Jul 12, 2024

Okay, the latest push fixes the CI failure of when an object deletion hangs from prune object behavior.

Copy link
Member

@JustinKuli JustinKuli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

I believe all of my requests were addressed and this is still looking good. Holding for any other reviews.

@openshift-ci openshift-ci bot added the lgtm label Jul 16, 2024
Copy link

openshift-ci bot commented Jul 16, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JustinKuli, mprahl, yiraeChristineKim

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JustinKuli,mprahl,yiraeChristineKim]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mprahl
Copy link
Member Author

mprahl commented Jul 16, 2024

/unhold

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants