-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make ConfigurationPolicy event driven by default #274
Make ConfigurationPolicy event driven by default #274
Conversation
This also involved updating client-go and related packages. Signed-off-by: mprahl <[email protected]>
043a724
to
f50a70c
Compare
/hold for reviews |
api/v1/configurationpolicy_types.go
Outdated
@@ -84,29 +85,34 @@ func (t Target) String() string { | |||
} | |||
|
|||
// EvaluationInterval configures the minimum elapsed time before a configuration policy is | |||
// reevaluated. If the policy spec is changed, or if the list of namespaces selected by the policy | |||
// reevaluated. This defaults to "watch" to leverage Kubernetes API watches instead of polling the Kubernetes API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question. When it is watch, how does policy detect and change the status when the resource is created by configuration policy is changed, created or deleted (ex: change status from compliant -> noncompliant)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any query using the DynamicWatcher
client or from a template function executed in TemplateResolver
causes a watch for that ConfigurationPolicy
and triggers a reconcile of the ConfigurationPolicy
when any of those objects change. Those reconciles are triggered from the additional source added in the SetupWithManager
method.
} | ||
|
||
func (r *NamespaceSelectorReconciler) update(name string, sel namespaceSelection) { | ||
func splitKey(key string) (string, string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this return (namespace, name).. ? Can we write return value in function description (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly small-ish things, I think it's good overall (and of course the passing tests are a great sign).
This should be really nice for performance, likely reacting more quickly to changes in the cluster, while also doing fewer repetitive and wasted computations.
func (e EvaluationInterval) IsWatchForCompliant() bool { | ||
return e.Compliant == "" || e.Compliant == "watch" | ||
} | ||
|
||
func (e EvaluationInterval) IsWatchForNonCompliant() bool { | ||
return e.NonCompliant == "" || e.NonCompliant == "watch" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't fully realized that you'd be able to set the watch mode separately for compliant and noncompliant - this could be really handy if the watches are potentially a performance concern.
api/v1/configurationpolicy_types.go
Outdated
@@ -282,7 +296,7 @@ type ConfigurationPolicySpec struct { | |||
|
|||
// ComplianceState reports the observed status from the definitions of the policy. | |||
// | |||
// +kubebuilder:validation:Enum=Compliant;Pending;NonCompliant;Terminating | |||
// +kubebuilder:validation:Enum=Compliant;Pending;NonCompliant;Terminating;UnknownCompliancy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? I'd rather not have another option for it if possible... I didn't see where it was used elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JustinKuli there was a bug in the code during development and the status update was failing because UnknownCompliancy
was not a valid value. I just added it here since I thought it was an oversight since it is a value the status update code can set.
Is it your preference that I remove it? It should not be possible to get this value in ConfigurationPolicy but I wondered if having a status of UnknownCompliancy is better than the status update failing if a bug is introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep it for completeness. Do you mean that it can be set by status-sync or something else in the framework?
If we were making it from scratch, my preference would just be to have an empty value imply an unknown compliance, as opposed to a special value. But if other things already use UnknownCompliancy, then it's not worth changing everywhere, and it should at least be mentioned here for user clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just remove it.
main.go
Outdated
if err = nsSelReconciler.SetupWithManager(nsSelMgr); err != nil { | ||
log.Error(err, "Unable to create controller", "controller", "NamespaceSelector") | ||
os.Exit(1) | ||
} | ||
|
||
discoveryClient := discovery.NewDiscoveryClientForConfigOrDie(targetK8sConfig) | ||
|
||
serverVersion, err := discoveryClient.ServerVersion() | ||
var err error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this since it doesn't overwrite the err
value from the above scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it shadows the old err
, instead of overwriting it? Maybe this is another case where it just needs a different name.
But this is pretty minor - we can keep it as-is if you really prefer.
main.go
Outdated
&opts.maxEvalFrequencySeconds, | ||
"max-evaluation-frequency", | ||
10, | ||
"The status update frequency (in seconds) of a mutation policy", | ||
"The number of seconds before a policy is eligible for reevaluation (throttles frequently evaluated policies)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The combination of "max" and "frequency" here is confusing - is 1 eval / 10 seconds the fastest or slowest it would go? The description sounds more like a minimum amount of time between reconciles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After getting Dale's opinion, I'm changing this to evaluation-backoff
.
r.discoveryInfo.apiGroups = apiGroups | ||
log.Error( | ||
err, | ||
"The policy has an invalid evaluation interval; defaulting to watch", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But hasn't it already removed the watches? In this case does it constantly requeue immediately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Fixing.
) | ||
if err != nil { | ||
log.V(2).Error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what .V(2).Error
will do combined... do you have an example output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it changed the log level but still display the stack trace but that's not documented behavior from what I can tell, so I'll change it to an info log with the error message as a key/value.
1f922b0
to
a4e7979
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beginnings of a review--I have some feedback on the CRD descriptions 🙂
The user can still set `evaluationInterval` to go back to the legacy polling behavior. This also removes the code to remove the legacy finalizer on the deployment since it has been multiple releases and there isn't an ideal place to run this code. Relates: https://issues.redhat.com/browse/ACM-11666 Signed-off-by: mprahl <[email protected]>
The metric was not concurrency safe due to writing slices without locks and so would provide unreliable results. This was more apparent with event driven ConfigurationPolicy with --max-evaluation-frequency=1 set. Since the metric is undocumented and it's not known to be used, let's remove this technical debt rather than fix it. Signed-off-by: mprahl <[email protected]>
Okay, the latest push fixes the CI failure of when an object deletion hangs from prune object behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/hold
I believe all of my requests were addressed and this is still looking good. Holding for any other reviews.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JustinKuli, mprahl, yiraeChristineKim The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/unhold |
c2a8cac
into
open-cluster-management-io:main
The user can still set
evaluationInterval
to go back to the legacypolling behavior.
This also removes the code to remove the legacy finalizer on the Deployment and the common_related_objects metric.
Relates:
https://issues.redhat.com/browse/ACM-11666