feat: add generate operation and wait for VAPB generation #3573
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3573 +/- ##
==========================================
- Coverage 54.49% 47.64% -6.85%
==========================================
Files 134 236 +102
Lines 12329 19786 +7457
==========================================
+ Hits 6719 9428 +2709
- Misses 5116 9476 +4360
- Partials 494 882 +388
maxsmythe left a comment
Biggest concern is the timestamp stuff.
| DefaultGenerateVAP = flag.Bool("default-create-vap-for-templates", false, "Create VAP resource for template containing VAP-style CEL source. Allowed values are false: do not create Validating Admission Policy unless generateVAP: true is set on constraint template explicitly, true: create Validating Admission Policy unless generateVAP: false is set on constraint template explicitly.") | ||
| log = logf.Log.V(logging.DebugLevel).WithName("controller").WithValues(logging.Process, "constraint_controller") | ||
| discoveryErr *apiutil.ErrResourceDiscoveryFailed | ||
| DefaultWaitForGeneration = flag.Int("default-wait-for-generation", 30, "Wait to generate ValidatingAdmissionPolicyBinding after the constraint is created. Defaults to 30 seconds.") |
There was a problem hiding this comment.
Probably should not include the hardcoded time in the error text -- prone to bitrot.
| } | ||
| currentVapBinding = nil | ||
|
|
||
| if currentVapBinding == nil && instance.GetCreationTimestamp().Add(time.Duration(*DefaultWaitForGeneration)).Before(time.Now()) { |
There was a problem hiding this comment.
We should not rely on the creation timestamp... that leaves us vulnerable to clock skew since that is set by an unknown machine.
Also, as discussed in the OSS mtg, the time delay should be based off of the constraint template, specifically wait time seconds after the CRD is created from the template. There is no need to wait due to the constraint.
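A minimal sketch of the approach suggested above, assuming the controller records a "block until" timestamp from its own clock after the CRD is created and later compares against that value instead of the object's creation timestamp. The function names `blockUntil`, `generationAllowed`, and `mustParse` are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// blockUntil returns the RFC 3339 timestamp before which binding generation
// should be held back. observedAt is the controller's own clock reading,
// taken after the CRD create call has returned (hypothetical parameter).
func blockUntil(observedAt time.Time, waitSeconds int) string {
	return observedAt.Add(time.Duration(waitSeconds) * time.Second).Format(time.RFC3339)
}

// generationAllowed reports whether the wait recorded in the annotation value
// has elapsed according to the local clock.
func generationAllowed(annotation string, now time.Time) (bool, error) {
	until, err := time.Parse(time.RFC3339, annotation)
	if err != nil {
		return false, fmt.Errorf("parsing block-until timestamp: %w", err)
	}
	return !now.Before(until), nil
}

// mustParse is a small helper used only by the example values.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	created := mustParse("2024-01-01T12:00:00Z")
	until := blockUntil(created, 30)

	early, _ := generationAllowed(until, created.Add(10*time.Second))
	late, _ := generationAllowed(until, created.Add(45*time.Second))
	fmt.Println(until, early, late)
}
```

Both the write and the later read still depend on wall clocks, so this narrows the skew exposure rather than removing it.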
| if generateVAPB && groupVersion != nil { | ||
| currentVapBinding, err := vapBindingForVersion(*groupVersion) | ||
| if operations.IsAssigned(operations.Generate) { | ||
| err := util.ValidateEnforcementAction(enforcementAction, instance.Object) |
There was a problem hiding this comment.
VAP generation should probably be in its own function -- the reconcile function is unwieldy at this point.
Also, I'm not sure we want to only validate enforcement action when generate is enabled.
There was a problem hiding this comment.
refactored the code
| return reconcile.Result{}, err | ||
|
|
||
| if operations.IsAssigned(operations.Generate) { | ||
| isVapAPIEnabled := false |
There was a problem hiding this comment.
We should probably move generation lifecycle stuff to its own function.
| \n - --exempt-namespace=${GATEKEEPER_NAMESPACE}\ | ||
| \n - --operation=webhook\ | ||
| \n - --operation=mutation-webhook\ | ||
| \n - --operation=generate\ |
There was a problem hiding this comment.
this is added to controller-manager instead of audit. was this intentional?
There was a problem hiding this comment.
This was supposed to be added to audit pod. Fixed it.
a0fd55a to a0d015e
| | enableK8sNativeValidation | Enable the K8s Native Validating driver to allow constraint templates to use rules written in VAP-style CEL (beta feature) | `true` | | ||
| | defaultCreateVAPForTemplates | (alpha) Create VAP resource for template containing VAP-style CEL source. Allowed values are false: do not create Validating Admission Policy unless generateVAP: true is set on constraint template explicitly, true: create Validating Admission Policy unless generateVAP: false is set on constraint template explicitly. | `false` | | ||
| | defaultCreateVAPBindingForConstraints | (alpha) Create VAPBinding resource for constraint of the template containing VAP-style CEL source. Allowed values are false: do not create Validating Admission Policy Binding, true: create Validating Admission Policy Binding. | `false` | | ||
| | defaultWaitForVAPBGeneration | (alpha) Wait to generate ValidatingAdmissionPolicyBinding after the constraint CRD is created. | `30` | |
There was a problem hiding this comment.
| | defaultWaitForVAPBGeneration | (alpha) Wait to generate ValidatingAdmissionPolicyBinding after the constraint CRD is created. | `30` | | |
| | defaultWaitForVAPBGeneration | (alpha) Wait time in seconds before generating a ValidatingAdmissionPolicyBinding after a constraint CRD is created. | `30` | |
| logger = log.Log.V(logging.DebugLevel).WithName("controller").WithValues("kind", "ConstraintTemplate", logging.Process, "constraint_template_controller") | ||
| discoveryErr *apiutil.ErrResourceDiscoveryFailed | ||
| logger = log.Log.V(logging.DebugLevel).WithName("controller").WithValues("kind", "ConstraintTemplate", logging.Process, "constraint_template_controller") | ||
| defaultWaitForVAPBGeneration = flag.Int("default-wait-for-vapb-generation", 30, "(alpha) Wait to generate ValidatingAdmissionPolicyBinding after the constraint CRD is created.") |
|
|
||
| func (r *ReconcileConstraint) generateVAPB(ctx context.Context, enforcementAction util.EnforcementAction, instance *unstructured.Unstructured, status *constraintstatusv1beta1.ConstraintPodStatus) (time.Duration, error) { | ||
| ret := time.Duration(0) | ||
| if !operations.IsAssigned(operations.Generate) { |
There was a problem hiding this comment.
if none of the controllers has the generate operation, how do we report this issue to the user?
There was a problem hiding this comment.
I have added a log statement for now. Another option is to detect this by looking at all CTStatus resources in the CTStatus controller - https://github.com/open-policy-agent/gatekeeper/blob/master/pkg/controller/constrainttemplatestatus/constrainttemplatestatus_controller.go#L185. However, we might need a field under CT Status to post the error.
| if err != nil { | ||
| return err | ||
| } | ||
| if t.Before(currentTime.Add(time.Duration(*defaultWaitForVAPBGeneration*2) * time.Second)) { |
There was a problem hiding this comment.
why is this * 2? can you add a comment for this?
There was a problem hiding this comment.
I modified this to remove *2, added the comment as well.
|
|
||
| // waiting for sometime before generating vapbinding, gives api-server time to cache CRDs | ||
| timestamp := ct.Annotations[BlockVAPBGenerationUntilAnnotation] | ||
| if timestamp != "" { |
There was a problem hiding this comment.
Fixed this behavior.
| func (r *ReconcileConstraintTemplate) updateTemplateWithBlockVAPBGenerationAnnotations(ctx context.Context, ct *v1beta1.ConstraintTemplate) error { | ||
| currentTime := time.Now() | ||
| switch { | ||
| case ct.Annotations == nil || ct.Annotations[constraint.BlockVAPBGenerationUntilAnnotation] == "": |
There was a problem hiding this comment.
Does this need to be a switch statement, or would an if clause work better, since some of the code below seems to be repeated?
There was a problem hiding this comment.
Updated the code to use an if statement.
| // currentTime := time.Now() | ||
| // if currentTime.Before(blockTime) { | ||
| // t.Fatal("VAPBinding should not be created before the timestamp", currentTime, blockTime) | ||
| // } |
There was a problem hiding this comment.
removed the commented block. Updated the tests as well.
| return err | ||
| } | ||
| // if wait time is within the time window to generate vap binding, do not update the annotation | ||
| if t.Before(currentTime.Add(time.Duration(*constraint.DefaultWaitForVAPBGeneration) * time.Second)) { |
There was a problem hiding this comment.
if annotation is already populated, then can we always skip an update? what's the use case for when the annotation needs to be updated with a new time?
There was a problem hiding this comment.
This ensures that VAPB is generated if a faulty actor has set the timestamp in the future manually—beyond the allowed delay configured by the flag.
There was a problem hiding this comment.
thanks for the context and it would be useful to add that as a comment.
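The refresh rule discussed in this thread can be sketched as follows. This is an illustration under assumptions, not the PR's code: `needsRefresh` is a hypothetical name, and it only mirrors the quoted condition that an existing timestamp is left alone when it falls before "now plus the configured wait".

```go
package main

import (
	"fmt"
	"time"
)

// needsRefresh reports whether the block-until annotation should be
// (re)written: either it is unset, or it lies at or beyond now+wait, which
// the configured delay could never have produced on its own.
func needsRefresh(annotation string, now time.Time, waitSeconds int) (bool, error) {
	if annotation == "" {
		return true, nil
	}
	until, err := time.Parse(time.RFC3339, annotation)
	if err != nil {
		return false, err
	}
	window := now.Add(time.Duration(waitSeconds) * time.Second)
	return !until.Before(window), nil
}

// mustParse is a small helper used only by the example values.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	now := mustParse("2024-01-01T12:00:00Z")
	for _, a := range []string{"", "2024-01-01T12:00:10Z", "2024-01-02T12:00:00Z"} {
		refresh, err := needsRefresh(a, now, 30)
		fmt.Printf("annotation=%q refresh=%v err=%v\n", a, refresh, err)
	}
}
```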
| default: | ||
| // reconcile after default wait time for vapb generation if annotation is not set | ||
| if ct.Annotations == nil || ct.Annotations[BlockVAPBGenerationUntilAnnotation] == "" { | ||
| return time.Duration(*DefaultWaitForVAPBGeneration) * time.Second, r.reportErrorOnConstraintStatus(ctx, status, errors.New("annotation to wait for ValidatingAdmissionPolicyBinding generation not found"), "could not find annotation to wait for ValidatingAdmissionPolicyBinding generation") |
There was a problem hiding this comment.
This returned time duration is ignored when an error is returned. We should consistently rely on the annotation for the timestamp instead of returning a duration here.
There was a problem hiding this comment.
The purpose here was to requeue after the user-defined delay in case the wanted annotations are not found (wait for the annotation to be set by the CT controller if it was not present for any reason). I updated the code to reflect the same.
There was a problem hiding this comment.
No need to have a user-defined delay when annotation is missing, that should be handled by controller code (user sets time delay after CRD gets created, not internal controller mechanics)
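The point about the ignored duration can be illustrated with a simplified model of a reconcile loop. This is not controller-runtime's implementation; `result`, `nextAttempt`, and the constants are hypothetical stand-ins that only model the behavior described in this thread, where a non-nil error takes precedence over any requested requeue delay.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// result is a stand-in for a reconcile result carrying a requested delay.
type result struct{ RequeueAfter time.Duration }

var errMissingAnnotation = errors.New("annotation to wait for ValidatingAdmissionPolicyBinding generation not found")

const (
	userWait = 30 * time.Second       // stands in for the flag-configured wait
	backoff  = 5 * time.Millisecond   // stands in for the queue's own retry delay
)

// nextAttempt models how the retry delay is picked: when an error is
// returned, the queue's backoff applies and the requested delay is dropped.
func nextAttempt(res result, err error, queueBackoff time.Duration) time.Duration {
	if err != nil {
		return queueBackoff
	}
	return res.RequeueAfter
}

func main() {
	withErr := nextAttempt(result{RequeueAfter: userWait}, errMissingAnnotation, backoff)
	withoutErr := nextAttempt(result{RequeueAfter: userWait}, nil, backoff)
	fmt.Println(withErr, withoutErr)
}
```

Under this model, returning a zero duration alongside the error states the intent more plainly, since the retry timing is decided by the queue either way.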
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
2947f62 to d7e2964
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
| return reconcile.Result{}, r.reportErrorOnConstraintStatus(ctx, status, err, fmt.Sprintf("could not delete ValidatingAdmissionPolicyBinding: %s", vapBindingName)) | ||
| } | ||
| } | ||
| if requeueAfter != time.Duration(0) { |
There was a problem hiding this comment.
Because we are requeueing before cacheConstraint is called (which adds the constraint to the constraint framework), the constraint will not be enforced at all until the VAP objects are created. This seems unnecessary and will hurt the performance of existing G8r users.
| return reconcile.Result{}, r.reportErrorOnConstraintStatus(ctx, status, err, fmt.Sprintf("could not update ValidatingAdmissionPolicyBinding: %s", vapBindingName)) | ||
| } | ||
| } | ||
| requeueAfter, err := r.generateVAPB(ctx, enforcementAction, instance, status) |
There was a problem hiding this comment.
All of the VAP generation logic is gated on reflect.DeepEqual() of the constraint as-cached by the constraint framework.
I don't think that is the right logic gate for VAP-gen logic -- it should be gated on whether the extant VAP objects have drifted from the to-be-generated VAP objects. Otherwise users could modify the generated VAP objects and G8r would not re-align them until the next time the constraint is touched.
There was a problem hiding this comment.
It looks like we gate actual writes on VAP binding having a diff --- could just always call generateVAP() code. Maybe we could avoid unnecessary logic execution and therefore improve performance by refactoring more, but always calling VAP gen should be sufficient.
Bonus points for calling it after adding the constraint to the constraint framework -- that avoids blocking all enforcement on VAP succeeding as mentioned elsewhere.
There was a problem hiding this comment.
Updated to call generateVAP outside of the reflect.DeepEqual() gate. Also calling generateVAP after the constraint is marked enforced.
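A minimal sketch of gating writes on drift between the existing binding and the to-be-generated one, rather than on whether the constraint itself changed. The `bindingSpec` type and `needsWrite` function are hypothetical simplifications and do not reflect the real ValidatingAdmissionPolicyBinding type.

```go
package main

import (
	"fmt"
	"reflect"
)

// bindingSpec is a simplified stand-in for the fields of a generated binding.
type bindingSpec struct {
	PolicyName        string
	ValidationActions []string
}

// needsWrite reports whether the binding found in the cluster (nil when it
// does not exist) differs from the binding generated from the constraint.
func needsWrite(current *bindingSpec, desired bindingSpec) bool {
	return current == nil || !reflect.DeepEqual(*current, desired)
}

func main() {
	desired := bindingSpec{PolicyName: "gatekeeper-example", ValidationActions: []string{"Deny"}}

	same := desired
	drifted := bindingSpec{PolicyName: "gatekeeper-example", ValidationActions: []string{"Warn"}}

	fmt.Println(needsWrite(nil, desired))      // binding missing
	fmt.Println(needsWrite(&same, desired))    // binding matches
	fmt.Println(needsWrite(&drifted, desired)) // binding edited by hand
}
```

Running this check on every reconcile means a hand-edited binding is realigned on the next pass, without waiting for the constraint to be touched.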
| hasVAP, err := ShouldGenerateVAP(unversionedCT) | ||
| switch { | ||
| case errors.Is(err, celSchema.ErrCodeNotDefined): | ||
| generateVAPB = false |
There was a problem hiding this comment.
Should this error be swallowed? Or should we tell the user that the constraint template does not support VAP?
There was a problem hiding this comment.
This will mean that when we switch the default to always generate VAP and VAPB, we will be throwing an error for existing users using constraint templates with Rego. I can also see the other use case, where we would want to notify users if the intent is to use VAP but there is no CEL in the CT.
There was a problem hiding this comment.
Did we not switch to a model where users must use the VAP enforcement point to use VAP? I remember having that discussion. Part of the reason for requiring explicit user intent was to avoid questions like this.
There was a problem hiding this comment.
In any case, I think an error or "warning"-type error makes sense. Users can use scoped EA to disable it.
Contrast that with users getting no notice that a Rego-sourced CT is incompatible with VAP.
There was a problem hiding this comment.
IMO we should check if engine is K8sNativeValidation, then log and report the error. I recall this error was swallowed in 3.17.1 because 3.17.0 was reporting this error for existing rego CTs.
There was a problem hiding this comment.
I do not recall having a discussion where user must use the VAP enforcement point to use VAP. I thought eventually we wanted to enable VAP enforcement point by default in-line with other enforcement points.
Reporting an error only when the VAP enforcement point is explicitly mentioned in the constraint through scopedEnforcementAction and the template does not satisfy the conditions to create VAP could work. This will not affect current Rego users. And when we switch to VAP by default, the guidance for constraint authors would be:
- If VAP is available to use, GK will use VAP with all constraints. Otherwise GK will use the webhook for enforcement. No errors are reported on the CT about not generating VAP.
- If VAP is not available to use and the user wants to use VAP with scopedEnforcementPoints, GK reports an error on the CT and enforces through the webhook if it is included as an enforcement point.
Does this make sense?^^
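The proposal above reduces to a small decision, sketched here under assumptions. `reportVAPError` and its parameters are hypothetical names; the thread does not settle on this behavior, so treat it as an illustration of the proposal only.

```go
package main

import "fmt"

// reportVAPError reports whether a "cannot generate VAP" condition should be
// surfaced to the user. Under the proposal, an error is reported only when
// the constraint explicitly asks for the VAP enforcement point and the
// template cannot be turned into a VAP (for example, a Rego-only template).
func reportVAPError(vapRequestedExplicitly, templateSupportsVAP bool) bool {
	return vapRequestedExplicitly && !templateSupportsVAP
}

func main() {
	// Rego-only template, VAP enabled only by default: stay silent.
	fmt.Println(reportVAPError(false, false))
	// Rego-only template, VAP explicitly scoped on the constraint: report.
	fmt.Println(reportVAPError(true, false))
	// CEL template, VAP explicitly scoped: nothing to report.
	fmt.Println(reportVAPError(true, true))
}
```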
There was a problem hiding this comment.
As per the community discussion, I will follow up by introducing a field on constraint status for enforcementPoint status under bypod.
| vapBindingName := fmt.Sprintf("gatekeeper-%s", instance.GetName()) | ||
| log.Info("check if vapbinding exists", "vapBindingName", vapBindingName) | ||
| if err := r.reader.Get(ctx, types.NamespacedName{Name: vapBindingName}, currentVapBinding); err != nil { | ||
| if !apierrors.IsNotFound(err) && !errors.As(err, &discoveryErr) && !meta.IsNoMatchError(err) { |
There was a problem hiding this comment.
Discovery/nomatch errors should not be a concern for bindings (which are a built-in type)
There was a problem hiding this comment.
Updated to remove the error matching.
| return err | ||
| } | ||
| // if wait time is within the time window to generate vap binding, do not update the annotation | ||
| // otherwise update the annotation with the current time + wait time. This protects against manual updates on annotations with a timestamp that prevents binding from getting generated. |
There was a problem hiding this comment.
I think, more accurately, this prevents clock skew from preventing generation on task reschedule.
This design really doesn't protect against malicious users (RBAC for templates does that)
| return nil | ||
| } | ||
|
|
||
| err := r.updateTemplateWithBlockVAPBGenerationAnnotations(ctx, ct) |
There was a problem hiding this comment.
Should probably note that we are purposefully making a second call to the API server here in order to make sure the timestamp is post-CRD creation.
Otherwise the question I have is "why make two requests?"
There was a problem hiding this comment.
Added a comment here.
| if ct.Annotations == nil { | ||
| ct.Annotations = make(map[string]string) | ||
| } | ||
| ct.Annotations[constraint.BlockVAPBGenerationUntilAnnotation] = currentTime.Add(time.Duration(*constraint.DefaultWaitForVAPBGeneration) * time.Second).Format(time.RFC3339) |
There was a problem hiding this comment.
We should also have an annotation that is non-clock-dependent, that lets us know when generation is always allowed, otherwise we run the risk of sporadic delays when the pod reschedules.
Likely a significant change, probably not a blocker for beta (just a known risk)
There was a problem hiding this comment.
Once the annotations are set after the CRD is created, pod reschedules won't cause the annotations to be set again. If the pod gets rescheduled after the CRD is created and before the annotations are set, the annotations will get set on restart and won't be updated again.
Since we only update the annotation if it is not set or it is in the future outside the window defined by users, I am failing to see the need for a non-clock-dependent annotation. Can you describe an example where we run the risk of sporadic delays?
There was a problem hiding this comment.
Remember the discussion about clock skew. Pod gets rescheduled to a node that is a day behind -> sporadic 30-second delays generating bindings, even if the CRD has been extant for hours.
There was a problem hiding this comment.
Yip, this slipped my mind. Thanks for the context. Let's follow up with this fix. I will create an issue.
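The skew scenario described above can be walked through with a small model. It is a sketch under assumptions: `effectiveDelay` is a hypothetical function that combines the annotation refresh rule and the wait check as they are described in this conversation, and the timestamps are made-up example values.

```go
package main

import (
	"fmt"
	"time"
)

const defaultWait = 30 * time.Second // stands in for the flag-configured wait

// effectiveDelay models the delay a pod would impose before generating a
// binding, given the stored block-until time and the pod's local clock.
func effectiveDelay(until, localNow time.Time, wait time.Duration) time.Duration {
	switch {
	case !until.Before(localNow.Add(wait)):
		// The stored time looks too far in the future, so it is rewritten
		// to localNow+wait and the full wait applies again.
		return wait
	case until.After(localNow):
		// Still inside the original window: wait out the remainder.
		return until.Sub(localNow)
	default:
		return 0
	}
}

// mustParse is a small helper used only by the example values.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	// Annotation written shortly after the CRD was created.
	until := mustParse("2024-01-02T09:00:30Z")

	healthyNow := mustParse("2024-01-02T12:00:00Z") // hours after CRD creation
	skewedNow := mustParse("2024-01-01T12:00:00Z")  // node clock a day behind

	fmt.Println(effectiveDelay(until, healthyNow, defaultWait))
	fmt.Println(effectiveDelay(until, skewedNow, defaultWait))
}
```

On the healthy clock the wait has long expired, while the skewed clock sees the stored time as far in the future and re-imposes the full wait, which matches the sporadic delay the reviewer describes.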
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
| default: | ||
| // reconcile for vapb generation if annotation is not set | ||
| if ct.Annotations == nil || ct.Annotations[BlockVAPBGenerationUntilAnnotation] == "" { | ||
| return time.Duration(1) * time.Second, r.reportErrorOnConstraintStatus(ctx, status, errors.New("annotation to wait for ValidatingAdmissionPolicyBinding generation not found"), "could not find annotation to wait for ValidatingAdmissionPolicyBinding generation") |
There was a problem hiding this comment.
Per previous comment, this should return a duration of zero... let the controller decide what the retry delay is.
| return err | ||
| } | ||
| if ct.Annotations != nil { | ||
| ct.Annotations[constraint.BlockVAPBGenerationUntilAnnotation] = "" |
There was a problem hiding this comment.
Do we need to reset this annotation when the CRD needs an update? The CRD should already have been created and be known to the apiserver, right? If so, there is no need to wait for the VAPB creation/updates.
There was a problem hiding this comment.
Yeah, you are correct. I initially thought that we need to wait because CRD would be different and api-server needs time to cache it. But it really doesn't matter, because all VAP cares about is kind. I will update the PR.
There was a problem hiding this comment.
Updated the PR.
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
6e863dd to 45db3f4
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
maxsmythe left a comment
LGTM with 2 nits, sorry for delay.
| if generateVAPB && groupVersion != nil { | ||
| currentVapBinding, err := vapBindingForVersion(*groupVersion) | ||
| if err != nil { | ||
| return ret, r.reportErrorOnConstraintStatus(ctx, status, err, "could not get ValidatingAdmissionPolicyBinding API version") |
There was a problem hiding this comment.
nit: ret is effectively time.Duration(0) (it never gets overridden). For readability, it is probably best to rename ret to something like noDelay.
| return nil | ||
| } | ||
|
|
||
| // Update templates to make sure that the VAPB generation block until timestamp set post CRD creation |
There was a problem hiding this comment.
nit: "We add the annotation as a follow-on update to be sure the timestamp is set relative to a time after the CRD is successfully created. Creating the CRD with a delay timestamp already set would not account for request latency"
would be more clear (feel free to update wording). Current comment is more descriptive of the code than explanatory as to why the code is the way it is.
…setting annotation on CT Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Can we get some docs on what this generate operation does?
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):
Fixes #3659
Related #3501
Special notes for your reviewer: