Add validation webhook deployment #62
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jsafrane
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest
2b3e04a to 58d787f
58d787f to 530c59e
/retest
/hold
| labels: | ||
| app: csi-snapshot-webhook | ||
| annotations: | ||
| service.beta.openshift.io/inject-cabundle: "true" |
There was a problem hiding this comment.
Don't we need to specify the self-managed-high-availability profile? Same question for the Service manifest
| queue workqueue.RateLimitingInterface | ||
| stopCh <-chan struct{} |
There was a problem hiding this comment.
Not necessary since we're using library-go
| opSpec, opStatus, _, err := c.client.GetOperatorState() | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
IMO it's a good practice to check for NotFound errors here (even though we don't check in the other controller)
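A minimal sketch of what that check could look like, assuming a missing operator resource should be treated as "nothing to sync yet" rather than a failure. Real code would more likely call apierrors.IsNotFound from k8s.io/apimachinery/pkg/api/errors; the sentinel errors, the stateGetter interface, fakeClient, and syncOnce below are all hypothetical stand-ins so the example runs with the standard library only.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for API errors (assumptions, not real Kubernetes error values).
var (
	errNotFound    = errors.New("operator resource not found")
	errUnavailable = errors.New("API server unavailable")
)

// stateGetter is a hypothetical, reduced view of the operator client.
type stateGetter interface {
	GetOperatorState() (managementState string, err error)
}

// fakeClient returns a canned state and error, for demonstration only.
type fakeClient struct {
	state string
	err   error
}

func (f fakeClient) GetOperatorState() (string, error) { return f.state, f.err }

// syncOnce reports whether the sync went on to reconcile the Deployment.
func syncOnce(c stateGetter) (bool, error) {
	_, err := c.GetOperatorState()
	if errors.Is(err, errNotFound) {
		// Nothing to reconcile yet; returning nil keeps this from being reported as an error.
		return false, nil
	}
	if err != nil {
		return false, err
	}
	// The real controller would apply the webhook Deployment here.
	return true, nil
}

func main() {
	wrapped := fmt.Errorf("get operator state: %w", errNotFound)
	for _, c := range []fakeClient{{err: wrapped}, {err: errUnavailable}, {state: "Managed"}} {
		proceeded, err := syncOnce(c)
		fmt.Printf("proceeded=%v err=%v\n", proceeded, err)
	}
}
```

Only the NotFound case is swallowed; every other error is still returned, so the existing degraded-on-error handling would keep working for real failures.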
| return factory.New().WithSync(c.sync).WithSyncDegradedOnError(client).WithInformers( | ||
| client.Informer(), | ||
| deployInformer.Informer(), | ||
| ).ToController(WebhookControllerName, eventRecorder) |
There was a problem hiding this comment.
It could be helpful to add a suffix to this recorder in order to distinguish this controller (eventrecorder.WithComponentSuffix(...)).
| return err | ||
| } | ||
| lastGeneration := resourcemerge.ExpectedDeploymentGeneration(deployment, opStatus.Generations) | ||
| deployment, _, err = resourceapply.ApplyDeployment(c.kubeClient.AppsV1(), c.eventRecorder, deployment, lastGeneration) |
There was a problem hiding this comment.
You should use syncContext.Recorder() to get the right event recorder (and also get rid of the one specified in csiSnapshotWebhookController).
| deploymentAvailable.Status = operatorapi.ConditionTrue | ||
| } else { | ||
| deploymentAvailable.Status = operatorapi.ConditionFalse | ||
| deploymentAvailable.Reason = "WaitDeployment" |
There was a problem hiding this comment.
In library-go the reason was set to Deploying: bertinatto/library-go@fa1bce1
Do we want to keep the Reason consistent with CSI driver operators?
There was a problem hiding this comment.
Used Deploying everywhere
| } | ||
| if deployment.Status.ObservedGeneration != deployment.Generation { | ||
| deploymentProgressing.Status = operatorapi.ConditionTrue | ||
| deploymentProgressing.Reason = "NewGeneration" |
There was a problem hiding this comment.
Same question about the Reason
There was a problem hiding this comment.
Used Deploying everywhere
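For reference, a rough sketch of deriving both conditions with a single Deploying reason. This is not the PR's code: condition and deploymentState are hypothetical reduced types, the condition type names are placeholders, and the rule that one available replica is enough for Available is an assumption, since the diff excerpt does not show that check.

```go
package main

import "fmt"

// condition is a reduced stand-in for an operator status condition.
type condition struct {
	Type, Status, Reason string
}

// deploymentState carries only the Deployment fields used below (hypothetical type).
type deploymentState struct {
	Generation, ObservedGeneration               int64
	Replicas, UpdatedReplicas, AvailableReplicas int32
}

// reasonDeploying is the single Reason used for both conditions.
const reasonDeploying = "Deploying"

// availableCondition assumes one available replica is enough to serve requests.
func availableCondition(d deploymentState) condition {
	if d.AvailableReplicas > 0 {
		return condition{Type: "Available", Status: "True"}
	}
	return condition{Type: "Available", Status: "False", Reason: reasonDeploying}
}

// progressingCondition stays True until the latest generation is observed
// and every replica has been updated.
func progressingCondition(d deploymentState) condition {
	if d.ObservedGeneration != d.Generation || d.UpdatedReplicas != d.Replicas {
		return condition{Type: "Progressing", Status: "True", Reason: reasonDeploying}
	}
	return condition{Type: "Progressing", Status: "False"}
}

func main() {
	rolling := deploymentState{Generation: 2, ObservedGeneration: 1, Replicas: 1, UpdatedReplicas: 1, AvailableReplicas: 1}
	fmt.Printf("%+v\n%+v\n", availableCondition(rolling), progressingCondition(rolling))
}
```

Keeping the reason in one constant makes it harder for the two conditions to drift apart again.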
| if deployment.Status.UpdatedReplicas == *deployment.Spec.Replicas { | ||
| deploymentProgressing.Status = operatorapi.ConditionFalse | ||
| // All replicas were updated, set the version | ||
| c.versionGetter.SetVersion(webhookVersionName, c.operandVersion) |
There was a problem hiding this comment.
In theory Available could still be false here. In that case, setting the operand version here would be incorrect.
There was a problem hiding this comment.
Also, we should report the operator version only after all operands have been rolled out:
However, currently we report the operator version in the other controller, which has no idea if the webhook operand has been rolled out yet.
How can we solve this? Another controller only to set the operator version? 🤔
There was a problem hiding this comment.
It sets its own version, webhookVersionName - versions is an array. So there are two distinct versions, which should converge to the same value. Honestly, I don't know what CVO does with this array :-)
There was a problem hiding this comment.
Used Deploying everywhere
There was a problem hiding this comment.
I mean, I'm concerned about 2 things:
- Setting webhookVersionName should be OK here, but I think we need to make sure we do so only when Progressing=false and Available=true. Currently it's set when Progressing=false, which is too soon.
- Since we're introducing a new operand (the webhook), we need to change this part of the code to only set the operator version when both operands (webhook and the snapshot-controller) have rolled out (currently it only accounts for the snapshot-controller):
There was a problem hiding this comment.
I completely removed the version, I need to think about it more.
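To make the two concerns above concrete, here is a small sketch of the gating being asked for, under stated assumptions: operandStatus, rolledOut, and reportVersions are hypothetical helpers, the value given to webhookVersionName is a guess, and "4.7.0" is only a placeholder version string. It is an illustration of the idea, not what the operator does.

```go
package main

import "fmt"

// Version names: "operator" comes from the discussion; the webhook value is a guess.
const (
	operatorVersionName = "operator"
	webhookVersionName  = "csi-snapshot-webhook" // hypothetical value
)

// operandStatus is a hypothetical summary of one operand's conditions.
type operandStatus struct {
	Available, Progressing bool
}

// rolledOut is true only when the operand is Available and no longer Progressing.
func rolledOut(s operandStatus) bool { return s.Available && !s.Progressing }

// reportVersions returns the versions that are safe to publish right now.
func reportVersions(webhook, controller operandStatus, target string) map[string]string {
	versions := map[string]string{}
	if rolledOut(webhook) {
		versions[webhookVersionName] = target
	}
	// The operator version waits for every operand, not just one of them.
	if rolledOut(webhook) && rolledOut(controller) {
		versions[operatorVersionName] = target
	}
	return versions
}

func main() {
	done := operandStatus{Available: true}
	updatedButUnavailable := operandStatus{} // Progressing=false, Available=false
	fmt.Println(reportVersions(updatedButUnavailable, done, "4.7.0"))
	fmt.Println(reportVersions(done, done, "4.7.0"))
}
```

With this shape, a webhook that has finished updating but is not yet Available publishes nothing, and the operator version appears only once both operands have rolled out.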
| } | ||
| updateGenerationFn := func(newStatus *operatorapi.OperatorStatus) error { | ||
| if deployment != nil { |
There was a problem hiding this comment.
Would it have panicked above if deployment were nil?
There was a problem hiding this comment.
It would :-)
Removed the check... What could possibly go wrong, right? :-D
| ); err != nil { | ||
| return err | ||
| } | ||
| return nil |
assets/webhook_deployment.yaml
Outdated
| tolerations: | ||
| - key: "node-role.kubernetes.io/master" | ||
| operator: Exists | ||
| effect: NoSchedule |
There was a problem hiding this comment.
CriticalAddonsOnly toleration?
There was a problem hiding this comment.
Actually the other operand deployment has a few other tolerations. I suppose it makes sense they are consistent?
There was a problem hiding this comment.
Copied tolerations from the controller
assets/webhook_deployment.yaml
Outdated
| resources: | ||
| requests: | ||
| cpu: 10m | ||
| serviceAccountName: csi-snapshot-controller-operator |
There was a problem hiding this comment.
Is this what we want (csi-snapshot-controller-operator)?
There was a problem hiding this comment.
The webhook does not talk to Kubernetes API at all. Should it have a separate service account?
There was a problem hiding this comment.
Should this webhook have a SA at all?
The multus webhook doesn't need one: https://github.com/openshift/multus-cni/blob/master/deployment/webhook/deployment.yaml
There was a problem hiding this comment.
Removed the service account, it will get the default one from the namespace.
/retest
/hold cancel
f28293e to d5f6409
| func TestSync(t *testing.T) { | ||
| const replica0 = 0 | ||
| const replica1 = 1 | ||
| const replica2 = 2 |
There was a problem hiding this comment.
I think you're not using these consts
/retest
bertinatto left a comment
There was a problem hiding this comment.
/hold
To check the secret issue.
| volumes: | ||
| - name: certs | ||
| secret: | ||
| secretName: csi-snapshot-webhook-secret |
There was a problem hiding this comment.
From service.beta.openshift.io/inject-cabundle annotation,
Don't use the controller image for webhook Deployment.
/hold cancel
/test e2e-aws
The webhook pod fails to start:
Use port 8443 to listen for the webhook.
And the webhook runs and admits snapshots: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-csi-snapshot-controller-operator/62/pull-ci-openshift-cluster-csi-snapshot-controller-operator-master-e2e-aws/1334612973666177024/artifacts/e2e-aws/gather-extra/pods/openshift-cluster-storage-operator_csi-snapshot-webhook-75f64c6865-snt8b_webhook.log (which reminds me to check the log levels, this is too noisy)
Add deployment of upstream validation webhook as a separate Controller that handles webhook Deployment.
- OperatorClient to a standalone package
- /Unknown condition of the other controller. Otherwise the whole operator could be Available: true too early.
- ValidatingWebhookConfiguration is deployed by CVO, which is technically wrong - it will overwrite caBundle injected by another operator. There is https://github.com/openshift/library-go/pull/836/files in progress that should fix this issue.
WIP:
- get downstream image and update related-images. We have the images now.
@openshift/storage