Guard controller #1238
Conversation
Force-pushed 437193e to ac61a3d
```go
} else {
	_, result.Changed, result.Error = DeleteStorageVersionMigration(ctx, clients.migrationClient, recorder, t)
}
case *unstructured.Unstructured:
```
Note that in combination with a RESTMapper, we'd be able to delete any manifest we can read, given a one-to-one mapping of resources to kinds.
Doesn't have to be in this PR, but it seems like a valuable future thing.
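As a rough illustration of that idea, here is a minimal sketch of kind-to-resource resolution followed by a delete. The `mapper` and `deleter` types and the `deleteAny` helper are hypothetical stand-ins; a real implementation would use `meta.RESTMapper` and a dynamic client.

```go
package main

import "fmt"

// Hypothetical stand-ins: a real implementation would use meta.RESTMapper
// to resolve kind -> resource and k8s.io/client-go/dynamic to delete.
type mapper map[string]string // kind -> resource, assuming a one-to-one mapping

type deleter struct{ deleted []string }

func (d *deleter) Delete(resource, name string) {
	d.deleted = append(d.deleted, resource+"/"+name)
}

// deleteAny resolves the resource for a manifest's kind and deletes the
// named object, the generic pattern sketched in the comment above.
func deleteAny(m mapper, d *deleter, kind, name string) error {
	resource, ok := m[kind]
	if !ok {
		return fmt.Errorf("no mapping for kind %q", kind)
	}
	d.Delete(resource, name)
	return nil
}

func main() {
	m := mapper{"StorageVersionMigration": "storageversionmigrations"}
	d := &deleter{}
	if err := deleteAny(m, d, "StorageVersionMigration", "example"); err != nil {
		panic(err)
	}
	fmt.Println(d.deleted)
}
```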
move
Force-pushed 50af26a to 1d58cbd
/retest
Force-pushed c7a6c9f to 3dbd10f
soltysh left a comment
Left some questions, in general this looks good.
```
@@ -0,0 +1,313 @@
// Code generated for package bindata by go-bindata DO NOT EDIT. (@generated)
```
Oh my, we got rid of go-bindata from all the operators, but library-go is still left with one 😞. We should fix that...
That will require more changes. Better to do it in another PR.
Definitely a separate PR, just an observation 😉
```go
for _, node := range nodes {
	if _, exists := operands[node.Name]; !exists {
		klog.Errorf("Missing operand on node %v", node.Name)
		errs = append(errs, fmt.Errorf("Missing operand on node %v", node.Name))
```
During reboots or similar disruptions we'll be throwing a lot of errors; I don't think that's the intention. Shouldn't we filter out NotReady nodes earlier, in the node query? IIRC the installer ignores NotReady nodes.
Good point. In case the operand pod does not exist, I will skip the node if it is not ready. That should help reduce the number of errors. Also, once a node condition changes, the sync loop gets triggered again, so there is no need to return an error here.
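A minimal sketch of the proposed filtering, using hypothetical stand-in types for the node and its conditions (the real controller would use a node lister and the `k8s.io/api/core/v1` types):

```go
package main

import "fmt"

// Hypothetical minimal stand-ins for corev1.Node and corev1.NodeCondition,
// used only to illustrate the filtering idea.
type NodeCondition struct {
	Type   string
	Status string
}

type Node struct {
	Name       string
	Conditions []NodeCondition
}

// isNodeReady reports whether the node has a Ready condition with status True.
func isNodeReady(n Node) bool {
	for _, c := range n.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

// filterReadyNodes drops NotReady nodes so a rebooting master does not
// produce a flood of "missing operand" errors.
func filterReadyNodes(nodes []Node) []Node {
	var ready []Node
	for _, n := range nodes {
		if isNodeReady(n) {
			ready = append(ready, n)
		}
	}
	return ready
}

func main() {
	nodes := []Node{
		{Name: "master-0", Conditions: []NodeCondition{{Type: "Ready", Status: "True"}}},
		{Name: "master-1", Conditions: []NodeCondition{{Type: "Ready", Status: "False"}}},
	}
	for _, n := range filterReadyNodes(nodes) {
		fmt.Println(n.Name)
	}
}
```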
```go
pdbGetter:             pdbGetter,
installerPodImageFn:   getInstallerPodImageFromEnv,
createConditionalFunc: createConditionalFunc,
deleteConditionalFunc: deleteConditionalFunc,
```
Looking at openshift/cluster-kube-scheduler-operator#373, I see this is only used for the SNO check. Is there any other particular reason to split into create and delete conditions? If it's only SNO, why not have a single check?
Because create is not always !delete (the negation of delete). Also see openshift/cluster-kube-controller-manager-operator#568 (comment).
Right, but returning a (bool, error) tuple is an option you also mention there.
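A sketch of that (bool, error) alternative. The `ConditionalFunc` type and `isSNO` helper here are illustrative only; library-go's actual `resourceapply.ConditionalFunction` is a plain `func() bool`:

```go
package main

import (
	"errors"
	"fmt"
)

// ConditionalFunc is the tuple-returning shape discussed above.
type ConditionalFunc func() (bool, error)

// isSNO is a hypothetical stand-in for a check backed by an
// InfrastructureLister; statusKnown models whether the infrastructure
// status could be read at all.
func isSNO(statusKnown, singleNode bool) ConditionalFunc {
	return func() (bool, error) {
		if !statusKnown {
			// With an error channel, "create" is not simply !"delete":
			// on error the caller can retry instead of acting either way.
			return false, errors.New("infrastructure status not available")
		}
		return singleNode, nil
	}
}

func main() {
	create := isSNO(true, true)
	ok, err := create()
	fmt.Println(ok, err)
}
```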
```go
klog.V(5).Infof("Rendering guard pod for operand %v on node %v", operands[node.Name].Name, node.Name)

pod := resourceread.ReadPodV1OrDie(bindata.MustAsset(filepath.Join("pkg/operator/staticpod/controller/guard", "manifests/guard-pod.yaml")))
```
This seems like a lot of unnecessary reads. Why not just read it once when we create the controller and then update the internal representation as needed and apply it?
bindata.MustAsset already holds the internal representation as is, so this is just a convenient wrapper around it.
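For comparison, decoding once at controller construction could be sketched like this. `loadManifest` is a hypothetical stand-in for the `bindata.MustAsset` + `ReadPodV1OrDie` pair; note the cached object would still need a deep copy per sync, since rendering mutates it:

```go
package main

import (
	"fmt"
	"sync"
)

// loadManifest is a hypothetical stand-in for the expensive step
// (bindata.MustAsset + resourceread.ReadPodV1OrDie in the PR).
func loadManifest() string {
	return "guard-pod-template"
}

var (
	once     sync.Once
	template string
)

// cachedManifest decodes the manifest only on the first call; a real
// controller would additionally DeepCopy the cached pod before each
// render, since rendering mutates it.
func cachedManifest() string {
	once.Do(func() { template = loadManifest() })
	return template
}

func main() {
	fmt.Println(cachedManifest())
}
```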
soltysh left a comment
Two nits about ConditionalFunc, but those can either be addressed in a follow-up or I can be overruled by the majority 😉
/hold
/lgtm
```go
	return utilerrors.NewAggregate(errs)
}

func WithIsSNOCheck(infraLister configv1listers.InfrastructureLister, neg bool) resourceapply.ConditionalFunction {
```
Do we really need the neg parameter to negate the result? It's not that hard to add ! before the invocation, is it?
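The suggested alternative could be sketched as a small negation wrapper. The `ConditionalFunction` type mirrors the `func() bool` shape in the quoted signature; `Not` is a hypothetical helper, not part of library-go:

```go
package main

import "fmt"

// ConditionalFunction matches the shape used in the quoted signature
// (resourceapply.ConditionalFunction is a func() bool).
type ConditionalFunction func() bool

// Not wraps and negates a condition, so call sites can read
// Not(WithIsSNOCheck(...)) instead of passing a neg bool parameter.
func Not(fn ConditionalFunction) ConditionalFunction {
	return func() bool { return !fn() }
}

func main() {
	isSNO := ConditionalFunction(func() bool { return true })
	fmt.Println(Not(isSNO)()) // false
}
```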
Force-pushed 3dbd10f to 2ffeea3
Force-pushed 2ffeea3 to 3ae44e4
/retest-required

Please review the full test history for this PR and help us cut down flakes.
/retest
Something changed in the meantime.
Force-pushed 3ae44e4 to 2553dbd
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, ingvagabund, soltysh
Deploy guard pods with a PDB which makes sure a node is not drained unless at most one replica of an operand becomes unavailable. Each guard pod checks the healthz status of the operand on the same node.

The guard controller repeatedly checks the available master nodes and for each one renders an unmanaged guard pod. Each guard pod is responsible for checking whether the corresponding operand (the operand pod running on the same node as the guard pod) is ready, by probing the `/healthz` endpoint. The guard uses an HTTPS probe to check the readiness of its operand. At the same time, a PDB with `minAvailable` set to `len(masters)-1` is rendered. Each time the number of master nodes changes, the `minAvailable` field is updated accordingly.

Outstanding facts:

Applied and tested in:
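The `minAvailable` arithmetic described above is simple enough to sketch as a pure function (`minAvailable` is a hypothetical helper name, not code from the PR):

```go
package main

import "fmt"

// minAvailable computes the PDB threshold described above: with one guard
// pod per master, tolerating at most one unavailable operand means
// minAvailable = len(masters) - 1.
func minAvailable(masters int) int {
	if masters < 1 {
		return 0
	}
	return masters - 1
}

func main() {
	// With three masters, two operands must stay available, so only one
	// node can be drained at a time.
	fmt.Println(minAvailable(3))
}
```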