Allow to change the cluster default network type #438
openshift-merge-robot merged 1 commit into openshift:master
2929c8b to 7cb3a11
/hold
#!/bin/bash
/usr/share/openvswitch/scripts/ovs-ctl stop
rm -rf /run/openvswitch/*
rm -rf /var/lib/openvswitch/*
We don't want to do this on ordinary termination; it would mean that every upgrade would tear down the network. (Likewise for the ovn-kube change.)
lifecycle:
  preStop:
    exec:
      command: ["rm","-f","/etc/cni/net.d/80-openshift-network.conf"]
There is already a preStop handler at the end of the container spec that does this.
ports:
- name: metrics
  port: 9101
  port: 9103
(this could be done as a separate PR...)
objToDelete.SetNamespace(currentObj.Namespace)
objToDelete.SetGroupVersionKind(gvk)
if objToDelete.GetKind() == "Namespace" {
	continue
}
Why was this added? We want to delete unused namespaces...
(Though... I guess we probably need to fix it to do that last.)
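One way to "do that last" is to order the deletion list so that Namespaces always come after every other kind. A minimal Go sketch, assuming a hypothetical `objRef` type in place of the operator's unstructured objects (this is not the CNO's actual code):

```go
package main

import "sort"

// objRef is a hypothetical, minimal stand-in for an object the operator
// plans to delete (the real operator works with unstructured objects).
type objRef struct {
	Kind      string
	Namespace string
	Name      string
}

// orderForDeletion returns a copy of objs in which every Namespace comes
// after all other kinds, so namespaced objects are removed before the
// Namespace that contains them. The relative order of the remaining
// objects is preserved because the sort is stable.
func orderForDeletion(objs []objRef) []objRef {
	out := make([]objRef, len(objs))
	copy(out, objs)
	sort.SliceStable(out, func(i, j int) bool {
		// Non-Namespaces sort before Namespaces; ties keep input order.
		return out[i].Kind != "Namespace" && out[j].Kind == "Namespace"
	})
	return out
}
```

Copying before sorting keeps the caller's slice untouched, which matters if the same list is also used to update status.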
For some reason, namespace deletion always gets stuck in the "Terminating" state, even after all the related objects in that namespace have been deleted. This blocks re-creating the same Namespace if we want to roll back from ovn-kubernetes to openshift-sdn.
I agree that we need to remove unused namespaces anyway, but I think we can fix that later.
> Due to some reason, the 'namespace' deletion is always stuck at the "Terminating" state forever, even when all the related objects in that namespace have been deleted.
Maybe there is something else in that Namespace that isn't in the related objects then?
Yes, there are some objects that are not managed by the CNO. I am not sure what the proper way is to delete all of them and then delete the namespace itself.
Hm. Do you remember which objects?
I did some further testing. It turns out that the namespace stays stuck in 'Terminating' until the cluster reboots, but is eventually deleted after that. Therefore, I removed the lines above.
oldStatus := co.Status.DeepCopy()
status.deleteRelatedObjectsNotRendered(co)
status.deleteDaemonSetNotRendered(co)
status.deleteDeploymentNotRendered(co)
Why is this needed? The DaemonSets and Deployments should also be in RelatedObjects.
The original intention was to remove the Deployments and DaemonSets before the other objects, but after more testing I found that unnecessary. I will remove it.
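The reviewer's point can be illustrated as a single set difference: anything recorded as a related object that is no longer rendered is a deletion candidate, DaemonSets and Deployments included. A minimal Go sketch, assuming a hypothetical `objKey` type and `notRendered` helper (not the CNO's actual `deleteRelatedObjectsNotRendered`):

```go
package main

// objKey identifies an object by group, kind, namespace and name.
// This is a hypothetical stand-in for the operator's related-object type.
type objKey struct {
	Group, Kind, Namespace, Name string
}

// notRendered returns the previously related objects that are absent from
// the newly rendered set, in their original order. One pass over related
// covers DaemonSets and Deployments too, as long as they were recorded as
// related objects, so no kind-specific cleanup pass is required.
func notRendered(related, rendered []objKey) []objKey {
	want := make(map[objKey]bool, len(rendered))
	for _, k := range rendered {
		want[k] = true
	}
	var stale []objKey
	for _, k := range related {
		if !want[k] {
			stale = append(stale, k)
		}
	}
	return stale
}
```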
Agreed. What's your suggestion for how to do it? How about adding an annotation like
An annotation sounds good. It should be more namespaced than that, though. Maybe something like

(Standard annotation semantics are that if there's nothing to say about the annotation beyond the fact that it exists, then you just give it an empty-string value rather than

You still need to figure out exactly what that entails, though; you don't want CNO to start changing DaemonSets right away, because that would interfere with the controlled node-by-node migration plan... Do you stop the CNO during the migration? Is the rest of the code for dealing with migrations posted somewhere?
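Under the empty-string convention, consumers should test for the key's presence rather than its value. A minimal Go sketch, assuming the key the PR eventually adopted (`networkoperator.openshift.io/network-migration`) and a hypothetical `hasAnnotation` helper:

```go
package main

// migrationAnnotation is the key used later in this PR; the Go constant
// name here is illustrative only.
const migrationAnnotation = "networkoperator.openshift.io/network-migration"

// hasAnnotation reports whether key is present, regardless of its value.
// With presence semantics an empty-string value ("") still counts as set.
// Reading from a nil map is safe in Go and simply reports "not present".
func hasAnnotation(annotations map[string]string, key string) bool {
	_, ok := annotations[key]
	return ok
}
```

Checking `annotations[key] == "true"` instead would wrongly treat the conventional empty value as "unset".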
7cb3a11 to fa7972b
In 4.4, we plan to take the cluster-reboot approach, which means all nodes will be rebooted during the migration. The whole cluster should therefore be considered out of service during the migration, and all pods will be connected to the new network backend after the reboot. I gave a live demo at the Group C demo on Jan 8th; however, the recording has not been added to the agenda doc yet. Here is a recording of an earlier demo, which shows the whole procedure of the 'cluster reboot' migration.
9dc3053 to b455578
/test e2e-gcp
danwinship left a comment
So this is looking pretty simple now. We still need to figure out what's going wrong with deleting the namespace.
const (
	// The Network Migration Annotation which indicates if the operator is allowed to change the cluster network type.
	MigrationAnno = "networkoperator.openshift.io/network-migration"
)
Move this to pkg/names/names.go and spell out "Annotation" in full, please.
b455578 to 0cdfe0a
Thanks. Can you squash the commits?
Allow users to change the cluster default network type by updating the 'networkType' field of the network.config.openshift.io CR. This supports not only migrating the cluster network from OpenShift SDN to OVN-Kubernetes, but also vice versa. By default, the operator does not allow this operation. Users need to add the annotation 'networkoperator.openshift.io/network-migration: ""' to the Network.operator.openshift.io CR 'cluster' before making the change.
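The gating described in this commit message could look roughly like the Go sketch below. The helper name and error text are hypothetical; only the annotation key and the networkType values `OpenShiftSDN` and `OVNKubernetes` come from OpenShift.

```go
package main

import "fmt"

// networkMigrationAnnotation is the key from the commit message above.
const networkMigrationAnnotation = "networkoperator.openshift.io/network-migration"

// validateNetworkTypeChange is a hypothetical helper: it rejects a change of
// the default network type unless the migration annotation is present on the
// operator CR. An unchanged network type is always accepted.
func validateNetworkTypeChange(prevType, nextType string, crAnnotations map[string]string) error {
	if prevType == nextType {
		return nil
	}
	// Presence is what matters; the documented value is the empty string.
	if _, ok := crAnnotations[networkMigrationAnnotation]; !ok {
		return fmt.Errorf("cannot change default network type from %q to %q: annotation %q is not set",
			prevType, nextType, networkMigrationAnnotation)
	}
	return nil
}
```

Because the check only compares the old and new types, the same gate covers both directions (SDN to OVN-Kubernetes and back).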
ae7064a to 288b502
Done.
Before rebooting, the cluster is in an unstable state. The stuck namespace has the finalizer 'kubernetes', which blocks the deletion. I think that's fine, as long as the namespace is deleted successfully once the cluster is back to normal. P.S. There is a way to force-delete the namespace (kubernetes/kubernetes#60807 (comment)); it could be a way out if namespace deletion goes wrong.
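To spot the situation described here, one could check whether a Terminating namespace is still held by a given finalizer. A minimal Go sketch, assuming a hypothetical trimmed-down `namespaceState` type rather than the real `corev1.Namespace`:

```go
package main

// namespaceState is a hypothetical, trimmed-down view of a Namespace:
// its status.phase and its spec.finalizers.
type namespaceState struct {
	Phase      string
	Finalizers []string
}

// blockingFinalizer reports whether a Terminating namespace is still held by
// the given finalizer (for example "kubernetes"). The namespace is not
// removed until its spec.finalizers list is empty.
func blockingFinalizer(ns namespaceState, finalizer string) bool {
	if ns.Phase != "Terminating" {
		return false
	}
	for _, f := range ns.Finalizers {
		if f == finalizer {
			return true
		}
	}
	return false
}
```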
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: danwinship, pliurh
/retest

Please review the full test history for this PR and help us cut down flakes.
Allow users to change the cluster default network type by updating the 'networkType' field of the network.config.openshift.io CR. This supports not only migrating the cluster network from OpenShift SDN to OVN-Kubernetes, but also vice versa.