Bug 1937594: Split SDN migration into 2 phases #763
openshift-merge-robot merged 1 commit into openshift:master
Conversation
|
/hold |
|
This PR depends on the openshift/machine-config-operator#2015 |
|
/unhold |
|
@pliurh: An error was encountered adding this pull request to the external tracker bugs for bug 1854306 on the Bugzilla server at https://bugzilla.redhat.com:
Details: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
0c6cb73 to
4fa66fe
Compare
|
/approve |
|
/hold |
In 4.6, maybe yes. In the long term, if the ovn-kube local gw mode exists until openshift-sdn fades away in OCP, we can always rely on it as the intermediate stage during migration. But if that is not the case, we will still need this two-step approach once the local gw mode is deprecated by ovn-kube. |
|
OK I'll leave it to your judgement. I just wanted to make you aware of the other changes before this merged so you could make the call. /hold cancel |
|
@pliurh: This pull request references Bugzilla bug 1854306, which is valid. 3 validation(s) were run on this bug
|
/hold I'll test the local gw approach first with @trozet's PRs. If it works, we will go that way in 4.6, and this PR can wait until 4.7. |
|
@pliurh: No Bugzilla bug is referenced in the title of this pull request. |
|
@pliurh: The following tests failed:
Full PR test history. Your PR dashboard. Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
pkg/network/cluster_config.go
Outdated
|
|
// Set networkType to empty string when preparing migration. So networkType value in spec can be consumed by MCO.
if migrationPrepare {
	status.NetworkType = ""
We can't do this. This is a public API; arbitrary components may be relying on the value of this field, for arbitrary purposes, and they might break if it is suddenly unset.
We may need to add a field to the config indicating that migration is in progress, so MCO can respond to that. (Maybe MCO could just read the annotation? That seems kind of bad though. If it's an actual API between CNO and MCO, then it should be actual API, not an annotation.)
When the migration starts, the cluster will be in a maintenance state. The cluster shall be considered out-of-service until the whole process finishes. We don't guarantee that any of the cluster components behave as normal during this period. So as long as the MCO can work as expected, I suppose it should be fine. WDYT?
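For context, the "field indicating that migration is in progress" suggested above could look roughly like the sketch below: a dedicated `migration` block under the Network.config status, so `networkType` itself stays truthful. Field names here are illustrative, not the merged API.

```yaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  name: cluster
status:
  networkType: OpenShiftSDN   # remains set; other components keep a valid value
  migration:                  # hypothetical explicit signal for MCO to watch
    networkType: OVNKubernetes
```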
|
/retest |
|
@pliurh: This pull request references Bugzilla bug 1937594, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla (huirwang@redhat.com), skipping review request. |
|
Requires MCO change in openshift/machine-config-operator#2518 |
|
@danwinship PTAL |
go.mod
Outdated
  github.com/mitchellh/reflectwalk v1.0.1 // indirect
  github.com/onsi/gomega v1.10.2
- github.com/openshift/api v0.0.0-20210402143208-92e9dab578e8
+ github.com/openshift/api v0.0.0-20210408195222-460636dd2fea
can you do the api bump as a separate commit?
if operConfig.Spec.Migration == nil || operConfig.Spec.Migration.NetworkType != operConfig.Spec.DefaultNetwork.Type {
	// We may need to fill defaults here -- sort of as a poor-man's
	// upconversion scheme -- if we add additional fields to the config.
	err = network.IsChangeSafe(prev, &operConfig.Spec)
Hm... I think it would be better to call IsChangeSafe even in migration mode, and do migration-related checks there too. In particular:

- rules for changing `operConfig.Spec.Migration`:
  - if `prev.Migration` is `nil`, then you can set `operConfig.Spec.Migration` to whatever you want
  - if `prev.Migration` is set, then `operConfig.Spec.Migration` has to match it, or else be set to `nil` (ie, you can't change the details of what you're migrating to after you start the migration)
- rules for changing `operConfig.Spec.DefaultNetwork.Type`:
  - if `prev.Migration` is unset, you can't change `operConfig.Spec.DefaultNetwork.Type`
  - if `prev.Migration` is set, you can change `operConfig.Spec.DefaultNetwork.Type` to the value of `operConfig.Spec.Migration.NetworkType`
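The rules proposed in that review comment can be sketched as a small validation function. This is a hypothetical, simplified stand-in (the real types live in openshift/api, and `isMigrationChangeSafe` is an illustrative name, not the operator's actual helper):

```go
package main

import "fmt"

// Simplified stand-ins for the operator's config types; only the
// fields needed for the migration rules are modeled here.
type NetworkMigration struct {
	NetworkType string
}

type Spec struct {
	DefaultNetworkType string
	Migration          *NetworkMigration
}

// isMigrationChangeSafe encodes the proposed rules: Migration may be set
// freely when it was nil, must match or be cleared once set, and
// DefaultNetworkType may only change to the migration target.
func isMigrationChangeSafe(prev, next *Spec) error {
	if prev.Migration != nil && next.Migration != nil &&
		prev.Migration.NetworkType != next.Migration.NetworkType {
		return fmt.Errorf("cannot change the migration target after migration has started")
	}
	if next.DefaultNetworkType != prev.DefaultNetworkType {
		if prev.Migration == nil {
			return fmt.Errorf("cannot change the default network type outside of a migration")
		}
		if next.DefaultNetworkType != prev.Migration.NetworkType {
			return fmt.Errorf("default network type may only change to the migration target")
		}
	}
	return nil
}

func main() {
	// A started migration to OVNKubernetes, then the type switch itself.
	prev := &Spec{DefaultNetworkType: "OpenShiftSDN",
		Migration: &NetworkMigration{NetworkType: "OVNKubernetes"}}
	next := &Spec{DefaultNetworkType: "OVNKubernetes"}
	fmt.Println(isMigrationChangeSafe(prev, next)) // prints <nil>: allowed
}
```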
4ec52cf to
76f18d9
Compare
Phase 1: users add the spec.migration field in the network.operator CR. This only changes the migration field under the status of Network.config. It will trigger MCO to provision the ovs-configuration service on every node and reboot. After rebooting, openshift-sdn will use br-ex as the L3 gateway. Phase 2: users update spec.networkType in network.config. CNO starts to swap the network provider for the cluster to ovnkube. Users need to manually reboot all nodes to make all pods attach to the new cluster network.
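Concretely, the two phases amount to two small spec changes, one per CR. The fragments below are a hedged sketch of what a user would merge-patch in (e.g. via `oc patch ... --type=merge`); treat the exact shape as illustrative rather than the final documented procedure:

```yaml
# Phase 1: request the migration in the Network.operator CR "cluster".
# CNO mirrors this into the Network.config status; MCO then provisions
# the ovs-configuration service on every node and reboots.
spec:
  migration:
    networkType: OVNKubernetes
---
# Phase 2: switch the default network in the Network.config CR "cluster".
# CNO swaps the network provider; nodes must then be rebooted manually.
spec:
  networkType: OVNKubernetes
```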
|
/retest |
|
/retest |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danwinship, knobunc, pliurh The full list of commands accepted by this bot can be found here. The pull request process is described here. Details: Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
3 similar comments
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
@pliurh: The following tests failed:
Full PR test history. Your PR dashboard. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
@pliurh: All pull requests linked via external trackers have merged:
Bugzilla bug 1937594 has been moved to the MODIFIED state. |
Phase 1: users add the spec.migration field in the network.operator
CR. This only changes the migration field under the status of
Network.config. It will trigger MCO to provision the ovs-configuration
service on every node and reboot. After rebooting, openshift-sdn
will use br-ex as the L3 gateway.
Phase 2: users update the spec.networkType in network.config.
CNO starts to swap the network provider for the cluster to ovnkube.
Users need to manually reboot all nodes to make all pods attach to
the new cluster network.