Skip to content

Bug 1937594: Split SDN migration into 2 phase#763

Merged
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
pliurh:split-migration
Apr 19, 2021
Merged

Bug 1937594: Split SDN migration into 2 phase#763
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
pliurh:split-migration

Conversation

@pliurh
Copy link
Contributor

@pliurh pliurh commented Aug 25, 2020

Phase1, Users add the spec.migration feild in network.operator
CR. The only changes the migration field under the status of
Network.config. It will trigger MCO to provision ovs-configuration
service on every node and reboot. After rebooting, openshift-sdn
will use br-ex as L3 gateway.

Phase2, Users update the spec.networkType in network.config.
CNO starts to swap the network provider for the cluster to ovnkube.
Users need to manually reboot all nodes to make all pods attach to
the new cluster network.

@pliurh
Copy link
Contributor Author

pliurh commented Aug 25, 2020

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 25, 2020
@pliurh
Copy link
Contributor Author

pliurh commented Aug 25, 2020

@trozet @abhat @squeed PTAL.

@pliurh
Copy link
Contributor Author

pliurh commented Aug 25, 2020

This PR depends on the openshift/machine-config-operator#2015

@pliurh
Copy link
Contributor Author

pliurh commented Sep 18, 2020

/unhold
Since there will be no intermediary state (aka old ovn local gateway mode) available anymore. We have to take this approach.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 18, 2020
@pliurh pliurh changed the title Split SDN migration into 2 phase Bug 1854306: Split SDN migration into 2 phase Sep 21, 2020
@openshift-ci-robot
Copy link
Contributor

@pliurh: An error was encountered adding this pull request to the external tracker bugs for bug 1854306 on the Bugzilla server at https://bugzilla.redhat.com:

JSONRPC error 32000: There was an error reported for a GitHub REST call. URL: https://api.github.com/repos/openshift/cluster-network-operator/pulls/763 Error: 403 Forbidden at /loader/0x558945dcc880/Bugzilla/Extension/ExternalBugs/Type/GitHub.pm line 111. at /loader/0x558945dcc880/Bugzilla/Extension/ExternalBugs/Type/GitHub.pm line 111. eval {...} called at /loader/0x558945dcc880/Bugzilla/Extension/ExternalBugs/Type/GitHub.pm line 98 Bugzilla::Extension::ExternalBugs::Type::GitHub::_do_rest_call('Bugzilla::Extension::ExternalBugs::Type::GitHub=HASH(0x55894c...', 'https://api.github.com/repos/openshift/cluster-network-operat...', 'GET') called at /loader/0x558945dcc880/Bugzilla/Extension/ExternalBugs/Type/GitHub.pm line 62 Bugzilla::Extension::ExternalBugs::Type::GitHub::get_data('Bugzilla::Extension::ExternalBugs::Type::GitHub=HASH(0x55894c...', 'Bugzilla::Extension::ExternalBugs::Bug=HASH(0x55894d39e468)') called at /loader/0x558945dcc880/Bugzilla/Extension/ExternalBugs/Bug.pm line 302 eval {...} called at /loader/0x558945dcc880/Bugzilla/Extension/ExternalBugs/Bug.pm line 302 Bugzilla::Extension::ExternalBugs::Bug::update_ext_info('Bugzilla::Extension::ExternalBugs::Bug=HASH(0x55894d39e468)', 1) called at /loader/0x558945dcc880/Bugzilla/Extension/ExternalBugs/Bug.pm line 125 Bugzilla::Extension::ExternalBugs::Bug::create('Bugzilla::Extension::ExternalBugs::Bug', 'HASH(0x55894c0b8f48)') called at /var/www/html/bugzilla/extensions/ExternalBugs/Extension.pm line 877 Bugzilla::Extension::ExternalBugs::bug_start_of_update('Bugzilla::Extension::ExternalBugs=HASH(0x55894e279388)', 'HASH(0x55894c4cf010)') called at /var/www/html/bugzilla/Bugzilla/Hook.pm line 21 Bugzilla::Hook::process('bug_start_of_update', 'HASH(0x55894c4cf010)') called at /var/www/html/bugzilla/Bugzilla/Bug.pm line 1170 Bugzilla::Bug::update('Bugzilla::Bug=HASH(0x55894e313268)') called at /loader/0x558945dcc880/Bugzilla/Extension/ExternalBugs/WebService.pm line 88 Bugzilla::Extension::ExternalBugs::WebService::add_external_bug('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x55894c0eb538)') called at (eval 3367) line 1 eval ' $procedure->{code}->($self, @params) ;' called at /usr/share/perl5/vendor_perl/JSON/RPC/Legacy/Server.pm line 220 JSON::RPC::Legacy::Server::_handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x55894e60d6c8)') called at /var/www/html/bugzilla/Bugzilla/WebService/Server/JSONRPC.pm line 295 Bugzilla::WebService::Server::JSONRPC::_handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x55894e60d6c8)') called at /usr/share/perl5/vendor_perl/JSON/RPC/Legacy/Server.pm line 126 JSON::RPC::Legacy::Server::handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...') called at /var/www/html/bugzilla/Bugzilla/WebService/Server/JSONRPC.pm line 70 Bugzilla::WebService::Server::JSONRPC::handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...') called at /var/www/html/bugzilla/jsonrpc.cgi line 31 ModPerl::ROOT::Bugzilla::ModPerl::ResponseHandler::var_www_html_bugzilla_jsonrpc_2ecgi::handler('Apache2::RequestRec=SCALAR(0x55894caa4678)') called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 207 eval {...} called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 207 ModPerl::RegistryCooker::run('Bugzilla::ModPerl::ResponseHandler=HASH(0x55894d7655e8)') called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 173 ModPerl::RegistryCooker::default_handler('Bugzilla::ModPerl::ResponseHandler=HASH(0x55894d7655e8)') called at /usr/lib64/perl5/vendor_perl/ModPerl/Registry.pm line 32 ModPerl::Registry::handler('Bugzilla::ModPerl::ResponseHandler', 'Apache2::RequestRec=SCALAR(0x55894caa4678)') called at /var/www/html/bugzilla/mod_perl.pl line 139 Bugzilla::ModPerl::ResponseHandler::handler('Bugzilla::ModPerl::ResponseHandler', 'Apache2::RequestRec=SCALAR(0x55894caa4678)') called at (eval 3367) line 0 eval {...} called at (eval 3367) line 0
Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

Details

In response to this:

Bug 1854306: Split SDN migration into 2 phase

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pliurh
Copy link
Contributor Author

pliurh commented Sep 22, 2020

@dcbw @trozet @abhat PTAL

@knobunc
Copy link
Contributor

knobunc commented Sep 22, 2020

/approve
/retest

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 22, 2020
@trozet
Copy link
Contributor

trozet commented Sep 22, 2020

/hold
I think you may not want to split into 2 phases after these go in:
#801
openshift/ovn-kubernetes#281

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 22, 2020
@pliurh
Copy link
Contributor Author

pliurh commented Sep 23, 2020

/hold
I think you may not want to split into 2 phases after these go in:
#801
openshift/ovn-kubernetes#281

In 4.6, maybe yes. In long term, if the ovn-kube local gw mode will exist until openshift-sdn fades away in OCP, we can always relay on it as the intermediate stage during migration. But, if that is not the case, we will still need this 2-steps approach when the local gw mode is deprecated by ovn-kube.

@trozet
Copy link
Contributor

trozet commented Sep 23, 2020

OK I'll leave it to your judgement. I just wanted to make you aware of the other changes before this merged so you could make the call.

/hold cancel

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Sep 23, 2020
@openshift-ci-robot
Copy link
Contributor

@pliurh: This pull request references Bugzilla bug 1854306, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
Details

In response to this:

Bug 1854306: Split SDN migration into 2 phase

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pliurh
Copy link
Contributor Author

pliurh commented Sep 23, 2020

/hold

I'll test the local gw approach first with @trozet 's PRs. If it works, we will go that way in 4.6. And this PR can wait until 4.7.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 23, 2020
@pliurh pliurh changed the title Bug 1854306: Split SDN migration into 2 phase Split SDN migration into 2 phase Sep 25, 2020
@openshift-ci-robot openshift-ci-robot removed the bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. label Sep 25, 2020
@openshift-ci-robot
Copy link
Contributor

@pliurh: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

Details

In response to this:

Split SDN migration into 2 phase

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot removed the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Sep 25, 2020
@openshift-ci-robot
Copy link
Contributor

@pliurh: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-gcp-upgrade 0c6cb73 link /test e2e-gcp-upgrade
ci/prow/e2e-vsphere 4fa66fe link /test e2e-vsphere
ci/prow/unit 4fa66fe link /test unit
ci/prow/verify 4fa66fe link /test verify
ci/prow/e2e-gcp 4fa66fe link /test e2e-gcp
ci/prow/e2e-openstack 4fa66fe link /test e2e-openstack
ci/prow/e2e-vsphere-ovn 4fa66fe link /test e2e-vsphere-ovn
ci/prow/e2e-ovn-step-registry 4fa66fe link /test e2e-ovn-step-registry
ci/prow/e2e-azure 4fa66fe link /test e2e-azure
ci/prow/e2e-gcp-ovn-upgrade 4fa66fe link /test e2e-gcp-ovn-upgrade
ci/prow/e2e-metal-ipi-ovn-ipv6 4fa66fe link /test e2e-metal-ipi-ovn-ipv6

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot
Copy link
Contributor

@pliurh: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-ovn-windows 4fa66fe link /test e2e-aws-ovn-windows
ci/prow/e2e-agnostic-upgrade 4fa66fe link /test e2e-agnostic-upgrade
ci/prow/e2e-ovn-ipsec-step-registry 4fa66fe link /test e2e-ovn-ipsec-step-registry

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.


// Set networkType to empty string when preparing migration. So networkType value in spec can be consumed by MCO.
if migrationPrepare {
status.NetworkType = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't do this. This is a public API; arbitrary components may be relying on the value of this field, for arbitrary purposes, and they might break if it is suddenly unset.

We may need to add a field to the config indicating that migration is in progress, so MCO can respond to that. (Maybe MCO could just read the annotation? That seems kind of bad though. If it's an actual API between CNO and MCO, then it should be actual API, not an annotation.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the migration starts, the cluster will be in a maintenance state. The cluster shall be considered as out-of-service until the whole process finishes. We don't guarantee any of the cluster components can behave as normal during this period. So as long as the MCO can work as expected, I suppose it should be fine. WDYT?

@pliurh
Copy link
Contributor Author

pliurh commented Apr 6, 2021

/retest

@pliurh pliurh force-pushed the split-migration branch from 8b97fc1 to c857e8b Compare April 9, 2021 07:22
@openshift-ci-robot
Copy link
Contributor

@pliurh: This pull request references Bugzilla bug 1937594, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (huirwang@redhat.com), skipping review request.

Details

In response to this:

Bug 1937594: Split SDN migration into 2 phase

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pliurh
Copy link
Contributor Author

pliurh commented Apr 9, 2021

Requires MCO change in openshift/machine-config-operator#2518

@pliurh
Copy link
Contributor Author

pliurh commented Apr 9, 2021

@danwinship PTAL

@pliurh pliurh requested a review from danwinship April 14, 2021 02:54
go.mod Outdated
github.com/mitchellh/reflectwalk v1.0.1 // indirect
github.com/onsi/gomega v1.10.2
github.com/openshift/api v0.0.0-20210402143208-92e9dab578e8
github.com/openshift/api v0.0.0-20210408195222-460636dd2fea
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do the api bump as a separate commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if operConfig.Spec.Migration == nil || operConfig.Spec.Migration.NetworkType != operConfig.Spec.DefaultNetwork.Type {
// We may need to fill defaults here -- sort of as a poor-man's
// upconversion scheme -- if we add additional fields to the config.
err = network.IsChangeSafe(prev, &operConfig.Spec)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... I think it would be better to call IsChangeSafe even in migration mode, and do migration-related checks there too. In particular:

  • rules for changing operConfig.Spec.Migration:
    • if prev.Migration is nil, then you can set operConfig.Spec.Migration to whatever you want
    • if prev.Migration is set, then operConfig.Spec.Migration has to match it, or else be set to nil. (ie, you can't change the details of what you're migrating to after you start the migration)
  • rules for changing operConfig.Spec.DefaultNetwork.Type:
    • if prev.Migration is unset, you can't change operConfig.Spec.DefaultNetwork.Type
    • if prev.Migration is set, you can change operConfig.Spec.DefaultNetwork.Type to the value of operConfig.Spec.Migration.NetworkType

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@pliurh pliurh force-pushed the split-migration branch 2 times, most recently from 4ec52cf to 76f18d9 Compare April 15, 2021 12:05
Phase1, Users add the spec.migration feild in network.operator
CR. The only changes the migration feild under the status of
Network.config. It will trigger MCO to provision ovs-configuration
service on every node and reboot. After rebooting, openshift-sdn
will use br-ex as L3 gateway.

Phase2, Users update the spec.networkType in network.config.
CNO starts to swap the network provider for the cluster to ovnkube.
Users need to manually reboot all nodes to make all pods attach to
the new cluster network.
@pliurh
Copy link
Contributor Author

pliurh commented Apr 16, 2021

/retest

@pliurh
Copy link
Contributor Author

pliurh commented Apr 16, 2021

/retest

@danwinship
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 18, 2021
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, knobunc, pliurh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

3 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 19, 2021

@pliurh: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-ovn-ipsec-step-registry a98be89 link /test e2e-ovn-ipsec-step-registry
ci/prow/e2e-gcp-ovn-upgrade a98be89 link /test e2e-gcp-ovn-upgrade
ci/prow/e2e-openstack-ovn a98be89 link /test e2e-openstack-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit d4c55d5 into openshift:master Apr 19, 2021
@openshift-ci-robot
Copy link
Contributor

@pliurh: All pull requests linked via external trackers have merged:

Bugzilla bug 1937594 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1937594: Split SDN migration into 2 phase

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants