-
Notifications
You must be signed in to change notification settings - Fork 462
Bug 1854306: Initialize host ovs differently for Openshift-SDN and Ovn-kubernetes by ovs-configuration.service #2066
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1854306: Initialize host ovs differently for Openshift-SDN and Ovn-kubernetes by ovs-configuration.service #2066
Conversation
|
@pliurh: This pull request references Bugzilla bug 1854306, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@trozet @cybertron @abhat PTAL |
|
What would help here I think is to teach NM to have a way to make transient changes to the network setup i.e. don't persist into |
cybertron
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would help here I think is to teach NM to have a way to make transient changes to the network setup i.e. don't persist into
/etc. Then we'd runconfigure-ovs.shat each boot, and wouldn't need a cleanup script.
@bcrochet was working on a way to make transient network config changes that might be relevant.
One concern with the keepalived workaround inline. Ideally we would remove that once my fix is in anyway, but if it unblocks things in the meantime I guess it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will fail on platforms that don't use keepalived.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would help here I think is to teach NM to have a way to make transient changes to the network setup i.e. don't persist into
/etc. Then we'd runconfigure-ovs.shat each boot, and wouldn't need a cleanup script.@bcrochet was working on a way to make transient network config changes that might be relevant.
One concern with the keepalived workaround inline. Ideally we would remove that once my fix is in anyway, but if it unblocks things in the meantime I guess it's fine.
Proposal here: openshift/enhancements#399
PR here: #2017
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will fail on platforms that don't use keepalived.
fixed.
ae3d40b to
8a4d6e0
Compare
This is not only about NM objects. In ovs-configuration.sh, we use NM to create ovs bridge and interfaces, which are stored in ovsdb and also need to be removed properly. |
|
PR: #2017 cannot help this case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be necessary; the setup side needs to deal with the possibility that the first time it runs, NM might be partway through bringing the network up for the first time, and so we don't know which interface will end up being the default. But in the cleanup case, either br-ex and ovs-port-phys0 and the rest already exist as NetworkManager objects, and you know what needs to be done with them; or else they don't exist, and there's no cleanup to be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given how the scripts are closely connected (needing to know the same interface names, etc) it seems like it might be better to put the setup and cleanup both into a single file, and run it as either configure-ovs.sh setup or configure-ovs.sh cleanup depending on which mode you need?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or, if you're going to merge openshift/cluster-network-operator#782 into this, then maybe just configure-ovs.sh {{.NetworkType}}, where configure-ovs.sh OpenShiftSDN would delete br-ex if it existed, and configure-ovs.sh OVNKubernetes would delete br0 if it existed and set up br-ex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to make keepalived be After: network-online.target ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keepalived runs in static pod. So it is After: network-online.target. However, the keepalived-monitor depends on kube-apiserver to generate the correct keepalived.conf. We cannot control the order of the starting of static pods. So when keepalived is up before kube-apiserver without a up-to-date config, it will mistakenly set the VIP to the old interface and break the network connectivity. Then kube-apiserver cannot come up as expected. We end up in a dead lock.
The logic here is just a workaround for this case, so that keepalived can not start before kube-apiserver. Ideally, keepalived side shall be able to handle this situation properly after fixing https://bugzilla.redhat.com/show_bug.cgi?id=1873955. But before that is available, this workaround can work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to the template docs, you should be able to say
enabled: {{if eq .NetworkType "OVNKubernetes" "OpenShiftSDN"}}true{{else}}false{{end}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this part necessary? The other connections that are lower priority should automatically become active with iface after ovs-if-phys0 gets deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
8a4d6e0 to
81ee8af
Compare
|
@pliurh: This pull request references Bugzilla bug 1854306, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/retest |
81ee8af to
bfd0417
Compare
|
/approve |
bfd0417 to
72c9d33
Compare
|
/assign @ashcrow |
|
/hold |
72c9d33 to
734fbe0
Compare
|
/unhold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neither cat nor grep are needed
driver=$(awk -F "=" '/DRIVER/ {print $2}' < "/sys/class/net/${intf}/device/uevent")There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't new code. He's just moving it into the if statement so this PR isn't really fixing that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't ifconfig deprecated?
ip link set dev "$intf" allmulticast on|
could we run scripts through shellcheck? https://www.shellcheck.net For more information: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use guards and always quote args to rm
bad
old_conn_file='-r .*'
rm -f ${old_conn_file}better
old_conn_file='-r .*'
rm -f -- "${old_conn_file}"734fbe0 to
7c71f5f
Compare
|
@rbbratta The diff result of Github is a bit misleading here. You can have more clear diff result with vscode. Actually, this patch didn't modify the lines you comment on. This patch only adds the following logic to this file. I prefer we focus on the new code in this PR and address your comments in a separate one. |
1. Clean NM objects created for ovn-kube shared gateway mode by configure-ovs.sh, when cluster network is openshift-sdn. 2. Render ovs-configuration.service differently for openshift-sdn and ovn-kube. 3. Remove unneeded ovs bridges.
|
/test e2e-metal-ipi |
7c71f5f to
6323fe4
Compare
|
/test okd-e2e-aws |
|
/retest |
|
/test okd-e2e-aws |
trozet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
@kikisdeliveryservice or @ashcrow can you please approve? |
|
/approve |
cgwalters
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really hope after 4.7 starts we can approach this in a much more structured fashion that involves more testing, much more observability/debuggability and much less shell script.
| @@ -1,5 +1,5 @@ | |||
| name: ovs-configuration.service | |||
| enabled: {{if eq .NetworkType "OVNKubernetes"}}true{{else}}false{{end}} | |||
| enabled: {{if eq .NetworkType "OVNKubernetes" "OpenShiftSDN"}}true{{else}}false{{end}} | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this ever false now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CNO also supports kuryr and other 3-party CNIs.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cgwalters, knobunc, pliurh, trozet, yuqi-zhang The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
@pliurh: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
| fi | ||
| # remove bridges created by ovn-kubernetes, try to delete br-ex again in case NM fail to talk to ovsdb | ||
| ovs-vsctl --timeout=30 --if-exists del-br br-int -- --if-exists del-br br-local -- --if-exists del-br br-ex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pliurh one thing I just thought of is that, if you are using openshift-sdn and you reboot a node, this will run again and delete these bridges. Does openshift-sdn also use br-int bridge? Would this cause problems for openshift-sdn to delete the bridge on each reboot?
@danwinship ^ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
openshift-sdn does not use br-int, it only uses br0. And it will end up destroying and recreating br0 if OVS brings it up after a reboot anyway. It doesn't want OVS preserving its state across reboots.
|
@pliurh: Some pull requests linked via external trackers have merged: The following pull requests linked via external trackers have not merged: These pull request must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Bugzilla bug 1854306 has not been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
- What I did
Fixes: BZ 1854306
- How to verify it
Executing the SDN migration process
- Description for the changelog
Initialize host ovs differently for Openshift-SDN and Ovn-kubernetes