@juanluisvaladas

In ovn-kubernetes and openshift-sdn we're unable to reliably identify
whether openvswitch.service is enabled through systemctl or
systemd's D-Bus API.

To avoid issues we have decided to create a unit that launches before
the SDN and creates an empty file that we can check easily and reliably.
This is part of a fix for BZ#1874696
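
A minimal sketch of the marker-file idea described above, assuming a hypothetical marker path and hypothetical helper names (`write_marker`, `marker_present`); none of these come from the PR itself. It runs against a temporary directory, so it does not touch the real system.

```shell
#!/bin/sh
# Hypothetical helper: what a boot-time unit could do, i.e. create an empty
# marker file (and its parent directory) without modifying an existing one.
write_marker() {
    mkdir -p "$(dirname "$1")" && touch "$1"
}

# Hypothetical helper: the cheap check a pod could run against the mounted path.
marker_present() {
    [ -f "$1" ]
}

demo_dir="$(mktemp -d)"
marker="$demo_dir/openvswitch/ovs-notification"   # illustrative path only

if marker_present "$marker"; then echo "before: present"; else echo "before: absent"; fi
write_marker "$marker"
if marker_present "$marker"; then echo "after: present"; else echo "after: absent"; fi
rm -rf "$demo_dir"
```

The point of the design is that the consumer needs only a file-existence test, with no dependency on systemctl or D-Bus being reachable from inside the pod.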

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: juanluisvaladas
To complete the pull request process, please assign sinnykumari
You can assign the PR to them by writing /assign @sinnykumari in a comment when ready.

@juanluisvaladas changed the title from "Add ovs-notification.service" to "[WIP] Bug 1874696: Add ovs-notification.service" Sep 18, 2020
@openshift-ci-robot added the do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress) and bugzilla/severity-high (referenced Bugzilla bug's severity is high for the branch this PR is targeting) labels Sep 18, 2020
@openshift-ci-robot
Contributor

@juanluisvaladas: This pull request references Bugzilla bug 1874696, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot added the bugzilla/valid-bug label (indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting) Sep 18, 2020
@dcbw
Contributor

dcbw commented Sep 18, 2020

lgtm but we should get some more review @abhat @mccv1r0 @danwinship @trozet

@juanluisvaladas changed the title from "[WIP] Bug 1874696: Add ovs-notification.service" to "Bug 1874696: Add ovs-notification.service" Sep 18, 2020
@openshift-ci-robot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Sep 18, 2020
@juanluisvaladas
Author

I tested it manually and everything looks fine.

@kikisdeliveryservice
Contributor

> lgtm but we should get some more review @abhat @mccv1r0 @danwinship @trozet

Was going to say the same thing as they have the expertise to review this PR as they've been working on this 👍

@dcbw
Contributor

dcbw commented Sep 18, 2020

@juanluisvaladas when you confirm that a cluster-bot multi-PR run does the right thing, please note that in this PR. Thanks!

contents: |
[Unit]
Description=Creates a file to let the OVS pod know OVS is running on systemd.
# Don't write the notification before OVS starts
Contributor

Why not? The goal of the service is to tell the sdn-ovs pod that system OVS is enabled. If this service is running at all, then system OVS is enabled, so it doesn't matter if this runs before or after OVS starts

Contributor

Is the goal to have a hostPath type File constraint in ovn-kubernetes/006-ovs-node.yaml to keep ovs-node from starting until it sees this file?

Author

@danwinship yes I guess that's reasonable.
@mccv1r0 yes

Requires=openvswitch.service
Wants=NetworkManager-wait-online.service
After=NetworkManager-wait-online.service openvswitch.service
Before=network-online.target kubelet.service crio.service node-valid-hostname.service
Contributor

Why should we care about most of those services? If we run before kubelet then we run before the sdn-ovs pod and we're good, right?

Author

Did that to be consistent with the ovs-configuration.service.yaml, but you're right.

I'm pretty sure we don't need the node-valid-hostname.service and crio. Regarding the network-online.target, I think it's necessary so that the network-online.target isn't complete until this is finished.

StandardError=journal+console

[Install]
WantedBy=network-online.target
Contributor

why?

Author

This is part of the network, so I'd say we can't consider the network completely set up until this unit is complete. It would also make sense in network.target, but I think network-online.target is the most logical place.

Anyway I don't feel strongly enough about this to block the PR while we discuss where it should go, so if you want it somewhere else I'll move it.

Contributor

It isn't right to say this is wanted by network-online.target when, at runtime, it runs before network-online.target.

@juanluisvaladas
Author

juanluisvaladas commented Sep 19, 2020

> @juanluisvaladas when you confirm that a cluster-bot multi-PR run does the right thing, please note that in this PR. Thanks!

Yes, I confirm that

@juanluisvaladas
Author

@danwinship I resolved some of your concerns, but not all of them. If you want it outside network-online.target, please let me know where it should go (network.target?) and I'll change it.

@openshift-ci-robot
Contributor

@juanluisvaladas: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/okd-e2e-aws | c2abaae | link | /test okd-e2e-aws |
| ci/prow/e2e-ovn-step-registry | c2abaae | link | /test e2e-ovn-step-registry |

Contributor

@trozet left a comment

I don't really see why this PR is necessary if we already have the ovs-configuration service. You can just mount that path and check that the file exists. As long as the service is part of a target, it's fine to check the target file path. I think we already do this in ovn: https://github.com/openshift/cluster-network-operator/blob/master/bindata/network/ovn-kubernetes/ovnkube-node.yaml#L229

@danwinship
Contributor

> I don't really see why this PR is necessary if we already have the ovs-configuration service. You can just mount that path and check that the file exists.

Yeah... we don't need to answer the question "Is OVS actually running as a system service on this node?", we just need to answer the question "Is this node running a version of OCP which would configure OVS to run as a system service if OpenShiftSDN was the configured network plugin?" which is a fact that is either true or false at install/upgrade time, not something that needs to be computed at any particular time during boot.

ovs-configuration.service isn't quite the right thing to check for though; it's a slightly different thing. (At the moment we don't even install it on openshift-sdn nodes anyway, and it's possible that nmstate or something would result in that all being done differently in the future.)

We could check for one of the .conf files created as side effects of templates/common/_base/units/ovs-vswitchd.service.yaml or ovsdb-server.service.yaml, but those are only specified as relative paths and I'm not sure it's 100% guaranteed that they end up as specific absolute paths? (In the same way we don't want to try to figure out if OVS is enabled based on file paths.)

If we can't check for those, then we can add a new file to templates/common/_base/files/ to just statically create some file, and then openshift-sdn can then check for that.

@trozet
Contributor

trozet commented Sep 19, 2020

@danwinship I think ovs-configuration exists even in openshift-sdn deployments; it just is not enabled as a service. Peng was changing this though in another PR to also enable it for openshift-sdn I think: #2066

@danwinship
Contributor

Ah... ok. And actually "it's possible that nmstate or something would result in that all being done differently in the future" doesn't matter because this is only needed for the 4.6 cycle anyway right? (The 4.6 sdn-ovs pod needs to be able to DTRT when started on a 4.5 node, but the 4.7 sdn-ovs pod will only ever be started on 4.6 or 4.7 nodes and so can assume system OVS is always enabled.)

@juanluisvaladas
Author

juanluisvaladas commented Sep 21, 2020

> Peng was changing this though in another PR to also enable it for openshift-sdn I think: #2066

@pliurh Can this be merged before code freeze? If it will be merged, the best thing we can do is close this PR and add a line with /usr/bin/touch /var/run/openvswitch/ovs-notification to configure-ovs.sh.

@pliurh
Contributor

pliurh commented Sep 21, 2020

> @pliurh Can this be merged before code freeze? If it will be merged, the best thing we can do is close this PR and add a line with /usr/bin/touch /var/run/openvswitch/ovs-notification to configure-ovs.sh.

I think we have to get that merged before code freeze. The keepalived issue has been fixed. I'm testing #2066 with the latest build. Could you add a comment in #2066 where you want to add this line?

@danwinship
Contributor

> If it will be merged, the best thing we can do is close this PR and add a line with /usr/bin/touch /var/run/openvswitch/ovs-notification to configure-ovs.sh.

No, you don't need to test if the service has run, you only need to test if it's there. test -f /usr/local/bin/configure-ovs.sh. If it's there, then this is a 4.6 node and OVS is run on the system. If it's not there, then this is a 4.5 node and OVS should be run in the pod.
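
The check described above can be sketched as follows. The script path is the one named in the comment; the function name `ovs_run_mode`, the optional root-prefix argument (useful if the host filesystem is mounted somewhere other than `/`), and the `system`/`pod` labels are illustrative assumptions, not code from openshift-sdn.

```shell
#!/bin/sh
# Decide where OVS should run based only on whether the script is installed,
# not on whether any service has already run.
# $1 is an optional, hypothetical prefix for where the host root is mounted.
ovs_run_mode() {
    root="${1:-}"
    if test -f "$root/usr/local/bin/configure-ovs.sh"; then
        echo "system"   # script present: 4.6 node, OVS runs on the host
    else
        echo "pod"      # script absent: 4.5 node, OVS runs in the pod
    fi
}

ovs_run_mode
```

Because the file is laid down at install or upgrade time, the answer does not depend on boot ordering, which is why no dedicated notification unit is needed for this question.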

@dcbw
Contributor

dcbw commented Sep 21, 2020

> No, you don't need to test if the service has run, you only need to test if it's there. test -f /usr/local/bin/configure-ovs.sh. If it's there, then this is a 4.6 node and OVS is run on the system. If it's not there, then this is a 4.5 node and OVS should be run in the pod.

Hmm, good point; if we do this, we're not yet sure if OVS is actually running, so we'll need to remove all the log tailing stuff from the OVS container or maybe have it use -F to continuously retry?
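
A rough illustration of the `-F` suggestion, assuming the container follows the logs with `tail` (the thread does not show the actual command). `tail -F` follows the file by name and keeps retrying until it appears, whereas plain `tail -f` gives up when the file is missing at startup. The log path and sleep durations here are made up for the demo.

```shell
#!/bin/sh
demo_dir="$(mktemp -d)"
log="$demo_dir/ovs-vswitchd.log"      # hypothetical log path for the demo

# Start following before the log exists; stderr is silenced only to hide
# tail's notices while it waits for the file to appear.
tail -F "$log" > "$demo_dir/out" 2>/dev/null &
tail_pid=$!

sleep 1
echo "ovs started" >> "$log"          # the file appears after tail is running
sleep 3                               # give tail time to notice the new file

kill "$tail_pid"
wait "$tail_pid" 2>/dev/null || true
followed="$(cat "$demo_dir/out")"
echo "$followed"
rm -rf "$demo_dir"
```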

@juanluisvaladas
Author

/close
Not necessary, we'll use #2066 instead.

@openshift-ci-robot
Contributor

@juanluisvaladas: Closed this PR.

@openshift-ci-robot
Contributor

@juanluisvaladas: This pull request references Bugzilla bug 1874696. The bug has been updated to no longer refer to the pull request using the external bug tracker.
