openshift-sdn, ovn-kubernetes: adopt systemd-managed openvswitch if present #672
Conversation
| echo "openvswitch is running in systemd" | ||
| # Don't need to worry about restoring flows; this can only change if we've rebooted | ||
| exec tail -F /var/log/openvswitch-host/ovs-vswitchd.log /var/log/openvswitch-host/ovsdb-server.log | ||
| fi |
Do you mean /var/log/openvswitch instead of openvswitch-host?
Is the goal to block here, and never return?
No, this is correct. /var/log/openvswitch is not a host directory.
The goal is to block here. We need to keep a process running so kubelet doesn't think the pod has died.
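
As a minimal sketch of that blocking pattern (hypothetical: the OVS_MANAGED_BY_SYSTEMD check stands in for the script's real detection logic; only the tail invocation and log paths come from the diff above):

```sh
#!/bin/bash
# Sketch only: OVS_MANAGED_BY_SYSTEMD is a stand-in for however the
# script decides that openvswitch is managed by systemd on the host.
if [ "${OVS_MANAGED_BY_SYSTEMD}" = "true" ]; then
  echo "openvswitch is running in systemd"
  # exec replaces the shell with tail, which follows the host's OVS logs
  # forever. The container's main process therefore never exits, and
  # kubelet keeps the pod Running instead of restarting it.
  exec tail -F /var/log/openvswitch-host/ovs-vswitchd.log \
              /var/log/openvswitch-host/ovsdb-server.log
fi
```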
> No, this is correct. /var/log/openvswitch is not a host directory.
Then shouldn't e.g.
tail -F --pid=$(cat /var/run/openvswitch/ovs-vswitchd.pid) /var/log/openvswitch/ovs-vswitchd.log &
below also use /var/log/openvswitch-host?
no... because that's for the containerized case.
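
For contrast, a sketch of the containerized pattern referenced here (the tail line is from the quoted comment above; the backgrounding and wait are illustrative):

```sh
# Containerized case: the OVS daemons run inside the pod, so the log
# tail is tied to ovs-vswitchd's lifetime via tail's --pid option.
tail -F --pid=$(cat /var/run/openvswitch/ovs-vswitchd.pid) \
     /var/log/openvswitch/ovs-vswitchd.log &
# When ovs-vswitchd dies, tail exits with it; the script can then fall
# through, the container exits, and kubelet restarts the pod.
wait
```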

e2e-gcp has a single freaking test failure. the network came up just fine.

/retest

@squeed this is the only PR where the gcp-ovn test passed; all other PRs are failing this test because CNO is degrading because br-int is not created on the nodes.

I think we should revert openshift/machine-config-operator#1830 first.

test failures are all flakes or infra issues

Upgrades are failing, nuts: …

Huh: …

Okay, that should fix it.

/retest

So, this PR was failing because system openvswitch was added first, and we needed to handle an impossible upgrade case. Now that the MCO PR has been reverted, we can get this in, then enable system openvswitch.

@juanluisvaladas all valid criticisms... but I didn't add that :-) Those are unchanged, left over from the containerized case.

Tested this by manually running …

trozet left a comment:
ovn-k8s part looks good to me.
/lgtm
The unit file already exists, it's just not enabled by default. So we can't use whether or not the service exists as a predicate. We could use …, but I don't want to get in the business of checking for systemd configuration files on disk. They can live in a lot of places, so that's just setting ourselves up for trouble. In any case, the chances of …
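
One way to express the runtime check being argued for here, as a hedged sketch (the chroot into a /host mount is an assumption for illustration, not necessarily what this PR does):

```sh
# Ask systemd for the unit's *runtime* state rather than searching for
# unit files on disk; is-active exits 0 only if the unit is active.
if chroot /host systemctl is-active --quiet openvswitch.service; then
  echo "host openvswitch is active; adopting systemd-managed OVS"
else
  echo "no active host openvswitch; running containerized OVS"
fi
```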

/retest

All required CI runs are green. Failing jobs are oddities; their network came up.

I see your point now. I didn't realize the OVS RPM was in 4.5 already, so checking unit file existence doesn't really help us. @dcbw fyi

trozet left a comment

/hold

/hold cancel

I fed this and the MCO PR to cluster-bot and things seemed to behave. The logging of e.g. …

Huh. Well, did the PR do the right thing? Do you have a link to the CI run?

> "things seemed to behave"

The cluster did come up. I didn't get a link, just a kubeconfig, so I can run …

@mccv1r0 if this looks good to you, can you /lgtm?

/lgtm

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mccv1r0, squeed, trozet

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files: …

Approvers can indicate their approval by writing /approve in a comment.

/retest Please review the full test history for this PR and help us cut down flakes.
7 similar comments

/retest Please review the full test history for this PR and help us cut down flakes.

/retest Please review the full test history for this PR and help us cut down flakes.

/retest Please review the full test history for this PR and help us cut down flakes.

/retest Please review the full test history for this PR and help us cut down flakes.

/retest Please review the full test history for this PR and help us cut down flakes.

/retest Please review the full test history for this PR and help us cut down flakes.

/retest Please review the full test history for this PR and help us cut down flakes.

@squeed: The following tests failed, say /retest to rerun them all:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

/retest Please review the full test history for this PR and help us cut down flakes.
Needed to remove some of the readiness probes because they don't work, but this doesn't actually lose much coverage.
And liveness probes are not useful in this context.
This contains a hack to pass a strange case that is an artifact of CI upgrade tests. It won't occur in the wild, and could be removed.
Note to reviewers: hide whitespace changes :-)