Bug 1880591: 4.6 Use ovs-configuration file to determine if OVS is running in systemd #825

Merged
openshift-merge-robot merged 1 commit into openshift:release-4.6 from trozet:use_os_for_ovs_state2 on Oct 8, 2020

@trozet
Contributor

@trozet trozet commented Oct 5, 2020

Checking files or symlinks on disk seems to be flaky, and there are time
windows where files are laid down on the disk but the running version is
still 4.5. This changes the logic to determine if OVS is running on the
host by checking for a file generated when ovs-configuration is
executed.

Signed-off-by: Tim Rozet <trozet@redhat.com>
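
The check described in this commit message can be sketched as a small shell function. This is a minimal sketch under stated assumptions, not the operator's actual script: the function name `ovs_provided_by_node` and its `host_root` parameter are hypothetical, and only the marker path `/host/var/run/ovs-config-executed` is taken from the change in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the marker file generated by
# ovs-configuration exists under the given host root (default: /host).
ovs_provided_by_node() {
  local host_root="${1:-/host}"
  [ -f "${host_root}/var/run/ovs-config-executed" ]
}

if ovs_provided_by_node; then
  echo "OVS is provided by the node"
else
  echo "OVS is not provided by the node"
fi
```

Taking the host root as a parameter is only a convenience for trying the check against a scratch directory; the PR itself tests the fixed `/host` path.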

@openshift-ci-robot
Contributor

@trozet: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

Details

In response to this:

[WIP] DNM 4.6 check Use OS version to determine if OVS is running in systemd

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Oct 5, 2020
@openshift-ci-robot openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Oct 5, 2020
@trozet
Contributor Author

trozet commented Oct 5, 2020

/test e2e-gcp-ovn

@openshift-ci-robot
Contributor

@trozet: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test e2e-aws-multitenant
  • /test e2e-aws-ovn
  • /test e2e-aws-sdn-multi
  • /test e2e-aws-sdn-single
  • /test e2e-azure-ovn
  • /test e2e-gcp
  • /test e2e-gcp-ovn-upgrade
  • /test e2e-metal-ipi
  • /test e2e-operator-with-custom-vxlan-port
  • /test e2e-ovn-hybrid-step-registry
  • /test e2e-ovn-step-registry
  • /test e2e-upgrade
  • /test e2e-vsphere-ovn
  • /test e2e-windows-hybrid-network
  • /test images
  • /test unit
  • /test verify

Use /test all to run the following jobs:

  • pull-ci-openshift-cluster-network-operator-release-4.6-e2e-aws-sdn-multi
  • pull-ci-openshift-cluster-network-operator-release-4.6-e2e-aws-sdn-single
  • pull-ci-openshift-cluster-network-operator-release-4.6-e2e-azure-ovn
  • pull-ci-openshift-cluster-network-operator-release-4.6-e2e-gcp
  • pull-ci-openshift-cluster-network-operator-release-4.6-e2e-metal-ipi
  • pull-ci-openshift-cluster-network-operator-release-4.6-e2e-operator-with-custom-vxlan-port
  • pull-ci-openshift-cluster-network-operator-release-4.6-e2e-ovn-hybrid-step-registry
  • pull-ci-openshift-cluster-network-operator-release-4.6-e2e-ovn-step-registry
  • pull-ci-openshift-cluster-network-operator-release-4.6-e2e-upgrade
  • pull-ci-openshift-cluster-network-operator-release-4.6-e2e-vsphere-ovn
  • pull-ci-openshift-cluster-network-operator-release-4.6-e2e-windows-hybrid-network
  • pull-ci-openshift-cluster-network-operator-release-4.6-images
  • pull-ci-openshift-cluster-network-operator-release-4.6-unit
  • pull-ci-openshift-cluster-network-operator-release-4.6-verify

@trozet
Contributor Author

trozet commented Oct 5, 2020

/test e2e-gcp-ovn-upgrade

@trozet
Contributor Author

trozet commented Oct 5, 2020

cluster-botAPP 11:42 AM
job test upgrade 4.5.0-0.nightly #825 aws,ovn succeeded

cluster-botAPP 11:50 AM
job test upgrade 4.5.0-0.nightly #825 aws succeeded

cluster-botAPP 12:01 PM
job test upgrade 4.5.0-0.nightly #825 gcp,ovn succeeded
12:06
job test upgrade 4.5.0-0.nightly #825 gcp succeeded

@juanluisvaladas
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Oct 5, 2020
@trozet trozet changed the title from "[WIP] DNM 4.6 check Use OS version to determine if OVS is running in systemd" to "Bug 1880591: 4.6 check Use OS version to determine if OVS is running in systemd" on Oct 5, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-urgent label (referenced Bugzilla bug's severity is urgent for the branch this PR is targeting) and removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Oct 5, 2020
@openshift-ci-robot
Contributor

@trozet: This pull request references Bugzilla bug 1880591, which is invalid:

  • expected Bugzilla bug 1880591 to depend on a bug targeting a release in 4.7.0 and in one of the following states: MODIFIED, VERIFIED, but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug label (indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting) on Oct 5, 2020
@trozet
Contributor Author

trozet commented Oct 5, 2020

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug label (indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting) on Oct 5, 2020
@openshift-ci-robot
Contributor

@trozet: This pull request references Bugzilla bug 1880591, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1885344 is in the state VERIFIED, which is one of the valid states (MODIFIED, VERIFIED)
  • dependent Bugzilla bug 1885344 targets the "4.7.0" release, which is one of the valid target releases: 4.7.0
  • bug has dependents

@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug label (indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting) on Oct 5, 2020
@trozet
Contributor Author

trozet commented Oct 5, 2020

/retest

2 similar comments
@trozet
Contributor Author

trozet commented Oct 5, 2020

/retest

@trozet
Contributor Author

trozet commented Oct 5, 2020

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@trozet
Contributor Author

trozet commented Oct 6, 2020

vsphere and hybrid step are failing only on this test:
Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured

which is being fixed by #824

@trozet
Contributor Author

trozet commented Oct 6, 2020

/override ci/prow/e2e-ovn-hybrid-step-registry

@trozet
Contributor Author

trozet commented Oct 6, 2020

/override ci/prow/e2e-vsphere-ovn

@trozet
Contributor Author

trozet commented Oct 7, 2020

/retest

@danwinship
Contributor

/hold cancel
/lgtm

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Oct 7, 2020
@openshift-ci-robot openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Oct 7, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, juanluisvaladas, trozet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@trozet
Contributor Author

trozet commented Oct 7, 2020

/hold

until I can verify job logs and cluster bot upgrades

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Oct 7, 2020
@trozet
Contributor Author

trozet commented Oct 7, 2020

logs look ok to me

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Oct 7, 2020
@trozet
Contributor Author

trozet commented Oct 7, 2020

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@trozet
Contributor Author

trozet commented Oct 7, 2020

@juanluisvaladas @tssurya I'm seeing vsphere failed because:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/825/pull-ci-openshift-cluster-network-operator-release-4.6-e2e-vsphere-ovn/1313891263669342208

fail [github.com/openshift/origin/test/extended/util/prometheus/helpers.go:174]: Expected
    <map[string]error | len:1>: {
        "ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards\",alertstate=\"firing\",severity!=\"info\"} >= 1": {
            s: "promQL query: ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards\",alertstate=\"firing\",severity!=\"info\"} >= 1 had reported incorrect results:\n[{\"metric\":{\"__name__\":\"ALERTS\",\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"ovnkube-master\",\"namespace\":\"openshift-ovn-kubernetes\",\"service\":\"ovn-kubernetes-master\",\"severity\":\"warning\"},\"value\":[1602096410.162,\"1\"]}]"

I don't see ovnkube master container go down. The value here looks suspicious:[1602096410.162,"1"] but I don't know anything about prom alerts. This PR run included the fix for #826

@trozet
Contributor Author

trozet commented Oct 7, 2020

vsphere passed all tests and is running in systemd OVS. The alert was the only thing it failed...overriding

/override ci/prow/e2e-vsphere-ovn

@openshift-ci-robot
Contributor

@trozet: Overrode contexts on behalf of trozet: ci/prow/e2e-vsphere-ovn

@tssurya
Contributor

tssurya commented Oct 7, 2020

> I don't see ovnkube master container go down. The value here looks suspicious:[1602096410.162,"1"] but I don't know anything about prom alerts. This PR run included the fix for #826

This alert's value is indeed strange. Usually it doesn't exceed 100, since we take a percentage of the value. For example, since we have 3 ovnkube-master pods, if one of the metrics ports is not reachable, then 1/3*100 = 33.33 would be the value. I have no idea how the value here is so huge :D
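
The arithmetic in this comment can be reproduced with a short sketch. The helper name `percent_down` is hypothetical, and the actual alert expression is not shown in this thread; the sketch only illustrates the "unreachable targets as a percentage of all targets" calculation described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of targets that are unreachable.
percent_down() {
  local down="$1" total="$2"
  awk -v d="$down" -v t="$total" 'BEGIN { printf "%.2f\n", d / t * 100 }'
}

# One of three ovnkube-master metrics ports unreachable.
percent_down 1 3   # → 33.33
```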

@vrutkovs

vrutkovs commented Oct 7, 2020

The first is a timestamp, the second is the value. I'm seeing similar alerts in OKD nightly runs; they seem to have started around Oct 3 and flake pretty often.
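
To make the timestamp/value reading concrete, the pair from the failure output can be split apart. This is a minimal sketch: the function name `split_sample` is hypothetical, and the date conversion assumes GNU `date` (its `-d @SECONDS` form).

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a sample of the form '[<unix_ts>,"<value>"]'
# into its timestamp and value.
split_sample() {
  local sample="$1"
  sample="${sample#\[}"        # drop the leading '['
  sample="${sample%\]}"        # drop the trailing ']'
  local ts="${sample%%,*}"     # text before the first comma
  local value="${sample#*,}"   # text after the first comma
  value="$(printf '%s' "$value" | tr -d '"')"   # strip the quotes
  echo "timestamp=${ts} value=${value}"
}

split_sample '[1602096410.162,"1"]'   # → timestamp=1602096410.162 value=1

# Assumes GNU date: convert the whole-second part to a UTC time.
date -u -d "@1602096410" '+%Y-%m-%dT%H:%M:%SZ'   # → 2020-10-07T18:46:50Z
```

Read this way, the large number is the sample time (Oct 7, 2020) and the alert's value is simply 1.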

@tssurya
Contributor

tssurya commented Oct 7, 2020

> https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/825/pull-ci-openshift-cluster-network-operator-release-4.6-e2e-vsphere-ovn/1313891263669342208
>
> fail [github.com/openshift/origin/test/extended/util/prometheus/helpers.go:174]: Expected
>     <map[string]error | len:1>: {
>         "ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards\",alertstate=\"firing\",severity!=\"info\"} >= 1": {
>             s: "promQL query: ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards\",alertstate=\"firing\",severity!=\"info\"} >= 1 had reported incorrect results:\n[{\"metric\":{\"__name__\":\"ALERTS\",\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"ovnkube-master\",\"namespace\":\"openshift-ovn-kubernetes\",\"service\":\"ovn-kubernetes-master\",\"severity\":\"warning\"},\"value\":[1602096410.162,\"1\"]}]"
>
> I don't see ovnkube master container go down. The value here looks suspicious:[1602096410.162,"1"] but I don't know anything about prom alerts. This PR run included the fix for #826

For this particular run, there are clearly some issues:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/825/pull-ci-openshift-cluster-network-operator-release-4.6-e2e-vsphere-ovn/1313891263669342208/artifacts/e2e-vsphere-ovn/gather-extra/pods/openshift-ovn-kubernetes_ovnkube-master-62rxw_ovnkube-master.log

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/825/pull-ci-openshift-cluster-network-operator-release-4.6-e2e-vsphere-ovn/1313891263669342208/artifacts/e2e-vsphere-ovn/gather-extra/pods/openshift-ovn-kubernetes_ovnkube-master-62rxw_kube-rbac-proxy.log

Although I don't see this alert in previous runs of this job.

@tssurya
Contributor

tssurya commented Oct 7, 2020

> The first is a timestamp, the second is the value. I'm seeing similar alerts in OKD nightly runs; they seem to have started around Oct 3 and flake pretty often.

This should have got fixed with #826

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@trozet
Contributor Author

trozet commented Oct 8, 2020

/override ci/prow/e2e-vsphere-ovn

@openshift-ci-robot
Contributor

@trozet: Overrode contexts on behalf of trozet: ci/prow/e2e-vsphere-ovn

@openshift-merge-robot openshift-merge-robot merged commit 893d3e9 into openshift:release-4.6 Oct 8, 2020
@openshift-ci-robot
Contributor

@trozet: All pull requests linked via external trackers have merged:

Bugzilla bug 1880591 has been moved to the MODIFIED state.

  # Check to see if ovs is provided by the node:
- if [[ -L '/host/etc/systemd/system/network-online.target.wants/ovs-configuration.service' ]]; then
+ if [ -f /host/var/run/ovs-config-executed ]; then

Where does this file get generated? I don't see it being created by MCO.

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/severity-urgent: Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.
  • staff-eng-approved: Indicates a release branch PR has been approved by a staff engineer (formerly group/pillar lead).

10 participants