Skip to content

Conversation

@suleymanakbas91
Copy link
Contributor

@suleymanakbas91 suleymanakbas91 commented Sep 6, 2022

This PR adds two new alerting rules IngressWithoutClassName and UnmanagedRoutes that use the custom metrics created in
openshift/route-controller-manager#8.

  • IngressWithoutClassName fires when there is an Ingress with an unset IngressClassName for longer than one day.
  • UnmanagedRoutes fires when there is a Route owned by an Ingress no longer managed.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 6, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 6, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Sep 6, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 6, 2022

@suleymanakbas91: This pull request references Bugzilla bug 1962502, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.12.0) matches configured target release for branch (4.12.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @quarterpin

Details

In response to this:

Bug 1962502: Add alerts for unmanaged Routes and Ingresses without IngressClassName

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from quarterpin September 6, 2022 09:04
@suleymanakbas91 suleymanakbas91 marked this pull request as ready for review September 6, 2022 12:16
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 6, 2022

@suleymanakbas91: This pull request references Bugzilla bug 1962502, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.12.0) matches configured target release for branch (4.12.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @quarterpin

Details

In response to this:

Bug 1962502: Add alerts for unmanaged Routes and Ingresses without IngressClassName

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 6, 2022
@suleymanakbas91
Copy link
Contributor Author

/retest

@candita
Copy link
Contributor

candita commented Sep 7, 2022

/assign
/assign @rfredette

@suleymanakbas91
Copy link
Contributor Author

/retest

@suleymanakbas91
Copy link
Contributor Author

/retest

1 similar comment
@brandisher
Copy link
Contributor

/retest

@quarterpin
Copy link

/label qe-approved

Verified via pre-merge workflow. For more details : https://bugzilla.redhat.com/show_bug.cgi?id=1962502#c7

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Sep 20, 2022
@brandisher
Copy link
Contributor

/retest

3 similar comments
@brandisher
Copy link
Contributor

/retest

@brandisher
Copy link
Contributor

/retest

@suleymanakbas91
Copy link
Contributor Author

/retest

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 5, 2022

@suleymanakbas91: This pull request references Bugzilla bug 1962502, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.12.0) matches configured target release for branch (4.12.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @zhaozhanqi

Details

In response to this:

Bug 1962502: Add alerts for unmanaged Routes and Ingresses without IngressClassName

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from zhaozhanqi October 5, 2022 14:47
@candita
Copy link
Contributor

candita commented Oct 6, 2022

/retest-required

@candita
Copy link
Contributor

candita commented Oct 24, 2022

Strange test result. It doesn't like the Alerts, but I don't think this has anything to do with the new Alerts.

Oct 24 17:16:53.148: INFO: Alerts were detected during upgrade which are allowed:

alert etcdMemberCommunicationSlow pending for 120.93899989128113 seconds with labels: {To="762c0325ba8c065c", endpoint="etcd-metrics", instance="10.0.130.172:9979", job="etcd", namespace="openshift-etcd", pod="etcd-ip-10-0-130-172.us-west-1.compute.internal", service="etcd", severity="warning"} (allowed: Excluded because it triggers during upgrade (detects ~5m of high latency immediately preceeding the end of the test), and we don't want to change the alert because it is correct)
alert etcdMemberCommunicationSlow pending for 120.93899989128113 seconds with labels: {To="762c0325ba8c065c", endpoint="etcd-metrics", instance="10.0.190.160:9979", job="etcd", namespace="openshift-etcd", pod="etcd-ip-10-0-190-160.us-west-1.compute.internal", service="etcd", severity="warning"} (allowed: Excluded because it triggers during upgrade (detects ~5m of high latency immediately preceeding the end of the test), and we don't want to change the alert because it is correct)
Oct 24 17:16:53.149: FAIL: Unexpected alerts fired or pending during the upgrade:

alert TargetDown fired for 360 seconds with labels: {job="metrics", namespace="openshift-kube-scheduler-operator", service="metrics", severity="warning"}

/test e2e-aws-ovn-upgrade

@suleymanakbas91
Copy link
Contributor Author

I think you missed my point on this. Instead of testing openshift_ingress_to_route_controller_ingress_without_class_name{name=<...>}, try without the label, which is what the alert uses: openshift_ingress_to_route_controller_ingress_without_class_name. Let's find out if it aggregates in the case where there are multiples.

@candita The following image shows openshift_ingress_to_route_controller_ingress_without_class_name without any labels. It does not aggregate any one of them but just list them all seperately:

Screenshot 2022-10-24 at 17 18 07

@suleymanakbas91
Copy link
Contributor Author

Failures seem to be unrelated..

/retest

@suleymanakbas91
Copy link
Contributor Author

Unrelated failures..

/retest

@candita
Copy link
Contributor

candita commented Oct 25, 2022

/approve

@candita
Copy link
Contributor

candita commented Oct 25, 2022

/test e2e-aws-operator

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 25, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: candita

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 25, 2022
@brandisher
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 25, 2022
@brandisher
Copy link
Contributor

/test e2e-aws-operator

@suleymanakbas91
Copy link
Contributor Author

/retest

5 similar comments
@suleymanakbas91
Copy link
Contributor Author

/retest

@suleymanakbas91
Copy link
Contributor Author

/retest

@suleymanakbas91
Copy link
Contributor Author

/retest

@suleymanakbas91
Copy link
Contributor Author

/retest

@suleymanakbas91
Copy link
Contributor Author

/retest

@Miciah
Copy link
Contributor

Miciah commented Oct 27, 2022

/test e2e-aws-operator
/test e2e-azure-operator
/test e2e-gcp-operator

@suleymanakbas91
Copy link
Contributor Author

/retest

2 similar comments
@suleymanakbas91
Copy link
Contributor Author

/retest

@suleymanakbas91
Copy link
Contributor Author

/retest

@Miciah
Copy link
Contributor

Miciah commented Oct 27, 2022

e2e-aws-operator failed because kube-apiserver reported NodeInstallerProgressing. Because (a) the e2e-azure-operator and e2e-gcp-operator jobs passed, (b) we know that e2e-aws-operator is flaky and that the failure we saw is a frequently occurring flake, and (c) the change in this PR is low risk, I'm going to override e2e-aws-operator.
/override ci/prow/e2e-aws-operator

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 27, 2022

@Miciah: Overrode contexts on behalf of Miciah: ci/prow/e2e-aws-operator

Details

In response to this:

e2e-aws-operator failed because kube-apiserver reported NodeInstallerProgressing. Because (a) the e2e-azure-operator and e2e-gcp-operator jobs passed, (b) we know that e2e-aws-operator is flaky and that the failure we saw is a frequently occurring flake, and (c) the change in this PR is low risk, I'm going to override e2e-aws-operator.
/override ci/prow/e2e-aws-operator

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 27, 2022

@suleymanakbas91: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 9c4689f into openshift:master Oct 27, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 27, 2022

@suleymanakbas91: All pull requests linked via external trackers have merged:

Bugzilla bug 1962502 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1962502: Add alerts for unmanaged Routes and Ingresses without IngressClassName

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants