MON-4129: slos: accomodate for Prometheus v3 "le" normalization#1815

Merged: openshift-merge-bot[bot] merged 1 commit into openshift:master from machine424:cddfr on Feb 27, 2025

Conversation

machine424 commented Feb 20, 2025

Ensure that all series involved in the different queries change
during the integer->float transition, so that rate calculations remain consistent across all series.

If apiserver_request_sli_duration_seconds_bucket{le="1"} had a last value of 15 and
apiserver_request_sli_duration_seconds_bucket{le="1.0"} then appeared with a value of 20, a
rate calculated over a range that contains samples from both {le="1"} and {le="1.0"} will not
account for the 20−15=5 increase, because the two series are distinct. The rate of
apiserver_request_sli_duration_seconds_count, however, will still take
that increase of 5 into account, because that series does not change.
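
The arithmetic in this example can be reproduced with a small Python sketch. It is a toy model, not how Prometheus computes rate(): the sample values and the increase() helper are assumptions made for illustration, and extrapolation and counter-reset handling are ignored. It only shows that summing per-series increases misses the jump between the old and the new bucket series, while the unchanged count series sees it.

```python
def increase(samples):
    """Naive per-series increase: last sample minus first sample.

    Assumes a monotonic counter with no resets and no extrapolation.
    """
    if len(samples) < 2:
        return 0.0
    return float(samples[-1] - samples[0])


# Hypothetical samples inside one query window.
bucket_int = [10, 15]       # {le="1"}: last value 15, then the series stops
bucket_float = [20, 25]     # {le="1.0"}: new series, first value 20
count = [10, 15, 20, 25]    # _count: one series across the whole window

# The two bucket series are distinct, so their increases are computed
# separately and the 15 -> 20 step between them is never observed.
bucket_increase = increase(bucket_int) + increase(bucket_float)
count_increase = increase(count)
missed = count_increase - bucket_increase

print(bucket_increase, count_increase, missed)
```

Under these assumed samples, the bucket series account for 5 fewer requests than the count series, which is the 20−15 difference described above.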

To keep the queries consistent, replace apiserver_request_sli_duration_seconds_count with
apiserver_request_sli_duration_seconds_bucket{le=~"60(.0)?"}.
The two should be equal, given that the timeout is 60s and cannot be customized.
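
As a rough check of which "le" values that matcher selects, the sketch below applies the same pattern with Python's re module. It assumes that re.fullmatch is a fair stand-in for Prometheus regex matchers, which are fully anchored; the list of candidate label values is made up for illustration.

```python
import re

# Pattern copied from the matcher le=~"60(.0)?".
LE_60 = re.compile(r"60(.0)?")


def selects(le_value: str) -> bool:
    """Return True if the whole label value matches the pattern."""
    return LE_60.fullmatch(le_value) is not None


# Hypothetical label values: the integer and float spellings plus a few others.
candidates = ["60", "60.0", "6", "600", "+Inf", "60x0"]
matches = {le: selects(le) for le in candidates}

for le, ok in matches.items():
    print(f"{le!r}: {ok}")
```

Both "60" and "60.0" are selected, so the query keeps working on either side of the transition. Note that the dot is unescaped and therefore matches any character, which is why the made-up value "60x0" is also selected; this makes no difference for the two spellings discussed here.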

This change is temporary. It avoids having to silence alerts or to reset/forget historical integer buckets during the transition.

Later, we'll revert to using apiserver_request_sli_duration_seconds_count.

openshift-ci bot commented Feb 20, 2025

@machine424: This PR was included in a payload test run from openshift/prometheus#227
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.19-e2e-azure-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8fa3ab30-ef8d-11ef-806f-b3c24d19db2e-0

openshift-ci bot commented Feb 23, 2025

@machine424: This PR was included in a payload test run from openshift/prometheus#227
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4169c650-f219-11ef-9664-fbdc48d35fd5-0

openshift-ci bot commented Feb 24, 2025

@machine424: This PR was included in a payload test run from openshift/prometheus#227
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e091d030-f2cb-11ef-8688-c206a05c46fb-0

machine424 changed the title from 'NO REVIEW NEEDED: placeholder/debug' to 'MON-4129: slos: ensure all series involved in the different queries change' on Feb 25, 2025
openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) on Feb 25, 2025

openshift-ci-robot commented Feb 25, 2025

@machine424: This pull request references MON-4129 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.19.0" version, but no target version was set.

openshift-ci bot commented Feb 25, 2025

@machine424: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-gcp-operator-single-node | a730f93 | link | false | /test e2e-gcp-operator-single-node

machine424 marked this pull request as draft on February 26, 2025 09:05
openshift-ci bot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) on Feb 26, 2025
machine424 changed the title from 'MON-4129: slos: ensure all series involved in the different queries change' to 'MON-4129: slos: accomodate for Prometheus v3 "le" normalization' on Feb 27, 2025

openshift-ci-robot commented Feb 27, 2025

@machine424: This pull request references MON-4129 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.19.0" version, but no target version was set.

machine424 marked this pull request as ready for review on February 27, 2025 12:14
openshift-ci bot removed the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) on Feb 27, 2025
openshift-ci bot requested review from p0lyn0mial and tkashem on February 27, 2025 12:15

dgrisonnet left a comment

/lgtm
/approve

openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) on Feb 27, 2025

openshift-ci bot commented Feb 27, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgrisonnet, machine424

openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Feb 27, 2025

openshift-ci bot commented Feb 27, 2025

@machine424: This PR was included in a payload test run from openshift/prometheus#227
trigger 4 job(s) of type blocking for the ci release of OCP 4.19

  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.19-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/2aa27100-f512-11ef-9240-acdf99acc697-0

openshift-ci bot commented Feb 27, 2025

@machine424: This PR was included in a payload test run from openshift/prometheus#227
trigger 15 job(s) of type blocking for the nightly release of OCP 4.19

  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.19-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-serial
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview-serial
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-driver-toolkit
  • periodic-ci-openshift-release-master-nightly-4.19-fips-payload-scan
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-ipv6
  • periodic-ci-openshift-microshift-release-4.19-periodics-e2e-aws-ovn-ocp-conformance
  • periodic-ci-openshift-microshift-release-4.19-periodics-e2e-aws-ovn-ocp-conformance-serial
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/22d0a410-f512-11ef-90a2-a00bea7bb8e4-0

openshift-merge-bot bot merged commit 56e7346 into openshift:master on Feb 27, 2025
12 of 16 checks passed

openshift-bot commented

[ART PR BUILD NOTIFIER]

Distgit: ose-cluster-kube-apiserver-operator
This PR has been included in build ose-cluster-kube-apiserver-operator-container-v4.19.0-202502271739.p0.g56e7346.assembly.stream.el9.
All builds following this will include this PR.
