@machine424
Contributor

No longer considering the old integer buckets is intentional.

Ensure that all series involved in the different queries change during the integer->float transition of the `le` label values, so that rate calculations remain consistent across all series.

If apiserver_request_sli_duration_seconds_bucket{le="1"} had a last value of 15 and apiserver_request_sli_duration_seconds_bucket{le="1.0"} then appeared with a value of 20, the rate of {le="1.0"} over a range that covers the transition will not account for the 20−15=5 difference, because {le="1"} and {le="1.0"} are two distinct series. The rate of apiserver_request_sli_duration_seconds_count, however, will still take that jump of 5 into account, because that series does not change.
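The mismatch described above can be reproduced with a small simulation. This is only a sketch: `simple_increase` is a hypothetical helper that takes last-minus-first per series (real PromQL `increase()`/`rate()` also extrapolates and handles counter resets), the series names are shortened, and every sample value other than the 15 and 20 from the description is made up for illustration.

```python
from collections import defaultdict


def simple_increase(samples):
    """Return last-minus-first per distinct series.

    Simplified stand-in for PromQL increase(): no extrapolation and no
    counter-reset handling. Each sample is (series, timestamp, value).
    """
    by_series = defaultdict(list)
    for series, timestamp, value in samples:
        by_series[series].append((timestamp, value))
    result = {}
    for series, points in by_series.items():
        points.sort()  # order by timestamp
        result[series] = points[-1][1] - points[0][1]
    return result


# Hypothetical samples around the integer->float transition.
samples = [
    ('bucket{le="1"}', 0, 10),
    ('bucket{le="1"}', 30, 15),    # last value of the integer series
    ('bucket{le="1.0"}', 60, 20),  # first value of the float series
    ('bucket{le="1.0"}', 90, 24),
    ('count', 0, 12),
    ('count', 30, 18),
    ('count', 60, 24),
    ('count', 90, 29),
]

increases = simple_increase(samples)

# What the two bucket series report, added together, versus the increase
# the underlying counter actually went through.
bucket_seen = increases['bucket{le="1"}'] + increases['bucket{le="1.0"}']
bucket_true = 24 - 10
missed = bucket_true - bucket_seen

print("bucket increase seen:", bucket_seen)
print("bucket increase missed at the transition:", missed)
print("count increase seen:", increases['count'])
```

The bucket side loses exactly the 20−15 jump between the two series, while the count side, being a single continuous series, keeps every increment.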

Replace apiserver_request_sli_duration_seconds_count with apiserver_request_sli_duration_seconds_bucket{le="60.0"}; the two should be equal, given that the timeout is 60s and cannot be customized.
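To see why swapping the denominator helps, here is a rough sketch comparing the two choices over a window that covers the transition. It assumes the queries only select the float-labelled buckets, uses a hypothetical `increase` helper (last value minus first value, no extrapolation or reset handling), and uses invented sample values in which the `le="60.0"` bucket mirrors `_count`.

```python
def increase(points):
    """Simplified increase: last value minus first value of one series."""
    values = [value for _, value in sorted(points)]
    return values[-1] - values[0]


# Hypothetical (timestamp, value) samples. The float-labelled series only
# exist after the transition (timestamp 60 onwards).
fast_float = [(60, 20), (90, 24)]   # ..._bucket{le="1.0"}
total_float = [(60, 24), (90, 29)]  # ..._bucket{le="60.0"}

# _count keeps its identity, so the same window also sees its
# pre-transition samples.
count = [(0, 12), (30, 18), (60, 24), (90, 29)]

# Denominator spans the transition, numerator does not: the ratio is skewed.
ratio_with_count = increase(fast_float) / increase(count)

# Both sides start at the transition: the ratio stays consistent.
ratio_with_bucket = increase(fast_float) / increase(total_float)

print(round(ratio_with_count, 3), round(ratio_with_bucket, 3))
```

With these numbers the `_count` denominator yields roughly 0.235, while the `le="60.0"` denominator yields 0.8, because numerator and denominator then cover the same stretch of time.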

This change is temporary; its purpose is to avoid silencing alerts during the transition.

Later, we'll revert to using apiserver_request_sli_duration_seconds_count.

@openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Feb 26, 2025
@openshift-ci-robot

openshift-ci-robot commented Feb 26, 2025

@machine424: This pull request references MON-4129 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.19.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@machine424
Contributor Author

/hold
until prometheus v3 is merged

@openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Feb 26, 2025
machine424 added a commit to machine424/cluster-kube-apiserver-operator that referenced this pull request Feb 26, 2025
this also fixes a bad `le="30(.0)?"` as a result of openshift#1742

The chosen approach is to move to float buckets once Prometheus v3 is merged,
forgetting about the integer historical data.

That will be done in openshift#1816
@openshift-ci openshift-ci bot requested review from deads2k and dgrisonnet February 26, 2025 09:06
@openshift-ci
Contributor

openshift-ci bot commented Feb 26, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: machine424
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci bot commented Feb 26, 2025

@machine424: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-e2e-aws-ovn | 83fac63 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-gcp-operator-single-node | 83fac63 | link | false | /test e2e-gcp-operator-single-node |
| ci/prow/e2e-aws-ovn | 83fac63 | link | true | /test e2e-aws-ovn |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@machine424 machine424 marked this pull request as draft February 27, 2025 12:13
@openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Feb 27, 2025
@machine424
Contributor Author

/close
went with #1815

@openshift-merge-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Feb 27, 2025
@openshift-merge-robot
Contributor

PR needs rebase.


@openshift-ci openshift-ci bot closed this Feb 27, 2025
@openshift-ci
Contributor

openshift-ci bot commented Feb 27, 2025

@machine424: Closed this PR.


p0lyn0mial pushed a commit to p0lyn0mial/cluster-kube-apiserver-operator that referenced this pull request Dec 11, 2025
this also fixes a bad `le="30(.0)?"` as a result of openshift#1742

The chosen approach is to move to float buckets once Prometheus v3 is merged,
forgetting about the integer historical data.

That will be done in openshift#1816
wangke19 pushed a commit to wangke19/cluster-kube-apiserver-operator that referenced this pull request Dec 22, 2025
this also fixes a bad `le="30(.0)?"` as a result of openshift#1742

The chosen approach is to move to float buckets once Prometheus v3 is merged,
forgetting about the integer historical data.

That will be done in openshift#1816