MON-4129: slos: move to float buckets as Prometheus v3 normalized integer->float #1816
…eger->float
No longer considering the old integer buckets is intentional:
it ensures that all series involved in the different queries change
during the integer->float transition, so that rate calculation remains consistent across all series.
If apiserver_request_sli_duration_seconds_bucket{le="1"} had a last value of 15 and
then apiserver_request_sli_duration_seconds_bucket{le="1.0"} reappeared with 20, the
rate of {le="1.0"} over a range that covers the transition will not
account for the 20−15=5 difference, as the two {le="1"} and {le="1.0"} series are distinct. But
apiserver_request_sli_duration_seconds_count's rate will still take
that 5 jump into account as the series doesn't change.
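
The arithmetic above can be sketched with a simplified model. This is only an illustration: `simple_increase` is a hypothetical helper that takes "last sample minus first sample" within a window, ignoring the extrapolation and counter-reset handling that PromQL's `rate()`/`increase()` perform. The values 15 and 20 come from the example above; the trailing 22 is an assumed later sample, and all requests are assumed to finish in under 1s so that the bucket and the count track each other.

```python
def simple_increase(samples):
    """Last-minus-first increase over a window (simplified, not real PromQL)."""
    if len(samples) < 2:
        return 0.0
    return samples[-1] - samples[0]


# Samples inside one window that spans the integer->float transition.
old_bucket = [15.0]              # {le="1"}: last value before it disappears
new_bucket = [20.0, 22.0]        # {le="1.0"}: new series, first value 20
count = [15.0, 20.0, 22.0]       # _count: same series before and after

new_bucket_increase = simple_increase(new_bucket)  # only sees 20 -> 22
count_increase = simple_increase(count)            # sees 15 -> 22
missed = count_increase - new_bucket_increase      # the 20 - 15 jump

print(new_bucket_increase, count_increase, missed)
```

Under these assumptions, the `{le="1.0"}` series only accounts for what happened after it appeared, while `_count` also includes the jump across the transition.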
Replace apiserver_request_sli_duration_seconds_count with
apiserver_request_sli_duration_seconds_bucket{le="60.0"},
since the two should be equal given that the timeout is 60s and cannot be customized.
This change is temporary, to avoid silencing alerts during the transition.
Later, we'll revert to using apiserver_request_sli_duration_seconds_count.
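
A rough sketch of why the replacement helps, under stated assumptions: the "slow requests = total minus fast bucket" form below is an assumed simplification, not the actual rule expressions (which are not shown in this description). `simple_increase` is a hypothetical last-minus-first helper that ignores PromQL extrapolation and counter resets, and the sample values are illustrative, with every request assumed to finish in under 1s.

```python
def simple_increase(samples):
    """Last-minus-first increase over a window (simplified, not real PromQL)."""
    if len(samples) < 2:
        return 0.0
    return samples[-1] - samples[0]


# Window spanning the transition; float-labelled series start at the transition.
fast_bucket = [20.0, 22.0]       # {le="1.0"}
total_bucket = [20.0, 22.0]      # {le="60.0"}, expected to equal _count
count = [15.0, 20.0, 22.0]       # _count, uninterrupted across the transition

# Total taken from _count: the 15 -> 20 jump is counted only on one side.
slow_with_count = simple_increase(count) - simple_increase(fast_bucket)

# Total taken from {le="60.0"}: both sides start at the transition.
slow_with_bucket = simple_increase(total_bucket) - simple_increase(fast_bucket)

print(slow_with_count, slow_with_bucket)
```

In this toy setup, mixing `_count` with a float bucket yields a nonzero difference even though no request was slow, whereas using `{le="60.0"}` keeps both terms on the same footing.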
@machine424: This pull request references MON-4129, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.19.0" version, but no target version was set.
/hold
This also fixes a bad `le="30(.0)?"` as a result of openshift#1742. The chosen approach is to move to float buckets once Prometheus v3 is merged, forgetting about the integer historical data. That will be done in openshift#1816.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: machine424
@machine424: The following tests failed, say
/close
PR needs rebase.
@machine424: Closed this PR.