Skip to content

[monitoring] Revert CPU Usage rule changes#172913

Merged
miltonhultgren merged 6 commits intoelastic:mainfrom
miltonhultgren:revert-cpu-rule-changes
Dec 8, 2023
Merged

[monitoring] Revert CPU Usage rule changes#172913
miltonhultgren merged 6 commits intoelastic:mainfrom
miltonhultgren:revert-cpu-rule-changes

Conversation

@miltonhultgren
Copy link
Contributor

@miltonhultgren miltonhultgren commented Dec 8, 2023

Reverts #159351
Reverts #167244

Due to the many unexpected issues that these changes introduced we've decided to revert these changes until we have better solutions for the problems we've learnt about.

Problems:

To address some of these issues we also need more functionality in the alerting framework to be able to register secondary actions so that we may trigger non-oncall workflows for when a rule faces issues with evaluating the stats.

Original issue #116128

@ghost
Copy link

ghost commented Dec 8, 2023

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@crespocarlos crespocarlos self-requested a review December 8, 2023 10:05
Copy link
Contributor

@crespocarlos crespocarlos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@miltonhultgren miltonhultgren enabled auto-merge (squash) December 8, 2023 11:52
@kibana-ci
Copy link

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@miltonhultgren miltonhultgren merged commit 55bc6d5 into elastic:main Dec 8, 2023
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Dec 8, 2023
Reverts elastic#159351
Reverts elastic#167244

Due to the many unexpected issues that these changes introduced we've
decided to revert these changes until we have better solutions for the
problems we've learnt about.

Problems:
- Gaps in data cause alerts to fire (see next point)
- Normal CPU rescaling causes alerts to fire
elastic#160905
- Any error fires an alert (since there is no other way to inform the
user about the problems faced by the rule executor)
- Many assumptions about cgroups only being for container users are
wrong

To address some of these issues we also need more functionality in the
alerting framework to be able to register secondary actions so that we
may trigger non-oncall workflows for when a rule faces issues with
evaluating the stats.

Original issue elastic#116128

(cherry picked from commit 55bc6d5)
@kibanamachine
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.12

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Dec 8, 2023
# Backport

This will backport the following commits from `main` to `8.12`:
- [[monitoring] Revert CPU Usage rule changes
(#172913)](#172913)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Milton
Hultgren","email":"milton.hultgren@elastic.co"},"sourceCommit":{"committedDate":"2023-12-08T15:25:23Z","message":"[monitoring]
Revert CPU Usage rule changes (#172913)\n\nReverts
https://github.com/elastic/kibana/pull/159351\r\nReverts
https://github.com/elastic/kibana/pull/167244\r\n\r\nDue to the many
unexpected issues that these changes introduced we've\r\ndecided to
revert these changes until we have better solutions for the\r\nproblems
we've learnt about.\r\n\r\nProblems:\r\n- Gaps in data cause alerts to
fire (see next point)\r\n- Normal CPU rescaling causes alerts to
fire\r\nhttps://github.com//issues/160905\r\n- Any error
fires an alert (since there is no other way to inform the\r\nuser about
the problems faced by the rule executor)\r\n- Many assumptions about
cgroups only being for container users are\r\nwrong\r\n\r\nTo address
some of these issues we also need more functionality in the\r\nalerting
framework to be able to register secondary actions so that we\r\nmay
trigger non-oncall workflows for when a rule faces issues
with\r\nevaluating the stats.\r\n\r\nOriginal issue
https://github.com/elastic/kibana/issues/116128","sha":"55bc6d505977e8831633cc76e0f46b2ca66ef559","branchLabelMapping":{"^v8.13.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:fix","backport:prev-minor","v8.12.0","v8.13.0"],"number":172913,"url":"https://github.com/elastic/kibana/pull/172913","mergeCommit":{"message":"[monitoring]
Revert CPU Usage rule changes (#172913)\n\nReverts
https://github.com/elastic/kibana/pull/159351\r\nReverts
https://github.com/elastic/kibana/pull/167244\r\n\r\nDue to the many
unexpected issues that these changes introduced we've\r\ndecided to
revert these changes until we have better solutions for the\r\nproblems
we've learnt about.\r\n\r\nProblems:\r\n- Gaps in data cause alerts to
fire (see next point)\r\n- Normal CPU rescaling causes alerts to
fire\r\nhttps://github.com//issues/160905\r\n- Any error
fires an alert (since there is no other way to inform the\r\nuser about
the problems faced by the rule executor)\r\n- Many assumptions about
cgroups only being for container users are\r\nwrong\r\n\r\nTo address
some of these issues we also need more functionality in the\r\nalerting
framework to be able to register secondary actions so that we\r\nmay
trigger non-oncall workflows for when a rule faces issues
with\r\nevaluating the stats.\r\n\r\nOriginal issue
https://github.com/elastic/kibana/issues/116128","sha":"55bc6d505977e8831633cc76e0f46b2ca66ef559"}},"sourceBranch":"main","suggestedTargetBranches":["8.12"],"targetPullRequestStates":[{"branch":"8.12","label":"v8.12.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.13.0","labelRegex":"^v8.13.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/172913","number":172913,"mergeCommit":{"message":"[monitoring]
Revert CPU Usage rule changes (#172913)\n\nReverts
https://github.com/elastic/kibana/pull/159351\r\nReverts
https://github.com/elastic/kibana/pull/167244\r\n\r\nDue to the many
unexpected issues that these changes introduced we've\r\ndecided to
revert these changes until we have better solutions for the\r\nproblems
we've learnt about.\r\n\r\nProblems:\r\n- Gaps in data cause alerts to
fire (see next point)\r\n- Normal CPU rescaling causes alerts to
fire\r\nhttps://github.com//issues/160905\r\n- Any error
fires an alert (since there is no other way to inform the\r\nuser about
the problems faced by the rule executor)\r\n- Many assumptions about
cgroups only being for container users are\r\nwrong\r\n\r\nTo address
some of these issues we also need more functionality in the\r\nalerting
framework to be able to register secondary actions so that we\r\nmay
trigger non-oncall workflows for when a rule faces issues
with\r\nevaluating the stats.\r\n\r\nOriginal issue
https://github.com/elastic/kibana/issues/116128","sha":"55bc6d505977e8831633cc76e0f46b2ca66ef559"}}]}]
BACKPORT-->

Co-authored-by: Milton Hultgren <milton.hultgren@elastic.co>
@Numpypy
Copy link

Numpypy commented Mar 27, 2024

Has this issue been fixed in version 8.12?

@miltonhultgren
Copy link
Contributor Author

@Numpypy Yes, this revert should be part of 8.12 if I look at the labels

@Numpypy
Copy link

Numpypy commented Mar 28, 2024

@miltonhultgren Thanks. I have two servers with version 8.11.2 installed, one of which runs with an error "Failed to resolve needed aggregations for CPU Usage Rule", while the other is running properly,it is hard to understand.😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants