[monitoring] Revert CPU Usage rule changes #172913
Merged
miltonhultgren merged 6 commits into elastic:main on Dec 8, 2023
…ic#167244)" This reverts commit 833c075.
…tic#159351)" This reverts commit bcb1649.
💚 Build Succeeded
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request on Dec 8, 2023
Reverts elastic#159351
Reverts elastic#167244

Due to the many unexpected issues that these changes introduced we've decided to revert these changes until we have better solutions for the problems we've learnt about.

Problems:
- Gaps in data cause alerts to fire (see next point)
- Normal CPU rescaling causes alerts to fire elastic#160905
- Any error fires an alert (since there is no other way to inform the user about the problems faced by the rule executor)
- Many assumptions about cgroups only being for container users are wrong

To address some of these issues we also need more functionality in the alerting framework to be able to register secondary actions so that we may trigger non-oncall workflows for when a rule faces issues with evaluating the stats.

Original issue elastic#116128

(cherry picked from commit 55bc6d5)
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation.
kibanamachine added a commit that referenced this pull request on Dec 8, 2023
# Backport

This will backport the following commits from `main` to `8.12`:
- [[monitoring] Revert CPU Usage rule changes (#172913)](#172913)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Milton Hultgren <milton.hultgren@elastic.co>
Has this issue been fixed in version 8.12?
@Numpypy Yes, judging by the labels, this revert should be part of 8.12.
@miltonhultgren Thanks. I have two servers with version 8.11.2 installed. One of them fails with the error "Failed to resolve needed aggregations for CPU Usage Rule", while the other runs properly. It is hard to understand. 😅
Reverts #159351
Reverts #167244
Due to the many unexpected issues that these changes introduced, we've decided to revert them until we have better solutions for the problems we've learnt about.
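Problems:
- Gaps in data cause alerts to fire (see next point)
- Normal CPU rescaling causes alerts to fire #160905
- Any error fires an alert (since there is no other way to inform the user about the problems faced by the rule executor)
- Many assumptions about cgroups only being for container users are wrong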
To address some of these issues, we also need more functionality in the alerting framework: the ability to register secondary actions, so that we can trigger non-on-call workflows when a rule has trouble evaluating the stats.
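To make the idea of secondary actions concrete, here is a minimal sketch of a rule evaluation that keeps "threshold breached" separate from "could not evaluate". It is not Kibana's alerting API: every type and function name below (`Evaluation`, `CpuSample`, `evaluateCpu`, `routeEvaluation`, the `maintenance` channel) is hypothetical, and averaging the samples is an assumption made only for illustration.

```typescript
// Hypothetical sketch: none of these names exist in Kibana's alerting framework.

// A rule run can end in one of three outcomes, not just "fire" or "don't fire".
type Evaluation =
  | { kind: "ok"; cpuPercent: number }
  | { kind: "breach"; cpuPercent: number }
  | { kind: "unevaluable"; reason: string };

interface CpuSample {
  timestamp: number; // epoch milliseconds
  cpuPercent: number | null; // null models a gap in the collected data
}

// Average the samples that have data; if none do, report "unevaluable"
// instead of treating the gap as a breach.
function evaluateCpu(samples: CpuSample[], threshold: number): Evaluation {
  const values = samples
    .map((s) => s.cpuPercent)
    .filter((v): v is number => v !== null);
  if (values.length === 0) {
    return { kind: "unevaluable", reason: "no CPU data in the lookback window" };
  }
  const avg = values.reduce((a, b) => a + b, 0) / values.length;
  return avg > threshold
    ? { kind: "breach", cpuPercent: avg }
    : { kind: "ok", cpuPercent: avg };
}

type Channel = "oncall" | "maintenance" | "none";

// Only a real breach pages on-call; evaluation problems go to a
// secondary, non-on-call workflow.
function routeEvaluation(e: Evaluation): Channel {
  switch (e.kind) {
    case "breach":
      return "oncall";
    case "unevaluable":
      return "maintenance";
    case "ok":
      return "none";
  }
}
```

With this shape, a data gap or an executor error would reach a separate channel rather than firing the primary alert, which is the behaviour the reverted changes could not express.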
Original issue #116128