
@cesco-f
Contributor

@cesco-f cesco-f commented Jul 30, 2025

This PR improves how monitor status alerts recover (issue #193613).

It does not close the issue: the feature is currently disabled because of the intermediate release process.

This commit has to be reverted to enable the feature, which will be done in a separate PR.

Users can now choose how the alert recovers: either when the monitor is back up or when the alert condition is no longer met.

For example, with the current implementation, a rule such as "is down 1 time in the past 24 hours" won't recover until there are 0 downs in the past 24 hours.

Enabling the new option makes the alert recover as soon as an UP test result is received.

Acceptance criteria:
Users can specify when the recovery should occur:

  • The first UP test comes in ✅
  • The alert condition is no longer met ✅
  • Default is the first UP test ✅
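
To make the difference between the two recovery options concrete, here is a minimal sketch of the decision logic. It is not the code from this PR: `RecoveryStrategy`, `StatusRule`, `CheckResult`, `shouldRecover`, and the values `'firstUp'` / `'conditionNotMet'` are hypothetical names chosen for illustration, and the rule is simplified to "down at least N times within a look-back window".

```typescript
// Illustrative sketch only. All identifiers here are hypothetical and are
// not taken from the Kibana codebase.
type RecoveryStrategy = 'firstUp' | 'conditionNotMet';

interface CheckResult {
  timestamp: number; // epoch milliseconds
  status: 'up' | 'down';
}

interface StatusRule {
  downThreshold: number; // alert when at least this many downs occur...
  windowMs: number; // ...within this look-back window
  recoveryStrategy?: RecoveryStrategy; // assumed default: 'firstUp'
}

const HOUR_MS = 60 * 60 * 1000;

function shouldRecover(rule: StatusRule, checks: CheckResult[], now: number): boolean {
  const strategy: RecoveryStrategy = rule.recoveryStrategy ?? 'firstUp';

  if (strategy === 'firstUp') {
    // Recover as soon as the most recent check result is UP.
    if (checks.length === 0) return false;
    const latest = checks.reduce((a, b) => (b.timestamp >= a.timestamp ? b : a));
    return latest.status === 'up';
  }

  // 'conditionNotMet': recover only once the alert condition is false again,
  // i.e. fewer than `downThreshold` downs remain inside the window.
  const windowStart = now - rule.windowMs;
  const downsInWindow = checks.filter(
    (c) => c.status === 'down' && c.timestamp > windowStart && c.timestamp <= now
  ).length;
  return downsInWindow < rule.downThreshold;
}

// Rule: "is down 1 time in the past 24 hours".
const rule: StatusRule = { downThreshold: 1, windowMs: 24 * HOUR_MS };
const checks: CheckResult[] = [
  { timestamp: 0, status: 'down' },
  { timestamp: 1 * HOUR_MS, status: 'up' },
];

// Two hours in: the latest check is UP, but one down is still in the window.
console.log(shouldRecover(rule, checks, 2 * HOUR_MS)); // true (default 'firstUp')
console.log(
  shouldRecover({ ...rule, recoveryStrategy: 'conditionNotMet' }, checks, 2 * HOUR_MS)
); // false
```

With the default, the alert recovers on the first UP result. With the condition-based option, the same alert stays active until the down check has aged out of the 24-hour window.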

Screenshot:
Screenshot 2025-07-30 at 15 06 46

Video:

Screen.Recording.2025-07-29.at.16.24.20.mov

@cesco-f cesco-f added release_note:enhancement backport:skip This PR does not require backporting labels Jul 30, 2025
@cesco-f cesco-f requested review from a team as code owners July 30, 2025 13:16
@cesco-f cesco-f added the v9.2.0 label Jul 30, 2025
@botelastic botelastic bot added the Team:actionable-obs Formerly "obs-ux-management", responsible for SLO, o11y alerting, significant events, & synthetics. label Jul 30, 2025
@elasticmachine
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@github-actions github-actions bot added the author:obs-ux-management PRs authored by the obs ux management team label Jul 30, 2025
Contributor

@pmuellr pmuellr left a comment


ResponseOps changes LGTM. Thanks for the feature flag / intermediate release aspects!

I'm not sure if the OAS docs need to be changed because of this; I think I often see changes there ...

@cesco-f
Contributor Author

cesco-f commented Jul 31, 2025

Thanks @pmuellr!

I've checked this file and it looks like we don't have docs for this rule yet.

Screenshot 2025-07-31 at 09 57 16

@cesco-f cesco-f force-pushed the recover-first-up branch from 895c506 to 7cb2c47 Compare July 31, 2025 08:25
@kibanamachine
Contributor

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#8947

[❌] x-pack/solutions/observability/test/api_integration_deployment_agnostic/feature_flag_configs/serverless/oblt.serverless.config.ts: 0/25 tests passed.

see run history

@cesco-f
Contributor Author

cesco-f commented Jul 31, 2025

I don't think the failed flaky test run has anything to do with the changes in this PR.

@elasticmachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| synthetics | 1.0MB | 1.0MB | +936.0B |

History

Contributor

@shahzad31 shahzad31 left a comment


LGTM !!!

@drewpost should this be true by default?

@cesco-f cesco-f merged commit 933d2ba into elastic:main Aug 5, 2025
12 checks passed
@cesco-f cesco-f deleted the recover-first-up branch August 5, 2025 12:22
delanni pushed a commit to delanni/kibana that referenced this pull request Aug 5, 2025
This PR is to improve how we recover monitor status alerts, issue
elastic#193613 .

It does not close the issue, it is currently disabled because of the
[intermediate release
process](https://docs.google.com/document/d/1mU5jlIfCKyXdDPtEzAz1xTpFXFCWxqdO5ldYRVO_hgM).

[This
commit](elastic@dc00912)
has to be reverted to enable the feature and it will be done in a
separate PR.

The user will now be able to select how they want to recover the alert,
either when the monitor is back up or when the condition is no longer
met.

For example, with the current implementation if the rule is "is down 1
time in the past 24 hours" it won't recover until there are 0 downs in
the past 24 hours.

Enabling this option will make the monitor recover as soon as a UP
monitor is received.

**Acceptance criteria:**
Users have the option where to specify when the recovery should occur:

- The first UP test comes in ✅
- The alert condition is no longer met ✅
- Default is the first UP test ✅

**Screenshot:**
<img width="1512" height="825" alt="Screenshot 2025-07-30 at 15 06 46"
src="https://github.com/user-attachments/assets/110ca4ac-e259-4bdf-9c41-b55feda95aed"
/>

**Video:**


https://github.com/user-attachments/assets/5c0f6d10-caf8-47e7-a206-fe616f130fca

---------

Co-authored-by: Shahzad <shahzad31comp@gmail.com>
@wildemat wildemat mentioned this pull request Aug 7, 2025
10 tasks
NicholasPeretti pushed a commit to NicholasPeretti/kibana that referenced this pull request Aug 18, 2025
cesco-f added a commit that referenced this pull request Aug 20, 2025
…#231091)

This PR is a follow up of #229962 and it closes #193613 .

It can be merged only after [this
commit](933d2ba)
has been included in a serverless release.
qn895 pushed a commit to qn895/kibana that referenced this pull request Aug 26, 2025
