[Synthetics] Add recovery mode switch for status alerts #229962

cesco-f · 2025-07-30T13:16:00Z

This PR improves how monitor status alerts recover (issue #193613).

It does not close the issue: the feature is currently disabled because of the intermediate release process.

This commit (dc00912) must be reverted to enable the feature; that will be done in a separate PR.
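
The staging described above can be sketched as follows. This is a minimal illustration, assuming that the purpose of the intermediate release process is to ship server-side support for the new parameter one serverless release before the UI starts writing it (the follow-up #231091 can only merge after this commit is in a serverless release). The field name `recoveryStrategy` comes from the later `recoveryMode->recoveryStrategy` commit; `RecoveryStrategy`, its literal values, `StatusRuleParams`, `SHOW_RECOVERY_STRATEGY_SWITCH` and `buildParams` are hypothetical and not Kibana's actual code.

```typescript
// Hypothetical sketch of a two-phase rollout; names and literal values are
// illustrative assumptions, not the actual Kibana implementation.
type RecoveryStrategy = 'firstUp' | 'conditionNotMet';

interface StatusRuleParams {
  // Optional, so params saved without it (e.g. by the previous release)
  // still validate.
  recoveryStrategy?: RecoveryStrategy;
}

// Phase 1: the server understands the field, but the switch stays hidden
// and the UI never writes it. Phase 2 (a separate PR) flips this to true.
const SHOW_RECOVERY_STRATEGY_SWITCH = false;

function buildParams(
  selected: RecoveryStrategy,
  showSwitch: boolean = SHOW_RECOVERY_STRATEGY_SWITCH
): StatusRuleParams {
  // While the switch is hidden, omit the field entirely so the saved rule
  // looks exactly like one created before this change.
  return showSwitch ? { recoveryStrategy: selected } : {};
}

console.log(buildParams('conditionNotMet')); // {}
console.log(buildParams('conditionNotMet', true)); // { recoveryStrategy: 'conditionNotMet' }
```

Keeping the field optional on the server side is what makes a rollback harmless in this sketch: an older release never sees a value it cannot parse, because the hidden switch never writes one.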

Users will now be able to choose how the alert recovers: either when the monitor is back up or when the rule condition is no longer met.

For example, with the current implementation, if the rule is "is down 1 time in the past 24 hours", the alert won't recover until there are 0 downs in the past 24 hours.

Enabling this option makes the alert recover as soon as an UP check is received.
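
To make the difference between the two options concrete, here is a minimal sketch of the recovery decision for an already active alert. Everything in it is an assumption for illustration: the `'firstUp'` / `'conditionNotMet'` values, the `Check` shape, and the `conditionMet` / `shouldRecover` helpers are hypothetical, and the rule is simplified to "at least N down checks within a time window".

```typescript
// Illustrative only: names, shapes and values are assumptions, not Kibana's code.
type RecoveryStrategy = 'firstUp' | 'conditionNotMet';
type CheckStatus = 'up' | 'down';

interface Check {
  timestamp: number; // epoch millis
  status: CheckStatus;
}

/** Simplified rule condition: at least `downThreshold` down checks in the last `windowMs`. */
function conditionMet(checks: Check[], now: number, windowMs: number, downThreshold: number): boolean {
  const downs = checks.filter((c) => {
    const age = now - c.timestamp;
    return c.status === 'down' && age >= 0 && age <= windowMs;
  }).length;
  return downs >= downThreshold;
}

/** Decides whether an active alert should recover; `checks` is sorted oldest to newest. */
function shouldRecover(
  checks: Check[],
  now: number,
  windowMs: number,
  downThreshold: number,
  strategy: RecoveryStrategy
): boolean {
  if (strategy === 'firstUp') {
    // Recover as soon as the most recent check is UP, regardless of older downs.
    const latest = checks[checks.length - 1];
    return latest !== undefined && latest.status === 'up';
  }
  // 'conditionNotMet': recover only once the rule condition no longer holds.
  return !conditionMet(checks, now, windowMs, downThreshold);
}

const HOUR = 60 * 60 * 1000;
const now = Date.UTC(2025, 6, 30, 12);
// Rule: "is down 1 time in the past 24 hours"; one down check, then one up check.
const checks: Check[] = [
  { timestamp: now - 2 * HOUR, status: 'down' },
  { timestamp: now - 1 * HOUR, status: 'up' },
];

console.log(shouldRecover(checks, now, 24 * HOUR, 1, 'firstUp')); // true
console.log(shouldRecover(checks, now, 24 * HOUR, 1, 'conditionNotMet')); // false
```

With `'conditionNotMet'`, the same alert only recovers once the down check is older than 24 hours, which is the behaviour the PR description calls out as the current implementation.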

Acceptance criteria:
Users can specify when recovery should occur:

- The first UP test comes in ✅
- The alert condition is no longer met ✅
- Default is the first UP test ✅
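
One way to express the "default is the first UP test" criterion is to resolve an unset value to `'firstUp'`. This is a hypothetical sketch: the literal values and `resolveRecoveryStrategy` are assumptions, and the PR does not say how rules created before the option existed are treated, so the sketch only shows how a missing value would be resolved.

```typescript
// Hypothetical sketch; literal values and helper names are assumptions.
type RecoveryStrategy = 'firstUp' | 'conditionNotMet';

interface StoredRuleParams {
  recoveryStrategy?: RecoveryStrategy;
}

// Acceptance criterion: recovering on the first UP test is the default.
const DEFAULT_RECOVERY_STRATEGY: RecoveryStrategy = 'firstUp';

function resolveRecoveryStrategy(params: StoredRuleParams): RecoveryStrategy {
  // `??` only falls back when the field is null or undefined, so an explicit
  // 'conditionNotMet' choice is always preserved.
  return params.recoveryStrategy ?? DEFAULT_RECOVERY_STRATEGY;
}

console.log(resolveRecoveryStrategy({})); // firstUp
console.log(resolveRecoveryStrategy({ recoveryStrategy: 'conditionNotMet' })); // conditionNotMet
```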

Screenshot:

Video:

Screen.Recording.2025-07-29.at.16.24.20.mov

elasticmachine · 2025-07-30T13:16:07Z

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

pmuellr

ResponseOps changes LGTM. Thanks for the feature flag / intermediate release aspects!

I'm not sure whether the OAS docs need to be changed because of this; I think I often see changes there ...

cesco-f · 2025-07-31T07:59:07Z

Thanks @pmuellr!

I've checked this file and it looks like we don't have docs for this rule yet.

kibanamachine · 2025-07-31T12:09:52Z

Flaky Test Runner Stats

🟠 Some tests failed. - kibana-flaky-test-suite-runner#8947

[❌] x-pack/solutions/observability/test/api_integration_deployment_agnostic/feature_flag_configs/serverless/oblt.serverless.config.ts: 0/25 tests passed.

see run history

cesco-f · 2025-07-31T12:22:39Z

I don't think the failed flaky test run has anything to do with the changes in this PR.

src/platform/packages/shared/response-ops/rule_params/synthetics_monitor_status/v1.ts

elasticmachine · 2025-08-05T10:11:41Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: 95eca2d

Failed CI Steps

FTR Configs #134

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id	before	after	diff
`synthetics`	1.0MB	1.0MB	+936.0B

History

💛 Build #325918 was flaky 6a264ec
💚 Build #325843 succeeded 22f7990
💛 Build #324991 was flaky 7cb2c47
💔 Build #324768 failed dc00912

shahzad31

LGTM !!!

@drewpost should this be true by default?

This PR improves how monitor status alerts recover (issue elastic#193613). It does not close the issue: the feature is currently disabled because of the [intermediate release process](https://docs.google.com/document/d/1mU5jlIfCKyXdDPtEzAz1xTpFXFCWxqdO5ldYRVO_hgM). [This commit](elastic@dc00912) must be reverted to enable the feature; that will be done in a separate PR.

Users will now be able to choose how the alert recovers: either when the monitor is back up or when the rule condition is no longer met. For example, with the current implementation, if the rule is "is down 1 time in the past 24 hours", the alert won't recover until there are 0 downs in the past 24 hours. Enabling this option makes the alert recover as soon as an UP check is received.

**Acceptance criteria:** Users can specify when recovery should occur:

- The first UP test comes in ✅
- The alert condition is no longer met ✅
- Default is the first UP test ✅

**Screenshot:** <img width="1512" height="825" alt="Screenshot 2025-07-30 at 15 06 46" src="https://github.com/user-attachments/assets/110ca4ac-e259-4bdf-9c41-b55feda95aed" />

**Video:** https://github.com/user-attachments/assets/5c0f6d10-caf8-47e7-a206-fe616f130fca

---------

Co-authored-by: Shahzad <shahzad31comp@gmail.com>

…#231091) This PR is a follow-up to #229962 and closes #193613. It can be merged only after [this commit](933d2ba) has been included in a serverless release.

cesco-f added 2 commits July 30, 2025 12:58

feat(synthetics): added option to recover alert when monitor is up

ad359be

fix(hide switch): hiding the recovery mode switch for intermediate release process

dc00912

cesco-f added release_note:enhancement backport:skip This PR does not require backporting labels Jul 30, 2025

cesco-f requested review from a team as code owners July 30, 2025 13:16

cesco-f added the v9.2.0 label Jul 30, 2025

botelastic bot added the Team:actionable-obs Formerly "obs-ux-management", responsible for SLO, o11y alerting, significant events, & synthetics. label Jul 30, 2025

github-actions bot added the author:obs-ux-management PRs authored by the obs ux management team label Jul 30, 2025

fix(lint): fixed lint

7d3707b

pmuellr approved these changes Jul 30, 2025

fix(first up): fixed first up condition

7cb2c47

cesco-f force-pushed the recover-first-up branch from 895c506 to 7cb2c47 July 31, 2025 08:25

cesco-f added 2 commits August 1, 2025 14:04

Merge branch 'main' into recover-first-up

9c51fc4

Merge branch 'main' into recover-first-up

22f7990

shahzad31 reviewed Aug 4, 2025

src/platform/packages/shared/response-ops/rule_params/synthetics_monitor_status/v1.ts (outdated, resolved)

refactor(synthetics): recoveryMode->recoveryStrategy

6a264ec

cesco-f force-pushed the recover-first-up branch from 23431cd to 6a264ec August 4, 2025 09:51

Merge branch 'main' into recover-first-up

95eca2d

shahzad31 approved these changes Aug 5, 2025

cesco-f merged commit 933d2ba into elastic:main Aug 5, 2025
12 checks passed

cesco-f deleted the recover-first-up branch August 5, 2025 12:22

cesco-f mentioned this pull request Aug 5, 2025

[Synthetics] Improve recovery of custom monitor status rules to optionally consider up status #193613

Closed

3 tasks

wildemat mentioned this pull request Aug 7, 2025

pr 230826 #231022

Closed

10 tasks

cesco-f mentioned this pull request Aug 8, 2025

[Synthetics] Enable recovery strategy switch for monitor status rules #231091

Merged
