[Alerting] Preserve rule type payload across delayed-to-active graduation#266012
…tion

Resolves elastic#259886.

When a delayed alert was reactivated by `delayRecoveredFlappingAlerts` and crossed `alertDelay` on a run where the executor did not report it, the alert builder dispatched to `buildNewAlert` with an empty payload and produced an active AAD doc with blank rule type fields. This makes the framework own that transition explicitly:

- `buildDelayedAlert` now stores the full executor payload on the delayed AAD doc, so the predecessor carries the rule type fields.
- A new `buildGraduatedAlert` runs whenever a delayed alert becomes active. It merges the predecessor's rule type fields with the current run's payload (current wins per-field; predecessor fills gaps), sets `event.action: "open"`, and marks the alert as user-visible for the first time.
- `AlertBuilder.buildActiveAlerts` dispatches to ongoing / graduated / new based on whether the alert was tracked as active or delayed previously.

Made-with: Cursor
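The three-way dispatch described in this commit message can be sketched roughly as follows. This is an illustration only, not the Kibana code: `classifyActiveAlert`, `TrackedAlerts`, and the use of `Set` are assumptions made for the example; only the names `trackedActive` and `trackedDelayed` and the ongoing / graduated / new outcomes come from the PR text.

```typescript
// Hypothetical sketch: names and shapes here are assumptions, not the Kibana API.
type Transition = 'ongoing' | 'graduated' | 'new';

interface TrackedAlerts {
  // IDs of alerts that were tracked as active before this run.
  trackedActive: Set<string>;
  // IDs of alerts that were tracked as delayed before this run.
  trackedDelayed: Set<string>;
}

// Decide which builder an alert that is active in this run should go to.
function classifyActiveAlert(alertId: string, tracked: TrackedAlerts): Transition {
  if (tracked.trackedActive.has(alertId)) {
    return 'ongoing'; // active -> active
  }
  if (tracked.trackedDelayed.has(alertId)) {
    return 'graduated'; // delayed -> active, user-visible for the first time
  }
  return 'new'; // no predecessor at all
}

const exampleTracked: TrackedAlerts = {
  trackedActive: new Set(['alert-a']),
  trackedDelayed: new Set(['alert-b']),
};

for (const id of ['alert-a', 'alert-b', 'alert-c']) {
  console.log(id, classifyActiveAlert(id, exampleTracked));
}
```

Checking the active set before the delayed set keeps the ongoing case unchanged, so only alerts with a delayed predecessor take the new graduated path.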
|
|
/flaky ftrConfig:x-pack/platform/test/alerting_api_integration/spaces_only/tests/alerting/group4/config.ts:30 |
Flaky Test Runner: ✅ Build triggered - kibana-flaky-test-suite-runner#11968
|
Flaky Test Runner Stats: 🟠 Some tests failed. - kibana-flaky-test-suite-runner#11968 [❌] x-pack/platform/test/alerting_api_integration/spaces_only/tests/alerting/group4/config.ts: 0/30 tests passed. |
    // anything the predecessor or the executor payload could carry.
    const alertUpdates = {
      ...rule,
      [TIMESTAMP]: timestamp,
Is there any reasonable way to refactor this and build_delayed_alert's similar processing into one function? I suppose we may have more big "alert builders" lying around as well.
We could probably share some common helpers, but I prefer to keep the builders separate. The logic is already very confusing, and adding more if-else cases to a single builder would make it more complex.
|
Do you still have that matrix / state-machine you showed off the other day? Would be nice to add that in here ... |
Added in the description |
|
Pinging @elastic/response-ops (Team:ResponseOps) |
`test.patternFiringAad`'s default recovery hook calls `setAlertData` with
`{ patternIndex: -1, instancePattern: [] }`, which writes into
`reportedAlerts[id]` before the framework reactivates the alert via
flap-hold. The graduated active doc then merges that recovery payload in
and the `patternIndex: 4` from the trackedDelayed predecessor never
shows up.
Add an opt-in `setRecoveryPayload?: boolean` param (default true so
existing tests are unaffected) and use it in the new graduation test so
`cleanedPayload` stays empty on run 6 and the architectural fix's
predecessor fallback is actually exercised.
Made-with: Cursor
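A minimal sketch of the opt-in flag described in this commit message, assuming a simplified hook shape. `runRecoveryHook`, `PatternPayload`, and `RecoveryHookParams` are hypothetical names for illustration; only `setRecoveryPayload`, its default of `true`, `reportedAlerts`, and the `{ patternIndex: -1, instancePattern: [] }` payload come from the text above.

```typescript
// Hypothetical sketch of the test rule type's recovery hook; not the actual fixture code.
interface PatternPayload {
  patternIndex: number;
  instancePattern: boolean[];
}

interface RecoveryHookParams {
  // Defaults to true so existing tests keep writing the recovery payload.
  setRecoveryPayload?: boolean;
}

function runRecoveryHook(
  alertId: string,
  reportedAlerts: Record<string, PatternPayload>,
  { setRecoveryPayload = true }: RecoveryHookParams = {}
): void {
  if (!setRecoveryPayload) {
    // Opted out: nothing is written, so a later graduation has no current
    // payload for this alert and must fall back to the delayed predecessor.
    return;
  }
  reportedAlerts[alertId] = { patternIndex: -1, instancePattern: [] };
}

const withDefault: Record<string, PatternPayload> = {};
runRecoveryHook('alert-a', withDefault);

const optedOut: Record<string, PatternPayload> = {};
runRecoveryHook('alert-a', optedOut, { setRecoveryPayload: false });

console.log(Object.keys(withDefault).length, Object.keys(optedOut).length);
```

With the flag off, the graduation test can assert that `patternIndex: 4` from the delayed predecessor survives, which is the fallback path the fix adds.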
💛 Build succeeded, but was flaky
|
|
Starting backport for target branches: 8.19, 9.3, 9.4 https://github.com/elastic/kibana/actions/runs/25553663267 |
💔 Some backports could not be created
Note: Successful backport PRs will be merged automatically after passing CI. |
…graduation (#266012) (#268402)

# Backport

This will backport the following commits from `main` to `9.4`:

- [[Alerting] Preserve rule type payload across delayed-to-active graduation (#266012)](#266012)

### Questions ?

Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport)

Co-authored-by: Ersin Erdal <92688503+ersin-erdal@users.noreply.github.com>
|
Backports for 8.19 and 9.3 cannot be created because the new delayed alert AAD status was introduced in 9.4. |
|
Starting backport for target branches: 9.4 https://github.com/elastic/kibana/actions/runs/25599934386 |
💔 All backports failed
…sitioning to recovered (#268559)

## Summary

Contributes to elastic/docs-content-internal#1168 by doing the following:

- Adds a known issue to 9.2.7, 9.2.8, 9.3.2, 9.3.3 Kibana release notes about stale “active” alerts not transitioning to recovered for Stack.
- Docs #266012 as a fixed bug for Kibana in stack versions 9.3.4 and 9.4.0.

### Corresponding updates

- **Kibana and Observability 8.x release notes updated in:** #268558

### Previews

- Known issue - https://docs-v3-preview.elastic.dev/elastic/kibana/pull/268559/release-notes/known-issues
- 9.4.0 bug fixes - https://docs-v3-preview.elastic.dev/elastic/kibana/pull/268559/release-notes#kibana-9.4.0-fixes
- 9.3.4 bug fixes - https://docs-v3-preview.elastic.dev/elastic/kibana/pull/268559/release-notes#kibana-9.3.4-fixes
…ctive instead of transitioning to recovered (#268558)

## Summary

Contributes to elastic/docs-content-internal#1168 by doing the following:

- Adds a known issue to 8.19.13 and 8.19.14 Kibana release notes about stale “active” alerts not transitioning to recovered for Stack and Observability.
- Docs #266012 as a fixed bug for Kibana and Observability in stack version 8.19.15.

### Corresponding updates

- **Kibana 9.x release notes updated in:** #268559
- **Observability 9.x release notes updated in:** elastic/docs-content#6394

### Previews

- 8.19.13 known issues - https://kibana_bk_268558.docs-preview.app.elstc.co/guide/en/kibana/8.19/release-notes-8.19.13.html#known-issues-8.19.13
- 8.19.14 known issues - https://kibana_bk_268558.docs-preview.app.elstc.co/guide/en/kibana/8.19/release-notes-8.19.14.html#known-issues-8.19.14
- 8.19.15 bug fixes - https://kibana_bk_268558.docs-preview.app.elstc.co/guide/en/kibana/8.19/release-notes-8.19.15.html#fixes-v8.19.15
…sitioning to recovered (#6394)

## Summary

Contributes to elastic/docs-content-internal#1168 by doing the following:

- Adds a known issue to Obs release notes about stale “active” alerts not transitioning to recovered for Stack.
- Docs elastic/kibana#266012 as a fixed bug for Obs in 9.3.4 and 9.4.0.

### Previews

- Known issue - https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6394/release-notes/elastic-observability/known-issues
- 9.4.0 bug fixes - https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6394/release-notes/elastic-observability#elastic-observability-9.4.0-fixes
- 9.3.4 bug fixes - https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/6394/release-notes/elastic-observability#elastic-observability-9.3.4-fixes

## Generative AI disclosure

1. Did you use a generative AI (GenAI) tool to assist in creating this contribution?
   - [ ] Yes
   - [x] No
…tion (elastic#266012)

## Summary

Resolves elastic#259886. Architectural alternative to elastic#265588.

When a delayed alert is reactivated by `delayRecoveredFlappingAlerts` (flap-hold) and crosses `alertDelay` on a run where the executor does **not** report it, the alert builder used to dispatch to `buildNewAlert` with an empty payload, producing an active AAD doc with blank rule type fields (e.g. `kibana.alert.reason`).

Rather than skip the graduation in that case, this PR makes the framework own the `delayed -> active` transition explicitly so the resulting active doc is always complete.

### Fix (code)

- **`buildDelayedAlert`** now stores the full executor payload on the delayed AAD doc. Previously the delayed doc only carried framework fields. Persisting the rule type payload turns each delayed doc into a usable predecessor.
- **`buildGraduatedAlert`** is a new builder dedicated to `delayed -> active` transitions. It deep-merges the predecessor delayed doc with the current run's payload (per-field precedence: current wins, predecessor fills gaps), sets `event.action: 'open'` and `kibana.alert.status: active`, and treats the alert as user-visible for the first time (`severity_improving: false`, no `previous_action_group`).
- **`AlertBuilder.buildActiveAlerts`** now branches on `trackedActive` vs `trackedDelayed` to dispatch to ongoing / graduated / new respectively, instead of the previous status check on a single tracked alert.

The per-field merge means:

| Run shape on graduation | `cleanedPayload[K]` | Resulting field |
| --- | --- | --- |
| Executor reports `K` | present | fresh value (predecessor shadowed) |
| Flap-hold reactivation, no executor report | absent | predecessor's value preserved |
| Partial report (some `K` reported) | present for some | executor where present, predecessor where absent |

This matches the long-standing semantics of `buildOngoingAlert`, just sourced from the delayed predecessor instead of an active one.

How to reproduce the issue (on the 6th execution we see an alert without context):

| Run | pattern | flappingHistory | active | recovered | activeCount | pending recovered | flapping | AAD status |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 1 | a | T | x | | 1 | 0 | FALSE | delayed |
| 2 | a | T,F | x | | 2 | 0 | FALSE | active |
| 3 | - | T,F,T | | x | 0 | - | FALSE | recovered |
| 4 | - | T,F,T,F | | x | 0 | - | FALSE | recovered |
| 5 | a | T,F,T,F,T | x | | 1 | 0 | FALSE | delayed |
| 6 | - | T,F,T,F,T,T | x | | 2 | 1 | TRUE | active |
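The per-field precedence described in this summary (current wins, predecessor fills gaps) might look roughly like the sketch below. It is an assumption-laden illustration, not the PR's implementation: `mergeWithPredecessor`, `sketchGraduatedDoc`, the `Doc` type, the flat dotted keys, and the sample values are all hypothetical, and the real builder sets more framework fields than shown here.

```typescript
// Hypothetical sketch of "current wins per field, predecessor fills gaps".
type Doc = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Doc {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Recursively merge: fields present in `current` shadow the predecessor,
// fields absent from `current` keep the predecessor's value.
function mergeWithPredecessor(predecessor: Doc, current: Doc): Doc {
  const result: Doc = { ...predecessor };
  for (const [key, value] of Object.entries(current)) {
    if (value === undefined) continue; // treat undefined as "not reported"
    const previous = result[key];
    result[key] =
      isPlainObject(previous) && isPlainObject(value)
        ? mergeWithPredecessor(previous, value)
        : value;
  }
  return result;
}

// Framework fields are applied last so the graduated doc always opens as active.
function sketchGraduatedDoc(predecessor: Doc, cleanedPayload: Doc): Doc {
  return {
    ...mergeWithPredecessor(predecessor, cleanedPayload),
    'event.action': 'open',
    'kibana.alert.status': 'active',
  };
}

// Illustrative values only.
const delayedPredecessor: Doc = {
  'kibana.alert.status': 'delayed',
  'kibana.alert.reason': 'reason stored on the delayed doc',
  patternIndex: 4,
};

// Flap-hold reactivation: the executor reported nothing on this run.
const flapHold = sketchGraduatedDoc(delayedPredecessor, {});
// Partial report: only the reason is refreshed.
const partial = sketchGraduatedDoc(delayedPredecessor, {
  'kibana.alert.reason': 'fresh reason',
});

console.log(flapHold, partial);
```

In the flap-hold case the rule type fields come entirely from the delayed doc, which corresponds to run 6 of the reproduction, where the alert previously appeared without context.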