[Alerting] Log warning when rules are not rescheduled due to Saved Object not found error #101591
…ect not found and doesn't reschedule rule
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
@elasticmachine merge upstream
),
schedule: resolveErr<IntervalSchedule | undefined, Error>(schedule, (error) => {
  if (isAlertSavedObjectNotFoundError(error, alertId)) {
    this.logger.warn(
WDYT about adding telemetry regarding this (in a separate ticket)? Do we have any insight into how often this happens?
Yeah, we can create a separate issue for this. I think we would have to query the event log index for rule executions that end in an error status, but we might need a different field to aggregate on: the event log captures the Saved object not found message, but because that message contains the rule id, it's different for each rule. Currently it looks like the telemetry runs on the .kibana index.
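To illustrate the aggregation problem described in this comment, here is a minimal sketch, not the telemetry implementation. The `RuleExecutionEvent` shape, the helper names, and the sample message format are all assumptions made for illustration; the real event log schema is not shown in this thread. The idea is to replace the rule id in each message with a placeholder so that the same error from different rules collapses into one key.

```typescript
// Hypothetical shape of an event-log entry (an assumption for this sketch).
interface RuleExecutionEvent {
  ruleId: string;
  outcome: 'success' | 'failure';
  errorMessage?: string;
}

// Replace the rule id inside the message with a placeholder so that
// otherwise-identical errors from different rules share one key.
function normalizeErrorMessage(event: RuleExecutionEvent): string {
  const message = event.errorMessage ?? 'unknown error';
  if (event.ruleId.length === 0) {
    return message; // nothing to replace
  }
  return message.split(event.ruleId).join('<ruleId>');
}

// Count failed executions per normalized message.
function countFailuresByMessage(events: RuleExecutionEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const event of events) {
    if (event.outcome !== 'failure') {
      continue;
    }
    const key = normalizeErrorMessage(event);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Without the normalization step, every rule would produce its own key and the counts would all be 1, which is the problem raised above.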
@chrisronline I opened a generic issue for adding information from the event log to telemetry: #101809
pmuellr
left a comment
Added a comment about adding some more info to the message we log.
schedule: resolveErr<IntervalSchedule | undefined, Error>(schedule, (error) => {
  if (isAlertSavedObjectNotFoundError(error, alertId)) {
    this.logger.warn(
      `Unable to execute rule "${alertId}" because ${error.message} - this rule will not be rescheduled. To restart rule execution, try disabling and re-enabling this rule.`
Since this message is so actionable, I feel like we should include more info so we can make it as easy as possible for the user to find the alert to fix it. Probably means adding the Kibana space and rule name. Rule type would be interesting for diagnostic purposes / telemetry, but probably doesn't help a user that much - I would assume the rule name would provide most of the context to find the rule in Kibana ...
@pmuellr So, this might be a little tricky since in this context, we only have access to the ruleId and presumably, since we're seeing the Saved object not found error, we have been unable to retrieve the saved object that would give us the spaceId and the ruleName.
ETA: My mistake, it looks like we do have access to the namespace.
heh, right! bummer!
I think we have seen this message when we get transient network problems, in which case we might have gotten the alert SO at some point, then failed later when the alert ran. In which case, some piece of code had that info. How we'd keep track of that seems ... difficult. Ah well.
Added the spaceId, if defined, to the message in commit d314a54.
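As a rough sketch of what including the space only when it is defined could look like: the helper name `buildNotRescheduledWarning` and the exact wording of the space clause are assumptions, since the text of the commit is not shown in this thread. The rest of the message follows the warning quoted in the diff above.

```typescript
// Hypothetical helper (not from the PR): builds the warning text and
// mentions the space only when a spaceId is available.
function buildNotRescheduledWarning(
  ruleId: string,
  errorMessage: string,
  spaceId?: string
): string {
  // Assumed wording for the space clause; empty when no space is known.
  const spaceClause = spaceId ? `in the "${spaceId}" space ` : '';
  return (
    `Unable to execute rule "${ruleId}" ${spaceClause}because ${errorMessage} - ` +
    `this rule will not be rescheduled. ` +
    `To restart rule execution, try disabling and re-enabling this rule.`
  );
}
```

Keeping the clause empty when `spaceId` is undefined leaves the original message unchanged for callers that have no space information.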
pmuellr
left a comment
Cool - adding the space will help a bit anyway! Thx!
…ject not found error (elastic#101591)
* Adding warning to logs when alerting task runner encounters saved object not found and doesn't reschedule rule
* Adding space id to warning message
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
💚 Backport successful
This backport PR will be merged automatically after passing CI.
…ject not found error (#101591) (#101827)
* Adding warning to logs when alerting task runner encounters saved object not found and doesn't reschedule rule
* Adding space id to warning message
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: ymao1 <ying.mao@elastic.co>
Resolves #101227
Summary
Logs a warning when a rule is not rescheduled because of a Saved Object not found error, with a suggestion to disable and re-enable the rule to restart rule execution.