[Synthetics] Detect and display missing/corrupted Synthetics integrations in monitor UIs #256738
/ci
…om/miguelmartin-elastic/kibana into synthetics/missing-integrations-ui
…h statuses

Both statuses are removed from the private location health API:
- AgentPolicyMismatch: this scenario is practically impossible in normal usage; monitors whose package policy exists now report Healthy regardless of which agent policy it is attached to
- PackageNotInstalled: if the synthetics package is missing, the entire app fails; surfacing it per monitor adds noise without actionable value

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Re: layout issue in the private locations table:
The rest of the comments have been addressed 😃
@miguelmartin-elastic Thanks for the explanation. On closer inspection, the layout issue appears when there are more than two actions. Since this PR adds a third action, the row overflows and breaks the layout. Below is the screenshot from
…, just as the reset one
@benakansara fixed 🚀
benakansara left a comment
LGTM! 🚀 Just one comment about using JSON.stringify
  // eslint-disable-next-line react-hooks/exhaustive-deps
}, [dispatch, JSON.stringify(configIds)]);
Do we need `JSON.stringify`?
- // eslint-disable-next-line react-hooks/exhaustive-deps
- }, [dispatch, JSON.stringify(configIds)]);
+ }, [dispatch, configIds]);
useEffect(() => {
  dispatch(updateManagementPageStateAction({ configIds }));
  // eslint-disable-next-line react-hooks/exhaustive-deps
}, [dispatch, JSON.stringify(configIds)]);
Nope. Fixed now :)
useEffect(() => {
  dispatch(updateManagementPageStateAction({ configIds }));
}, [dispatch, configIds]);
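Whether `JSON.stringify` is needed comes down to how React decides that an effect's dependencies changed: per the React documentation, each dependency is compared with its previous value using `Object.is`, so an array counts as changed whenever its reference changes, even if its contents are identical. The `react-hooks/exhaustive-deps` rule also flags complex expressions such as `JSON.stringify(configIds)` in a dependency array, which is why the original line carried a suppression comment. The sketch below models only that comparison in plain TypeScript; `depsChanged` is a hypothetical helper, not React's internal code. Dropping `JSON.stringify` avoids redundant effect runs only if `configIds` keeps a stable reference across renders (for example, because it is memoized), which this thread does not show.

```typescript
// Models React's per-dependency comparison: each slot is compared with Object.is.
// Illustrative helper only, not React's actual implementation.
function depsChanged(prev: readonly unknown[], next: readonly unknown[]): boolean {
  if (prev.length !== next.length) return true;
  return prev.some((value, i) => !Object.is(value, next[i]));
}

// Stand-in for a Redux dispatch, which keeps the same reference between renders.
const dispatch = (): void => {};

// Two renders that each build a new configIds array with the same contents.
const firstRenderIds = ['config-a', 'config-b'];
const secondRenderIds = ['config-a', 'config-b'];

// Raw array in the deps list: a new reference reads as a change, so the effect re-runs.
const rawArrayChanged = depsChanged([dispatch, firstRenderIds], [dispatch, secondRenderIds]);

// Serialized key: equal contents give equal strings, so the effect is skipped.
const serializedChanged = depsChanged(
  [dispatch, JSON.stringify(firstRenderIds)],
  [dispatch, JSON.stringify(secondRenderIds)]
);

// Same reference reused (e.g. a memoized value): also skipped.
const sameRefChanged = depsChanged([dispatch, firstRenderIds], [dispatch, firstRenderIds]);

console.log({ rawArrayChanged, serializedChanged, sameRefChanged });
```

In short, the serialized key trades a string build on every render for content-based comparison, while a stable reference gets the same result without the lint suppression.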
🟢 Low monitor_filters/use_filters.ts:67
The useEffect for configIds only dispatches to updateManagementPageStateAction but not setOverviewPageStateAction, while the useLogicalAndFor effect immediately after it dispatches to both. Since MonitorOverviewPageState extends MonitorFilterState which includes configIds, the overview page state won't receive configIds updates from URL params, causing the overview page filtering to be out of sync with the URL.
- useEffect(() => {
- dispatch(updateManagementPageStateAction({ configIds }));
- }, [dispatch, configIds]);
+ useEffect(() => {
+ dispatch(updateManagementPageStateAction({ configIds }));
+ dispatch(setOverviewPageStateAction({ configIds }));
+ }, [dispatch, configIds]);
In file x-pack/solutions/observability/plugins/synthetics/public/apps/synthetics/components/monitors_page/common/monitor_filters/use_filters.ts around lines 67-69.
Evidence trail:
1. x-pack/solutions/observability/plugins/synthetics/public/apps/synthetics/components/monitors_page/common/monitor_filters/use_filters.ts lines 62-64: configIds effect only dispatches updateManagementPageStateAction
2. use_filters.ts lines 66-76: useLogicalAndFor effect dispatches to both setOverviewPageStateAction and updateManagementPageStateAction
3. x-pack/solutions/observability/plugins/synthetics/public/apps/synthetics/state/monitor_list/models.ts line 27: MonitorFilterState includes configIds
4. x-pack/solutions/observability/plugins/synthetics/public/apps/synthetics/state/overview/models.ts line 15: MonitorOverviewPageState extends MonitorFilterState
5. x-pack/solutions/observability/plugins/synthetics/public/apps/synthetics/utils/filters/filter_fields.ts line 29: getMonitorFilterFields() does NOT include configIds
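The bot's suggestion amounts to fanning one URL-derived value out to both page slices. The sketch below reduces that to a self-contained toy store so the effect of the extra dispatch is visible. The action types, state shape, and the `createStore` and `syncConfigIds` helpers are hypothetical simplifications, not the Synthetics Redux code. Whether the extra dispatch is desirable is a separate question: item 5 of the evidence trail notes that `getMonitorFilterFields()` does not include `configIds`, which may mean the overview page ignores it on purpose.

```typescript
// Hypothetical, simplified stand-in for MonitorFilterState.
interface FilterState {
  configIds?: string[];
}

interface RootState {
  management: FilterState;
  overview: FilterState;
}

type Action =
  | { type: 'management/update'; payload: FilterState }
  | { type: 'overview/set'; payload: FilterState };

// Hypothetical action creators named after the ones in the review comment.
const updateManagementPageStateAction = (payload: FilterState): Action => ({
  type: 'management/update',
  payload,
});
const setOverviewPageStateAction = (payload: FilterState): Action => ({
  type: 'overview/set',
  payload,
});

// Each action merges its payload into exactly one slice.
function reducer(state: RootState, action: Action): RootState {
  switch (action.type) {
    case 'management/update':
      return { ...state, management: { ...state.management, ...action.payload } };
    case 'overview/set':
      return { ...state, overview: { ...state.overview, ...action.payload } };
    default:
      return state;
  }
}

// Minimal store: just enough to dispatch actions and read state back.
function createStore(initial: RootState) {
  let state = initial;
  return {
    dispatch: (action: Action): void => {
      state = reducer(state, action);
    },
    getState: (): RootState => state,
  };
}

// Body of the suggested effect, extracted so it can run outside React.
function syncConfigIds(dispatch: (action: Action) => void, configIds: string[] | undefined): void {
  dispatch(updateManagementPageStateAction({ configIds }));
  dispatch(setOverviewPageStateAction({ configIds }));
}

const store = createStore({ management: {}, overview: {} });
syncConfigIds(store.dispatch, ['config-a']);
console.log(store.getState());
```

With only the first dispatch, `overview.configIds` would stay `undefined` while `management.configIds` follows the URL, which is the drift the comment describes.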
💛 Build succeeded, but was flaky
…261367)

Blocked by #256738

## Summary

Closes #258541. Follow-up to #256738.

Extends the monitor integration health API and UI to detect agent-level issues in private locations, specifically when no agents are enrolled or all agents are offline/unhealthy. These statuses are surfaced alongside the existing integration-level checks.

| | |
|---|---|
| Monitors Management - No agents enrolled tooltip | <img width="1552" height="982" alt="image" src="https://github.com/user-attachments/assets/9e1ab7fb-481c-48fa-8f22-e57e1f366474" /> |
| Monitors Management - Mixed issues: reset-fixable and non-reset-fixable | <img width="1552" height="982" alt="image" src="https://github.com/user-attachments/assets/fab9becd-9db8-45fc-b0c1-e300c6cb4df7" /> |
| Monitors Management: reset is applied only to monitors that have at least one reset-fixable issue | <img width="1552" height="982" alt="image" src="https://github.com/user-attachments/assets/7f22413b-b45d-403b-8d55-ef7ae343b37b" /> |
| Monitors Management: reset is applied only to monitors that have at least one reset-fixable issue | <img width="1552" height="982" alt="image" src="https://github.com/user-attachments/assets/be4048fa-b576-4be2-be9d-341c7c50431d" /> |
| Monitor edit: if there are no reset-fixable issues the reset button is not shown | <img width="1552" height="982" alt="image" src="https://github.com/user-attachments/assets/35d564a9-e10e-46e2-8ac8-69f68d2b5d00" /> |
| Private locations table: if none of the unhealthy monitors is reset-fixable in that private location, the reset button is not shown | <img width="1552" height="982" alt="image" src="https://github.com/user-attachments/assets/e1f4662a-9322-4ef6-bb7a-3cb969101f02" /> |
| Private locations table: if at least one of the unhealthy monitors is reset-fixable in that private location, the reset button is shown | <img width="1552" height="982" alt="image" src="https://github.com/user-attachments/assets/f5298fcb-c83f-4d85-910f-58017d7dead6" /> |

**New health statuses (server):**
- `missing_agents`: the agent policy exists but has zero active agents enrolled
- `unhealthy_agent`: agents are enrolled but none are online

Agent status is fetched in batch via Fleet's `getAgentStatusForAgentPolicy` for all relevant agent policies. The check uses `status.active` (not `status.all`) so that unenrolled/deleted agents don't incorrectly count as enrolled.

**Priority order** (most fundamental → least): `missing_location` → `missing_agent_policy` → `missing_package_policy` → `missing_agents` → `unhealthy_agent` → `healthy`

**UI changes:**
- `MissingAgentPolicy`, `MissingAgents`, and `UnhealthyAgent` are classified as **non-reset-fixable**; the reset button is hidden when all unhealthy locations have agent-level issues
- When a selection is mixed (some fixable, some not), the bulk reset modal shows a collapsible warning listing the skipped monitors
- In the private locations table, the reset button is now a primary inline action to avoid a blank space when hidden
- The edit monitor callout shows "Agent issue detected" as the title when agent-level issues are present

**Reset API fix:** Before calling `editMonitors`, the reset API now pre-filters out locations whose agent policy no longer exists. This prevents `AgentPolicyNotFoundError` from bubbling up as a 500 when a monitor has both fixable and non-fixable locations.

## Test plan

### Prerequisites

- A running Kibana with at least one private location that has an enrolled, online agent

### Setup test monitors

Run [`~/elastic/scripts/break_monitors.sh`](https://github.com/miguelmartin-elastic/kibana/blob/feat/synthetics-agent-health-status-258541/x-pack/solutions/observability/plugins/synthetics/server/services/monitor_integration_health_api.test.ts) against your Kibana instance. It creates:

| Monitor | Locations | Expected status |
|---|---|---|
| Mon A | loc1 (agent online) | `missing_package_policy` (fixable) |
| Mon B | loc1 (agent online) | `missing_package_policy` (fixable) |
| Mon C | loc1 + loc2 (no agents) | `missing_package_policy` on loc1 + `missing_agents` on loc2 |
| Mon D | loc2 (no agents) | `missing_agents` (not fixable) |
| Mon E | loc3 (deleted agent policy) | `missing_agent_policy` (not fixable) |
| Mon F | loc1 + loc3 | `missing_package_policy` on loc1 + `missing_agent_policy` on loc3 |

### What to verify

**Monitor list page (`/app/synthetics/monitors`):**
- [ ] Mon A and Mon B show the unhealthy badge and a "Reset monitor" button in the row actions
- [ ] Mon D shows the unhealthy badge but **no** reset button
- [ ] Mon C shows the unhealthy badge and a reset button (mixed: one location is fixable)
- [ ] Selecting Mon A + Mon B + Mon D and clicking bulk reset opens the confirmation modal with a warning listing Mon D as skipped
- [ ] Confirming the bulk reset fixes Mon A and Mon B (they become healthy after a few seconds)

**Edit monitor page for Mon C:**
- [ ] The callout lists both locations with their respective status messages
- [ ] The reset button is visible (because loc1 is fixable)
- [ ] Clicking reset fixes loc1; loc2 remains `missing_agents`

**Private locations settings page (`/app/synthetics/settings/private-locations`):**
- [ ] The location with enrolled agents shows a "Reset monitors" button when Mon A/B are broken
- [ ] The no-agents location does **not** show the reset button (all issues are agent-level)
- [ ] No blank space appears where the reset button would be on the no-agents location row

**New health status messages:**
- [ ] `missing_agents`: "No Fleet agents are enrolled in the agent policy for this private location. Enroll an agent in Fleet to resolve this."
- [ ] `unhealthy_agent`: "All Fleet agents for this private location are unhealthy or offline. Check the agent status in Fleet."

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
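The follow-up's priority order and its "reset only monitors with at least one reset-fixable issue" rule can be summarized in a few pure functions. This is a sketch under stated assumptions. The `LocationFacts` shape and the `onlineAgents` count are hypothetical; the commit only names `status.active` and `status.all` from Fleet's agent status. Only `missing_package_policy` is treated as reset-fixable, because it is the only status the commit's test table marks as fixable. The commit does not say how `missing_location` is classified, so the sketch conservatively treats it as not fixable.

```typescript
type LocationStatus =
  | 'missing_location'
  | 'missing_agent_policy'
  | 'missing_package_policy'
  | 'missing_agents'
  | 'unhealthy_agent'
  | 'healthy';

// Hypothetical per-location inputs. `activeAgents` mirrors Fleet's `status.active`;
// `onlineAgents` is an assumed count of agents currently online.
interface LocationFacts {
  locationExists: boolean;
  agentPolicyExists: boolean;
  packagePolicyExists: boolean;
  activeAgents: number;
  onlineAgents: number;
}

// Returns the first matching status, most fundamental issue first.
function classifyLocation(f: LocationFacts): LocationStatus {
  if (!f.locationExists) return 'missing_location';
  if (!f.agentPolicyExists) return 'missing_agent_policy';
  if (!f.packagePolicyExists) return 'missing_package_policy';
  // Counting active (not all) agents keeps unenrolled agents from looking enrolled.
  if (f.activeAgents === 0) return 'missing_agents';
  if (f.onlineAgents === 0) return 'unhealthy_agent';
  return 'healthy';
}

// Assumption: only missing_package_policy is fixable by a reset.
const RESET_FIXABLE: ReadonlySet<LocationStatus> = new Set<LocationStatus>(['missing_package_policy']);

// A monitor is reset if at least one of its locations is reset-fixable; otherwise it is skipped.
function partitionForReset(monitors: Record<string, LocationStatus[]>): {
  toReset: string[];
  skipped: string[];
} {
  const toReset: string[] = [];
  const skipped: string[] = [];
  for (const [configId, statuses] of Object.entries(monitors)) {
    if (statuses.some((s) => RESET_FIXABLE.has(s))) {
      toReset.push(configId);
    } else {
      skipped.push(configId);
    }
  }
  return { toReset, skipped };
}

// Mirrors part of the commit's test table: Mon A, Mon C (mixed) and Mon D.
const selection = partitionForReset({
  'Mon A': ['missing_package_policy'],
  'Mon C': ['missing_package_policy', 'missing_agents'],
  'Mon D': ['missing_agents'],
});
console.log(selection);
```

Because classification stops at the first match, a location with no package policy and no agents reports `missing_package_policy`, which keeps it reset-fixable even though the agent problem would remain after the reset.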

Release note
Synthetics monitor integration health detection and self-healing
Synthetics now automatically detects when private location monitors have broken Fleet integrations — such as deleted agent or package policies and missing locations — and surfaces per-location health status directly in the monitor management list, monitor edit page, and private locations settings. Users can reset affected monitors individually or in bulk to recreate the missing Fleet resources and restore monitoring.
Summary
Closes #256397
Closes #256398
Closes #256399
Adds a new Monitor Integration Health API for Synthetics that detects and reports unhealthy monitor/private-location configurations, and surfaces these health statuses in the UI with actionable per-location details and reset capabilities.
Problem
When a Synthetics monitor uses a private location, several Fleet-level resources (package policies, agent policies, the synthetics package itself) must be in place for the monitor to run. If any of these resources are missing or misconfigured, the monitor silently fails to collect data. Users have no clear signal about what went wrong or which location is affected.
Solution
Backend:
`MonitorIntegrationHealthApi` service

A new `MonitorIntegrationHealthApi` service class evaluates the health of monitors across their private locations. It detects three distinct failure scenarios, evaluated in intentional priority order (most fundamental issue first):
- `missing_location`
- `missing_agent_policy`
- `missing_package_policy`

Only the first matching status is reported per location, since the higher-priority issue is the root cause (e.g., if the agent policy is deleted, reporting `missing_package_policy` on its package policies would be misleading; the fix is the same regardless).

The service supports both the new (`{configId}-{locationId}`) and legacy (`{configId}-{locationId}-{spaceId}`) policy ID formats using `getPolicyIdFormatInfo`, preventing false-positive `missing_package_policy` reports for monitors created before the space-agnostic migration.

Two new internal endpoints expose this:
- `POST /internal/synthetics/monitors/_health`: bulk health check (up to 500 monitor IDs)
- `GET /internal/synthetics/monitors/{monitorId}/_health`: single monitor health check (returns 404 if the monitor doesn't exist)

You can check the OAS here: api.yml
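Tolerating both policy ID formats boils down to trying each candidate ID before concluding that a package policy is missing. The helpers below are a hypothetical illustration of that idea combined with the three-step, first-match priority check. They do not reproduce `getPolicyIdFormatInfo`, whose signature is not shown here, and the `existingPolicyIds` set stands in for whatever Fleet lookup the service actually performs.

```typescript
type IntegrationStatus =
  | 'missing_location'
  | 'missing_agent_policy'
  | 'missing_package_policy'
  | 'healthy';

// Candidate package policy IDs for a monitor/location pair, current format first.
function candidatePolicyIds(configId: string, locationId: string, spaceId: string): string[] {
  return [
    `${configId}-${locationId}`, // current, space-agnostic format
    `${configId}-${locationId}-${spaceId}`, // legacy format
  ];
}

// Hypothetical lookup: returns whichever candidate ID exists, if any.
function findPackagePolicyId(
  existingPolicyIds: ReadonlySet<string>,
  configId: string,
  locationId: string,
  spaceId: string
): string | undefined {
  return candidatePolicyIds(configId, locationId, spaceId).find((id) => existingPolicyIds.has(id));
}

// First matching status wins, so a deleted agent policy is never reported
// as a missing package policy.
function evaluateLocation(
  locationExists: boolean,
  agentPolicyExists: boolean,
  packagePolicyId: string | undefined
): IntegrationStatus {
  if (!locationExists) return 'missing_location';
  if (!agentPolicyExists) return 'missing_agent_policy';
  if (packagePolicyId === undefined) return 'missing_package_policy';
  return 'healthy';
}

// A monitor created before the migration only has a legacy-format policy.
const existing = new Set(['cfg1-loc1-default']);
const found = findPackagePolicyId(existing, 'cfg1', 'loc1', 'default');
console.log(found, evaluateLocation(true, true, found));
```

Checking only `{configId}-{locationId}` would return `undefined` here and produce exactly the false-positive `missing_package_policy` that the legacy handling is meant to prevent.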
The service uses `Promise.allSettled` for partial error handling: if some monitors fail to load, the response includes both the successful health results and per-monitor errors with proper `statusCode` propagation.

Frontend:
`useMonitorIntegrationHealth` hook + Redux slice

A new `monitor_health` Redux slice fetches health data from the bulk API. The saga uses `debounce(50ms)` to aggregate rapid dispatches from multiple hook instances into a single API call. The `useMonitorIntegrationHealth` hook provides:
- `isUnhealthy(configId)`: boolean check for a specific monitor
- `getUnhealthyLocationStatuses(configId)`: per-location details with translated reasons
- `getUnhealthyMonitorCountForLocation(locationId)`: count of unhealthy monitors per location
- `getUnhealthyConfigIdsForLocation(locationId)`: config IDs of unhealthy monitors at a location
- `resetMonitor(configId)`: triggers a single monitor reset via the reset API
- `resetMonitors(configIds)`: triggers a bulk monitor reset via the bulk reset API

UI changes
`ResetMonitorModal` component with confirmation dialog, loading state, and success/error toast notifications

API response examples
Bulk endpoint (`POST /internal/synthetics/monitors/_health`):

{ "monitors": [ { "configId": "a04dd21...", "monitorName": "https://www.elastic.co", "isUnhealthy": false, "locations": [ { "locationId": "80ad4fda...", "locationLabel": "My Private Location", "status": "healthy", "policyId": "a04dd21...-80ad4fda..." } ] }, { "configId": "6249a8b4...", "monitorName": "Test Unhealthy Monitor", "isUnhealthy": true, "locations": [ { "locationId": "80ad4fda...", "locationLabel": "My Private Location", "status": "missing_package_policy", "policyId": "6249a8b4...-80ad4fda...", "reason": "The Fleet package policy for this monitor/location pair does not exist." } ] } ], "errors": [] }

Single endpoint (`GET /internal/synthetics/monitors/{monitorId}/_health`):

{ "configId": "6249a8b4...", "monitorName": "Test Unhealthy Monitor", "isUnhealthy": true, "locations": [ { "locationId": "80ad4fda...", "locationLabel": "My Private Location", "status": "missing_package_policy", "policyId": "6249a8b4...-80ad4fda...", "reason": "The Fleet package policy for this monitor/location pair does not exist." } ] }

Some screenshots may be outdated!
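The `errors` array in the bulk response is where the service's `Promise.allSettled` handling surfaces per-monitor failures. Below is a minimal sketch of that pattern, assuming a hypothetical `loadMonitorHealth` callback and a hypothetical `MonitorLoadError` type that carries an HTTP status. Unknown errors fall back to 500 here, which is also an assumption rather than documented behavior.

```typescript
interface MonitorHealth {
  configId: string;
  isUnhealthy: boolean;
}

interface MonitorHealthError {
  configId: string;
  statusCode: number;
  message: string;
}

// Hypothetical error type that carries the HTTP status to propagate.
class MonitorLoadError extends Error {
  readonly statusCode: number;
  constructor(message: string, statusCode: number) {
    super(message);
    this.name = 'MonitorLoadError';
    this.statusCode = statusCode;
  }
}

// Loads every monitor independently so one failure does not reject the whole batch.
async function getBulkHealth(
  configIds: string[],
  loadMonitorHealth: (configId: string) => Promise<MonitorHealth>
): Promise<{ monitors: MonitorHealth[]; errors: MonitorHealthError[] }> {
  const results = await Promise.allSettled(configIds.map((id) => loadMonitorHealth(id)));
  const monitors: MonitorHealth[] = [];
  const errors: MonitorHealthError[] = [];
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      monitors.push(result.value);
    } else {
      const reason: unknown = result.reason;
      errors.push({
        configId: configIds[i],
        statusCode: reason instanceof MonitorLoadError ? reason.statusCode : 500,
        message: reason instanceof Error ? reason.message : String(reason),
      });
    }
  });
  return { monitors, errors };
}

// Mirrors test Scenario 9: one valid ID and one that does not exist.
const loadMonitorHealth = async (configId: string): Promise<MonitorHealth> => {
  if (configId === 'nonexistent-id') {
    throw new MonitorLoadError(`Monitor ${configId} not found`, 404);
  }
  return { configId, isUnhealthy: false };
};

const pending = getBulkHealth(['valid-id', 'nonexistent-id'], loadMonitorHealth);
pending.then((res) => console.log(JSON.stringify(res)));
```

`Promise.all` would reject on the first missing monitor and discard the healthy results; `allSettled` keeps both, and the index from `forEach` ties each rejection back to its config ID.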
Screenshots
Test plan
You can use this script to create unhealthy monitors locally: break_monitors.sh. Just replace the location ID and Kibana URL.
Prerequisites
Scenario 1: Verify healthy monitor
- `GET /internal/synthetics/monitors/{monitorId}/_health`: all locations should be `"status": "healthy"`

Scenario 2: Simulate `missing_package_policy`
- `POST /api/fleet/package_policies/delete` with `force: true` if needed)
- `POST /internal/synthetics/monitors/_health` with `{"monitorIds": ["<configId>"]}`: the response should show `"status": "missing_package_policy"` with a `reason`

Scenario 3: Simulate `missing_agent_policy`
- `"status": "missing_agent_policy"`

Scenario 4: Verify private locations settings page
Scenario 5: Reset a single monitor (edit page)
Scenario 6: Bulk reset monitors (monitor list)
Scenario 7: Reset monitors from private locations page
Scenario 8: Legacy policy ID format
- `{configId}-{locationId}-{spaceId}`), verify they are not falsely reported as `missing_package_policy`
- `"status": "healthy"`

Scenario 9: Partial errors
- `{"monitorIds": ["valid-id", "nonexistent-id"]}`
- `monitors` entry for the valid ID and an `errors` entry with `statusCode: 404` for the invalid one

Scenario 10: Verify the single endpoint 404 handling
- `GET /internal/synthetics/monitors/nonexistent/_health`

Risk assessment
Low — this is an additive feature. No existing API contracts or data models are modified. The new endpoints are internal-only and the UI changes are isolated to the Synthetics management pages.