[APM] Add alert and SLO badges to the service map embeddable#266360
Conversation
ea05a9b to
b291017
Compare
|
Pinging @elastic/obs-presentation-team (Team:obs-presentation) |
ApprovabilityVerdict: Needs human review This PR adds new user-facing functionality (alert and SLO badges) to the service map embeddable, introducing new hooks, context providers, and UI behavior. The author does not own any of the changed files, which are all owned by @elastic/obs-presentation-team who should review this new feature. You can customize Macroscope's approvability policy. Learn more. |
3ffe26d to
06f9401
Compare
jennypavlova
left a comment
There was a problem hiding this comment.
Thanks for adding the badges and the fixes/tests! Overall it works fine, I found an issue with the navigation (not sure if it's directly related to the PR): When the time range is changed in the embeddable and then an alert badge is clicked, the resulting alerts page navigation uses the original time range rather than the updated one. The embeddable manages start/end locally (derived from rangeFrom/rangeTo props), but any alert navigation links inside ServiceMapGraph that pull from global URL state or a shared context will see stale values. Worth investigating before merge or tracking as a follow up. Here is an example:
time_range_issue.mov
(As I said we can open a follow up issue for that so everything else is fine ✅ )
| jest.mock('../../../hooks/use_fetcher', () => ({ | ||
| useFetcher: () => mockUseFetcher(), | ||
| FETCH_STATUS: { | ||
| LOADING: 'loading', | ||
| SUCCESS: 'success', | ||
| FAILURE: 'failure', | ||
| NOT_INITIATED: 'not_initiated', | ||
| }, | ||
| })); |
There was a problem hiding this comment.
First thanks for adding the test! NIT: FETCH_STATUS values are hardcoded as string literals in the mock ('loading', 'success', etc.). If the actual enum values ever change, tests will pass with stale values. We can use jest.requireActual instead: FETCH_STATUS: jest.requireActual('../../../hooks/use_fetcher').FETCH_STATUS. wdyt?
There was a problem hiding this comment.
Whoops, that slipped through my review of claude's code, requireActual should be the way to go; I'll update it
|
@elasticmachine merge upstream |
Wow, nice catch I hadn't realised that was happening. I'll take a look at what's going on there Update: Fixed in c4c901b, check the section "Fix: Unstable history reference in ApmEmbeddableContext" in the description for details on root cause and resolution |
jennypavlova
left a comment
There was a problem hiding this comment.
Thanks for the fixes, I added some comments and I will test locally in the meantime (I was finishing something and couldn't switch earlier, I am on it now)
| jest.mock('../../../hooks/use_fetcher', () => ({ | ||
| useFetcher: () => mockUseFetcher(), | ||
| FETCH_STATUS: { | ||
| LOADING: 'loading', | ||
| SUCCESS: 'success', | ||
| FAILURE: 'failure', | ||
| NOT_INITIATED: 'not_initiated', | ||
| }, | ||
| })); |
| if (history.current === null) { | ||
| history.current = createMemoryHistory({ | ||
| initialEntries: [ | ||
| `/service-map?rangeFrom=${rangeFrom}&rangeTo=${rangeTo}&kuery=${encodeURIComponent( |
There was a problem hiding this comment.
NIT: The same string template appears twice: once in the useRef lazy init and once in the useEffect. A small inline helper would DRY this:
const buildHistoryUrl = (from: string, to: string, q: string) =>
/service-map?rangeFrom=${from}&rangeTo=${to}&kuery=${encodeURIComponent(q)}&comparisonEnabled=false;
| useEffect(() => { | ||
| history.current?.replace( | ||
| `/service-map?rangeFrom=${rangeFrom}&rangeTo=${rangeTo}&kuery=${encodeURIComponent( | ||
| kuery | ||
| )}&comparisonEnabled=false` | ||
| ); | ||
| }, [history, rangeFrom, rangeTo, kuery]); |
There was a problem hiding this comment.
The useEffect fires after mount and immediately calls history.current?.replace() with a URL that's identical to the one already baked into createMemoryHistory. The history v4 library emits a location event even when the path hasn't changed, so this triggers an unnecessary React Router re-render on mount. Since the location value is the same, it's harmless, but it's work that doesn't need to happen.
A simple guard fixes this — skip the effect on first render using an extra ref:
| useEffect(() => { | |
| history.current?.replace( | |
| `/service-map?rangeFrom=${rangeFrom}&rangeTo=${rangeTo}&kuery=${encodeURIComponent( | |
| kuery | |
| )}&comparisonEnabled=false` | |
| ); | |
| }, [history, rangeFrom, rangeTo, kuery]); | |
| const isFirstRender = useRef(true); | |
| useEffect(() => { | |
| if (isFirstRender.current) { isFirstRender.current = false; return; } | |
| history.current?.replace(`/service-map?...`); | |
| }, [history, rangeFrom, rangeTo, kuery]); |
| expect(result.current.nodes).toBe(defaultParams.nodes); | ||
| }); | ||
|
|
||
| it('serviceMapEnabled is false', () => { |
There was a problem hiding this comment.
There's no test covering the NOT_INITIATED intermediate state when the hook transitions from disabled to enabled, should we have one?
jennypavlova
left a comment
There was a problem hiding this comment.
I can confirm that the time range is now correct after navigating to the alerts, thanks for fixing that! I am approving to unblock, you can open a follow up with the other changes from the comment if that is easier for you.
💛 Build succeeded, but was flaky
Failed CI StepsTest Failures
Metrics [docs]Module Count
Async chunks
Page load bundle
History
|
…ilder_new_vis_attachment * commit '6fd683609eb6dee81f242f8ff6951edbe3bfd66c': (226 commits) Remove Model Author group-by option from external inference endpoints (elastic#264761) [Streams][Streamlang] Align ES|QL condition transpiler with Painless on null propagation (elastic#264751) chore(axios,workflows-eng): remove axios from workflows connector utils (elastic#267512) [failed-test-reporter] avoid opening issues for scout env failures (elastic#267649) [kbn-api-contracts] Detect request-body additionalProperties:false tightening (elastic#267546) [main] Sync bundled packages with Package Storage (elastic#267644) Centralize phase colors and descriptions (elastic#266680) [Unified Waterfall] Add "Scroll to origin" button (elastic#266594) [APM] Add alert and SLO badges to the service map embeddable (elastic#266360) [CI] Speed up telemetry_check by pre-filtering to collector files (elastic#265978) [Discover] Address flaky large CSV test (elastic#266642) avoid passing unrelated props within integration card icon component conditional render (elastic#266569) [Cases][Templates] Extend cases search by template field label (elastic#266414) [Background search] Migrate custom SplitButton to EuiSplitButton (elastic#267447) [i18n] Report translation coverage during integrate (elastic#264124) [api-docs] 2026-05-05 Daily api_docs build (elastic#267639) [Scout] Update test config manifests (elastic#267636) [content list] Add saved object provider services (elastic#266428) [Fleet] Otel UI add health and implement it in OTelComponentDetail (elastic#267292) Update dependency msw to v2.13.4 (main) (elastic#266770) ...
Summary
Adds alert and SLO health badge support to the APM service map embeddable, bringing it to parity with the full-page service map. Previously, the embeddable rendered nodes without badge data because the badge-fetching and node-merging steps were never ported from the page component.
Changes
Alert & SLO badges in the service map embeddable
The embeddable now calls
useServiceMapBadgesand wrapsServiceMapGraphinServiceMapSloFlyoutProvider, so alert counts and SLO health indicators appear on service nodes and the SLO detail flyout opens on badge click — matching the behaviour of the full-page service map. I've had to update the dependencies provided in the embeddable context provider to meet SLO badge requirements, adding the slo plugin deps as well as the telemetry client.Refactor:
useServiceMapBadgesowns all badge derivationMoved service-name extraction (
isServiceNodeDatafilter + dedup/sort), theenabledguard (license + config + nodes fetch status), and themergeServiceMapNodesWithBadgesmerge step into the hook. Callers now passnodes+nodesStatusand get{ nodes, status }back — removing ~30 lines of identical logic that had been duplicated between the page component and the embeddable.Refactor:
useSloOverviewFlyoutshared hookExtracted the repeated
useState+openSloOverviewFlyout+closeSloOverviewFlyoutpattern into a single hook atcomponents/shared/slo_overview_flyout/use_slo_overview_flyout.ts, consumed by the service map, service inventory table, and service header template.Enhancement: Badge loading status handling
Before this PR, the way we were handling loading status for node badges was leading to popping issue when the map was rendered.
Screen.Recording.2026-04-29.at.12.49.41.mov
As we can see in the video above, the map loads in 2 phases. First, node and edge data is fetched in order to build the map layout. Once that data is ready, the spinner is hidden and the nodes are painted on the canvas. But, this first fetch does not include alert nor SLO data so the map is painted first without any badge data. Some time after the first paint, badge data finishes fetching and then badges suddenly pop into view for the relevant nodes in the map.
This issue is not tied to race conditions or latency issues. While higher latencies can increase the time it takes for the badges to appear, this issue will always happen regardless of network conditions. Why? Because in order to load alert & SLO data for the services in the map we first need to know which services will be included in the map view. This means that badge information can only be retrieved after the map topology has finished fetching. Nodes & edges data will always load before badge data.
In order to prevent popping, this PR reworks the load state management for both the embeddable and full map components. The map will be considered to be loading (and thus the spinner will be shown) when topology or badge data is loading. This way, we delay the map paint until we have enough data to paint nodes, edges and badges at the same time. While this may increase the time it takes for the map to load it will prevent the need for a second render pass that includes badges and changes the map after it has already been laid out.
Here's another demo of how loading the map looks now after taking badge status into account
Screen.Recording.2026-04-29.at.12.48.53.mov
Fix: Unstable history reference in
ApmEmbeddableContextAs pointed out by @jennypavlova here, there was an issue where the alert and SLO badges weren't reacting properly to updates on the embeddable's own date picker when redirecting to the details pages. While the data on the map itself did work accordingly to the time range selected (the badges would properly show/hide dynamically depending on the data present for the selected range), redirecting would always use the initial date the embeddable component had when first rendered and would ignore live updates until the page was reloaded.
Then, I started digging into how the href for the badge redirections were built. Starting from
ServiceNodewhich is the component that renders nodes for services in the map I traced the following path:ServiceNodecallsuseServiceMapAlertsTabNavigate, which callsuseServiceMapAlertsTabHref. That hook callsuseAnyOfApmParams('/service-map', ...)to obtain the current
query(which includesrangeFrom/rangeTo) and passes it toapmRouter.link(...)to build the alerts tab URL.useAnyOfApmParams()→useParams()→useLocation()fromreact-router-dom.With this render path in mind and knowing that the issue is only observed in the redirection, it seemed reasonable to assume that the staleness issues was coming from somewhere in the routing config for the embeddable. Tracing into React Router's source code confirmed the root cause. React Router v5's Router class component wires up history.listen() in its constructor and componentDidMount, and has no componentDidUpdate that re-subscribes when the history prop changes. There's even a dev warning (visible in the browser console in Kibana) which warns users of changes on the history reference.
So when a new history instance arrived as a prop:
The source of the problem is in these lines from
ApmEmbeddableContextby which a completely new History object would be created on every parameter update, like date range. The solution introduced by this PR in c4c901b is to create a stable history object stored via
useRefwhen the context provider is first mounted. Then, a separateuseEffecthooks handles reacting to parameter updates by callinghistory.replace()with the new URL containing updated parameters. This way we avoid recreating the whole history and instead use the expected mechanisms to push history updates ensuring subscribers (such as theReactRoutercomponent) react properly to updatesUI Demo
Screen.Recording.2026-04-29.at.15.19.20.mov
Testing
violatedordegradingSLO status./app/apm/service-map) still renders badges and flyout correctly (no regression).References
Closes #265994