Skip to content

[APM] Add alert and SLO badges to the service map embeddable#266360

Merged
AlejandroFrndz merged 12 commits intoelastic:mainfrom
AlejandroFrndz:265994-enable-badges-service-map-embeddable
May 5, 2026
Merged

[APM] Add alert and SLO badges to the service map embeddable#266360
AlejandroFrndz merged 12 commits intoelastic:mainfrom
AlejandroFrndz:265994-enable-badges-service-map-embeddable

Conversation

@AlejandroFrndz
Copy link
Copy Markdown
Contributor

@AlejandroFrndz AlejandroFrndz commented Apr 29, 2026

Summary

Adds alert and SLO health badge support to the APM service map embeddable, bringing it to parity with the full-page service map. Previously, the embeddable rendered nodes without badge data because the badge-fetching and node-merging steps were never ported from the page component.

Changes

Alert & SLO badges in the service map embeddable

The embeddable now calls useServiceMapBadges and wraps ServiceMapGraph in ServiceMapSloFlyoutProvider, so alert counts and SLO health indicators appear on service nodes and the SLO detail flyout opens on badge click — matching the behaviour of the full-page service map. I've had to update the dependencies provided in the embeddable context provider to meet SLO badge requirements, adding the slo plugin deps as well as the telemetry client.

Refactor: useServiceMapBadges owns all badge derivation

Moved service-name extraction (isServiceNodeData filter + dedup/sort), the enabled guard (license + config + nodes fetch status), and the mergeServiceMapNodesWithBadges merge step into the hook. Callers now pass nodes + nodesStatus and get { nodes, status } back — removing ~30 lines of identical logic that had been duplicated between the page component and the embeddable.

Refactor: useSloOverviewFlyout shared hook

Extracted the repeated useState + openSloOverviewFlyout + closeSloOverviewFlyout pattern into a single hook at components/shared/slo_overview_flyout/use_slo_overview_flyout.ts, consumed by the service map, service inventory table, and service header template.

Enhancement: Badge loading status handling

Before this PR, the way we were handling loading status for node badges was leading to popping issue when the map was rendered.

Screen.Recording.2026-04-29.at.12.49.41.mov

As we can see in the video above, the map loads in 2 phases. First, node and edge data is fetched in order to build the map layout. Once that data is ready, the spinner is hidden and the nodes are painted on the canvas. But, this first fetch does not include alert nor SLO data so the map is painted first without any badge data. Some time after the first paint, badge data finishes fetching and then badges suddenly pop into view for the relevant nodes in the map.

This issue is not tied to race conditions or latency issues. While higher latencies can increase the time it takes for the badges to appear, this issue will always happen regardless of network conditions. Why? Because in order to load alert & SLO data for the services in the map we first need to know which services will be included in the map view. This means that badge information can only be retrieved after the map topology has finished fetching. Nodes & edges data will always load before badge data.

In order to prevent popping, this PR reworks the load state management for both the embeddable and full map components. The map will be considered to be loading (and thus the spinner will be shown) when topology or badge data is loading. This way, we delay the map paint until we have enough data to paint nodes, edges and badges at the same time. While this may increase the time it takes for the map to load it will prevent the need for a second render pass that includes badges and changes the map after it has already been laid out.

Here's another demo of how loading the map looks now after taking badge status into account

Screen.Recording.2026-04-29.at.12.48.53.mov

Fix: Unstable history reference in ApmEmbeddableContext

As pointed out by @jennypavlova here, there was an issue where the alert and SLO badges weren't reacting properly to updates on the embeddable's own date picker when redirecting to the details pages. While the data on the map itself did work accordingly to the time range selected (the badges would properly show/hide dynamically depending on the data present for the selected range), redirecting would always use the initial date the embeddable component had when first rendered and would ignore live updates until the page was reloaded.

Then, I started digging into how the href for the badge redirections were built. Starting from ServiceNode which is the component that renders nodes for services in the map I traced the following path:

ServiceNode calls useServiceMapAlertsTabNavigate, which calls
useServiceMapAlertsTabHref. That hook calls useAnyOfApmParams('/service-map', ...)
to obtain the current query (which includes rangeFrom/rangeTo) and passes it to
apmRouter.link(...) to build the alerts tab URL. useAnyOfApmParams()useParams()useLocation() from react-router-dom.

With this render path in mind and knowing that the issue is only observed in the redirection, it seemed reasonable to assume that the staleness issues was coming from somewhere in the routing config for the embeddable. Tracing into React Router's source code confirmed the root cause. React Router v5's Router class component wires up history.listen() in its constructor and componentDidMount, and has no componentDidUpdate that re-subscribes when the history prop changes. There's even a dev warning (visible in the browser console in Kibana) which warns users of changes on the history reference.

So when a new history instance arrived as a prop:

  • RouterContext reflected the new history object...
  • ...but this.state.location was still controlled by the original history's listen subscription, which was never called because the original object never navigated anywhere.
  • useLocation() therefore always returned the initial URL with the original rangeFrom/rangeTo.
  • Every alerts tab link built by useServiceMapAlertsTabHref used those stale params.

The source of the problem is in these lines from ApmEmbeddableContext

const history = useMemo(
  () =>
    createMemoryHistory({
      initialEntries: [`/service-map?rangeFrom=${rangeFrom}&rangeTo=${rangeTo}&...`],
    }),
  [rangeFrom, rangeTo, kuery]  // new object on every date change
);

by which a completely new History object would be created on every parameter update, like date range. The solution introduced by this PR in c4c901b is to create a stable history object stored via useRef when the context provider is first mounted. Then, a separate useEffect hooks handles reacting to parameter updates by calling history.replace() with the new URL containing updated parameters. This way we avoid recreating the whole history and instead use the expected mechanisms to push history updates ensuring subscribers (such as the ReactRouter component) react properly to updates

UI Demo

Screen.Recording.2026-04-29.at.15.19.20.mov

Testing

  • Load a dashboard containing the APM service map embeddable on a Platinum-licensed stack with active alerts and/or SLOs configured for APM services. Confirm:
    • Alert badges (red count bubble) appear on affected service nodes.
    • SLO badges appear on nodes with violated or degrading SLO status.
    • Clicking an SLO badge opens the SLO overview flyout without errors in the console.
  • Verify the full-page service map (/app/apm/service-map) still renders badges and flyout correctly (no regression).

References

Closes #265994

@AlejandroFrndz AlejandroFrndz added release_note:enhancement backport:skip This PR does not require backporting Team:obs-presentation Focus: APM UI, Infra UI, Hosts UI, Universal Profiling, Obs Overview and left Navigation v9.5.0 release_note:skip Skip the PR/issue when compiling release notes and removed release_note:enhancement labels Apr 29, 2026
@AlejandroFrndz AlejandroFrndz self-assigned this Apr 29, 2026
@AlejandroFrndz AlejandroFrndz force-pushed the 265994-enable-badges-service-map-embeddable branch from ea05a9b to b291017 Compare April 29, 2026 12:05
@AlejandroFrndz AlejandroFrndz marked this pull request as ready for review April 29, 2026 14:05
@AlejandroFrndz AlejandroFrndz requested a review from a team as a code owner April 29, 2026 14:05
@infra-vault-gh-plugin-prod
Copy link
Copy Markdown

Pinging @elastic/obs-presentation-team (Team:obs-presentation)

@macroscopeapp
Copy link
Copy Markdown
Contributor

macroscopeapp Bot commented Apr 29, 2026

Approvability

Verdict: Needs human review

This PR adds new user-facing functionality (alert and SLO badges) to the service map embeddable, introducing new hooks, context providers, and UI behavior. The author does not own any of the changed files, which are all owned by @elastic/obs-presentation-team who should review this new feature.

You can customize Macroscope's approvability policy. Learn more.

@AlejandroFrndz AlejandroFrndz force-pushed the 265994-enable-badges-service-map-embeddable branch from 3ffe26d to 06f9401 Compare April 29, 2026 15:04
@jennypavlova jennypavlova self-requested a review April 30, 2026 08:44
Copy link
Copy Markdown
Member

@jennypavlova jennypavlova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the badges and the fixes/tests! Overall it works fine, I found an issue with the navigation (not sure if it's directly related to the PR): When the time range is changed in the embeddable and then an alert badge is clicked, the resulting alerts page navigation uses the original time range rather than the updated one. The embeddable manages start/end locally (derived from rangeFrom/rangeTo props), but any alert navigation links inside ServiceMapGraph that pull from global URL state or a shared context will see stale values. Worth investigating before merge or tracking as a follow up. Here is an example:

time_range_issue.mov

(As I said we can open a follow up issue for that so everything else is fine ✅ )

Comment on lines +26 to +34
jest.mock('../../../hooks/use_fetcher', () => ({
useFetcher: () => mockUseFetcher(),
FETCH_STATUS: {
LOADING: 'loading',
SUCCESS: 'success',
FAILURE: 'failure',
NOT_INITIATED: 'not_initiated',
},
}));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First thanks for adding the test! NIT: FETCH_STATUS values are hardcoded as string literals in the mock ('loading', 'success', etc.). If the actual enum values ever change, tests will pass with stale values. We can use jest.requireActual instead: FETCH_STATUS: jest.requireActual('../../../hooks/use_fetcher').FETCH_STATUS. wdyt?

Copy link
Copy Markdown
Contributor Author

@AlejandroFrndz AlejandroFrndz May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, that slipped through my review of claude's code, requireActual should be the way to go; I'll update it

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks :)

@AlejandroFrndz
Copy link
Copy Markdown
Contributor Author

@elasticmachine merge upstream

@AlejandroFrndz
Copy link
Copy Markdown
Contributor Author

AlejandroFrndz commented May 4, 2026

Thanks for adding the badges and the fixes/tests! Overall it works fine, I found an issue with the navigation (not sure if it's directly related to the PR): When the time range is changed in the embeddable and then an alert badge is clicked, the resulting alerts page navigation uses the original time range rather than the updated one. The embeddable manages start/end locally (derived from rangeFrom/rangeTo props), but any alert navigation links inside ServiceMapGraph that pull from global URL state or a shared context will see stale values. Worth investigating before merge or tracking as a follow up. Here is an example:

time_range_issue.mov
(As I said we can open a follow up issue for that so everything else is fine ✅ )

Wow, nice catch I hadn't realised that was happening. I'll take a look at what's going on there

Update: Fixed in c4c901b, check the section "Fix: Unstable history reference in ApmEmbeddableContext" in the description for details on root cause and resolution

Copy link
Copy Markdown
Member

@jennypavlova jennypavlova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes, I added some comments and I will test locally in the meantime (I was finishing something and couldn't switch earlier, I am on it now)

Comment on lines +26 to +34
jest.mock('../../../hooks/use_fetcher', () => ({
useFetcher: () => mockUseFetcher(),
FETCH_STATUS: {
LOADING: 'loading',
SUCCESS: 'success',
FAILURE: 'failure',
NOT_INITIATED: 'not_initiated',
},
}));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks :)

if (history.current === null) {
history.current = createMemoryHistory({
initialEntries: [
`/service-map?rangeFrom=${rangeFrom}&rangeTo=${rangeTo}&kuery=${encodeURIComponent(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: The same string template appears twice: once in the useRef lazy init and once in the useEffect. A small inline helper would DRY this:
const buildHistoryUrl = (from: string, to: string, q: string) =>
/service-map?rangeFrom=${from}&rangeTo=${to}&kuery=${encodeURIComponent(q)}&comparisonEnabled=false;

Comment on lines +57 to +63
useEffect(() => {
history.current?.replace(
`/service-map?rangeFrom=${rangeFrom}&rangeTo=${rangeTo}&kuery=${encodeURIComponent(
kuery
)}&comparisonEnabled=false`
);
}, [history, rangeFrom, rangeTo, kuery]);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The useEffect fires after mount and immediately calls history.current?.replace() with a URL that's identical to the one already baked into createMemoryHistory. The history v4 library emits a location event even when the path hasn't changed, so this triggers an unnecessary React Router re-render on mount. Since the location value is the same, it's harmless, but it's work that doesn't need to happen.

A simple guard fixes this — skip the effect on first render using an extra ref:

Suggested change
useEffect(() => {
history.current?.replace(
`/service-map?rangeFrom=${rangeFrom}&rangeTo=${rangeTo}&kuery=${encodeURIComponent(
kuery
)}&comparisonEnabled=false`
);
}, [history, rangeFrom, rangeTo, kuery]);
const isFirstRender = useRef(true);
useEffect(() => {
if (isFirstRender.current) { isFirstRender.current = false; return; }
history.current?.replace(`/service-map?...`);
}, [history, rangeFrom, rangeTo, kuery]);

expect(result.current.nodes).toBe(defaultParams.nodes);
});

it('serviceMapEnabled is false', () => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no test covering the NOT_INITIATED intermediate state when the hook transitions from disabled to enabled, should we have one?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered 👍

@jennypavlova jennypavlova self-requested a review May 4, 2026 16:59
Copy link
Copy Markdown
Member

@jennypavlova jennypavlova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm that the time range is now correct after navigating to the alerts, thanks for fixing that! I am approving to unblock, you can open a follow up with the other changes from the comment if that is easier for you.

@AlejandroFrndz AlejandroFrndz requested a review from a team as a code owner May 4, 2026 18:54
Copy link
Copy Markdown
Contributor

@cauemarcondes cauemarcondes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kibanamachine
Copy link
Copy Markdown
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #139 / integrations When in the Fleet application and on the Endpoint Integration details page should display the endpoint custom content

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
apm 2166 2167 +1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
apm 2.8MB 2.8MB +1.4KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
apm 25.0KB 25.1KB +12.0B

History

cc @AlejandroFrndz

@AlejandroFrndz AlejandroFrndz merged commit 284fe64 into elastic:main May 5, 2026
26 checks passed
mbondyra added a commit to mbondyra/kibana that referenced this pull request May 5, 2026
…ilder_new_vis_attachment

* commit '6fd683609eb6dee81f242f8ff6951edbe3bfd66c': (226 commits)
  Remove Model Author group-by option from external inference endpoints (elastic#264761)
  [Streams][Streamlang] Align ES|QL condition transpiler with Painless on null propagation (elastic#264751)
  chore(axios,workflows-eng): remove axios from workflows connector utils (elastic#267512)
  [failed-test-reporter] avoid opening issues for scout env failures (elastic#267649)
  [kbn-api-contracts] Detect request-body additionalProperties:false tightening (elastic#267546)
  [main] Sync bundled packages with Package Storage (elastic#267644)
  Centralize phase colors and descriptions (elastic#266680)
  [Unified Waterfall] Add "Scroll to origin" button  (elastic#266594)
  [APM] Add alert and SLO badges to the service map embeddable (elastic#266360)
  [CI] Speed up telemetry_check by pre-filtering to collector files (elastic#265978)
  [Discover] Address flaky large CSV test (elastic#266642)
  avoid passing unrelated props within integration card icon component conditional render (elastic#266569)
  [Cases][Templates] Extend cases search by template field label (elastic#266414)
  [Background search] Migrate custom SplitButton to EuiSplitButton (elastic#267447)
  [i18n] Report translation coverage during integrate (elastic#264124)
  [api-docs] 2026-05-05 Daily api_docs build (elastic#267639)
  [Scout] Update test config manifests (elastic#267636)
  [content list] Add saved object provider services (elastic#266428)
  [Fleet] Otel UI add health and implement it in OTelComponentDetail (elastic#267292)
  Update dependency msw to v2.13.4 (main) (elastic#266770)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport:skip This PR does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:obs-presentation Focus: APM UI, Infra UI, Hosts UI, Universal Profiling, Obs Overview and left Navigation v9.5.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[APM] Add alerts and SLO badges to the embeddable and alerts page

5 participants