[Stack Monitoring] Support for apm-server/beats Agent subprocesses #144701

klacabane · 2022-11-07T13:14:48Z

Summary

An elastic-agent may spawn beats and apm-server subprocesses depending on its configuration. When that's the case the metrics for these processes will be ingested under metrics-elastic_agent.(apm-server|metricbeat)-*.

Stack Monitoring should be able to interpret and surface these processes just like we do with the metricbeat data stored in .monitoring-beats-mb.

Right now the mappings for these streams (stored in the elastic_agent package) are not aligned with Stack Monitoring expectations, so adding the metrics-elastic_agent.* pattern to the appropriate queries won't be enough. We'll either have to update the mappings to carry the legacy aliases or update the queries to also look for the ECS format. The former approach is less work and is consistent with the other stack packages (es/kibana/logstash), so I'm inclined to go that route.

Update SM server to read from metrics-elastic_agent.* pattern in appropriate places
Ensure existing ecs mappings are exhaustive
Add legacy aliases pointing to the ecs fields (use .monitoring-beats-mb as reference)
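
A minimal sketch of what "legacy aliases pointing to the ecs fields" could look like, assuming the aliases are plain Elasticsearch field aliases (`type: alias` with a `path`). The helper name `buildLegacyAliases` and the field pair shown are illustrative assumptions, not taken from the elastic_agent package; the real list would come from comparing against .monitoring-beats-mb.

```typescript
// Hypothetical helper: maps legacy Stack Monitoring field names to the ECS-style
// fields they should alias. Names here are illustrative only.
interface AliasMapping {
  type: 'alias';
  path: string;
}

// Build a flat `properties` object of Elasticsearch field aliases,
// keyed by the full legacy field path.
function buildLegacyAliases(pairs: Record<string, string>): Record<string, AliasMapping> {
  const properties: Record<string, AliasMapping> = {};
  for (const legacyField of Object.keys(pairs)) {
    properties[legacyField] = { type: 'alias', path: pairs[legacyField] };
  }
  return properties;
}

// Example pair (assumed, for illustration): legacy field -> ECS field.
const aliases = buildLegacyAliases({
  'beats_stats.metrics.libbeat.output.events.acked': 'beat.stats.libbeat.output.events.acked',
});
console.log(JSON.stringify(aliases, null, 2));
```

Generating the aliases from a single list of pairs would keep the legacy-to-ECS correspondence reviewable in one place, which matters given the maintenance concern raised later in the thread.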

AC

Stack Monitoring surfaces apm-server and beats spawned by elastic-agent
apm-server/beats Views are similarly populated when powered by metricbeat or agent


elasticmachine · 2022-11-07T13:14:50Z

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

joshdover · 2022-11-07T13:39:04Z

@klacabane @cmacknz do you see any downside to supporting the Beats metrics as they are in the SM UI in terms of forwards compatibility? In the future v2 Agent architecture, we're likely to break out some inputs from filebeat and metricbeat into separate processes. Will this cause a breaking change for the SM UI if we start supporting these metrics now?

cmacknz · 2022-11-07T14:48:20Z

Do you see any downside to supporting the Beats metrics as they are in the SM UI in terms of forwards compatibility? In the future v2 Agent architecture, we're likely to break out some inputs from filebeat and metricbeat into separate processes. Will this cause a breaking change for the SM UI if we start supporting these metrics now?

In addition to changing the input architecture, we are also adding new metrics and given the chance would probably like to deprecate some of the existing Beat metrics.

I don't think there is a great understanding of the implications of adding or changing metrics on stack monitoring within the agent team. In general for the existing Beat metrics we just never change them to avoid breaking anything, but this also means we don't improve them.

I am in favour of deferring this until the changes to the agent architecture are complete if we can. If we don't want to defer this, we should review the existing metrics and align on which ones we want to keep in the long term.

klacabane · 2022-11-07T15:04:38Z

I'm not very inclined to couple Stack Monitoring to the agent internals, as stated here #120415 (comment). Adding the legacy mappings will also make that dependency two-way, which is not desirable when we already have a new agent architecture planned and likely different document shapes.

Now it depends on how far ahead those changes are, but Stack Monitoring is only looking at maintaining feature parity with metricbeat collection, and we don't plan on improving the existing solution moving forward as it will be superseded. The future solution is based on packages, and I'd say this would be a good opportunity to already invest in building dashboards directly in the elastic-agent package instead of supporting them in SM.

klacabane · 2022-11-22T11:43:44Z

Summarizing spread out pieces about apm and beats monitoring in Agent world:

For standalone mode, we’ll create a Beats package (#144995) that will spawn the corresponding metricbeat module so that standalone processes, apm-server included, can still be monitored and surfaced in Stack Monitoring.

For agent subprocesses, they are a detail of the Agent internals. If we want to leak this information we should be careful and avoid spreading it to Stack Monitoring because it will confuse users that didn’t explicitly set up beats and see them in Stack Monitoring (Agents are not shown in SM). I suggest keeping that information encapsulated within the Elastic-Agent package and creating Dashboards that monitor the processes. To ease discoverability of these dashboards, Stack Monitoring could detect whether metrics are collected by agents and offer a quick link to the relevant Dashboards.
@cmacknz is that a reasonable path forward?
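
The detection idea above could be sketched roughly as follows, assuming Stack Monitoring already has a list of data stream names available. The `metrics-elastic_agent.` prefix comes from this issue; the function names and the dashboard URL parameter are hypothetical.

```typescript
// Data stream prefix taken from the issue description.
const AGENT_METRICS_PREFIX = 'metrics-elastic_agent.';

// Returns true when at least one data stream belongs to the agent metrics family.
function hasAgentCollectedMetrics(dataStreamNames: string[]): boolean {
  return dataStreamNames.some((name) => name.indexOf(AGENT_METRICS_PREFIX) === 0);
}

// Returns the dashboard link to surface in the UI, or undefined when
// no agent-collected metrics were found. `dashboardUrl` is a placeholder.
function agentDashboardLink(
  dataStreamNames: string[],
  dashboardUrl: string
): string | undefined {
  return hasAgentCollectedMetrics(dataStreamNames) ? dashboardUrl : undefined;
}
```

This keeps Stack Monitoring unaware of the document shape inside the agent streams: it only checks for their presence and hands off to the package dashboards.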

APM running under agent is trickier because the internals are already exposed: one directly configures the server when adding the integration, and this is specified in the documentation[1]. The server is also unlikely to be replaced or refactored like other beats. No strong opinion here. We could read from elastic_agent.apm* in the Stack Monitoring UI, but this could be the opportunity to already invest in the future monitoring approach by building these dashboards directly in a package, possibly the APM package since it already exists and is a prerequisite to using APM in agent.
@simitt do you think we should already redirect users to Dashboards instead of SM when using the apm integration?

[1]

cmacknz · 2022-11-23T20:08:14Z

I suggest keeping that information encapsulated within the Elastic-Agent package and creating Dashboards that monitor the processes. To ease discoverability of these dashboards, Stack Monitoring could detect whether metrics are collected by agents and offer a quick link to the relevant Dashboards.
@cmacknz is that reasonable path forward ?

I think this makes sense, we really don't want to expose the implementation details of the agent for monitoring. We are starting to think more about how to improve the existing agent monitoring dashboards within Fleet, which we should avoid duplicating. @joshdover or @kpollich likely have some valuable input here on how to tie this into stack monitoring for agent.

simitt · 2022-12-01T08:18:12Z

@klacabane we haven't yet decided how to move forward with this topic, but we have internal conversations on whether to build a dedicated UI or dashboards.

klacabane · 2022-12-13T11:27:24Z

Thanks! So we need a way to monitor apm-server under agent until an alternative exists. If we want to read from metrics-elastic_agent.apm-server-* we'll have to carry over the legacy SM aliases to the elastic-agent mappings. Can an apm-server subprocess expose its metrics endpoints so it can be monitored like a standalone one? That's maybe how elastic-agent itself collects these metrics, and it could be the temporary alternative.

simitt · 2022-12-13T13:26:09Z

Can an apm-server subprocess expose its metrics endpoints so it can be monitored like a standalone one ?

Yes, that's how it is set up on ESS. But the configuration has to be passed down from the Elastic Agent (agent.monitoring.http.enabled: true), also see relevant docs.

klacabane · 2022-12-13T14:39:43Z

Would that be an acceptable solution for users ? My concern is polluting the elastic-agent mapping with Stack monitoring-related aliases and the maintenance that goes with it.

The beats package will be available starting with 8.7.0, and one could follow the same steps as described in the docs, but with the agent package instead of the metricbeat module.

simitt · 2022-12-23T06:59:48Z

Let's include @joshdover and @cmacknz for this question. From an APM perspective, I think it is fine to use a configuration option for exposing a metrics endpoint, but I'm not certain that is aligned with the elastic-agent team's vision.

joshdover · 2023-01-04T12:15:47Z

Braindump of opinions:

Any process that is run by agent should have its monitoring orchestrated by the agent automatically if agent.monitoring.metrics: true is enabled. No other configuration should be required by the user, except installing the elastic_agent integration package.
I don't see a strong use case for using Agent to monitor standalone APM Server. If you're running standalone APM Server and want to monitor it, you should either use Beats or switch to Elastic Agent to run APM Server and its monitoring.
I don't think we should couple any monitoring data that Agent collects to the Stack Monitoring UI. We have an opportunity here to reduce the matrix of combinations and focus only on the future vision of monitoring our products, based on the Platform Observability design of using OTLP + integration dashboards.
Given the above, if APM wants to have APM-specific UI views for Agent-managed APM Server, we should add dashboards for this to the elastic_agent package. If Agent is not collecting the right metrics for APM Server, we need to make changes to Agent to configure APM Server and Metricbeat correctly to ingest these.

klacabane · 2023-01-04T18:48:16Z

Thanks @joshdover

Looks like we're aligned on defining clear boundaries between Agent and Stack monitoring.
Summarizing the monitoring scenarios:

standalone beats and apm-server are monitored by the Beats package. While I agree that there is no strong use case for standalone apm-server <- agent monitoring, this comes out of the box since apm-server shares the same internals with beats processes. We don't need to document this possibility. The monitoring data will be surfaced in the Stack Monitoring app.
elastic-agent beats subprocesses are monitored by the elastic-agent package. The metrics are surfaced in the [Elastic Agent] Agent metrics dashboard.
elastic-agent apm-server subprocesses are also monitored by the Agent metrics dashboard, but the analysis is less thorough than the Stack Monitoring dashboards (e.g. processed events breakdown, uptime). We should aim at building these additional dashboards in packages. These dashboards could be included in the elastic-agent package, but could also be included in the apm package since it's a prerequisite to spawning a functioning server. Either way both packages will be installed, so that's maybe an ownership question.

With these scenarios, no agent-produced data (metrics-elastic_agent*) will be surfaced in Stack Monitoring but instead directly consumed in packages.
@cmacknz @simitt does that sound good ?

cmacknz · 2023-01-04T18:52:16Z

👍 from me

simitt · 2023-01-05T14:56:52Z

SGTM; would prefer the dashboards being part of the apm package than the elastic-agent package

joshdover · 2023-01-10T10:47:43Z

SGTM. @simitt yes let's put them in the apm package to clarify ownership. One caveat is that the data streams are currently defined in elastic_agent package and we're currently planning some changes to the structure of these data streams. See discussion in elastic/elastic-agent#1814

This just means that we'll need to coordinate changes across these two packages for this change and any future ones.

klacabane · 2023-01-11T16:20:47Z

Are both apm and elastic-agent packages stack-aligned ? Syncing the changes would be a difficult feat otherwise

simitt · 2023-01-11T16:29:19Z

APM packages are bundled with Kibana, and therefore aligned with the Kibana version.

klacabane · 2023-01-12T13:26:23Z

Because apm would rely on data streams defined in the elastic-agent package (which does not appear to be stack bound), I'm concerned about breaking changes going uncaught:

apm package builds a dashboard that relies on metrics-elastic_agent.apm-server.*
elastic-agent package ships breaking change to the mappings/stream
apm dashboards are broken for users that updated the elastic-agent package

I think these concerns are mostly mitigated by the coupling of the data streams to the elastic-agent internals, so breaking changes would only happen when shipping a new stack version, with a long enough window to catch potential issues. Coordination would also cover this, but I'm wondering how we could automatically detect such breakage. Maybe a CI step that installs both packages and runs assertions on the dashboard? Do we have such a capability at the moment?
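
One possible shape for such a CI assertion, as a sketch only: compare the fields a dashboard references against the mappings shipped by the package that owns the data stream. How the dashboard fields and mappings are extracted from the packages is left out, and all names here (`MappingNode`, `flattenFields`, `findMissingFields`) are hypothetical.

```typescript
// Simplified view of an Elasticsearch mapping node.
interface MappingNode {
  type?: string;
  properties?: Record<string, MappingNode>;
}

// Flatten nested `properties` into a list of dotted leaf field paths.
function flattenFields(node: MappingNode, prefix: string = ''): string[] {
  const fields: string[] = [];
  const props = node.properties || {};
  for (const name of Object.keys(props)) {
    const path = prefix ? `${prefix}.${name}` : name;
    const child = props[name];
    if (child.properties) {
      // Object field: recurse and collect its leaves.
      for (const nested of flattenFields(child, path)) {
        fields.push(nested);
      }
    } else {
      fields.push(path);
    }
  }
  return fields;
}

// Return every dashboard field that the mappings no longer define.
function findMissingFields(dashboardFields: string[], mappings: MappingNode): string[] {
  const available = flattenFields(mappings);
  return dashboardFields.filter((field) => available.indexOf(field) === -1);
}
```

A CI job could fail whenever `findMissingFields` returns a non-empty list, which would catch a removed or renamed field before the dashboards break for users.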

klacabane · 2023-03-06T16:31:22Z

Closing this as the initial discussion came to a conclusion and the dependency concern between packages is out of scope. Ownership can be discussed when planning for new dashboards.

klacabane added the Team:Infra Monitoring UI - DEPRECATED, Feature:Stack Monitoring, and v8.6.0 labels Nov 7, 2022

This was referenced Nov 7, 2022

[Stack Monitoring] Investigate reading apm-server metrics from agent indice #143265

Closed

[Stack Monitoring] Support for integrations #120415

Closed

klacabane changed the title ~~[Stack Monitoring] Support for apm-server/beats metrics~~ [Stack Monitoring] Support for apm-server/beats Agent subprocesses Nov 22, 2022

klacabane added v8.7.0 and removed v8.6.0 labels Nov 24, 2022

klacabane closed this as completed Mar 6, 2023
