[Stack Monitoring] Support for apm-server/beats Agent subprocesses #144701

Closed
3 tasks
Tracked by #120415
klacabane opened this issue Nov 7, 2022 · 20 comments
Labels
Feature:Stack Monitoring, Team:Infra Monitoring UI - DEPRECATED, v8.7.0

Comments

@klacabane
Contributor

klacabane commented Nov 7, 2022

Summary

An elastic-agent may spawn beats and apm-server subprocesses depending on its configuration. When that's the case the metrics for these processes will be ingested under metrics-elastic_agent.(apm-server|metricbeat)-*.

Stack Monitoring should be able to interpret and surface these processes just like we do with the metricbeat data stored in .monitoring-beats-mb.

Right now the mappings for these streams (stored in the elastic_agent package) are not aligned with Stack Monitoring expectations, so adding the metrics-elastic_agent.* pattern to the appropriate queries won't be enough. We'll either have to update the mappings to carry the legacy aliases or update the queries to also look for the ECS format. The former approach is less work and is consistent with the other stack packages (es/kibana/logstash), so I'm inclined to go that route.

  • Update SM server to read from metrics-elastic_agent.* pattern in appropriate places
  • Ensure existing ecs mappings are exhaustive
  • Add legacy aliases pointing to the ecs fields (use .monitoring-beats-mb as reference)
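
The alias task above can be pictured with a minimal sketch, assuming the legacy names are exposed through Elasticsearch's `alias` field type. The field pairs are illustrative placeholders rather than the actual contents of the .monitoring-beats-mb template, and `build_alias_properties` is a hypothetical helper, not part of any package.

```python
from typing import Dict


def build_alias_properties(legacy_to_ecs: Dict[str, str]) -> Dict[str, dict]:
    """Build a nested `properties` mapping in which each legacy field
    is declared as an alias pointing at its ECS counterpart."""
    root: Dict[str, dict] = {}
    for legacy_path, ecs_path in legacy_to_ecs.items():
        node = root
        parts = legacy_path.split(".")
        # Walk (or create) the intermediate object fields.
        for part in parts[:-1]:
            node = node.setdefault(part, {"properties": {}})["properties"]
        # The leaf is the alias itself.
        node[parts[-1]] = {"type": "alias", "path": ecs_path}
    return root


# Illustrative pairs only (assumed, not copied from the real template).
LEGACY_TO_ECS = {
    "beats_stats.beat.uuid": "beat.id",
    "beats_stats.metrics.libbeat.output.events.acked": "beat.stats.libbeat.output.events.acked",
}

properties = build_alias_properties(LEGACY_TO_ECS)
```

With aliases like these in place, a query written against the legacy field names would resolve to the ECS fields without changes to the query itself, which is why this route is less work than rewriting the queries.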

AC

  • Stack Monitoring surfaces apm-server and beats spawned by elastic-agent
  • apm-server/beats Views are similarly populated when powered by metricbeat or agent
@klacabane added the Team:Infra Monitoring UI - DEPRECATED, Feature:Stack Monitoring, and v8.6.0 labels Nov 7, 2022
@elasticmachine
Contributor

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

@joshdover
Contributor

@klacabane @cmacknz do you see any downside to supporting the Beats metrics as they are in the SM UI in terms of forwards compatibility? In the future v2 Agent architecture, we're likely to break out some inputs from filebeat and metricbeat into separate processes. Will this cause a breaking change for the SM UI if we start supporting these metrics now?

@cmacknz
Member

cmacknz commented Nov 7, 2022

Do you see any downside to supporting the Beats metrics as they are in the SM UI in terms of forwards compatibility? In the future v2 Agent architecture, we're likely to break out some inputs from filebeat and metricbeat into separate processes. Will this cause a breaking change for the SM UI if we start supporting these metrics now?

In addition to changing the input architecture, we are also adding new metrics and given the chance would probably like to deprecate some of the existing Beat metrics.

I don't think there is a great understanding of the implications of adding or changing metrics on stack monitoring within the agent team. In general for the existing Beat metrics we just never change them to avoid breaking anything, but this also means we don't improve them.

I am in favour of deferring this until the changes to the agent architecture are complete if we can. If we don't want to defer this, we should review the existing metrics and align on which ones we want to keep in the long term.

@klacabane
Contributor Author

I'm not very inclined to couple Stack Monitoring to the agent internals, as stated here: #120415 (comment). Adding the legacy mappings will also make that dependency two-way, which is not desirable when we already have a new agent architecture planned and likely different document shapes.

Now it depends on how far ahead those changes are, but Stack Monitoring is only looking at maintaining feature parity with metricbeat collection, and we don't plan on improving the existing solution going forward as it will be superseded. The future solution is based on packages, and I'd say this would be a good opportunity to already invest in building dashboards directly in the elastic agent package instead of supporting it in SM.

@klacabane changed the title from "[Stack Monitoring] Support for apm-server/beats metrics" to "[Stack Monitoring] Support for apm-server/beats Agent subprocesses" Nov 22, 2022
@klacabane
Contributor Author

Summarizing spread out pieces about apm and beats monitoring in Agent world:

For standalone mode, we’ll create a Beats package (#144995) that will spawn the corresponding metricbeat module so that standalone processes, apm-server included, can still be monitored and surfaced in Stack Monitoring.

For agent subprocesses, they are a detail of the Agent internals. If we want to leak this information we should be careful and avoid spreading it to Stack Monitoring because it will confuse users that didn’t explicitly set up beats and see them in Stack Monitoring (Agents are not shown in SM). I suggest keeping that information encapsulated within the Elastic-Agent package and creating Dashboards that monitor the processes. To ease discoverability of these dashboards, Stack Monitoring could detect whether metrics are collected by agents and offer a quick link to the relevant Dashboards.
@cmacknz is that a reasonable path forward?
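
The detection idea could look roughly like the sketch below, assuming Stack Monitoring can list the data stream names it has access to. The pattern comes from the issue summary; the helper name and the sample stream names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Pattern from the issue summary.
AGENT_PATTERN = "metrics-elastic_agent.*"


def agent_collected_streams(data_streams: Iterable[str]) -> List[str]:
    """Return the data streams that match the agent monitoring pattern."""
    return [name for name in data_streams if fnmatchcase(name, AGENT_PATTERN)]


# Sample names for illustration only.
streams = [
    "metrics-elastic_agent.metricbeat-default",
    "metrics-elastic_agent.apm-server-default",
    ".monitoring-beats-8-mb",
]

found = agent_collected_streams(streams)
# Offer the quick link to the agent dashboards only when agent data exists.
show_dashboard_link = bool(found)
```

This keeps Stack Monitoring unaware of the document shape: it only checks for the presence of agent-produced streams and hands off to the package dashboards.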

APM running under agent is trickier because the internals are already exposed: one is directly configuring the server when adding the integration, and this is specified in the documentation[1]. The server is also unlikely to be replaced or refactored like other beats. No strong opinion here: we could read from elastic_agent.apm* in the Stack Monitoring UI, but this could be the opportunity to already invest in the future monitoring approach by building these dashboards directly in a package, possibly the APM package since it already exists and is a prerequisite to using APM in agent.
@simitt do you think we should already redirect users to Dashboards instead of SM when using the apm integration?

[1]
Screenshot 2022-11-21 at 18 54 16

@cmacknz
Member

cmacknz commented Nov 23, 2022

I suggest keeping that information encapsulated within the Elastic-Agent package and creating Dashboards that monitor the processes. To ease discoverability of these dashboards, Stack Monitoring could detect whether metrics are collected by agents and offer a quick link to the relevant Dashboards.
@cmacknz is that a reasonable path forward?

I think this makes sense, we really don't want to expose the implementation details of the agent for monitoring. We are starting to think more about how to improve the existing agent monitoring dashboards within Fleet, which we should avoid duplicating. @joshdover or @kpollich likely have some valuable input here on how to tie this into stack monitoring for agent.

@klacabane added v8.7.0 and removed v8.6.0 labels Nov 24, 2022
@simitt
Contributor

simitt commented Dec 1, 2022

@klacabane we haven't yet decided how to move forward with this topic, but we have internal conversations on whether to build a dedicated UI or dashboards.

@klacabane
Contributor Author

Thanks! So we need a way to monitor apm-server under agent until an alternative exists. If we want to read from metrics-elastic_agent.apm-server-* we'll have to carry over the legacy SM aliases to the elastic-agent mappings. Can an apm-server subprocess expose its metrics endpoints so it can be monitored like a standalone one? That may be how elastic-agent itself collects these metrics, and it could be the temporary alternative.

@simitt
Contributor

simitt commented Dec 13, 2022

Can an apm-server subprocess expose its metrics endpoints so it can be monitored like a standalone one ?

Yes, that's how it is set up on ESS. But the configuration has to be passed down from the Elastic Agent (agent.monitoring.http.enabled: true), also see relevant docs.
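
For readers less used to dotted settings, the flag quoted above is shorthand for a nested key. A minimal sketch of that equivalence follows; `expand_dotted` is a hypothetical helper, and the sketch makes no claim about where the setting sits in a real agent policy or which other keys accompany it.

```python
from typing import Any, Dict


def expand_dotted(key: str, value: Any) -> Dict[str, Any]:
    """Expand a dotted setting name into the nested structure it stands for."""
    nested: Any = value
    # Wrap the value from the innermost key outwards.
    for part in reversed(key.split(".")):
        nested = {part: nested}
    return nested


config = expand_dotted("agent.monitoring.http.enabled", True)
```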

@klacabane
Contributor Author

klacabane commented Dec 13, 2022

Would that be an acceptable solution for users? My concern is polluting the elastic-agent mapping with Stack Monitoring-related aliases and the maintenance that goes with it.

The beats package will be available starting with 8.7.0, and one could follow the same steps as described in the docs but with the agent package instead of the metricbeat module.

@simitt
Contributor

simitt commented Dec 23, 2022

Let's include @joshdover and @cmacknz for this question. From an APM perspective, I think it is fine to use a configuration option for exposing a metrics endpoint, but I'm not certain if that is aligned with the elastic-agent team's vision.

@joshdover
Contributor

Braindump of opinions:

  • Any process that is run by agent should have its monitoring orchestrated by the agent automatically if agent.monitoring.metrics: true is enabled. No other configuration should be required by the user, except installing the elastic_agent integration package.
  • I don't see a strong use case for using Agent to monitor standalone APM Server. If you're running standalone APM Server and want to monitor it, you should either use Beats or switch to Elastic Agent to run APM Server and its monitoring.
  • I don't think we should couple any monitoring data that Agent collects to the Stack Monitoring UI. We have an opportunity here to reduce the matrix of combinations and focus only on the future vision of monitoring our products, based on the Platform Observability design of using OTLP + integration dashboards.
  • Given the above, if APM wants to have APM-specific UI views for Agent-managed APM Server, we should add dashboards for this to the elastic_agent package. If Agent is not collecting the right metrics for APM Server, we need to make changes to Agent to configure APM Server and Metricbeat correctly to ingest these.

@klacabane
Contributor Author

klacabane commented Jan 4, 2023

Thanks @joshdover

Looks like we're aligned on defining clear boundaries between Agent and Stack monitoring.
Summarizing the monitoring scenarios:

  • standalone beats and apm-server are monitored by the Beats package. While I agree that there is no strong use case for standalone apm-server <- agent monitoring, this comes out of the box since apm-server shares the same internals with beats processes. We don't need to document this possibility. The monitoring data will be surfaced in the Stack Monitoring app.
  • elastic-agent beats subprocesses are monitored by the elastic-agent package. The metrics are surfaced in the [Elastic Agent] Agent metrics dashboard.
  • elastic-agent apm-server subprocesses are also monitored by the Agent metrics dashboard, but the analysis is less thorough than the Stack Monitoring dashboards (e.g. processed events breakdown, uptime). We should aim at building these additional dashboards in packages. These dashboards could be included in the elastic-agent package, but could also be included in the apm package since it's a prerequisite to spawning a functioning server. Either way both packages will be installed, so that's maybe an ownership question.

With these scenarios, no agent-produced data (metrics-elastic_agent*) will be surfaced in Stack Monitoring but instead directly consumed in packages.
@cmacknz @simitt does that sound good?

@cmacknz
Member

cmacknz commented Jan 4, 2023

👍 from me

@simitt
Contributor

simitt commented Jan 5, 2023

SGTM; would prefer the dashboards being part of the apm package than the elastic-agent package

@joshdover
Contributor

SGTM. @simitt yes let's put them in the apm package to clarify ownership. One caveat is that the data streams are currently defined in elastic_agent package and we're currently planning some changes to the structure of these data streams. See discussion in elastic/elastic-agent#1814

This just means that we'll need to coordinate changes across these two packages for this change and any future ones.

@klacabane
Contributor Author

Are both the apm and elastic-agent packages stack-aligned? Syncing the changes would be a difficult feat otherwise.

@simitt
Contributor

simitt commented Jan 11, 2023

APM packages are bundled with Kibana, and therefore aligned with the Kibana version.

@klacabane
Contributor Author

klacabane commented Jan 12, 2023

Because apm would rely on data streams defined in the elastic-agent package (which does not appear to be stack bound), I'm concerned about breaking changes going uncaught:

  1. apm package builds a dashboard that relies on metrics-elastic_agent.apm-server.*
  2. elastic-agent package ships a breaking change to the mappings/stream
  3. apm dashboards are broken for users that updated the elastic-agent package

I think these concerns are mostly mitigated by the coupling of the data streams to the elastic-agent internals, so those would only happen when shipping a new stack version, with a long enough window to catch potential issues. Coordination would also cover this, but I'm wondering how we could automatically detect such breakage. Maybe a CI step that installs both packages and runs assertions on the dashboard? Do we have such a capability at the moment?
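
One possible shape for such an assertion, as a rough sketch only: flatten the data stream mappings into dotted field paths and check that every field the dashboard references is still present. The field names, the sample mappings, and both helpers are hypothetical; a real check would have to load the mappings and the dashboard's fields from the installed packages.

```python
from typing import Dict, Iterator, Set


def iter_field_paths(properties: Dict[str, dict], prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every leaf field in a mappings `properties` tree."""
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object field: recurse into its children.
            yield from iter_field_paths(spec["properties"], prefix=f"{path}.")
        else:
            yield path


def missing_fields(required: Set[str], mappings: Dict[str, dict]) -> Set[str]:
    """Return the dashboard fields that the mappings no longer define."""
    return required - set(iter_field_paths(mappings.get("properties", {})))


# Hypothetical fields referenced by an apm dashboard.
dashboard_fields = {"beat.id", "beat.stats.uptime.ms"}

# Hypothetical mappings after a breaking change renamed beat.id to beat.uuid.
stream_mappings = {
    "properties": {
        "beat": {
            "properties": {
                "uuid": {"type": "keyword"},
                "stats": {"properties": {"uptime": {"properties": {"ms": {"type": "long"}}}}},
            }
        }
    }
}

broken = missing_fields(dashboard_fields, stream_mappings)
```

A CI job could fail whenever `broken` is non-empty, which would surface step 2 of the scenario above before users hit step 3.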

@klacabane
Contributor Author

Closing this as the initial discussion came to a conclusion and the dependency concern between packages is out of scope. Ownership can be discussed when planning for new dashboards.

5 participants