Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add specs for host.id and profiler registration message #853

Merged
merged 5 commits into from
Mar 27, 2024

Conversation

JonasKunz
Copy link
Contributor

@JonasKunz JonasKunz commented Mar 25, 2024

The UI for profiling data relies on the host.id field for certain calculations, including the cost and CO2-consumption estimates. In order to correctly provide those values for profiling data stored on transaction documents, those need to have the host.id too.

Therefore this spec adds the host.id as an optional, new metadata field for IntakeV2. The field is only required to be populated when an agent actually is correlating transactions with profiling data.

Because the otel-compliant host.id can sometimes only be derived correctly with root permissions, this spec change also adds a way of receiving the host.id directly from the profiling host agent.

  • Create PR as draft
  • Approval by at least one other agent
  • Mark as Ready for Review (automatically requests reviews from all agents and PM via CODEOWNERS)
    • Remove PM from reviewers if impact on product is negligible
    • Remove agents from reviewers if the change is not relevant for them
  • Merge after 2 business days passed without objections
    To auto-merge the PR, add /schedule YYYY-MM-DD to the PR description.

@@ -75,6 +76,13 @@ hostname if `configured_hostname` is not provided.
Agents that are APM-Server-version-aware, or that are compatible only with versions >= 7.4, should
use the new fields wherever applicable.

#### Host.id

Agents MAY collect the `host.id` as an unique identifier for the host.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we have two agents involved here (the APM agent and the profiling host agent) we should always qualify which agent we actually mean. Also there are inconsistencies in terminology throughout the document. We call the APM agent:

  • APM agent
  • APM-agent
  • agent

and the profiling host agent:

  • profiling host agent
  • profiler host agent
  • host agent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've clarified the "agent" usages in this section.
For the rest of the document, the inconsistencies come from the fact that this document was originally intended as only an APM-agent spec. This metadata spec is also becoming less important as we move to Otel SemConv, which effectively will replace this spec.

I'm just adding the host.id here, because we intend to also add the universal profiling integration to the old elastic-apm-agent java to allow easier adoption for existing users.

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM

@gregkalapos gregkalapos self-requested a review March 26, 2024 15:11
Copy link
Contributor

@gregkalapos gregkalapos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in the agents weekly: host.id as a new metadata field on intake lgtm.

@@ -83,6 +83,8 @@ transaction-id | uint8[8]
* *span-id*: The W3C trace id of the currently active span
* *transaction-id*: The W3C span id of the currently active transaction (=the local root span)

APM-agents MAY start populating the thread-local storage only after receiving a host agent [registration message](#profiler-registration-message)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Do we have use-cases where the agent should populate this TLS without waiting for the registration message ? If so, then a "SHOULD" sounds more appropriate as updating this TLS seems useless if the profiler is not available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use case would be if you collect traces for the application startup and what to have profiling data for those at the very beginning:
There will be a short delay between application startup and receiving this initial registration message. As a result, spans started and activated before that receival would not get correlated.
This can be overcome for those edge cases by eagerly populating the TLS, even if it is not known whether a profiler is already there.

I'm planning to implement this by having a tri-state enabled config option:

  • false: No correlation, the native lib won't even be loaded
  • true: Correlation active, TLS will be eagerly populated
  • auto (default for OTel-extension): Correlation will be active, TLS will only be populated after receiving the profiler registration message (So basically close to zero overhead if no profiler is active)

@@ -188,6 +203,6 @@ stack-trace-id | uint8[16]
count | uint16

* *trace-id*: The APM W3C trace id of the trace which was active for the given profiling samples
* *trace-id*: The APM W3C transaction id of the transaction which was active for the given profiling samples
* *transaction-id*: The APM W3C transaction id of the transaction which was active for the given profiling samples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the W3C spec contains anything about the "transactions" ? From what I recall it's only about tracestate and traceparent (with a trace-id and parent-id as fields) (ref) but I might definitely have missed something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops, this should be The APM W3C span id of the transaction

JonasKunz and others added 2 commits March 27, 2024 10:02
@JonasKunz JonasKunz marked this pull request as ready for review March 27, 2024 10:03
@JonasKunz JonasKunz requested review from a team as code owners March 27, 2024 10:03
@JonasKunz JonasKunz removed the request for review from a team March 27, 2024 10:03
Copy link
Contributor

@athre0z athre0z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@JonasKunz JonasKunz merged commit 6bd4119 into elastic:main Mar 27, 2024
4 checks passed
@JonasKunz JonasKunz deleted the host-id-metadata branch March 27, 2024 12:47
@trentm
Copy link
Member

trentm commented Apr 5, 2024

As mentioned at elastic/beats#38689 (comment) using host.id for host correlation is probably the best path for future host correlation. Should we get a meta issue so agents other than the Java agent start adding host.id?

@JonasKunz
Copy link
Contributor Author

The agent also won't always collect the host.id, it only does so if the profiler is active and the profiler provides the host.id. The problem with host.id is that sometimes it might not be possible to discover the host.id without root permissions, at least accoding to the Otel SemConv specs.

@xrmx
Copy link
Member

xrmx commented Apr 8, 2024

The agent also won't always collect the host.id, it only does so if the profiler is active and the profiler provides the host.id.

That's not what I get from the specs. If we shouldn't provide it ourselves without the profiler it would be nice to write it explicitly.

@JonasKunz
Copy link
Contributor Author

The agent also won't always collect the host.id, it only does so if the profiler is active and the profiler provides the host.id.

Sorry, I was referring to the planned implementation details of the java-agent.

The current wording is

APM agents MAY collect the host.id as an unique identifier for the host.

This means agents are not required to do so, but might if they want to. It should be changed to MUST if we desire to force all agents to provide the host.id.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants