Add specs for host.id and profiler registration message #853
Changes from 2 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -83,6 +83,8 @@ transaction-id | uint8[8] | |
* *span-id*: The W3C span id of the currently active span | ||
* *transaction-id*: The W3C span id of the currently active transaction (=the local root span) | ||
|
||
APM-agents MAY start populating the thread-local storage only after receiving a host agent [registration message](#profiler-registration-message). | ||
[nit] Do we have use cases where the agent should populate this TLS without waiting for the registration message? If so, then a "SHOULD" sounds more appropriate, as updating this TLS seems useless if the profiler is not available.
The use case would be if you collect traces for the application startup and want to have profiling data for those from the very beginning: I'm planning to implement this by having a tri-state
|
||
|
||
### Concurrency-safe Updates | ||
|
||
The profiler might interrupt a thread and take a profiling sample while that thread is in the process of updating the contents of the shared thread local storage. Fortunately, we have the following guarantees about this interruption: | ||
|
@@ -136,10 +138,6 @@ And here is how to read messages in a non-blocking way: | |
|
||
```c | ||
size_t readProfilerSocketMessages(uint8_t* outputBuffer, size_t bufferSize) { | ||
if(profilerSocket == -1) { | ||
return raiseExceptionAndReturn(jniEnv, -1, "No profiler socket active!"); | ||
} | ||
|
||
int n = recv(profilerSocket, outputBuffer, bufferSize, 0); | ||
if (n == -1) { | ||
if(errno == EAGAIN || errno == EWOULDBLOCK) { | ||
|
@@ -173,6 +171,23 @@ All messages have the following layout: | |
* *message-type* : An ID uniquely identifying the type (and therefore payload structure) of the message. | ||
* *minor-version* : The version number for the given *message-type*. This value is incremented when new fields are added to the payload while preserving the *message-type* (non breaking changes). For breaking changes a new *message-type* must be used. | ||
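The versioning rule above suggests a simple reader-side policy, sketched here in C. This is only an illustration under stated assumptions: the 16-bit widths of *message-type* and *minor-version*, the names `classify_message` and `msg_decision_t`, and the idea that new fields are appended after the existing ones are all hypothetical and not taken from this spec. Only the registration message type `2` and minor version `1` come from the section that follows.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the "Profiler Registration Message" section. */
#define MSG_TYPE_PROFILER_REGISTRATION 2
#define KNOWN_MINOR_PROFILER_REGISTRATION 1

/* Hypothetical reader decisions; the names are illustrative only. */
typedef enum {
    MSG_SKIP_UNKNOWN_TYPE,   /* message-type not known to this reader */
    MSG_PARSE_KNOWN_FIELDS,  /* same or newer minor-version: read the fields we know */
    MSG_PARSE_OLDER_LAYOUT   /* older minor-version: later fields are absent */
} msg_decision_t;

/* Assumption: both header fields fit into 16 bits. */
msg_decision_t classify_message(uint16_t message_type, uint16_t minor_version) {
    if (message_type != MSG_TYPE_PROFILER_REGISTRATION) {
        return MSG_SKIP_UNKNOWN_TYPE;
    }
    if (minor_version >= KNOWN_MINOR_PROFILER_REGISTRATION) {
        /* Non-breaking change: assumed to only add fields after the known ones. */
        return MSG_PARSE_KNOWN_FIELDS;
    }
    return MSG_PARSE_OLDER_LAYOUT;
}
```

With this policy, a reader built against minor version `1` would still accept a registration message with minor version `2` and simply ignore whatever was added.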
|
||
## Profiler Registration Message | ||
|
||
Whenever the profiling host agent starts communicating for the first time with a process running an APM Agent, it MUST send this message. | ||
SylvainJuge marked this conversation as resolved.
|
||
This message lets the APM-agent know that a profiler is actually active on the current host. Note that an APM-agent may receive this message zero, one, or several times: zero times if no host agent is active, once if one is active, and several times if a host agent is restarted during the lifetime of the APM-agent. | ||
JonasKunz marked this conversation as resolved.
|
||
|
||
The *message-type* is `2` and the current *minor-version* is `1`. | ||
|
||
The payload layout is as follows: | ||
Name | Data type | ||
--------------------- | ------------- | ||
samples-delay-ms | uint32 | ||
host-id | utf8-str | ||
|
||
* *samples-delay-ms*: A sane upper bound, in milliseconds, on the time the profiling host agent usually takes between collecting a stacktrace and writing it to the APM-agent via the [messaging socket](#cpu-profiler-trace-correlation-message). The APM-agent will assume that all profiling data related to a span has been written to the socket if the span ended at least this long ago. Note that this value doesn't need to be a hard guarantee, but it should hold in 99% of cases so that profiling data isn't distorted in the expected case. | ||
* *host-id*: The [`host.id` resource attribute](https://opentelemetry.io/docs/specs/semconv/attributes-registry/host/) used for the profiling data by this profiling host agent. If an APM-agent is already sending a `host.id`, it SHOULD print a warning if the received `host.id` is different and otherwise ignore the value received from the host agent. If an APM-agent does not collect the `host.id` by itself, it MUST start sending the `host.id` after receiving it from the profiling host agent to ensure certain correlation features (e.g. cost and CO2 consumption) work correctly. | ||
JonasKunz marked this conversation as resolved.
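The handling rules for the two payload fields can be sketched in C as follows. This is a minimal illustration, not part of the spec: the function names, the enum, and the convention that a `NULL` `own_host_id` means "the APM-agent does not collect `host.id` itself" are assumptions, and decoding the payload from the socket is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical outcome of comparing the agent's own host.id with the received one. */
typedef enum {
    HOST_ID_KEEP_OWN,        /* ids match: keep sending the agent's own value */
    HOST_ID_KEEP_OWN_WARN,   /* ids differ: keep own value, caller SHOULD log a warning */
    HOST_ID_ADOPT_RECEIVED   /* agent has no host.id: it MUST start sending the received one */
} host_id_action_t;

/* own_host_id is NULL when the APM-agent does not collect host.id by itself (assumption). */
host_id_action_t resolve_host_id(const char *own_host_id, const char *received_host_id) {
    assert(received_host_id != NULL);
    if (own_host_id == NULL) {
        return HOST_ID_ADOPT_RECEIVED;
    }
    if (strcmp(own_host_id, received_host_id) != 0) {
        return HOST_ID_KEEP_OWN_WARN;
    }
    return HOST_ID_KEEP_OWN;
}

/* True once a span ended at least samples_delay_ms ago, i.e. the APM-agent may assume
 * that all profiling data for that span has been written to the socket. */
bool profiling_data_complete(uint64_t span_end_ms, uint64_t now_ms, uint32_t samples_delay_ms) {
    return now_ms >= span_end_ms && (now_ms - span_end_ms) >= samples_delay_ms;
}
```

Both timestamps are assumed to come from the same monotonic clock, in milliseconds; a span that ended "in the future" relative to `now_ms` is treated as not yet complete.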
|
||
|
||
|
||
## CPU Profiler Trace Correlation Message | ||
|
||
Whenever the profiler is able to correlate a taken CPU stacktrace sample with an APM trace (see [this section](#thread-local-storage-layout)), it sends the ID of the stacktrace back to the APM agent. | ||
|
@@ -188,6 +203,6 @@ stack-trace-id | uint8[16] | |
count | uint16 | ||
|
||
* *trace-id*: The APM W3C trace id of the trace which was active for the given profiling samples | ||
* *trace-id*: The APM W3C transaction id of the transaction which was active for the given profiling samples | ||
* *transaction-id*: The APM W3C transaction id of the transaction which was active for the given profiling samples | ||
Does the W3C spec contain anything about "transactions"? From what I recall it's only about
Oops, this should be
||
* *stack-trace-id*: The unique ID assigned by the profiler to the captured stacktrace. This ID is stored in Elasticsearch in base64 URL-safe encoding by the universal profiling solution. | ||
* *count*: The number of samples observed since the last report for the (*trace-id*, *transaction-id*, *stack-trace-id*) combination. |
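Because each message only carries the samples seen since the last report, a receiver that wants totals has to add counts up per (*trace-id*, *transaction-id*, *stack-trace-id*) combination. The following C sketch shows one way to do that. It is an illustration only: the fixed-size table, `MAX_ENTRIES`, and all function names are hypothetical, and the 16-byte width of *trace-id* is assumed from the W3C trace id size rather than read from the payload table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 64 /* hypothetical capacity for this sketch */

typedef struct {
    uint8_t trace_id[16];       /* assumed 16 bytes (W3C trace id) */
    uint8_t transaction_id[8];  /* 8 bytes, as in the thread-local storage layout */
    uint8_t stack_trace_id[16]; /* uint8[16] per the payload table */
    uint32_t total_count;       /* wider than the uint16 per-message count */
} sample_entry_t;

typedef struct {
    sample_entry_t entries[MAX_ENTRIES];
    size_t size;
} sample_table_t;

/* Linear search for an existing combination; returns NULL if absent. */
static sample_entry_t *find_entry(sample_table_t *table, const uint8_t *trace_id,
                                  const uint8_t *transaction_id, const uint8_t *stack_trace_id) {
    for (size_t i = 0; i < table->size; i++) {
        sample_entry_t *e = &table->entries[i];
        if (memcmp(e->trace_id, trace_id, 16) == 0 &&
            memcmp(e->transaction_id, transaction_id, 8) == 0 &&
            memcmp(e->stack_trace_id, stack_trace_id, 16) == 0) {
            return e;
        }
    }
    return NULL;
}

/* Adds the count of one message to the running total; returns false if the table is full. */
bool record_samples(sample_table_t *table, const uint8_t *trace_id,
                    const uint8_t *transaction_id, const uint8_t *stack_trace_id,
                    uint16_t count) {
    assert(table != NULL);
    sample_entry_t *e = find_entry(table, trace_id, transaction_id, stack_trace_id);
    if (e == NULL) {
        if (table->size == MAX_ENTRIES) {
            return false;
        }
        e = &table->entries[table->size++];
        memcpy(e->trace_id, trace_id, 16);
        memcpy(e->transaction_id, transaction_id, 8);
        memcpy(e->stack_trace_id, stack_trace_id, 16);
        e->total_count = 0;
    }
    e->total_count += count;
    return true;
}

/* Returns the accumulated count for a combination, or 0 if it was never reported. */
uint32_t total_samples(sample_table_t *table, const uint8_t *trace_id,
                       const uint8_t *transaction_id, const uint8_t *stack_trace_id) {
    sample_entry_t *e = find_entry(table, trace_id, transaction_id, stack_trace_id);
    return e != NULL ? e->total_count : 0;
}
```

A real agent would more likely key a hash map on the three ids and evict entries once the owning transaction has been reported; the linear table just keeps the sketch short.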
As we have two agents involved here (the APM agent and the profiling host agent) we should always qualify which agent we actually mean. Also there are inconsistencies in terminology throughout the document. We call the APM agent:
and the profiling host agent:
I've clarified the "agent" usages in this section.
For the rest of the document, the inconsistencies come from the fact that this document was originally intended as only an APM-agent spec. This metadata spec is also becoming less important as we move to Otel SemConv, which effectively will replace this spec.
I'm just adding the `host.id` here, because we intend to also add the universal profiling integration to the old elastic-apm-agent java to allow easier adoption for existing users.