Agent builder exporter #265290
d744ec0 to b19fad7
b19fad7 to 2d834b8
trentm left a comment
@machadoum From only reading the code (I haven't run this), I think this looks good.
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-proto';
import { isInInferenceContext } from '../is_in_inference_context';

const SHOULD_TRACK_ATTR = '_ab_should_track';
This span attribute is dropped before export for this span processor.
Note that the other span processors (and their exported spans) will have this span attribute.
That's probably fine, but do you want a more self-explanatory attribute name?
Also, perhaps use dot separators in the name, as is more common in OTel usage.
Is there a naming pattern from other span attributes added by other in-Kibana self-instrumentation?
If this attribute really isn't wanted, then you would need something like a custom SpanProcessor that wrapped all the other processors added via LateBindingSpanProcessor.get().register(...) to have them drop the attribute.
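As a variation on the wrapping idea above, applied at the exporter boundary instead of around the other processors, the sketch below hands each exporter a copy of the span without internal bookkeeping attributes. Every exporter that should not see the attribute would need to be wrapped this way. `SpanLike`, `ExporterLike`, and `AttributeStrippingExporter` are hypothetical stand-ins, not OTel JS types; real `ReadableSpan` objects are class instances, so a production version would need a fuller copy than an object spread (spreading drops prototype methods such as `spanContext()`).

```typescript
// Minimal local stand-ins for OTel JS shapes (assumption: only the fields
// this sketch needs; the real ReadableSpan/SpanExporter have many more).
type AttributeValue = string | number | boolean;
type Attributes = Record<string, AttributeValue>;

interface SpanLike {
  readonly name: string;
  readonly attributes: Attributes;
}

interface ExporterLike {
  export(spans: SpanLike[], done: (ok: boolean) => void): void;
}

// Hypothetical wrapper: removes internal bookkeeping attributes before
// handing spans to the wrapped exporter. Spans are copied, not mutated,
// because other processors may still hold the same span objects.
class AttributeStrippingExporter implements ExporterLike {
  private readonly inner: ExporterLike;
  private readonly internalKeys: ReadonlySet<string>;

  constructor(inner: ExporterLike, internalKeys: Iterable<string>) {
    this.inner = inner;
    this.internalKeys = new Set(internalKeys);
  }

  export(spans: SpanLike[], done: (ok: boolean) => void): void {
    this.inner.export(spans.map((span) => this.strip(span)), done);
  }

  private strip(span: SpanLike): SpanLike {
    const attributes: Attributes = {};
    for (const [key, value] of Object.entries(span.attributes)) {
      if (!this.internalKeys.has(key)) attributes[key] = value;
    }
    return { ...span, attributes };
  }
}
```

Copying rather than deleting in place matters here: the same span object is shared by every registered processor, so mutating it would change what the other pipelines see.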
That makes sense. I shamelessly copied this code from base_inference_span_processor.ts.
Could we address it in another PR, since it already affects other span processors?
I'm fine with it. But I'm not a privileged reviewer in this repo. :)
trentm left a comment
The OTel Node.js usage looks sane to me.
"@opentelemetry/instrumentation-http": "0.214.0",
"@opentelemetry/instrumentation-undici": "0.24.0",
"@opentelemetry/otlp-exporter-base": "0.214.0",
"@opentelemetry/otlp-transformer": "0.214.0",
There's a note that this dependency is intended for internal use only: https://www.npmjs.com/package/@opentelemetry/otlp-transformer.
It might be worth probing whether this is an actual risk or whether we can expect it to stabilize.
@trentm Do you think that using otlp-transformer for serializing the spans is a risk?
const serialized = ProtobufTraceSerializer.serializeRequest(spans);
There is no public alternative: OTel JS provides no other way to serialize ReadableSpan[] into OTLP protobuf/JSON. The only options are to use this package or to hand-roll protobuf serialization, which sounds like a much worse alternative. Every OTLP exporter in the OTel JS ecosystem depends on it: exporter-trace-otlp-proto, exporter-trace-otlp-http, exporter-logs-otlp-grpc, otlp-exporter-base, sdk-node.
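For context, here is a minimal sketch of how a serializer in the role of `ProtobufTraceSerializer.serializeRequest(spans)` slots into an exporter that POSTs through a transport the caller already configured. The `Serializer` and `TransportLike` interfaces and the `TransportOtlpExporter` class are hypothetical stand-ins (the transport is only loosely modeled on the Elasticsearch client's `request(params, options)`), and the result codes mirror OTel JS's `ExportResultCode` locally. `application/x-protobuf` is the OTLP/HTTP media type for binary protobuf payloads; that Elasticsearch expects exactly this header is an assumption here.

```typescript
// Local stand-ins (assumptions, not the real library types). In the PR the
// serializer role is played by ProtobufTraceSerializer from
// @opentelemetry/otlp-transformer and the transport by the ES client.
const ExportResultCode = { SUCCESS: 0, FAILED: 1 } as const;
type ExportResultCode = (typeof ExportResultCode)[keyof typeof ExportResultCode];

interface ExportResult {
  code: ExportResultCode;
  error?: Error;
}

interface Serializer<T> {
  serializeRequest(items: T[]): Uint8Array | undefined;
}

interface RequestParams {
  method: string;
  path: string;
  body: Uint8Array;
}

interface TransportLike {
  request(params: RequestParams, options: { headers: Record<string, string> }): Promise<unknown>;
}

// Hypothetical exporter: serialize once, POST the bytes through a transport
// that already carries auth/TLS settings, then report the outcome.
class TransportOtlpExporter<T> {
  private readonly serializer: Serializer<T>;
  private readonly transport: TransportLike;
  private readonly path: string;

  constructor(serializer: Serializer<T>, transport: TransportLike, path = '/_otlp/v1/traces') {
    this.serializer = serializer;
    this.transport = transport;
    this.path = path;
  }

  export(items: T[], resultCallback: (result: ExportResult) => void): void {
    const body = this.serializer.serializeRequest(items);
    if (body === undefined) {
      // Nothing to send; report failure without touching the network.
      resultCallback({ code: ExportResultCode.FAILED, error: new Error('serialization failed') });
      return;
    }
    this.transport
      .request(
        { method: 'POST', path: this.path, body },
        // OTLP/HTTP media type for binary protobuf payloads.
        { headers: { 'content-type': 'application/x-protobuf' } },
      )
      .then(
        () => resultCallback({ code: ExportResultCode.SUCCESS }),
        (err: unknown) =>
          resultCallback({
            code: ExportResultCode.FAILED,
            error: err instanceof Error ? err : new Error(String(err)),
          }),
      );
  }
}
```

Keeping serialization separate from transport is the point made in the comment above: the encoding comes from the shared package, while the connection (auth, TLS) stays whatever the caller already has.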
OTel JS is just not fully at 1.x (aka "stable") yet. One component that isn't yet "stable" is its OTLP exporters (of which the otlp-transformer package is a part).
No, I don't think having otlp-transformer as a transitive dep is a "risk".
These packages are what every user of OTel JS is using to export OTLP data.
However, why was this dep explicitly added? I don't see it used explicitly anywhere.
I see export { ElasticsearchOtlpExporter } from './src/elasticsearch_otlp_exporter'; in the PR. Is there a new "elasticsearch_otlp_exporter.ts" file that hasn't been added to this PR?
Note that the required Buildkite check is failing.
However, why was this dep explicitly added? I don't see it used explicitly anywhere.
Sorry, rookie mistake. I failed to push the file after it was moved during a code review improvement. Sean also caught the same problem here: #265290 (comment)
If you refresh the page it will be there.
Okay, I see.
As @SrdjanLL points out, otlp-transformer is primarily written as an internal package that is shared between OTel JS's OTLP exporters for the different signals (traces, metrics, logs). Its interface won't change without a noted breaking change. Because it is still 0.x, per semver, that "breaking" change version will be a new minor. So, for example, 0.214.0 -> 0.215.0 will potentially be breaking and you'd need to watch for that. Type changes, if any, would most likely automatically point out a breaking issue.
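The upgrade rule described above (for a 0.x package, a new minor may be breaking) can be encoded in a small check, for example in a dependency-bump review script. This is a hypothetical helper, not existing Kibana tooling; it treats patch bumps within 0.x as non-breaking, the same convention npm's caret ranges apply (`^0.214.0` only accepts `0.214.x`), and it deliberately ignores pre-release and build suffixes.

```typescript
// Parses a plain "MAJOR.MINOR.PATCH" string (assumption: no pre-release or
// build metadata, which this sketch rejects).
function parseVersion(v: string): [number, number, number] {
  const parts = v.split('.').map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`unsupported version: ${v}`);
  }
  return [parts[0], parts[1], parts[2]];
}

// True when moving from `from` to `to` may contain breaking changes:
// any major change, and, while the major version is 0, any minor change.
function mayBreak(from: string, to: string): boolean {
  const [fromMajor, fromMinor] = parseVersion(from);
  const [toMajor, toMinor] = parseVersion(to);
  if (toMajor !== fromMajor) return true;
  if (fromMajor === 0) return toMinor !== fromMinor;
  return false;
}
```

With the exact pins shown in the package.json excerpt above, `mayBreak('0.214.0', '0.215.0')` is the case that deserves a closer look at the changelog and type errors.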
dmlemeshko left a comment
x-pack/platform/test/tsconfig.json changes LGTM
elena-shostak left a comment
@opentelemetry/otlp-transformer dependency LGTM
/**
 * A {@link tracing.SpanExporter} that ships OTLP-protobuf encoded spans
 * to Elasticsearch's native `/_otlp/v1/traces` endpoint via the
I knew about /_otlp/v1/metrics. Is traces now supported as well?
Yes, it was recently merged together with logs: elastic/elasticsearch#147811.
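The PR description says `register_tracing.ts` uses `OTLPTraceExporter` when a trace URL is configured and otherwise falls back to `ElasticsearchOtlpExporter`, which targets this `/_otlp/v1/traces` endpoint. A minimal sketch of that selection follows. The config shape is hypothetical: the PR adds `agent_builder.tracing.*` keys, but their exact names are not shown in this thread, so the single optional `url` field is an assumption, and the function returns a tagged choice instead of constructing real exporters.

```typescript
// Hypothetical config shape: the exact `agent_builder.tracing.*` key names
// are not shown here, so `url` is an assumption.
interface AgentBuilderTracingConfig {
  url?: string;
}

type ExporterChoice =
  | { kind: 'otlp-url'; url: string } // would construct OTLPTraceExporter({ url })
  | { kind: 'elasticsearch' }; // would construct ElasticsearchOtlpExporter

function chooseExporter(config: AgentBuilderTracingConfig): ExporterChoice {
  // A blank string counts as "not configured" (a design choice of this sketch).
  const url = config.url?.trim();
  return url ? { kind: 'otlp-url', url } : { kind: 'elasticsearch' };
}
```

Returning a discriminated union keeps the decision testable without a running collector or Elasticsearch cluster; the caller maps each `kind` to the concrete exporter.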
💛 Build succeeded, but was flaky
Failed CI Steps
Test Failures
Metrics [docs]
Public APIs missing comments
Unknown metric groups
API count
History
cc @machadoum
Depends on elastic/elasticsearch#147811
Resolves elastic/search-team#14191
Resolves elastic/search-team#14190

## Summary

This PR adds a dedicated OpenTelemetry trace export path for **Agent Builder inference spans**, so they can land in Elasticsearch under a distinct dataset (`agent_builder`) while reusing Kibana’s Elasticsearch connection (auth, TLS, transport). Generic tracing can remain sampled down without silently dropping inference work: inference spans identified via `kibana.inference.tracing` baggage are preserved through sampling so downstream processors can export them on a copy.

**Why:** Agent Builder observability needs reliable inference-span export and routing into its own traces data stream, aligned with Elasticsearch’s native OTLP traces ingestion (`/_otlp/v1/traces` from elastic/elasticsearch#147811).

## Architecture

- **`@kbn/tracing`: `InferencePreservingSampler`** wraps the existing `ParentBasedSampler` in `init_tracing.ts`. Non-inference spans pass through unchanged. Inference spans upgrade `NOT_RECORD` to `RECORD` (without forcing `SAMPLED`) so domain processors can clone and set `SAMPLED` for their pipeline.
- **`@kbn/inference-tracing`: `ElasticsearchOtlpExporter`** serializes spans with `@opentelemetry/otlp-transformer` and POSTs OTLP-protobuf to ES `/_otlp/v1/traces` via the ES client transport (same connection settings as Kibana).
- **`should_track_span.ts` / `isInferenceSpan()`** extracts “should track” logic from `BaseInferenceSpanProcessor.onStart`; shared by `BaseInferenceSpanProcessor` and Agent Builder.
- **`agent_builder`: `AgentBuilderSpanProcessor`** copies eligible spans, forces `SAMPLED` on the copy for export, adds `data_stream.dataset: agent_builder`, and feeds a `BatchSpanProcessor`. Enabled state comes from an LRU-backed saved-objects check against `AGENT_BUILDER_EXPERIMENTAL_FEATURES_SETTING_ID`.
- **`register_tracing.ts`** chooses **`OTLPTraceExporter`** when a trace URL is configured, otherwise **`ElasticsearchOtlpExporter`**; registers via `LateBindingSpanProcessor.register()`.
- **Lifecycle:** exporter registration in `plugin.ts` `start()`, async teardown in `stop()`.

```mermaid
flowchart LR
  subgraph tracing["Global tracing"]
    S["InferencePreservingSampler"]
    P["Span processors"]
  end
  subgraph ab["Agent Builder"]
    AB["AgentBuilderSpanProcessor"]
    BSP["BatchSpanProcessor"]
    E["OTLP URL exporter OR ElasticsearchOtlpExporter"]
  end
  S --> P
  P --> AB
  AB --> BSP --> E --> ES["Elasticsearch traces"]
```

### Package exports

- **`@kbn/tracing`:** `InferencePreservingSampler` (wired in `init_tracing.ts`).
- **`@kbn/inference-tracing`:** `ElasticsearchOtlpExporter`, `isInferenceSpan` / `should_track_span` helpers (via new module), existing processors updated to use shared inference detection.

## How to test

1. Make sure your ES instance has the OTLP endpoint enabled (elastic/elasticsearch#147811).
2. Enable the Agent Builder experimental setting `agentBuilder:experimentalFeatures`.
3. Enable Kibana tracing: `telemetry.tracing.enabled: true`.
4. Run a query through Agent Builder.
5. Confirm Agent Builder spans exist under `.ds-traces-agent_builder*`.

If the evals plugin is enabled (`xpack.evals.enabled: true`), you will see a view traces button in the Agent Builder reasoning panel.

<img width="787" height="277" alt="Screenshot 2026-04-30 at 11 02 04" src="https://github.com/user-attachments/assets/9a891a63-436c-4302-af48-3027406c8d1f" />
<img width="624" height="805" alt="Screenshot 2026-04-30 at 11 02 06" src="https://github.com/user-attachments/assets/d1493730-8ef4-4c66-a50d-8237a24d0180" />

### Checklist

Reviewers should verify this PR satisfies this list as well.
- [ ] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md) _(no product UI strings in this PR; server tracing/config only)_
- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios **(not in this PR yet; follow-up)**
- [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the [docker list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker) **(`agent_builder.tracing.*` added; cloud/docker follow-up required before merge)**
- [x] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The `release_note:breaking` label should be applied in these situations. _(no breaking public HTTP API changes)_
- [ ] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed _(no tests changed in this PR yet)_
- [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [x] Review the [backport guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing) and apply applicable `backport:*` labels. **`backport:skip`**: new feature; no backport planned via this PR.
### Third-party Dependency

**Purpose:** Serializes `ReadableSpan[]` into OTLP-protobuf binary format (`ProtobufTraceSerializer.serializeRequest()`) so the `ElasticsearchOtlpExporter` can POST spans to ES's `/_otlp/v1/traces` via the ES client transport; no separate OTLP collector needed.

**Justification:** The existing OTLP exporters (exporter-trace-otlp-proto) bundle their own HTTP transport and can't route through the ES client. We need the serialization layer standalone to reuse Kibana's ES connection (auth, TLS).

**Alternatives explored:**

* Use exporter-trace-otlp-proto directly: Can't. It owns its HTTP connection and can't use the ES client transport. We do use it for the external OTLP URL path; the ES path needs standalone serialization.
* Implement serialization manually: OTLP-protobuf encoding is non-trivial (protobuf schema, resource/scope/span mapping, attribute encoding). Fragile and would drift from the spec.

**Existing dependencies:** Already a direct dep in root package.json (0.214.0) and resolved in yarn.lock (3 versions). Transitively pulled by exporter-trace-otlp-proto, exporter-trace-otlp-http, exporter-logs-otlp-*, otlp-exporter-base, and sdk-node. No new package enters node_modules; this PR just adds a direct import from kbn-inference-tracing.

### Identify risks

- [x] [See some risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)

| Risk | Severity | Mitigation |
| --- | --- | --- |
| **Global sampling interaction**: `InferencePreservingSampler` changes when inference spans are recorded vs dropped relative to parent-based sampling. | Medium | Scoped to baggage-marked inference spans; non-inference spans unchanged. Review trace volume and cardinality in staging; validate alongside inference and platform tracing owners. |
| **Elasticsearch OTLP dependency**: native `/_otlp/v1/traces` must be available and compatible for the fallback exporter path; misconfiguration could mean lost or failed exports. | Medium | Depends on elastic/elasticsearch#147811; test both OTLP URL and ES-transport paths; monitor exporter errors and ES responses. |

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
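The Architecture section describes two cooperating pieces: `InferencePreservingSampler` upgrades `NOT_RECORD` to `RECORD` for inference spans without forcing `SAMPLED`, and `AgentBuilderSpanProcessor` exports a copy with `SAMPLED` forced and `data_stream.dataset: agent_builder` added. The sketch below models both under simplifying assumptions: the sampler input is reduced to a hypothetical `inferenceBaggage` field instead of a real OTel `Context`, spans are plain objects rather than `ReadableSpan` instances, and `InferencePreservingSamplerSketch` / `toAgentBuilderCopy` are illustrative names, not the PR's code. The numeric constants mirror OTel JS's `SamplingDecision` and `TraceFlags` values.

```typescript
// Local mirrors of OTel JS constants (values match SamplingDecision in
// @opentelemetry/sdk-trace-base and TraceFlags in @opentelemetry/api).
const SamplingDecision = { NOT_RECORD: 0, RECORD: 1, RECORD_AND_SAMPLED: 2 } as const;
type SamplingDecision = (typeof SamplingDecision)[keyof typeof SamplingDecision];
const TraceFlags = { NONE: 0, SAMPLED: 1 } as const;

interface SamplingInput {
  // Hypothetical: stands in for reading `kibana.inference.tracing` baggage
  // from the active context; the real Sampler receives a Context object.
  inferenceBaggage?: string;
}

interface SimpleSampler {
  shouldSample(input: SamplingInput): { decision: SamplingDecision };
}

class InferencePreservingSamplerSketch implements SimpleSampler {
  private readonly delegate: SimpleSampler;

  constructor(delegate: SimpleSampler) {
    this.delegate = delegate; // a ParentBasedSampler in the real setup
  }

  shouldSample(input: SamplingInput): { decision: SamplingDecision } {
    const result = this.delegate.shouldSample(input);
    const isInference = input.inferenceBaggage !== undefined;
    if (isInference && result.decision === SamplingDecision.NOT_RECORD) {
      // Record so span processors see the span, but leave it unsampled so
      // standard export processors (which skip unsampled spans) ignore it.
      return { ...result, decision: SamplingDecision.RECORD };
    }
    return result; // non-inference and already-recorded spans pass through
  }
}

interface EndedSpan {
  readonly name: string;
  readonly traceFlags: number;
  readonly attributes: Readonly<Record<string, string | number | boolean>>;
}

// Processor side: derive a copy for the Agent Builder pipeline that is marked
// sampled and tagged with its dataset; the original span is left untouched.
function toAgentBuilderCopy(span: EndedSpan): EndedSpan {
  return {
    ...span,
    traceFlags: span.traceFlags | TraceFlags.SAMPLED,
    attributes: { ...span.attributes, 'data_stream.dataset': 'agent_builder' },
  };
}
```

Splitting the work this way means the global pipeline's sampling outcome is unchanged for inference spans, while only the Agent Builder copy is promoted to sampled for its own exporter.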