Implement OTLP traces indexing #147453

Merged
felixbarny merged 5 commits into elastic:main from felixbarny:otlp-traces-impl
Apr 28, 2026

Conversation

@felixbarny
Member

@felixbarny felixbarny commented Apr 24, 2026

Summary

  • implement OTLP trace ingestion by building span documents from OTLP trace payloads and indexing them into the traces data stream
  • add unit and REST coverage for span document construction and end-to-end OTLP trace indexing
  • keep this PR focused on the core traces endpoint so the remaining OTLP trace work can land in smaller follow-up changes

Follow-ups

  • index OTLP span events as log documents
  • honor elasticsearch.document_id for emitted trace and span event documents

Test plan

  • ./gradlew :x-pack:plugin:otel-data:test --tests org.elasticsearch.xpack.oteldata.otlp.docbuilder.SpanDocumentBuilderTests
  • ./gradlew :x-pack:plugin:otel-data:test --tests org.elasticsearch.xpack.oteldata.otlp.OTLPTracesTransportActionTests
  • ./gradlew :x-pack:plugin:otel-data:javaRestTest --tests org.elasticsearch.xpack.oteldata.otlp.OTLPTracesIndexingRestIT

Related

Add the OTLP traces endpoint, transport action, document builder, and
coverage needed to index trace spans into the built-in traces data
streams.

Emit span events as separate logs documents so OTLP trace ingestion
matches the upstream OTEL mapping and preserves event fields in the
built-in logs data streams.

Use elasticsearch.document_id for both emitted trace and span event
records so OTLP ingestion can keep stable document IDs across
trace and event documents.

@felixbarny felixbarny requested a review from a team as a code owner April 24, 2026 16:38
@felixbarny felixbarny self-assigned this Apr 24, 2026
@elasticsearchmachine elasticsearchmachine added Team:StorageEngine external-contributor Pull request authored by a developer outside the Elasticsearch team labels Apr 24, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@felixbarny
Member Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Apr 24, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 24, 2026

Walkthrough

Adds OpenTelemetry span indexing support by extending OTelDocumentBuilder to handle additional AnyValue types, enriching SpanDocumentBuilder to serialize trace/span identifiers, timestamps, duration, links, and status, and introducing corresponding integration and unit tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| OTLP Span Document Building: x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/OTelDocumentBuilder.java, x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilder.java | Enhanced AnyValue type handling (KVLIST_VALUE, BYTES_VALUE, VALUE_NOT_SET) and extended span document construction with @timestamp derivation, trace/span IDs, duration calculation, links serialization, and status mapping. |
| Integration & Unit Tests: x-pack/plugin/otel-data/src/javaRestTest/java/org/elasticsearch/xpack/oteldata/otlp/OTLPTracesIndexingRestIT.java, x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilderTests.java | Three new REST integration tests (testBatchSpanIndexing, testSpanWithAttributes, testDataStreamRouting) and comprehensive unit test suite validating span document JSON construction, timestamp handling, and data stream routing. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hops of traces, bytes in flight,
Spans indexed, documents bright!
From OTLP data, wisdom we weave—
Timestamps and links, what wonders we retrieve! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: implementing OTLP traces indexing, which is exactly what the changeset accomplishes across all modified files. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing context on OTLP trace ingestion, document building, and test coverage that matches the actual code changes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilder.java (1)

93-102: Cache span.getStatus() locally

span.getStatus() is invoked four times in this method. While protobuf getters are cheap, a local variable improves readability and avoids the repeated accessor calls.

♻️ Proposed refactor
     private void buildStatus(XContentBuilder builder, Span span) throws IOException {
-        boolean hasCode = span.getStatus().getCode() != StatusCode.STATUS_CODE_UNSET;
-        boolean hasMessage = span.getStatus().getMessageBytes().isEmpty() == false;
+        Status status = span.getStatus();
+        boolean hasCode = status.getCode() != StatusCode.STATUS_CODE_UNSET;
+        boolean hasMessage = status.getMessageBytes().isEmpty() == false;
         if (hasCode == false && hasMessage == false) {
             return;
         }
         builder.startObject("status");
         if (hasCode) {
-            builder.field("code", normalizeStatusCode(span.getStatus().getCode()));
+            builder.field("code", normalizeStatusCode(status.getCode()));
         }
-        addFieldIfNotEmpty(builder, "message", span.getStatus().getMessageBytes());
+        addFieldIfNotEmpty(builder, "message", status.getMessageBytes());
         builder.endObject();
     }

(Requires adding import io.opentelemetry.proto.trace.v1.Status;.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilder.java`
around lines 93 - 102, The code repeatedly calls span.getStatus(); cache it in a
local Status variable (e.g., Status status = span.getStatus()) at the start of
the method/block and replace all subsequent span.getStatus() uses with that
local (affecting the boolean checks, normalizeStatusCode(status.getCode()),
status.getMessageBytes(), and any other usages); add the import
io.opentelemetry.proto.trace.v1.Status if missing to resolve the type.
x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilderTests.java (1)

47-144: LGTM — solid coverage

Nice thorough assertions including hex encoding for IDs, nested KVLIST as a Map, and Base64 for byte payloads. Two optional coverage gaps worth considering:

  • Status with only message and unset code (see related comment on SpanDocumentBuilder.buildStatus).
  • Both startTimeUnixNano and endTimeUnixNano equal to 0 (documents the @timestamp: 0 behavior).

Not blocking.
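
The value conversions these assertions describe (nested KVLIST as a Map, Base64 for byte payloads) can be illustrated with a small standalone sketch. This is not the PR's OTelDocumentBuilder code: `AnyValueSketch` and `toJsonValue` are hypothetical names, plain Java types stand in for the protobuf AnyValue cases, and emitting an unset value as null is an assumption.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class AnyValueSketch {
    private AnyValueSketch() {}

    // Converts a stand-in for an OTLP AnyValue into a JSON-friendly value.
    // Assumption: plain Java types model the protobuf cases here
    // (Map = KVLIST_VALUE, List = ARRAY_VALUE, byte[] = BYTES_VALUE, null = VALUE_NOT_SET).
    public static Object toJsonValue(Object value) {
        if (value == null) {
            return null; // assumption: an unset value is emitted as JSON null
        }
        if (value instanceof byte[]) {
            // Byte payloads are rendered as a Base64 string.
            return Base64.getEncoder().encodeToString((byte[]) value);
        }
        if (value instanceof Map) {
            // Key-value lists become nested objects, converted recursively.
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.put(String.valueOf(e.getKey()), toJsonValue(e.getValue()));
            }
            return out;
        }
        if (value instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                out.add(toJsonValue(item));
            }
            return out;
        }
        return value; // strings, numbers, and booleans pass through unchanged
    }
}
```

For example, a map holding the bytes `{1, 2, 3}` under the key `payload` comes back as a map holding the string `AQID` under the same key.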

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilderTests.java`
around lines 47 - 144, Add tests that cover the two noted edge-cases: 1) a Span
with Status containing only a message and no code to exercise
SpanDocumentBuilder.buildStatus — create a span with
Status.newBuilder().setMessage("msg").build() and assert the produced document's
"status.message" equals "msg" and "status.code" is null/absent; 2) a Span with
both startTimeUnixNano and endTimeUnixNano set to 0 to exercise timestamp
fallback logic in buildDocument — build such a span and assert "@timestamp" is
0, "duration" is null, and other related fields behave as expected. Use the
existing test class and helper methods (buildDocument,
SpanDocumentBuilder.buildStatus) to place these cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilder.java`:
- Around line 44-48: The code in SpanDocumentBuilder sets `@timestamp` to 0 when
both span.getStartTimeUnixNano() and span.getEndTimeUnixNano() are zero; change
this to validate and reject such malformed spans instead of indexing epoch 0: in
SpanDocumentBuilder (the method that reads
span.getStartTimeUnixNano()/span.getEndTimeUnixNano() before calling
builder.field("@timestamp", ...)) add a guard that returns an error/throws an
IllegalArgumentException or signals the caller to drop the span when both times
are zero, and add a unit test that submits a span with both times unset to
assert the span is rejected (or handled according to the chosen failure path)
rather than producing `@timestamp`: 0.
- Around line 92-104: In buildStatus, avoid emitting a status object that
contains only a message; ensure the status block always includes a normalized
code when written. Update the buildStatus(Span) logic so that if you decide to
write the "status" object (currently guarded by hasCode || hasMessage) you
always call builder.field("code",
normalizeStatusCode(span.getStatus().getCode())) before adding the message (use
addFieldIfNotEmpty for the message), or alternatively gate emitting the entire
status block on hasCode alone; modify the buildStatus method accordingly
(references: buildStatus, normalizeStatusCode, addFieldIfNotEmpty).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Enterprise

Run ID: 749a73aa-d3e7-42dc-b610-5341b09f8a1d

📥 Commits

Reviewing files that changed from the base of the PR and between d89f350 and 0ec29bd.

📒 Files selected for processing (4)
  • x-pack/plugin/otel-data/src/javaRestTest/java/org/elasticsearch/xpack/oteldata/otlp/OTLPTracesIndexingRestIT.java
  • x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/OTelDocumentBuilder.java
  • x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilder.java
  • x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilderTests.java

case KVLIST_VALUE -> {
    builder.startObject();
    List<KeyValue> kvList = value.getKvlistValue().getValuesList();
    for (int i = 0, kvListSize = kvList.size(); i < kvListSize; i++) {
Member

Nit: why not

for (KeyValue kv: value.getKvlistValue().getValuesList()) {

Member Author

It avoids having to allocate an iterator, though the JIT will most likely replace that with a stack allocation anyway. Both work. The more practical reason is that it follows the precedent of ARRAY_VALUE.
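
The two loop styles discussed in this thread are functionally equivalent, as this standalone sketch illustrates. `LoopStyles` is a hypothetical class, not code from the PR, and the sketch only shows the shape of each loop; it does not measure allocation or JIT behavior.

```java
import java.util.List;

public final class LoopStyles {
    private LoopStyles() {}

    // Indexed loop: reads elements with get(i), so no Iterator object is created.
    // This is only a good fit for random-access lists.
    public static int sumIndexed(List<Integer> values) {
        int sum = 0;
        for (int i = 0, size = values.size(); i < size; i++) {
            sum += values.get(i);
        }
        return sum;
    }

    // Enhanced for loop: obtains an Iterator from the list behind the scenes.
    public static int sumForEach(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }
}
```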

builder.startObject();
long timestamp = span.getStartTimeUnixNano();
if (timestamp == 0) {
    timestamp = span.getEndTimeUnixNano();
Member

Is it ok if both are missing?

Member Author

Good callout. The OTLP proto marks span start/end timestamps as semantically required, but proto3 can still carry zeroes. I kept the current tolerant behavior so a malformed span does not fail the whole ingest request, added a comment, and added a test documenting the @timestamp: 0 / no-duration behavior. If we want stricter handling, I think it should be an explicit partial-success/drop path rather than throwing from the document builder.
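
A minimal sketch of the tolerant behavior described in this reply, written in plain Java rather than taken from SpanDocumentBuilder. `SpanTimes` and its method names are hypothetical, and the exact duration rule (emit a duration only when both timestamps are set and the end is not before the start) is an assumption; the reply itself only states that a span with both timestamps at zero yields `@timestamp: 0` and no duration.

```java
import java.util.OptionalLong;

public final class SpanTimes {
    private SpanTimes() {}

    // Prefer the start time; fall back to the end time when start is unset (0).
    // If both are 0 the result is 0, so the span is indexed at the epoch
    // instead of failing the whole ingest request.
    public static long resolveTimestampNanos(long startTimeUnixNano, long endTimeUnixNano) {
        return startTimeUnixNano != 0 ? startTimeUnixNano : endTimeUnixNano;
    }

    // Assumption: a duration is only emitted when both timestamps are set
    // and the end is not before the start; otherwise the field is omitted.
    public static OptionalLong durationNanos(long startTimeUnixNano, long endTimeUnixNano) {
        if (startTimeUnixNano == 0 || endTimeUnixNano == 0 || endTimeUnixNano < startTimeUnixNano) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(endTimeUnixNano - startTimeUnixNano);
    }
}
```

Keeping the builder tolerant like this leaves any stricter policy (dropping the span, reporting partial success) to the caller, which matches the direction suggested in the reply.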

if (links.isEmpty()) {
    return;
}
builder.startArray("links");
Member

I assume we want an array even if it contains a single element?

Member Author

Yes, I think we want a stable array shape here. A single link is still semantically part of the span’s links collection.
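
The stable array shape can be sketched as follows. This is an illustration in plain Java, not the PR's implementation: `LinksSketch` and `Link` are hypothetical, and both the field names `trace_id` and `span_id` and the lowercase hex encoding are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public final class LinksSketch {
    private LinksSketch() {}

    // Hypothetical stand-in for an OTLP span link, with IDs as raw bytes.
    public static final class Link {
        final byte[] traceId;
        final byte[] spanId;

        public Link(byte[] traceId, byte[] spanId) {
            this.traceId = traceId;
            this.spanId = spanId;
        }
    }

    // Lowercase hex encoding of an ID (the casing is an assumption).
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    // Returns empty when there are no links, so the caller omits the field.
    // Otherwise the result is always a list, even for a single link.
    public static Optional<List<Map<String, String>>> buildLinks(List<Link> links) {
        if (links.isEmpty()) {
            return Optional.empty();
        }
        List<Map<String, String>> out = new ArrayList<>();
        for (Link link : links) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("trace_id", hex(link.traceId));
            doc.put("span_id", hex(link.spanId));
            out.add(doc);
        }
        return Optional.of(out);
    }
}
```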

private void buildStatus(XContentBuilder builder, Span span) throws IOException {
    boolean hasCode = span.getStatus().getCode() != StatusCode.STATUS_CODE_UNSET;
    boolean hasMessage = span.getStatus().getMessageBytes().isEmpty() == false;
    if (hasCode == false && hasMessage == false) {
Member

This needs test coverage.

Member Author

Added coverage for a status with only a message and no code. buildStatus now skips message-only unset statuses, so any emitted status object includes a normalized code.
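
The rule described in this reply can be sketched in isolation. This is a plain-Java illustration, not the merged buildStatus: `StatusSketch`, `Code`, and `normalize` are hypothetical, a Map stands in for the XContentBuilder output, and the normalized strings ("Ok", "Error") are placeholder assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public final class StatusSketch {
    private StatusSketch() {}

    // Hypothetical stand-in for the OTLP status code enum.
    public enum Code { UNSET, OK, ERROR }

    // Returns the "status" object to emit, or empty when the code is unset.
    // A message without a code is dropped, so every emitted status has a code.
    public static Optional<Map<String, Object>> buildStatus(Code code, String message) {
        if (code == Code.UNSET) {
            return Optional.empty();
        }
        Map<String, Object> status = new LinkedHashMap<>();
        status.put("code", normalize(code));
        if (message != null && message.isEmpty() == false) {
            status.put("message", message);
        }
        return Optional.of(status);
    }

    // Placeholder normalization; the real string form is an assumption here.
    static String normalize(Code code) {
        return code == Code.OK ? "Ok" : "Error";
    }
}
```

With this gating, a status carrying only a message and an unset code produces no status object at all, while a status with a code but an empty message produces an object containing only `code`.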

    return ObjectPath.createFromXContent(JsonXContent.jsonXContent, BytesReference.bytes(builder));
}

private static ByteString byteString(int length, int start) {
Member

Nit: maybe byteStringFromNumber? Also, start is not a very appropriate parameter name.

Member Author

Renamed this to randomHexByteString and changed the implementation to use random bytes, so the helper now matches the name better.

Member

@kkrik-es kkrik-es left a comment

Just a few nits.

felixbarny and others added 2 commits April 28, 2026 09:29
Normalize status documents to include codes, document timestamp edge cases, and add regression coverage for malformed spans.
@felixbarny felixbarny enabled auto-merge (squash) April 28, 2026 09:14
@felixbarny felixbarny merged commit 4bbf7fa into elastic:main Apr 28, 2026
37 checks passed
@felixbarny felixbarny deleted the otlp-traces-impl branch April 28, 2026 10:42
chrisparrinello pushed a commit to chrisparrinello/elasticsearch that referenced this pull request Apr 28, 2026
Add the OTLP traces endpoint, transport action, document builder, and
coverage needed to index trace spans into the built-in traces data
streams.

Labels

external-contributor (Pull request authored by a developer outside the Elasticsearch team), >non-issue, :StorageEngine/Logs (You know, for Logs), Team:StorageEngine, v9.5.0

4 participants