Implement OTLP traces indexing#147453
Conversation
Add the OTLP traces endpoint, transport action, document builder, and coverage needed to index trace spans into the built-in traces data streams.
Emit span events as separate logs documents so OTLP trace ingestion matches the upstream OTEL mapping and preserves event fields in the built-in logs data streams.
Use elasticsearch.document_id for both emitted trace and span event records so OTLP ingestion can keep stable document IDs across trace and event documents.
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
WalkthroughAdds OpenTelemetry span indexing support by extending Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilder.java (1)
93-102: Cachespan.getStatus()locally
span.getStatus()is invoked four times in this method. While protobuf getters are cheap, a local variable improves readability and avoids the repeated accessor calls.♻️ Proposed refactor
private void buildStatus(XContentBuilder builder, Span span) throws IOException { - boolean hasCode = span.getStatus().getCode() != StatusCode.STATUS_CODE_UNSET; - boolean hasMessage = span.getStatus().getMessageBytes().isEmpty() == false; + Status status = span.getStatus(); + boolean hasCode = status.getCode() != StatusCode.STATUS_CODE_UNSET; + boolean hasMessage = status.getMessageBytes().isEmpty() == false; if (hasCode == false && hasMessage == false) { return; } builder.startObject("status"); if (hasCode) { - builder.field("code", normalizeStatusCode(span.getStatus().getCode())); + builder.field("code", normalizeStatusCode(status.getCode())); } - addFieldIfNotEmpty(builder, "message", span.getStatus().getMessageBytes()); + addFieldIfNotEmpty(builder, "message", status.getMessageBytes()); builder.endObject(); }(Requires adding
import io.opentelemetry.proto.trace.v1.Status;.)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilder.java` around lines 93 - 102, The code repeatedly calls span.getStatus(); cache it in a local Status variable (e.g., Status status = span.getStatus()) at the start of the method/block and replace all subsequent span.getStatus() uses with that local (affecting the boolean checks, normalizeStatusCode(status.getCode()), status.getMessageBytes(), and any other usages); add the import io.opentelemetry.proto.trace.v1.Status if missing to resolve the type.x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilderTests.java (1)
47-144: LGTM — solid coverageNice thorough assertions including hex encoding for IDs, nested KVLIST as a
Map, and Base64 for byte payloads. Two optional coverage gaps worth considering:
- Status with only
messageand unsetcode(see related comment onSpanDocumentBuilder.buildStatus).- Both
startTimeUnixNanoandendTimeUnixNanoequal to 0 (documents the@timestamp: 0behavior).Not blocking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilderTests.java` around lines 47 - 144, Add tests that cover the two noted edge-cases: 1) a Span with Status containing only a message and no code to exercise SpanDocumentBuilder.buildStatus — create a span with Status.newBuilder().setMessage("msg").build() and assert the produced document's "status.message" equals "msg" and "status.code" is null/absent; 2) a Span with both startTimeUnixNano and endTimeUnixNano set to 0 to exercise timestamp fallback logic in buildDocument — build such a span and assert "@timestamp" is 0, "duration" is null, and other related fields behave as expected. Use the existing test class and helper methods (buildDocument, SpanDocumentBuilder.buildStatus) to place these cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilder.java`:
- Around line 44-48: The code in SpanDocumentBuilder sets `@timestamp` to 0 when
both span.getStartTimeUnixNano() and span.getEndTimeUnixNano() are zero; change
this to validate and reject such malformed spans instead of indexing epoch 0: in
SpanDocumentBuilder (the method that reads
span.getStartTimeUnixNano()/span.getEndTimeUnixNano() before calling
builder.field("@timestamp", ...)) add a guard that returns an error/throws an
IllegalArgumentException or signals the caller to drop the span when both times
are zero, and add a unit test that submits a span with both times unset to
assert the span is rejected (or handled according to the chosen failure path)
rather than producing `@timestamp`: 0.
- Around line 92-104: In buildStatus, avoid emitting a status object that
contains only a message; ensure the status block always includes a normalized
code when written. Update the buildStatus(Span) logic so that if you decide to
write the "status" object (currently guarded by hasCode || hasMessage) you
always call builder.field("code",
normalizeStatusCode(span.getStatus().getCode())) before adding the message (use
addFieldIfNotEmpty for the message), or alternatively gate emitting the entire
status block on hasCode alone; modify the buildStatus method accordingly
(references: buildStatus, normalizeStatusCode, addFieldIfNotEmpty).
---
Nitpick comments:
In
`@x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilder.java`:
- Around line 93-102: The code repeatedly calls span.getStatus(); cache it in a
local Status variable (e.g., Status status = span.getStatus()) at the start of
the method/block and replace all subsequent span.getStatus() uses with that
local (affecting the boolean checks, normalizeStatusCode(status.getCode()),
status.getMessageBytes(), and any other usages); add the import
io.opentelemetry.proto.trace.v1.Status if missing to resolve the type.
In
`@x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilderTests.java`:
- Around line 47-144: Add tests that cover the two noted edge-cases: 1) a Span
with Status containing only a message and no code to exercise
SpanDocumentBuilder.buildStatus — create a span with
Status.newBuilder().setMessage("msg").build() and assert the produced document's
"status.message" equals "msg" and "status.code" is null/absent; 2) a Span with
both startTimeUnixNano and endTimeUnixNano set to 0 to exercise timestamp
fallback logic in buildDocument — build such a span and assert "@timestamp" is
0, "duration" is null, and other related fields behave as expected. Use the
existing test class and helper methods (buildDocument,
SpanDocumentBuilder.buildStatus) to place these cases.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Enterprise
Run ID: 749a73aa-d3e7-42dc-b610-5341b09f8a1d
📒 Files selected for processing (4)
x-pack/plugin/otel-data/src/javaRestTest/java/org/elasticsearch/xpack/oteldata/otlp/OTLPTracesIndexingRestIT.javax-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/OTelDocumentBuilder.javax-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilder.javax-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/SpanDocumentBuilderTests.java
| case KVLIST_VALUE -> { | ||
| builder.startObject(); | ||
| List<KeyValue> kvList = value.getKvlistValue().getValuesList(); | ||
| for (int i = 0, kvListSize = kvList.size(); i < kvListSize; i++) { |
There was a problem hiding this comment.
Nit: why not
for (KeyValue kv: value.getKvlistValue().getValuesList()) {
There was a problem hiding this comment.
It avoids having to allocate an iterator. But the JIT will most likely replace with a stack allocation anyway. Both works. The more practical reason is that it follows the precedent of ARRAY_VALUE
| builder.startObject(); | ||
| long timestamp = span.getStartTimeUnixNano(); | ||
| if (timestamp == 0) { | ||
| timestamp = span.getEndTimeUnixNano(); |
There was a problem hiding this comment.
Good callout. The OTLP proto marks span start/end timestamps as semantically required, but proto3 can still carry zeroes. I kept the current tolerant behavior so a malformed span does not fail the whole ingest request, added a comment, and added a test documenting the @timestamp: 0 / no-duration behavior. If we want stricter handling, I think it should be an explicit partial-success/drop path rather than throwing from the document builder.
| if (links.isEmpty()) { | ||
| return; | ||
| } | ||
| builder.startArray("links"); |
There was a problem hiding this comment.
I assume we want an array even if it contains a single element?
There was a problem hiding this comment.
Yes, I think we want a stable array shape here. A single link is still semantically part of the span’s links collection.
| private void buildStatus(XContentBuilder builder, Span span) throws IOException { | ||
| boolean hasCode = span.getStatus().getCode() != StatusCode.STATUS_CODE_UNSET; | ||
| boolean hasMessage = span.getStatus().getMessageBytes().isEmpty() == false; | ||
| if (hasCode == false && hasMessage == false) { |
There was a problem hiding this comment.
Added coverage for a status with only a message and no code. buildStatus now skips message-only unset statuses, so any emitted status object includes a normalized code.
| return ObjectPath.createFromXContent(JsonXContent.jsonXContent, BytesReference.bytes(builder)); | ||
| } | ||
|
|
||
| private static ByteString byteString(int length, int start) { |
There was a problem hiding this comment.
Nit: maybe byteStringFromNumber? Also start is not very appropriate..
There was a problem hiding this comment.
Renamed this to randomHexByteString and changed the implementation to use random bytes, so the helper now matches the name better.
Normalize status documents to include codes, document timestamp edge cases, and add regression coverage for malformed spans.
Add the OTLP traces endpoint, transport action, document builder, and coverage needed to index trace spans into the built-in traces data streams.
Summary
Follow-ups
elasticsearch.document_idfor emitted trace and span event documentsTest plan
./gradlew :x-pack:plugin:otel-data:test --tests org.elasticsearch.xpack.oteldata.otlp.docbuilder.SpanDocumentBuilderTests./gradlew :x-pack:plugin:otel-data:test --tests org.elasticsearch.xpack.oteldata.otlp.OTLPTracesTransportActionTests./gradlew :x-pack:plugin:otel-data:javaRestTest --tests org.elasticsearch.xpack.oteldata.otlp.OTLPTracesIndexingRestITRelated