Skip to content

feat(diagnostics-otel): nested message spans + conversation.id fix#3

Merged
Baukebrenninkmeijer merged 3 commits intoorq-ai:otel-diagnostics-fixesfrom
langwatch:feat/otel-nested-spans-and-conversation-id
Feb 10, 2026
Merged

feat(diagnostics-otel): nested message spans + conversation.id fix#3
Baukebrenninkmeijer merged 3 commits intoorq-ai:otel-diagnostics-fixesfrom
langwatch:feat/otel-nested-spans-and-conversation-id

Conversation

@rogeriochaves
Copy link
Copy Markdown

@rogeriochaves rogeriochaves commented Feb 9, 2026

Summary

Two improvements to the OTEL diagnostics plugin's trace structure and semantic compliance, plus a core fix to ensure proper span nesting.

1. Nested message lifecycle spans (no more orphan message.processed traces)

Problem: The message.processed diagnostic event was creating a standalone OTEL span with no parent context. This resulted in two separate traces appearing in observability platforms for every single message — one for the actual agent work (openclaw.agent.turn with nested LLM/tool spans) and a disconnected orphan for openclaw.message.processed.

Root cause: In OpenClaw, the message lifecycle is a sequence of internal diagnostic events:

message.queued → run.started → model.inference.started → model.inference → tool.execution → run.completed → message.processed

The message.queued/message.processed pair represents the outer envelope — the full lifecycle from when a message enters the queue to when processing completes (including queue wait time). The run.* and model.inference.* events represent the inner work — the actual agent execution.

From an OTEL semantic perspective, these are not independent operations — they are nested levels of the same unit of work. A message being processed IS the agent turn; the queue wait + agent execution + completion are phases of one trace, not separate ones. Creating a standalone span for message.processed is like having both a function call AND its return statement as separate traces — it double-counts and breaks the parent-child hierarchy.

Fix:

  • message.queued now creates a root span (openclaw.message) that becomes the parent for all subsequent spans in that session
  • agent.turn spans (created by ensureRunSpan) are automatically parented under this root span via activeTraces lookup by sessionKey
  • message.processed ends the root span instead of creating a new one, setting OK/ERROR status based on the outcome
  • Orphaned root spans are cleaned up via TTL (5 min) on the interval timer and on service stop

The resulting trace hierarchy:

openclaw.message (root, covers full lifecycle including queue wait)
  └── openclaw.agent.turn (agent execution)
        ├── chat claude-opus-4-6 (LLM call)
        ├── execute_tool read (tool execution)
        └── chat claude-opus-4-6 (follow-up LLM call)

2. Propagate sessionKey to run.started diagnostic event (core fix)

Problem: The run.started event only carried sessionId (from the agent session) but not sessionKey (from the message queue layer). Since the OTEL plugin uses sessionKey to correlate the root openclaw.message span (created at message.queued time) with child spans, the parent-child nesting silently failed — producing two separate traces instead of one nested hierarchy.

Root cause: handleAgentStart in pi-embedded-subscribe.handlers.lifecycle.ts emitted run.started without sessionKey because the subscribe params didn't include it. However, registerAgentRunContext(runId, { sessionKey }) is already called before the run starts (in agent-runner-execution.ts), making sessionKey available via getAgentRunContext(runId).

Fix: handleAgentStart now resolves sessionKey via getAgentRunContext(runId) and includes it in the run.started event. This is a 3-line change to core that enables clean parent-child span nesting without any workarounds in the plugin.

3. Use sessionId for gen_ai.conversation.id (OTEL GenAI semantic convention)

Problem: gen_ai.conversation.id was set to sessionKey (e.g., telegram:myagent:12345), which is a composite key that includes channel and agent info and never changes across /reset commands. This meant all messages in a channel were permanently grouped as the same "conversation" in downstream platforms.

Fix: Changed to use sessionId instead, which is the stable conversation identifier that resets when the user starts a new session (e.g., via /reset). This aligns with the OTEL GenAI semantic convention where gen_ai.conversation.id is defined as "the unique identifier for a conversation (session, thread), used to store and correlate messages within this conversation."

Test plan

  • All 27 OTEL plugin tests pass (23 existing + 4 new P0 tests)
  • Nesting verification: child spans properly parented under root openclaw.message
  • Root span attributes: outcome/durationMs/messageId set before ending
  • Error status path: error outcomes set SpanStatusCode.ERROR
  • Graceful handling: recordMessageProcessed with no matching root trace doesn't throw
  • TTL cleanup: orphaned root spans cleaned up on interval timer
  • gen_ai.conversation.id uses sessionId (not sessionKey)
  • Manual test: send message via gateway → verified single nested trace in LangWatch
  • Manual test: /reset → send message → verify new thread_id in LangWatch

…d for conversation.id

- Create root span on message.queued, end it on message.processed
  instead of creating a separate orphan span. This ensures the full
  message lifecycle (queue → agent turn → LLM calls → tools → done)
  appears as a single nested trace rather than disconnected spans.

- Parent agent.turn spans under the root message span via activeTraces
  map keyed by sessionKey.

- Use sessionId (stable conversation identifier that resets on /reset)
  for gen_ai.conversation.id instead of sessionKey (composite key
  including channel and agent info that never changes). This aligns
  with the OTEL GenAI semantic convention for thread/session correlation.

- Clean up activeTraces on service stop.
handleAgentStart now resolves sessionKey via getAgentRunContext(runId)
and includes it in the run.started diagnostic event. This allows the
OTEL plugin to parent agent.turn spans under the root openclaw.message
span via direct sessionKey lookup, since message.queued events use
sessionKey as the correlation key.
@Baukebrenninkmeijer
Copy link
Copy Markdown
Collaborator

@rogeriochaves — Great PR! The architectural approach is solid. Here's a comprehensive review covering code quality, error handling, test coverage, and comment accuracy.


Critical Issues (2 found)

1. activeTraces map has no TTL cleanup — unbounded memory leak

File: extensions/diagnostics-otel/src/service.ts

The existing runSpans and inferenceSpans maps are cleaned up every 60s via bufferCleanupTimer with a 10-minute TTL. The new activeTraces map has no equivalent cleanup. If message.processed never fires (crash, timeout, abandoned session), entries accumulate forever. The ActiveTrace interface already has startedAt — suggesting cleanup was intended but not implemented.

Fix: Add TTL cleanup in the existing setInterval callback:

for (const [key, entry] of activeTraces) {
  if (now - entry.startedAt > ACTIVE_TRACE_TTL_MS) {
    entry.span.setStatus({ code: SpanStatusCode.ERROR, message: "TTL expired" });
    entry.span.end();
    activeTraces.delete(key);
  }
}

2. Incomplete gen_ai.conversation.id migration — ensureRunSpan still uses sessionKey

File: extensions/diagnostics-otel/src/service.tsensureRunSpan function (line ~593-594)

The PR changes gen_ai.conversation.id from sessionKey to sessionId in 4 places, but ensureRunSpan (the primary span creation path for run.started events) still has:

spanAttrs["gen_ai.conversation.id"] = params.sessionKey;  // NOT updated

For run.completed, this is masked because the caller's params.attributes spread overwrites the value. But standalone run.started events (which don't pass attributes with gen_ai.conversation.id) will still use the wrong key. Additionally, recordRunCompleted at line ~742-743 still directly sets gen_ai.conversation.id = evt.sessionKey, creating a second inconsistent site.

Fix: In ensureRunSpan, change the gen_ai.conversation.id assignment to use params.sessionId. Also update recordRunCompleted for consistency.


Important Issues (4 found)

3. recordMessageProcessed silently drops all trace data when no matching root exists

File: extensions/diagnostics-otel/src/service.ts — new recordMessageProcessed

The old code always created a standalone openclaw.message.processed span. The new code only ends a root span if one exists in activeTraces. If sessionKey is missing, or message.queued wasn't captured (e.g. service restart mid-flight), zero trace data is emitted — no log, no fallback.

Fix: Add a fallback standalone span or at minimum a debug log when no active trace is found.

4. Loss of metadata attributes on the message.processed path

File: extensions/diagnostics-otel/src/service.ts — new recordMessageProcessed

The old code attached sessionId, chatId, messageId, reason, and outcome as span attributes. The new code sets status and calls end() but none of these attributes are added to the root span before ending it.

Fix: Call activeTrace.span.setAttribute(...) for the available fields before calling end().

5. SpanStatusCode.OK is missing from the test mock

File: extensions/diagnostics-otel/src/service.test.ts — line ~56-58

The PR uses SpanStatusCode.OK in the new recordMessageProcessed, but the mock only defines ERROR: 2. SpanStatusCode.OK resolves to undefined at test time.

Fix: Add UNSET: 0, OK: 1 to the SpanStatusCode mock.

6. Non-null assertion evt.sessionKey! after null-coalescing guard

File: extensions/diagnostics-otel/src/service.ts — new recordMessageProcessed

const activeTrace = evt.sessionKey ? activeTraces.get(evt.sessionKey) : null;
// ...
activeTraces.delete(evt.sessionKey!);  // non-null assertion

Fix: Capture const sessionKey = evt.sessionKey; before the guard and use it throughout, avoiding the !.


Suggestions (4 found)

7. Empty catch {} in stop() cleanup should log

The new catch {} block in the stop() method swallows all errors during root trace cleanup. At minimum log a warning so operators can diagnose shutdown issues.

8. activeTraces map comment has inaccurate lifecycle sequence

Comment says (message.queued → agent.turn → tools → message.processed) — but message.processed ends the root span, it's not a child. The recordMessageQueued block comment has the correct sequence.

9. getAgentRunContext(runId) may return undefined — no diagnostic logging

In handleAgentStart, if runContext is undefined, sessionKey silently drops from the run.started event, breaking the entire span hierarchy for that run. A debug log would help diagnosis.

10. Significant missing test coverage

The test changes are minimal (+7/-2 lines) for a significant architectural change. Key missing tests:

Priority Gap
P0 Root span → agent.turn parent-child nesting not verified
P0 message.processed ending the root span (.end() called) not verified
P1 Error outcome path (SpanStatusCode.ERROR) not verified
P1 message.processed without matching message.queued
P2 stop() cleanup of orphaned root traces
P2 message.queued without sessionKey (no span created)

The existing test infrastructure already supports all of these (the trace.setSpan mock, _parentCtx capture, etc.).


Strengths

  • The architectural approach is sound — nesting the full message lifecycle under a root span eliminates the disconnected orphan traces problem
  • The core fix in pi-embedded-subscribe.handlers.lifecycle.ts is clean and minimal (3 lines)
  • Using sessionId for gen_ai.conversation.id correctly aligns with OTEL GenAI semantic conventions
  • Comments explaining the "why" (especially the gen_ai.conversation.id rationale and the lifecycle sequence in recordMessageQueued) are well-written
  • Metrics recording is preserved regardless of trace state

Recommended Action

  1. Fix the two critical issues (memory leak + incomplete conversation.id migration) before merge
  2. Address the important issues (silent trace drop, missing attributes, test mock, non-null assertion)
  3. Add test coverage for at minimum the P0 gaps (nesting verification + span ending)
  4. Suggestions can be addressed as follow-ups

- Add TTL cleanup for activeTraces map (prevents unbounded memory leak)
- Add outcome/duration/messageId attributes to root span before ending
- Add debug log when message.processed has no matching root trace
- Add debug log when getAgentRunContext returns no sessionKey
- Fix SpanStatusCode.OK/UNSET missing from test mock
- Remove non-null assertion on evt.sessionKey
- Fix comment accuracy for message lifecycle hierarchy
- Add P0 test coverage: nesting verification, span ending, error path,
  and graceful handling of unmatched message.processed
@rogeriochaves
Copy link
Copy Markdown
Author

Review feedback addressed (fd83273)

Pushed a commit addressing the review feedback:

Fixed

  1. TTL cleanup for activeTraces — Added to the existing interval timer. Orphaned root spans that exceed ACTIVE_TRACE_TTL_MS (5 min) are now cleaned up with ERROR status and a "TTL expired" message, preventing memory leaks from interrupted message flows.

  2. Root span attributes before endingrecordMessageProcessed now sets openclaw.outcome, openclaw.durationMs, and openclaw.messageId on the root span before ending it. Previously the span was ended without any attributes beyond the initial ones.

  3. Debug log for missing root trace — When recordMessageProcessed fires but no matching root trace is found, a debug log is now emitted (guarded by debugExports to stay silent in production).

  4. SpanStatusCode mock — Added UNSET: 0 and OK: 1 to the mock alongside ERROR: 2. Tests now accurately reflect the real @opentelemetry/api exports.

  5. Non-null assertion removed — Replaced activeTraces.get(sessionKey)! with a const capture and early-return guard.

  6. Inaccurate comment fixed — Updated the recordRunCompleted comment to accurately describe what it does (no span creation, just checking unmatched traces).

  7. Debug log for undefined runContexthandleAgentStart now logs when getAgentRunContext(runId) returns no sessionKey.

New tests (4 P0 tests added, now 27 total)

  • Nesting verificationrecordRunStarted creates a child span that is properly parented under the root openclaw.message span
  • Root span ending with attributesrecordMessageProcessed sets outcome/durationMs/messageId attributes before ending the root span
  • Error status path — Error outcomes set SpanStatusCode.ERROR with the outcome message
  • Graceful handlingrecordMessageProcessed with no matching root trace doesn't throw

Not addressed

@Baukebrenninkmeijer Baukebrenninkmeijer merged commit e9ce4d2 into orq-ai:otel-diagnostics-fixes Feb 10, 2026
4 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants