feat(diagnostics-otel): nested message spans + conversation.id fix by rogeriochaves · Pull Request #3 · orq-ai/openclaw

rogeriochaves · 2026-02-09T16:03:05Z

Summary

Two improvements to the OTEL diagnostics plugin's trace structure and semantic compliance, plus a core fix to ensure proper span nesting.

1. Nested message lifecycle spans (no more orphan `message.processed` traces)

Problem: The message.processed diagnostic event was creating a standalone OTEL span with no parent context. This resulted in two separate traces appearing in observability platforms for every single message — one for the actual agent work (openclaw.agent.turn with nested LLM/tool spans) and a disconnected orphan for openclaw.message.processed.

Root cause: In OpenClaw, the message lifecycle is a sequence of internal diagnostic events:

message.queued → run.started → model.inference.started → model.inference → tool.execution → run.completed → message.processed

The message.queued/message.processed pair represents the outer envelope — the full lifecycle from when a message enters the queue to when processing completes (including queue wait time). The run.* and model.inference.* events represent the inner work — the actual agent execution.

From an OTEL semantic perspective, these are not independent operations — they are nested levels of the same unit of work. A message being processed IS the agent turn; the queue wait + agent execution + completion are phases of one trace, not separate ones. Creating a standalone span for message.processed is like having both a function call AND its return statement as separate traces — it double-counts and breaks the parent-child hierarchy.

Fix:

message.queued now creates a root span (openclaw.message) that becomes the parent for all subsequent spans in that session
agent.turn spans (created by ensureRunSpan) are automatically parented under this root span via activeTraces lookup by sessionKey
message.processed ends the root span instead of creating a new one, setting OK/ERROR status based on the outcome
Orphaned root spans are cleaned up via TTL (5 min) on the interval timer and on service stop

The resulting trace hierarchy:

openclaw.message (root, covers full lifecycle including queue wait)
  └── openclaw.agent.turn (agent execution)
        ├── chat claude-opus-4-6 (LLM call)
        ├── execute_tool read (tool execution)
        └── chat claude-opus-4-6 (follow-up LLM call)

2. Propagate `sessionKey` to `run.started` diagnostic event (core fix)

Problem: The run.started event only carried sessionId (from the agent session) but not sessionKey (from the message queue layer). Since the OTEL plugin uses sessionKey to correlate the root openclaw.message span (created at message.queued time) with child spans, the parent-child nesting silently failed — producing two separate traces instead of one nested hierarchy.

Root cause: handleAgentStart in pi-embedded-subscribe.handlers.lifecycle.ts emitted run.started without sessionKey because the subscribe params didn't include it. However, registerAgentRunContext(runId, { sessionKey }) is already called before the run starts (in agent-runner-execution.ts), making sessionKey available via getAgentRunContext(runId).

Fix: handleAgentStart now resolves sessionKey via getAgentRunContext(runId) and includes it in the run.started event. This is a 3-line change to core that enables clean parent-child span nesting without any workarounds in the plugin.

3. Use `sessionId` for `gen_ai.conversation.id` (OTEL GenAI semantic convention)

Problem: gen_ai.conversation.id was set to sessionKey (e.g., telegram:myagent:12345), which is a composite key that includes channel and agent info and never changes across /reset commands. This meant all messages in a channel were permanently grouped as the same "conversation" in downstream platforms.

Fix: Changed to use sessionId instead, which is the stable conversation identifier that resets when the user starts a new session (e.g., via /reset). This aligns with the OTEL GenAI semantic convention where gen_ai.conversation.id is defined as "the unique identifier for a conversation (session, thread), used to store and correlate messages within this conversation."

Test plan

All 27 OTEL plugin tests pass (23 existing + 4 new P0 tests)
Nesting verification: child spans properly parented under root openclaw.message
Root span attributes: outcome/durationMs/messageId set before ending
Error status path: error outcomes set SpanStatusCode.ERROR
Graceful handling: recordMessageProcessed with no matching root trace doesn't throw
TTL cleanup: orphaned root spans cleaned up on interval timer
gen_ai.conversation.id uses sessionId (not sessionKey)
Manual test: send message via gateway → verified single nested trace in LangWatch
Manual test: /reset → send message → verify new thread_id in LangWatch

…d for conversation.id - Create root span on message.queued, end it on message.processed instead of creating a separate orphan span. This ensures the full message lifecycle (queue → agent turn → LLM calls → tools → done) appears as a single nested trace rather than disconnected spans. - Parent agent.turn spans under the root message span via activeTraces map keyed by sessionKey. - Use sessionId (stable conversation identifier that resets on /reset) for gen_ai.conversation.id instead of sessionKey (composite key including channel and agent info that never changes). This aligns with the OTEL GenAI semantic convention for thread/session correlation. - Clean up activeTraces on service stop.

handleAgentStart now resolves sessionKey via getAgentRunContext(runId) and includes it in the run.started diagnostic event. This allows the OTEL plugin to parent agent.turn spans under the root openclaw.message span via direct sessionKey lookup, since message.queued events use sessionKey as the correlation key.

Baukebrenninkmeijer · 2026-02-09T20:48:28Z

@rogeriochaves — Great PR! The architectural approach is solid. Here's a comprehensive review covering code quality, error handling, test coverage, and comment accuracy.

Critical Issues (2 found)

1. `activeTraces` map has no TTL cleanup — unbounded memory leak

File: extensions/diagnostics-otel/src/service.ts

The existing runSpans and inferenceSpans maps are cleaned up every 60s via bufferCleanupTimer with a 10-minute TTL. The new activeTraces map has no equivalent cleanup. If message.processed never fires (crash, timeout, abandoned session), entries accumulate forever. The ActiveTrace interface already has startedAt — suggesting cleanup was intended but not implemented.

Fix: Add TTL cleanup in the existing setInterval callback:

for (const [key, entry] of activeTraces) {
  if (now - entry.startedAt > ACTIVE_TRACE_TTL_MS) {
    entry.span.setStatus({ code: SpanStatusCode.ERROR, message: "TTL expired" });
    entry.span.end();
    activeTraces.delete(key);
  }
}

2. Incomplete `gen_ai.conversation.id` migration — `ensureRunSpan` still uses `sessionKey`

File: extensions/diagnostics-otel/src/service.ts — ensureRunSpan function (line ~593-594)

The PR changes gen_ai.conversation.id from sessionKey to sessionId in 4 places, but ensureRunSpan (the primary span creation path for run.started events) still has:

spanAttrs["gen_ai.conversation.id"] = params.sessionKey;  // NOT updated

For run.completed, this is masked because the caller's params.attributes spread overwrites the value. But standalone run.started events (which don't pass attributes with gen_ai.conversation.id) will still use the wrong key. Additionally, recordRunCompleted at line ~742-743 still directly sets gen_ai.conversation.id = evt.sessionKey, creating a second inconsistent site.

Fix: In ensureRunSpan, change the gen_ai.conversation.id assignment to use params.sessionId. Also update recordRunCompleted for consistency.

Important Issues (4 found)

3. `recordMessageProcessed` silently drops all trace data when no matching root exists

File: extensions/diagnostics-otel/src/service.ts — new recordMessageProcessed

The old code always created a standalone openclaw.message.processed span. The new code only ends a root span if one exists in activeTraces. If sessionKey is missing, or message.queued wasn't captured (e.g. service restart mid-flight), zero trace data is emitted — no log, no fallback.

Fix: Add a fallback standalone span or at minimum a debug log when no active trace is found.

4. Loss of metadata attributes on the `message.processed` path

File: extensions/diagnostics-otel/src/service.ts — new recordMessageProcessed

The old code attached sessionId, chatId, messageId, reason, and outcome as span attributes. The new code sets status and calls end() but none of these attributes are added to the root span before ending it.

Fix: Call activeTrace.span.setAttribute(...) for the available fields before calling end().

5. `SpanStatusCode.OK` is missing from the test mock

File: extensions/diagnostics-otel/src/service.test.ts — line ~56-58

The PR uses SpanStatusCode.OK in the new recordMessageProcessed, but the mock only defines ERROR: 2. SpanStatusCode.OK resolves to undefined at test time.

Fix: Add UNSET: 0, OK: 1 to the SpanStatusCode mock.

6. Non-null assertion `evt.sessionKey!` after null-coalescing guard

File: extensions/diagnostics-otel/src/service.ts — new recordMessageProcessed

const activeTrace = evt.sessionKey ? activeTraces.get(evt.sessionKey) : null;
// ...
activeTraces.delete(evt.sessionKey!);  // non-null assertion

Fix: Capture const sessionKey = evt.sessionKey; before the guard and use it throughout, avoiding the !.

Suggestions (4 found)

7. Empty `catch {}` in `stop()` cleanup should log

The new catch {} block in the stop() method swallows all errors during root trace cleanup. At minimum log a warning so operators can diagnose shutdown issues.

8. `activeTraces` map comment has inaccurate lifecycle sequence

Comment says (message.queued → agent.turn → tools → message.processed) — but message.processed ends the root span, it's not a child. The recordMessageQueued block comment has the correct sequence.

9. `getAgentRunContext(runId)` may return undefined — no diagnostic logging

In handleAgentStart, if runContext is undefined, sessionKey silently drops from the run.started event, breaking the entire span hierarchy for that run. A debug log would help diagnosis.

10. Significant missing test coverage

The test changes are minimal (+7/-2 lines) for a significant architectural change. Key missing tests:

Priority	Gap
P0	Root span → `agent.turn` parent-child nesting not verified
P0	`message.processed` ending the root span (`.end()` called) not verified
P1	Error outcome path (`SpanStatusCode.ERROR`) not verified
P1	`message.processed` without matching `message.queued`
P2	`stop()` cleanup of orphaned root traces
P2	`message.queued` without `sessionKey` (no span created)

The existing test infrastructure already supports all of these (the trace.setSpan mock, _parentCtx capture, etc.).

Strengths

The architectural approach is sound — nesting the full message lifecycle under a root span eliminates the disconnected orphan traces problem
The core fix in pi-embedded-subscribe.handlers.lifecycle.ts is clean and minimal (3 lines)
Using sessionId for gen_ai.conversation.id correctly aligns with OTEL GenAI semantic conventions
Comments explaining the "why" (especially the gen_ai.conversation.id rationale and the lifecycle sequence in recordMessageQueued) are well-written
Metrics recording is preserved regardless of trace state

Recommended Action

Fix the two critical issues (memory leak + incomplete conversation.id migration) before merge
Address the important issues (silent trace drop, missing attributes, test mock, non-null assertion)
Add test coverage for at minimum the P0 gaps (nesting verification + span ending)
Suggestions can be addressed as follow-ups

- Add TTL cleanup for activeTraces map (prevents unbounded memory leak) - Add outcome/duration/messageId attributes to root span before ending - Add debug log when message.processed has no matching root trace - Add debug log when getAgentRunContext returns no sessionKey - Fix SpanStatusCode.OK/UNSET missing from test mock - Remove non-null assertion on evt.sessionKey - Fix comment accuracy for message lifecycle hierarchy - Add P0 test coverage: nesting verification, span ending, error path, and graceful handling of unmatched message.processed

rogeriochaves · 2026-02-10T09:32:24Z

Review feedback addressed (`fd83273`)

Pushed a commit addressing the review feedback:

Fixed

TTL cleanup for activeTraces — Added to the existing interval timer. Orphaned root spans that exceed ACTIVE_TRACE_TTL_MS (5 min) are now cleaned up with ERROR status and a "TTL expired" message, preventing memory leaks from interrupted message flows.
Root span attributes before ending — recordMessageProcessed now sets openclaw.outcome, openclaw.durationMs, and openclaw.messageId on the root span before ending it. Previously the span was ended without any attributes beyond the initial ones.
Debug log for missing root trace — When recordMessageProcessed fires but no matching root trace is found, a debug log is now emitted (guarded by debugExports to stay silent in production).
SpanStatusCode mock — Added UNSET: 0 and OK: 1 to the mock alongside ERROR: 2. Tests now accurately reflect the real @opentelemetry/api exports.
Non-null assertion removed — Replaced activeTraces.get(sessionKey)! with a const capture and early-return guard.
Inaccurate comment fixed — Updated the recordRunCompleted comment to accurately describe what it does (no span creation, just checking unmatched traces).
Debug log for undefined runContext — handleAgentStart now logs when getAgentRunContext(runId) returns no sessionKey.

New tests (4 P0 tests added, now 27 total)

Nesting verification — recordRunStarted creates a child span that is properly parented under the root openclaw.message span
Root span ending with attributes — recordMessageProcessed sets outcome/durationMs/messageId attributes before ending the root span
Error status path — Error outcomes set SpanStatusCode.ERROR with the outcome message
Graceful handling — recordMessageProcessed with no matching root trace doesn't throw

Not addressed

"Incomplete conversation.id migration" (review point feat: emit cache tokens as gen_ai.usage span attributes #2) — Verified all 4 gen_ai.conversation.id call sites already use sessionId. The migration is complete.
Empty catch in stop() (review point feat(diagnostics-otel): OpenTelemetry diagnostics with GenAI semantic conventions #7) — Minor style nit, deferred for now.

rogeriochaves added 2 commits February 9, 2026 16:02

rogeriochaves mentioned this pull request Feb 9, 2026

Diagnostics: fix OTEL plugin wiring + align with GenAI semantic conventions openclaw/openclaw#11100

Closed

Baukebrenninkmeijer merged commit e9ce4d2 into orq-ai:otel-diagnostics-fixes Feb 10, 2026
4 of 20 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(diagnostics-otel): nested message spans + conversation.id fix#3

feat(diagnostics-otel): nested message spans + conversation.id fix#3
Baukebrenninkmeijer merged 3 commits intoorq-ai:otel-diagnostics-fixesfrom
langwatch:feat/otel-nested-spans-and-conversation-id

rogeriochaves commented Feb 9, 2026 •

edited

Loading

Uh oh!

Baukebrenninkmeijer commented Feb 9, 2026

Uh oh!

rogeriochaves commented Feb 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rogeriochaves commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. Nested message lifecycle spans (no more orphan message.processed traces)

2. Propagate sessionKey to run.started diagnostic event (core fix)

3. Use sessionId for gen_ai.conversation.id (OTEL GenAI semantic convention)

Test plan

Uh oh!

Baukebrenninkmeijer commented Feb 9, 2026

Critical Issues (2 found)

1. activeTraces map has no TTL cleanup — unbounded memory leak

2. Incomplete gen_ai.conversation.id migration — ensureRunSpan still uses sessionKey

Important Issues (4 found)

3. recordMessageProcessed silently drops all trace data when no matching root exists

4. Loss of metadata attributes on the message.processed path

5. SpanStatusCode.OK is missing from the test mock

6. Non-null assertion evt.sessionKey! after null-coalescing guard

Suggestions (4 found)

7. Empty catch {} in stop() cleanup should log

8. activeTraces map comment has inaccurate lifecycle sequence

9. getAgentRunContext(runId) may return undefined — no diagnostic logging

10. Significant missing test coverage

Strengths

Recommended Action

Uh oh!

rogeriochaves commented Feb 10, 2026

Review feedback addressed (fd83273)

Fixed

New tests (4 P0 tests added, now 27 total)

Not addressed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rogeriochaves commented Feb 9, 2026 •

edited

Loading

1. Nested message lifecycle spans (no more orphan `message.processed` traces)

2. Propagate `sessionKey` to `run.started` diagnostic event (core fix)

3. Use `sessionId` for `gen_ai.conversation.id` (OTEL GenAI semantic convention)

1. `activeTraces` map has no TTL cleanup — unbounded memory leak

2. Incomplete `gen_ai.conversation.id` migration — `ensureRunSpan` still uses `sessionKey`

3. `recordMessageProcessed` silently drops all trace data when no matching root exists

4. Loss of metadata attributes on the `message.processed` path

5. `SpanStatusCode.OK` is missing from the test mock

6. Non-null assertion `evt.sessionKey!` after null-coalescing guard

7. Empty `catch {}` in `stop()` cleanup should log

8. `activeTraces` map comment has inaccurate lifecycle sequence

9. `getAgentRunContext(runId)` may return undefined — no diagnostic logging

Review feedback addressed (`fd83273`)