feat(diagnostics-otel): OpenTelemetry diagnostics with GenAI semantic conventions by Baukebrenninkmeijer · Pull Request #7 · orq-ai/openclaw

Baukebrenninkmeijer · 2026-02-19T21:03:59Z

Summary

Upgrades the @openclaw/diagnostics-otel exporter to produce structured, per-call telemetry aligned with OpenTelemetry GenAI semantic conventions.

Run-level parent span (openclaw.agent.turn) per agent turn
Per-inference spans for each LLM call (initial, post-tool followups, loops)
Tool execution spans with gen_ai.tool.* attributes
Opt-in content capture (diagnostics.otel.captureContent) for messages and tool I/O
GenAI metrics: gen_ai.client.operation.duration, gen_ai.client.time_to_first_token, gen_ai.client.token.usage
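As a rough illustration of how a per-inference span could carry usage data while keeping message content opt-in, the sketch below builds a flat attribute map. The `InferenceInfo` shape, the alias table, and both function names are hypothetical. The `gen_ai.*` keys follow the OpenTelemetry GenAI conventions as I understand them, not necessarily the exact set this PR emits.

```typescript
// Hypothetical input shape; the real event fields in the PR may differ.
interface InferenceInfo {
  provider: string; // raw provider id as configured, e.g. "Google"
  model: string;
  inputTokens: number;
  outputTokens: number;
  inputMessages?: unknown[]; // only present when the caller collected content
}

type AttributeValue = string | number | boolean;

// Assumed alias table mapping raw provider ids to GenAI provider names.
const PROVIDER_ALIASES: Record<string, string> = {
  openai: "openai",
  anthropic: "anthropic",
  google: "gcp.gemini",
  gemini: "gcp.gemini",
};

function normalizeProvider(raw: string): string {
  const key = raw.trim().toLowerCase();
  // Unknown providers pass through lowercased rather than being dropped.
  return PROVIDER_ALIASES[key] ?? key;
}

function buildInferenceAttributes(
  info: InferenceInfo,
  captureContent: boolean,
): Record<string, AttributeValue> {
  const attrs: Record<string, AttributeValue> = {
    "gen_ai.operation.name": "chat",
    "gen_ai.provider.name": normalizeProvider(info.provider),
    "gen_ai.request.model": info.model,
    "gen_ai.usage.input_tokens": info.inputTokens,
    "gen_ai.usage.output_tokens": info.outputTokens,
  };
  // Message content is attached only when capture is explicitly enabled.
  if (captureContent && info.inputMessages !== undefined) {
    attrs["gen_ai.input.messages"] = JSON.stringify(info.inputMessages);
  }
  return attrs;
}
```

With `captureContent` set to `false`, the returned map still contains the provider, model, and token counts, which matches the stated goal of keeping timings and usage available when content capture is off.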

Event model

Replaces the monolithic model.usage event with a structured lifecycle:

run.started — agent turn begins
model.inference.started — LLM call begins (captures input messages, system instructions, tool definitions)
model.inference — LLM call ends (duration, TTFT, usage, output messages)
tool.execution — tool call (duration, errors, optional I/O)
run.completed — agent turn ends (aggregate usage, cost, duration)
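One way to picture the lifecycle above is as a discriminated union keyed on the event name. The sketch below is an assumption-heavy illustration: the five `type` strings come from this description, while `runId`, `durationMs`, `usage`, `toolName`, and the `summarizeRun` helper are hypothetical and not taken from the PR's code.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Event names come from the PR description; every field is a hypothetical
// stand-in for whatever the real payloads carry.
type DiagnosticEvent =
  | { type: "run.started"; runId: string }
  | { type: "model.inference.started"; runId: string }
  | { type: "model.inference"; runId: string; durationMs: number; usage: Usage }
  | { type: "tool.execution"; runId: string; toolName: string; durationMs: number; error?: string }
  | { type: "run.completed"; runId: string; durationMs: number };

interface RunSummary {
  inferences: number;
  toolCalls: number;
  toolErrors: number;
  usage: Usage;
}

// Folds the events of one run into aggregate counts and token usage.
function summarizeRun(events: DiagnosticEvent[], runId: string): RunSummary {
  const summary: RunSummary = {
    inferences: 0,
    toolCalls: 0,
    toolErrors: 0,
    usage: { inputTokens: 0, outputTokens: 0 },
  };
  for (const event of events) {
    if (event.runId !== runId) continue;
    switch (event.type) {
      case "model.inference":
        summary.inferences += 1;
        summary.usage.inputTokens += event.usage.inputTokens;
        summary.usage.outputTokens += event.usage.outputTokens;
        break;
      case "tool.execution":
        summary.toolCalls += 1;
        if (event.error !== undefined) summary.toolErrors += 1;
        break;
      default:
        // Start and completion markers carry no usage in this sketch.
        break;
    }
  }
  return summary;
}
```

Because each LLM call ends in its own `model.inference` event, a turn with tool followups yields several usage records that can be summed, rather than one monolithic figure.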

Key design decisions

Input messages captured at the actual model-call boundary (not from streaming state)
Content capture gated behind diagnostics.otel.captureContent — when disabled, spans still include timings/usage/errors
Provider names normalized to GenAI enum (openai, anthropic, gcp.gemini, etc.)
Symbol.for() used for the global diagnostic state key (registry-backed symbol: the same key resolves in every module copy and cannot collide with string-keyed globals)
Recursion guard (dispatchDepth) retained for diagnostic event dispatch safety
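The last two decisions can be sketched together. This is a minimal illustration, assuming a state object stored on `globalThis` under a `Symbol.for()` key and a depth counter that drops re-entrant dispatches. The key string, the `DiagnosticState` shape, the depth limit, and the function names are all hypothetical; only `dispatchDepth` is named in the description.

```typescript
type Listener = (event: { type: string }) => void;

interface DiagnosticState {
  listeners: Set<Listener>;
  dispatchDepth: number;
}

// Symbol.for() reads from the runtime-wide symbol registry, so every copy of
// this module resolves the same key. The key string here is hypothetical.
const STATE_KEY = Symbol.for("example.diagnostics.state");

function getState(): DiagnosticState {
  const holder = globalThis as unknown as Record<symbol, DiagnosticState | undefined>;
  let state = holder[STATE_KEY];
  if (state === undefined) {
    state = { listeners: new Set<Listener>(), dispatchDepth: 0 };
    holder[STATE_KEY] = state;
  }
  return state;
}

// Assumed limit: a listener may not trigger another dispatch.
const MAX_DISPATCH_DEPTH = 1;

// Returns false when the event was dropped by the recursion guard.
function emitDiagnosticEvent(event: { type: string }): boolean {
  const state = getState();
  if (state.dispatchDepth >= MAX_DISPATCH_DEPTH) return false;
  state.dispatchDepth += 1;
  try {
    state.listeners.forEach((listener) => listener(event));
  } finally {
    // Always restore the depth, even if a listener throws.
    state.dispatchDepth -= 1;
  }
  return true;
}
```

The guard matters for an exporter in particular: if handling a diagnostic event itself produces a diagnostic event, an unguarded dispatcher would recurse without bound.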

Test plan

pnpm vitest run extensions/diagnostics-otel/src/service.test.ts
pnpm vitest run extensions/diagnostics-otel/src/service.metrics.test.ts
pnpm vitest run extensions/diagnostics-otel/src/service.spans.test.ts
pnpm vitest run src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.emits-diagnostic-tool-execution-events.test.ts
pnpm vitest run src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.emits-diagnostic-sessionkey.test.ts
pnpm vitest run src/commands/agent.diagnostics.test.ts
npx tsc --noEmit passes (only pre-existing e2e test errors)

🤖 Generated with Claude Code

@mbelinky

Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: 1beca3a Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky

@mbelinky

Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: 31a27b0 Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky

@mbelinky

…s to user context (openclaw#20597) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: 175919a Co-authored-by: anisoptera <768771+anisoptera@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky

@mbelinky

…aw#21226) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: 7705a77 Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com> Reviewed-by: @mbelinky

* fix(docker): pin base images to SHA256 digests for supply chain security

Pin all 9 Dockerfiles to immutable SHA256 digests to prevent supply chain attacks where a compromised upstream image could be silently pulled into production builds. Also add Docker ecosystem to Dependabot configuration for automated digest updates.

Images pinned:
- node:22-bookworm@sha256:cd7bcd2e7a1e6f72052feb023c7f6b722205d3fcab7bbcbd2d1bfdab10b1e935
- node:22-bookworm-slim@sha256:3cfe526ec8dd62013b8843e8e5d4877e297b886e5aace4a59fec25dc20736e45
- debian:bookworm-slim@sha256:98f4b71de414932439ac6ac690d7060df1f27161073c5036a7553723881bffbe
- ubuntu:24.04@sha256:cd1dba651b3080c3686ecf4e3c4220f026b521fb76978881737d24f200828b2b

Fixes openclaw#7731

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test(docker): add digest pinning regression coverage

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
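A digest-pinning regression check could look roughly like the sketch below, which scans Dockerfile text for `FROM` lines whose image reference does not end in an `@sha256:` digest. This is an assumed approach, not the test added in the commit; `findUnpinnedBaseImages` is a hypothetical name, and the parsing covers only the common `FROM [--flag] image [AS stage]` form.

```typescript
// Returns the FROM lines whose base image is not pinned to a SHA256 digest.
function findUnpinnedBaseImages(dockerfile: string): string[] {
  const unpinned: string[] = [];
  const stageNames = new Set<string>();
  for (const rawLine of dockerfile.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!/^FROM\s/i.test(line)) continue;
    // Drop the FROM keyword and optional flags such as --platform=...
    const tokens = line
      .split(/\s+/)
      .slice(1)
      .filter((token) => !token.startsWith("--"));
    const image = tokens[0] ?? "";
    const isEarlierStage = stageNames.has(image.toLowerCase());
    const isScratch = image.toLowerCase() === "scratch";
    if (!isEarlierStage && !isScratch && !/@sha256:[0-9a-f]{64}$/.test(image)) {
      unpinned.push(line);
    }
    // Remember "AS <name>" so later stages may reference it without a digest.
    const asIndex = tokens.findIndex((token) => token.toLowerCase() === "as");
    const alias = asIndex === -1 ? undefined : tokens[asIndex + 1];
    if (alias !== undefined) stageNames.add(alias.toLowerCase());
  }
  return unpinned;
}
```

A test suite could read each Dockerfile in the repository and assert that this function returns an empty array, so a later edit that reintroduces a floating tag fails CI.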

…nclaw#21086)

* fix: treat HTTP 503 as failover-eligible for LLM provider errors

When LLM SDKs wrap 503 responses, the leading "503" prefix is lost (e.g. Google Gemini returns "high demand" / "UNAVAILABLE" without a numeric prefix). The existing isTransientHttpError only matches messages starting with "503 ...", so these wrapped errors silently skip failover — no profile rotation, no model fallback.

This patch closes that gap:
- resolveFailoverReasonFromError: map HTTP status 503 → rate_limit (covers structured error objects with a status field)
- ERROR_PATTERNS.overloaded: add /\b503\b/, "service unavailable", "high demand" (covers message-only classification when the leading status prefix is absent)

Existing isTransientHttpError behavior is unchanged; these additions are complementary and only fire for errors that previously fell through unclassified.

* fix: address review feedback — drop /\b503\b/ pattern, add test coverage

- Remove `/\b503\b/` from ERROR_PATTERNS.overloaded to resolve the semantic inconsistency noted by reviewers: `isTransientHttpError` already handles messages prefixed with "503" (→ "timeout"), so a redundant overloaded pattern would classify the same class of errors differently depending on message formatting.
- Keep "service unavailable" and "high demand" patterns — these are the real gap-fillers for SDK-rewritten messages that lack a numeric prefix.
- Add test case for JSON-wrapped 503 error body containing "overloaded" to strengthen coverage.

* fix: unify 503 classification — status 503 → timeout (consistent with isTransientHttpError)

resolveFailoverReasonFromError previously mapped status 503 → "rate_limit", while the string-based isTransientHttpError mapped "503 ..." → "timeout". Align both paths: structured {status: 503} now also returns "timeout", matching the existing transient-error convention. Both reasons are failover-eligible, so runtime behavior is unchanged.

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
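The end state described by these three commits can be approximated as follows. This is a simplified sketch, not the repository's code: `classifyFailover` is a hypothetical function that merges the roles of `resolveFailoverReasonFromError`, `isTransientHttpError`, and the `overloaded` pattern list, and the `"overloaded"` reason label is an assumption drawn from the pattern group's name.

```typescript
type FailoverReason = "timeout" | "overloaded";

// Message-only patterns named in the commit; the real list is presumably longer.
const OVERLOADED_PATTERNS: string[] = ["service unavailable", "high demand"];

function classifyFailover(error: { status?: number; message?: string }): FailoverReason | null {
  // Structured path: a 503 status field maps to "timeout".
  if (error.status === 503) return "timeout";

  const message = (error.message ?? "").trim().toLowerCase();

  // String path: a leading "503" is treated as a transient HTTP error.
  if (/^503\b/.test(message)) return "timeout";

  // SDK-rewritten messages that lost the numeric prefix.
  if (OVERLOADED_PATTERNS.some((pattern) => message.includes(pattern))) {
    return "overloaded";
  }

  // Not recognized as failover-eligible by this sketch.
  return null;
}
```

The ordering reflects the review outcome: a 503 that is visible as a status or prefix always yields the same reason, and the text patterns only catch messages where that signal has been stripped.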

… conventions

- Align OTel spans with GenAI semantic conventions (gen_ai.* attributes, metrics, tool spans)
- Add inference spans, content capture gating, and tool execution diagnostic events
- Add OTel trace lifecycle to followup runner and subagent runs
- Split oversized service into otel-event-handlers, otel-metrics, and otel-utils modules
- Fix trace header leak, conversation ID consistency, and span context validation
- Include cached tokens in gen_ai.usage.input_tokens calculation
- Guard inner exporter to ensure resultCallback is always invoked
- Add comprehensive test coverage for spans, metrics, and diagnostic events

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
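Two of the items above, the cached-token accounting and the exporter guard, lend themselves to a short sketch. Everything here is illustrative: the `RawUsage` field names and the choice to sum both cache-read and cache-write tokens are assumptions, and `ExporterLike` is a local stand-in that only mirrors the callback style of an OpenTelemetry exporter (code `0` for success, `1` for failure) rather than importing the real interface.

```typescript
// Hypothetical usage shape; which cache fields count toward input is assumed.
interface RawUsage {
  input?: number;
  cacheRead?: number;
  cacheWrite?: number;
}

function totalInputTokens(usage: RawUsage): number {
  return (usage.input ?? 0) + (usage.cacheRead ?? 0) + (usage.cacheWrite ?? 0);
}

interface ExportResult {
  code: number; // 0 = success, 1 = failed
  error?: Error;
}
type ResultCallback = (result: ExportResult) => void;

interface ExporterLike<T> {
  export(items: T[], resultCallback: ResultCallback): void;
}

// Wraps an exporter so the callback fires exactly once, even if the inner
// exporter throws synchronously or reports more than once.
function guardExporter<T>(inner: ExporterLike<T>): ExporterLike<T> {
  return {
    export(items: T[], resultCallback: ResultCallback): void {
      let settled = false;
      const settle: ResultCallback = (result) => {
        if (settled) return;
        settled = true;
        resultCallback(result);
      };
      try {
        inner.export(items, settle);
      } catch (err) {
        settle({ code: 1, error: err instanceof Error ? err : new Error(String(err)) });
      }
    },
  };
}
```

This wrapper cannot help when the inner exporter neither throws nor ever calls back; covering that case would need a timeout, which is left out of the sketch.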

- Add missing trace-context-propagator.ts source module (PR review comment)
- Add DiagnosticModelUsageEvent type to diagnostic-events.ts union
- Add stateDir to OTel test createTestCtx() helpers (new required field)
- Fix stop() signature to accept ctx parameter in OTel service
- Add sessionKey/sessionId/channel to ToolHandlerParams Pick type
- Remove duplicate sessionKey from SubscribeEmbeddedPiSessionParams
- Fix type casts for mock .attributes/.kind access in test files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

steipete and others added 30 commits February 19, 2026 07:45

test(image-tool): dedupe repeated image tool fixture assertions

3cb0c96

test(agents): dedupe ping-pong loop test scaffolding

d7b2efc

test(subagents): dedupe sessions_spawn model expectation paths

ccd68d8

test(gateway): dedupe startup auth override token checks

57ea6fe

test: remove duplicate target-resolution cases from outbound suite

8d7df30

fix(ios): auto-generate local signing overrides (openclaw#20716)

9bd2261

test(shell-env): dedupe repeated login-shell path lookups

ca71b5c

test(media): dedupe active-model fallback resolver setup

bbb07bd

test: trim duplicate cross-context policy cases

18d4ad6

test(web): dedupe creds-update trigger helper in session tests

a82a412

test: drop duplicate followup compaction token assertion

9a490fb

ci: skip bun bootstrap in check and docs-check jobs

5f2bcfc

test: remove duplicate telegram .co link formatting case

b97b890

ci: pin bun setup version to avoid API rate-limit flakes

2cbf15e

fix: preserve assistant partial stream during reasoning

221d50b

fix: clear matched tool errors and dedupe reasoning end

0ff5061

fix: preserve reasoning stream partial contract (openclaw#20635) (thanks @obviyus)

d3dab08

test: remove duplicate telegram de-linkify case

b78fa57

test: collapse duplicate gateway token-generation cases

ad4c784

docs: clarify WhatsApp group allowlist and reply mention behavior

9c2640a

test: fix flaky run-node spawn side-effects

4e5cffe

test(infra): dedupe outbound recovery test scaffolding

ab924eb

test(config): dedupe OPENCLAW_HOME path assertions

644d037

test(gateway): dedupe assistant chat event assertions

8bb1747

test(config): dedupe model provider fixture setup

d8b720c

test(hooks): dedupe gmail runtime path assertions

733e385

test(cron): dedupe applyJobPatch fixture setup

edce5a5

test(browser): dedupe auth mode no-token assertions

e0c3cc4

test(gateway): dedupe config.apply request scaffolding

3c7c45e

test(auto-reply): dedupe heartbeat typing flow setup

69e6da0

steipete and others added 29 commits February 19, 2026 15:29

fix(ci): tighten test typing for browser and cron cli

30e36c3

fix(ui): unblock docker onboarding build

3077c35

fix(ci): verify actionlint release checksum before install

869ebbc

docs: trim refactor-only and duplicate changelog entries

9f5429e

ci: move blacksmith runners to 8 vcpu

2435499

fix(ci): use versioned actionlint checksum asset

2c05cbb

ci: move workflows to blacksmith 16vcpu runners

ce1f0c0

fix(ci): allow blacksmith 16vcpu labels in actionlint

e500110

fix(ci): restore actionlint rules and add blacksmith 16 ignore

7880947

fix(cli): refresh gateway service env during update (openclaw#21071)

45d9b20

* changelog: add security deepMerge prototype-pollution fix entry
* update: refresh gateway service env during update restart
* test(cli): fix daemon install mock assertion
* test(cli): guard update restart false path

fix(test): mock runDaemonInstall with vi.mocked

03d7aad

chore(ci): trigger push workflows after main CI fix

e741a53

fix(update): silence npm deprecation/funding noise

bf8117a

chore: bump release metadata to 2026.2.20

ff3a7e5

Auto-reply: delay onAgentRunStart until real activity

7579e95

Changelog: add auto-reply run-start fix (openclaw#21165) (thanks @shakkernerd)

45b54d9

Changelog: move prompt caching fix to unreleased

eec5a6d

Net: strip sensitive headers on cross-origin redirects

c0cd5a7

Net: expand cross-origin sensitive header regression test

802f043

fix: changelog for cross-origin redirect header stripping (openclaw#20313) (thanks @afurm)

85fee30

Discord: handle gateway 4014 close

f7a8c2d

Baukebrenninkmeijer closed this Feb 19, 2026

Baukebrenninkmeijer wants to merge 3711 commits into main from otel-diagnostics-fixes

20 participants
