feat: emit cache tokens as gen_ai.usage span attributes #2

Merged

Baukebrenninkmeijer merged 2 commits into orq-ai:otel-diagnostics-fixes from langwatch:otel-diagnostics-fixes-cache-tokens
Feb 9, 2026

Conversation


@rogeriochaves rogeriochaves commented Feb 8, 2026

Summary

  • Adds gen_ai.usage.cache_read.input_tokens and gen_ai.usage.cache_creation.input_tokens as span attributes alongside the existing openclaw.tokens.cache_* attributes
  • Both recordModelInference and recordRunCompleted now emit the new attributes (see the sketch after this list)
  • Fixes gen_ai.usage.input_tokens to include all input tokens (including cached) per the OTEL semconv, using promptTokens (= input + cacheRead + cacheWrite) instead of the raw input value, which excludes cached tokens
  • Adds gen_ai.usage.output_tokens to the recordRunCompleted span, which was previously missing it
  • Adds test assertions for all new attributes
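
A minimal sketch of the emission, not the actual OpenClaw code: it assumes a normalized usage object (the field names input/cacheRead/cacheWrite/output are illustrative) and uses the standard @opentelemetry/api span interface.

```typescript
import { trace } from "@opentelemetry/api";

// Illustrative shape; OpenClaw's internal usage type may differ.
interface NormalizedUsage {
  input: number;      // uncached input tokens (the raw provider value for Anthropic)
  cacheRead: number;  // tokens served from a provider-managed cache
  cacheWrite: number; // tokens written to a provider-managed cache
  output: number;     // completion tokens
}

function emitUsageAttributes(usage: NormalizedUsage): void {
  const span = trace.getActiveSpan();
  if (!span) return;

  // Per the OTEL GenAI semconv, input_tokens is the total, cached tokens included.
  span.setAttribute(
    "gen_ai.usage.input_tokens",
    usage.input + usage.cacheRead + usage.cacheWrite,
  );
  span.setAttribute("gen_ai.usage.output_tokens", usage.output);
  span.setAttribute("gen_ai.usage.cache_read.input_tokens", usage.cacheRead);
  span.setAttribute("gen_ai.usage.cache_creation.input_tokens", usage.cacheWrite);
}
```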

Context

While integrating OpenClaw's OTEL diagnostics with LangWatch, we found that cache token counts (cache_read, cache_write) were only emitted under the openclaw.tokens.* namespace. Since OTEL-compatible observability platforms can't pick those up automatically, this PR adds the tokens under the official OpenTelemetry GenAI semantic convention names.

Additionally, gen_ai.usage.input_tokens was being set to the raw provider input value, which for providers like Anthropic only represents uncached tokens. The OTEL GenAI spec explicitly states:

"This value SHOULD include all types of input tokens, including cached tokens. Instrumentations SHOULD make a best effort to populate this value, using a total provided by the provider when available or, depending on the provider API, by summing different token types parsed from the provider output."

And for both cache_read.input_tokens and cache_creation.input_tokens:

"The value SHOULD be included in gen_ai.usage.input_tokens."

This means gen_ai.usage.input_tokens must be the total of all input tokens, with cache tokens as subsets — not the raw Anthropic value where they are additive/disjoint. OpenClaw already computes this correctly as promptTokens (via derivePromptTokens() which sums input + cacheRead + cacheWrite), so the fix is simply using that value for the OTEL attribute.
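
A sketch of that normalization, with assumed field names; OpenClaw's actual derivePromptTokens() may be shaped differently:

```typescript
// Assumed raw-usage shape; for Anthropic, `input` excludes cached tokens.
interface RawUsage {
  input: number;
  cacheRead?: number;  // Anthropic: cache_read_input_tokens
  cacheWrite?: number; // Anthropic: cache_creation_input_tokens
}

// Total input tokens per the OTEL semconv: cached tokens included.
// e.g. input=100 with 80 cached tokens (read + write) yields 180, the
// value gen_ai.usage.input_tokens should carry.
function derivePromptTokens(usage: RawUsage): number {
  return usage.input + (usage.cacheRead ?? 0) + (usage.cacheWrite ?? 0);
}
```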

Provider semantics comparison

| Provider | input_tokens meaning | Cache relationship |
| --- | --- | --- |
| Anthropic | Only uncached tokens (after the last cache breakpoint) | total = input + cache_read + cache_creation (additive, disjoint) |
| OpenAI | All prompt tokens, including cached | cached_tokens is a subset of prompt_tokens |
| OTEL spec | All input tokens, including cached | cache_read and cache_creation are subsets of input_tokens |

Since pi-ai exposes raw provider values and derivePromptTokens() already normalizes them, the OTEL plugin should use the normalized value — which this PR now does.

OpenTelemetry GenAI Semantic Conventions: Cache Token Naming

The Final Standard (Merged January 27, 2026)

PR open-telemetry/semantic-conventions#3163 was merged into the main branch of the open-telemetry/semantic-conventions repo on January 27, 2026, closing Issue open-telemetry/semantic-conventions-genai#23. The two canonical attribute names are:

| Attribute | Type | Stability | Description |
| --- | --- | --- | --- |
| gen_ai.usage.cache_read.input_tokens | int | Development | The number of input tokens served from a provider-managed cache |
| gen_ai.usage.cache_creation.input_tokens | int | Development | The number of input tokens written to a provider-managed cache |

Both values SHOULD be included in gen_ai.usage.input_tokens. The gen_ai.usage.input_tokens description was updated to say: "This value SHOULD include all types of input tokens, including cached tokens."

These attributes are defined in the registry YAML and documented in gen-ai-spans.md.

Note on Naming: Dots vs. Underscores

The OTEL convention uses dots as namespace separators:

  • gen_ai.usage.cache_read.input_tokens (note: cache_read is one segment, then .input_tokens)
  • gen_ai.usage.cache_creation.input_tokens (note: cache_creation is one segment, then .input_tokens)
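
For instance, as plain string constants (hypothetical constant names, for illustration only):

```typescript
// Canonical names: dots separate namespaces, while cache_read and
// cache_creation are each a single underscore-joined segment.
export const ATTR_CACHE_READ_INPUT_TOKENS =
  "gen_ai.usage.cache_read.input_tokens";
export const ATTR_CACHE_CREATION_INPUT_TOKENS =
  "gen_ai.usage.cache_creation.input_tokens";
```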

What Each Provider Uses Natively

Anthropic (usage object in API response):

  • cache_read_input_tokens — number of tokens read from cache
  • cache_creation_input_tokens — number of tokens written to cache
  • Notably, Anthropic's input_tokens field only counts tokens that are neither read from nor written to cache (i.e., the three fields are disjoint, not overlapping)
  • Source: Anthropic Prompt Caching docs

OpenAI (usage object in API response):

  • usage.prompt_tokens_details.cached_tokens — number of prompt tokens served from cache
  • OpenAI has no "cache write" concept exposed to users (caching is automatic and implicit)
  • usage.prompt_tokens includes cached tokens (they overlap)
  • Source: OpenAI Prompt Caching docs

Google / Vertex AI (Gemini) (usage_metadata in API response):

  • cached_content_token_count — number of cached tokens in the prompt
  • Google also has no explicit "cache creation" metric in the response; caching is managed via a separate Caching API
  • Source: Vertex AI Context Caching overview
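
To make the differences concrete, here is a hypothetical mapping helper folding each provider's native fields into the OTEL attributes. The field names are simplified from the provider docs cited above; the union type, the Google prompt_token_count usage, and the helper itself are illustrative assumptions, not the plugin's actual code.

```typescript
type ProviderUsage =
  | {
      provider: "anthropic";
      input_tokens: number;                 // uncached only
      cache_read_input_tokens?: number;
      cache_creation_input_tokens?: number;
    }
  | {
      provider: "openai";
      prompt_tokens: number;                // includes cached tokens
      prompt_tokens_details?: { cached_tokens?: number };
    }
  | {
      provider: "google";
      prompt_token_count: number;           // assumed to include cached tokens
      cached_content_token_count?: number;
    };

function toOtelInputUsage(u: ProviderUsage): Record<string, number> {
  switch (u.provider) {
    case "anthropic": {
      // Disjoint fields: the OTEL total is the sum of all three.
      const read = u.cache_read_input_tokens ?? 0;
      const write = u.cache_creation_input_tokens ?? 0;
      return {
        "gen_ai.usage.input_tokens": u.input_tokens + read + write,
        "gen_ai.usage.cache_read.input_tokens": read,
        "gen_ai.usage.cache_creation.input_tokens": write,
      };
    }
    case "openai":
      // cached_tokens is already a subset of prompt_tokens: no summing.
      return {
        "gen_ai.usage.input_tokens": u.prompt_tokens,
        "gen_ai.usage.cache_read.input_tokens":
          u.prompt_tokens_details?.cached_tokens ?? 0,
      };
    case "google":
      // cached_content_token_count is a subset of the prompt count.
      return {
        "gen_ai.usage.input_tokens": u.prompt_token_count,
        "gen_ai.usage.cache_read.input_tokens":
          u.cached_content_token_count ?? 0,
      };
  }
}
```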

Discussion History and Rejected Alternatives

Issue open-telemetry/semantic-conventions#2094 (closed in favor of open-telemetry/semantic-conventions-genai#23) initially proposed:

  • gen_ai.usage.input_cache_read_tokens
  • gen_ai.usage.input_cache_write_tokens

Issue open-telemetry/semantic-conventions-genai#23 explored several naming patterns before converging:

  • gen_ai.usage.cache_creation_input_tokens (flat, underscore-only)
  • gen_ai.usage.detailed.cache_creation_input_tokens (with detailed namespace)
  • gen_ai.usage.anthropic.cache_creation_input_tokens (provider-namespaced, rejected as too specific)

The Pydantic AI / Logfire team (Samuel Colvin) adopted a simpler schema in their codebase (cache_write_tokens, cache_read_tokens, input_audio_tokens, etc.), but they are tracking migration to the official OTEL conventions (logfire issue open-telemetry/semantic-conventions#1586).

Key Design Decisions

  1. cache_read not cached: The naming distinguishes between "reading from cache" and "creating/writing to cache" — two distinct operations with different cost implications.
  2. "creation" not "write": The term cache_creation was preferred over cache_write to align with Anthropic's terminology (cache_creation_input_tokens).
  3. Included in input_tokens: The spec mandates that gen_ai.usage.input_tokens SHOULD be the aggregate total. Instrumentations must normalize providers that report disjoint counts (like Anthropic) by summing them.
  4. Stability: Development: These attributes are not yet stable — they are in "Development" stability, meaning they could still change before stabilization.

Test plan

  • All 23 existing diagnostics-otel tests pass
  • New assertions verify gen_ai.usage.cache_read.input_tokens and gen_ai.usage.cache_creation.input_tokens are set correctly
  • New assertion verifies gen_ai.usage.input_tokens equals promptTokens (180) not raw input (100)
  • New assertion verifies gen_ai.usage.output_tokens is emitted on run.completed spans
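
As an illustration only (the real assertions live in the diagnostics-otel test suite), the normalization check could look like this vitest-style test, reusing the hypothetical toOtelInputUsage helper sketched earlier. The 50/30 split of the cached 80 tokens is assumed; the test plan only fixes 180 vs 100.

```typescript
import { expect, test } from "vitest";

// Mirrors the test plan's 180-vs-100 assertion: raw input 100 plus an
// assumed 50 cache-read / 30 cache-creation split of the cached 80 tokens.
test("gen_ai.usage.input_tokens includes cached tokens", () => {
  const attrs = toOtelInputUsage({
    provider: "anthropic",
    input_tokens: 100,
    cache_read_input_tokens: 50,
    cache_creation_input_tokens: 30,
  });
  expect(attrs["gen_ai.usage.input_tokens"]).toBe(180); // promptTokens, not the raw 100
  expect(attrs["gen_ai.usage.cache_read.input_tokens"]).toBe(50);
  expect(attrs["gen_ai.usage.cache_creation.input_tokens"]).toBe(30);
});
```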

@rogeriochaves rogeriochaves force-pushed the otel-diagnostics-fixes-cache-tokens branch from 26cb19b to 3e3d7b7 on February 8, 2026 at 22:44
feat: emit cache tokens as gen_ai.usage span attributes

Add gen_ai.usage.cache_read.input_tokens and gen_ai.usage.cache_creation.input_tokens
span attributes alongside existing openclaw.tokens.cache_* attributes, following the
OpenTelemetry GenAI semantic conventions merged in semconv PR open-telemetry/semantic-conventions#3163 (Jan 27, 2026).

This allows OTEL-compatible observability platforms to pick up cache token counts using
the official gen_ai.usage.* convention.

Both recordModelInference and recordRunCompleted now emit the new attributes.
@rogeriochaves rogeriochaves force-pushed the otel-diagnostics-fixes-cache-tokens branch from 3e3d7b7 to a3073ba on February 8, 2026 at 22:59
The OTEL GenAI semantic conventions specify that gen_ai.usage.input_tokens
SHOULD include all input tokens including cached tokens. Previously, the
raw provider input value (which for Anthropic excludes cached tokens) was
mapped directly. Now use promptTokens (input + cacheRead + cacheWrite)
which correctly represents total input tokens per the spec.

Also adds gen_ai.usage.input_tokens and gen_ai.usage.output_tokens to
the run.completed span, which was previously missing them.
Comment thread on extensions/diagnostics-otel/src/service.ts
@Baukebrenninkmeijer Baukebrenninkmeijer merged commit 895cfcf into orq-ai:otel-diagnostics-fixes Feb 9, 2026
0 of 2 checks passed