feat: emit cache tokens as gen_ai.usage span attributes #2

Merged

Baukebrenninkmeijer merged 2 commits into orq-ai:otel-diagnostics-fixes from langwatch:otel-diagnostics-fixes-cache-tokens
Feb 9, 2026

Conversation


@rogeriochaves rogeriochaves commented Feb 8, 2026

Summary

  • Adds gen_ai.usage.cache_read.input_tokens and gen_ai.usage.cache_creation.input_tokens as span attributes alongside the existing openclaw.tokens.cache_* attributes
  • Both recordModelInference and recordRunCompleted now emit the new attributes (see the sketch after this list)
  • Fixes gen_ai.usage.input_tokens to include all input tokens (including cached) per the OTEL semconv, using promptTokens (= input + cacheRead + cacheWrite) instead of the raw input value, which excludes cached tokens
  • Adds gen_ai.usage.output_tokens to the recordRunCompleted span, which was previously missing it
  • Adds test assertions for all new attributes
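
A minimal sketch of the emission, not the actual OpenClaw code: it assumes a normalized usage object (the field names input/cacheRead/cacheWrite/output are illustrative) and uses the standard @opentelemetry/api span interface.

```typescript
import { trace } from "@opentelemetry/api";

// Illustrative shape; OpenClaw's internal usage type may differ.
interface NormalizedUsage {
  input: number;      // uncached input tokens (the raw provider value for Anthropic)
  cacheRead: number;  // tokens served from a provider-managed cache
  cacheWrite: number; // tokens written to a provider-managed cache
  output: number;     // completion tokens
}

function emitUsageAttributes(usage: NormalizedUsage): void {
  const span = trace.getActiveSpan();
  if (!span) return;

  // Per the OTEL GenAI semconv, input_tokens is the total, cached tokens included.
  span.setAttribute(
    "gen_ai.usage.input_tokens",
    usage.input + usage.cacheRead + usage.cacheWrite,
  );
  span.setAttribute("gen_ai.usage.output_tokens", usage.output);
  span.setAttribute("gen_ai.usage.cache_read.input_tokens", usage.cacheRead);
  span.setAttribute("gen_ai.usage.cache_creation.input_tokens", usage.cacheWrite);
}
```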

Context

While integrating OpenClaw's OTEL diagnostics with LangWatch, we found that cache token counts (cache_read, cache_write) were only emitted under the openclaw.tokens.* namespace. Since OTEL-compatible observability platforms can't pick those up automatically, this PR adds the tokens under the official OpenTelemetry GenAI semantic convention names.

Additionally, gen_ai.usage.input_tokens was being set to the raw provider input value, which for providers like Anthropic only represents uncached tokens. The OTEL GenAI spec explicitly states:

"This value SHOULD include all types of input tokens, including cached tokens. Instrumentations SHOULD make a best effort to populate this value, using a total provided by the provider when available or, depending on the provider API, by summing different token types parsed from the provider output."

And for both cache_read.input_tokens and cache_creation.input_tokens:

"The value SHOULD be included in gen_ai.usage.input_tokens."

This means gen_ai.usage.input_tokens must be the total of all input tokens, with cache tokens as subsets — not the raw Anthropic value where they are additive/disjoint. OpenClaw already computes this correctly as promptTokens (via derivePromptTokens() which sums input + cacheRead + cacheWrite), so the fix is simply using that value for the OTEL attribute.
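
A sketch of that normalization, with assumed field names; OpenClaw's actual derivePromptTokens() may be shaped differently:

```typescript
// Assumed raw-usage shape; for Anthropic, `input` excludes cached tokens.
interface RawUsage {
  input: number;
  cacheRead?: number;  // Anthropic: cache_read_input_tokens
  cacheWrite?: number; // Anthropic: cache_creation_input_tokens
}

// Total input tokens per the OTEL semconv: cached tokens included.
// e.g. input=100 with 80 cached tokens (read + write) yields 180, the
// value gen_ai.usage.input_tokens should carry.
function derivePromptTokens(usage: RawUsage): number {
  return usage.input + (usage.cacheRead ?? 0) + (usage.cacheWrite ?? 0);
}
```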

Provider semantics comparison

| Provider | input_tokens meaning | Cache relationship |
| --- | --- | --- |
| Anthropic | Only uncached tokens (after the last cache breakpoint) | total = input + cache_read + cache_creation (additive, disjoint) |
| OpenAI | All prompt tokens, including cached | cached_tokens is a subset of prompt_tokens |
| OTEL spec | All input tokens, including cached | cache_read and cache_creation are subsets of input_tokens |

Since pi-ai exposes raw provider values and derivePromptTokens() already normalizes them, the OTEL plugin should use the normalized value — which this PR now does.

OpenTelemetry GenAI Semantic Conventions: Cache Token Naming

The Final Standard (Merged January 27, 2026)

PR open-telemetry/semantic-conventions#3163 was merged into the main branch of the open-telemetry/semantic-conventions repo on January 27, 2026, closing Issue open-telemetry/semantic-conventions-genai#23. The two canonical attribute names are:

| Attribute | Type | Stability | Description |
| --- | --- | --- | --- |
| gen_ai.usage.cache_read.input_tokens | int | Development | The number of input tokens served from a provider-managed cache |
| gen_ai.usage.cache_creation.input_tokens | int | Development | The number of input tokens written to a provider-managed cache |

Both values SHOULD be included in gen_ai.usage.input_tokens. The gen_ai.usage.input_tokens description was updated to say: "This value SHOULD include all types of input tokens, including cached tokens."

These attributes are defined in the registry YAML and documented in gen-ai-spans.md.

Note on Naming: Dots vs. Underscores

The OTEL convention uses dots as namespace separators:

  • gen_ai.usage.cache_read.input_tokens (note: cache_read is one segment, then .input_tokens)
  • gen_ai.usage.cache_creation.input_tokens (note: cache_creation is one segment, then .input_tokens)
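
For instance, as plain string constants (hypothetical constant names, for illustration only):

```typescript
// Canonical names: dots separate namespaces, while cache_read and
// cache_creation are each a single underscore-joined segment.
export const ATTR_CACHE_READ_INPUT_TOKENS =
  "gen_ai.usage.cache_read.input_tokens";
export const ATTR_CACHE_CREATION_INPUT_TOKENS =
  "gen_ai.usage.cache_creation.input_tokens";
```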

What Each Provider Uses Natively

Anthropic (usage object in API response):

  • cache_read_input_tokens — number of tokens read from cache
  • cache_creation_input_tokens — number of tokens written to cache
  • Notably, Anthropic's input_tokens field only counts tokens that are neither read from nor written to cache (i.e., the three fields are disjoint, not overlapping)
  • Source: Anthropic Prompt Caching docs

OpenAI (usage object in API response):

  • usage.prompt_tokens_details.cached_tokens — number of prompt tokens served from cache
  • OpenAI has no "cache write" concept exposed to users (caching is automatic and implicit)
  • usage.prompt_tokens includes cached tokens (they overlap)
  • Source: OpenAI Prompt Caching docs

Google / Vertex AI (Gemini) (usage_metadata in API response):

  • cached_content_token_count — number of cached tokens in the prompt
  • Google also has no explicit "cache creation" metric in the response; caching is managed via a separate Caching API
  • Source: Vertex AI Context Caching overview
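
To make the differences concrete, here is a hypothetical mapping helper folding each provider's native fields into the OTEL attributes. The field names are simplified from the provider docs cited above; the union type, the Google prompt_token_count usage, and the helper itself are illustrative assumptions, not the plugin's actual code.

```typescript
type ProviderUsage =
  | {
      provider: "anthropic";
      input_tokens: number;                 // uncached only
      cache_read_input_tokens?: number;
      cache_creation_input_tokens?: number;
    }
  | {
      provider: "openai";
      prompt_tokens: number;                // includes cached tokens
      prompt_tokens_details?: { cached_tokens?: number };
    }
  | {
      provider: "google";
      prompt_token_count: number;           // assumed to include cached tokens
      cached_content_token_count?: number;
    };

function toOtelInputUsage(u: ProviderUsage): Record<string, number> {
  switch (u.provider) {
    case "anthropic": {
      // Disjoint fields: the OTEL total is the sum of all three.
      const read = u.cache_read_input_tokens ?? 0;
      const write = u.cache_creation_input_tokens ?? 0;
      return {
        "gen_ai.usage.input_tokens": u.input_tokens + read + write,
        "gen_ai.usage.cache_read.input_tokens": read,
        "gen_ai.usage.cache_creation.input_tokens": write,
      };
    }
    case "openai":
      // cached_tokens is already a subset of prompt_tokens: no summing.
      return {
        "gen_ai.usage.input_tokens": u.prompt_tokens,
        "gen_ai.usage.cache_read.input_tokens":
          u.prompt_tokens_details?.cached_tokens ?? 0,
      };
    case "google":
      // cached_content_token_count is a subset of the prompt count.
      return {
        "gen_ai.usage.input_tokens": u.prompt_token_count,
        "gen_ai.usage.cache_read.input_tokens":
          u.cached_content_token_count ?? 0,
      };
  }
}
```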

Discussion History and Rejected Alternatives

Issue open-telemetry/semantic-conventions#2094 (closed in favor of open-telemetry/semantic-conventions-genai#23) initially proposed:

  • gen_ai.usage.input_cache_read_tokens
  • gen_ai.usage.input_cache_write_tokens

Issue open-telemetry/semantic-conventions-genai#23 explored several naming patterns before converging:

  • gen_ai.usage.cache_creation_input_tokens (flat, underscore-only)
  • gen_ai.usage.detailed.cache_creation_input_tokens (with detailed namespace)
  • gen_ai.usage.anthropic.cache_creation_input_tokens (provider-namespaced, rejected as too specific)

The Pydantic AI / Logfire team (Samuel Colvin) adopted a simpler schema in their codebase (cache_write_tokens, cache_read_tokens, input_audio_tokens, etc.), but they are tracking migration to the official OTEL conventions (logfire issue open-telemetry/semantic-conventions#1586).

Key Design Decisions

  1. cache_read not cached: The naming distinguishes between "reading from cache" and "creating/writing to cache" — two distinct operations with different cost implications.
  2. "creation" not "write": The term cache_creation was preferred over cache_write to align with Anthropic's terminology (cache_creation_input_tokens).
  3. Included in input_tokens: The spec mandates that gen_ai.usage.input_tokens SHOULD be the aggregate total. Instrumentations must normalize providers that report disjoint counts (like Anthropic) by summing them.
  4. Stability: Development: These attributes are not yet stable — they are in "Development" stability, meaning they could still change before stabilization.

Test plan

  • All 23 existing diagnostics-otel tests pass
  • New assertions verify gen_ai.usage.cache_read.input_tokens and gen_ai.usage.cache_creation.input_tokens are set correctly
  • New assertion verifies gen_ai.usage.input_tokens equals promptTokens (180) not raw input (100)
  • New assertion verifies gen_ai.usage.output_tokens is emitted on run.completed spans
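
As an illustration only (the real assertions live in the diagnostics-otel test suite), the normalization check could look like this vitest-style test, reusing the hypothetical toOtelInputUsage helper sketched earlier. The 50/30 split of the cached 80 tokens is assumed; the test plan only fixes 180 vs 100.

```typescript
import { expect, test } from "vitest";

// Mirrors the test plan's 180-vs-100 assertion: raw input 100 plus an
// assumed 50 cache-read / 30 cache-creation split of the cached 80 tokens.
test("gen_ai.usage.input_tokens includes cached tokens", () => {
  const attrs = toOtelInputUsage({
    provider: "anthropic",
    input_tokens: 100,
    cache_read_input_tokens: 50,
    cache_creation_input_tokens: 30,
  });
  expect(attrs["gen_ai.usage.input_tokens"]).toBe(180); // promptTokens, not the raw 100
  expect(attrs["gen_ai.usage.cache_read.input_tokens"]).toBe(50);
  expect(attrs["gen_ai.usage.cache_creation.input_tokens"]).toBe(30);
});
```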

@rogeriochaves rogeriochaves force-pushed the otel-diagnostics-fixes-cache-tokens branch from 26cb19b to 3e3d7b7 on February 8, 2026 at 22:44
feat: emit cache tokens as gen_ai.usage span attributes

Add gen_ai.usage.cache_read.input_tokens and gen_ai.usage.cache_creation.input_tokens
span attributes alongside existing openclaw.tokens.cache_* attributes, following the
OpenTelemetry GenAI semantic conventions merged in semconv PR open-telemetry/semantic-conventions#3163 (Jan 27, 2026).

This allows OTEL-compatible observability platforms to pick up cache token counts using
the official gen_ai.usage.* convention.

Both recordModelInference and recordRunCompleted now emit the new attributes.
@rogeriochaves rogeriochaves force-pushed the otel-diagnostics-fixes-cache-tokens branch from 3e3d7b7 to a3073ba on February 8, 2026 at 22:59
The OTEL GenAI semantic conventions specify that gen_ai.usage.input_tokens
SHOULD include all input tokens including cached tokens. Previously, the
raw provider input value (which for Anthropic excludes cached tokens) was
mapped directly. Now use promptTokens (input + cacheRead + cacheWrite)
which correctly represents total input tokens per the spec.

Also adds gen_ai.usage.input_tokens and gen_ai.usage.output_tokens to
the run.completed span, which was previously missing them.
Comment thread on extensions/diagnostics-otel/src/service.ts
@Baukebrenninkmeijer Baukebrenninkmeijer merged commit 895cfcf into orq-ai:otel-diagnostics-fixes Feb 9, 2026
0 of 2 checks passed