feat: emit cache tokens as gen_ai.usage span attributes #2
Merged
Baukebrenninkmeijer merged 2 commits into orq-ai:otel-diagnostics-fixes on Feb 9, 2026
Conversation
Force-pushed 26cb19b to 3e3d7b7
feat: emit cache tokens as gen_ai.usage span attributes

Add gen_ai.usage.cache_read.input_tokens and gen_ai.usage.cache_creation.input_tokens span attributes alongside the existing openclaw.tokens.cache_* attributes, following the OpenTelemetry GenAI semantic conventions merged in semconv PR open-telemetry/semantic-conventions#3163 (Jan 27, 2026). This allows OTEL-compatible observability platforms to pick up cache token counts using the official gen_ai.usage.* convention. Both recordModelInference and recordRunCompleted now emit the new attributes.
Force-pushed 3e3d7b7 to a3073ba
The OTEL GenAI semantic conventions specify that gen_ai.usage.input_tokens SHOULD include all input tokens, including cached tokens. Previously, the raw provider input value (which for Anthropic excludes cached tokens) was mapped directly. Now use promptTokens (input + cacheRead + cacheWrite), which correctly represents total input tokens per the spec. Also adds gen_ai.usage.input_tokens and gen_ai.usage.output_tokens to the run.completed span, which was previously missing them.
Merged commit 895cfcf into orq-ai:otel-diagnostics-fixes
Summary
- Add `gen_ai.usage.cache_read.input_tokens` and `gen_ai.usage.cache_creation.input_tokens` as span attributes alongside the existing `openclaw.tokens.cache_*` attributes
- `recordModelInference` and `recordRunCompleted` now emit the new attributes
- Fix `gen_ai.usage.input_tokens` to include all input tokens (including cached) per OTEL semconv, using `promptTokens` (= input + cacheRead + cacheWrite) instead of raw `input`, which excludes cached tokens
- Add `gen_ai.usage.output_tokens` to the `recordRunCompleted` span, which was previously missing it

Context
While integrating OpenClaw's OTEL diagnostics with LangWatch, we found that cache token counts (`cache_read`, `cache_write`) were only emitted under the `openclaw.tokens.*` namespace. Since OTEL-compatible observability platforms can't pick those up automatically, this PR adds the tokens under the official OpenTelemetry GenAI semantic convention names.
Additionally, `gen_ai.usage.input_tokens` was being set to the raw provider `input` value, which for providers like Anthropic only represents uncached tokens. The OTEL GenAI spec explicitly states that this value SHOULD include all types of input tokens, including cached tokens, and for both `cache_read.input_tokens` and `cache_creation.input_tokens` it notes that the values SHOULD be included in `gen_ai.usage.input_tokens`.

This means `gen_ai.usage.input_tokens` must be the total of all input tokens, with cache tokens as subsets, not the raw Anthropic value where they are additive/disjoint. OpenClaw already computes this correctly as `promptTokens` (via `derivePromptTokens()`, which sums input + cacheRead + cacheWrite), so the fix is simply to use that value for the OTEL attribute.
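As a minimal TypeScript sketch of that normalization (the raw-usage field names and the standalone function are illustrative assumptions, not OpenClaw's actual implementation):

```ts
// Raw token counts as exposed by pi-ai. For Anthropic, `input` excludes
// cached tokens, so the three fields are disjoint. Field names are
// assumptions for illustration.
interface RawTokenUsage {
  input: number;
  cacheRead: number;
  cacheWrite: number;
}

// gen_ai.usage.input_tokens SHOULD include all input tokens per the OTEL
// GenAI spec, so the disjoint counts are summed into one total.
function derivePromptTokens(u: RawTokenUsage): number {
  return u.input + u.cacheRead + u.cacheWrite;
}

derivePromptTokens({ input: 100, cacheRead: 60, cacheWrite: 20 }); // => 180
```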
Provider semantics comparison

| Source | `input_tokens` meaning |
| --- | --- |
| Anthropic | `total = input + cache_read + cache_creation` (additive, disjoint) |
| OpenAI | `cached_tokens` is a subset of `prompt_tokens` |
| OTEL semconv | `cache_read` and `cache_creation` are subsets of `input_tokens` |

Since pi-ai exposes raw provider values and `derivePromptTokens()` already normalizes them, the OTEL plugin should use the normalized value, which this PR now does.

OpenTelemetry GenAI Semantic Conventions: Cache Token Naming
The Final Standard (Merged January 27, 2026)
PR open-telemetry/semantic-conventions#3163 was merged into the `main` branch of the `open-telemetry/semantic-conventions` repo on January 27, 2026, closing issue open-telemetry/semantic-conventions-genai#23. The two canonical attribute names are:

- `gen_ai.usage.cache_read.input_tokens` (int)
- `gen_ai.usage.cache_creation.input_tokens` (int)

Both values SHOULD be included in `gen_ai.usage.input_tokens`. The `gen_ai.usage.input_tokens` description was updated to say: "This value SHOULD include all types of input tokens, including cached tokens." These attributes are defined in the registry YAML and documented in gen-ai-spans.md.
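For illustration, a sketch of emitting these attributes through the standard `@opentelemetry/api` surface (tracer name, span name, and token values are assumptions, not OpenClaw's actual `recordModelInference`):

```ts
import { trace } from "@opentelemetry/api";

// Canonical names from semconv PR #3163; cache_read / cache_creation are
// single namespace segments followed by .input_tokens.
const ATTR_CACHE_READ = "gen_ai.usage.cache_read.input_tokens";
const ATTR_CACHE_CREATION = "gen_ai.usage.cache_creation.input_tokens";

const span = trace.getTracer("otel-plugin-sketch").startSpan("model.inference");
// input_tokens is the aggregate total; the two cache counts are subsets of it.
span.setAttribute("gen_ai.usage.input_tokens", 180); // 100 + 60 + 20
span.setAttribute(ATTR_CACHE_READ, 60);
span.setAttribute(ATTR_CACHE_CREATION, 20);
span.setAttribute("gen_ai.usage.output_tokens", 42);
span.end();
```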
Note on Naming: Dots vs. Underscores
The OTEL convention uses dots as namespace separators:

- `gen_ai.usage.cache_read.input_tokens` (note: `cache_read` is one segment, then `.input_tokens`)
- `gen_ai.usage.cache_creation.input_tokens` (note: `cache_creation` is one segment, then `.input_tokens`)
What Each Provider Uses Natively

Anthropic (`usage` object in the API response):

- `cache_read_input_tokens`: number of tokens read from cache
- `cache_creation_input_tokens`: number of tokens written to cache
- `input_tokens` only counts tokens that are neither read from nor written to cache (i.e., the three fields are disjoint, not overlapping)

OpenAI (`usage` object in the API response):

- `usage.prompt_tokens_details.cached_tokens`: number of prompt tokens served from cache
- `usage.prompt_tokens` includes cached tokens (they overlap)

Google / Vertex AI (Gemini) (`usage_metadata` in the API response):

- `cached_content_token_count`: number of cached tokens in the prompt
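For concreteness, the provider shapes above could be typed as follows (a sketch: the field names come from the provider responses described above, while the normalization helper is illustrative):

```ts
// Anthropic: the three fields are disjoint.
interface AnthropicUsage {
  input_tokens: number;                // neither read from nor written to cache
  cache_read_input_tokens: number;     // read from cache
  cache_creation_input_tokens: number; // written to cache
  output_tokens: number;
}

// OpenAI: cached_tokens is a subset of prompt_tokens.
interface OpenAIUsage {
  prompt_tokens: number; // includes cached tokens
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens: number };
}

// Gemini: cached_content_token_count counts cached prompt tokens.
interface GeminiUsageMetadata {
  prompt_token_count: number;
  candidates_token_count: number;
  cached_content_token_count?: number;
}

// Only the disjoint (Anthropic) shape needs summing to reach the OTEL total.
function anthropicTotalInputTokens(u: AnthropicUsage): number {
  return u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens;
}
```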
Discussion History and Rejected Alternatives

Issue open-telemetry/semantic-conventions#2094 (closed in favor of open-telemetry/semantic-conventions-genai#23) initially proposed:

- `gen_ai.usage.input_cache_read_tokens`
- `gen_ai.usage.input_cache_write_tokens`

Issue open-telemetry/semantic-conventions-genai#23 explored several naming patterns before converging:

- `gen_ai.usage.cache_creation_input_tokens` (flat, underscore-only)
- `gen_ai.usage.detailed.cache_creation_input_tokens` (with a `detailed` namespace)
- `gen_ai.usage.anthropic.cache_creation_input_tokens` (provider-namespaced, rejected as too specific)

The Pydantic AI / Logfire team (Samuel Colvin) adopted a simpler schema in their codebase using `cache_write_tokens`, `cache_read_tokens`, `input_audio_tokens`, etc., but they are tracking migration to the official OTEL conventions (Logfire issue #1586).
Key Design Decisions

- `cache_read`, not `cached`: the naming distinguishes between "reading from cache" and "creating/writing to cache", two distinct operations with different cost implications.
- `cache_creation` was preferred over `cache_write` to align with Anthropic's terminology (`cache_creation_input_tokens`).
- Aggregate `input_tokens`: the spec mandates that `gen_ai.usage.input_tokens` SHOULD be the aggregate total. Instrumentations must normalize providers that report disjoint counts (like Anthropic) by summing them.
Test plan

- `gen_ai.usage.cache_read.input_tokens` and `gen_ai.usage.cache_creation.input_tokens` are set correctly
- `gen_ai.usage.input_tokens` equals `promptTokens` (180), not raw `input` (100)
- `gen_ai.usage.output_tokens` is emitted on `run.completed` spans
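A hypothetical test for the second item (Vitest assumed as the runner; the 100/60/20 split is illustrative and matches the 180 total above):

```ts
import { describe, expect, it } from "vitest";

describe("gen_ai.usage.input_tokens normalization", () => {
  it("uses promptTokens (input + cacheRead + cacheWrite), not raw input", () => {
    // Anthropic-style disjoint counts (values assumed for illustration).
    const raw = { input: 100, cacheRead: 60, cacheWrite: 20 };
    const promptTokens = raw.input + raw.cacheRead + raw.cacheWrite;
    expect(promptTokens).toBe(180); // not the raw input of 100
  });
});
```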