
fix(anthropic): restore accidentally lost cache tokens attributes#3648

Merged
galkleinman merged 6 commits into traceloop:main from lmnr-ai:fix/anthropic-cached-tokens
Feb 22, 2026

Conversation

Collaborator

@dinmukhamedm dinmukhamedm commented Jan 31, 2026

This was accidentally removed in #3138 and some subsequent changes. Restoring. More context in #3647.

Note: this PR does NOT fully close #3647, as there will be a new attribute convention soon, see open-telemetry/semantic-conventions#3163

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ...
  • (If applicable) I have updated the documentation accordingly.

Important

Restores cache token attributes in Anthropic instrumentation and updates tests to verify correct attribute settings.

  • Behavior:
    • Restores cache token attributes in _aset_token_usage() and _set_token_usage() in __init__.py and _set_token_usage() in streaming.py.
    • Updates attribute names from GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS to LLM_USAGE_CACHE_READ_INPUT_TOKENS and GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS to LLM_USAGE_CACHE_CREATION_INPUT_TOKENS.
  • Tests:
    • Adds assertions in test_prompt_caching.py to verify cache token attributes are correctly set for both sync and async operations.
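The restored writes can be pictured with a short sketch (a hypothetical reconstruction, not the actual `_set_token_usage()` from the package — the span and usage objects here are stand-ins, and only the attribute keys `gen_ai.usage.cache_read_input_tokens` / `gen_ai.usage.cache_creation_input_tokens` come from the PR itself):

```python
def set_token_usage(span, usage):
    """Sketch of the restored cache-token logic.

    `usage` mimics Anthropic's usage payload, which reports cache activity
    separately from ordinary input/output token counts.
    """
    # Default to 0 when the cache fields are absent from the payload.
    cache_read = getattr(usage, "cache_read_input_tokens", 0) or 0
    cache_creation = getattr(usage, "cache_creation_input_tokens", 0) or 0
    # Record cache reads and cache writes as distinct span attributes.
    span.set_attribute("gen_ai.usage.cache_read_input_tokens", cache_read)
    span.set_attribute("gen_ai.usage.cache_creation_input_tokens", cache_creation)
```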

This description was created by Ellipsis for daa6ff5.

Summary by CodeRabbit

  • New Features

    • Added cache-related input token metrics to OpenTelemetry instrumentation for Anthropic, exposing cache read and cache creation token counts.
  • Bug Fixes

    • Standardized the structured output schema attribute to "gen_ai.request.structured_output_schema" for consistent telemetry reporting.
  • Chores

    • Updated OpenTelemetry semantic conventions dependency.

@dinmukhamedm dinmukhamedm requested a review from nirga January 31, 2026 16:19
Contributor

@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to daa6ff5 in 11 seconds.
  • Reviewed 459 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments. View those below.

Workflow ID: wflow_ZIhOqCTqy5KUUHF1



coderabbitai Bot commented Jan 31, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Restores cache token attributes on Anthropic spans, centralizes caching assertions in tests via a new helper, switches a structured-output attribute lookup to a literal key, and bumps the opentelemetry-semantic-conventions-ai dependency version.

Changes

  • Cache Token Metrics — packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
    Adds span attribute writes for gen_ai.usage.cache_read_input_tokens and gen_ai.usage.cache_creation_input_tokens when recording token usage.
  • Cache Metrics Test Infrastructure — packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
    Introduces a _verify_caching_attributes(...) helper and replaces many per-test token assertions with calls to the helper, centralizing validation across legacy, async, and streaming test variants.
  • Structured Output Schema Attribute — packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py, packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py
    Replaces SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA with the literal attribute key "gen_ai.request.structured_output_schema" for structured JSON schema outputs.
  • Dependency Update — packages/opentelemetry-instrumentation-anthropic/pyproject.toml
    Bumps the opentelemetry-semantic-conventions-ai requirement from >=0.4.13,<0.5.0 to >=0.4.14,<0.5.0.
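The dependency change amounts to a one-line constraint edit in pyproject.toml (a sketch only — the actual file may use a different dependency table, e.g. PEP 621 `[project]` vs. Poetry):

```toml
[tool.poetry.dependencies]
# Bumped so the newly released cache-token constants are available.
opentelemetry-semantic-conventions-ai = ">=0.4.14,<0.5.0"
```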

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 sniffs the logs
Cache tokens hop back into view,
Spans now whisper what reads and creations do,
Tests sing one tune, tidy and true,
A tiny bump and a hop — hooray, review! 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: the title accurately and concisely describes the main change, restoring cache token attributes that were accidentally removed.
  • Linked Issues Check — ✅ Passed: the PR implements the primary objective of issue #3647, restoring cache_read_input_tokens and cache_creation_input_tokens attributes on Anthropic spans for both sync and async operations.
  • Out of Scope Changes Check — ✅ Passed: all changes are scoped to restoring cache token attributes and updating tests; the span_utils.py change to a literal string key aligns with that objective.



 set_span_attribute(
-    span, SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, cache_read_tokens
+    span, SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS, cache_read_tokens
Contributor


Why did you change it? The GEN_AI prefix is the new naming. If possible, and if it already exists in the official OTel semconv, I would even import from there instead of using the local version.

(same for all the occurrences below)

Collaborator Author


In the current state, these (silently) fail with AttributeError. And we are very lucky that they are the last lines in functions decorated with dont_throw.
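A minimal sketch of how a dont_throw-style decorator hides this class of bug (the decorator name comes from the comment above; this implementation and the names around it are illustrative, not the actual code):

```python
import functools
import logging

def dont_throw(func):
    # Swallow every exception so instrumentation never crashes the host app;
    # the trade-off is that a bad attribute reference fails without a trace.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logging.getLogger(__name__).debug("suppressed", exc_info=True)
    return wrapper

class SpanAttributes:
    # Only the LLM_-prefixed constant exists in the released package.
    LLM_USAGE_CACHE_READ_INPUT_TOKENS = "gen_ai.usage.cache_read_input_tokens"

@dont_throw
def record_cache_tokens(span):
    # Raises AttributeError (the GEN_AI_-prefixed name was never released);
    # the decorator silently discards it and the attribute is never set.
    span.set_attribute(SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, 0)
```

Because the buggy writes were the last statements in the decorated functions, everything before them still ran, which is why the regression went unnoticed.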

I did a little digging around and, while the GEN_AI_ prefix does appear in the local conventions definition, it has never been released. If you look at the blame, it was added on Oct 30 in #3138, but the local semantic conventions package was last released on August 22 (pypi, blame). Moreover, both variable prefixes resolve to the same attribute key starting with gen_ai.usage.

And yes, I would love to import from the official semconv (and I outlined the actual small difference in the attribute key in the linked issue description), but it looks like these are not in their Python package yet, as of v0.60.0b1.

@galkleinman if you could please bump the patch version in the opentelemetry-semantic-conventions-ai package and release, I'm more than happy to switch to the new prefix. Though frankly, nothing changes functionally (the variables literally have the same values), and this is subject to change soon anyway, as we'll likely see these in the official OTel package.

Contributor


Awesome. Will do it later today and then you'll be able to bump version here.

Contributor


0.4.14 is released :)

Collaborator Author


@galkleinman I pushed the change. I realized that, unfortunately, another attribute has been removed and not properly tested in #3138, namely gen_ai.request.structured_output_schema. I had to keep it as a raw string in code, but I am open to feedback.

I remember this was originally my initiative, and I think I saw it in some version of the OTel conventions, but perhaps they decided not to add it in the end.

Contributor


IMO let's keep it string for now, and create an issue to fix it. I don't want to halt this PR any more...

Comment thread packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py (1)

90-111: Consider extracting a shared helper for the repeated cache-token assertion block.

The same assertion pattern (cross-span equality, directional checks, exact fixture values) is copy-pasted across all 12 tests with only the expected token counts varying. A small helper would cut ~180 lines of duplication and make future attribute-name migrations (the upcoming OTel semconv change) a single-point edit.

💡 Example helper
def assert_cache_token_attributes(
    cache_creation_span,
    cache_read_span,
    expected_cache_tokens: int,
    expected_input_tokens: int,
    creation_output_tokens: int,
    read_output_tokens: int,
):
    # cross-span consistency
    assert (
        cache_creation_span.attributes["gen_ai.usage.cache_creation_input_tokens"]
        == cache_read_span.attributes["gen_ai.usage.cache_read_input_tokens"]
    )
    # creation span: wrote to cache only
    assert cache_creation_span.attributes["gen_ai.usage.cache_read_input_tokens"] == 0
    assert cache_creation_span.attributes["gen_ai.usage.cache_creation_input_tokens"] != 0
    assert cache_creation_span.attributes["gen_ai.usage.cache_creation_input_tokens"] == expected_cache_tokens
    assert cache_creation_span.attributes["gen_ai.usage.input_tokens"] == expected_input_tokens
    assert cache_creation_span.attributes["gen_ai.usage.output_tokens"] == creation_output_tokens
    # read span: read from cache only
    assert cache_read_span.attributes["gen_ai.usage.cache_read_input_tokens"] != 0
    assert cache_read_span.attributes["gen_ai.usage.cache_creation_input_tokens"] == 0
    assert cache_read_span.attributes["gen_ai.usage.cache_read_input_tokens"] == expected_cache_tokens
    assert cache_read_span.attributes["gen_ai.usage.input_tokens"] == expected_input_tokens
    assert cache_read_span.attributes["gen_ai.usage.output_tokens"] == read_output_tokens

Then each test simply calls:

assert_cache_token_attributes(
    cache_creation_span, cache_read_span,
    expected_cache_tokens=1163, expected_input_tokens=1167,
    creation_output_tokens=187, read_output_tokens=202,
)

Also applies to: 185-206, 378-399, 510-531, 601-622, 795-816, 930-951, 1024-1045, 1223-1244, 1358-1379, 1453-1474, 1662-1683

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py`
around lines 90 - 111, The tests duplicate the same cache-token assertion block
across many files; add a single helper function (e.g.,
assert_cache_token_attributes) that accepts cache_creation_span, cache_read_span
and the expected counts (expected_cache_tokens, expected_input_tokens,
creation_output_tokens, read_output_tokens) and performs the cross-span equality
checks, directional zero/non-zero checks, and exact value assertions using the
same attribute names ("gen_ai.usage.cache_creation_input_tokens",
"gen_ai.usage.cache_read_input_tokens", "gen_ai.usage.input_tokens",
"gen_ai.usage.output_tokens"); then replace each repeated block (the uses of
cache_creation_span and cache_read_span shown in the diff) with a single call to
that helper passing the appropriate expected values so future attribute-name
changes or semantic-convention updates are updated in one place.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py`:
- Around line 175-183: The test contains a duplicated assertion comparing
cache_creation_span.attributes["gen_ai.usage.cache_creation_input_tokens"] to
cache_read_span.attributes["gen_ai.usage.cache_read_input_tokens"]; remove the
redundant second occurrence so the equality is asserted only once (locate the
duplicate in tests/test_prompt_caching.py around the block that references
cache_creation_span and cache_read_span and delete the repeated assertion
lines).
- Around line 368-376: Remove the duplicate assertion comparing
cache_creation_span.attributes["gen_ai.usage.cache_creation_input_tokens"] to
cache_read_span.attributes["gen_ai.usage.cache_read_input_tokens"] (the repeated
block using cache_creation_span and cache_read_span); leave a single assertion
that performs this comparison in the test (e.g., in tests/test_prompt_caching.py
within the test function that references cache_creation_span and
cache_read_span) so the redundant copy-pasted assertion is deleted.


Comment thread packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py Outdated

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py (2)

27-42: Prefer > 0 over != 0 for token count assertions.

Token counts are non-negative integers; > 0 is the stronger, semantically correct form. != 0 would technically pass a negative value, which is invalid for a count.

♻️ Use `> 0` for positive-count assertions
-    assert (
-        cache_creation_span.attributes["gen_ai.usage.cache_creation_input_tokens"] != 0
-    )
+    assert cache_creation_span.attributes["gen_ai.usage.cache_creation_input_tokens"] > 0

 ...

-    assert cache_read_span.attributes["gen_ai.usage.cache_read_input_tokens"] != 0
+    assert cache_read_span.attributes["gen_ai.usage.cache_read_input_tokens"] > 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py`
around lines 27 - 42, Replace the weak non-zero assertions that use != 0 with
strict positive checks > 0: update the assertion on
cache_creation_span.attributes["gen_ai.usage.cache_creation_input_tokens"] (in
the cache-creation block) to use > 0 instead of != 0, and update the assertion
on cache_read_span.attributes["gen_ai.usage.cache_read_input_tokens"] (in the
cache-read block) to use > 0 instead of != 0 so token counts are asserted to be
strictly positive.

21-47: Use constant symbols instead of hardcoded string literals for attribute keys.

While the hardcoded strings "gen_ai.usage.cache_creation_input_tokens" and "gen_ai.usage.cache_read_input_tokens" are correct (the constants LLM_USAGE_CACHE_CREATION_INPUT_TOKENS and LLM_USAGE_CACHE_READ_INPUT_TOKENS map to these same values), using the constant symbols directly avoids magic strings and improves maintainability if attribute names ever change in the future.

♻️ Suggested refactor

Import the constants at the top of the test file and use them in _verify_caching_attributes:

from opentelemetry.semconv_ai import SpanAttributes

def _verify_caching_attributes(...):
    assert (
        cache_creation_span.attributes[SpanAttributes.LLM_USAGE_CACHE_CREATION_INPUT_TOKENS]
        == cache_read_span.attributes[SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS]
    )
    # ... apply consistently for all six attribute accesses
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py`
around lines 21 - 47, Replace hardcoded attribute string literals in
_verify_caching_attributes with the SpanAttributes constants from
opentelemetry.semconv_ai: import SpanAttributes at top of the test file and use
SpanAttributes.LLM_USAGE_CACHE_CREATION_INPUT_TOKENS and
SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS (and the corresponding
SpanAttributes for LLM_USAGE_INPUT_TOKENS and LLM_USAGE_OUTPUT_TOKENS) wherever
the code currently indexes cache_creation_span.attributes and
cache_read_span.attributes so all six attribute accesses reference the
SpanAttributes constants instead of raw string literals.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py`:
- Around line 111-115: Remove the redundant equality assertion comparing
cache_creation_span.attributes["gen_ai.usage.cache_creation_input_tokens"] and
cache_read_span.attributes["gen_ai.usage.cache_read_input_tokens"] from the
test; the same cross-span check is already performed by the helper function
_verify_caching_attributes, so delete the assertion block referencing
cache_creation_span and cache_read_span to avoid duplicate checks and keep tests
DRY.
- Around line 693-698: Remove the redundant inline equality assertion that
compares
cache_creation_span.attributes["gen_ai.usage.cache_creation_input_tokens"] to
cache_read_span.attributes["gen_ai.usage.cache_read_input_tokens"] in the test;
rely on the existing helper _verify_caching_attributes(cache_creation_span,
cache_read_span, 1169, 207, 224, 1165) to perform these checks instead—delete
the two-line assert block referencing cache_creation_span and cache_read_span to
avoid duplication.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py (1)

7-7: ⚠️ Potential issue | 🟡 Minor

SpanAttributes import is now unused — will fail Ruff F401 linting.

After lines 68 and 70 switched to the literal string, SpanAttributes is no longer referenced anywhere in this file. The [tool.ruff.lint] config selects the "F" rule-set (which includes F401), so this will fail the linter.

Either remove the import or — preferably, consistent with the rest of the codebase — restore the use of the semconv constant (see related comment in span_utils.py).

🔧 Proposed fix (remove unused import)
 import json
 
 import pytest
 from opentelemetry.semconv._incubating.attributes import (
     gen_ai_attributes as GenAIAttributes,
 )
-from opentelemetry.semconv_ai import SpanAttributes

Or, once the constant is verified to exist in 0.4.14, restore the constant references:

-    assert "gen_ai.request.structured_output_schema" in anthropic_span.attributes
+    assert SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA in anthropic_span.attributes
     schema_attr = json.loads(
-        anthropic_span.attributes["gen_ai.request.structured_output_schema"]
+        anthropic_span.attributes[SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA]
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py`
at line 7, The import SpanAttributes is now unused in
tests/test_structured_outputs.py and will trigger Ruff F401; either remove the
unused import line for SpanAttributes, or restore usage of the semconv constant
by replacing the literal attribute string(s) you added (the two occurrences you
changed to string literals) with the appropriate SpanAttributes member(s) (e.g.,
use SpanAttributes.<CONST_NAME> where the literal was used) so the import is
referenced.

 set_span_attribute(
     span,
-    SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA,
+    "gen_ai.request.structured_output_schema",


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA\|structured_output_schema" \
  --type py \
  packages/opentelemetry-semantic-conventions-ai

Repository: traceloop/openllmetry

Length of output: 47


🏁 Script executed:

# Also check the imports and all attribute usages in the span_utils.py file
cat -n packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py | head -20

Repository: traceloop/openllmetry

Length of output: 689


🏁 Script executed:

# Check all attribute assignments in the file to understand the pattern
rg -n "SpanAttributes\.|GenAIAttributes\.|\"gen_ai\." \
  packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py \
  -A 1

Repository: traceloop/openllmetry

Length of output: 4042


🏁 Script executed:

# Check what constants are available in SpanAttributes
rg -n "LLM_REQUEST|structured" \
  --type py \
  packages/opentelemetry-semantic-conventions-ai

Repository: traceloop/openllmetry

Length of output: 642


🏁 Script executed:

# List all available files in the semantic-conventions-ai package
find packages/opentelemetry-semantic-conventions-ai -name "*.py" -type f | head -20

Repository: traceloop/openllmetry

Length of output: 515


🏁 Script executed:

# Check the main attributes file to see what constants are defined
cat -n packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py

Repository: traceloop/openllmetry

Length of output: 15476


Add LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA constant to SpanAttributes or use the appropriate semantic convention constant.

The constant LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA does not exist in opentelemetry-semantic-conventions-ai. Line 177 uses a literal string "gen_ai.request.structured_output_schema" while all other attributes in this file reference named constants from SpanAttributes or GenAIAttributes (e.g., SpanAttributes.LLM_REQUEST_FUNCTIONS at line 159, SpanAttributes.LLM_USAGE_TOTAL_TOKENS at lines 337, 361).

Either define this constant in SpanAttributes to align with semantic conventions best practices and maintain consistency, or verify that this attribute name is correct and document why a literal string is necessary here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py`
at line 177, Replace the literal "gen_ai.request.structured_output_schema" with
a named semantic constant: add LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA =
"gen_ai.request.structured_output_schema" to the SpanAttributes constants (where
other SpanAttributes are defined) and then use
SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA in span_utils.py in place of
the literal; update any imports if necessary so span_utils.py references the new
SpanAttributes constant instead of the hard-coded string.

Contributor

@galkleinman galkleinman left a comment


LGTM 💪

Just fix the lint & rebase...

@dinmukhamedm dinmukhamedm force-pushed the fix/anthropic-cached-tokens branch from 5100d19 to 01a7108 on February 22, 2026 21:24

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py (2)

111-114: Redundant assertion — already covered by _verify_caching_attributes on line 128.

The explicit check that cache_creation_input_tokens == cache_read_input_tokens (lines 111–114) is the first assertion inside _verify_caching_attributes (lines 21–24). The same redundancy appears at lines 693–696 in test_anthropic_prompt_caching_async_with_events_with_no_content.

♻️ Remove redundant assertions

In test_anthropic_prompt_caching_legacy:

-    assert (
-        cache_creation_span.attributes["gen_ai.usage.cache_creation_input_tokens"]
-        == cache_read_span.attributes["gen_ai.usage.cache_read_input_tokens"]
-    )
-
     assert (
         cache_creation_span.attributes.get("gen_ai.response.id")

In test_anthropic_prompt_caching_async_with_events_with_no_content (lines 693–696):

-    assert (
-        cache_creation_span.attributes["gen_ai.usage.cache_creation_input_tokens"]
-        == cache_read_span.attributes["gen_ai.usage.cache_read_input_tokens"]
-    )
-
     _verify_caching_attributes(cache_creation_span, cache_read_span, 1169, 207, 224, 1165)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py`
around lines 111 - 114, Remove the redundant explicit assertions that compare
cache_creation_span.attributes["gen_ai.usage.cache_creation_input_tokens"] to
cache_read_span.attributes["gen_ai.usage.cache_read_input_tokens"] in the tests;
these checks are already covered by the helper _verify_caching_attributes.
Specifically, delete the explicit equality assertion in
test_anthropic_prompt_caching_legacy (the one referencing cache_creation_span
and cache_read_span) and the duplicate in
test_anthropic_prompt_caching_async_with_events_with_no_content, leaving the
calls to _verify_caching_attributes intact.

128-128: Consider using keyword arguments for readability.

The positional call _verify_caching_attributes(cache_creation_span, cache_read_span, 1167, 187, 202, 1163) requires cross-referencing the helper signature to understand what each number means. This pattern repeats across all 12 call sites.

♻️ Example with keyword arguments
_verify_caching_attributes(
    cache_creation_span,
    cache_read_span,
    input_tokens=1167,
    cache_creation_span_output_tokens=187,
    cache_read_span_output_tokens=202,
    cached_tokens=1163,
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py`
at line 128, Replace the positional numeric arguments in the
_verify_caching_attributes calls with explicit keyword arguments for clarity:
locate each call like _verify_caching_attributes(cache_creation_span,
cache_read_span, 1167, 187, 202, 1163) and change the numeric args to
input_tokens=1167, cache_creation_span_output_tokens=187,
cache_read_span_output_tokens=202, cached_tokens=1163 (keeping the same first
two positional args cache_creation_span and cache_read_span); apply the same
change to all similar call sites so the meaning of each numeric literal is
explicit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py`:
- Around line 97-102: The cache token span attributes in streaming.py use
SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS and
SpanAttributes.LLM_USAGE_CACHE_CREATION_INPUT_TOKENS which are inconsistent with
the rest of the codebase that uses
SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS and
SpanAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS (as seen in
__init__.py); update the set_span_attribute calls in streaming.py to use the
GEN_AI_USAGE_CACHE_* constants instead and confirm the SpanAttributes import
covers those names.

---

Duplicate comments:
In
`@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py`:
- Line 177: Replace the hard-coded string
"gen_ai.request.structured_output_schema" with a named constant: add
LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA = "gen_ai.request.structured_output_schema"
to the SpanAttributes constants class, then import/use
SpanAttributes.LLM_REQUEST_STRUCTURED_OUTPUT_SCHEMA in span_utils.py (where the
literal appears) so the file follows the same SpanAttributes/GenAIAttributes
constant pattern.

---

Nitpick comments:
In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py`:
- Around line 111-114: Remove the redundant explicit assertions that compare
cache_creation_span.attributes["gen_ai.usage.cache_creation_input_tokens"] to
cache_read_span.attributes["gen_ai.usage.cache_read_input_tokens"] in the tests;
these checks are already covered by the helper _verify_caching_attributes.
Specifically, delete the explicit equality assertion in
test_anthropic_prompt_caching_legacy (the one referencing cache_creation_span
and cache_read_span) and the duplicate in
test_anthropic_prompt_caching_async_with_events_with_no_content, leaving the
calls to _verify_caching_attributes intact.
- Line 128: Replace the positional numeric arguments in the
_verify_caching_attributes calls with explicit keyword arguments for clarity:
locate each call like _verify_caching_attributes(cache_creation_span,
cache_read_span, 1167, 187, 202, 1163) and change the numeric args to
input_tokens=1167, cache_creation_span_output_tokens=187,
cache_read_span_output_tokens=202, cached_tokens=1163 (keeping the same first
two positional args cache_creation_span and cache_read_span); apply the same
change to all similar call sites so the meaning of each numeric literal is
explicit.

Comment on lines +97 to +102
set_span_attribute(
    span, SpanAttributes.LLM_USAGE_CACHE_READ_INPUT_TOKENS, cache_read_tokens
)
set_span_attribute(
    span, SpanAttributes.LLM_USAGE_CACHE_CREATION_INPUT_TOKENS, cache_creation_tokens
)
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's verify the streaming.py file content at the specified lines
echo "=== Content of streaming.py around lines 97-102 ==="
cat -n packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py | sed -n '80,110p'

Repository: traceloop/openllmetry

Length of output: 1460


🏁 Script executed:

# Now check for cache token attributes in __init__.py
echo "=== Searching for cache token attributes in __init__.py ==="
rg -n "CACHE_READ_INPUT_TOKENS|CACHE_CREATION_INPUT_TOKENS" \
  --type py \
  packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/

Repository: traceloop/openllmetry

Length of output: 1227


🏁 Script executed:

# Check the set_span_attribute function implementation
echo "=== Checking set_span_attribute function in utils.py ==="
cat -n packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py | head -40

Repository: traceloop/openllmetry

Length of output: 1450


🏁 Script executed:

# Search for _set_token_usage and _aset_token_usage functions in __init__.py
echo "=== Searching for token usage functions in __init__.py ==="
rg -n "_set_token_usage|_aset_token_usage" \
  --type py \
  packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py \
  -A 15

Repository: traceloop/openllmetry

Length of output: 2829


🏁 Script executed:

# Check the actual cache token attribute lines in __init__.py more closely
echo "=== Checking cache token attributes in _aset_token_usage (around line 286) ==="
cat -n packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py | sed -n '280,295p'

echo ""
echo "=== Checking cache token attributes in _set_token_usage (around line 400) ==="
cat -n packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py | sed -n '395,410p'

Repository: traceloop/openllmetry

Length of output: 1455


🏁 Script executed:

# Verify the SpanAttributes constants used in both files
echo "=== SpanAttributes imports and usage in streaming.py ==="
head -50 packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py | grep -i "import\|SpanAttributes"

echo ""
echo "=== SpanAttributes imports in __init__.py ==="
head -50 packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py | grep -i "import\|SpanAttributes"

Repository: traceloop/openllmetry

Length of output: 1867


🏁 Script executed:

# Let's verify the SpanAttributes constants to understand the difference
echo "=== Checking what SpanAttributes constants are available ==="
rg "LLM_USAGE_CACHE|GEN_AI_USAGE_CACHE" --type py

Repository: traceloop/openllmetry

Length of output: 2886


🏁 Script executed:

# Check if these are custom constants defined in the package
echo "=== Looking for SpanAttributes definition/import ==="
rg -n "class SpanAttributes|from.*SpanAttributes" --type py packages/opentelemetry-instrumentation-anthropic/ -B 2 -A 2

Repository: traceloop/openllmetry

Length of output: 2561


🏁 Script executed:

# Quick check to confirm cache_read_tokens and cache_creation_tokens extraction in __init__.py
echo "=== Checking cache token extraction in __init__.py _aset_token_usage ==="
cat -n packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py | sed -n '200,230p'

Repository: traceloop/openllmetry

Length of output: 1619


Cache token attributes correctly restored, but SpanAttributes constants are inconsistently named across files.

The new set_span_attribute calls for cache tokens in streaming.py (lines 97-102) are correct and consistent with the existing token usage pattern. Cache tokens default to 0 (lines 82, 85), so these attributes will always be set, matching the behavior of LLM_USAGE_TOTAL_TOKENS on line 95.
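The always-set behavior noted above follows from the guard shape commonly used for such helpers. A sketch of that guard (not necessarily the exact body of utils.py's `set_span_attribute`): a value of 0 passes the check and is recorded, while `None` and empty strings are skipped.

```python
def set_span_attribute(span, name, value):
    # Skip unset values so spans don't carry empty attributes.
    # Note that 0 is a valid value and IS recorded, which is why
    # cache token counts that default to 0 always appear on the span.
    if value is not None and value != "":
        span.set_attribute(name, value)
```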

The non-streaming path in __init__.py also sets cache token attributes in both _aset_token_usage() (lines 286, 290) and _set_token_usage() (lines 400, 404), as expected.

However, there's an inconsistency: streaming.py uses SpanAttributes.LLM_USAGE_CACHE_* constants while __init__.py uses SpanAttributes.GEN_AI_USAGE_CACHE_* constants. Both resolve to the same attribute names (e.g., "gen_ai.usage.cache_read_input_tokens"), so they function identically, but the naming should be consistent across the codebase. Use the same constant names in both files.
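One way to keep both spellings working while converging on a single canonical name is to alias the constants. This is a sketch of the pattern, not the actual semconv-ai source; the attribute strings are the ones this review confirms both constants resolve to:

```python
class SpanAttributes:
    # Canonical names following the gen_ai.* attribute convention
    GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS = "gen_ai.usage.cache_read_input_tokens"
    GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS = "gen_ai.usage.cache_creation_input_tokens"
    # Legacy aliases: identical attribute strings, so instrumentation
    # using either constant emits identical span attributes
    LLM_USAGE_CACHE_READ_INPUT_TOKENS = GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS
    LLM_USAGE_CACHE_CREATION_INPUT_TOKENS = GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS
```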


@galkleinman galkleinman merged commit 03d49ae into traceloop:main Feb 22, 2026
9 of 10 checks passed

Development

Successfully merging this pull request may close these issues.

🐛 Bug Report: cache read and cache creation input tokens aren't recorded on span attributes
