
feat(anthropic): conform instrumentation to OTel GenAI semantic conventions#3835

Merged
max-deygin-traceloop merged 21 commits into main from
max/tlp-1926-anthropic-instrumentation
on Mar 29, 2026
Conversation

@max-deygin-traceloop
Contributor

@max-deygin-traceloop max-deygin-traceloop commented Mar 22, 2026

What

Refactors the Anthropic instrumentation package to emit span attributes that comply with the OpenTelemetry GenAI semantic conventions spec.

Changes

New attributes (replaces legacy flat gen_ai.prompt.* / gen_ai.completion.* keys):

  • gen_ai.input.messages — full message array as JSON
  • gen_ai.output.messages — assistant response as JSON
  • gen_ai.system_instructions — system prompt as standalone attribute
  • gen_ai.tool.definitions — tool definitions as JSON
  • gen_ai.provider.name — new required attribute ("anthropic")
  • gen_ai.operation.name — new required attribute ("chat" / "text_completion")
  • gen_ai.response.finish_reasons — now emitted as an array regardless of TRACELOOP_TRACE_CONTENT setting
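The consolidated attribute set above can be sketched roughly as follows. This is an illustrative Python sketch only: the real instrumentation builds these values from Anthropic request/response objects inside `span_utils.py`, and `genai_attributes` plus the literal payloads here are hypothetical, not code from this PR.

```python
import json

def genai_attributes(request, response):
    # Illustrative only: maps a simplified request/response pair to the
    # GenAI-convention attribute keys listed above. Message and content
    # shapes are made up for the example.
    return {
        "gen_ai.provider.name": "anthropic",
        "gen_ai.operation.name": "chat",
        "gen_ai.system_instructions": request.get("system", ""),
        "gen_ai.input.messages": json.dumps(request["messages"]),
        "gen_ai.output.messages": json.dumps(
            [{"role": "assistant", "content": response["content"]}]
        ),
        # Emitted as an array; mapping to the OTel enum is discussed later in
        # the review, here the raw stop reason is passed through.
        "gen_ai.response.finish_reasons": [response["stop_reason"]],
    }

attrs = genai_attributes(
    {"system": "Be terse.", "messages": [{"role": "user", "content": "Hi"}]},
    {"content": [{"type": "text", "text": "Hello"}], "stop_reason": "end_turn"},
)
```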

Bug fixes:

  • gen_ai.system value corrected from "Anthropic" to "anthropic" (spec enum value)
  • gen_ai.request.max_tokens now correctly reads max_tokens (Messages API) with fallback to max_tokens_to_sample (legacy Completions API)
  • Tool-use blocks in input messages no longer duplicated across both content and tool_calls
  • Streaming tool_calls.arguments correctly handles both string (streaming delta accumulation) and dict inputs without double-encoding
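The `max_tokens` fallback and the string/dict argument handling can be sketched as below. The helper names are hypothetical; only the precedence (`max_tokens` over `max_tokens_to_sample`) and the no-double-encoding behavior come from the PR description.

```python
import json

def request_max_tokens(kwargs):
    # Messages API uses max_tokens; the legacy Completions API used
    # max_tokens_to_sample, kept here as a fallback.
    return kwargs.get("max_tokens", kwargs.get("max_tokens_to_sample"))

def normalize_tool_arguments(args):
    # Streaming delta accumulation yields a JSON string; non-streaming
    # responses yield a dict. Return a dict either way, and never re-encode
    # a string that is already JSON.
    if isinstance(args, str):
        try:
            return json.loads(args)
        except json.JSONDecodeError:
            return args
    return args
```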

Checklist

  • I have added tests that cover my changes.

  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.

  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....

  • (If applicable) I have updated the documentation accordingly.

  • Validated locally against a running agent using ConsoleSpanExporter.

  • Confirmed all appear correctly on anthropic.chat spans:

    gen_ai.provider.name: "anthropic", gen_ai.operation.name: "chat",
    gen_ai.input.messages, gen_ai.output.messages, gen_ai.response.finish_reasons
    

Note: This is a recreation of #3808, which was auto-closed when its target branch was merged into main. No code changes were made; the branch is identical except for a fix removing the redundant gen_ai.system attribute (deprecated, replaced by gen_ai.provider.name).

Summary by CodeRabbit

  • Refactor

    • Aligned instrumentation with GenAI semantic conventions: consolidated per-message fields into JSON-encoded input/output/tool payloads, standardized finish-reason mappings, switched total-token and response stop-reason keys to GenAI names, and unified provider/operation span naming.
  • Tests

    • Updated and added tests to validate consolidated span payloads, finish-reason mappings, token accounting, streaming consolidation, and semconv compliance.
  • Chores

    • Bumped opentelemetry-semconv-ai requirement to >=0.5.0.

@CLAassistant

CLAassistant commented Mar 22, 2026

CLA assistant check
All committers have signed the CLA.

@coderabbitai

coderabbitai Bot commented Mar 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Walkthrough

Migrates Anthropic instrumentation from legacy LLM semconv to GenAI semantic-convention keys, consolidates per-index prompt/completion attributes into JSON message payloads, updates provider/operation span attributes and finish-reason mapping, revises streaming/output serialization, and updates tests and dependency ranges.

Changes

Cohort / File(s) Summary
Instrumentation entrypoints
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py, packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
Replaced legacy LLM span attribute keys with GenAI semantic keys for provider/operation names, token usage, and response finish/stop reasons; updated span start attributes to emit GenAI provider/operation values.
Span serialization utils
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
Added finish-reason mapping; replaced per-index prompt/completion attributes with consolidated gen_ai.input.messages / gen_ai.output.messages JSON payloads and gen_ai.system_instructions; converted message content into typed parts (text, tool_call, image_url, reasoning); normalized tool definitions and arguments; adjusted request keys (penalties, max_tokens_to_sample precedence) and streaming aggregation logic.
Tests — assertions & schema updates
packages/opentelemetry-instrumentation-anthropic/tests/* (e.g., test_messages.py, test_thinking.py, test_completion.py, test_structured_outputs.py, test_bedrock_with_raw_response.py, test_prompt_caching.py)
Updated tests to parse JSON-encoded gen_ai.input.messages / gen_ai.output.messages and gen_ai.tool.definitions; replaced indexed attribute assertions with checks against consolidated message/parts structures and new gen_ai.usage.* token attributes; added json imports where needed.
New semconv compliance/unit tests
packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_compliance.py, packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py
Added semconv compliance re-export test and detailed unit tests validating GenAI attribute emission, parts schema, tool call serialization, finish-reason mapping, streaming consolidation, and provider/operation identity.
Deps / config
packages/opentelemetry-instrumentation-anthropic/pyproject.toml
Bumped opentelemetry-semantic-conventions-ai constraint to >=0.5.0,<0.6.0 and raised anthropic[bedrock] test dependency minimum.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • nirga

Poem

🐰 I hopped through spans and payload streams,

Swapped old LLM keys for GenAI dreams,
Prompts packed as parts, tools named with care,
Finish reasons mapped, all tidy and fair,
A carrot-coded hop — observability beams 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 38.53%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: the PR title clearly and concisely summarizes the main objective, conforming the Anthropic instrumentation to the OpenTelemetry GenAI semantic conventions, and accurately reflects the primary change across the codebase.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py (1)

1274-1275: Assert the exact finish_reasons payload here.

The `in` membership check is too weak for the new array semantics: it will still pass if the instrumentation emits duplicates or unrelated finish reasons. These fixtures look single-choice, so an exact assertion will catch the regression this PR is trying to prevent.

Based on learnings: Follow the OpenTelemetry GenAI semantic specification at https://opentelemetry.io/docs/specs/semconv/gen-ai/

Also applies to: 1613-1614
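The difference the reviewer is asking for can be shown with a tiny standalone sketch (the attributes dict here is a stand-in for the real span's attributes):

```python
# Hypothetical span attributes for a single-choice response.
attributes = {"gen_ai.response.finish_reasons": ["stop"]}
stop_reason = "stop"

# Weak: would still pass for ["stop", "stop"] or ["stop", "length"].
assert stop_reason in attributes["gen_ai.response.finish_reasons"]

# Strong: fails on duplicates or unrelated entries.
assert attributes["gen_ai.response.finish_reasons"] == [stop_reason]
```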

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py`
around lines 1274 - 1275, Replace the weak membership check with an exact
payload assertion: instead of asserting response.stop_reason in finish_reasons,
assert that
anthropic_span.attributes[GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS] equals
the exact list expected (e.g. [response.stop_reason]) so duplicates or extra
values fail; do the same change for the other occurrence referenced (lines
around 1613-1614) and keep the reference to
GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS to locate the test.
packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py (1)

228-237: Consider simplifying the content parsing logic.

The nested conditional parsing (`isinstance(content, str)` → `json.loads` check → `isinstance(content_parsed, list)`) is complex. Since you control the span_utils output, you could assert a more specific expected structure.

However, this defensive approach does handle edge cases, so it's acceptable as-is.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py`
around lines 228 - 237, The current nested parsing around
assistant_msg["content"] (variables content → content_parsed) is more defensive
than needed; simplify by asserting the expected shape up front, e.g. require
content to be a str containing JSON when produced by span_utils, parse it once
into content_parsed, then assert isinstance(content_parsed, list) before
building types_in_content and checking "tool_use" is not present; update the
test code that references assistant_msg, content, content_parsed, and
types_in_content to remove the nested isinstance checks and make the expected
structure explicit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/opentelemetry-instrumentation-anthropic/pyproject.toml`:
- Line 16: The pyproject.toml entry under tool.uv.sources masks the PyPI package
opentelemetry-semantic-conventions-ai so our tests never validate that the
published 0.5.0 wheel actually exports the symbols we import
(opentelemetry.semconv_ai.SpanAttributes, Meters,
SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY, and _testing); remove or
conditionally disable the local source override for
opentelemetry-semantic-conventions-ai in tool.uv.sources (or add an integration
test that installs the released wheel and imports those symbols) so CI/monorepo
tests exercise the real published artifact rather than the sibling checkout.

In `@packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py`:
- Around line 1918-1932: The test fails because streaming builds separate
output.message entries for the same assistant turn (one for text and another for
tool_calls) instead of consolidating them like the non-streaming path; update
the streaming serialization to accumulate all blocks for a single assistant turn
and emit one message object with both "content" and "tool_calls" before writing
GEN_AI_OUTPUT_MESSAGES. Locate the streaming code that appends to
output_messages (the logic that produces the separate text_msg and tool_msgs)
and change it to merge content fragments and tool call entries into a single
assistant message (matching the non-streaming behavior shown around the
serialize logic in span_utils.py lines ~220–223), ensuring output_messages
yields one assistant message containing "content", "role":"assistant", and a
combined "tool_calls" array with arguments serialized the same way the
non-streaming path does.
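The consolidation the comment describes can be sketched as follows. The event shapes and `consolidate_stream` are simplified, hypothetical stand-ins; the real streaming handler works on Anthropic SSE event objects, not plain dicts.

```python
import json

def consolidate_stream(events):
    """Merge streamed content-block events into one assistant message.

    Instead of emitting one output message for text and another for tool
    calls, accumulate every block for the turn and emit a single message.
    """
    text_parts, tool_calls = [], []
    for ev in events:
        if ev["type"] == "text_delta":
            text_parts.append(ev["text"])
        elif ev["type"] == "tool_use":
            args = ev["input"]
            if isinstance(args, str):  # accumulated JSON argument deltas
                args = json.loads(args)
            tool_calls.append({"id": ev["id"], "name": ev["name"], "arguments": args})
    message = {"role": "assistant", "content": "".join(text_parts)}
    if tool_calls:
        message["tool_calls"] = tool_calls
    return [message]  # one assistant message, matching the non-streaming path

output_messages = consolidate_stream([
    {"type": "text_delta", "text": "Let me check "},
    {"type": "text_delta", "text": "the weather."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": '{"city": "Paris"}'},
])
```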

---

Nitpick comments:
In `@packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py`:
- Around line 1274-1275: Replace the weak membership check with an exact payload
assertion: instead of asserting response.stop_reason in finish_reasons, assert
that anthropic_span.attributes[GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS]
equals the exact list expected (e.g. [response.stop_reason]) so duplicates or
extra values fail; do the same change for the other occurrence referenced (lines
around 1613-1614) and keep the reference to
GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS to locate the test.

In
`@packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py`:
- Around line 228-237: The current nested parsing around
assistant_msg["content"] (variables content → content_parsed) is more defensive
than needed; simplify by asserting the expected shape up front, e.g. require
content to be a str containing JSON when produced by span_utils, parse it once
into content_parsed, then assert isinstance(content_parsed, list) before
building types_in_content and checking "tool_use" is not present; update the
test code that references assistant_msg, content, content_parsed, and
types_in_content to remove the nested isinstance checks and make the expected
structure explicit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8fd99eb7-5d57-4e35-9b6e-765b533d3d31

📥 Commits

Reviewing files that changed from the base of the PR and between 3f2418b and 4b2e8b8.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-anthropic/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
  • packages/opentelemetry-instrumentation-anthropic/pyproject.toml
  • packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_compliance.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py

Comment thread packages/opentelemetry-instrumentation-anthropic/pyproject.toml Outdated
Comment thread packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py Outdated
Comment thread packages/opentelemetry-instrumentation-anthropic/pyproject.toml Outdated
doronkopit5
doronkopit5 previously approved these changes Mar 23, 2026
Contributor

@OzBenSimhonTraceloop OzBenSimhonTraceloop left a comment


Furthermore, with the current Anthropic SDK version:

```python
stop_reason = response.get("stop_reason")

if stop_reason:
    span.set_attribute(GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS, [stop_reason])
```
Contributor


The OTel FinishReason enum defines: stop, length, content_filter, tool_call, error, while Anthropic's stop reasons differ AFAIK, so you need a mapping here instead of passing Anthropic's values through as-is.
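The mapping that a later commit in this PR settles on (end_turn→stop, tool_use→tool_call, max_tokens→length, stop_sequence→stop) can be sketched as a plain dict. The pass-through fallback for unknown reasons is an assumption of this sketch, not something the PR specifies.

```python
# Mapping per the follow-up commit in this PR; the fallback behavior
# (returning the raw value for unmapped reasons) is an assumption.
ANTHROPIC_TO_OTEL_FINISH_REASON = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_call",
}

def map_finish_reason(stop_reason):
    return ANTHROPIC_TO_OTEL_FINISH_REASON.get(stop_reason, stop_reason)
```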

Contributor


Bump to latest

@doronkopit5 doronkopit5 dismissed their stale review March 23, 2026 10:54

lets fix the content format

max-deygin-traceloop added a commit that referenced this pull request Mar 23, 2026
Address PR #3835 review comments from OzBenSimhonTraceloop:

- Input/output messages now use "parts" array structure per
  gen-ai-input-messages.json and gen-ai-output-messages.json schemas
- Map Anthropic finish reasons to OTel enum: end_turn→stop,
  tool_use→tool_call, max_tokens→length, stop_sequence→stop
- Thinking content is now a ReasoningPart (type: "reasoning") inside
  the assistant message's parts array, not a separate message
- Tool calls are now tool_call parts, not a separate "tool_calls" key
- Streaming events consolidated into a single assistant message
- Rewrote test_semconv_span_attrs.py as spec-driven unit tests (TDD)
- Updated all VCR integration test assertions to match

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
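Put together, an assistant message in the "parts" shape described by this commit would look roughly like the literal below. Field names follow the gen-ai output-messages schema as summarized above; the values (tool id, name, arguments) are made up for illustration.

```python
# Illustrative assistant message: reasoning, text, and tool_call are all
# parts of ONE message, and tool calls are not a separate "tool_calls" key.
assistant_message = {
    "role": "assistant",
    "parts": [
        {"type": "reasoning", "content": "User asked for weather; call the tool."},
        {"type": "text", "content": "Let me check the weather."},
        {
            "type": "tool_call",
            "id": "toolu_01",          # hypothetical tool-use id
            "name": "get_weather",     # hypothetical tool name
            "arguments": {"city": "Paris"},
        },
    ],
    "finish_reason": "tool_call",      # mapped from Anthropic's tool_use
}
```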

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (2)

401-402: Remove dead code.

Lines 401-402 are a no-op – tool_arguments is already None if the condition is true.

🧹 Remove redundant lines
                 if isinstance(tool_arguments, str):
                     try:
                         tool_arguments = json.loads(tool_arguments)
                     except (json.JSONDecodeError, TypeError):
                         pass
-                elif tool_arguments is None:
-                    tool_arguments = None
                 parts.append({
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py`
around lines 401 - 402, Remove the redundant no-op branch that checks "elif
tool_arguments is None: tool_arguments = None" in span_utils.py; delete those
two lines so the surrounding if/elif logic no longer contains this dead branch
and the behavior for tool_arguments remains handled by the other existing
branches.

207-220: Async operations inside list comprehension may not work as expected.

The list comprehension on lines 207-220 contains await inside a regular list comprehension. Python allows await in comprehensions within async functions (since 3.6, PEP 530), and each item is awaited sequentially as the comprehension iterates, which is correct here. However, the structure is complex and hard to read.

Consider using an explicit loop or asyncio.gather for clarity:

♻️ Suggested simplification
-        content = [
-            (
-                await _process_image_item(
-                    model_as_dict(item),
-                    span.context.trace_id,
-                    span.context.span_id,
-                    message_index,
-                    j,
-                )
-                if _is_base64_image(model_as_dict(item))
-                else model_as_dict(item)
-            )
-            for j, item in enumerate(content)
-        ]
+        processed = []
+        for j, item in enumerate(content):
+            item_dict = model_as_dict(item)
+            if _is_base64_image(item_dict):
+                processed.append(await _process_image_item(
+                    item_dict,
+                    span.context.trace_id,
+                    span.context.span_id,
+                    message_index,
+                    j,
+                ))
+            else:
+                processed.append(item_dict)
+        content = processed
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py`
around lines 207 - 220, The list comprehension mixes await with conditional
logic (using _process_image_item, _is_base64_image, model_as_dict and
span.context.trace_id/span_id) which is hard to read; replace it by first
building a list of coroutines for items that need processing (e.g., for j,item
in enumerate(content) append _process_image_item(...) when
_is_base64_image(model_as_dict(item)) else append a completed value or use
asyncio.sleep(0)/None), then run them with asyncio.gather(...) to concurrently
await all results and assign the returned list back to content; alternatively
use an explicit async for loop to sequentially await _process_image_item for
clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In
`@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py`:
- Around line 401-402: Remove the redundant no-op branch that checks "elif
tool_arguments is None: tool_arguments = None" in span_utils.py; delete those
two lines so the surrounding if/elif logic no longer contains this dead branch
and the behavior for tool_arguments remains handled by the other existing
branches.
- Around line 207-220: The list comprehension mixes await with conditional logic
(using _process_image_item, _is_base64_image, model_as_dict and
span.context.trace_id/span_id) which is hard to read; replace it by first
building a list of coroutines for items that need processing (e.g., for j,item
in enumerate(content) append _process_image_item(...) when
_is_base64_image(model_as_dict(item)) else append a completed value or use
asyncio.sleep(0)/None), then run them with asyncio.gather(...) to concurrently
await all results and assign the returned list back to content; alternatively
use an explicit async for loop to sequentially await _process_image_item for
clarity.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90a921e4-eef5-4e54-84b5-ba910e5af755

📥 Commits

Reviewing files that changed from the base of the PR and between c73dd21 and 32b81dd.

📒 Files selected for processing (8)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py
✅ Files skipped from review due to trivial changes (1)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py

Contributor

@OzBenSimhonTraceloop OzBenSimhonTraceloop left a comment


@max-deygin-servicenow please bump the Anthropic SDK from 0.76 to 0.86 + fix the lint issue :)

max-deygin-traceloop added a commit that referenced this pull request Mar 24, 2026
(same commit message as above)
@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1926-anthropic-instrumentation branch from f1971c3 to 10a5f47 Compare March 24, 2026 14:56

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py (2)

690-714: Same observation as sync streaming test.

The thinking variable is accumulated (lines 691, 699) but not asserted. See previous comment about the sync streaming test for the same consideration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py`
around lines 690 - 714, The test accumulates a `thinking` string from streaming
events but never asserts it; update the test (in the async streaming block where
`text` and `thinking` are built and where `anthropic_span` and `output_messages`
are inspected) to assert that the span/output reasoning part contains the
collected `thinking` content—e.g., locate `thinking` and `text` accumulation in
the async for loop and add an assertion comparing `thinking` to the reasoning
part(s) in `output_messages[0]["parts"]` (or to the relevant attribute on
`anthropic_span`) to ensure the streamed thinking deltas are represented in the
recorded spans/messages.

449-473: Consider adding assertion for accumulated thinking content.

The thinking variable accumulates streamed thinking deltas (lines 450, 458) but is not used in any assertion. While the any() check on line 471 verifies that reasoning parts exist, it doesn't validate the content matches the accumulated thinking.

If this is intentional due to streaming consolidation differences, consider adding a brief comment explaining why the thinking variable is accumulated but not asserted. Alternatively, if it's unused dead code, remove the accumulation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py`
around lines 449 - 473, The test accumulates streaming thinking deltas into the
variable thinking inside the response loop but never asserts or documents its
purpose; either remove the unused accumulation or add an assertion or
explanatory comment: locate the loop that builds text and thinking (the for
event in response block) and either (a) add an assertion that the reasoning part
in output_messages[0]["parts"] contains the accumulated thinking (compare
thinking to the appropriate reasoning part content), or (b) delete the thinking
accumulation if not needed, or (c) add a short comment above the accumulation
explaining why streamed thinking is not asserted (e.g., due to consolidation
differences), so the intent is clear to future readers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py`:
- Around line 690-714: The test accumulates a `thinking` string from streaming
events but never asserts it; update the test (in the async streaming block where
`text` and `thinking` are built and where `anthropic_span` and `output_messages`
are inspected) to assert that the span/output reasoning part contains the
collected `thinking` content—e.g., locate `thinking` and `text` accumulation in
the async for loop and add an assertion comparing `thinking` to the reasoning
part(s) in `output_messages[0]["parts"]` (or to the relevant attribute on
`anthropic_span`) to ensure the streamed thinking deltas are represented in the
recorded spans/messages.
- Around line 449-473: The test accumulates streaming thinking deltas into the
variable thinking inside the response loop but never asserts or documents its
purpose; either remove the unused accumulation or add an assertion or
explanatory comment: locate the loop that builds text and thinking (the for
event in response block) and either (a) add an assertion that the reasoning part
in output_messages[0]["parts"] contains the accumulated thinking (compare
thinking to the appropriate reasoning part content), or (b) delete the thinking
accumulation if not needed, or (c) add a short comment above the accumulation
explaining why streamed thinking is not asserted (e.g., due to consolidation
differences), so the intent is clear to future readers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e7a92ea5-daca-4a2a-a4e0-75ff6051f497

📥 Commits

Reviewing files that changed from the base of the PR and between f1971c3 and 10a5f47.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-anthropic/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
  • packages/opentelemetry-instrumentation-anthropic/pyproject.toml
  • packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_compliance.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_span_attrs.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py
✅ Files skipped from review due to trivial changes (6)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
  • packages/opentelemetry-instrumentation-anthropic/pyproject.toml
  • packages/opentelemetry-instrumentation-anthropic/tests/test_semconv_compliance.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_structured_outputs.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
  • packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py

max-deygin-traceloop added a commit that referenced this pull request Mar 25, 2026
Address PR #3835 review comments from OzBenSimhonTraceloop:

- Input/output messages now use "parts" array structure per
  gen-ai-input-messages.json and gen-ai-output-messages.json schemas
- Map Anthropic finish reasons to OTel enum: end_turn→stop,
  tool_use→tool_call, max_tokens→length, stop_sequence→stop
- Thinking content is now a ReasoningPart (type: "reasoning") inside
  the assistant message's parts array, not a separate message
- Tool calls are now tool_call parts, not a separate "tool_calls" key
- Streaming events consolidated into a single assistant message
- Rewrote test_semconv_span_attrs.py as spec-driven unit tests (TDD)
- Updated all VCR integration test assertions to match

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
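The parts-based message shape described in this commit message can be sketched as follows. This is a minimal illustration only: the field names follow my reading of the OTel GenAI output-messages schema and are assumptions, not excerpts from span_utils.py.

```python
import json

# Hypothetical assistant message in the "parts" structure described above.
assistant_message = {
    "role": "assistant",
    "parts": [
        # Thinking content becomes a reasoning part inside the same message
        {"type": "reasoning", "content": "The user wants the current weather."},
        {"type": "text", "content": "Let me look that up."},
        # Tool calls become tool_call parts, not a separate "tool_calls" key
        {
            "type": "tool_call",
            "id": "toolu_abc123",
            "name": "get_weather",
            "arguments": {"city": "Paris"},
        },
    ],
    # Anthropic's "tool_use" stop reason maps to the OTel enum value
    "finish_reason": "tool_call",
}

# gen_ai.output.messages is the JSON-encoded array of such messages
print(json.dumps([assistant_message]))
```

The example values (`toolu_abc123`, `get_weather`) are invented for illustration.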
@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1926-anthropic-instrumentation branch from 10a5f47 to 3ec1309 Compare March 25, 2026 16:22
max-deygin-traceloop added a commit that referenced this pull request Mar 29, 2026
@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1926-anthropic-instrumentation branch from 827dd3b to 543070a Compare March 29, 2026 06:34
max-deygin-servicenow and others added 9 commits March 29, 2026 11:14
…→ GEN_AI_

- Import GEN_AI_REQUEST_FREQUENCY/PRESENCE_PENALTY from upstream gen_ai_attributes
- Use SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS/IS_STREAMING/RESPONSE_FINISH_REASON/
  RESPONSE_STOP_REASON (renamed from LLM_*)
- Remove duplicate LLM_REQUEST_TYPE dict entry (operation_name var already handles it)
- Update test_messages.py and test_bedrock_with_raw_response.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pan_attrs.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- llm.usage.total_tokens → gen_ai.usage.total_tokens
- gen_ai.usage.cache_creation_input_tokens → gen_ai.usage.cache_creation.input_tokens
- gen_ai.usage.cache_read_input_tokens → gen_ai.usage.cache_read.input_tokens

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
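The renames in this commit amount to a key mapping like the one below. The dict and helper are hypothetical illustrations, not code from the instrumentation package.

```python
# Illustrative mapping for the attribute renames listed above; the dict
# and helper are hypothetical, not part of the instrumentation package.
LEGACY_TO_SEMCONV = {
    "llm.usage.total_tokens": "gen_ai.usage.total_tokens",
    "gen_ai.usage.cache_creation_input_tokens":
        "gen_ai.usage.cache_creation.input_tokens",
    "gen_ai.usage.cache_read_input_tokens":
        "gen_ai.usage.cache_read.input_tokens",
}


def migrate_attributes(attrs):
    """Rename legacy span-attribute keys to their semconv equivalents."""
    return {LEGACY_TO_SEMCONV.get(key, key): value for key, value in attrs.items()}


print(migrate_attributes({"llm.usage.total_tokens": 42}))  # {'gen_ai.usage.total_tokens': 42}
```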
max-deygin-traceloop and others added 5 commits March 29, 2026 11:14
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- system_instructions: emit JSON array-of-parts instead of plain string
- images: emit OTel BlobPart/UriPart instead of OpenAI image_url format
- finish_reason: omit key when absent (both streaming and non-streaming)
  instead of serializing null
- cache token attrs: use non-deprecated dotted constants
  (gen_ai.usage.cache_creation.input_tokens)
- remove redundant _DEPRECATED cache token aliases from semconv-ai
- bump semconv-ai dependency lower bound to >=0.5.1
- add comprehensive semconv compliance unit tests
- update VCR test assertions to match new compliant formats
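Two of the behaviors above, system instructions as a JSON array of parts and key omission for absent finish reasons, can be sketched like this. The helper names are hypothetical and the part schema is my reading of the OTel GenAI spec, not copied from the package.

```python
import json


def serialize_system_instructions(prompt):
    # Emit a JSON array of parts rather than a plain string
    return json.dumps([{"type": "text", "content": prompt}])


def build_output_message(text, finish_reason=None):
    message = {"role": "assistant", "parts": [{"type": "text", "content": text}]}
    if finish_reason is not None:
        # Omit the key entirely when absent instead of serializing null
        message["finish_reason"] = finish_reason
    return message


print(serialize_system_instructions("Answer briefly."))
print("finish_reason" in build_output_message("hi"))  # False
```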
@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1926-anthropic-instrumentation branch from 543070a to 53cbe2e Compare March 29, 2026 08:14
@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1926-anthropic-instrumentation branch from 911f2f1 to c5c0fe9 Compare March 29, 2026 10:53

5 participants