feat(bedrock): Instrumentation adjustment for Otel GenAI semconv support #3845

Merged
max-deygin-traceloop merged 19 commits into main from max/tlp-1982-bedrock-instrumentation
Mar 30, 2026
Conversation

@max-deygin-traceloop
Contributor

@max-deygin-traceloop commented Mar 24, 2026

Summary

  • Migrate Bedrock instrumentation to OTel GenAI semconv 0.5.0
  • Messages use parts format: {"role": "...", "parts": [{"type": "text", "content": "..."}]}
  • Output messages include finish_reason with OTel-mapped values
  • Set gen_ai.response.finish_reasons span attribute on all responses
  • Finish reason mapping: end_turn → stop, tool_use → tool_call,
    max_tokens → length, COMPLETE → stop, FINISH → stop,
    guardrail_intervened → content_filter
  • Rename all LLM_* / SpanAttributes.* refs to upstream GenAIAttributes.*
  • All providers covered: Anthropic, Cohere, AI21, Meta, Amazon Titan, Amazon Nova,
    imported models
  • Both invoke_model and converse / converse_stream APIs
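
The mapping above can be sketched as a small lookup table. This is an illustrative sketch based on the PR description only; the actual helper in span_utils.py (referred to as _map_finish_reason in the walkthrough) may be named and structured differently:

```python
# Illustrative finish-reason normalization; raw Bedrock stop reasons are
# mapped to OTel GenAI semconv values, unknown values pass through unchanged.
BEDROCK_FINISH_REASON_MAP = {
    "end_turn": "stop",
    "tool_use": "tool_call",
    "max_tokens": "length",
    "COMPLETE": "stop",
    "FINISH": "stop",
    "guardrail_intervened": "content_filter",
}

def map_finish_reason(reason):
    """Map a raw Bedrock stop reason to its OTel GenAI semconv value."""
    if reason is None:
        return None
    return BEDROCK_FINISH_REASON_MAP.get(reason, reason)
```

The mapped values land both in each output message's finish_reason and in the gen_ai.response.finish_reasons span attribute.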

Test plan

  • 172 existing unit tests updated and passing (VCR cassettes)
  • Run sample app with live AWS credentials to validate real spans
  • Run sdk-integration-tests against

Summary by CodeRabbit

  • New Features

    • Richer content-to-parts mapping for multimodal outputs and tool calls
    • Spans now include GenAI provider and operation names at start
    • Sample app example switched to the higher-level converse API
  • Bug Fixes

    • Unified and corrected finish-reason handling (streaming & non-streaming)
    • Fixed converse streaming tool-block accumulation and accurate stop-reason emission
    • Emitted events no longer use a sentinel "unknown" finish reason
  • Refactor

    • Adopted incubating GenAI semantic conventions and renamed metrics/guardrail identifiers
  • Tests

    • Added comprehensive semconv, finish-reason, and content-parts test suites; updated existing tests accordingly

@coderabbitai

coderabbitai Bot commented Mar 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Reworks Bedrock instrumentation to use incubating GenAI semantic conventions: derive operation names, set provider/operation at span start, normalize finish reasons, restructure prompt/response into message/parts, adjust streaming/converse handling and event emission, rename metrics/constants, and expand semconv and finish-reason tests.

Changes

Cohort / File(s) Summary
Instrumentation Core
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
Derive operation name from request body, resolve vendor/model, compute span name via _span_name, and start spans with GEN_AI_PROVIDER_NAME and GEN_AI_OPERATION_NAME. Added _derive_operation_name and _span_name.
Span & Semconv Migration
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
Replace legacy LLM attrs with incubating GenAI semconv (GEN_AI_PROVIDER_NAME, GEN_AI_OPERATION_NAME, GEN_AI_*_MESSAGES), add content→parts mappers (_anthropic_content_to_parts, _converse_content_to_parts), centralize finish-reason mapping, and change streaming converse attribute API (set_converse_streaming_response_span_attributes(..., finish_reason=None, tool_blocks=None)).
Event Models & Emission
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py, .../event_models.py
Make ChoiceEvent.finish_reason optional (Optional[str]), introduce _map_finish_reason, accept optional finish reasons for streaming converse emitter, omit finish_reason when None in emitted body, and change emitted log attribute key to GEN_AI_PROVIDER_NAME.
Guardrails & Metrics
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py, .../prompt_caching.py
Rename guardrail/prompt-caching constants to GEN_AI_* variants, emit GenAIAttributes.GEN_AI_PROVIDER_NAME (use GenAiSystemValues.AWS_BEDROCK.value) and remove legacy gen_ai.vendor/gen_ai.system usages.
Streaming Handling
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py
Safer delta parsing (nullable delta["text"]), support input_json_delta concatenation, capture message_delta.stop_reason, merge streamed usage, and replace prints with logger.debug.
Tests — Semconv & Finish Reasons
packages/opentelemetry-instrumentation-bedrock/tests/...
Large test updates to assert GenAI semconv (provider/operation, GEN_AI_INPUT_MESSAGES/GEN_AI_OUTPUT_MESSAGES, GEN_AI_RESPONSE_FINISH_REASONS), add unit/compliance tests (test_content_parts.py, test_event_emitter.py, test_finish_reasons.py, test_semconv_compliance.py), and add shared test helpers in tests/traces/__init__.py.
Tests — Vendor/Provider Traces
packages/opentelemetry-instrumentation-bedrock/tests/traces/*
Updated many vendor trace tests (AI21, Anthropic, Cohere, Meta, Nova, Titan, ImportedModel) to expect model-qualified span names, GenAI semconv attributes, JSON message/parts assertions, and normalized finish reasons.
Tests — Metrics
packages/opentelemetry-instrumentation-bedrock/tests/metrics/*
Metric tests updated to reference GEN_AI_BEDROCK_* and GEN_AI_PROMPT_CACHING constants and to assert provider-name attribute where applicable.
Sample App
packages/sample-app/sample_app/bedrock_example_app.py
Example switched to brt.converse API, adjusted request/response handling, explicit boto3.Session(profile_name="stg"), and removed unused imports.
Minor / Misc
packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
Added wildcard import from opentelemetry.semconv_ai._testing for test helpers.
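
To illustrate the content→parts restructuring described above, here is a hedged sketch of what a mapper like _anthropic_content_to_parts might do. The input field names follow the Anthropic content-block format ("text", "tool_use" blocks); the real implementation in span_utils.py may differ:

```python
# Hypothetical content -> parts mapper; string content becomes a single text
# part, while content-block lists are mapped block-by-block.
def content_to_parts(content):
    """Normalize Anthropic-style message content into GenAI semconv parts."""
    if isinstance(content, str):
        return [{"type": "text", "content": content}]
    parts = []
    for block in content:
        if block.get("type") == "text":
            parts.append({"type": "text", "content": block.get("text", "")})
        elif block.get("type") == "tool_use":
            parts.append({
                "type": "tool_call",
                "id": block.get("id"),
                "name": block.get("name"),
                "arguments": block.get("input"),
            })
    return parts
```

A message is then emitted as {"role": ..., "parts": content_to_parts(...)}, matching the parts format shown in the PR summary.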

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Instrumentation as InstrumentationCore
    participant Tracer
    participant Model as BedrockRuntime
    participant Stream as StreamingWrapper
    participant Emitter as EventEmitter
    participant Span as Span

    Client->>Instrumentation: call invoke_model / converse (kwargs)
    Instrumentation->>Tracer: start_span(name derived via _derive_operation_name, attrs: GEN_AI_PROVIDER_NAME, GEN_AI_OPERATION_NAME)
    Tracer-->>Span: returns active span
    Instrumentation->>Model: perform call (invoke/converse)
    alt streaming response
        Model->>Stream: stream events/deltas
        Stream->>Instrumentation: aggregated events / deltas (including tool_blocks)
        Stream->>Emitter: emit streaming choice/converse events (finish_reason via _map_finish_reason)
        Stream->>Span: set attributes (GEN_AI_OUTPUT_MESSAGES, GEN_AI_RESPONSE_FINISH_REASONS)
    else non-streaming response
        Model-->>Instrumentation: full response
        Instrumentation->>Span: set attributes (input/output messages, finish reasons)
        Instrumentation->>Emitter: emit non-streaming choice/converse events (mapped finish_reason)
    end
    Instrumentation->>Span: end span
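The streaming branch in the diagram accumulates deltas before span attributes are set. A minimal sketch of that accumulation, assuming Anthropic-style streaming events (content_block_delta carrying text or input_json_delta fragments, message_delta carrying stop_reason); the actual streaming_wrapper.py logic may differ:

```python
import json

def accumulate(events):
    """Fold streaming events into text, tool-call arguments, and a stop reason."""
    text, tool_json, stop_reason = "", "", None
    for event in events:
        if event.get("type") == "content_block_delta":
            delta = event.get("delta") or {}
            if delta.get("type") == "input_json_delta":
                # Tool arguments arrive as partial JSON fragments to concatenate.
                tool_json += delta.get("partial_json", "")
            elif delta.get("text"):  # delta["text"] may be absent or None
                text += delta["text"]
        elif event.get("type") == "message_delta":
            stop_reason = (event.get("delta") or {}).get("stop_reason", stop_reason)
    tool_args = json.loads(tool_json) if tool_json else None
    return text, tool_args, stop_reason
```

The captured stop_reason is what then flows through the finish-reason mapping into gen_ai.response.finish_reasons.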

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

Python SDK

Suggested reviewers

  • OzBenSimhonTraceloop

Poem

🐰 I hopped from LLM names to GEN_AI bright,

I mapped finish reasons, set provider right.
Messages split to parts, streaming kept neat,
Spans and events now march in tidy beat.
A carrot for tests — hop on, semconv delight! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 31.15%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title accurately describes the main change: adjusting Bedrock instrumentation to support OpenTelemetry GenAI semantic conventions.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@max-deygin-traceloop force-pushed the max/tlp-1982-bedrock-instrumentation branch from f453b9a to 9dfa73a (March 24, 2026, 14:56)

@coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (1)

1048-1068: ⚠️ Potential issue | 🟡 Minor

Cover the tool_use -> tool_call mapping here.

This test hits the new tool-use stop-reason path, but it only checks that non-text deltas do not crash. Please also assert the mapped span attribute so this PR's new behavior is actually exercised.

Suggested assertion
     assert (
         bedrock_span.attributes.get(GenAIAttributes.GEN_AI_OPERATION_NAME)
         == GenAiOperationNameValues.CHAT.value
     )
+    assert (
+        bedrock_span.attributes.get(GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS)
+        == ("tool_call",)
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py`
around lines 1048 - 1068, Add an assertion that verifies the tool-use
stop-reason is mapped to the tool_call operation on the span: when the test in
test_anthropic.py sees tool_use_found True, assert that
bedrock_span.attributes.get(GenAIAttributes.GEN_AI_OPERATION_NAME) equals
GenAiOperationNameValues.TOOL_CALL.value (use the existing bedrock_span,
GenAIAttributes and GenAiOperationNameValues symbols so the new check sits
beside the existing model/vendor/operation assertions).
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py (1)

1003-1024: ⚠️ Potential issue | 🟡 Minor

Assert the new finish-reason span attribute in this event-mode stream test.

This path goes through the same event-enabled streaming branch updated in bedrock/__init__.py. Without checking GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS, the missing span update stays invisible.

Suggested assertion
     assert (
         bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS]
         == inputTokens + outputTokens
     )
+    assert (
+        bedrock_span.attributes[GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS]
+        == ("content_filter",)
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py`
around lines 1003 - 1024, Add an assertion to this event-mode streaming test to
verify the span was updated with the finish-reason attribute: after the existing
usage assertions in tests/traces/test_nova.py, assert that
bedrock_span.attributes[GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS] equals
the expected finish reason used in the test (e.g., the local finishReason
variable or the literal like "stop"). This ensures the branch updated in
bedrock.__init__ is exercised and the finish-reason attribute on bedrock_span is
validated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Around line 404-415: The current branch that calls
emit_streaming_converse_response_event(...) when should_emit_events() is true
omits updating the converse span with the finish reason, so
instrument_with_content / instrument_with_no_content never set
gen_ai.response.finish_reasons; update the code so the span is always annotated
with the finish_reason: extract the finish-reason write into a small helper
(e.g., set_converse_finish_reason(span, finish_reason)) and call it in both the
should_emit_events() path and the else path, or explicitly call a minimal setter
to set span attribute "gen_ai.response.finish_reasons" (using span and
stop_reason) before or after emit_streaming_converse_response_event(...) so
events remain enabled but spans still receive the finish reason via
set_converse_streaming_response_span_attributes or the new helper.

---

Outside diff comments:
In
`@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py`:
- Around line 1048-1068: Add an assertion that verifies the tool-use stop-reason
is mapped to the tool_call operation on the span: when the test in
test_anthropic.py sees tool_use_found True, assert that
bedrock_span.attributes.get(GenAIAttributes.GEN_AI_OPERATION_NAME) equals
GenAiOperationNameValues.TOOL_CALL.value (use the existing bedrock_span,
GenAIAttributes and GenAiOperationNameValues symbols so the new check sits
beside the existing model/vendor/operation assertions).

In `@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py`:
- Around line 1003-1024: Add an assertion to this event-mode streaming test to
verify the span was updated with the finish-reason attribute: after the existing
usage assertions in tests/traces/test_nova.py, assert that
bedrock_span.attributes[GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS] equals
the expected finish reason used in the test (e.g., the local finishReason
variable or the literal like "stop"). This ensures the branch updated in
bedrock.__init__ is exercised and the finish-reason attribute on bedrock_span is
validated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9af6ddd1-72df-437a-9337-728430cd1668

📥 Commits

Reviewing files that changed from the base of the PR and between f453b9a and 9dfa73a.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-bedrock/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
  • packages/opentelemetry-instrumentation-bedrock/pyproject.toml
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
✅ Files skipped from review due to trivial changes (4)
  • packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
  • packages/opentelemetry-instrumentation-bedrock/pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (5)
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py

@doronkopit5 changed the title from "feat(bedrock) Instrumentation adjustment for Otel GenAI semconv support" to "feat(bedrock): Instrumentation adjustment for Otel GenAI semconv support" on Mar 25, 2026
@max-deygin-traceloop force-pushed the max/tlp-1982-bedrock-instrumentation branch from 9dfa73a to 861d53f (March 29, 2026, 07:13)

@coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (1)

233-294: ⚠️ Potential issue | 🔴 Critical

Duplicated code and incorrect indentation cause span lifecycle bugs.

The _handle_completion method has structural issues:

  1. Duplicated logic: Lines 244-253 duplicate the exact same code from lines 235-242 (metric_attributes, response_id setting, and duration recording).

  2. Incorrect nesting: Lines 256-294 (token usage calculation, streaming response handling, span.end()) are inside the if self._duration_histogram: block. This means if metrics are disabled (self._duration_histogram is None), the span never ends and token usage is never recorded.

Compare with AnthropicAsyncStream._complete_instrumentation (lines 399-454) which has the correct structure.

🐛 Proposed fix to correct the structure
     def _handle_completion(self):
         """Handle completion logic"""
         metric_attributes = shared_metrics_attributes(self._complete_response)
         set_span_attribute(self._span, GenAIAttributes.GEN_AI_RESPONSE_ID, self._complete_response.get("id"))
+
         if self._duration_histogram:
             duration = time.time() - self._start_time
             self._duration_histogram.record(
                 duration,
                 attributes=metric_attributes,
             )
 
-            # This mirrors the logic from build_from_streaming_response
-            metric_attributes = shared_metrics_attributes(self._complete_response)
-            set_span_attribute(self._span, GenAIAttributes.GEN_AI_RESPONSE_ID, self._complete_response.get("id"))
-
-            if self._duration_histogram:
-                duration = time.time() - self._start_time
-                self._duration_histogram.record(
-                    duration,
-                    attributes=metric_attributes,
-                )
-
-            # Calculate token usage
-            if Config.enrich_token_usage:
+        # Calculate token usage
+        if Config.enrich_token_usage:
+            try:
+                if usage := self._complete_response.get("usage"):
+                    prompt_tokens = usage.get("input_tokens", 0) or 0
+                else:
+                    prompt_tokens = count_prompt_tokens_from_request(self._instance, self._kwargs)
+
+                if usage := self._complete_response.get("usage"):
+                    completion_tokens = usage.get("output_tokens", 0) or 0
+                else:
+                    completion_content = ""
+                    if self._complete_response.get("events"):
+                        model_name = self._complete_response.get("model") or None
+                        for event in self._complete_response.get("events"):
+                            if event.get("text"):
+                                completion_content += event.get("text")
+
+                        if model_name and hasattr(self._instance, "count_tokens"):
+                            completion_tokens = self._instance.count_tokens(completion_content)
+
+                _set_token_usage(
+                    self._span,
+                    self._complete_response,
+                    prompt_tokens,
+                    completion_tokens,
+                    metric_attributes,
+                    self._token_histogram,
+                    self._choice_counter,
+                )
+            except Exception as e:
+                logger.warning("Failed to set token usage, error: %s", e)
+
+        _handle_streaming_response(self._span, self._event_logger, self._complete_response)
+
+        if self._span.is_recording():
+            self._span.set_status(Status(StatusCode.OK))
+            self._span.end()
+
+        self._instrumentation_completed = True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py`
around lines 233 - 294, The _handle_completion method contains duplicated metric
recording and incorrectly nests the token-usage calculation and span completion
inside the if self._duration_histogram block; remove the duplicated
metric_attributes/response_id/duration-recording lines (the second copy around
metric_attributes/duration recording) and dedent the token usage block,
_handle_streaming_response call, span status/set_status and self._span.end() so
they run whether or not self._duration_histogram is set; ensure you still record
duration only when self._duration_histogram is truthy, set
GenAIAttributes.GEN_AI_RESPONSE_ID on self._span once using
self._complete_response.get("id"), and preserve references to
Config.enrich_token_usage, _set_token_usage, _handle_streaming_response,
_token_histogram, _choice_counter and toggle self._instrumentation_completed =
True after ending the span.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 258-275: Compute and set the GEN_AI_RESPONSE_FINISH_REASONS
attribute unconditionally (outside the should_send_prompts() gating) so finish
reasons are recorded even in instrument_with_no_content mode: update
_set_generations_span_attributes to always collect finish_reasons and call
_set_span_attribute(span, GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS,
tuple(finish_reasons)) regardless of prompt/response content gating, and remove
or relocate any surrounding checks in callers like
set_model_choice_span_attributes() so only actual prompt/response content
remains gated; apply the same unconditional finish-reasons write to the other
analogous helpers mentioned (ranges around 333-355, 462-480, 513-538, 633-660,
711-725, 888-904, 907-920).
- Around line 104-121: The span attribute GenAIAttributes.GEN_AI_INPUT_MESSAGES
currently only serializes request_body["messages"], dropping any top-level
Anthropic system prompt; update the logic around request_body handling (the loop
building input_messages and the call to _set_span_attribute with span) to also
include a top-level request_body.get("system") as a message entry (e.g., role
"system" with parts derived via _text_part or _anthropic_content_to_parts
depending on type) before appending the existing messages, reusing _text_part
and _anthropic_content_to_parts to build parts and then JSON-serializing the
full list when calling _set_span_attribute.
- Around line 24-45: BEDROCK_FINISH_REASON_MAP currently misses AI21 and an
uppercase Bedrock guardrail token so raw provider values like "endoftext" and
"CONTENT_FILTERED" leak through; add explicit mappings (e.g. "endoftext": "stop"
and "CONTENT_FILTERED": "content_filter") to BEDROCK_FINISH_REASON_MAP and
update _map_finish_reason to normalize the incoming reason (coerce to str, then
check direct, reason.lower(), and reason.upper() lookups) before falling back to
the raw value so known provider variants are transformed to the expected GenAI
semconv values (refer to BEDROCK_FINISH_REASON_MAP and _map_finish_reason and
ensure output matches GEN_AI_RESPONSE_FINISH_REASONS).
- Around line 661-676: The code iterates each content block and creates separate
assistant messages, which breaks message boundaries and drops non-text parts;
instead, read the single Bedrock message at
response_body.get("output").get("message"), build one assistant message whose
"parts" is a list created by mapping each entry in message["content"] to a part
(use _text_part for items with "text" and preserve or transform non-text blocks
consistently—or call any existing helper like _part if present), then set
output_messages = [{ "role": "assistant", "parts": parts, "finish_reason": fr }]
and keep the _set_span_attribute calls (json.dumps(output_messages) and
GenAI_RESPONSE_FINISH_REASONS) as before so message boundaries and non-text
parts are preserved.

---

Outside diff comments:
In
`@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py`:
- Around line 233-294: The _handle_completion method contains duplicated metric
recording and incorrectly nests the token-usage calculation and span completion
inside the if self._duration_histogram block; remove the duplicated
metric_attributes/response_id/duration-recording lines (the second copy around
metric_attributes/duration recording) and dedent the token usage block,
_handle_streaming_response call, span status/set_status and self._span.end() so
they run whether or not self._duration_histogram is set; ensure you still record
duration only when self._duration_histogram is truthy, set
GenAIAttributes.GEN_AI_RESPONSE_ID on self._span once using
self._complete_response.get("id"), and preserve references to
Config.enrich_token_usage, _set_token_usage, _handle_streaming_response,
_token_histogram, _choice_counter and toggle self._instrumentation_completed =
True after ending the span.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c9f3c41-9237-4cd5-ad40-ecfcd297a39d

📥 Commits

Reviewing files that changed from the base of the PR and between 9dfa73a and 861d53f.

⛔ Files ignored due to path filters (33)
  • packages/opentelemetry-instrumentation-agno/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-alephalpha/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-anthropic/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-bedrock/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-chromadb/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-cohere/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-crewai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-google-generativeai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-groq/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-haystack/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-lancedb/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-langchain/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-llamaindex/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-marqo/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-mcp/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-milvus/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-mistralai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-ollama/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-openai-agents/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-openai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-pinecone/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-qdrant/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-replicate/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-sagemaker/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-together/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-transformers/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-vertexai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-voyageai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-watsonx/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-weaviate/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-writer/uv.lock is excluded by !**/*.lock
  • packages/sample-app/uv.lock is excluded by !**/*.lock
  • packages/traceloop-sdk/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (52)
  • packages/opentelemetry-instrumentation-agno/pyproject.toml
  • packages/opentelemetry-instrumentation-alephalpha/pyproject.toml
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
  • packages/opentelemetry-instrumentation-anthropic/pyproject.toml
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
  • packages/opentelemetry-instrumentation-bedrock/pyproject.toml
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
  • packages/opentelemetry-instrumentation-chromadb/pyproject.toml
  • packages/opentelemetry-instrumentation-cohere/pyproject.toml
  • packages/opentelemetry-instrumentation-crewai/pyproject.toml
  • packages/opentelemetry-instrumentation-google-generativeai/pyproject.toml
  • packages/opentelemetry-instrumentation-groq/pyproject.toml
  • packages/opentelemetry-instrumentation-haystack/pyproject.toml
  • packages/opentelemetry-instrumentation-lancedb/pyproject.toml
  • packages/opentelemetry-instrumentation-langchain/pyproject.toml
  • packages/opentelemetry-instrumentation-llamaindex/pyproject.toml
  • packages/opentelemetry-instrumentation-marqo/pyproject.toml
  • packages/opentelemetry-instrumentation-mcp/pyproject.toml
  • packages/opentelemetry-instrumentation-milvus/pyproject.toml
  • packages/opentelemetry-instrumentation-mistralai/pyproject.toml
  • packages/opentelemetry-instrumentation-ollama/pyproject.toml
  • packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
  • packages/opentelemetry-instrumentation-openai/pyproject.toml
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-instrumentation-pinecone/pyproject.toml
  • packages/opentelemetry-instrumentation-qdrant/pyproject.toml
  • packages/opentelemetry-instrumentation-replicate/pyproject.toml
  • packages/opentelemetry-instrumentation-sagemaker/pyproject.toml
  • packages/opentelemetry-instrumentation-together/pyproject.toml
  • packages/opentelemetry-instrumentation-transformers/pyproject.toml
  • packages/opentelemetry-instrumentation-vertexai/pyproject.toml
  • packages/opentelemetry-instrumentation-voyageai/pyproject.toml
  • packages/opentelemetry-instrumentation-watsonx/pyproject.toml
  • packages/opentelemetry-instrumentation-weaviate/pyproject.toml
  • packages/opentelemetry-instrumentation-writer/pyproject.toml
  • packages/sample-app/pyproject.toml
  • packages/traceloop-sdk/pyproject.toml
✅ Files skipped from review due to trivial changes (29)
  • packages/opentelemetry-instrumentation-lancedb/pyproject.toml
  • packages/opentelemetry-instrumentation-milvus/pyproject.toml
  • packages/opentelemetry-instrumentation-groq/pyproject.toml
  • packages/opentelemetry-instrumentation-mistralai/pyproject.toml
  • packages/opentelemetry-instrumentation-cohere/pyproject.toml
  • packages/opentelemetry-instrumentation-anthropic/pyproject.toml
  • packages/opentelemetry-instrumentation-transformers/pyproject.toml
  • packages/opentelemetry-instrumentation-chromadb/pyproject.toml
  • packages/opentelemetry-instrumentation-haystack/pyproject.toml
  • packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
  • packages/opentelemetry-instrumentation-mcp/pyproject.toml
  • packages/opentelemetry-instrumentation-google-generativeai/pyproject.toml
  • packages/opentelemetry-instrumentation-writer/pyproject.toml
  • packages/opentelemetry-instrumentation-vertexai/pyproject.toml
  • packages/opentelemetry-instrumentation-ollama/pyproject.toml
  • packages/opentelemetry-instrumentation-alephalpha/pyproject.toml
  • packages/opentelemetry-instrumentation-crewai/pyproject.toml
  • packages/opentelemetry-instrumentation-sagemaker/pyproject.toml
  • packages/traceloop-sdk/pyproject.toml
  • packages/opentelemetry-instrumentation-together/pyproject.toml
  • packages/opentelemetry-instrumentation-qdrant/pyproject.toml
  • packages/opentelemetry-instrumentation-openai/pyproject.toml
  • packages/opentelemetry-instrumentation-voyageai/pyproject.toml
  • packages/opentelemetry-instrumentation-bedrock/pyproject.toml
  • packages/opentelemetry-instrumentation-replicate/pyproject.toml
  • packages/opentelemetry-instrumentation-watsonx/pyproject.toml
  • packages/sample-app/pyproject.toml
  • packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
  • packages/opentelemetry-instrumentation-marqo/pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (8)
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py

@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1982-bedrock-instrumentation branch from 861d53f to 7cf46fe on March 29, 2026 08:18

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

557-573: Test assertion for guardContent parts may be fragile.

The test at lines 561-563 asserts that guardContent blocks become text parts with json.dumps(block, default=str) as content. This tightly couples the test to the exact serialization format in _converse_content_to_parts. If the serialization changes (e.g., extracting the text from guardContent), this test will fail.

Consider whether this is the intended behavior or if guardContent should extract the actual text content instead of serializing the entire block.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py`
around lines 557 - 573, The test is too tightly coupled to the serialization
used by _converse_content_to_parts: instead of asserting parts content equals
json.dumps(block, default=str), change the assertions in
tests/traces/test_titan.py (the checks around
GenAIAttributes.GEN_AI_INPUT_MESSAGES) to accept either the serialized form or
the extracted text form; e.g., for each guardContent block check
parts[i]["type"] == "text" and then assert that parts[i]["content"] is either
json.dumps(block, default=str) or equals block.get("text") (or contains the
expected text substring), so the test won't break if _converse_content_to_parts
changes to extract text rather than serialize the whole block.
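A runnable sketch of the relaxed assertion the prompt describes. The guardContent block shape (`{"guardContent": {"text": {"text": ...}}}`) is an assumption based on the Converse API, and the helper name is hypothetical:

```python
import json


def assert_guard_content_part(part, block):
    """Accept either serialization of a guardContent block: the whole
    block dumped as JSON, or just its extracted text. This keeps the
    test valid if _converse_content_to_parts later switches to
    extracting text instead of serializing the block."""
    assert part["type"] == "text"
    serialized = json.dumps(block, default=str)
    extracted = block.get("guardContent", {}).get("text", {}).get("text")
    assert part["content"] in (serialized, extracted), (
        f"unexpected part content: {part['content']!r}"
    )


# Both the serialized form and the extracted-text form pass.
block = {"guardContent": {"text": {"text": "sensitive input"}}}
assert_guard_content_part(
    {"type": "text", "content": json.dumps(block, default=str)}, block
)
assert_guard_content_part({"type": "text", "content": "sensitive input"}, block)
```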
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 103-121: Modify the Anthropic messages handling to include any
top-level system prompt before serializing messages: when processing
request_body in the branch that handles "messages" (the block that builds
input_messages using _text_part and _anthropic_content_to_parts and calls
_set_span_attribute with GenAIAttributes.GEN_AI_INPUT_MESSAGES), first check for
a "system" key in request_body and, if present, prepend a message dict for the
system (role "system" with parts produced by _text_part or
_anthropic_content_to_parts as appropriate) to the input_messages list so the
system prompt is preserved in the span attribute; use the same pattern as
_set_amazon_input_span_attributes to detect and convert the system value before
json.dumps and calling _set_span_attribute.

---

Nitpick comments:
In `@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py`:
- Around line 557-573: The test is too tightly coupled to the serialization used
by _converse_content_to_parts: instead of asserting parts content equals
json.dumps(block, default=str), change the assertions in
tests/traces/test_titan.py (the checks around
GenAIAttributes.GEN_AI_INPUT_MESSAGES) to accept either the serialized form or
the extracted text form; e.g., for each guardContent block check
parts[i]["type"] == "text" and then assert that parts[i]["content"] is either
json.dumps(block, default=str) or equals block.get("text") (or contains the
expected text substring), so the test won't break if _converse_content_to_parts
changes to extract text rather than serialize the whole block.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 28844d79-da4b-43a6-9ec2-ac9b535be326

📥 Commits

Reviewing files that changed from the base of the PR and between 861d53f and 7cf46fe.

📒 Files selected for processing (15)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
✅ Files skipped from review due to trivial changes (1)
  • packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py

@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1982-bedrock-instrumentation branch from 0268c6d to 8b036b2 on March 29, 2026 09:59

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

♻️ Duplicate comments (3)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)

404-415: ⚠️ Potential issue | 🟠 Major

converse_stream spans still miss finish reasons in event mode.

In the should_emit_events() branch we only emit the choice log. set_converse_streaming_response_span_attributes(..., finish_reason=stop_reason) still runs only in the else path, so the same streamed response produces different span attributes depending on whether event logging is enabled.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`
around lines 404 - 415, The finishReason (stop_reason) isn't applied to
converse_stream spans when events are enabled because
set_converse_streaming_response_span_attributes(response_msg, role, span,
finish_reason=stop_reason) is only called in the else branch; update the logic
so that after determining stop_reason you always call
set_converse_streaming_response_span_attributes with finish_reason=stop_reason
(either before or after emit_streaming_converse_response_event), and keep
emit_streaming_converse_response_event(event_logger, response_msg, role,
stop_reason) in the should_emit_events() branch so both event emission and span
attribute setting occur regardless of event mode.
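A minimal sketch of the control flow the comment asks for, with hypothetical names mirroring the review (the real functions take a span and event logger rather than plain dicts). Span attributes are set unconditionally; event emission is additive:

```python
def finish_converse_stream(span_attrs, events, response_msg, role,
                           stop_reason, emit_events):
    """Always record the finish reason and output on the 'span'
    (modeled here as a dict); gate only the event emission on mode."""
    span_attrs["gen_ai.response.finish_reasons"] = [stop_reason]
    span_attrs["output"] = {"role": role, "content": response_msg}
    if emit_events:
        # Event emission supplements, never replaces, span attributes.
        events.append({"message": response_msg, "role": role,
                       "finish_reason": stop_reason})
```

With this shape, a streamed converse response carries the same span attributes whether or not event logging is enabled.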
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (2)

22-45: ⚠️ Potential issue | 🟠 Major

Normalize the remaining Bedrock finish-reason variants.

This map still misses known values like AI21 endoftext and Bedrock/Titan CONTENT_FILTERED, and _map_finish_reason() is still case-sensitive. Those raw provider tokens will leak into gen_ai.response.finish_reasons unchanged.

💡 Proposed fix
 BEDROCK_FINISH_REASON_MAP = {
     # Anthropic via Bedrock
     "end_turn": "stop",
     "stop_sequence": "stop",
     "tool_use": "tool_call",
     "max_tokens": "length",
+    # AI21 via Bedrock
+    "endoftext": "stop",
     # Cohere via Bedrock
     "COMPLETE": "stop",
     "TOOL_CALL": "tool_call",
     "MAX_TOKENS": "length",
     # Amazon Titan
     "FINISH": "stop",
+    "CONTENT_FILTERED": "content_filter",
     # Converse API
     "guardrail_intervened": "content_filter",
 }
 
 def _map_finish_reason(reason):
     """Map provider-specific finish reason to OTel GenAI enum value."""
     if not reason:
         return None
-    return BEDROCK_FINISH_REASON_MAP.get(reason, reason)
+    reason = str(reason)
+    return (
+        BEDROCK_FINISH_REASON_MAP.get(reason)
+        or BEDROCK_FINISH_REASON_MAP.get(reason.lower())
+        or BEDROCK_FINISH_REASON_MAP.get(reason.upper())
+        or reason
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 22 - 45, BEDROCK_FINISH_REASON_MAP and _map_finish_reason are
leaking raw provider tokens because some known variants (e.g., "endoftext",
"CONTENT_FILTERED") are missing and the lookup is case-sensitive; update
BEDROCK_FINISH_REASON_MAP to include the missing variants (at least "endoftext"
-> "stop" and "CONTENT_FILTERED" -> "content_filter") and modify
_map_finish_reason to normalize the incoming reason (e.g., return None for
falsy, then lookup using reason_normalized = reason.strip().lower()) so keys
match a lowercased map and unknown values fall back to the original normalized
token or None as appropriate.
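A self-contained sketch of the case-insensitive lookup the prompt suggests, keyed on a lowercased map. The map entries come from the PR summary and this comment; the function name is illustrative:

```python
# Keys are lowercased so lookup is case-insensitive.
BEDROCK_FINISH_REASON_MAP = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "tool_use": "tool_call",
    "max_tokens": "length",
    "endoftext": "stop",            # AI21 via Bedrock
    "complete": "stop",             # Cohere via Bedrock
    "tool_call": "tool_call",
    "finish": "stop",               # Amazon Titan
    "content_filtered": "content_filter",
    "guardrail_intervened": "content_filter",
}


def map_finish_reason(reason):
    """Normalize then look up; unknown values pass through unchanged,
    absent values stay None (never a sentinel string)."""
    if not reason:
        return None
    normalized = str(reason).strip().lower()
    return BEDROCK_FINISH_REASON_MAP.get(normalized, reason)
```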

728-739: ⚠️ Potential issue | 🟠 Major

Keep Bedrock output.message as a single assistant message.

response_body["output"]["message"] is already one message. Iterating content here creates one assistant message per block and drops non-text blocks because only msg.get("text") is read.

💡 Proposed fix
     elif "output" in response_body:
         fr = _map_finish_reason(response_body.get("stopReason"))
-        msgs = response_body.get("output").get("message", {}).get("content", [])
-        output_messages = []
-        for msg in msgs:
-            output_messages.append(_output_message("assistant", [_text_part(msg.get("text"))], fr))
+        message = response_body.get("output", {}).get("message", {})
+        output_messages = [
+            _output_message(
+                message.get("role", "assistant"),
+                _converse_content_to_parts(message.get("content", [])),
+                fr,
+            )
+        ]
         _set_span_attribute(
             span,
             GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
             json.dumps(output_messages),
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 728 - 739, The code treats response_body["output"]["message"] as
multiple messages by iterating its "content" and only using msg.get("text"),
which splits a single assistant message and drops non-text blocks; change the
logic to create one assistant output message from
response_body.get("output").get("message") (not per content block), build its
parts by mapping each block in message.get("content", []) through _text_part but
preserve non-text blocks (e.g., include their raw dict or serialize them) so
_output_message("assistant", parts, fr) is called once, then set the
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES span attribute with that single message;
reference functions/vars: _map_finish_reason, _text_part, _output_message, span,
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, response_body.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Around line 456-469: The renamed metric constants removed the old
LLM_BEDROCK_* identifiers causing AttributeError for downstream imports; restore
backward-compatible aliases by adding the old names as assignments pointing to
the new constants (e.g., define LLM_BEDROCK_GUARDRAIL_ACTIVATION =
GuardrailMeters.GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION,
LLM_BEDROCK_GUARDRAIL_LATENCY =
GuardrailMeters.GEN_AI_BEDROCK_GUARDRAIL_LATENCY, etc., and
LLM_BEDROCK_PROMPT_CACHING = PromptCaching.GEN_AI_PROMPT_CACHING) so both
GuardrailMeters and PromptCaching keep the new names while the legacy
LLM_BEDROCK_* symbols remain available.
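A sketch of the backward-compatible alias pattern the prompt describes, shown for one constant (the remaining renames follow the same shape):

```python
class GuardrailMeters:
    GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION = "gen_ai.bedrock.guardrail.activation"
    # Legacy alias: the old name points at the same value, so downstream
    # `from ... import GuardrailMeters` users keep working after the rename.
    LLM_BEDROCK_GUARDRAIL_ACTIVATION = GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION


class PromptCaching:
    GEN_AI_PROMPT_CACHING = "gen_ai.prompt.caching"
    LLM_BEDROCK_PROMPT_CACHING = GEN_AI_PROMPT_CACHING
```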

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 212-218: The code converts missing AI21 finishReason values into
the string "None" (via str(fr_data)) which then gets mapped/emitted as a
sentinel; change the logic in the AI21 handling to avoid serializing absent
values: for the block handling model_vendor == "ai21" (variables: response_body,
comp, fr_data, finish_reasons, _map_finish_reason) set raw =
comp.get("finishReason", {}).get("reason") if isinstance(fr_data, dict) else
(str(fr_data) if fr_data is not None else None) so that None stays None and is
skipped; apply the same defensive change to the other occurrence referenced in
the comment (lines ~547-555) so missing finishReason values are omitted rather
than serialized.
- Around line 990-1000: The streaming path in
set_converse_streaming_response_span_attributes currently flattens response into
a single text part via "".join(response), losing tool calls and non-text parts;
instead buffer and convert the streaming response into the same structured
message schema used by the non-streaming path: iterate the response items, map
each item to the appropriate part (using _text_part for text and whatever helper
logic the non-streaming path uses for tool_use/other parts), build the parts
list, then call _output_message(role, parts, fr) and json.dumps that result
before passing to _set_span_attribute; keep the existing use of
_map_finish_reason, _set_span_attribute, _output_message, _text_part and honor
should_send_prompts().
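A sketch of accumulating a stream into the structured message schema instead of flattening to one text part. The chunk shapes (plain strings for text, dicts with `"type": "tool_use"` for tool calls) are assumptions standing in for the real streamed items:

```python
def build_streaming_output_message(chunks, role, finish_reason):
    """Fold streamed items into one message with typed parts, matching
    the non-streaming output schema (text parts plus tool_call parts)."""
    parts, text_buf = [], []
    for item in chunks:
        if isinstance(item, str):
            text_buf.append(item)
        elif isinstance(item, dict) and item.get("type") == "tool_use":
            if text_buf:  # flush buffered text before the tool call
                parts.append({"type": "text", "content": "".join(text_buf)})
                text_buf = []
            parts.append({"type": "tool_call", "id": item.get("id"),
                          "name": item.get("name"),
                          "arguments": item.get("input")})
    if text_buf:
        parts.append({"type": "text", "content": "".join(text_buf)})
    return {"role": role, "parts": parts, "finish_reason": finish_reason}
```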

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py`:
- Line 75: The debug log currently emits the full accumulated streamed payload
via logger.debug("Accumulating body: %s", self._accumulating_body), which can
leak customer data and produce huge logs; change this to avoid logging raw
content—log a safe summary instead (e.g., length with
len(self._accumulating_body), a truncated preview with masking, or a
hash/checksum) and keep the logger call to reference logger.debug and
self._accumulating_body but only include the safe summary so no raw model/tool
payload is written to logs.
- Around line 52-60: The branch handling content_block_delta must correctly
handle streamed tool JSON fragments and thinking deltas: change the logic in the
content_block_delta branch (where _accumulating_body is updated) to (1) handle
delta.get("type") == "input_json_delta" by treating partial_json as either a
JSON fragment or string — if current.get("input") is a dict and partial_json
parses to a dict, merge keys; if partial_json is a string, append to a buffer
string (e.g., keep current["input_buffer"] or current["input"] as str) and only
attempt json.loads/merge inside a safe try/except so parsing errors don't raise
from _process_event; (2) add handling for delta.get("type") == "thinking_delta"
(or delta keys containing reasoning text) and append that text into a dedicated
field (e.g., current.setdefault("reasoning", "") and += delta_text) so reasoning
streams are preserved. Use symbol names from the diff: _accumulating_body,
content_block_delta branch, input_json_delta, thinking_delta, and ensure all
JSON parsing/merging is guarded to avoid raising exceptions inside the
`@dont_throw-wrapped` _process_event.
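A guarded sketch of the delta handling the prompt describes: partial JSON is buffered as a string and parsed only once it becomes valid, and thinking text is appended to its own field. Field names (`partial_json`, `thinking`, `input_buffer`, `reasoning`) follow the review comment and are assumptions:

```python
import json


def apply_content_block_delta(current, delta):
    """Accumulate one streamed delta into `current` without ever
    raising from a parse failure (safe inside a @dont_throw wrapper)."""
    dtype = delta.get("type")
    if dtype == "input_json_delta":
        # Fragments arrive as strings; never json.loads a lone fragment.
        current["input_buffer"] = (
            current.get("input_buffer", "") + delta.get("partial_json", "")
        )
        try:
            current["input"] = json.loads(current["input_buffer"])
        except json.JSONDecodeError:
            pass  # incomplete JSON so far; keep buffering
    elif dtype == "thinking_delta":
        current["reasoning"] = (
            current.get("reasoning", "") + delta.get("thinking", "")
        )
    return current
```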

In `@packages/opentelemetry-instrumentation-bedrock/tests/test_event_emitter.py`:
- Around line 117-132: The test test_missing_finish_reason_not_unknown currently
checks logger output with assert log_record.body.get("finish_reason") !=
"unknown", which still passes if the emitter writes None/""; change this to
assert "finish_reason" not in log_record.body to explicitly require omission,
and apply the same change to the other tests in this file that use != "unknown"
for finish_reason; locate usages around emit_choice_events, logger.emit and
log_record.body to update the assertions accordingly.

---

Duplicate comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Around line 404-415: The finishReason (stop_reason) isn't applied to
converse_stream spans when events are enabled because
set_converse_streaming_response_span_attributes(response_msg, role, span,
finish_reason=stop_reason) is only called in the else branch; update the logic
so that after determining stop_reason you always call
set_converse_streaming_response_span_attributes with finish_reason=stop_reason
(either before or after emit_streaming_converse_response_event), and keep
emit_streaming_converse_response_event(event_logger, response_msg, role,
stop_reason) in the should_emit_events() branch so both event emission and span
attribute setting occur regardless of event mode.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 22-45: BEDROCK_FINISH_REASON_MAP and _map_finish_reason are
leaking raw provider tokens because some known variants (e.g., "endoftext",
"CONTENT_FILTERED") are missing and the lookup is case-sensitive; update
BEDROCK_FINISH_REASON_MAP to include the missing variants (at least "endoftext"
-> "stop" and "CONTENT_FILTERED" -> "content_filter") and modify
_map_finish_reason to normalize the incoming reason (e.g., return None for
falsy, then lookup using reason_normalized = reason.strip().lower()) so keys
match a lowercased map and unknown values fall back to the original normalized
token or None as appropriate.
- Around line 728-739: The code treats response_body["output"]["message"] as
multiple messages by iterating its "content" and only using msg.get("text"),
which splits a single assistant message and drops non-text blocks; change the
logic to create one assistant output message from
response_body.get("output").get("message") (not per content block), build its
parts by mapping each block in message.get("content", []) through _text_part but
preserve non-text blocks (e.g., include their raw dict or serialize them) so
_output_message("assistant", parts, fr) is called once, then set the
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES span attribute with that single message;
reference functions/vars: _map_finish_reason, _text_part, _output_message, span,
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, response_body.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7cbfa3b1-7c19-4343-ab25-d0c40edc8e5f

📥 Commits

Reviewing files that changed from the base of the PR and between 7cf46fe and 8b036b2.

📒 Files selected for processing (21)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_models.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_prompt_caching_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_content_parts.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_event_emitter.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
✅ Files skipped from review due to trivial changes (2)
  • packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_models.py
🚧 Files skipped from review as they are similar to previous changes (8)
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py

Comment on lines 456 to 469
 class GuardrailMeters:
-    LLM_BEDROCK_GUARDRAIL_ACTIVATION = "gen_ai.bedrock.guardrail.activation"
-    LLM_BEDROCK_GUARDRAIL_LATENCY = "gen_ai.bedrock.guardrail.latency"
-    LLM_BEDROCK_GUARDRAIL_COVERAGE = "gen_ai.bedrock.guardrail.coverage"
-    LLM_BEDROCK_GUARDRAIL_SENSITIVE = "gen_ai.bedrock.guardrail.sensitive_info"
-    LLM_BEDROCK_GUARDRAIL_TOPICS = "gen_ai.bedrock.guardrail.topics"
-    LLM_BEDROCK_GUARDRAIL_CONTENT = "gen_ai.bedrock.guardrail.content"
-    LLM_BEDROCK_GUARDRAIL_WORDS = "gen_ai.bedrock.guardrail.words"
+    GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION = "gen_ai.bedrock.guardrail.activation"
+    GEN_AI_BEDROCK_GUARDRAIL_LATENCY = "gen_ai.bedrock.guardrail.latency"
+    GEN_AI_BEDROCK_GUARDRAIL_COVERAGE = "gen_ai.bedrock.guardrail.coverage"
+    GEN_AI_BEDROCK_GUARDRAIL_SENSITIVE = "gen_ai.bedrock.guardrail.sensitive_info"
+    GEN_AI_BEDROCK_GUARDRAIL_TOPICS = "gen_ai.bedrock.guardrail.topics"
+    GEN_AI_BEDROCK_GUARDRAIL_CONTENT = "gen_ai.bedrock.guardrail.content"
+    GEN_AI_BEDROCK_GUARDRAIL_WORDS = "gen_ai.bedrock.guardrail.words"


 class PromptCaching:
     # will be moved under the AI SemConv. Not namespaced since also OpenAI supports this.
-    LLM_BEDROCK_PROMPT_CACHING = "gen_ai.prompt.caching"
+    GEN_AI_PROMPT_CACHING = "gen_ai.prompt.caching"



⚠️ Potential issue | 🟠 Major

Keep backward-compatible aliases for the renamed metric constants.

GuardrailMeters and PromptCaching live in the package root, and the old LLM_BEDROCK_* names disappear completely here. That turns an otherwise additive semconv migration into an AttributeError for downstream code that imports those constants.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`
around lines 456 - 469, The renamed metric constants removed the old
LLM_BEDROCK_* identifiers causing AttributeError for downstream imports; restore
backward-compatible aliases by adding the old names as assignments pointing to
the new constants (e.g., define LLM_BEDROCK_GUARDRAIL_ACTIVATION =
GuardrailMeters.GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION,
LLM_BEDROCK_GUARDRAIL_LATENCY =
GuardrailMeters.GEN_AI_BEDROCK_GUARDRAIL_LATENCY, etc., and
LLM_BEDROCK_PROMPT_CACHING = PromptCaching.GEN_AI_PROMPT_CACHING) so both
GuardrailMeters and PromptCaching keep the new names while the legacy
LLM_BEDROCK_* symbols remain available.


    def _accumulate_events(self, event):
        print(self._accumulating_body)
        logger.debug("Accumulating body: %s", self._accumulating_body)


⚠️ Potential issue | 🟠 Major

Don’t log raw streamed payloads from the instrumentation.

Line 75 writes the full accumulated response/tool payload into application logs on every chunk. In this wrapper that includes model output and tool inputs, so enabling debug leaks customer content and can create very large log lines for long or multimodal responses.

💡 Safer logging
-        logger.debug("Accumulating body: %s", self._accumulating_body)
+        logger.debug(
+            "Accumulating Bedrock stream chunk: event_keys=%s accumulated_keys=%s",
+            list(event.keys()),
+            list(self._accumulating_body.keys()),
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py`
at line 75, The debug log currently emits the full accumulated streamed payload
via logger.debug("Accumulating body: %s", self._accumulating_body), which can
leak customer data and produce huge logs; change this to avoid logging raw
content—log a safe summary instead (e.g., length with
len(self._accumulating_body), a truncated preview with masking, or a
hash/checksum) and keep the logger call to reference logger.debug and
self._accumulating_body but only include the safe summary so no raw model/tool
payload is written to logs.

Comment on lines +117 to +132
def test_missing_finish_reason_not_unknown(self, mock_prompts, mock_events):
"""#16: Missing finish_reason should NOT be 'unknown'."""
response_body = {
"completion": "Hello!",
# no stop_reason key
}
response = {"body": MagicMock()}
response["body"].read.return_value = json.dumps(response_body).encode()

logger = MagicMock()
emit_choice_events(logger, response)

logger.emit.assert_called_once()
log_record = logger.emit.call_args[0][0]
# Should NOT be "unknown"
assert log_record.body.get("finish_reason") != "unknown"

⚠️ Potential issue | 🟡 Minor

Assert omission explicitly in the None finish-reason cases.

These tests are supposed to lock in “omit finish_reason when absent”, but != "unknown" still passes if the emitter writes None, "", or any other unexpected value. Checking for key absence would catch the regression this file describes.

💡 Proposed fix
-        assert log_record.body.get("finish_reason") != "unknown"
+        assert "finish_reason" not in log_record.body

Also applies to: 186-194, 212-219

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-bedrock/tests/test_event_emitter.py`
around lines 117 - 132, The test test_missing_finish_reason_not_unknown
currently checks logger output with assert log_record.body.get("finish_reason")
!= "unknown", which still passes if the emitter writes None/""; change this to
assert "finish_reason" not in log_record.body to explicitly require omission,
and apply the same change to the other tests in this file that use != "unknown"
for finish_reason; locate usages around emit_choice_events, logger.emit and
log_record.body to update the assertions accordingly.

@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1982-bedrock-instrumentation branch from 8b036b2 to ee6e754 on March 29, 2026 at 15:23

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py (1)

68-79: ⚠️ Potential issue | 🟠 Major

Handle AI21 finishReason values that are already strings.

Lines 77-79 still call .get("reason") unconditionally. The new regression coverage in packages/opentelemetry-instrumentation-bedrock/tests/test_review_fixes.py includes string-valued finishReasons, so this branch can raise AttributeError and break choice-event emission when events are enabled.

💡 Suggested change
     if response_body.get("completions") is not None:
         for i, message in enumerate(response_body.get("completions")):
+            finish_reason = message.get("finishReason")
+            if isinstance(finish_reason, dict):
+                finish_reason = finish_reason.get("reason")
             emit_event(
                 ChoiceEvent(
                     index=i,
                     message={
                         "content": message.get("data", {}).get("text"),
                         "role": "assistant",
                     },
-                    finish_reason=_map_finish_reason(
-                        message.get("finishReason", {}).get("reason")
-                    ),
+                    finish_reason=_map_finish_reason(finish_reason),
                 ),
                 event_logger,
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py`
around lines 68 - 79, The code is calling .get("reason") on response_body
completions' finishReason which may already be a string; update the ChoiceEvent
creation in the loop that emits completions so _map_finish_reason is passed the
correct value: inspect message.get("finishReason") and if it's a dict use
finishReason.get("reason") otherwise use the finishReason value (or None) before
calling _map_finish_reason; adjust the enumeration loop around ChoiceEvent and
the argument to _map_finish_reason accordingly so both dict and string
finishReason cases are handled safely.
♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py (2)

75-75: ⚠️ Potential issue | 🟠 Major

Don’t debug-log raw streamed payloads.

Line 75 writes the full accumulated Bedrock payload on every chunk. That leaks prompts, completions, and tool inputs into logs and can produce very large log lines for long streams.

💡 Safer logging
-        logger.debug("Accumulating body: %s", self._accumulating_body)
+        logger.debug(
+            "Accumulating Bedrock stream chunk: event_keys=%s accumulated_keys=%s",
+            list(event.keys()),
+            list(self._accumulating_body.keys()),
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py`
at line 75, The debug statement is logging raw streamed payloads via
logger.debug("Accumulating body: %s", self._accumulating_body) which leaks
sensitive content; update the logging in the streaming wrapper (where
self._accumulating_body is used—e.g., StreamingWrapper or the method handling
chunks) to avoid printing the full payload and instead log non-sensitive
metadata such as the buffer length or a masked/hashed summary (for example, use
logger.debug("Accumulating body length=%d", len(self._accumulating_body)) or log
a fixed placeholder), removing any direct inclusion of self._accumulating_body
in logs.

52-60: ⚠️ Potential issue | 🟠 Major

Handle streamed tool JSON/reasoning deltas without assuming string state.

Lines 58-60 still assume the active content block already holds string fields. For tool-use blocks that start with input: {}, Line 60 becomes dict += str; because _process_event() is wrapped in @dont_throw, the failure is silent and the accumulated payload is incomplete.

💡 Suggested hardening
         elif type == "content_block_delta":
             delta = decoded_chunk.get("delta", {})
-            if delta.get("text") is not None:
-                self._accumulating_body["content"][-1]["text"] += delta["text"]
+            current = self._accumulating_body["content"][-1]
+            if delta.get("text") is not None:
+                if not isinstance(current.get("text"), str):
+                    current["text"] = ""
+                current["text"] += delta["text"]
             elif delta.get("type") == "input_json_delta":
                 partial_json = delta.get("partial_json", "")
-                current = self._accumulating_body["content"][-1]
-                current.setdefault("input", "")
+                if not isinstance(current.get("input"), str):
+                    current["input"] = ""
                 current["input"] += partial_json
+            elif delta.get("type") == "thinking_delta":
+                if not isinstance(current.get("thinking"), str):
+                    current["thinking"] = ""
+                current["thinking"] += delta.get("thinking", "")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py`
around lines 52 - 60, In the "content_block_delta" branch ensure the target
field is a string before concatenating: when handling delta.get("type") ==
"input_json_delta" fetch partial_json and the current block via current =
self._accumulating_body["content"][-1], then guard current["input"] so it is a
string (e.g., if the key is missing or not isinstance(current["input"], str) set
it to "" or coerce with str()) before doing current["input"] += partial_json;
update the logic in the same block where partial_json is appended so
dict-plus-str errors are avoided and accumulation works for tool-use blocks.
🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (3)

358-369: Remove dead code: finish_reasons list is unused.

The finish_reasons list is populated but never used. The actual span attribute is set by _set_finish_reasons_unconditionally() which is called before this function.

♻️ Proposed fix
 def _set_generations_span_attributes(span, response_body):
     output_messages = []
-    finish_reasons = []
     for generation in response_body.get("generations"):
         fr = _map_finish_reason(generation.get("finish_reason"))
-        finish_reasons.append(fr)
         output_messages.append(_output_message("assistant", [_text_part(generation.get("text"))], fr))
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps(output_messages),
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 358 - 369, In _set_generations_span_attributes, remove the unused
finish_reasons list and the finish_reasons.append(fr) call (leave the local fr
variable and its usage in _output_message intact) because finish reasons are set
unconditionally elsewhere by _set_finish_reasons_unconditionally(); update
_set_generations_span_attributes to not allocate or append to finish_reasons
(only build output_messages and set the span attribute with
json.dumps(output_messages)).

548-562: Remove dead code: finish_reasons list is unused.

Same pattern as _set_generations_span_attributes — the finish_reasons list is built but never used since the attribute is set by _set_finish_reasons_unconditionally().

♻️ Proposed fix
 def _set_span_completions_attributes(span, response_body):
     output_messages = []
-    finish_reasons = []
     for completion in response_body.get("completions"):
         fr_data = completion.get("finishReason", {})
         raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data) or None
         fr = _map_finish_reason(raw_reason)
-        if fr:
-            finish_reasons.append(fr)
         output_messages.append(_output_message("assistant", [_text_part(completion.get("data").get("text"))], fr))
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps(output_messages),
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 548 - 562, In _set_span_completions_attributes remove the dead
finish_reasons aggregation: delete the finish_reasons = [] declaration and the
finish_reasons.append(fr) call inside the loop (we still need to call
_map_finish_reason(raw_reason) to produce fr and pass fr into _output_message),
since finish_reasons is never used and finish reasons are set elsewhere by
_set_finish_reasons_unconditionally(); keep the rest of the loop building
output_messages and the subsequent _set_span_attribute call for
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES.

704-716: Remove dead code: finish_reasons list is unused in the "results" branch.

Same pattern as other response handlers — finish_reasons is populated but never used.

♻️ Proposed fix
 def _set_amazon_response_span_attributes(span, response_body):
     if "results" in response_body:
         output_messages = []
-        finish_reasons = []
         for result in response_body.get("results"):
             fr = _map_finish_reason(result.get("completionReason"))
-            finish_reasons.append(fr)
             output_messages.append(_output_message("assistant", [_text_part(result.get("outputText"))], fr))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 704 - 716, In _set_amazon_response_span_attributes remove the
unused finish_reasons list and its population: eliminate the variable
finish_reasons and the line that appends fr (finish_reasons.append(fr)), but
keep mapping the completion reason via _map_finish_reason for constructing
output_messages; ensure output_messages is still built by calling
_output_message("assistant", [_text_part(result.get("outputText"))], fr) and
that only output_messages is JSON-dumped into the
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES span attribute.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/sample-app/sample_app/bedrock_example_app.py`:
- Around line 7-8: The sample app currently hardcodes a local AWS profile via
session = boto3.Session(profile_name="stg") which breaks CI, containers, and
IAM-role environments; replace this by relying on the default AWS
credential/provider chain (i.e., instantiate boto3.Session() with no
profile_name) or read the profile from the AWS_PROFILE environment variable
before passing it (e.g., profile = os.getenv("AWS_PROFILE");
boto3.Session(profile_name=profile) only when profile is set), and keep creating
the client with brt = session.client(service_name='bedrock-runtime') as before.

---

Outside diff comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py`:
- Around line 68-79: The code is calling .get("reason") on response_body
completions' finishReason which may already be a string; update the ChoiceEvent
creation in the loop that emits completions so _map_finish_reason is passed the
correct value: inspect message.get("finishReason") and if it's a dict use
finishReason.get("reason") otherwise use the finishReason value (or None) before
calling _map_finish_reason; adjust the enumeration loop around ChoiceEvent and
the argument to _map_finish_reason accordingly so both dict and string
finishReason cases are handled safely.

---

Duplicate comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py`:
- Line 75: The debug statement is logging raw streamed payloads via
logger.debug("Accumulating body: %s", self._accumulating_body) which leaks
sensitive content; update the logging in the streaming wrapper (where
self._accumulating_body is used—e.g., StreamingWrapper or the method handling
chunks) to avoid printing the full payload and instead log non-sensitive
metadata such as the buffer length or a masked/hashed summary (for example, use
logger.debug("Accumulating body length=%d", len(self._accumulating_body)) or log
a fixed placeholder), removing any direct inclusion of self._accumulating_body
in logs.
- Around line 52-60: In the "content_block_delta" branch ensure the target field
is a string before concatenating: when handling delta.get("type") ==
"input_json_delta" fetch partial_json and the current block via current =
self._accumulating_body["content"][-1], then guard current["input"] so it is a
string (e.g., if the key is missing or not isinstance(current["input"], str) set
it to "" or coerce with str()) before doing current["input"] += partial_json;
update the logic in the same block where partial_json is appended so
dict-plus-str errors are avoided and accumulation works for tool-use blocks.

---

Nitpick comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 358-369: In _set_generations_span_attributes, remove the unused
finish_reasons list and the finish_reasons.append(fr) call (leave the local fr
variable and its usage in _output_message intact) because finish reasons are set
unconditionally elsewhere by _set_finish_reasons_unconditionally(); update
_set_generations_span_attributes to not allocate or append to finish_reasons
(only build output_messages and set the span attribute with
json.dumps(output_messages)).
- Around line 548-562: In _set_span_completions_attributes remove the dead
finish_reasons aggregation: delete the finish_reasons = [] declaration and the
finish_reasons.append(fr) call inside the loop (we still need to call
_map_finish_reason(raw_reason) to produce fr and pass fr into _output_message),
since finish_reasons is never used and finish reasons are set elsewhere by
_set_finish_reasons_unconditionally(); keep the rest of the loop building
output_messages and the subsequent _set_span_attribute call for
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES.
- Around line 704-716: In _set_amazon_response_span_attributes remove the unused
finish_reasons list and its population: eliminate the variable finish_reasons
and the line that appends fr (finish_reasons.append(fr)), but keep mapping the
completion reason via _map_finish_reason for constructing output_messages;
ensure output_messages is still built by calling _output_message("assistant",
[_text_part(result.get("outputText"))], fr) and that only output_messages is
JSON-dumped into the GenAIAttributes.GEN_AI_OUTPUT_MESSAGES span attribute.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8d8d7e5d-9d5b-4ab7-8540-88fe1288cbf2

📥 Commits

Reviewing files that changed from the base of the PR and between 7cf46fe and 0e8bd63.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-bedrock/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (23)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_models.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_prompt_caching_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_content_parts.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_event_emitter.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_review_fixes.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
  • packages/sample-app/sample_app/bedrock_example_app.py
✅ Files skipped from review due to trivial changes (3)
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_content_parts.py
🚧 Files skipped from review as they are similar to previous changes (11)
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_prompt_caching_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_models.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_event_emitter.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py

Comment on lines +7 to +8
session = boto3.Session(profile_name="stg")
brt = session.client(service_name='bedrock-runtime')

⚠️ Potential issue | 🟠 Major

Don’t hardcode a developer-local AWS profile in the sample app.

Lines 7-8 make this example fail anywhere the stg profile is missing, including CI, containers, and IAM-role based environments. Prefer the default credential chain, or read the profile from AWS_PROFILE instead of pinning it in code.

💡 Suggested change
+import os
 import boto3
@@
-session = boto3.Session(profile_name="stg")
-brt = session.client(service_name='bedrock-runtime')
+session = boto3.Session(profile_name=os.getenv("AWS_PROFILE"))
+brt = session.client(service_name="bedrock-runtime")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
session = boto3.Session(profile_name="stg")
brt = session.client(service_name='bedrock-runtime')
import os
import boto3
session = boto3.Session(profile_name=os.getenv("AWS_PROFILE"))
brt = session.client(service_name="bedrock-runtime")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/sample-app/sample_app/bedrock_example_app.py` around lines 7 - 8,
The sample app currently hardcodes a local AWS profile via session =
boto3.Session(profile_name="stg") which breaks CI, containers, and IAM-role
environments; replace this by relying on the default AWS credential/provider
chain (i.e., instantiate boto3.Session() with no profile_name) or read the
profile from the AWS_PROFILE environment variable before passing it (e.g.,
profile = os.getenv("AWS_PROFILE"); boto3.Session(profile_name=profile) only
when profile is set), and keep creating the client with brt =
session.client(service_name='bedrock-runtime') as before.

@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1982-bedrock-instrumentation branch from 0e8bd63 to 1b94be4 on March 30, 2026 at 07:06

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

411-415: ⚠️ Potential issue | 🔴 Critical

Streaming implementation doesn't extract finish_reason for Titan responses.

The test_titan_invoke_stream_with_events_with_content test is missing finish_reason in the choice_event assertion (lines 411-415), but this reflects an underlying implementation issue rather than a test gap.

The problem: emit_streaming_response_event() in event_emitter.py (line 194) only checks for response_body.get("stop_reason"), but Titan streaming responses use completionReason or stopReason (not stop_reason). This causes _map_finish_reason(None) to be called, resulting in None for the finish_reason field.

In contrast, the non-streaming path correctly handles Titan via _set_finish_reasons_unconditionally() (span_utils.py lines 227-240), which is why non-streaming Titan tests include "finish_reason": "stop" (line 168).

Fix the streaming implementation to be model-aware when extracting finish_reason, similar to how the span attributes handler works.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py`
around lines 411 - 415, The streaming path is not model-aware when extracting
finish_reason in emit_streaming_response_event; update
emit_streaming_response_event to extract finish_reason the same way
span_utils._set_finish_reasons_unconditionally does for non-streaming flows
instead of only checking response_body.get("stop_reason"): detect Titan-style
keys ("completionReason", "stopReason") or consult the model identifier and call
the same mapping logic used by
_map_finish_reason/_set_finish_reasons_unconditionally so that
_map_finish_reason is given the correct raw token (e.g., "stop" for Titan)
before emitting the choice event.
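A minimal sketch of the model-aware extraction this comment calls for, assuming the response-body key names named in the review (the mapping table below is an illustrative subset, not the package's actual `_map_finish_reason`):

```python
from typing import Optional

# Illustrative subset of the finish-reason mapping; the real table lives in
# the instrumentation package.
_FINISH_REASON_MAP = {
    "end_turn": "stop",
    "FINISH": "stop",
    "FINISHED": "stop",
    "max_tokens": "length",
    "LENGTH": "length",
    "tool_use": "tool_call",
}


def extract_finish_reason(response_body: dict) -> Optional[str]:
    """Check provider-specific keys instead of only Anthropic's stop_reason."""
    raw = (
        response_body.get("stop_reason")          # Anthropic
        or response_body.get("stopReason")        # Converse / Nova
        or response_body.get("completionReason")  # Titan
    )
    if raw is None:
        return None  # omit rather than emit a sentinel like "unknown"
    return _FINISH_REASON_MAP.get(raw, raw)
```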
♻️ Duplicate comments (4)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py (2)

75-75: ⚠️ Potential issue | 🟠 Major

Don’t log the raw accumulated streamed payload.

This writes model output and tool inputs into application logs on every chunk. That is both a privacy leak and an easy way to create huge log lines on long streams.

💡 Safer logging
-        logger.debug("Accumulating body: %s", self._accumulating_body)
+        logger.debug(
+            "Accumulating Bedrock stream chunk: accumulated_keys=%s",
+            list(self._accumulating_body.keys()),
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py`
at line 75, The debug log currently prints the raw accumulated stream payload
(self._accumulating_body), which can leak sensitive model/tool data and create
huge log entries; change the logger.debug call in streaming_wrapper.py to avoid
logging raw content—log a safe summary instead (e.g., the length via
len(self._accumulating_body) or a truncated/masked excerpt), or remove the debug
entirely, updating the logger.debug usage that references
self._accumulating_body accordingly.

52-60: ⚠️ Potential issue | 🟠 Major

Handle streamed tool JSON and reasoning deltas without type errors.

Line 60 can still hit dict += str when the current block already has a structured input, and this branch still drops reasoning/thinking deltas entirely. Because _process_event() is wrapped in @dont_throw, those failures degrade the emitted payload silently instead of surfacing.

💡 Safer delta accumulation
         elif type == "content_block_delta":
             delta = decoded_chunk.get("delta", {})
+            current = self._accumulating_body["content"][-1]
             if delta.get("text") is not None:
-                self._accumulating_body["content"][-1]["text"] += delta["text"]
+                if not isinstance(current.get("text"), str):
+                    current["text"] = ""
+                current["text"] += delta["text"]
             elif delta.get("type") == "input_json_delta":
                 partial_json = delta.get("partial_json", "")
-                current = self._accumulating_body["content"][-1]
-                current.setdefault("input", "")
+                if not isinstance(current.get("input"), str):
+                    current["input"] = ""
                 current["input"] += partial_json
+            elif delta.get("type") == "thinking_delta":
+                if not isinstance(current.get("thinking"), str):
+                    current["thinking"] = ""
+                current["thinking"] += delta.get("thinking", "")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py`
around lines 52 - 60, The branch handling type == "content_block_delta" can
perform dict += str and drops reasoning-style deltas; update the
input_json_delta handling in the same block (the accumulation that touches
self._accumulating_body["content"][-1] and current["input"]) to: detect whether
current.get("input") is a dict or string, try to json.loads(partial_json) and if
that succeeds merge (update) into the dict, otherwise append the raw
partial_json to a string field (or to a dedicated fallback key) so you never do
dict += str; also preserve and accumulate non-input deltas (e.g.,
reasoning/thinking) by handling delta.get("type") values other than
"input_json_delta" instead of dropping them; wrap parsing/merging in a narrow
try/except so failures are handled locally and do not perform an invalid
in-place addition (affecting _process_event via dont_throw).
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (2)

724-734: ⚠️ Potential issue | 🟠 Major

Preserve structured Converse output instead of flattening/splitting it.

response_body["output"]["message"] is already a single Bedrock message, but Lines 726-734 split it into one assistant message per block and drop non-text blocks. Lines 1003-1006 go the other direction and flatten streaming output into a single text part. Both paths break gen_ai.output.messages, so spans can report finish_reason="tool_call" while emitting no tool-call payload.

💡 Suggested direction
     elif "output" in response_body:
         fr = _map_finish_reason(response_body.get("stopReason"))
-        msgs = response_body.get("output").get("message", {}).get("content", [])
-        output_messages = []
-        for msg in msgs:
-            output_messages.append(_output_message("assistant", [_text_part(msg.get("text"))], fr))
+        message = response_body.get("output", {}).get("message", {})
+        output_messages = [
+            _output_message(
+                message.get("role", "assistant"),
+                _converse_content_to_parts(message.get("content", [])),
+                fr,
+            )
+        ]
         _set_span_attribute(
             span,
             GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
             json.dumps(output_messages),
         )
@@
-def set_converse_streaming_response_span_attributes(response, role, span, finish_reason=None):
+def set_converse_streaming_response_span_attributes(content_blocks, role, span, finish_reason=None):
     fr = _map_finish_reason(finish_reason)
     if fr:
         _set_span_attribute(span, GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS, (fr,))
     if not should_send_prompts():
         return
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
-        json.dumps([_output_message(role, [_text_part("".join(response))], fr)]),
+        json.dumps([_output_message(role, _converse_content_to_parts(content_blocks), fr)]),
     )

The streaming caller also needs to buffer structured content blocks instead of a text-only list.

Based on learnings: Follow the OpenTelemetry GenAI semantic specification.

Also applies to: 997-1006

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 724 - 734, The code currently flattens and drops non-text blocks
when building gen_ai.output.messages; instead, when handling
response_body["output"]["message"] (in the branch that uses _map_finish_reason
and _output_message) stop splitting into one assistant message per text block
and preserve the original structured message content array (including non-text
blocks) as the content field passed into _output_message; similarly change the
streaming assembly logic (the streaming path around the _text_part usage) to
buffer and concatenate structured content blocks rather than only text parts so
that _set_span_attribute(…, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, …) receives
the full structured message objects and finish_reason/tool_call payloads remain
intact.

22-46: ⚠️ Potential issue | 🟠 Major

Normalize the remaining AWS finish reasons here.

AWS currently documents Converse stopReason values including content_filtered, and Titan Text completionReason values including FINISHED, LENGTH, STOP_CRITERIA_MET, and CONTENT_FILTERED. None of those normalize today, so GEN_AI_RESPONSE_FINISH_REASONS can still leak provider-specific tokens instead of semconv values. (docs.aws.amazon.com)

💡 Suggested normalization update
 BEDROCK_FINISH_REASON_MAP = {
     # Anthropic via Bedrock
     "end_turn": "stop",
     "stop_sequence": "stop",
     "tool_use": "tool_call",
     "max_tokens": "length",
     # Cohere via Bedrock
     "COMPLETE": "stop",
     "TOOL_CALL": "tool_call",
     "MAX_TOKENS": "length",
     "ERROR": "error",
     # Amazon Titan
     "FINISH": "stop",
+    "FINISHED": "stop",
+    "LENGTH": "length",
+    "STOP_CRITERIA_MET": "stop",
+    "CONTENT_FILTERED": "content_filter",
     # Converse API
     "guardrail_intervened": "content_filter",
+    "content_filtered": "content_filter",
 }
 
 
 def _map_finish_reason(reason):
     """Map provider-specific finish reason to OTel GenAI enum value."""
     if not reason:
         return None
-    return BEDROCK_FINISH_REASON_MAP.get(reason, reason)
+    normalized = str(reason)
+    return (
+        BEDROCK_FINISH_REASON_MAP.get(normalized)
+        or BEDROCK_FINISH_REASON_MAP.get(normalized.lower())
+        or BEDROCK_FINISH_REASON_MAP.get(normalized.upper())
+        or normalized
+    )

Based on learnings: Follow the OpenTelemetry GenAI semantic specification.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 22 - 46, Add the missing AWS/Titan/Converse finish-reason
normalizations to BEDROCK_FINISH_REASON_MAP so provider tokens map to OTel GenAI
semconv values (use keys like "content_filtered" and "CONTENT_FILTERED" ->
"content_filter"; "FINISHED" and "STOP_CRITERIA_MET" -> "stop"; "LENGTH" ->
"length"), leaving _map_finish_reason intact (it already uses
BEDROCK_FINISH_REASON_MAP.get). Ensure you include both case variants used by
different providers and follow the OpenTelemetry GenAI semantic spec for the
target values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py`:
- Around line 411-415: The streaming path is not model-aware when extracting
finish_reason in emit_streaming_response_event; update
emit_streaming_response_event to extract finish_reason the same way
span_utils._set_finish_reasons_unconditionally does for non-streaming flows
instead of only checking response_body.get("stop_reason"): detect Titan-style
keys ("completionReason", "stopReason") or consult the model identifier and call
the same mapping logic used by
_map_finish_reason/_set_finish_reasons_unconditionally so that
_map_finish_reason is given the correct raw token (e.g., "stop" for Titan)
before emitting the choice event.

---

Duplicate comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 724-734: The code currently flattens and drops non-text blocks
when building gen_ai.output.messages; instead, when handling
response_body["output"]["message"] (in the branch that uses _map_finish_reason
and _output_message) stop splitting into one assistant message per text block
and preserve the original structured message content array (including non-text
blocks) as the content field passed into _output_message; similarly change the
streaming assembly logic (the streaming path around the _text_part usage) to
buffer and concatenate structured content blocks rather than only text parts so
that _set_span_attribute(…, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, …) receives
the full structured message objects and finish_reason/tool_call payloads remain
intact.
- Around line 22-46: Add the missing AWS/Titan/Converse finish-reason
normalizations to BEDROCK_FINISH_REASON_MAP so provider tokens map to OTel GenAI
semconv values (use keys like "content_filtered" and "CONTENT_FILTERED" ->
"content_filter"; "FINISHED" and "STOP_CRITERIA_MET" -> "stop"; "LENGTH" ->
"length"), leaving _map_finish_reason intact (it already uses
BEDROCK_FINISH_REASON_MAP.get). Ensure you include both case variants used by
different providers and follow the OpenTelemetry GenAI semantic spec for the
target values.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py`:
- Line 75: The debug log currently prints the raw accumulated stream payload
(self._accumulating_body), which can leak sensitive model/tool data and create
huge log entries; change the logger.debug call in streaming_wrapper.py to avoid
logging raw content—log a safe summary instead (e.g., the length via
len(self._accumulating_body) or a truncated/masked excerpt), or remove the debug
entirely, updating the logger.debug usage that references
self._accumulating_body accordingly.
- Around line 52-60: The branch handling type == "content_block_delta" can
perform dict += str and drops reasoning-style deltas; update the
input_json_delta handling in the same block (the accumulation that touches
self._accumulating_body["content"][-1] and current["input"]) to: detect whether
current.get("input") is a dict or string, try to json.loads(partial_json) and if
that succeeds merge (update) into the dict, otherwise append the raw
partial_json to a string field (or to a dedicated fallback key) so you never do
dict += str; also preserve and accumulate non-input deltas (e.g.,
reasoning/thinking) by handling delta.get("type") values other than
"input_json_delta" instead of dropping them; wrap parsing/merging in a narrow
try/except so failures are handled locally and do not perform an invalid
in-place addition (affecting _process_event via dont_throw).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2192636d-7915-4d05-b45a-c6a0eedd5691

📥 Commits

Reviewing files that changed from the base of the PR and between 0e8bd63 and 1b94be4.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-bedrock/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (23)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_models.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_prompt_caching_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_content_parts.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_event_emitter.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_review_fixes.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
  • packages/sample-app/sample_app/bedrock_example_app.py
✅ Files skipped from review due to trivial changes (5)
  • packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_prompt_caching_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_models.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_event_emitter.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_review_fixes.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
  • packages/sample-app/sample_app/bedrock_example_app.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (1)

619-641: ⚠️ Potential issue | 🟠 Major

Use chat for Amazon/Nova message-based invoke_model spans.

This code unconditionally emits gen_ai.operation.name = text_completion even when the request is message-based (contains a messages key with role/content pairs). Role-based chat requests should use chat per the OpenTelemetry GenAI semantic specification, consistent with existing Nova test assertions and other instrumentations in the codebase.

Suggested fix
 def _set_amazon_span_attributes(
     span, request_body, response_body, headers, metric_params
 ):
+    operation_name = (
+        GenAiOperationNameValues.CHAT.value
+        if "messages" in request_body
+        else GenAiOperationNameValues.TEXT_COMPLETION.value
+    )
     _set_span_attribute(
-        span, GenAIAttributes.GEN_AI_OPERATION_NAME, GenAiOperationNameValues.TEXT_COMPLETION.value
+        span, GenAIAttributes.GEN_AI_OPERATION_NAME, operation_name
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 619 - 641, Change the unconditional setting of
GenAIAttributes.GEN_AI_OPERATION_NAME to choose "chat" when the request_body
contains a messages-based request: inspect request_body for a "messages" key
(and/or role/content structure) before calling _set_span_attribute(span,
GenAIAttributes.GEN_AI_OPERATION_NAME, ...); if messages are present set the
value to the chat enum (instead of
GenAiOperationNameValues.TEXT_COMPLETION.value), otherwise keep TEXT_COMPLETION;
keep the existing handling of textGenerationConfig/inferenceConfig for
topP/temperature/max tokens unchanged.
♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (2)

726-736: ⚠️ Potential issue | 🟠 Major

Keep output.message as one assistant message with multiple parts.

response_body["output"]["message"] is already a single Bedrock message. Splitting each content block into a separate GEN_AI_OUTPUT_MESSAGES entry breaks message boundaries, and any non-text block here becomes a bogus text part because only msg.get("text") is read.

💡 Suggested fix
     elif "output" in response_body:
         fr = _map_finish_reason(response_body.get("stopReason"))
-        msgs = response_body.get("output").get("message", {}).get("content", [])
-        output_messages = []
-        for msg in msgs:
-            output_messages.append(_output_message("assistant", [_text_part(msg.get("text"))], fr))
+        message = response_body.get("output", {}).get("message", {})
+        output_messages = [
+            _output_message(
+                message.get("role", "assistant"),
+                _converse_content_to_parts(message.get("content", [])),
+                fr,
+            )
+        ]
         _set_span_attribute(
             span,
             GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
             json.dumps(output_messages),
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 726 - 736, The code incorrectly turns each entry of
response_body["output"]["message"]["content"] into its own
GEN_AI_OUTPUT_MESSAGES entry; instead, keep the entire
response_body["output"]["message"] as a single assistant message with multiple
parts: collect parts by iterating msgs and mapping each content block into a
part (use _text_part(msg.get("text")) for text blocks and preserve/extend with
the existing part-creation logic for non-text blocks), then call
_output_message("assistant", parts, fr) once and pass json.dumps(of that single
message) to _set_span_attribute (refer to response_body, msgs, _text_part,
_output_message, _set_span_attribute, and
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES).

217-223: ⚠️ Potential issue | 🟡 Minor

Don't turn missing AI21 finish reasons into "None".

Both AI21 extraction sites stringify non-dict values before mapping. If Bedrock returns finishReason: null, that becomes "None" and gets emitted as a bogus semconv value instead of being omitted.

💡 Suggested fix
     elif model_vendor == "ai21":
         for comp in response_body.get("completions", []):
             fr_data = comp.get("finishReason", {})
-            raw = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data)
+            raw = fr_data.get("reason") if isinstance(fr_data, dict) else fr_data
             fr = _map_finish_reason(raw)
             if fr:
                 finish_reasons.append(fr)
@@
     for completion in response_body.get("completions"):
         fr_data = completion.get("finishReason", {})
-        raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data) or None
+        raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else fr_data
         fr = _map_finish_reason(raw_reason)
         if fr:
             finish_reasons.append(fr)

Also applies to: 553-559

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 217 - 223, The code in the AI21 branch (the block using variables
fr_data, raw, _map_finish_reason and finish_reasons.append) stringifies non-dict
finishReason values, turning null into "None" and producing a bogus semconv;
change the logic so you treat None/null as absent: compute raw as
fr_data.get("reason") when fr_data is a dict, set raw = None when fr_data is
None, and only call _map_finish_reason and append to finish_reasons when raw is
not None (and mapping returns a value). Apply the same fix to the other AI21
extraction site that mirrors this logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 242-245: For the imported_model branch, also inspect
choices[0]["finish_reason"] when top-level response_body.get("stop_reason") is
absent: call _map_finish_reason on response_body.get("choices",
[{}])[0].get("finish_reason") and append that mapped value to finish_reasons
(same as you do for stop_reason) so the imported_model path captures
OpenAI-style finish reasons; update the code paths around model_vendor ==
"imported_model" and ensure _map_finish_reason is invoked for both stop_reason
and choices[0]["finish_reason"] before appending to finish_reasons.

---

Outside diff comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 619-641: Change the unconditional setting of
GenAIAttributes.GEN_AI_OPERATION_NAME to choose "chat" when the request_body
contains a messages-based request: inspect request_body for a "messages" key
(and/or role/content structure) before calling _set_span_attribute(span,
GenAIAttributes.GEN_AI_OPERATION_NAME, ...); if messages are present set the
value to the chat enum (instead of
GenAiOperationNameValues.TEXT_COMPLETION.value), otherwise keep TEXT_COMPLETION;
keep the existing handling of textGenerationConfig/inferenceConfig for
topP/temperature/max tokens unchanged.

---

Duplicate comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 726-736: The code incorrectly turns each entry of
response_body["output"]["message"]["content"] into its own
GEN_AI_OUTPUT_MESSAGES entry; instead, keep the entire
response_body["output"]["message"] as a single assistant message with multiple
parts: collect parts by iterating msgs and mapping each content block into a
part (use _text_part(msg.get("text")) for text blocks and preserve/extend with
the existing part-creation logic for non-text blocks), then call
_output_message("assistant", parts, fr) once and pass json.dumps(of that single
message) to _set_span_attribute (refer to response_body, msgs, _text_part,
_output_message, _set_span_attribute, and
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES).
- Around line 217-223: The code in the AI21 branch (the block using variables
fr_data, raw, _map_finish_reason and finish_reasons.append) stringifies non-dict
finishReason values, turning null into "None" and producing a bogus semconv;
change the logic so you treat None/null as absent: compute raw as
fr_data.get("reason") when fr_data is a dict, set raw = None when fr_data is
None, and only call _map_finish_reason and append to finish_reasons when raw is
not None (and mapping returns a value). Apply the same fix to the other AI21
extraction site that mirrors this logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2d651691-6317-4835-80c4-5241f537f6e3

📥 Commits

Reviewing files that changed from the base of the PR and between 1b94be4 and 11dbccd.

📒 Files selected for processing (12)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_finish_reasons.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (3)

363-375: Remove unused finish_reasons list in _set_generations_span_attributes.

The finish_reasons list (lines 365, 368) is built but never used or written to the span. Finish reasons are already set by _set_finish_reasons_unconditionally() called earlier in set_model_choice_span_attributes(). This dead code can be removed for clarity.

♻️ Suggested cleanup
 def _set_generations_span_attributes(span, response_body):
     output_messages = []
-    finish_reasons = []
     for generation in response_body.get("generations"):
         fr = _map_finish_reason(generation.get("finish_reason"))
-        finish_reasons.append(fr)
         output_messages.append(_output_message("assistant", [_text_part(generation.get("text"))], fr))
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps(output_messages),
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 363 - 375, The function _set_generations_span_attributes builds an
unused list finish_reasons; remove the declaration finish_reasons = [] and the
finish_reasons.append(fr) call so only the mapped fr value is used to build
output_messages (keep the _map_finish_reason(generation.get("finish_reason"))
call and the output_messages.append(...) line), since finish reasons are handled
by _set_finish_reasons_unconditionally() elsewhere.

729-739: Consider consolidating Amazon output content blocks into a single message.

This path creates separate output messages for each content block in response_body["output"]["message"]["content"]. While this may be intentional for Amazon's response format, the Converse API returns a single assistant message with multiple content blocks. Creating separate messages fragments the logical response structure.

If Amazon Nova responses truly have one logical assistant turn, consolidate like the Converse path does:

💡 Optional refactor to preserve message boundaries
     elif "output" in response_body:
         fr = _map_finish_reason(response_body.get("stopReason"))
-        msgs = response_body.get("output").get("message", {}).get("content", [])
-        output_messages = []
-        for msg in msgs:
-            output_messages.append(_output_message("assistant", [_text_part(msg.get("text"))], fr))
+        message = response_body.get("output", {}).get("message", {})
+        parts = _converse_content_to_parts(message.get("content", []))
+        output_messages = [_output_message(message.get("role", "assistant"), parts, fr)]
         _set_span_attribute(
             span,
             GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
             json.dumps(output_messages),
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 729 - 739, The code currently creates a separate output message for
each content block in response_body["output"]["message"]["content"]; instead,
consolidate into a single assistant message by mapping the finish reason via
_map_finish_reason(response_body.get("stopReason")), converting each content
block with _text_part(msg.get("text")) into a list of parts, and building one
_output_message("assistant", parts_list, fr) (rather than appending per-msg
messages), then json.dumps that single output message and pass it to
_set_span_attribute(span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, ...); use the
existing symbols _text_part, _output_message, _map_finish_reason, and
_set_span_attribute to locate and implement the change.

554-567: Same dead code pattern: unused finish_reasons list.

Similar to _set_generations_span_attributes, the finish_reasons list here (lines 555, 561) is built but never used. Consider removing it for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 554 - 567, The local list finish_reasons is unused—remove its
declaration and the finish_reasons.append(fr) call while keeping the existing
computation of fr (via fr_data/raw_reason and _map_finish_reason) because fr is
passed into _output_message; update the loop in response_body.get("completions")
to no longer build finish_reasons, leaving output_messages construction and the
subsequent _set_span_attribute( span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
json.dumps(output_messages) ) intact.
packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py (1)

1021-1046: The _handle_converse import is stable and correctly located in the module.

The test's import from opentelemetry.instrumentation.bedrock is valid—_handle_converse is defined at line 397 of the __init__.py file. While testing private functions is acceptable for validating internal behavior, consider adding a comment to the test method documenting that it intentionally tests internal implementation details to clarify the testing approach for future maintainers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py`
around lines 1021 - 1046, The test imports and exercises the private function
_handle_converse directly; add a short clarifying comment above the
test_converse_event_mode_sets_finish_reasons test that states this is an
intentional test of internal implementation (testing a private symbol) to
validate behavior (e.g., finish_reasons being set) and to inform future
maintainers that the import from opentelemetry.instrumentation.bedrock is
deliberate; reference the private symbol _handle_converse and the test name in
the comment so the intent is explicit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 553-568: The finishReason extraction in
_set_span_completions_attributes can turn an explicit None into the string
"None"; change the raw_reason computation to explicitly treat None as missing
(e.g., if isinstance(fr_data, dict): raw_reason = fr_data.get("reason") else:
raw_reason = None if fr_data is None else str(fr_data)) and/or normalize the
result by setting raw_reason = None when raw_reason == "None" before calling
_map_finish_reason; update the raw_reason logic inside
_set_span_completions_attributes accordingly so _map_finish_reason never
receives the literal "None".

---

Nitpick comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 363-375: The function _set_generations_span_attributes builds an
unused list finish_reasons; remove the declaration finish_reasons = [] and the
finish_reasons.append(fr) call so only the mapped fr value is used to build
output_messages (keep the _map_finish_reason(generation.get("finish_reason"))
call and the output_messages.append(...) line), since finish reasons are handled
by _set_finish_reasons_unconditionally() elsewhere.
- Around line 729-739: The code currently creates a separate output message for
each content block in response_body["output"]["message"]["content"]; instead,
consolidate into a single assistant message by mapping the finish reason via
_map_finish_reason(response_body.get("stopReason")), converting each content
block with _text_part(msg.get("text")) into a list of parts, and building one
_output_message("assistant", parts_list, fr) (rather than appending per-msg
messages), then json.dumps that single output message and pass it to
_set_span_attribute(span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, ...); use the
existing symbols _text_part, _output_message, _map_finish_reason, and
_set_span_attribute to locate and implement the change.
- Around line 554-567: The local list finish_reasons is unused—remove its
declaration and the finish_reasons.append(fr) call while keeping the existing
computation of fr (via fr_data/raw_reason and _map_finish_reason) because fr is
passed into _output_message; update the loop in response_body.get("completions")
to no longer build finish_reasons, leaving output_messages construction and the
subsequent _set_span_attribute( span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
json.dumps(output_messages) ) intact.

In
`@packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py`:
- Around line 1021-1046: The test imports and exercises the private function
_handle_converse directly; add a short clarifying comment above the
test_converse_event_mode_sets_finish_reasons test that states this is an
intentional test of internal implementation (testing a private symbol) to
validate behavior (e.g., finish_reasons being set) and to inform future
maintainers that the import from opentelemetry.instrumentation.bedrock is
deliberate; reference the private symbol _handle_converse and the test name in
the comment so the intent is explicit.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4eb1fcc1-0341-4869-a347-295f507e590f

📥 Commits

Reviewing files that changed from the base of the PR and between 11dbccd and 1b09788.

📒 Files selected for processing (6)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_content_parts.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py

Comment on lines 553 to 568
def _set_span_completions_attributes(span, response_body):
    for i, completion in enumerate(response_body.get("completions")):
        _set_span_attribute(
            span,
            f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content",
            completion.get("data").get("text"),
        )
    output_messages = []
    finish_reasons = []
    for completion in response_body.get("completions"):
        fr_data = completion.get("finishReason", {})
        raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data) or None
        fr = _map_finish_reason(raw_reason)
        if fr:
            finish_reasons.append(fr)
        output_messages.append(_output_message("assistant", [_text_part(completion.get("data").get("text"))], fr))
    _set_span_attribute(
        span,
        GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
        json.dumps(output_messages),
    )


⚠️ Potential issue | 🟡 Minor

AI21 finish reason extraction can still emit the string literal "None".

Line 558 attempts to handle missing finishReason but has an edge case:

raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data) or None

If finishReason is explicitly None (not absent), str(None) produces "None", and "None" or None evaluates to "None" (truthy string). This would pass through _map_finish_reason and be emitted as-is.

💡 Suggested fix
     for completion in response_body.get("completions"):
         fr_data = completion.get("finishReason", {})
-        raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data) or None
+        raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else (fr_data if fr_data else None)
         fr = _map_finish_reason(raw_reason)
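For illustration, the truthiness pitfall behind this fix can be reproduced in isolation; this stands alone and does not touch the instrumentation code:

```python
# Demonstrates why `str(fr_data) or None` misbehaves when fr_data is None:
# str(None) yields the non-empty string "None", which is truthy, so the
# `or None` fallback never triggers.
fr_data = None

buggy = str(fr_data) or None          # "None" -- a truthy string literal
fixed = fr_data if fr_data else None  # None -- correctly treated as missing

print(repr(buggy))  # 'None'
print(repr(fixed))  # None
```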
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 553 - 568, The finishReason extraction in
_set_span_completions_attributes can turn an explicit None into the string
"None"; change the raw_reason computation to explicitly treat None as missing
(e.g., if isinstance(fr_data, dict): raw_reason = fr_data.get("reason") else:
raw_reason = None if fr_data is None else str(fr_data)) and/or normalize the
result by setting raw_reason = None when raw_reason == "None" before calling
_map_finish_reason; update the raw_reason logic inside
_set_span_completions_attributes accordingly so _map_finish_reason never
receives the literal "None".

@coderabbitai coderabbitai Bot left a comment
Actionable comments posted: 1

♻️ Duplicate comments (3)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (3)

246-249: ⚠️ Potential issue | 🟠 Major

Read imported-model finish reasons from choices[0] too.

This still only inspects top-level stop_reason, so imported-model spans lose GEN_AI_RESPONSE_FINISH_REASONS and output_messages[0]["finish_reason"] when the response only carries choices[0]["finish_reason"]. The updated imported-model event tests already expect "length" from that field, so spans and logs diverge.

🛠️ Suggested fix
     elif model_vendor == "imported_model":
-        fr = _map_finish_reason(response_body.get("stop_reason"))
+        choices = response_body.get("choices") or []
+        raw_reason = response_body.get("stop_reason")
+        if raw_reason is None and choices:
+            raw_reason = choices[0].get("finish_reason")
+        fr = _map_finish_reason(raw_reason)
         if fr:
             finish_reasons.append(fr)
@@
 def _set_imported_model_response_span_attributes(span, response_body):
-    fr = _map_finish_reason(response_body.get("stop_reason"))
+    choices = response_body.get("choices") or []
+    raw_reason = response_body.get("stop_reason")
+    if raw_reason is None and choices:
+        raw_reason = choices[0].get("finish_reason")
+    fr = _map_finish_reason(raw_reason)
     content = response_body.get("generation")
-    if content is None and response_body.get("choices"):
-        content = response_body["choices"][0].get("text")
+    if content is None and choices:
+        content = choices[0].get("text")
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps([_output_message("assistant", [_text_part(content)], fr)]),
     )

Also applies to: 774-782

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 246 - 249, The imported_model branch only looks at top-level
response_body.get("stop_reason"), so when finish reasons exist under
choices[0]["finish_reason"] they are missed; update the branch handling
model_vendor == "imported_model" (and the analogous block around the other
occurrence) to also inspect response_body.get("choices",
[])[0].get("finish_reason") (falling back to top-level if absent), pass that
value into _map_finish_reason, and append the mapped result to finish_reasons
and set output_messages[0]["finish_reason"] as appropriate so
GEN_AI_RESPONSE_FINISH_REASONS and output_messages align with imported-model
events.

729-739: ⚠️ Potential issue | 🟠 Major

Keep output.message as one assistant message.

response_body["output"]["message"] is already a single Bedrock message. Splitting each content block into a separate output_messages entry breaks message boundaries, and msg.get("text") drops tool/media blocks that _converse_content_to_parts() already preserves in the converse path.

🛠️ Suggested fix
     elif "output" in response_body:
         fr = _map_finish_reason(response_body.get("stopReason"))
-        msgs = response_body.get("output").get("message", {}).get("content", [])
-        output_messages = []
-        for msg in msgs:
-            output_messages.append(_output_message("assistant", [_text_part(msg.get("text"))], fr))
+        message = response_body.get("output", {}).get("message", {})
+        output_messages = [
+            _output_message(
+                message.get("role", "assistant"),
+                _converse_content_to_parts(message.get("content", [])),
+                fr,
+            )
+        ]
         _set_span_attribute(
             span,
             GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
             json.dumps(output_messages),
         )
Based on learnings: the OTel GenAI semantic convention formal JSON schema for `gen_ai.output.messages` uses `{"role": "...", "parts": [...]}` format.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 729 - 739, The code is splitting
response_body["output"]["message"]["content"] into multiple messages and losing
non-text parts; instead treat response_body["output"]["message"] as a single
assistant message: call _converse_content_to_parts on
response_body.get("output").get("message") (or equivalent to obtain parts),
build one output message via _output_message("assistant", parts, fr) where fr =
_map_finish_reason(response_body.get("stopReason")), and then call
_set_span_attribute(span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
json.dumps([that_single_message])); remove the loop that iterates msgs and avoid
using msg.get("text") so tool/media blocks are preserved.

221-227: ⚠️ Potential issue | 🟡 Minor

Keep AI21 null finish reasons unset.

If Bedrock returns finishReason: null, both sites coerce it into the literal "None" and emit that as a finish reason. Treat None as missing instead so the attribute is omitted rather than carrying an invalid semconv value.

🛠️ Suggested fix
     elif model_vendor == "ai21":
         for comp in response_body.get("completions", []):
             fr_data = comp.get("finishReason", {})
-            raw = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data)
+            raw = (
+                fr_data.get("reason")
+                if isinstance(fr_data, dict)
+                else (None if fr_data is None else str(fr_data))
+            )
             fr = _map_finish_reason(raw)
             if fr:
                 finish_reasons.append(fr)
@@
 def _set_span_completions_attributes(span, response_body):
     output_messages = []
     finish_reasons = []
     for completion in response_body.get("completions"):
         fr_data = completion.get("finishReason", {})
-        raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data) or None
+        raw_reason = (
+            fr_data.get("reason")
+            if isinstance(fr_data, dict)
+            else (None if fr_data is None else str(fr_data))
+        )
         fr = _map_finish_reason(raw_reason)
         if fr:
             finish_reasons.append(fr)

Also applies to: 553-562

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 221 - 227, When model_vendor == "ai21" (and the similar block at
553-562) skip finishReason values that are null instead of coercing them to the
string "None": inspect comp.get("finishReason") into fr_data and if fr_data is
None (or if it's a dict whose "reason" is None) simply continue/skip adding a
finish reason; only call _map_finish_reason with a real non-null raw value and
append its result when non-empty. This ensures finish_reasons and the
_map_finish_reason call are only invoked for actual values and not for Bedrock's
null finishReason.
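A hypothetical sketch of the mapper this prompt describes, based on the finish-reason table in the PR description (the real `_map_finish_reason` lives in `span_utils.py` and may differ):

```python
# Hypothetical finish-reason mapper: mapping table taken from the PR
# description; None (and the accidental "None" string) are treated as
# missing so no invalid semconv value is emitted.
_FINISH_REASON_MAP = {
    "end_turn": "stop",
    "tool_use": "tool_call",
    "max_tokens": "length",
    "COMPLETE": "stop",
    "FINISH": "stop",
    "guardrail_intervened": "content_filter",
}

def map_finish_reason(raw):
    if raw is None or raw == "None":
        return None  # skip: attribute should simply be omitted
    return _FINISH_REASON_MAP.get(raw, raw)

print(map_finish_reason("end_turn"))  # stop
print(map_finish_reason(None))        # None
```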
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 622-624: The code in _set_amazon_span_attributes unconditionally
sets GenAIAttributes.GEN_AI_OPERATION_NAME to
GenAiOperationNameValues.TEXT_COMPLETION.value which overwrites correct
operation names for Nova messages-v1 (chat) requests; change the logic to avoid
overwriting or to set the correct value based on the request type: in
_set_amazon_span_attributes inspect the incoming request/model/operation (e.g.,
check for "messages" or "messages-v1") and if it indicates a chat/messages call
set GEN_AI_OPERATION_NAME to GenAiOperationNameValues.CHAT.value (or set to
TEXT_COMPLETION only when the request clearly is a text-completion), otherwise
do not overwrite an existing GEN_AI_OPERATION_NAME attribute on span; update the
_set_span_attribute(span, GenAIAttributes.GEN_AI_OPERATION_NAME, ...) call
accordingly to be conditional.

---

Duplicate comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 246-249: The imported_model branch only looks at top-level
response_body.get("stop_reason"), so when finish reasons exist under
choices[0]["finish_reason"] they are missed; update the branch handling
model_vendor == "imported_model" (and the analogous block around the other
occurrence) to also inspect response_body.get("choices",
[])[0].get("finish_reason") (falling back to top-level if absent), pass that
value into _map_finish_reason, and append the mapped result to finish_reasons
and set output_messages[0]["finish_reason"] as appropriate so
GEN_AI_RESPONSE_FINISH_REASONS and output_messages align with imported-model
events.
- Around line 729-739: The code is splitting
response_body["output"]["message"]["content"] into multiple messages and losing
non-text parts; instead treat response_body["output"]["message"] as a single
assistant message: call _converse_content_to_parts on
response_body.get("output").get("message") (or equivalent to obtain parts),
build one output message via _output_message("assistant", parts, fr) where fr =
_map_finish_reason(response_body.get("stopReason")), and then call
_set_span_attribute(span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
json.dumps([that_single_message])); remove the loop that iterates msgs and avoid
using msg.get("text") so tool/media blocks are preserved.
- Around line 221-227: When model_vendor == "ai21" (and the similar block at
553-562) skip finishReason values that are null instead of coercing them to the
string "None": inspect comp.get("finishReason") into fr_data and if fr_data is
None (or if it's a dict whose "reason" is None) simply continue/skip adding a
finish reason; only call _map_finish_reason with a real non-null raw value and
append its result when non-empty. This ensures finish_reasons and the
_map_finish_reason call are only invoked for actual values and not for Bedrock's
null finishReason.

ℹ️ Review info

Run ID: d9ae533b-070d-488c-8328-3b78d50e73d3

📥 Commits

Reviewing files that changed from the base of the PR and between e008b0a and 07136f6.

📒 Files selected for processing (10)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py

@coderabbitai coderabbitai Bot left a comment
Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (1)

2-18: ⚠️ Potential issue | 🟡 Minor

Keep the import block contiguous to satisfy Ruff E402.

Line 5 splits the module-level imports, and Ruff is already flagging this file for E402.

Suggested fix
 import json
 import logging
 import time
-
-logger = logging.getLogger(__name__)
 from opentelemetry.instrumentation.bedrock.config import Config
 from opentelemetry.instrumentation.bedrock.utils import should_send_prompts
 from opentelemetry.semconv._incubating.attributes import (
     gen_ai_attributes as GenAIAttributes,
 )
@@
 from opentelemetry.semconv._incubating.attributes.gen_ai_attributes import (
     GenAiOperationNameValues,
     GenAiSystemValues,
 )
 from opentelemetry.semconv_ai import SpanAttributes
+
+logger = logging.getLogger(__name__)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 2 - 18, Move the module-level imports into a single contiguous
block and place the logger assignment after them: relocate the statement logger
= logging.getLogger(__name__) so it appears below all import lines (including
logging, time, Config, should_send_prompts, GenAIAttributes,
AWS_BEDROCK_GUARDRAIL_ID, GenAiOperationNameValues, GenAiSystemValues, and
SpanAttributes) and remove the blank line that currently splits the imports to
satisfy Ruff E402.
♻️ Duplicate comments (4)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)

529-541: ⚠️ Potential issue | 🟠 Major

Keep the legacy LLM_BEDROCK_* constant names as aliases.

GuardrailMeters and PromptCaching are public exports. Dropping the old attribute names here turns a semconv migration into an AttributeError for downstream imports.

Suggested fix
 class GuardrailMeters:
     GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION = "gen_ai.bedrock.guardrail.activation"
     GEN_AI_BEDROCK_GUARDRAIL_LATENCY = "gen_ai.bedrock.guardrail.latency"
     GEN_AI_BEDROCK_GUARDRAIL_COVERAGE = "gen_ai.bedrock.guardrail.coverage"
     GEN_AI_BEDROCK_GUARDRAIL_SENSITIVE = "gen_ai.bedrock.guardrail.sensitive_info"
     GEN_AI_BEDROCK_GUARDRAIL_TOPICS = "gen_ai.bedrock.guardrail.topics"
     GEN_AI_BEDROCK_GUARDRAIL_CONTENT = "gen_ai.bedrock.guardrail.content"
     GEN_AI_BEDROCK_GUARDRAIL_WORDS = "gen_ai.bedrock.guardrail.words"
+    LLM_BEDROCK_GUARDRAIL_ACTIVATION = GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION
+    LLM_BEDROCK_GUARDRAIL_LATENCY = GEN_AI_BEDROCK_GUARDRAIL_LATENCY
+    LLM_BEDROCK_GUARDRAIL_COVERAGE = GEN_AI_BEDROCK_GUARDRAIL_COVERAGE
+    # repeat for the remaining removed LLM_BEDROCK_* symbols
@@
 class PromptCaching:
     # will be moved under the AI SemConv. Not namespaced since also OpenAI supports this.
     GEN_AI_PROMPT_CACHING = "gen_ai.prompt.caching"
+    LLM_BEDROCK_PROMPT_CACHING = GEN_AI_PROMPT_CACHING
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`
around lines 529 - 541, The new classes GuardrailMeters and PromptCaching
removed legacy constant names causing downstream AttributeError; add alias
attributes with the old LLM_BEDROCK_* names on GuardrailMeters (e.g.,
LLM_BEDROCK_GUARDRAIL_ACTIVATION -> GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION, etc.)
and LLM_BEDROCK_PROMPT_CACHING on PromptCaching that point to the new constants
so existing imports continue to work; modify the class definitions
(GuardrailMeters and PromptCaching) to define those LLM_BEDROCK_* class
attributes as references to the corresponding GEN_AI_* constants.
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (3)

252-255: ⚠️ Potential issue | 🟠 Major

Fallback to choices[0].finish_reason for imported models.

This path already falls back to choices[0].text when generation is absent, but it never reads choices[0].finish_reason. OpenAI-style imported responses will therefore lose both the span-level and message-level finish reason.

Suggested fix
     elif model_vendor == "imported_model":
-        fr = _map_finish_reason(response_body.get("stop_reason"))
+        choices = response_body.get("choices") or []
+        raw_reason = response_body.get("stop_reason")
+        if raw_reason is None and choices:
+            raw_reason = choices[0].get("finish_reason")
+        fr = _map_finish_reason(raw_reason)
         if fr:
             finish_reasons.append(fr)
@@
 def _set_imported_model_response_span_attributes(span, response_body):
-    fr = _map_finish_reason(response_body.get("stop_reason"))
-    content = response_body.get("generation")
-    if content is None and response_body.get("choices"):
-        content = response_body["choices"][0].get("text")
+    choices = response_body.get("choices") or []
+    raw_reason = response_body.get("stop_reason")
+    if raw_reason is None and choices:
+        raw_reason = choices[0].get("finish_reason")
+    fr = _map_finish_reason(raw_reason)
+    content = response_body.get("generation")
+    if content is None and choices:
+        content = choices[0].get("text")
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps([_output_message("assistant", [_text_part(content)], fr)]),
     )

Also applies to: 780-788

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 252 - 255, The imported_model branch currently maps finish reason
only from response_body.get("stop_reason") via _map_finish_reason and appends it
to finish_reasons, but when generation is absent it should also fall back to
OpenAI-style choices[0].finish_reason; update the branch in span_utils.py (the
block handling model_vendor == "imported_model" that calls _map_finish_reason
and finish_reasons.append) to, if fr is falsy, attempt to read
response_body.get("choices", [{}])[0].get("finish_reason") and pass that through
_map_finish_reason before appending; apply the same change to the duplicated
logic at the other occurrence around the 780–788 region so both locations handle
choices[0].finish_reason as a fallback.

227-230: ⚠️ Potential issue | 🟠 Major

Keep missing AI21 finish reasons as None, not "None".

Line 230 and Line 563 still stringify non-dict finishReason values. If AI21 returns null, _map_finish_reason() will emit "None" as a real finish reason.

Suggested fix
-            raw = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data)
+            raw = fr_data.get("reason") if isinstance(fr_data, dict) else (
+                str(fr_data) if fr_data is not None else None
+            )
@@
-        raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data) or None
+        raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else (
+            str(fr_data) if fr_data is not None else None
+        )

Also applies to: 561-564

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 227 - 230, The code currently stringifies non-dict AI21
finishReason values (producing "None"); update the logic in span_utils.py (the
ai21 branch and the other occurrence around _map_finish_reason) to avoid calling
str() on non-dict finishReason values so that None stays None. Concretely, in
the ai21 loop (and the duplicate block near _map_finish_reason) set fr_data =
comp.get("finishReason") and then raw = fr_data.get("reason") if
isinstance(fr_data, dict) else fr_data (no str conversion), ensuring
finishReason values that are None remain None.

735-745: ⚠️ Potential issue | 🟠 Major

Serialize Bedrock output.message as one message with multiple parts.

The output branch currently emits one assistant message per content block and only keeps msg["text"]. That breaks message boundaries and drops tool/image/document blocks from Converse responses.

Suggested fix
     elif "output" in response_body:
         fr = _map_finish_reason(response_body.get("stopReason"))
-        msgs = response_body.get("output").get("message", {}).get("content", [])
-        output_messages = []
-        for msg in msgs:
-            output_messages.append(_output_message("assistant", [_text_part(msg.get("text"))], fr))
+        message = response_body.get("output", {}).get("message", {})
+        output_messages = [
+            _output_message(
+                message.get("role", "assistant"),
+                _converse_content_to_parts(message.get("content", [])),
+                fr,
+            )
+        ]
         _set_span_attribute(
             span,
             GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
             json.dumps(output_messages),
         )

Based on learnings, the formal JSON schema for gen_ai.output.messages uses the {role, parts} structure.
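A self-contained sketch of the intended shape, under the assumption that helper logic resembles the suggested fix (`content_to_parts` here approximates the package's `_converse_content_to_parts`; field names mirror Bedrock Converse responses):

```python
import json

# Sketch: serialize a Bedrock Converse output.message as ONE assistant
# message whose parts preserve both text and toolUse blocks, instead of
# emitting one message per content block.
def content_to_parts(content_blocks):
    parts = []
    for block in content_blocks:
        if "text" in block:
            parts.append({"type": "text", "content": block["text"]})
        elif "toolUse" in block:
            tool = block["toolUse"]
            parts.append({
                "type": "tool_call",
                "id": tool.get("toolUseId"),
                "name": tool.get("name"),
                "arguments": tool.get("input"),
            })
    return parts

response_body = {
    "stopReason": "tool_use",
    "output": {"message": {"role": "assistant", "content": [
        {"text": "Let me check the weather."},
        {"toolUse": {"toolUseId": "t1", "name": "get_weather",
                     "input": {"city": "Paris"}}},
    ]}},
}

message = response_body["output"]["message"]
output_messages = [{
    "role": message.get("role", "assistant"),
    "parts": content_to_parts(message.get("content", [])),
    "finish_reason": "tool_call",  # "tool_use" mapped per the PR's table
}]
print(json.dumps(output_messages, indent=2))
```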

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 735 - 745, The current "output" branch in span_utils.py builds one
assistant message per content block and only uses msg["text"], which breaks
message boundaries and drops non-text parts; update the branch that handles
response_body["output"] to create a single assistant message with parts equal to
the original content blocks (preserving non-text blocks) by mapping each content
item through _text_part or a generic part-preserving mapper, compute fr via
_map_finish_reason(response_body.get("stopReason")), build one output_messages
entry with role "assistant" and parts list (not multiple messages), pass that to
_output_message, and then call _set_span_attribute(span,
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, json.dumps(output_messages)); reuse
existing helpers _text_part, _output_message, _map_finish_reason, and
_set_span_attribute to keep formatting consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Around line 245-252: The span started via tracer.start_span (span =
tracer.start_span(_span_name(operation_name, _model), ...)) is created outside a
context and may never be ended if fn(*args, **kwargs) raises; wrap the call to
fn(*args, **kwargs) in a try/except that calls span.end() (and records the
exception on the span if appropriate) before re-raising so the span is always
closed, then call _handle_stream_call(span, kwargs, response, metric_params,
event_logger) only in the success path; apply the same try/except pattern to the
other streaming block that also starts spans (the analogous code around
_handle_stream_call at the later block).
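The general pattern this prompt asks for can be sketched without the OTel SDK; `StubSpan` below stands in for a real span so the example stays self-contained (the actual code would use the tracer-created span):

```python
# Pattern: a span started with start_span() (i.e., outside a context
# manager) must also be ended on the error path, or it leaks.
class StubSpan:
    def __init__(self):
        self.ended = False
        self.exceptions = []

    def record_exception(self, exc):
        self.exceptions.append(exc)

    def end(self):
        self.ended = True

def call_with_span(span, fn, *args, **kwargs):
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        span.record_exception(exc)
        span.end()  # close the span before re-raising
        raise

span = StubSpan()
try:
    call_with_span(span, lambda: 1 / 0)
except ZeroDivisionError:
    pass
print(span.ended)  # True
```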
- Around line 441-450: The bug is that code appends every "toolUse" fragment
from event["contentBlockStart"] and event["contentBlockDelta"] directly to
tool_blocks producing duplicate/partial tool_call parts; change the logic to
merge fragments instead: when you see a "contentBlockStart" with a "toolUse",
store it as a pending tool fragment (e.g., pending_tool or merge into the last
tool_blocks entry if it is incomplete) rather than unconditionally appending to
tool_blocks; when you later see a "contentBlockDelta" with "toolUse", detect the
corresponding pending fragment (by block id or by checking the last
pending/incomplete entry in tool_blocks) and extend/merge its fields
(concatenate lists/strings or update nested keys) instead of appending a new
entry; apply the same merged-append logic to the other similar blocks referenced
(the blocks around the other event handling ranges) to ensure only
completed/merged toolUse payloads are emitted.
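The merge strategy described above can be sketched as follows; the event shapes are approximations of Bedrock `converse_stream` payloads, not a verified transcription of them:

```python
# Sketch: contentBlockStart opens a pending toolUse block,
# contentBlockDelta extends its input, contentBlockStop finalizes it --
# only completed blocks are appended to tool_blocks.
def accumulate_tool_blocks(events):
    tool_blocks = []
    pending = None
    for event in events:
        if "contentBlockStart" in event:
            start = event["contentBlockStart"].get("start", {})
            if "toolUse" in start:
                pending = dict(start["toolUse"])
                pending["input"] = ""
        elif "contentBlockDelta" in event and pending is not None:
            delta = event["contentBlockDelta"].get("delta", {})
            if "toolUse" in delta:
                pending["input"] += delta["toolUse"].get("input", "")
        elif "contentBlockStop" in event and pending is not None:
            tool_blocks.append(pending)
            pending = None
    return tool_blocks

events = [
    {"contentBlockStart": {"start": {"toolUse": {"toolUseId": "t1", "name": "get_weather"}}}},
    {"contentBlockDelta": {"delta": {"toolUse": {"input": '{"city": '}}}},
    {"contentBlockDelta": {"delta": {"toolUse": {"input": '"Paris"}'}}}},
    {"contentBlockStop": {}},
]
blocks = accumulate_tool_blocks(events)
print(len(blocks), blocks[0]["input"])  # 1 {"city": "Paris"}
```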

In
`@packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py`:
- Around line 1029-1048: The test is patching utils.should_emit_events but
_handle_converse calls the symbol imported into the
opentelemetry.instrumentation.bedrock module; change the patch targets to the
names imported by that module so the branch is actually forced. Update the
decorators in test_converse_event_mode_sets_finish_reasons to patch
"opentelemetry.instrumentation.bedrock.should_emit_events" (and likewise
"opentelemetry.instrumentation.bedrock.should_send_prompts") so
_handle_converse(span, ...) uses the patched booleans when exercising the
event-mode branch.
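The patch-target rule behind this comment ("patch where the name is looked up, not where it is defined") can be demonstrated with stdlib-only stand-ins; the module names `demo_utils` and `demo_bedrock` are illustrative, not the real packages:

```python
# Demonstrates the mock.patch "where to patch" rule with synthetic modules.
import sys
import types
from unittest import mock

# Simulate utils.py defining should_emit_events ...
utils = types.ModuleType("demo_utils")
utils.should_emit_events = lambda: False
sys.modules["demo_utils"] = utils

# ... and a consumer module importing it by name (from utils import ...).
bedrock = types.ModuleType("demo_bedrock")
bedrock.should_emit_events = utils.should_emit_events
sys.modules["demo_bedrock"] = bedrock

def handle():
    # Looks the name up on the consumer module, like _handle_converse would.
    return sys.modules["demo_bedrock"].should_emit_events()

results = []
# Patching the defining module does NOT affect the imported reference:
with mock.patch("demo_utils.should_emit_events", return_value=True):
    results.append(handle())  # still False
# Patching the consuming module's name does:
with mock.patch("demo_bedrock.should_emit_events", return_value=True):
    results.append(handle())  # True
print(results)  # [False, True]
```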

---

Outside diff comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 2-18: Move the module-level imports into a single contiguous block
and place the logger assignment after them: relocate the statement logger =
logging.getLogger(__name__) so it appears below all import lines (including
logging, time, Config, should_send_prompts, GenAIAttributes,
AWS_BEDROCK_GUARDRAIL_ID, GenAiOperationNameValues, GenAiSystemValues, and
SpanAttributes) and remove the blank line that currently splits the imports to
satisfy Ruff E402.

---

Duplicate comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Around line 529-541: The new classes GuardrailMeters and PromptCaching removed
legacy constant names causing downstream AttributeError; add alias attributes
with the old LLM_BEDROCK_* names on GuardrailMeters (e.g.,
LLM_BEDROCK_GUARDRAIL_ACTIVATION -> GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION, etc.)
and LLM_BEDROCK_PROMPT_CACHING on PromptCaching that point to the new constants
so existing imports continue to work; modify the class definitions
(GuardrailMeters and PromptCaching) to define those LLM_BEDROCK_* class
attributes as references to the corresponding GEN_AI_* constants.

In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 252-255: The imported_model branch currently maps finish reason
only from response_body.get("stop_reason") via _map_finish_reason and appends it
to finish_reasons, but when generation is absent it should also fall back to
OpenAI-style choices[0].finish_reason; update the branch in span_utils.py (the
block handling model_vendor == "imported_model" that calls _map_finish_reason
and finish_reasons.append) to, if fr is falsy, attempt to read
response_body.get("choices", [{}])[0].get("finish_reason") and pass that through
_map_finish_reason before appending; apply the same change to the duplicated
logic at the other occurrence around the 780–788 region so both locations handle
choices[0].finish_reason as a fallback.
- Around line 227-230: The code currently stringifies non-dict AI21 finishReason
values (producing "None"); update the logic in span_utils.py (the ai21 branch
and the other occurrence around _map_finish_reason) to avoid calling str() on
non-dict finishReason values so that None stays None. Concretely, in the ai21
loop (and the duplicate block near _map_finish_reason) set fr_data =
comp.get("finishReason") and then raw = fr_data.get("reason") if
isinstance(fr_data, dict) else fr_data (no str conversion), ensuring
finishReason values that are None remain None.
- Around line 735-745: The current "output" branch in span_utils.py builds one
assistant message per content block and only uses msg["text"], which breaks
message boundaries and drops non-text parts; update the branch that handles
response_body["output"] to create a single assistant message with parts equal to
the original content blocks (preserving non-text blocks) by mapping each content
item through _text_part or a generic part-preserving mapper, compute fr via
_map_finish_reason(response_body.get("stopReason")), build one output_messages
entry with role "assistant" and parts list (not multiple messages), pass that to
_output_message, and then call _set_span_attribute(span,
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, json.dumps(output_messages)); reuse
existing helpers _text_part, _output_message, _map_finish_reason, and
_set_span_attribute to keep formatting consistent.
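The three fixes above can be sketched as standalone helpers. This is an illustrative sketch only: `_map_finish_reason` is reduced to a small lookup table, and the helper names mirror, but are not, the real span_utils.py functions.

```python
import json

# Reduced stand-in for span_utils._map_finish_reason (illustrative only).
_FINISH_REASON_MAP = {
    "end_turn": "stop",
    "tool_use": "tool_call",
    "max_tokens": "length",
}

def _map_finish_reason(raw):
    """Map a provider finish reason to its OTel value; None stays None."""
    if raw is None:
        return None
    return _FINISH_REASON_MAP.get(raw, raw)

def imported_model_finish_reason(response_body):
    """Fix 1: fall back to OpenAI-style choices[0].finish_reason."""
    fr = _map_finish_reason(response_body.get("stop_reason"))
    if not fr:
        raw = response_body.get("choices", [{}])[0].get("finish_reason")
        fr = _map_finish_reason(raw)
    return fr

def ai21_raw_finish_reason(completion):
    """Fix 2: never str() a non-dict finishReason, so None stays None."""
    fr_data = completion.get("finishReason")
    return fr_data.get("reason") if isinstance(fr_data, dict) else fr_data

def build_output_messages(response_body):
    """Fix 3: one assistant message whose parts preserve every content block."""
    content = response_body.get("output", {}).get("message", {}).get("content", [])
    parts = [
        {"type": "text", "content": block["text"]} if "text" in block else block
        for block in content
    ]
    message = {"role": "assistant", "parts": parts}
    fr = _map_finish_reason(response_body.get("stopReason"))
    if fr:
        message["finish_reason"] = fr
    return json.dumps([message])
```

For example, a Converse response containing a text block plus a tool-use block yields a single assistant message with two parts, rather than two separate messages with the tool block dropped.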
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e6a6bc4c-4af9-4ca2-a4f7-3cd6e8636361

📥 Commits

Reviewing files that changed from the base of the PR and between e008b0a and c3e57c0.

📒 Files selected for processing (13)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
  • packages/opentelemetry-instrumentation-bedrock/pyproject.toml
  • packages/opentelemetry-instrumentation-bedrock/tests/test_content_parts.py
  • packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py

Comment on lines +1029 to +1048
```python
@patch("opentelemetry.instrumentation.bedrock.utils.should_emit_events", return_value=True)
@patch("opentelemetry.instrumentation.bedrock.utils.should_send_prompts", return_value=True)
def test_converse_event_mode_sets_finish_reasons(self, _mock_prompts, _mock_events):
    """Converse event-mode branch must still set finish_reasons on the span."""
    from opentelemetry.instrumentation.bedrock import _handle_converse

    span = _mock_span()
    response = {
        "output": {"message": {"role": "assistant", "content": [{"text": "Hi"}]}},
        "stopReason": "end_turn",
        "usage": {"inputTokens": 2, "outputTokens": 1},
    }
    kwargs = {
        "modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
        "messages": [{"role": "user", "content": [{"text": "Hello"}]}],
    }
    metric_params = _mock_metric_params()

    _handle_converse(span, kwargs, response, metric_params, MagicMock())
```


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
rg -n 'from opentelemetry\.instrumentation\.bedrock\.utils import .*should_emit_events' packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
rg -n '@patch\("opentelemetry\.instrumentation\.bedrock\.utils\.should_emit_events"' packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py
python - <<'PY'
from unittest.mock import patch

class Utils:
    def should_emit_events(self):
        return False

utils = Utils()
should_emit_events = utils.should_emit_events

with patch.object(utils, "should_emit_events", return_value=True):
    print(should_emit_events())  # stays False: imported alias is unchanged
PY
```

Repository: traceloop/openllmetry

Length of output: 273


🏁 Script executed:

```shell
# Find how should_emit_events is actually imported/used in bedrock modules
fd . packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock --type f -name "*.py" | head -10
```

Repository: traceloop/openllmetry

Length of output: 295


🏁 Script executed:

```shell
# Check bedrock __init__.py for imports
head -100 packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
```

Repository: traceloop/openllmetry

Length of output: 3601


🏁 Script executed:

```shell
# Find _handle_converse implementation
rg -n "_handle_converse" packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/ -A 5
```

Repository: traceloop/openllmetry

Length of output: 3366


🏁 Script executed:

```shell
# Check how should_emit_events is used within _handle_converse or its module
rg -n "should_emit_events" packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/ -B 2 -A 2
```

Repository: traceloop/openllmetry

Length of output: 6284


Patch the symbol imported by bedrock.__init__, not utils.should_emit_events.

_handle_converse() calls should_emit_events() using the reference imported into opentelemetry.instrumentation.bedrock at lines 42-45. Line 1029 patches utils.should_emit_events, so this test does not actually force the event-mode branch and can pass while the real path regresses.

Suggested fix

```diff
-    @patch("opentelemetry.instrumentation.bedrock.utils.should_emit_events", return_value=True)
+    @patch("opentelemetry.instrumentation.bedrock.should_emit_events", return_value=True)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py`
around lines 1029-1048: the test patches utils.should_emit_events, but
_handle_converse calls the symbol imported into the
opentelemetry.instrumentation.bedrock module; change the patch targets to the
names imported by that module so the branch is actually forced. Update the
decorators in test_converse_event_mode_sets_finish_reasons to patch
"opentelemetry.instrumentation.bedrock.should_emit_events" (and likewise
"opentelemetry.instrumentation.bedrock.should_send_prompts") so
_handle_converse(span, ...) uses the patched booleans when exercising the
event-mode branch.
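The underlying rule is that `unittest.mock.patch` replaces a name in the namespace where it is looked up, not where it was defined. A self-contained illustration, using hypothetical modules `mod_a` (standing in for `utils`) and `mod_b` (standing in for the importing `bedrock/__init__.py`):

```python
import sys
import types
from unittest.mock import patch

# mod_a defines the flag; mod_b imports it by value at import time,
# the same shape as `from ...utils import should_emit_events`.
mod_a = types.ModuleType("mod_a")
exec("def should_emit_events():\n    return False", mod_a.__dict__)
sys.modules["mod_a"] = mod_a

mod_b = types.ModuleType("mod_b")
exec(
    "from mod_a import should_emit_events\n"
    "def handler():\n"
    "    return should_emit_events()",
    mod_b.__dict__,
)
sys.modules["mod_b"] = mod_b

# Patching the defining module leaves mod_b's imported alias untouched.
with patch("mod_a.should_emit_events", return_value=True):
    patched_at_source = mod_b.handler()   # still False

# Patching the importing module replaces the name handler() actually uses.
with patch("mod_b.should_emit_events", return_value=True):
    patched_at_use_site = mod_b.handler()  # True
```

This is exactly why the decorator must target `opentelemetry.instrumentation.bedrock.should_emit_events` rather than the `utils` module.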

@max-deygin-traceloop force-pushed the max/tlp-1982-bedrock-instrumentation branch from c572d9c to 05734ee on March 30, 2026 13:30
max-deygin-traceloop and others added 10 commits March 30, 2026 18:38
- Bump opentelemetry-semantic-conventions-ai to >=0.5.0,<0.6.0
- GEN_AI_SYSTEM → GEN_AI_PROVIDER_NAME with GenAiSystemValues.AWS_BEDROCK
- LLM_REQUEST_TYPE → GEN_AI_OPERATION_NAME with GenAiOperationNameValues
- LLM_USAGE_TOTAL_TOKENS → GEN_AI_USAGE_TOTAL_TOKENS
- Indexed GEN_AI_PROMPT/COMPLETION → GEN_AI_INPUT/OUTPUT_MESSAGES (JSON)
- Hardcoded "AWS" → "aws.bedrock" via GenAiSystemValues enum
- Add shared semconv compliance test (test_semconv.py)
- Update all trace/metric tests for new attribute names and values

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on mapping

- Messages use parts array format: {"role": "...", "parts": [{"type": "text", "content": "..."}]}
- Output messages include finish_reason with OTel-mapped values
- Set gen_ai.response.finish_reasons span attribute on all responses
- Finish reason mapping: end_turn→stop, tool_use→tool_call, max_tokens→length,
  COMPLETE→stop, FINISH→stop, guardrail_intervened→content_filter
- Added helpers for converting Anthropic and Converse content blocks to parts
- Updated all existing tests to assert new format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix _map_finish_reason(None) to return None instead of "stop"
- Capture Anthropic system instructions via gen_ai.system_instructions
- Fix tool call arguments double-encoding
- Map image content to BlobPart/UriPart
- Move system instructions to gen_ai.system_instructions (Converse + Nova)
- Fix deprecated gen_ai.system in prompt_caching.py
- Fix converse stream missing stop reason default
- Handle non-text content block deltas in streaming wrapper
- Capture message_delta stop_reason in streaming wrapper
- Conditionally include finish_reason in output messages
- Remove debug print in streaming wrapper
- Fix hardcoded gen_ai.vendor in guardrail/span_utils
- Fix hardcoded exception counter metric name
- Rename LLM_ prefix constants to GEN_AI_
- Update all tests for new behavior
- #14: _converse_content_to_parts handles image blocks → BlobPart
- #15: _anthropic_content_to_parts maps thinking blocks → ReasoningPart
- #16: ChoiceEvent.finish_reason defaults to None (not "unknown")
- #17: Event emitter maps provider finish reasons via _map_finish_reason()
- #18: _converse_content_to_parts handles video/document blocks
- #19: _anthropic_content_to_parts parses string tool_use input via json.loads
- #22: finish_reasons set unconditionally (not gated by should_send_prompts)

New unit tests: test_content_parts.py, test_event_emitter.py
Updated integration tests: anthropic, cohere, meta, nova, titan
@max-deygin-traceloop force-pushed the max/tlp-1982-bedrock-instrumentation branch from aa8e67f to 022b175 on March 30, 2026 15:40
@max-deygin-traceloop max-deygin-traceloop merged commit 194b0a7 into main Mar 30, 2026
12 of 13 checks passed
@max-deygin-traceloop max-deygin-traceloop deleted the max/tlp-1982-bedrock-instrumentation branch April 12, 2026 16:15