feat(bedrock): Instrumentation adjustment for Otel GenAI semconv support #3845
max-deygin-traceloop merged 19 commits into main
Conversation
Note: Reviews paused. This branch appears to be under active development. To avoid overwhelming you with review comments during an influx of new commits, CodeRabbit has automatically paused this review; this behavior can be changed in the review settings.
📝 Walkthrough

Reworks Bedrock instrumentation to use incubating GenAI semantic conventions: derive operation names, set provider/operation at span start, normalize finish reasons, restructure prompt/response into message/parts, adjust streaming/converse handling and event emission, rename metrics/constants, and expand semconv and finish-reason tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Instrumentation as InstrumentationCore
    participant Tracer
    participant Model as BedrockRuntime
    participant Stream as StreamingWrapper
    participant Emitter as EventEmitter
    participant Span as Span
    Client->>Instrumentation: call invoke_model / converse (kwargs)
    Instrumentation->>Tracer: start_span(name derived via _derive_operation_name, attrs: GEN_AI_PROVIDER_NAME, GEN_AI_OPERATION_NAME)
    Tracer-->>Span: returns active span
    Instrumentation->>Model: perform call (invoke/converse)
    alt streaming response
        Model->>Stream: stream events/deltas
        Stream->>Instrumentation: aggregated events / deltas (including tool_blocks)
        Stream->>Emitter: emit streaming choice/converse events (finish_reason via _map_finish_reason)
        Stream->>Span: set attributes (GEN_AI_OUTPUT_MESSAGES, GEN_AI_RESPONSE_FINISH_REASONS)
    else non-streaming response
        Model-->>Instrumentation: full response
        Instrumentation->>Span: set attributes (input/output messages, finish reasons)
        Instrumentation->>Emitter: emit non-streaming choice/converse events (mapped finish_reason)
    end
    Instrumentation->>Span: end span
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
f453b9a to 9dfa73a
Actionable comments posted: 1
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (1)
1048-1068: ⚠️ Potential issue | 🟡 Minor — Cover the `tool_use` -> `tool_call` mapping here.

This test hits the new tool-use stop-reason path, but it only checks that non-text deltas do not crash. Please also assert the mapped span attribute so this PR's new behavior is actually exercised.
Suggested assertion
```diff
 assert (
     bedrock_span.attributes.get(GenAIAttributes.GEN_AI_OPERATION_NAME)
     == GenAiOperationNameValues.CHAT.value
 )
+assert (
+    bedrock_span.attributes.get(GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS)
+    == ("tool_call",)
+)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py` around lines 1048-1068: Add an assertion that verifies the tool-use stop-reason is mapped to the tool_call operation on the span: when the test in test_anthropic.py sees tool_use_found True, assert that bedrock_span.attributes.get(GenAIAttributes.GEN_AI_OPERATION_NAME) equals GenAiOperationNameValues.TOOL_CALL.value (use the existing bedrock_span, GenAIAttributes and GenAiOperationNameValues symbols so the new check sits beside the existing model/vendor/operation assertions).

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py (1)
1003-1024: ⚠️ Potential issue | 🟡 Minor — Assert the new finish-reason span attribute in this event-mode stream test.

This path goes through the same event-enabled streaming branch updated in `bedrock/__init__.py`. Without checking `GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS`, the missing span update stays invisible.

Suggested assertion
```diff
 assert (
     bedrock_span.attributes[SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS]
     == inputTokens + outputTokens
 )
+assert (
+    bedrock_span.attributes[GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS]
+    == ("content_filter",)
+)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py` around lines 1003 - 1024, Add an assertion to this event-mode streaming test to verify the span was updated with the finish-reason attribute: after the existing usage assertions in tests/traces/test_nova.py, assert that bedrock_span.attributes[GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS] equals the expected finish reason used in the test (e.g., the local finishReason variable or the literal like "stop"). This ensures the branch updated in bedrock.__init__ is exercised and the finish-reason attribute on bedrock_span is validated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Around line 404-415: The current branch that calls
emit_streaming_converse_response_event(...) when should_emit_events() is true
omits updating the converse span with the finish reason, so
instrument_with_content / instrument_with_no_content never set
gen_ai.response.finish_reasons; update the code so the span is always annotated
with the finish_reason: extract the finish-reason write into a small helper
(e.g., set_converse_finish_reason(span, finish_reason)) and call it in both the
should_emit_events() path and the else path, or explicitly call a minimal setter
to set span attribute "gen_ai.response.finish_reasons" (using span and
stop_reason) before or after emit_streaming_converse_response_event(...) so
events remain enabled but spans still receive the finish reason via
set_converse_streaming_response_span_attributes or the new helper.
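The helper this comment proposes could look roughly like the following. This is an illustrative sketch, not the actual module code: the attribute key, the tuple shape, and the pass-through `map_fn` default are assumptions drawn from the review text.

```python
# Hypothetical helper: a single place that annotates the span with the finish
# reason, callable from both the should_emit_events() path and the else path.
GEN_AI_RESPONSE_FINISH_REASONS = "gen_ai.response.finish_reasons"


def set_converse_finish_reason(span, finish_reason, map_fn=str):
    # The semconv defines finish reasons as an array-valued attribute,
    # hence the one-element tuple.
    if finish_reason is not None:
        span.set_attribute(
            GEN_AI_RESPONSE_FINISH_REASONS, (map_fn(finish_reason),)
        )
```

Calling this unconditionally after the stream ends would keep span attributes identical whether or not event logging is enabled.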
---
Outside diff comments:
In
`@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py`:
- Around line 1048-1068: Add an assertion that verifies the tool-use stop-reason
is mapped to the tool_call operation on the span: when the test in
test_anthropic.py sees tool_use_found True, assert that
bedrock_span.attributes.get(GenAIAttributes.GEN_AI_OPERATION_NAME) equals
GenAiOperationNameValues.TOOL_CALL.value (use the existing bedrock_span,
GenAIAttributes and GenAiOperationNameValues symbols so the new check sits
beside the existing model/vendor/operation assertions).
In `@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py`:
- Around line 1003-1024: Add an assertion to this event-mode streaming test to
verify the span was updated with the finish-reason attribute: after the existing
usage assertions in tests/traces/test_nova.py, assert that
bedrock_span.attributes[GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS] equals
the expected finish reason used in the test (e.g., the local finishReason
variable or the literal like "stop"). This ensures the branch updated in
bedrock.__init__ is exercised and the finish-reason attribute on bedrock_span is
validated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9af6ddd1-72df-437a-9337-728430cd1668
⛔ Files ignored due to path filters (1)
- `packages/opentelemetry-instrumentation-bedrock/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (16)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
- packages/opentelemetry-instrumentation-bedrock/pyproject.toml
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
✅ Files skipped from review due to trivial changes (4)
- packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
- packages/opentelemetry-instrumentation-bedrock/pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (5)
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
9dfa73a to 861d53f
Actionable comments posted: 4
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (1)
233-294: ⚠️ Potential issue | 🔴 Critical — Duplicated code and incorrect indentation cause span lifecycle bugs.

The `_handle_completion` method has structural issues:

- Duplicated logic: lines 244-253 duplicate the exact same code from lines 235-242 (metric_attributes, response_id setting, and duration recording).
- Incorrect nesting: lines 256-294 (token usage calculation, streaming response handling, `span.end()`) are inside the `if self._duration_histogram:` block. This means that if metrics are disabled (`self._duration_histogram` is None), the span never ends and token usage is never recorded.

Compare with `AnthropicAsyncStream._complete_instrumentation` (lines 399-454), which has the correct structure.

🐛 Proposed fix to correct the structure
```diff
 def _handle_completion(self):
     """Handle completion logic"""
     metric_attributes = shared_metrics_attributes(self._complete_response)
     set_span_attribute(self._span, GenAIAttributes.GEN_AI_RESPONSE_ID, self._complete_response.get("id"))
+
     if self._duration_histogram:
         duration = time.time() - self._start_time
         self._duration_histogram.record(
             duration,
             attributes=metric_attributes,
         )
-        # This mirrors the logic from build_from_streaming_response
-        metric_attributes = shared_metrics_attributes(self._complete_response)
-        set_span_attribute(self._span, GenAIAttributes.GEN_AI_RESPONSE_ID, self._complete_response.get("id"))
-
-        if self._duration_histogram:
-            duration = time.time() - self._start_time
-            self._duration_histogram.record(
-                duration,
-                attributes=metric_attributes,
-            )
-
-        # Calculate token usage
-        if Config.enrich_token_usage:
+    # Calculate token usage
+    if Config.enrich_token_usage:
+        try:
+            if usage := self._complete_response.get("usage"):
+                prompt_tokens = usage.get("input_tokens", 0) or 0
+            else:
+                prompt_tokens = count_prompt_tokens_from_request(self._instance, self._kwargs)
+
+            if usage := self._complete_response.get("usage"):
+                completion_tokens = usage.get("output_tokens", 0) or 0
+            else:
+                completion_content = ""
+                if self._complete_response.get("events"):
+                    model_name = self._complete_response.get("model") or None
+                    for event in self._complete_response.get("events"):
+                        if event.get("text"):
+                            completion_content += event.get("text")
+
+                if model_name and hasattr(self._instance, "count_tokens"):
+                    completion_tokens = self._instance.count_tokens(completion_content)
+
+            _set_token_usage(
+                self._span,
+                self._complete_response,
+                prompt_tokens,
+                completion_tokens,
+                metric_attributes,
+                self._token_histogram,
+                self._choice_counter,
+            )
+        except Exception as e:
+            logger.warning("Failed to set token usage, error: %s", e)
+
+    _handle_streaming_response(self._span, self._event_logger, self._complete_response)
+
+    if self._span.is_recording():
+        self._span.set_status(Status(StatusCode.OK))
+    self._span.end()
+
+    self._instrumentation_completed = True
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py` around lines 233 - 294, The _handle_completion method contains duplicated metric recording and incorrectly nests the token-usage calculation and span completion inside the if self._duration_histogram block; remove the duplicated metric_attributes/response_id/duration-recording lines (the second copy around metric_attributes/duration recording) and dedent the token usage block, _handle_streaming_response call, span status/set_status and self._span.end() so they run whether or not self._duration_histogram is set; ensure you still record duration only when self._duration_histogram is truthy, set GenAIAttributes.GEN_AI_RESPONSE_ID on self._span once using self._complete_response.get("id"), and preserve references to Config.enrich_token_usage, _set_token_usage, _handle_streaming_response, _token_histogram, _choice_counter and toggle self._instrumentation_completed = True after ending the span.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 258-275: Compute and set the GEN_AI_RESPONSE_FINISH_REASONS
attribute unconditionally (outside the should_send_prompts() gating) so finish
reasons are recorded even in instrument_with_no_content mode: update
_set_generations_span_attributes to always collect finish_reasons and call
_set_span_attribute(span, GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS,
tuple(finish_reasons)) regardless of prompt/response content gating, and remove
or relocate any surrounding checks in callers like
set_model_choice_span_attributes() so only actual prompt/response content
remains gated; apply the same unconditional finish-reasons write to the other
analogous helpers mentioned (ranges around 333-355, 462-480, 513-538, 633-660,
711-725, 888-904, 907-920).
- Around line 104-121: The span attribute GenAIAttributes.GEN_AI_INPUT_MESSAGES
currently only serializes request_body["messages"], dropping any top-level
Anthropic system prompt; update the logic around request_body handling (the loop
building input_messages and the call to _set_span_attribute with span) to also
include a top-level request_body.get("system") as a message entry (e.g., role
"system" with parts derived via _text_part or _anthropic_content_to_parts
depending on type) before appending the existing messages, reusing _text_part
and _anthropic_content_to_parts to build parts and then JSON-serializing the
full list when calling _set_span_attribute.
- Around line 24-45: BEDROCK_FINISH_REASON_MAP currently misses AI21 and an
uppercase Bedrock guardrail token so raw provider values like "endoftext" and
"CONTENT_FILTERED" leak through; add explicit mappings (e.g. "endoftext": "stop"
and "CONTENT_FILTERED": "content_filter") to BEDROCK_FINISH_REASON_MAP and
update _map_finish_reason to normalize the incoming reason (coerce to str, then
check direct, reason.lower(), and reason.upper() lookups) before falling back to
the raw value so known provider variants are transformed to the expected GenAI
semconv values (refer to BEDROCK_FINISH_REASON_MAP and _map_finish_reason and
ensure output matches GEN_AI_RESPONSE_FINISH_REASONS).
- Around line 661-676: The code iterates each content block and creates separate
assistant messages, which breaks message boundaries and drops non-text parts;
instead, read the single Bedrock message at
response_body.get("output").get("message"), build one assistant message whose
"parts" is a list created by mapping each entry in message["content"] to a part
(use _text_part for items with "text" and preserve or transform non-text blocks
consistently—or call any existing helper like _part if present), then set
output_messages = [{ "role": "assistant", "parts": parts, "finish_reason": fr }]
and keep the _set_span_attribute calls (json.dumps(output_messages) and
GenAI_RESPONSE_FINISH_REASONS) as before so message boundaries and non-text
parts are preserved.
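The finish-reason normalization described a few comments above could be sketched as follows. This is a hedged illustration: the map entries and lookup order (direct, then lowercase, then uppercase) are taken from the review text, not from the actual `span_utils.py`.

```python
# Illustrative finish-reason normalization: try the raw provider value, then
# case-normalized variants, before falling back to the raw string.
FINISH_REASON_MAP = {
    "endoftext": "stop",           # AI21
    "CONTENT_FILTERED": "content_filter",  # Bedrock guardrail token
    "end_turn": "stop",
    "max_tokens": "length",
}


def map_finish_reason(reason):
    reason = str(reason)
    for candidate in (reason, reason.lower(), reason.upper()):
        if candidate in FINISH_REASON_MAP:
            return FINISH_REASON_MAP[candidate]
    # Unknown values pass through unchanged rather than being dropped.
    return reason
```

With this shape, known provider variants are transformed to GenAI semconv values while unrecognized reasons survive as-is.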
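The single-assistant-message restructuring suggested in the last comment can be sketched like this. Helper names and the exact part shape are assumptions for illustration, not the real `span_utils` code.

```python
import json


def _part(block):
    # Text blocks become text parts; non-text blocks (e.g. toolUse) are
    # preserved by serializing the whole block consistently.
    if "text" in block:
        return {"type": "text", "content": block["text"]}
    return {"type": "text", "content": json.dumps(block, default=str)}


def build_output_messages(response_body, finish_reason):
    # One assistant message whose "parts" spans all content blocks of the
    # single Bedrock converse output message, preserving message boundaries.
    message = response_body.get("output", {}).get("message", {})
    parts = [_part(b) for b in message.get("content", [])]
    return [{"role": "assistant", "parts": parts, "finish_reason": finish_reason}]
```

The resulting list would then be `json.dumps`-ed into the `GEN_AI_OUTPUT_MESSAGES` span attribute as before.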
---
Outside diff comments:
In
`@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py`:
- Around line 233-294: The _handle_completion method contains duplicated metric
recording and incorrectly nests the token-usage calculation and span completion
inside the if self._duration_histogram block; remove the duplicated
metric_attributes/response_id/duration-recording lines (the second copy around
metric_attributes/duration recording) and dedent the token usage block,
_handle_streaming_response call, span status/set_status and self._span.end() so
they run whether or not self._duration_histogram is set; ensure you still record
duration only when self._duration_histogram is truthy, set
GenAIAttributes.GEN_AI_RESPONSE_ID on self._span once using
self._complete_response.get("id"), and preserve references to
Config.enrich_token_usage, _set_token_usage, _handle_streaming_response,
_token_histogram, _choice_counter and toggle self._instrumentation_completed =
True after ending the span.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3c9f3c41-9237-4cd5-ad40-ecfcd297a39d
⛔ Files ignored due to path filters (33)
The following `uv.lock` files are excluded by `!**/*.lock`:

- packages/opentelemetry-instrumentation-agno/uv.lock
- packages/opentelemetry-instrumentation-alephalpha/uv.lock
- packages/opentelemetry-instrumentation-anthropic/uv.lock
- packages/opentelemetry-instrumentation-bedrock/uv.lock
- packages/opentelemetry-instrumentation-chromadb/uv.lock
- packages/opentelemetry-instrumentation-cohere/uv.lock
- packages/opentelemetry-instrumentation-crewai/uv.lock
- packages/opentelemetry-instrumentation-google-generativeai/uv.lock
- packages/opentelemetry-instrumentation-groq/uv.lock
- packages/opentelemetry-instrumentation-haystack/uv.lock
- packages/opentelemetry-instrumentation-lancedb/uv.lock
- packages/opentelemetry-instrumentation-langchain/uv.lock
- packages/opentelemetry-instrumentation-llamaindex/uv.lock
- packages/opentelemetry-instrumentation-marqo/uv.lock
- packages/opentelemetry-instrumentation-mcp/uv.lock
- packages/opentelemetry-instrumentation-milvus/uv.lock
- packages/opentelemetry-instrumentation-mistralai/uv.lock
- packages/opentelemetry-instrumentation-ollama/uv.lock
- packages/opentelemetry-instrumentation-openai-agents/uv.lock
- packages/opentelemetry-instrumentation-openai/uv.lock
- packages/opentelemetry-instrumentation-pinecone/uv.lock
- packages/opentelemetry-instrumentation-qdrant/uv.lock
- packages/opentelemetry-instrumentation-replicate/uv.lock
- packages/opentelemetry-instrumentation-sagemaker/uv.lock
- packages/opentelemetry-instrumentation-together/uv.lock
- packages/opentelemetry-instrumentation-transformers/uv.lock
- packages/opentelemetry-instrumentation-vertexai/uv.lock
- packages/opentelemetry-instrumentation-voyageai/uv.lock
- packages/opentelemetry-instrumentation-watsonx/uv.lock
- packages/opentelemetry-instrumentation-weaviate/uv.lock
- packages/opentelemetry-instrumentation-writer/uv.lock
- packages/sample-app/uv.lock
- packages/traceloop-sdk/uv.lock
📒 Files selected for processing (52)
- packages/opentelemetry-instrumentation-agno/pyproject.toml
- packages/opentelemetry-instrumentation-alephalpha/pyproject.toml
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
- packages/opentelemetry-instrumentation-anthropic/pyproject.toml
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
- packages/opentelemetry-instrumentation-bedrock/pyproject.toml
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
- packages/opentelemetry-instrumentation-chromadb/pyproject.toml
- packages/opentelemetry-instrumentation-cohere/pyproject.toml
- packages/opentelemetry-instrumentation-crewai/pyproject.toml
- packages/opentelemetry-instrumentation-google-generativeai/pyproject.toml
- packages/opentelemetry-instrumentation-groq/pyproject.toml
- packages/opentelemetry-instrumentation-haystack/pyproject.toml
- packages/opentelemetry-instrumentation-lancedb/pyproject.toml
- packages/opentelemetry-instrumentation-langchain/pyproject.toml
- packages/opentelemetry-instrumentation-llamaindex/pyproject.toml
- packages/opentelemetry-instrumentation-marqo/pyproject.toml
- packages/opentelemetry-instrumentation-mcp/pyproject.toml
- packages/opentelemetry-instrumentation-milvus/pyproject.toml
- packages/opentelemetry-instrumentation-mistralai/pyproject.toml
- packages/opentelemetry-instrumentation-ollama/pyproject.toml
- packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
- packages/opentelemetry-instrumentation-openai/pyproject.toml
- packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
- packages/opentelemetry-instrumentation-pinecone/pyproject.toml
- packages/opentelemetry-instrumentation-qdrant/pyproject.toml
- packages/opentelemetry-instrumentation-replicate/pyproject.toml
- packages/opentelemetry-instrumentation-sagemaker/pyproject.toml
- packages/opentelemetry-instrumentation-together/pyproject.toml
- packages/opentelemetry-instrumentation-transformers/pyproject.toml
- packages/opentelemetry-instrumentation-vertexai/pyproject.toml
- packages/opentelemetry-instrumentation-voyageai/pyproject.toml
- packages/opentelemetry-instrumentation-watsonx/pyproject.toml
- packages/opentelemetry-instrumentation-weaviate/pyproject.toml
- packages/opentelemetry-instrumentation-writer/pyproject.toml
- packages/sample-app/pyproject.toml
- packages/traceloop-sdk/pyproject.toml
✅ Files skipped from review due to trivial changes (29)
- packages/opentelemetry-instrumentation-lancedb/pyproject.toml
- packages/opentelemetry-instrumentation-milvus/pyproject.toml
- packages/opentelemetry-instrumentation-groq/pyproject.toml
- packages/opentelemetry-instrumentation-mistralai/pyproject.toml
- packages/opentelemetry-instrumentation-cohere/pyproject.toml
- packages/opentelemetry-instrumentation-anthropic/pyproject.toml
- packages/opentelemetry-instrumentation-transformers/pyproject.toml
- packages/opentelemetry-instrumentation-chromadb/pyproject.toml
- packages/opentelemetry-instrumentation-haystack/pyproject.toml
- packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
- packages/opentelemetry-instrumentation-mcp/pyproject.toml
- packages/opentelemetry-instrumentation-google-generativeai/pyproject.toml
- packages/opentelemetry-instrumentation-writer/pyproject.toml
- packages/opentelemetry-instrumentation-vertexai/pyproject.toml
- packages/opentelemetry-instrumentation-ollama/pyproject.toml
- packages/opentelemetry-instrumentation-alephalpha/pyproject.toml
- packages/opentelemetry-instrumentation-crewai/pyproject.toml
- packages/opentelemetry-instrumentation-sagemaker/pyproject.toml
- packages/traceloop-sdk/pyproject.toml
- packages/opentelemetry-instrumentation-together/pyproject.toml
- packages/opentelemetry-instrumentation-qdrant/pyproject.toml
- packages/opentelemetry-instrumentation-openai/pyproject.toml
- packages/opentelemetry-instrumentation-voyageai/pyproject.toml
- packages/opentelemetry-instrumentation-bedrock/pyproject.toml
- packages/opentelemetry-instrumentation-replicate/pyproject.toml
- packages/opentelemetry-instrumentation-watsonx/pyproject.toml
- packages/sample-app/pyproject.toml
- packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
- packages/opentelemetry-instrumentation-marqo/pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (8)
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
861d53f to 7cf46fe
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)
557-573: Test assertion for guardContent parts may be fragile.

The test at lines 561-563 asserts that guardContent blocks become text parts with `json.dumps(block, default=str)` as content. This tightly couples the test to the exact serialization format in `_converse_content_to_parts`. If the serialization changes (e.g., extracting the text from guardContent), this test will fail.

Consider whether this is the intended behavior or if guardContent should extract the actual text content instead of serializing the entire block.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py` around lines 557 - 573, The test is too tightly coupled to the serialization used by _converse_content_to_parts: instead of asserting parts content equals json.dumps(block, default=str), change the assertions in tests/traces/test_titan.py (the checks around GenAIAttributes.GEN_AI_INPUT_MESSAGES) to accept either the serialized form or the extracted text form; e.g., for each guardContent block check parts[i]["type"] == "text" and then assert that parts[i]["content"] is either json.dumps(block, default=str) or equals block.get("text") (or contains the expected text substring), so the test won't break if _converse_content_to_parts changes to extract text rather than serialize the whole block.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 103-121: Modify the Anthropic messages handling to include any
top-level system prompt before serializing messages: when processing
request_body in the branch that handles "messages" (the block that builds
input_messages using _text_part and _anthropic_content_to_parts and calls
_set_span_attribute with GenAIAttributes.GEN_AI_INPUT_MESSAGES), first check for
a "system" key in request_body and, if present, prepend a message dict for the
system (role "system" with parts produced by _text_part or
_anthropic_content_to_parts as appropriate) to the input_messages list so the
system prompt is preserved in the span attribute; use the same pattern as
_set_amazon_input_span_attributes to detect and convert the system value before
json.dumps and calling _set_span_attribute.
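The system-prompt handling this comment asks for could look roughly like the sketch below. It is an assumption-laden illustration: `_text_part` and the message/parts shape mirror the helper names in the review text, not the actual `span_utils.py`.

```python
import json


def _text_part(text):
    return {"type": "text", "content": text}


def build_input_messages(request_body):
    # Prepend a top-level Anthropic "system" prompt (if present) before the
    # regular messages, so it survives into GEN_AI_INPUT_MESSAGES.
    input_messages = []
    system = request_body.get("system")
    if isinstance(system, str):
        input_messages.append({"role": "system", "parts": [_text_part(system)]})
    for msg in request_body.get("messages", []):
        content = msg.get("content")
        if isinstance(content, str):
            parts = [_text_part(content)]
        else:
            # Content-block lists are serialized part-by-part.
            parts = [_text_part(json.dumps(b, default=str)) for b in content or []]
        input_messages.append({"role": msg.get("role"), "parts": parts})
    return json.dumps(input_messages)
```

The resulting JSON string would be what gets written to the span attribute.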
---
Nitpick comments:
In `@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py`:
- Around line 557-573: The test is too tightly coupled to the serialization used
by _converse_content_to_parts: instead of asserting parts content equals
json.dumps(block, default=str), change the assertions in
tests/traces/test_titan.py (the checks around
GenAIAttributes.GEN_AI_INPUT_MESSAGES) to accept either the serialized form or
the extracted text form; e.g., for each guardContent block check
parts[i]["type"] == "text" and then assert that parts[i]["content"] is either
json.dumps(block, default=str) or equals block.get("text") (or contains the
expected text substring), so the test won't break if _converse_content_to_parts
changes to extract text rather than serialize the whole block.
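One hypothetical way to express the tolerant assertion this nitpick suggests (the helper name is ours; the real test would use `GenAIAttributes.GEN_AI_INPUT_MESSAGES` directly):

```python
import json


def guard_content_part_ok(part, block):
    # Accept either the whole serialized block or just its extracted text,
    # so the check survives a change in _converse_content_to_parts.
    return part.get("type") == "text" and part.get("content") in (
        json.dumps(block, default=str),
        block.get("text"),
    )
```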
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 28844d79-da4b-43a6-9ec2-ac9b535be326
📒 Files selected for processing (15)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.pypackages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.pypackages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.pypackages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.pypackages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.pypackages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.pypackages/opentelemetry-instrumentation-bedrock/tests/test_semconv.pypackages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.pypackages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.pypackages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.pypackages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.pypackages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.pypackages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.pypackages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.pypackages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
✅ Files skipped from review due to trivial changes (1)
- packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
🚧 Files skipped from review as they are similar to previous changes (6)
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
0268c6d to 8b036b2 (Compare)
Actionable comments posted: 6
♻️ Duplicate comments (3)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)

404-415: ⚠️ Potential issue | 🟠 Major

`converse_stream` spans still miss finish reasons in event mode. In the `should_emit_events()` branch we only emit the choice log. `set_converse_streaming_response_span_attributes(..., finish_reason=stop_reason)` still runs only in the `else` path, so the same streamed response produces different span attributes depending on whether event logging is enabled.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py` around lines 404 - 415, The finishReason (stop_reason) isn't applied to converse_stream spans when events are enabled because set_converse_streaming_response_span_attributes(response_msg, role, span, finish_reason=stop_reason) is only called in the else branch; update the logic so that after determining stop_reason you always call set_converse_streaming_response_span_attributes with finish_reason=stop_reason (either before or after emit_streaming_converse_response_event), and keep emit_streaming_converse_response_event(event_logger, response_msg, role, stop_reason) in the should_emit_events() branch so both event emission and span attribute setting occur regardless of event mode.

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (2)
22-45: ⚠️ Potential issue | 🟠 Major

Normalize the remaining Bedrock finish-reason variants. This map still misses known values like AI21 `endoftext` and Bedrock/Titan `CONTENT_FILTERED`, and `_map_finish_reason()` is still case-sensitive. Those raw provider tokens will leak into `gen_ai.response.finish_reasons` unchanged.

💡 Proposed fix

```diff
 BEDROCK_FINISH_REASON_MAP = {
     # Anthropic via Bedrock
     "end_turn": "stop",
     "stop_sequence": "stop",
     "tool_use": "tool_call",
     "max_tokens": "length",
+    # AI21 via Bedrock
+    "endoftext": "stop",
     # Cohere via Bedrock
     "COMPLETE": "stop",
     "TOOL_CALL": "tool_call",
     "MAX_TOKENS": "length",
     # Amazon Titan
     "FINISH": "stop",
+    "CONTENT_FILTERED": "content_filter",
     # Converse API
     "guardrail_intervened": "content_filter",
 }

 def _map_finish_reason(reason):
     """Map provider-specific finish reason to OTel GenAI enum value."""
     if not reason:
         return None
-    return BEDROCK_FINISH_REASON_MAP.get(reason, reason)
+    reason = str(reason)
+    return (
+        BEDROCK_FINISH_REASON_MAP.get(reason)
+        or BEDROCK_FINISH_REASON_MAP.get(reason.lower())
+        or BEDROCK_FINISH_REASON_MAP.get(reason.upper())
+        or reason
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 22 - 45, BEDROCK_FINISH_REASON_MAP and _map_finish_reason are leaking raw provider tokens because some known variants (e.g., "endoftext", "CONTENT_FILTERED") are missing and the lookup is case-sensitive; update BEDROCK_FINISH_REASON_MAP to include the missing variants (at least "endoftext" -> "stop" and "CONTENT_FILTERED" -> "content_filter") and modify _map_finish_reason to normalize the incoming reason (e.g., return None for falsy, then lookup using reason_normalized = reason.strip().lower()) so keys match a lowercased map and unknown values fall back to the original normalized token or None as appropriate.
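The normalization the prompt describes can be sketched as a standalone helper. This is an illustrative version, not the shipped `_map_finish_reason`: the public name and the exact map contents (mirroring the proposed diff above) are assumptions.

```python
# Hypothetical, self-contained version of the finish-reason normalization
# suggested in the review; keys are stored lowercase so lookup is
# case-insensitive, and absent reasons stay None instead of leaking "None".
BEDROCK_FINISH_REASON_MAP = {
    "end_turn": "stop",          # Anthropic
    "stop_sequence": "stop",
    "tool_use": "tool_call",
    "max_tokens": "length",
    "endoftext": "stop",         # AI21
    "complete": "stop",          # Cohere
    "tool_call": "tool_call",
    "finish": "stop",            # Amazon Titan
    "content_filtered": "content_filter",
    "guardrail_intervened": "content_filter",  # Converse API
}


def map_finish_reason(reason):
    """Map a provider-specific finish reason to the OTel GenAI value."""
    if not reason:
        return None
    normalized = str(reason).strip().lower()
    # Unknown tokens pass through unchanged so no signal is dropped.
    return BEDROCK_FINISH_REASON_MAP.get(normalized, reason)
```

With a single lowercase map, one lookup covers the mixed-case provider tokens instead of probing `.lower()` and `.upper()` separately.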
728-739: ⚠️ Potential issue | 🟠 Major

Keep Bedrock `output.message` as a single assistant message. `response_body["output"]["message"]` is already one message. Iterating `content` here creates one assistant message per block and drops non-text blocks because only `msg.get("text")` is read.

💡 Proposed fix

```diff
 elif "output" in response_body:
     fr = _map_finish_reason(response_body.get("stopReason"))
-    msgs = response_body.get("output").get("message", {}).get("content", [])
-    output_messages = []
-    for msg in msgs:
-        output_messages.append(_output_message("assistant", [_text_part(msg.get("text"))], fr))
+    message = response_body.get("output", {}).get("message", {})
+    output_messages = [
+        _output_message(
+            message.get("role", "assistant"),
+            _converse_content_to_parts(message.get("content", [])),
+            fr,
+        )
+    ]
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps(output_messages),
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 728 - 739, The code treats response_body["output"]["message"] as multiple messages by iterating its "content" and only using msg.get("text"), which splits a single assistant message and drops non-text blocks; change the logic to create one assistant output message from response_body.get("output").get("message") (not per content block), build its parts by mapping each block in message.get("content", []) through _text_part but preserve non-text blocks (e.g., include their raw dict or serialize them) so _output_message("assistant", parts, fr) is called once, then set the GenAIAttributes.GEN_AI_OUTPUT_MESSAGES span attribute with that single message; reference functions/vars: _map_finish_reason, _text_part, _output_message, span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, response_body.
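As a sketch of the intended shape, the snippet below builds one assistant message whose parts preserve non-text blocks by serializing them. The part schema (`{"type": ..., "content": ...}`) and helper names are simplified assumptions, not the exact semconv structures.

```python
import json


def _text_part(text):
    # Simplified part shape for illustration.
    return {"type": "text", "content": text}


def build_output_message(response_body, finish_reason):
    """Collapse output.message into ONE assistant message, keeping all blocks."""
    message = response_body.get("output", {}).get("message", {})
    parts = []
    for block in message.get("content", []):
        if "text" in block:
            parts.append(_text_part(block["text"]))
        else:
            # Preserve toolUse/other blocks instead of dropping them.
            parts.append({"type": "other", "content": json.dumps(block, default=str)})
    return {
        "role": message.get("role", "assistant"),
        "parts": parts,
        "finish_reason": finish_reason,
    }
```

The key difference from the flagged code: one message with N parts, rather than N single-part messages.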
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Around line 456-469: The renamed metric constants removed the old
LLM_BEDROCK_* identifiers causing AttributeError for downstream imports; restore
backward-compatible aliases by adding the old names as assignments pointing to
the new constants (e.g., define LLM_BEDROCK_GUARDRAIL_ACTIVATION =
GuardrailMeters.GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION,
LLM_BEDROCK_GUARDRAIL_LATENCY =
GuardrailMeters.GEN_AI_BEDROCK_GUARDRAIL_LATENCY, etc., and
LLM_BEDROCK_PROMPT_CACHING = PromptCaching.GEN_AI_PROMPT_CACHING) so both
GuardrailMeters and PromptCaching keep the new names while the legacy
LLM_BEDROCK_* symbols remain available.
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 212-218: The code converts missing AI21 finishReason values into
the string "None" (via str(fr_data)) which then gets mapped/emitted as a
sentinel; change the logic in the AI21 handling to avoid serializing absent
values: for the block handling model_vendor == "ai21" (variables: response_body,
comp, fr_data, finish_reasons, _map_finish_reason) set raw =
comp.get("finishReason", {}).get("reason") if isinstance(fr_data, dict) else
(str(fr_data) if fr_data is not None else None) so that None stays None and is
skipped; apply the same defensive change to the other occurrence referenced in
the comment (lines ~547-555) so missing finishReason values are omitted rather
than serialized.
- Around line 990-1000: The streaming path in
set_converse_streaming_response_span_attributes currently flattens response into
a single text part via "".join(response), losing tool calls and non-text parts;
instead buffer and convert the streaming response into the same structured
message schema used by the non-streaming path: iterate the response items, map
each item to the appropriate part (using _text_part for text and whatever helper
logic the non-streaming path uses for tool_use/other parts), build the parts
list, then call _output_message(role, parts, fr) and json.dumps that result
before passing to _set_span_attribute; keep the existing use of
_map_finish_reason, _set_span_attribute, _output_message, _text_part and honor
should_send_prompts().
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py`:
- Line 75: The debug log currently emits the full accumulated streamed payload
via logger.debug("Accumulating body: %s", self._accumulating_body), which can
leak customer data and produce huge logs; change this to avoid logging raw
content—log a safe summary instead (e.g., length with
len(self._accumulating_body), a truncated preview with masking, or a
hash/checksum) and keep the logger call to reference logger.debug and
self._accumulating_body but only include the safe summary so no raw model/tool
payload is written to logs.
- Around line 52-60: The branch handling content_block_delta must correctly
handle streamed tool JSON fragments and thinking deltas: change the logic in the
content_block_delta branch (where _accumulating_body is updated) to (1) handle
delta.get("type") == "input_json_delta" by treating partial_json as either a
JSON fragment or string — if current.get("input") is a dict and partial_json
parses to a dict, merge keys; if partial_json is a string, append to a buffer
string (e.g., keep current["input_buffer"] or current["input"] as str) and only
attempt json.loads/merge inside a safe try/except so parsing errors don't raise
from _process_event; (2) add handling for delta.get("type") == "thinking_delta"
(or delta keys containing reasoning text) and append that text into a dedicated
field (e.g., current.setdefault("reasoning", "") and += delta_text) so reasoning
streams are preserved. Use symbol names from the diff: _accumulating_body,
content_block_delta branch, input_json_delta, thinking_delta, and ensure all
JSON parsing/merging is guarded to avoid raising exceptions inside the
`@dont_throw-wrapped` _process_event.
In `@packages/opentelemetry-instrumentation-bedrock/tests/test_event_emitter.py`:
- Around line 117-132: The test test_missing_finish_reason_not_unknown currently
checks logger output with assert log_record.body.get("finish_reason") !=
"unknown", which still passes if the emitter writes None/""; change this to
assert "finish_reason" not in log_record.body to explicitly require omission,
and apply the same change to the other tests in this file that use != "unknown"
for finish_reason; locate usages around emit_choice_events, logger.emit and
log_record.body to update the assertions accordingly.
---
Duplicate comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Around line 404-415: The finishReason (stop_reason) isn't applied to
converse_stream spans when events are enabled because
set_converse_streaming_response_span_attributes(response_msg, role, span,
finish_reason=stop_reason) is only called in the else branch; update the logic
so that after determining stop_reason you always call
set_converse_streaming_response_span_attributes with finish_reason=stop_reason
(either before or after emit_streaming_converse_response_event), and keep
emit_streaming_converse_response_event(event_logger, response_msg, role,
stop_reason) in the should_emit_events() branch so both event emission and span
attribute setting occur regardless of event mode.
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 22-45: BEDROCK_FINISH_REASON_MAP and _map_finish_reason are
leaking raw provider tokens because some known variants (e.g., "endoftext",
"CONTENT_FILTERED") are missing and the lookup is case-sensitive; update
BEDROCK_FINISH_REASON_MAP to include the missing variants (at least "endoftext"
-> "stop" and "CONTENT_FILTERED" -> "content_filter") and modify
_map_finish_reason to normalize the incoming reason (e.g., return None for
falsy, then lookup using reason_normalized = reason.strip().lower()) so keys
match a lowercased map and unknown values fall back to the original normalized
token or None as appropriate.
- Around line 728-739: The code treats response_body["output"]["message"] as
multiple messages by iterating its "content" and only using msg.get("text"),
which splits a single assistant message and drops non-text blocks; change the
logic to create one assistant output message from
response_body.get("output").get("message") (not per content block), build its
parts by mapping each block in message.get("content", []) through _text_part but
preserve non-text blocks (e.g., include their raw dict or serialize them) so
_output_message("assistant", parts, fr) is called once, then set the
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES span attribute with that single message;
reference functions/vars: _map_finish_reason, _text_part, _output_message, span,
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, response_body.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7cbfa3b1-7c19-4343-ab25-d0c40edc8e5f
📒 Files selected for processing (21)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_models.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_prompt_caching_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_content_parts.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_event_emitter.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
✅ Files skipped from review due to trivial changes (2)
- packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_models.py
🚧 Files skipped from review as they are similar to previous changes (8)
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
```diff
 class GuardrailMeters:
-    LLM_BEDROCK_GUARDRAIL_ACTIVATION = "gen_ai.bedrock.guardrail.activation"
-    LLM_BEDROCK_GUARDRAIL_LATENCY = "gen_ai.bedrock.guardrail.latency"
-    LLM_BEDROCK_GUARDRAIL_COVERAGE = "gen_ai.bedrock.guardrail.coverage"
-    LLM_BEDROCK_GUARDRAIL_SENSITIVE = "gen_ai.bedrock.guardrail.sensitive_info"
-    LLM_BEDROCK_GUARDRAIL_TOPICS = "gen_ai.bedrock.guardrail.topics"
-    LLM_BEDROCK_GUARDRAIL_CONTENT = "gen_ai.bedrock.guardrail.content"
-    LLM_BEDROCK_GUARDRAIL_WORDS = "gen_ai.bedrock.guardrail.words"
+    GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION = "gen_ai.bedrock.guardrail.activation"
+    GEN_AI_BEDROCK_GUARDRAIL_LATENCY = "gen_ai.bedrock.guardrail.latency"
+    GEN_AI_BEDROCK_GUARDRAIL_COVERAGE = "gen_ai.bedrock.guardrail.coverage"
+    GEN_AI_BEDROCK_GUARDRAIL_SENSITIVE = "gen_ai.bedrock.guardrail.sensitive_info"
+    GEN_AI_BEDROCK_GUARDRAIL_TOPICS = "gen_ai.bedrock.guardrail.topics"
+    GEN_AI_BEDROCK_GUARDRAIL_CONTENT = "gen_ai.bedrock.guardrail.content"
+    GEN_AI_BEDROCK_GUARDRAIL_WORDS = "gen_ai.bedrock.guardrail.words"


 class PromptCaching:
     # will be moved under the AI SemConv. Not namespaced since also OpenAI supports this.
-    LLM_BEDROCK_PROMPT_CACHING = "gen_ai.prompt.caching"
+    GEN_AI_PROMPT_CACHING = "gen_ai.prompt.caching"
```
Keep backward-compatible aliases for the renamed metric constants.
GuardrailMeters and PromptCaching live in the package root, and the old LLM_BEDROCK_* names disappear completely here. That turns an otherwise additive semconv migration into an AttributeError for downstream code that imports those constants.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`
around lines 456 - 469, The renamed metric constants removed the old
LLM_BEDROCK_* identifiers causing AttributeError for downstream imports; restore
backward-compatible aliases by adding the old names as assignments pointing to
the new constants (e.g., define LLM_BEDROCK_GUARDRAIL_ACTIVATION =
GuardrailMeters.GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION,
LLM_BEDROCK_GUARDRAIL_LATENCY =
GuardrailMeters.GEN_AI_BEDROCK_GUARDRAIL_LATENCY, etc., and
LLM_BEDROCK_PROMPT_CACHING = PromptCaching.GEN_AI_PROMPT_CACHING) so both
GuardrailMeters and PromptCaching keep the new names while the legacy
LLM_BEDROCK_* symbols remain available.
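The alias restoration amounts to class-level assignments pointing the legacy names at the new ones. A trimmed sketch (only two of the seven guardrail constants shown; the pattern extends to the rest):

```python
class GuardrailMeters:
    GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION = "gen_ai.bedrock.guardrail.activation"
    GEN_AI_BEDROCK_GUARDRAIL_LATENCY = "gen_ai.bedrock.guardrail.latency"
    # Legacy aliases: keep old imports working after the semconv rename.
    LLM_BEDROCK_GUARDRAIL_ACTIVATION = GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION
    LLM_BEDROCK_GUARDRAIL_LATENCY = GEN_AI_BEDROCK_GUARDRAIL_LATENCY


class PromptCaching:
    GEN_AI_PROMPT_CACHING = "gen_ai.prompt.caching"
    LLM_BEDROCK_PROMPT_CACHING = GEN_AI_PROMPT_CACHING  # legacy alias
```

Because both names bind to the same string, downstream code importing the old constants keeps working, and both resolve to identical metric names.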
```diff
 def _accumulate_events(self, event):
-    print(self._accumulating_body)
+    logger.debug("Accumulating body: %s", self._accumulating_body)
```
Don’t log raw streamed payloads from the instrumentation.
Line 75 writes the full accumulated response/tool payload into application logs on every chunk. In this wrapper that includes model output and tool inputs, so enabling debug leaks customer content and can create very large log lines for long or multimodal responses.
💡 Safer logging
- logger.debug("Accumulating body: %s", self._accumulating_body)
+ logger.debug(
+ "Accumulating Bedrock stream chunk: event_keys=%s accumulated_keys=%s",
+ list(event.keys()),
+ list(self._accumulating_body.keys()),
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py`
at line 75, The debug log currently emits the full accumulated streamed payload
via logger.debug("Accumulating body: %s", self._accumulating_body), which can
leak customer data and produce huge logs; change this to avoid logging raw
content—log a safe summary instead (e.g., length with
len(self._accumulating_body), a truncated preview with masking, or a
hash/checksum) and keep the logger call to reference logger.debug and
self._accumulating_body but only include the safe summary so no raw model/tool
payload is written to logs.
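One possible shape for that safe summary is a helper that logs only structure and a digest, never content. The helper name and fields below are assumptions for illustration:

```python
import hashlib


def safe_body_summary(body):
    """Summarize a payload for debug logs without exposing raw content."""
    raw = repr(body).encode("utf-8")
    return {
        "keys": sorted(body.keys()) if isinstance(body, dict) else type(body).__name__,
        "size_bytes": len(raw),
        "sha256_prefix": hashlib.sha256(raw).hexdigest()[:12],
    }


# Illustrative usage inside the wrapper:
# logger.debug("Accumulating body: %s", safe_body_summary(self._accumulating_body))
```

The digest prefix lets you correlate repeated log lines for the same payload while keeping prompts and completions out of the log stream.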
```python
def test_missing_finish_reason_not_unknown(self, mock_prompts, mock_events):
    """#16: Missing finish_reason should NOT be 'unknown'."""
    response_body = {
        "completion": "Hello!",
        # no stop_reason key
    }
    response = {"body": MagicMock()}
    response["body"].read.return_value = json.dumps(response_body).encode()

    logger = MagicMock()
    emit_choice_events(logger, response)

    logger.emit.assert_called_once()
    log_record = logger.emit.call_args[0][0]
    # Should NOT be "unknown"
    assert log_record.body.get("finish_reason") != "unknown"
```
Assert omission explicitly in the None finish-reason cases. These tests are supposed to lock in "omit finish_reason when absent", but `!= "unknown"` still passes if the emitter writes None, "", or any other unexpected value. Checking for key absence would catch the regression this file describes.

💡 Proposed fix

```diff
- assert log_record.body.get("finish_reason") != "unknown"
+ assert "finish_reason" not in log_record.body
```

Also applies to: 186-194, 212-219
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/opentelemetry-instrumentation-bedrock/tests/test_event_emitter.py`
around lines 117 - 132, The test test_missing_finish_reason_not_unknown
currently checks logger output with assert log_record.body.get("finish_reason")
!= "unknown", which still passes if the emitter writes None/""; change this to
assert "finish_reason" not in log_record.body to explicitly require omission,
and apply the same change to the other tests in this file that use != "unknown"
for finish_reason; locate usages around emit_choice_events, logger.emit and
log_record.body to update the assertions accordingly.
8b036b2 to ee6e754 (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py (1)

68-79: ⚠️ Potential issue | 🟠 Major

Handle AI21 `finishReason` values that are already strings. Lines 77-79 still call `.get("reason")` unconditionally. The new regression coverage in packages/opentelemetry-instrumentation-bedrock/tests/test_review_fixes.py includes string-valued `finishReason`s, so this branch can raise `AttributeError` and break choice-event emission when events are enabled.

💡 Suggested change

```diff
 if response_body.get("completions") is not None:
     for i, message in enumerate(response_body.get("completions")):
+        finish_reason = message.get("finishReason")
+        if isinstance(finish_reason, dict):
+            finish_reason = finish_reason.get("reason")
         emit_event(
             ChoiceEvent(
                 index=i,
                 message={
                     "content": message.get("data", {}).get("text"),
                     "role": "assistant",
                 },
-                finish_reason=_map_finish_reason(
-                    message.get("finishReason", {}).get("reason")
-                ),
+                finish_reason=_map_finish_reason(finish_reason),
             ),
             event_logger,
         )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py` around lines 68 - 79, The code is calling .get("reason") on response_body completions' finishReason which may already be a string; update the ChoiceEvent creation in the loop that emits completions so _map_finish_reason is passed the correct value: inspect message.get("finishReason") and if it's a dict use finishReason.get("reason") otherwise use the finishReason value (or None) before calling _map_finish_reason; adjust the enumeration loop around ChoiceEvent and the argument to _map_finish_reason accordingly so both dict and string finishReason cases are handled safely.
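The dict-or-string guard is small enough to isolate into a helper; a sketch assuming the AI21 shapes described above (the helper name is hypothetical):

```python
def extract_finish_reason(message):
    """AI21 may return finishReason as {'reason': ...} or as a bare string."""
    finish_reason = message.get("finishReason")
    if isinstance(finish_reason, dict):
        return finish_reason.get("reason")
    # Strings (and None) pass through unchanged.
    return finish_reason
```

Routing every call site through one accessor means a future response-shape change only needs a fix in one place.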
♻️ Duplicate comments (2)

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py (2)

75: ⚠️ Potential issue | 🟠 Major

Don't debug-log raw streamed payloads. Line 75 writes the full accumulated Bedrock payload on every chunk. That leaks prompts, completions, and tool inputs into logs and can produce very large log lines for long streams.

💡 Safer logging

```diff
- logger.debug("Accumulating body: %s", self._accumulating_body)
+ logger.debug(
+     "Accumulating Bedrock stream chunk: event_keys=%s accumulated_keys=%s",
+     list(event.keys()),
+     list(self._accumulating_body.keys()),
+ )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py` at line 75, The debug statement is logging raw streamed payloads via logger.debug("Accumulating body: %s", self._accumulating_body) which leaks sensitive content; update the logging in the streaming wrapper (where self._accumulating_body is used—e.g., StreamingWrapper or the method handling chunks) to avoid printing the full payload and instead log non-sensitive metadata such as the buffer length or a masked/hashed summary (for example, use logger.debug("Accumulating body length=%d", len(self._accumulating_body)) or log a fixed placeholder), removing any direct inclusion of self._accumulating_body in logs.
52-60: ⚠️ Potential issue | 🟠 Major

Handle streamed tool JSON/reasoning deltas without assuming string state. Lines 58-60 still assume the active content block already holds string fields. For tool-use blocks that start with `input: {}`, Line 60 becomes `dict += str`; because `_process_event()` is wrapped in `@dont_throw`, the failure is silent and the accumulated payload is incomplete.

💡 Suggested hardening

```diff
 elif type == "content_block_delta":
     delta = decoded_chunk.get("delta", {})
-    if delta.get("text") is not None:
-        self._accumulating_body["content"][-1]["text"] += delta["text"]
+    current = self._accumulating_body["content"][-1]
+    if delta.get("text") is not None:
+        if not isinstance(current.get("text"), str):
+            current["text"] = ""
+        current["text"] += delta["text"]
     elif delta.get("type") == "input_json_delta":
         partial_json = delta.get("partial_json", "")
-        current = self._accumulating_body["content"][-1]
-        current.setdefault("input", "")
+        if not isinstance(current.get("input"), str):
+            current["input"] = ""
         current["input"] += partial_json
+    elif delta.get("type") == "thinking_delta":
+        if not isinstance(current.get("thinking"), str):
+            current["thinking"] = ""
+        current["thinking"] += delta.get("thinking", "")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py` around lines 52 - 60, In the "content_block_delta" branch ensure the target field is a string before concatenating: when handling delta.get("type") == "input_json_delta" fetch partial_json and the current block via current = self._accumulating_body["content"][-1], then guard current["input"] so it is a string (e.g., if the key is missing or not isinstance(current["input"], str) set it to "" or coerce with str()) before doing current["input"] += partial_json; update the logic in the same block where partial_json is appended so dict-plus-str errors are avoided and accumulation works for tool-use blocks.
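The same guard logic can be exercised in isolation. The helper below is a hypothetical refactor of the suggested branch into a pure function over a single content block:

```python
def apply_content_delta(block, delta):
    """Fold one streamed delta into the active content block.

    Coerces non-string state (e.g. a tool-use block that starts with
    input: {}) to a string buffer before concatenating, so accumulation
    never raises inside a @dont_throw-style wrapper.
    """
    if delta.get("text") is not None:
        if not isinstance(block.get("text"), str):
            block["text"] = ""
        block["text"] += delta["text"]
    elif delta.get("type") == "input_json_delta":
        if not isinstance(block.get("input"), str):
            block["input"] = ""
        block["input"] += delta.get("partial_json", "")
    elif delta.get("type") == "thinking_delta":
        if not isinstance(block.get("thinking"), str):
            block["thinking"] = ""
        block["thinking"] += delta.get("thinking", "")
    return block
```

A pure function like this is also trivially unit-testable without constructing a full streaming wrapper.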
🧹 Nitpick comments (3)

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (3)

358-369: Remove dead code: `finish_reasons` list is unused. The `finish_reasons` list is populated but never used. The actual span attribute is set by `_set_finish_reasons_unconditionally()`, which is called before this function.

♻️ Proposed fix

```diff
 def _set_generations_span_attributes(span, response_body):
     output_messages = []
-    finish_reasons = []
     for generation in response_body.get("generations"):
         fr = _map_finish_reason(generation.get("finish_reason"))
-        finish_reasons.append(fr)
         output_messages.append(_output_message("assistant", [_text_part(generation.get("text"))], fr))
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps(output_messages),
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 358 - 369, In _set_generations_span_attributes, remove the unused finish_reasons list and the finish_reasons.append(fr) call (leave the local fr variable and its usage in _output_message intact) because finish reasons are set unconditionally elsewhere by _set_finish_reasons_unconditionally(); update _set_generations_span_attributes to not allocate or append to finish_reasons (only build output_messages and set the span attribute with json.dumps(output_messages)).
548-562: Remove dead code: `finish_reasons` list is unused. Same pattern as `_set_generations_span_attributes`: the `finish_reasons` list is built but never used since the attribute is set by `_set_finish_reasons_unconditionally()`.

♻️ Proposed fix

```diff
 def _set_span_completions_attributes(span, response_body):
     output_messages = []
-    finish_reasons = []
     for completion in response_body.get("completions"):
         fr_data = completion.get("finishReason", {})
         raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data) or None
         fr = _map_finish_reason(raw_reason)
-        if fr:
-            finish_reasons.append(fr)
         output_messages.append(_output_message("assistant", [_text_part(completion.get("data").get("text"))], fr))
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps(output_messages),
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 548 - 562, In _set_span_completions_attributes remove the dead finish_reasons aggregation: delete the finish_reasons = [] declaration and the finish_reasons.append(fr) call inside the loop (we still need to call _map_finish_reason(raw_reason) to produce fr and pass fr into _output_message), since finish_reasons is never used and finish reasons are set elsewhere by _set_finish_reasons_unconditionally(); keep the rest of the loop building output_messages and the subsequent _set_span_attribute call for GenAIAttributes.GEN_AI_OUTPUT_MESSAGES.
704-716: Remove dead code: `finish_reasons` list is unused in the `"results"` branch. Same pattern as the other response handlers: `finish_reasons` is populated but never used.

♻️ Proposed fix

```diff
 def _set_amazon_response_span_attributes(span, response_body):
     if "results" in response_body:
         output_messages = []
-        finish_reasons = []
         for result in response_body.get("results"):
             fr = _map_finish_reason(result.get("completionReason"))
-            finish_reasons.append(fr)
             output_messages.append(_output_message("assistant", [_text_part(result.get("outputText"))], fr))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 704 - 716, In _set_amazon_response_span_attributes remove the unused finish_reasons list and its population: eliminate the variable finish_reasons and the line that appends fr (finish_reasons.append(fr)), but keep mapping the completion reason via _map_finish_reason for constructing output_messages; ensure output_messages is still built by calling _output_message("assistant", [_text_part(result.get("outputText"))], fr) and that only output_messages is JSON-dumped into the GenAIAttributes.GEN_AI_OUTPUT_MESSAGES span attribute.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/sample-app/sample_app/bedrock_example_app.py`:
- Around line 7-8: The sample app currently hardcodes a local AWS profile via
session = boto3.Session(profile_name="stg") which breaks CI, containers, and
IAM-role environments; replace this by relying on the default AWS
credential/provider chain (i.e., instantiate boto3.Session() with no
profile_name) or read the profile from the AWS_PROFILE environment variable
before passing it (e.g., profile = os.getenv("AWS_PROFILE");
boto3.Session(profile_name=profile) only when profile is set), and keep creating
the client with brt = session.client(service_name='bedrock-runtime') as before.
---
Outside diff comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py`:
- Around line 68-79: The code is calling .get("reason") on response_body
completions' finishReason which may already be a string; update the ChoiceEvent
creation in the loop that emits completions so _map_finish_reason is passed the
correct value: inspect message.get("finishReason") and if it's a dict use
finishReason.get("reason") otherwise use the finishReason value (or None) before
calling _map_finish_reason; adjust the enumeration loop around ChoiceEvent and
the argument to _map_finish_reason accordingly so both dict and string
finishReason cases are handled safely.
---
Duplicate comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py`:
- Line 75: The debug statement is logging raw streamed payloads via
logger.debug("Accumulating body: %s", self._accumulating_body) which leaks
sensitive content; update the logging in the streaming wrapper (where
self._accumulating_body is used—e.g., StreamingWrapper or the method handling
chunks) to avoid printing the full payload and instead log non-sensitive
metadata such as the buffer length or a masked/hashed summary (for example, use
logger.debug("Accumulating body length=%d", len(self._accumulating_body)) or log
a fixed placeholder), removing any direct inclusion of self._accumulating_body
in logs.
- Around line 52-60: In the "content_block_delta" branch ensure the target field
is a string before concatenating: when handling delta.get("type") ==
"input_json_delta" fetch partial_json and the current block via current =
self._accumulating_body["content"][-1], then guard current["input"] so it is a
string (e.g., if the key is missing or not isinstance(current["input"], str) set
it to "" or coerce with str()) before doing current["input"] += partial_json;
update the logic in the same block where partial_json is appended so
dict-plus-str errors are avoided and accumulation works for tool-use blocks.
---
Nitpick comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 358-369: In _set_generations_span_attributes, remove the unused
finish_reasons list and the finish_reasons.append(fr) call (leave the local fr
variable and its usage in _output_message intact) because finish reasons are set
unconditionally elsewhere by _set_finish_reasons_unconditionally(); update
_set_generations_span_attributes to not allocate or append to finish_reasons
(only build output_messages and set the span attribute with
json.dumps(output_messages)).
- Around line 548-562: In _set_span_completions_attributes remove the dead
finish_reasons aggregation: delete the finish_reasons = [] declaration and the
finish_reasons.append(fr) call inside the loop (we still need to call
_map_finish_reason(raw_reason) to produce fr and pass fr into _output_message),
since finish_reasons is never used and finish reasons are set elsewhere by
_set_finish_reasons_unconditionally(); keep the rest of the loop building
output_messages and the subsequent _set_span_attribute call for
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES.
- Around line 704-716: In _set_amazon_response_span_attributes remove the unused
finish_reasons list and its population: eliminate the variable finish_reasons
and the line that appends fr (finish_reasons.append(fr)), but keep mapping the
completion reason via _map_finish_reason for constructing output_messages;
ensure output_messages is still built by calling _output_message("assistant",
[_text_part(result.get("outputText"))], fr) and that only output_messages is
JSON-dumped into the GenAIAttributes.GEN_AI_OUTPUT_MESSAGES span attribute.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8d8d7e5d-9d5b-4ab7-8540-88fe1288cbf2
⛔ Files ignored due to path filters (1)
`packages/opentelemetry-instrumentation-bedrock/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (23)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_models.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_prompt_caching_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_content_parts.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_event_emitter.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_review_fixes.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
- packages/sample-app/sample_app/bedrock_example_app.py
✅ Files skipped from review due to trivial changes (3)
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_content_parts.py
🚧 Files skipped from review as they are similar to previous changes (11)
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_prompt_caching_metrics.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_models.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_event_emitter.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
```python
session = boto3.Session(profile_name="stg")
brt = session.client(service_name='bedrock-runtime')
```
Don’t hardcode a developer-local AWS profile in the sample app.
Lines 7-8 make this example fail anywhere the stg profile is missing, including CI, containers, and IAM-role based environments. Prefer the default credential chain, or read the profile from AWS_PROFILE instead of pinning it in code.
💡 Suggested change

```diff
+import os
 import boto3
@@
-session = boto3.Session(profile_name="stg")
-brt = session.client(service_name='bedrock-runtime')
+session = boto3.Session(profile_name=os.getenv("AWS_PROFILE"))
+brt = session.client(service_name="bedrock-runtime")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
import os
import boto3

session = boto3.Session(profile_name=os.getenv("AWS_PROFILE"))
brt = session.client(service_name="bedrock-runtime")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/sample-app/sample_app/bedrock_example_app.py` around lines 7 - 8,
The sample app currently hardcodes a local AWS profile via session =
boto3.Session(profile_name="stg") which breaks CI, containers, and IAM-role
environments; replace this by relying on the default AWS credential/provider
chain (i.e., instantiate boto3.Session() with no profile_name) or read the
profile from the AWS_PROFILE environment variable before passing it (e.g.,
profile = os.getenv("AWS_PROFILE"); boto3.Session(profile_name=profile) only
when profile is set), and keep creating the client with brt =
session.client(service_name='bedrock-runtime') as before.
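The credential-chain fallback described above reduces to a conditional set of session kwargs. A sketch that isolates just that decision (no AWS call is made here; `session_kwargs` is an illustrative helper name, and the boto3 usage in the comment is an assumption about how it would be wired in):

```python
import os


def session_kwargs() -> dict:
    """Pass profile_name only when AWS_PROFILE is set; otherwise let
    boto3's default credential/provider chain resolve credentials."""
    profile = os.getenv("AWS_PROFILE")
    return {"profile_name": profile} if profile else {}


# With no AWS_PROFILE set, boto3.Session(**session_kwargs()) behaves like
# boto3.Session() and works in CI, containers, and IAM-role environments.
os.environ.pop("AWS_PROFILE", None)
assert session_kwargs() == {}

os.environ["AWS_PROFILE"] = "stg"
assert session_kwargs() == {"profile_name": "stg"}
```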
Force-pushed 0e8bd63 to 1b94be4 (compare)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)
411-415: ⚠️ Potential issue | 🔴 Critical

Streaming implementation doesn't extract `finish_reason` for Titan responses.

The `test_titan_invoke_stream_with_events_with_content` test is missing `finish_reason` in the `choice_event` assertion (lines 411-415), but this reflects an underlying implementation issue rather than a test gap.

The problem: `emit_streaming_response_event()` in `event_emitter.py` (line 194) only checks for `response_body.get("stop_reason")`, but Titan streaming responses use `completionReason` or `stopReason` (not `stop_reason`). This causes `_map_finish_reason(None)` to be called, resulting in `None` for the finish_reason field.

In contrast, the non-streaming path correctly handles Titan via `_set_finish_reasons_unconditionally()` (span_utils.py lines 227-240), which is why non-streaming Titan tests include `"finish_reason": "stop"` (line 168).

Fix the streaming implementation to be model-aware when extracting finish_reason, similar to how the span attributes handler works.
_set_finish_reasons_unconditionally()(span_utils.py lines 227-240), which is why non-streaming Titan tests include"finish_reason": "stop"(line 168).Fix the streaming implementation to be model-aware when extracting finish_reason, similar to how the span attributes handler works.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py` around lines 411 - 415, The streaming path is not model-aware when extracting finish_reason in emit_streaming_response_event; update emit_streaming_response_event to extract finish_reason the same way span_utils._set_finish_reasons_unconditionally does for non-streaming flows instead of only checking response_body.get("stop_reason"): detect Titan-style keys ("completionReason", "stopReason") or consult the model identifier and call the same mapping logic used by _map_finish_reason/_set_finish_reasons_unconditionally so that _map_finish_reason is given the correct raw token (e.g., "stop" for Titan) before emitting the choice event.
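The model-aware extraction the prompt asks for amounts to checking every provider-specific key before mapping. A sketch under the assumption that checking the three known key spellings suffices (the helper name `raw_finish_reason` and the key tuple are illustrative, not the actual implementation):

```python
_FINISH_REASON_KEYS = ("stop_reason", "stopReason", "completionReason")


def raw_finish_reason(response_body: dict):
    """Return the first provider-specific finish-reason token present,
    covering Anthropic (stop_reason), Converse (stopReason), and Titan
    (completionReason) payload shapes."""
    for key in _FINISH_REASON_KEYS:
        value = response_body.get(key)
        if value is not None:
            return value
    return None


assert raw_finish_reason({"stop_reason": "end_turn"}) == "end_turn"    # Anthropic-style
assert raw_finish_reason({"completionReason": "FINISH"}) == "FINISH"   # Titan-style
assert raw_finish_reason({"outputText": "hi"}) is None                 # no reason emitted
```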
♻️ Duplicate comments (4)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py (2)
75-75: ⚠️ Potential issue | 🟠 Major

Don't log the raw accumulated streamed payload.
This writes model output and tool inputs into application logs on every chunk. That is both a privacy leak and an easy way to create huge log lines on long streams.
💡 Safer logging
```diff
-        logger.debug("Accumulating body: %s", self._accumulating_body)
+        logger.debug(
+            "Accumulating Bedrock stream chunk: accumulated_keys=%s",
+            list(self._accumulating_body.keys()),
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py` at line 75, The debug log currently prints the raw accumulated stream payload (self._accumulating_body), which can leak sensitive model/tool data and create huge log entries; change the logger.debug call in streaming_wrapper.py to avoid logging raw content—log a safe summary instead (e.g., the length via len(self._accumulating_body) or a truncated/masked excerpt), or remove the debug entirely, updating the logger.debug usage that references self._accumulating_body accordingly.
52-60: ⚠️ Potential issue | 🟠 Major

Handle streamed tool JSON and reasoning deltas without type errors.

Line 60 can still hit `dict += str` when the current block already has a structured `input`, and this branch still drops reasoning/thinking deltas entirely. Because `_process_event()` is wrapped in `@dont_throw`, those failures degrade the emitted payload silently instead of surfacing.

💡 Safer delta accumulation
```diff
 elif type == "content_block_delta":
     delta = decoded_chunk.get("delta", {})
+    current = self._accumulating_body["content"][-1]
     if delta.get("text") is not None:
-        self._accumulating_body["content"][-1]["text"] += delta["text"]
+        if not isinstance(current.get("text"), str):
+            current["text"] = ""
+        current["text"] += delta["text"]
     elif delta.get("type") == "input_json_delta":
         partial_json = delta.get("partial_json", "")
-        current = self._accumulating_body["content"][-1]
-        current.setdefault("input", "")
+        if not isinstance(current.get("input"), str):
+            current["input"] = ""
         current["input"] += partial_json
+    elif delta.get("type") == "thinking_delta":
+        if not isinstance(current.get("thinking"), str):
+            current["thinking"] = ""
+        current["thinking"] += delta.get("thinking", "")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py` around lines 52 - 60, The branch handling type == "content_block_delta" can perform dict += str and drops reasoning-style deltas; update the input_json_delta handling in the same block (the accumulation that touches self._accumulating_body["content"][-1] and current["input"]) to: detect whether current.get("input") is a dict or string, try to json.loads(partial_json) and if that succeeds merge (update) into the dict, otherwise append the raw partial_json to a string field (or to a dedicated fallback key) so you never do dict += str; also preserve and accumulate non-input deltas (e.g., reasoning/thinking) by handling delta.get("type") values other than "input_json_delta" instead of dropping them; wrap parsing/merging in a narrow try/except so failures are handled locally and do not perform an invalid in-place addition (affecting _process_event via dont_throw).

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (2)
724-734: ⚠️ Potential issue | 🟠 Major

Preserve structured Converse output instead of flattening/splitting it.

`response_body["output"]["message"]` is already a single Bedrock message, but lines 726-734 split it into one assistant message per block and drop non-text blocks. Lines 1003-1006 go the other direction and flatten streaming output into a single text part. Both paths break `gen_ai.output.messages`, so spans can report `finish_reason="tool_call"` while emitting no tool-call payload.

💡 Suggested direction
```diff
 elif "output" in response_body:
     fr = _map_finish_reason(response_body.get("stopReason"))
-    msgs = response_body.get("output").get("message", {}).get("content", [])
-    output_messages = []
-    for msg in msgs:
-        output_messages.append(_output_message("assistant", [_text_part(msg.get("text"))], fr))
+    message = response_body.get("output", {}).get("message", {})
+    output_messages = [
+        _output_message(
+            message.get("role", "assistant"),
+            _converse_content_to_parts(message.get("content", [])),
+            fr,
+        )
+    ]
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps(output_messages),
     )
@@
-def set_converse_streaming_response_span_attributes(response, role, span, finish_reason=None):
+def set_converse_streaming_response_span_attributes(content_blocks, role, span, finish_reason=None):
     fr = _map_finish_reason(finish_reason)
     if fr:
         _set_span_attribute(span, GenAIAttributes.GEN_AI_RESPONSE_FINISH_REASONS, (fr,))
     if not should_send_prompts():
         return
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
-        json.dumps([_output_message(role, [_text_part("".join(response))], fr)]),
+        json.dumps([_output_message(role, _converse_content_to_parts(content_blocks), fr)]),
     )
```

The streaming caller also needs to buffer structured content blocks instead of a text-only list.
Based on learnings: Follow the OpenTelemetry GenAI semantic specification.
Also applies to: 997-1006
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 724 - 734, The code currently flattens and drops non-text blocks when building gen_ai.output.messages; instead, when handling response_body["output"]["message"] (in the branch that uses _map_finish_reason and _output_message) stop splitting into one assistant message per text block and preserve the original structured message content array (including non-text blocks) as the content field passed into _output_message; similarly change the streaming assembly logic (the streaming path around the _text_part usage) to buffer and concatenate structured content blocks rather than only text parts so that _set_span_attribute(…, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, …) receives the full structured message objects and finish_reason/tool_call payloads remain intact.
22-46: ⚠️ Potential issue | 🟠 Major

Normalize the remaining AWS finish reasons here.

AWS currently documents Converse `stopReason` values including `content_filtered`, and Titan Text `completionReason` values including `FINISHED`, `LENGTH`, `STOP_CRITERIA_MET`, and `CONTENT_FILTERED`. None of those normalize today, so `GEN_AI_RESPONSE_FINISH_REASONS` can still leak provider-specific tokens instead of semconv values. (docs.aws.amazon.com)

💡 Suggested normalization update
```diff
 BEDROCK_FINISH_REASON_MAP = {
     # Anthropic via Bedrock
     "end_turn": "stop",
     "stop_sequence": "stop",
     "tool_use": "tool_call",
     "max_tokens": "length",
     # Cohere via Bedrock
     "COMPLETE": "stop",
     "TOOL_CALL": "tool_call",
     "MAX_TOKENS": "length",
     "ERROR": "error",
     # Amazon Titan
     "FINISH": "stop",
+    "FINISHED": "stop",
+    "LENGTH": "length",
+    "STOP_CRITERIA_MET": "stop",
+    "CONTENT_FILTERED": "content_filter",
     # Converse API
     "guardrail_intervened": "content_filter",
+    "content_filtered": "content_filter",
 }


 def _map_finish_reason(reason):
     """Map provider-specific finish reason to OTel GenAI enum value."""
     if not reason:
         return None
-    return BEDROCK_FINISH_REASON_MAP.get(reason, reason)
+    normalized = str(reason)
+    return (
+        BEDROCK_FINISH_REASON_MAP.get(normalized)
+        or BEDROCK_FINISH_REASON_MAP.get(normalized.lower())
+        or BEDROCK_FINISH_REASON_MAP.get(normalized.upper())
+        or normalized
+    )
```

Based on learnings: Follow the OpenTelemetry GenAI semantic specification.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 22 - 46, Add the missing AWS/Titan/Converse finish-reason normalizations to BEDROCK_FINISH_REASON_MAP so provider tokens map to OTel GenAI semconv values (use keys like "content_filtered" and "CONTENT_FILTERED" -> "content_filter"; "FINISHED" and "STOP_CRITERIA_MET" -> "stop"; "LENGTH" -> "length"), leaving _map_finish_reason intact (it already uses BEDROCK_FINISH_REASON_MAP.get). Ensure you include both case variants used by different providers and follow the OpenTelemetry GenAI semantic spec for the target values.
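The case-insensitive lookup proposed above can be exercised in isolation. A self-contained sketch (the map here is a trimmed, hypothetical subset of the full production table):

```python
FINISH_REASON_MAP = {
    "end_turn": "stop",
    "max_tokens": "length",
    "FINISHED": "stop",
    "CONTENT_FILTERED": "content_filter",
    "content_filtered": "content_filter",
}


def map_finish_reason(reason):
    """Normalize a provider token to an OTel GenAI value, trying exact,
    lowercase, and uppercase spellings before falling back to the raw token."""
    if not reason:
        return None
    normalized = str(reason)
    return (
        FINISH_REASON_MAP.get(normalized)
        or FINISH_REASON_MAP.get(normalized.lower())
        or FINISH_REASON_MAP.get(normalized.upper())
        or normalized
    )


assert map_finish_reason("end_turn") == "stop"
assert map_finish_reason("content_filtered") == "content_filter"
assert map_finish_reason("Finished") == "stop"                # case-insensitive fallback
assert map_finish_reason("unknown_token") == "unknown_token"  # unmapped tokens pass through
assert map_finish_reason(None) is None
```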
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py`:
- Around line 411-415: The streaming path is not model-aware when extracting
finish_reason in emit_streaming_response_event; update
emit_streaming_response_event to extract finish_reason the same way
span_utils._set_finish_reasons_unconditionally does for non-streaming flows
instead of only checking response_body.get("stop_reason"): detect Titan-style
keys ("completionReason", "stopReason") or consult the model identifier and call
the same mapping logic used by
_map_finish_reason/_set_finish_reasons_unconditionally so that
_map_finish_reason is given the correct raw token (e.g., "stop" for Titan)
before emitting the choice event.
---
Duplicate comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 724-734: The code currently flattens and drops non-text blocks
when building gen_ai.output.messages; instead, when handling
response_body["output"]["message"] (in the branch that uses _map_finish_reason
and _output_message) stop splitting into one assistant message per text block
and preserve the original structured message content array (including non-text
blocks) as the content field passed into _output_message; similarly change the
streaming assembly logic (the streaming path around the _text_part usage) to
buffer and concatenate structured content blocks rather than only text parts so
that _set_span_attribute(…, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, …) receives
the full structured message objects and finish_reason/tool_call payloads remain
intact.
- Around line 22-46: Add the missing AWS/Titan/Converse finish-reason
normalizations to BEDROCK_FINISH_REASON_MAP so provider tokens map to OTel GenAI
semconv values (use keys like "content_filtered" and "CONTENT_FILTERED" ->
"content_filter"; "FINISHED" and "STOP_CRITERIA_MET" -> "stop"; "LENGTH" ->
"length"), leaving _map_finish_reason intact (it already uses
BEDROCK_FINISH_REASON_MAP.get). Ensure you include both case variants used by
different providers and follow the OpenTelemetry GenAI semantic spec for the
target values.
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py`:
- Line 75: The debug log currently prints the raw accumulated stream payload
(self._accumulating_body), which can leak sensitive model/tool data and create
huge log entries; change the logger.debug call in streaming_wrapper.py to avoid
logging raw content—log a safe summary instead (e.g., the length via
len(self._accumulating_body) or a truncated/masked excerpt), or remove the debug
entirely, updating the logger.debug usage that references
self._accumulating_body accordingly.
- Around line 52-60: The branch handling type == "content_block_delta" can
perform dict += str and drops reasoning-style deltas; update the
input_json_delta handling in the same block (the accumulation that touches
self._accumulating_body["content"][-1] and current["input"]) to: detect whether
current.get("input") is a dict or string, try to json.loads(partial_json) and if
that succeeds merge (update) into the dict, otherwise append the raw
partial_json to a string field (or to a dedicated fallback key) so you never do
dict += str; also preserve and accumulate non-input deltas (e.g.,
reasoning/thinking) by handling delta.get("type") values other than
"input_json_delta" instead of dropping them; wrap parsing/merging in a narrow
try/except so failures are handled locally and do not perform an invalid
in-place addition (affecting _process_event via dont_throw).
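To see why the accumulation guard above matters: a tool-use block arrives with a structured (dict) `input`, so a bare `current["input"] += partial_json` raises `TypeError`. A stripped-down sketch of the guarded version (block shapes mirror Anthropic-style stream events; the function name is illustrative):

```python
def accumulate_input_delta(block: dict, partial_json: str) -> None:
    """Append streamed partial JSON to the block's `input`, coercing it to
    a string first so a pre-existing dict never hits `dict += str`."""
    if not isinstance(block.get("input"), str):
        block["input"] = ""
    block["input"] += partial_json


# A fresh tool_use block starts with a structured (dict) input...
block = {"type": "tool_use", "input": {}}
accumulate_input_delta(block, '{"city": ')
accumulate_input_delta(block, '"Paris"}')
# ...yet accumulation still works instead of raising TypeError.
assert block["input"] == '{"city": "Paris"}'
```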
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2192636d-7915-4d05-b45a-c6a0eedd5691
⛔ Files ignored due to path filters (1)
`packages/opentelemetry-instrumentation-bedrock/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (23)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_models.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/streaming_wrapper.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_prompt_caching_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_content_parts.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_event_emitter.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_review_fixes.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
- packages/sample-app/sample_app/bedrock_example_app.py
✅ Files skipped from review due to trivial changes (5)
- packages/opentelemetry-instrumentation-bedrock/tests/test_semconv.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_prompt_caching_metrics.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_models.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_event_emitter.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_review_fixes.py
🚧 Files skipped from review as they are similar to previous changes (6)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
- packages/sample-app/sample_app/bedrock_example_app.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/event_emitter.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (1)
619-641: ⚠️ Potential issue | 🟠 Major

Use `chat` for Amazon/Nova message-based `invoke_model` spans.

This code unconditionally emits `gen_ai.operation.name = text_completion` even when the request is message-based (contains a `messages` key with role/content pairs). Role-based chat requests should use `chat` per the OpenTelemetry GenAI semantic specification, consistent with existing Nova test assertions and other instrumentations in the codebase.

Suggested fix
```diff
 def _set_amazon_span_attributes(
     span, request_body, response_body, headers, metric_params
 ):
+    operation_name = (
+        GenAiOperationNameValues.CHAT.value
+        if "messages" in request_body
+        else GenAiOperationNameValues.TEXT_COMPLETION.value
+    )
     _set_span_attribute(
-        span, GenAIAttributes.GEN_AI_OPERATION_NAME, GenAiOperationNameValues.TEXT_COMPLETION.value
+        span, GenAIAttributes.GEN_AI_OPERATION_NAME, operation_name
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 619 - 641, Change the unconditional setting of GenAIAttributes.GEN_AI_OPERATION_NAME to choose "chat" when the request_body contains a messages-based request: inspect request_body for a "messages" key (and/or role/content structure) before calling _set_span_attribute(span, GenAIAttributes.GEN_AI_OPERATION_NAME, ...); if messages are present set the value to the chat enum (instead of GenAiOperationNameValues.TEXT_COMPLETION.value), otherwise keep TEXT_COMPLETION; keep the existing handling of textGenerationConfig/inferenceConfig for topP/temperature/max tokens unchanged.
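The request-shape check described above reduces to a single conditional. Sketched here with plain strings rather than the semconv enum, under the assumption (as in the suggested fix) that the presence of a "messages" key is the discriminator:

```python
def derive_operation_name(request_body: dict) -> str:
    """Message-based (role/content) requests are chat; bare-prompt requests
    are text completions, per the OTel GenAI operation-name convention."""
    return "chat" if "messages" in request_body else "text_completion"


# Nova-style message request vs. Titan-style bare-prompt request.
assert derive_operation_name({"messages": [{"role": "user", "content": "hi"}]}) == "chat"
assert derive_operation_name({"inputText": "hi"}) == "text_completion"
```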
♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (2)
726-736: ⚠️ Potential issue | 🟠 Major

Keep `output.message` as one assistant message with multiple parts.

`response_body["output"]["message"]` is already a single Bedrock message. Splitting each content block into a separate `GEN_AI_OUTPUT_MESSAGES` entry breaks message boundaries, and any non-text block here becomes a bogus text part because only `msg.get("text")` is read.

💡 Suggested fix
```diff
 elif "output" in response_body:
     fr = _map_finish_reason(response_body.get("stopReason"))
-    msgs = response_body.get("output").get("message", {}).get("content", [])
-    output_messages = []
-    for msg in msgs:
-        output_messages.append(_output_message("assistant", [_text_part(msg.get("text"))], fr))
+    message = response_body.get("output", {}).get("message", {})
+    output_messages = [
+        _output_message(
+            message.get("role", "assistant"),
+            _converse_content_to_parts(message.get("content", [])),
+            fr,
+        )
+    ]
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps(output_messages),
     )
```
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 726 - 736, The code incorrectly turns each entry of response_body["output"]["message"]["content"] into its own GEN_AI_OUTPUT_MESSAGES entry; instead, keep the entire response_body["output"]["message"] as a single assistant message with multiple parts: collect parts by iterating msgs and mapping each content block into a part (use _text_part(msg.get("text")) for text blocks and preserve/extend with the existing part-creation logic for non-text blocks), then call _output_message("assistant", parts, fr) once and pass json.dumps(of that single message) to _set_span_attribute (refer to response_body, msgs, _text_part, _output_message, _set_span_attribute, and GenAIAttributes.GEN_AI_OUTPUT_MESSAGES).
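The intended shape — one assistant message whose parts mirror the original content blocks — can be sketched in isolation. The part-builder below follows the direction of the diff above but is a simplified, hypothetical stand-in, not the actual `_converse_content_to_parts`:

```python
import json


def converse_content_to_parts(content_blocks: list) -> list:
    """Map each Converse content block to a typed part, preserving
    non-text blocks instead of coercing everything to text."""
    parts = []
    for block in content_blocks:
        if "text" in block:
            parts.append({"type": "text", "content": block["text"]})
        elif "toolUse" in block:
            parts.append({"type": "tool_call", **block["toolUse"]})
        else:
            parts.append(block)  # pass through unknown block types untouched
    return parts


message = {
    "role": "assistant",
    "content": [
        {"text": "Calling a tool."},
        {"toolUse": {"toolUseId": "t1", "name": "get_weather", "input": {"city": "Paris"}}},
    ],
}
out = [{
    "role": message["role"],
    "parts": converse_content_to_parts(message["content"]),
    "finish_reason": "tool_call",
}]
assert len(out) == 1                            # one message, not one per block
assert out[0]["parts"][1]["type"] == "tool_call"  # tool-call payload survives
json.dumps(out)                                 # serializable for the span attribute
```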
217-223: ⚠️ Potential issue | 🟡 Minor

Don't turn missing AI21 finish reasons into `"None"`.

Both AI21 extraction sites stringify non-dict values before mapping. If Bedrock returns `finishReason: null`, that becomes `"None"` and gets emitted as a bogus semconv value instead of being omitted.

💡 Suggested fix
```diff
 elif model_vendor == "ai21":
     for comp in response_body.get("completions", []):
         fr_data = comp.get("finishReason", {})
-        raw = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data)
+        raw = fr_data.get("reason") if isinstance(fr_data, dict) else fr_data
         fr = _map_finish_reason(raw)
         if fr:
             finish_reasons.append(fr)
@@
 for completion in response_body.get("completions"):
     fr_data = completion.get("finishReason", {})
-    raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data) or None
+    raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else fr_data
     fr = _map_finish_reason(raw_reason)
     if fr:
         finish_reasons.append(fr)
```

Also applies to: 553-559
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 217 - 223, The code in the AI21 branch (the block using variables fr_data, raw, _map_finish_reason and finish_reasons.append) stringifies non-dict finishReason values, turning null into "None" and producing a bogus semconv; change the logic so you treat None/null as absent: compute raw as fr_data.get("reason") when fr_data is a dict, set raw = None when fr_data is None, and only call _map_finish_reason and append to finish_reasons when raw is not None (and mapping returns a value). Apply the same fix to the other AI21 extraction site that mirrors this logic.
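A hedged sketch of the null-safe extraction both AI21 sites should converge on. `_map_finish_reason` below is a simplified stand-in for the PR's mapper, included only so the None-handling is runnable; its mapping table is an assumption.

```python
# Null-safe finishReason extraction: a null value is treated as absent,
# never stringified to "None".

def _map_finish_reason(raw):
    # Stand-in mapper; the PR's real table may differ.
    mapping = {"endoftext": "stop", "stop": "stop", "length": "length"}
    return mapping.get(raw, raw) if raw is not None else None

def extract_finish_reasons(response_body):
    finish_reasons = []
    for comp in response_body.get("completions", []):
        fr_data = comp.get("finishReason")
        if isinstance(fr_data, dict):
            raw = fr_data.get("reason")
        else:
            raw = fr_data  # None stays None; no str() coercion
        if raw is None:
            continue  # omit the attribute rather than emit "None"
        fr = _map_finish_reason(raw)
        if fr:
            finish_reasons.append(fr)
    return finish_reasons
```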
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 242-245: For the imported_model branch, also inspect
choices[0]["finish_reason"] when top-level response_body.get("stop_reason") is
absent: call _map_finish_reason on response_body.get("choices",
[{}])[0].get("finish_reason") and append that mapped value to finish_reasons
(same as you do for stop_reason) so the imported_model path captures
OpenAI-style finish reasons; update the code paths around model_vendor ==
"imported_model" and ensure _map_finish_reason is invoked for both stop_reason
and choices[0]["finish_reason"] before appending to finish_reasons.
---
Outside diff comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 619-641: Change the unconditional setting of
GenAIAttributes.GEN_AI_OPERATION_NAME to choose "chat" when the request_body
contains a messages-based request: inspect request_body for a "messages" key
(and/or role/content structure) before calling _set_span_attribute(span,
GenAIAttributes.GEN_AI_OPERATION_NAME, ...); if messages are present set the
value to the chat enum (instead of
GenAiOperationNameValues.TEXT_COMPLETION.value), otherwise keep TEXT_COMPLETION;
keep the existing handling of textGenerationConfig/inferenceConfig for
topP/temperature/max tokens unchanged.
---
Duplicate comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 726-736: The code incorrectly turns each entry of
response_body["output"]["message"]["content"] into its own
GEN_AI_OUTPUT_MESSAGES entry; instead, keep the entire
response_body["output"]["message"] as a single assistant message with multiple
parts: collect parts by iterating msgs and mapping each content block into a
part (use _text_part(msg.get("text")) for text blocks and preserve/extend with
the existing part-creation logic for non-text blocks), then call
_output_message("assistant", parts, fr) once and pass json.dumps(of that single
message) to _set_span_attribute (refer to response_body, msgs, _text_part,
_output_message, _set_span_attribute, and
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES).
- Around line 217-223: The code in the AI21 branch (the block using variables
fr_data, raw, _map_finish_reason and finish_reasons.append) stringifies non-dict
finishReason values, turning null into "None" and producing a bogus semconv;
change the logic so you treat None/null as absent: compute raw as
fr_data.get("reason") when fr_data is a dict, set raw = None when fr_data is
None, and only call _map_finish_reason and append to finish_reasons when raw is
not None (and mapping returns a value). Apply the same fix to the other AI21
extraction site that mirrors this logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2d651691-6317-4835-80c4-5241f537f6e3
📒 Files selected for processing (12)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_finish_reasons.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/__init__.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
🚧 Files skipped from review as they are similar to previous changes (4)
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
Actionable comments posted: 1
🧹 Nitpick comments (4)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (3)
363-375: Remove unused `finish_reasons` list in `_set_generations_span_attributes`.

The `finish_reasons` list (lines 365, 368) is built but never used or written to the span. Finish reasons are already set by `_set_finish_reasons_unconditionally()`, called earlier in `set_model_choice_span_attributes()`. This dead code can be removed for clarity.

♻️ Suggested cleanup
```diff
 def _set_generations_span_attributes(span, response_body):
     output_messages = []
-    finish_reasons = []
     for generation in response_body.get("generations"):
         fr = _map_finish_reason(generation.get("finish_reason"))
-        finish_reasons.append(fr)
         output_messages.append(_output_message("assistant", [_text_part(generation.get("text"))], fr))
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps(output_messages),
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 363 - 375, The function _set_generations_span_attributes builds an unused list finish_reasons; remove the declaration finish_reasons = [] and the finish_reasons.append(fr) call so only the mapped fr value is used to build output_messages (keep the _map_finish_reason(generation.get("finish_reason")) call and the output_messages.append(...) line), since finish reasons are handled by _set_finish_reasons_unconditionally() elsewhere.
729-739: Consider consolidating Amazon output content blocks into a single message.

This path creates separate output messages for each content block in `response_body["output"]["message"]["content"]`. While this may be intentional for Amazon's response format, the Converse API returns a single assistant message with multiple content blocks. Creating separate messages fragments the logical response structure. If Amazon Nova responses truly have one logical assistant turn, consolidate like the Converse path does:

💡 Optional refactor to preserve message boundaries
```diff
 elif "output" in response_body:
     fr = _map_finish_reason(response_body.get("stopReason"))
-    msgs = response_body.get("output").get("message", {}).get("content", [])
-    output_messages = []
-    for msg in msgs:
-        output_messages.append(_output_message("assistant", [_text_part(msg.get("text"))], fr))
+    message = response_body.get("output", {}).get("message", {})
+    parts = _converse_content_to_parts(message.get("content", []))
+    output_messages = [_output_message(message.get("role", "assistant"), parts, fr)]
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps(output_messages),
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 729 - 739, The code currently creates a separate output message for each content block in response_body["output"]["message"]["content"]; instead, consolidate into a single assistant message by mapping the finish reason via _map_finish_reason(response_body.get("stopReason")), converting each content block with _text_part(msg.get("text")) into a list of parts, and building one _output_message("assistant", parts_list, fr) (rather than appending per-msg messages), then json.dumps that single output message and pass it to _set_span_attribute(span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, ...); use the existing symbols _text_part, _output_message, _map_finish_reason, and _set_span_attribute to locate and implement the change.
554-567: Same dead code pattern: unused `finish_reasons` list.

Similar to `_set_generations_span_attributes`, the `finish_reasons` list here (lines 555, 561) is built but never used. Consider removing it for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 554 - 567, The local list finish_reasons is unused—remove its declaration and the finish_reasons.append(fr) call while keeping the existing computation of fr (via fr_data/raw_reason and _map_finish_reason) because fr is passed into _output_message; update the loop in response_body.get("completions") to no longer build finish_reasons, leaving output_messages construction and the subsequent _set_span_attribute(span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, json.dumps(output_messages)) intact.

packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py (1)
1021-1046: The `_handle_converse` import is stable and correctly located in the module.

The test's import from `opentelemetry.instrumentation.bedrock` is valid—`_handle_converse` is defined at line 397 of the `__init__.py` file. While testing private functions is acceptable for validating internal behavior, consider adding a comment to the test method documenting that it intentionally tests internal implementation details, to clarify the testing approach for future maintainers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py` around lines 1021 - 1046, The test imports and exercises the private function _handle_converse directly; add a short clarifying comment above the test_converse_event_mode_sets_finish_reasons test that states this is an intentional test of internal implementation (testing a private symbol) to validate behavior (e.g., finish_reasons being set) and to inform future maintainers that the import from opentelemetry.instrumentation.bedrock is deliberate; reference the private symbol _handle_converse and the test name in the comment so the intent is explicit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 553-568: The finishReason extraction in
_set_span_completions_attributes can turn an explicit None into the string
"None"; change the raw_reason computation to explicitly treat None as missing
(e.g., if isinstance(fr_data, dict): raw_reason = fr_data.get("reason") else:
raw_reason = None if fr_data is None else str(fr_data)) and/or normalize the
result by setting raw_reason = None when raw_reason == "None" before calling
_map_finish_reason; update the raw_reason logic inside
_set_span_completions_attributes accordingly so _map_finish_reason never
receives the literal "None".
---
Nitpick comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 363-375: The function _set_generations_span_attributes builds an
unused list finish_reasons; remove the declaration finish_reasons = [] and the
finish_reasons.append(fr) call so only the mapped fr value is used to build
output_messages (keep the _map_finish_reason(generation.get("finish_reason"))
call and the output_messages.append(...) line), since finish reasons are handled
by _set_finish_reasons_unconditionally() elsewhere.
- Around line 729-739: The code currently creates a separate output message for
each content block in response_body["output"]["message"]["content"]; instead,
consolidate into a single assistant message by mapping the finish reason via
_map_finish_reason(response_body.get("stopReason")), converting each content
block with _text_part(msg.get("text")) into a list of parts, and building one
_output_message("assistant", parts_list, fr) (rather than appending per-msg
messages), then json.dumps that single output message and pass it to
_set_span_attribute(span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, ...); use the
existing symbols _text_part, _output_message, _map_finish_reason, and
_set_span_attribute to locate and implement the change.
- Around line 554-567: The local list finish_reasons is unused—remove its
declaration and the finish_reasons.append(fr) call while keeping the existing
computation of fr (via fr_data/raw_reason and _map_finish_reason) because fr is
passed into _output_message; update the loop in response_body.get("completions")
to no longer build finish_reasons, leaving output_messages construction and the
subsequent _set_span_attribute( span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
json.dumps(output_messages) ) intact.
In
`@packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py`:
- Around line 1021-1046: The test imports and exercises the private function
_handle_converse directly; add a short clarifying comment above the
test_converse_event_mode_sets_finish_reasons test that states this is an
intentional test of internal implementation (testing a private symbol) to
validate behavior (e.g., finish_reasons being set) and to inform future
maintainers that the import from opentelemetry.instrumentation.bedrock is
deliberate; reference the private symbol _handle_converse and the test name in
the comment so the intent is explicit.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4eb1fcc1-0341-4869-a347-295f507e590f
📒 Files selected for processing (6)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_content_parts.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_guardrails_metrics.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
```python
def _set_span_completions_attributes(span, response_body):
    for i, completion in enumerate(response_body.get("completions")):
        _set_span_attribute(
            span,
            f"{GenAIAttributes.GEN_AI_COMPLETION}.{i}.content",
            completion.get("data").get("text"),
        )
    output_messages = []
    finish_reasons = []
    for completion in response_body.get("completions"):
        fr_data = completion.get("finishReason", {})
        raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data) or None
        fr = _map_finish_reason(raw_reason)
        if fr:
            finish_reasons.append(fr)
        output_messages.append(_output_message("assistant", [_text_part(completion.get("data").get("text"))], fr))
    _set_span_attribute(
        span,
        GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
        json.dumps(output_messages),
    )
```
AI21 finish reason extraction can still emit "None" string literal.
Line 558 attempts to handle missing finishReason but has an edge case:

`raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data) or None`

If finishReason is explicitly None (not absent), `str(None)` produces `"None"`, and `"None" or None` evaluates to `"None"` (a truthy string). This would pass through `_map_finish_reason` and be emitted as-is.
💡 Suggested fix
```diff
 for completion in response_body.get("completions"):
     fr_data = completion.get("finishReason", {})
-    raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data) or None
+    raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else (fr_data if fr_data else None)
     fr = _map_finish_reason(raw_reason)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`
around lines 553 - 568, The finishReason extraction in
_set_span_completions_attributes can turn an explicit None into the string
"None"; change the raw_reason computation to explicitly treat None as missing
(e.g., if isinstance(fr_data, dict): raw_reason = fr_data.get("reason") else:
raw_reason = None if fr_data is None else str(fr_data)) and/or normalize the
result by setting raw_reason = None when raw_reason == "None" before calling
_map_finish_reason; update the raw_reason logic inside
_set_span_completions_attributes accordingly so _map_finish_reason never
receives the literal "None".
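The pitfall here is pure Python operator precedence and truthiness, easy to verify in isolation:

```python
# str(None) yields the truthy string "None", so the `or None` fallback
# on the right never fires; an explicit None check avoids the leak.
fr_data = None
buggy = str(fr_data) or None                        # "None" -- fallback skipped
fixed = None if fr_data is None else str(fr_data)   # None -- treated as missing
```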
Actionable comments posted: 1
♻️ Duplicate comments (3)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (3)
246-249: ⚠️ Potential issue | 🟠 Major

Read imported-model finish reasons from `choices[0]` too.

This still only inspects top-level `stop_reason`, so imported-model spans lose `GEN_AI_RESPONSE_FINISH_REASONS` and `output_messages[0]["finish_reason"]` when the response only carries `choices[0]["finish_reason"]`. The updated imported-model event tests already expect `"length"` from that field, so spans and logs diverge.

🛠️ Suggested fix
```diff
 elif model_vendor == "imported_model":
-    fr = _map_finish_reason(response_body.get("stop_reason"))
+    choices = response_body.get("choices") or []
+    raw_reason = response_body.get("stop_reason")
+    if raw_reason is None and choices:
+        raw_reason = choices[0].get("finish_reason")
+    fr = _map_finish_reason(raw_reason)
     if fr:
         finish_reasons.append(fr)
@@
 def _set_imported_model_response_span_attributes(span, response_body):
-    fr = _map_finish_reason(response_body.get("stop_reason"))
+    choices = response_body.get("choices") or []
+    raw_reason = response_body.get("stop_reason")
+    if raw_reason is None and choices:
+        raw_reason = choices[0].get("finish_reason")
+    fr = _map_finish_reason(raw_reason)
     content = response_body.get("generation")
-    if content is None and response_body.get("choices"):
-        content = response_body["choices"][0].get("text")
+    if content is None and choices:
+        content = choices[0].get("text")
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps([_output_message("assistant", [_text_part(content)], fr)]),
     )
```

Also applies to: 774-782
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 246 - 249, The imported_model branch only looks at top-level response_body.get("stop_reason"), so when finish reasons exist under choices[0]["finish_reason"] they are missed; update the branch handling model_vendor == "imported_model" (and the analogous block around the other occurrence) to also inspect response_body.get("choices", [])[0].get("finish_reason") (falling back to top-level if absent), pass that value into _map_finish_reason, and append the mapped result to finish_reasons and set output_messages[0]["finish_reason"] as appropriate so GEN_AI_RESPONSE_FINISH_REASONS and output_messages align with imported-model events.
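A minimal sketch of the fallback order described above, under a hypothetical helper name (`resolve_imported_model_reason`) that does not exist in the PR; it only illustrates the precedence of `stop_reason` over `choices[0]["finish_reason"]`:

```python
# Prefer top-level stop_reason; fall back to the OpenAI-style
# choices[0]["finish_reason"] when stop_reason is absent.
def resolve_imported_model_reason(response_body):
    choices = response_body.get("choices") or []
    raw = response_body.get("stop_reason")
    if raw is None and choices:
        raw = choices[0].get("finish_reason")
    return raw
```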
729-739: ⚠️ Potential issue | 🟠 Major

Keep `output.message` as one assistant message.

`response_body["output"]["message"]` is already a single Bedrock message. Splitting each content block into a separate `output_messages` entry breaks message boundaries, and `msg.get("text")` drops tool/media blocks that `_converse_content_to_parts()` already preserves in the converse path.

Based on learnings: the OTel GenAI semantic convention formal JSON schema for `gen_ai.output.messages` uses `{"role": "...", "parts": [...]}` format.

🛠️ Suggested fix
```diff
 elif "output" in response_body:
     fr = _map_finish_reason(response_body.get("stopReason"))
-    msgs = response_body.get("output").get("message", {}).get("content", [])
-    output_messages = []
-    for msg in msgs:
-        output_messages.append(_output_message("assistant", [_text_part(msg.get("text"))], fr))
+    message = response_body.get("output", {}).get("message", {})
+    output_messages = [
+        _output_message(
+            message.get("role", "assistant"),
+            _converse_content_to_parts(message.get("content", [])),
+            fr,
+        )
+    ]
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps(output_messages),
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 729 - 739, The code is splitting response_body["output"]["message"]["content"] into multiple messages and losing non-text parts; instead treat response_body["output"]["message"] as a single assistant message: call _converse_content_to_parts on response_body.get("output").get("message") (or equivalent to obtain parts), build one output message via _output_message("assistant", parts, fr) where fr = _map_finish_reason(response_body.get("stopReason")), and then call _set_span_attribute(span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, json.dumps([that_single_message])); remove the loop that iterates msgs and avoid using msg.get("text") so tool/media blocks are preserved.
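For reference, the `{"role": ..., "parts": [...]}` shape mentioned above can be illustrated as follows. The exact part field names (`type`, `content`, and the tool-call keys) are assumptions based on the incubating semconv schema, not copied from the PR:

```python
import json

# Illustrative gen_ai.output.messages payload: ONE message object whose
# parts list carries both a text block and a tool-call block.
message = {
    "role": "assistant",
    "parts": [
        {"type": "text", "content": "Checking the weather."},
        {
            "type": "tool_call",
            "id": "call_1",
            "name": "get_weather",
            "arguments": {"city": "Paris"},
        },
    ],
    "finish_reason": "tool_call",
}
# The span attribute value is the JSON-serialized list of messages.
serialized = json.dumps([message])
```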
221-227: ⚠️ Potential issue | 🟡 Minor

Keep AI21 `null` finish reasons unset.

If Bedrock returns `finishReason: null`, both sites coerce it into the literal `"None"` and emit that as a finish reason. Treat `None` as missing instead, so the attribute is omitted rather than carrying an invalid semconv value.

🛠️ Suggested fix
```diff
 elif model_vendor == "ai21":
     for comp in response_body.get("completions", []):
         fr_data = comp.get("finishReason", {})
-        raw = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data)
+        raw = (
+            fr_data.get("reason")
+            if isinstance(fr_data, dict)
+            else (None if fr_data is None else str(fr_data))
+        )
         fr = _map_finish_reason(raw)
         if fr:
             finish_reasons.append(fr)
@@
 def _set_span_completions_attributes(span, response_body):
     output_messages = []
     finish_reasons = []
     for completion in response_body.get("completions"):
         fr_data = completion.get("finishReason", {})
-        raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data) or None
+        raw_reason = (
+            fr_data.get("reason")
+            if isinstance(fr_data, dict)
+            else (None if fr_data is None else str(fr_data))
+        )
         fr = _map_finish_reason(raw_reason)
         if fr:
             finish_reasons.append(fr)
```

Also applies to: 553-562
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 221 - 227, When model_vendor == "ai21" (and the similar block at 553-562) skip finishReason values that are null instead of coercing them to the string "None": inspect comp.get("finishReason") into fr_data and if fr_data is None (or if it's a dict whose "reason" is None) simply continue/skip adding a finish reason; only call _map_finish_reason with a real non-null raw value and append its result when non-empty. This ensures finish_reasons and the _map_finish_reason call are only invoked for actual values and not for Bedrock's null finishReason.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 622-624: The code in _set_amazon_span_attributes unconditionally
sets GenAIAttributes.GEN_AI_OPERATION_NAME to
GenAiOperationNameValues.TEXT_COMPLETION.value which overwrites correct
operation names for Nova messages-v1 (chat) requests; change the logic to avoid
overwriting or to set the correct value based on the request type: in
_set_amazon_span_attributes inspect the incoming request/model/operation (e.g.,
check for "messages" or "messages-v1") and if it indicates a chat/messages call
set GEN_AI_OPERATION_NAME to GenAiOperationNameValues.CHAT.value (or set to
TEXT_COMPLETION only when the request clearly is a text-completion), otherwise
do not overwrite an existing GEN_AI_OPERATION_NAME attribute on span; update the
_set_span_attribute(span, GenAIAttributes.GEN_AI_OPERATION_NAME, ...) call
accordingly to be conditional.
---
Duplicate comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 246-249: The imported_model branch only looks at top-level
response_body.get("stop_reason"), so when finish reasons exist under
choices[0]["finish_reason"] they are missed; update the branch handling
model_vendor == "imported_model" (and the analogous block around the other
occurrence) to also inspect response_body.get("choices",
[])[0].get("finish_reason") (falling back to top-level if absent), pass that
value into _map_finish_reason, and append the mapped result to finish_reasons
and set output_messages[0]["finish_reason"] as appropriate so
GEN_AI_RESPONSE_FINISH_REASONS and output_messages align with imported-model
events.
- Around line 729-739: The code is splitting
response_body["output"]["message"]["content"] into multiple messages and losing
non-text parts; instead treat response_body["output"]["message"] as a single
assistant message: call _converse_content_to_parts on
response_body.get("output").get("message") (or equivalent to obtain parts),
build one output message via _output_message("assistant", parts, fr) where fr =
_map_finish_reason(response_body.get("stopReason")), and then call
_set_span_attribute(span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
json.dumps([that_single_message])); remove the loop that iterates msgs and avoid
using msg.get("text") so tool/media blocks are preserved.
- Around line 221-227: When model_vendor == "ai21" (and the similar block at
553-562) skip finishReason values that are null instead of coercing them to the
string "None": inspect comp.get("finishReason") into fr_data and if fr_data is
None (or if it's a dict whose "reason" is None) simply continue/skip adding a
finish reason; only call _map_finish_reason with a real non-null raw value and
append its result when non-empty. This ensures finish_reasons and the
_map_finish_reason call are only invoked for actual values and not for Bedrock's
null finishReason.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d9ae533b-070d-488c-8328-3b78d50e73d3
📒 Files selected for processing (10)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (1)
2-18: ⚠️ Potential issue | 🟡 Minor

Keep the import block contiguous to satisfy Ruff E402.
Line 5 splits the module-level imports, and Ruff is already flagging this file for E402.
Suggested fix
```diff
 import json
 import logging
 import time
-
-logger = logging.getLogger(__name__)
 from opentelemetry.instrumentation.bedrock.config import Config
 from opentelemetry.instrumentation.bedrock.utils import should_send_prompts
 from opentelemetry.semconv._incubating.attributes import (
     gen_ai_attributes as GenAIAttributes,
 )
@@
 from opentelemetry.semconv._incubating.attributes.gen_ai_attributes import (
     GenAiOperationNameValues,
     GenAiSystemValues,
 )
 from opentelemetry.semconv_ai import SpanAttributes
+
+logger = logging.getLogger(__name__)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 2 - 18, Move the module-level imports into a single contiguous block and place the logger assignment after them: relocate the statement logger = logging.getLogger(__name__) so it appears below all import lines (including logging, time, Config, should_send_prompts, GenAIAttributes, AWS_BEDROCK_GUARDRAIL_ID, GenAiOperationNameValues, GenAiSystemValues, and SpanAttributes) and remove the blank line that currently splits the imports to satisfy Ruff E402.
♻️ Duplicate comments (4)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)
529-541: ⚠️ Potential issue | 🟠 Major — Keep the legacy `LLM_BEDROCK_*` constant names as aliases.
`GuardrailMeters` and `PromptCaching` are public exports. Dropping the old attribute names here turns a semconv migration into an `AttributeError` for downstream imports.
Suggested fix
```diff
 class GuardrailMeters:
     GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION = "gen_ai.bedrock.guardrail.activation"
     GEN_AI_BEDROCK_GUARDRAIL_LATENCY = "gen_ai.bedrock.guardrail.latency"
     GEN_AI_BEDROCK_GUARDRAIL_COVERAGE = "gen_ai.bedrock.guardrail.coverage"
     GEN_AI_BEDROCK_GUARDRAIL_SENSITIVE = "gen_ai.bedrock.guardrail.sensitive_info"
     GEN_AI_BEDROCK_GUARDRAIL_TOPICS = "gen_ai.bedrock.guardrail.topics"
     GEN_AI_BEDROCK_GUARDRAIL_CONTENT = "gen_ai.bedrock.guardrail.content"
     GEN_AI_BEDROCK_GUARDRAIL_WORDS = "gen_ai.bedrock.guardrail.words"
+    LLM_BEDROCK_GUARDRAIL_ACTIVATION = GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION
+    LLM_BEDROCK_GUARDRAIL_LATENCY = GEN_AI_BEDROCK_GUARDRAIL_LATENCY
+    LLM_BEDROCK_GUARDRAIL_COVERAGE = GEN_AI_BEDROCK_GUARDRAIL_COVERAGE
+    # repeat for the remaining removed LLM_BEDROCK_* symbols
@@
 class PromptCaching:
     # will be moved under the AI SemConv. Not namespaced since also OpenAI supports this.
     GEN_AI_PROMPT_CACHING = "gen_ai.prompt.caching"
+    LLM_BEDROCK_PROMPT_CACHING = GEN_AI_PROMPT_CACHING
```
🤖 Prompt for AI Agents
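An abridged sketch of the alias pattern the review asks for — constant values come from the suggested fix, and only two of the seven guardrail constants are shown:

```python
class GuardrailMeters:
    # New semconv-aligned names (abridged; the PR defines seven such constants)
    GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION = "gen_ai.bedrock.guardrail.activation"
    GEN_AI_BEDROCK_GUARDRAIL_LATENCY = "gen_ai.bedrock.guardrail.latency"
    # Legacy aliases keep `GuardrailMeters.LLM_BEDROCK_*` references working
    LLM_BEDROCK_GUARDRAIL_ACTIVATION = GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION
    LLM_BEDROCK_GUARDRAIL_LATENCY = GEN_AI_BEDROCK_GUARDRAIL_LATENCY


class PromptCaching:
    GEN_AI_PROMPT_CACHING = "gen_ai.prompt.caching"
    LLM_BEDROCK_PROMPT_CACHING = GEN_AI_PROMPT_CACHING
```

Because class-body names resolve at definition time, each alias is the same string object as its new counterpart, so downstream comparisons and dict lookups keyed on either name stay equivalent.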
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py` around lines 529 - 541, The new classes GuardrailMeters and PromptCaching removed legacy constant names causing downstream AttributeError; add alias attributes with the old LLM_BEDROCK_* names on GuardrailMeters (e.g., LLM_BEDROCK_GUARDRAIL_ACTIVATION -> GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION, etc.) and LLM_BEDROCK_PROMPT_CACHING on PromptCaching that point to the new constants so existing imports continue to work; modify the class definitions (GuardrailMeters and PromptCaching) to define those LLM_BEDROCK_* class attributes as references to the corresponding GEN_AI_* constants.

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py (3)
252-255: ⚠️ Potential issue | 🟠 Major — Fall back to `choices[0].finish_reason` for imported models.
This path already falls back to `choices[0].text` when `generation` is absent, but it never reads `choices[0].finish_reason`. OpenAI-style imported responses will therefore lose both the span-level and message-level finish reason.
Suggested fix
```diff
 elif model_vendor == "imported_model":
-    fr = _map_finish_reason(response_body.get("stop_reason"))
+    choices = response_body.get("choices") or []
+    raw_reason = response_body.get("stop_reason")
+    if raw_reason is None and choices:
+        raw_reason = choices[0].get("finish_reason")
+    fr = _map_finish_reason(raw_reason)
     if fr:
         finish_reasons.append(fr)
@@
 def _set_imported_model_response_span_attributes(span, response_body):
-    fr = _map_finish_reason(response_body.get("stop_reason"))
-    content = response_body.get("generation")
-    if content is None and response_body.get("choices"):
-        content = response_body["choices"][0].get("text")
+    choices = response_body.get("choices") or []
+    raw_reason = response_body.get("stop_reason")
+    if raw_reason is None and choices:
+        raw_reason = choices[0].get("finish_reason")
+    fr = _map_finish_reason(raw_reason)
+    content = response_body.get("generation")
+    if content is None and choices:
+        content = choices[0].get("text")
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps([_output_message("assistant", [_text_part(content)], fr)]),
     )
```
Also applies to: 780-788
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 252 - 255, The imported_model branch currently maps finish reason only from response_body.get("stop_reason") via _map_finish_reason and appends it to finish_reasons, but when generation is absent it should also fall back to OpenAI-style choices[0].finish_reason; update the branch in span_utils.py (the block handling model_vendor == "imported_model" that calls _map_finish_reason and finish_reasons.append) to, if fr is falsy, attempt to read response_body.get("choices", [{}])[0].get("finish_reason") and pass that through _map_finish_reason before appending; apply the same change to the duplicated logic at the other occurrence around the 780–788 region so both locations handle choices[0].finish_reason as a fallback.
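The fallback extraction itself is small; `imported_model_finish_reason` below is a hypothetical helper distilled from the suggested fix, not the module's actual function:

```python
def imported_model_finish_reason(response_body):
    """Raw finish reason for imported (OpenAI-style) Bedrock responses.

    Prefer the top-level `stop_reason`; when it is absent, fall back to
    the OpenAI-style `choices[0].finish_reason`.
    """
    choices = response_body.get("choices") or []
    raw = response_body.get("stop_reason")
    if raw is None and choices:
        raw = choices[0].get("finish_reason")
    return raw
```

The result would then be passed through `_map_finish_reason` exactly as the top-level `stop_reason` is today, so both response shapes feed the same mapping.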
227-230: ⚠️ Potential issue | 🟠 Major — Keep missing AI21 finish reasons as `None`, not `"None"`.
Line 230 and line 563 still stringify non-dict `finishReason` values. If AI21 returns `null`, `_map_finish_reason()` will emit `"None"` as a real finish reason.
Suggested fix
```diff
-    raw = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data)
+    raw = fr_data.get("reason") if isinstance(fr_data, dict) else (
+        str(fr_data) if fr_data is not None else None
+    )
@@
-    raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else str(fr_data) or None
+    raw_reason = fr_data.get("reason") if isinstance(fr_data, dict) else (
+        str(fr_data) if fr_data is not None else None
+    )
```
Also applies to: 561-564
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 227 - 230, The code currently stringifies non-dict AI21 finishReason values (producing "None"); update the logic in span_utils.py (the ai21 branch and the other occurrence around _map_finish_reason) to avoid calling str() on non-dict finishReason values so that None stays None. Concretely, in the ai21 loop (and the duplicate block near _map_finish_reason) set fr_data = comp.get("finishReason") and then raw = fr_data.get("reason") if isinstance(fr_data, dict) else fr_data (no str conversion), ensuring finishReason values that are None remain None.
735-745: ⚠️ Potential issue | 🟠 Major — Serialize Bedrock `output.message` as one message with multiple parts.
The `output` branch currently emits one assistant message per content block and only keeps `msg["text"]`. That breaks message boundaries and drops tool/image/document blocks from Converse responses.
Suggested fix
```diff
 elif "output" in response_body:
     fr = _map_finish_reason(response_body.get("stopReason"))
-    msgs = response_body.get("output").get("message", {}).get("content", [])
-    output_messages = []
-    for msg in msgs:
-        output_messages.append(_output_message("assistant", [_text_part(msg.get("text"))], fr))
+    message = response_body.get("output", {}).get("message", {})
+    output_messages = [
+        _output_message(
+            message.get("role", "assistant"),
+            _converse_content_to_parts(message.get("content", [])),
+            fr,
+        )
+    ]
     _set_span_attribute(
         span,
         GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
         json.dumps(output_messages),
     )
```
Based on learnings, the formal JSON schema for `gen_ai.output.messages` uses the `{role, parts}` structure.
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py` around lines 735 - 745, The current "output" branch in span_utils.py builds one assistant message per content block and only uses msg["text"], which breaks message boundaries and drops non-text parts; update the branch that handles response_body["output"] to create a single assistant message with parts equal to the original content blocks (preserving non-text blocks) by mapping each content item through _text_part or a generic part-preserving mapper, compute fr via _map_finish_reason(response_body.get("stopReason")), build one output_messages entry with role "assistant" and parts list (not multiple messages), pass that to _output_message, and then call _set_span_attribute(span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, json.dumps(output_messages)); reuse existing helpers _text_part, _output_message, _map_finish_reason, and _set_span_attribute to keep formatting consistent.
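A minimal sketch of the single-message shape under the `{role, parts}` schema. For brevity only text blocks are mapped (a full mapper would also convert tool/image/document blocks), and `converse_output_to_messages` is a hypothetical helper, not the instrumentation's real one:

```python
import json


def _text_part(text):
    # Parts format from this PR: {"type": "text", "content": "..."}
    return {"type": "text", "content": text}


def converse_output_to_messages(response_body, finish_reason):
    """Serialize the Converse `output.message` as ONE assistant message
    whose parts mirror the content blocks, not one message per block."""
    message = response_body.get("output", {}).get("message", {})
    parts = [
        _text_part(block["text"])
        for block in message.get("content", [])
        if "text" in block
    ]
    return json.dumps([
        {
            "role": message.get("role", "assistant"),
            "parts": parts,
            "finish_reason": finish_reason,
        }
    ])
```

Keeping one message per Converse message preserves the boundary the backend reported, so consumers can tell "one reply with two parts" apart from "two replies".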
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Around line 245-252: The span started via tracer.start_span (span =
tracer.start_span(_span_name(operation_name, _model), ...)) is created outside a
context and may never be ended if fn(*args, **kwargs) raises; wrap the call to
fn(*args, **kwargs) in a try/except that calls span.end() (and records the
exception on the span if appropriate) before re-raising so the span is always
closed, then call _handle_stream_call(span, kwargs, response, metric_params,
event_logger) only in the success path; apply the same try/except pattern to the
other streaming block that also starts spans (the analogous code around
_handle_stream_call at the later block).
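The span-leak guard described above can be sketched with a stand-in span; `FakeSpan` and `call_with_span` are illustrative, not the instrumentation's real helpers:

```python
class FakeSpan:
    """Stand-in for an OTel span; only tracks whether end() was called."""

    def __init__(self):
        self.ended = False

    def end(self):
        self.ended = True


def call_with_span(span, fn, *args, **kwargs):
    # Streaming spans are started outside a `with` block so they can outlive
    # the call; if `fn` raises, end the span before re-raising so it never
    # leaks. On success the streaming handler owns the span's lifetime.
    try:
        return fn(*args, **kwargs)
    except Exception:
        span.end()
        raise


span = FakeSpan()
try:
    call_with_span(span, lambda: 1 / 0)
except ZeroDivisionError:
    pass
```

In the real code the `except` branch would also record the exception on the span before ending it, as the review comment suggests.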
- Around line 441-450: The bug is that code appends every "toolUse" fragment
from event["contentBlockStart"] and event["contentBlockDelta"] directly to
tool_blocks producing duplicate/partial tool_call parts; change the logic to
merge fragments instead: when you see a "contentBlockStart" with a "toolUse",
store it as a pending tool fragment (e.g., pending_tool or merge into the last
tool_blocks entry if it is incomplete) rather than unconditionally appending to
tool_blocks; when you later see a "contentBlockDelta" with "toolUse", detect the
corresponding pending fragment (by block id or by checking the last
pending/incomplete entry in tool_blocks) and extend/merge its fields
(concatenate lists/strings or update nested keys) instead of appending a new
entry; apply the same merged-append logic to the other similar blocks referenced
(the blocks around the other event handling ranges) to ensure only
completed/merged toolUse payloads are emitted.
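Assuming events follow the Bedrock ConverseStream shape (`contentBlockStart.start.toolUse` carries the tool id/name, `contentBlockDelta.delta.toolUse.input` carries JSON argument fragments), the merged-append logic might look like this; `merge_tool_fragments` is a hypothetical helper:

```python
def merge_tool_fragments(events):
    """Accumulate streaming toolUse pieces into one entry per content block
    instead of appending every fragment as its own tool_call part."""
    tool_blocks = []
    for event in events:
        if "contentBlockStart" in event:
            tool = event["contentBlockStart"].get("start", {}).get("toolUse")
            if tool:
                # New block: seed it with an empty input accumulator.
                tool_blocks.append({**tool, "input": ""})
        elif "contentBlockDelta" in event:
            delta = event["contentBlockDelta"].get("delta", {}).get("toolUse")
            if delta and tool_blocks:
                # Extend the pending (last) block rather than appending.
                tool_blocks[-1]["input"] += delta.get("input", "")
    return tool_blocks
```

A more defensive version would key pending blocks by `contentBlockIndex` instead of "last entry wins", which matters if the stream ever interleaves blocks.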
In
`@packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py`:
- Around line 1029-1048: The test is patching utils.should_emit_events but
_handle_converse calls the symbol imported into the
opentelemetry.instrumentation.bedrock module; change the patch targets to the
names imported by that module so the branch is actually forced. Update the
decorators in test_converse_event_mode_sets_finish_reasons to patch
"opentelemetry.instrumentation.bedrock.should_emit_events" (and likewise
"opentelemetry.instrumentation.bedrock.should_send_prompts") so
_handle_converse(span, ...) uses the patched booleans when exercising the
event-mode branch.
---
Outside diff comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 2-18: Move the module-level imports into a single contiguous block
and place the logger assignment after them: relocate the statement logger =
logging.getLogger(__name__) so it appears below all import lines (including
logging, time, Config, should_send_prompts, GenAIAttributes,
AWS_BEDROCK_GUARDRAIL_ID, GenAiOperationNameValues, GenAiSystemValues, and
SpanAttributes) and remove the blank line that currently splits the imports to
satisfy Ruff E402.
---
Duplicate comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Around line 529-541: The new classes GuardrailMeters and PromptCaching removed
legacy constant names causing downstream AttributeError; add alias attributes
with the old LLM_BEDROCK_* names on GuardrailMeters (e.g.,
LLM_BEDROCK_GUARDRAIL_ACTIVATION -> GEN_AI_BEDROCK_GUARDRAIL_ACTIVATION, etc.)
and LLM_BEDROCK_PROMPT_CACHING on PromptCaching that point to the new constants
so existing imports continue to work; modify the class definitions
(GuardrailMeters and PromptCaching) to define those LLM_BEDROCK_* class
attributes as references to the corresponding GEN_AI_* constants.
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py`:
- Around line 252-255: The imported_model branch currently maps finish reason
only from response_body.get("stop_reason") via _map_finish_reason and appends it
to finish_reasons, but when generation is absent it should also fall back to
OpenAI-style choices[0].finish_reason; update the branch in span_utils.py (the
block handling model_vendor == "imported_model" that calls _map_finish_reason
and finish_reasons.append) to, if fr is falsy, attempt to read
response_body.get("choices", [{}])[0].get("finish_reason") and pass that through
_map_finish_reason before appending; apply the same change to the duplicated
logic at the other occurrence around the 780–788 region so both locations handle
choices[0].finish_reason as a fallback.
- Around line 227-230: The code currently stringifies non-dict AI21 finishReason
values (producing "None"); update the logic in span_utils.py (the ai21 branch
and the other occurrence around _map_finish_reason) to avoid calling str() on
non-dict finishReason values so that None stays None. Concretely, in the ai21
loop (and the duplicate block near _map_finish_reason) set fr_data =
comp.get("finishReason") and then raw = fr_data.get("reason") if
isinstance(fr_data, dict) else fr_data (no str conversion), ensuring
finishReason values that are None remain None.
- Around line 735-745: The current "output" branch in span_utils.py builds one
assistant message per content block and only uses msg["text"], which breaks
message boundaries and drops non-text parts; update the branch that handles
response_body["output"] to create a single assistant message with parts equal to
the original content blocks (preserving non-text blocks) by mapping each content
item through _text_part or a generic part-preserving mapper, compute fr via
_map_finish_reason(response_body.get("stopReason")), build one output_messages
entry with role "assistant" and parts list (not multiple messages), pass that to
_output_message, and then call _set_span_attribute(span,
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, json.dumps(output_messages)); reuse
existing helpers _text_part, _output_message, _map_finish_reason, and
_set_span_attribute to keep formatting consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e6a6bc4c-4af9-4ca2-a4f7-3cd6e8636361
📒 Files selected for processing (13)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/span_utils.py
- packages/opentelemetry-instrumentation-bedrock/pyproject.toml
- packages/opentelemetry-instrumentation-bedrock/tests/test_content_parts.py
- packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_guardrails.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_meta.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
🚧 Files skipped from review as they are similar to previous changes (5)
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_imported_model.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_cohere.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_nova.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_ai21.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
```python
@patch("opentelemetry.instrumentation.bedrock.utils.should_emit_events", return_value=True)
@patch("opentelemetry.instrumentation.bedrock.utils.should_send_prompts", return_value=True)
def test_converse_event_mode_sets_finish_reasons(self, _mock_prompts, _mock_events):
    """Converse event-mode branch must still set finish_reasons on the span."""
    from opentelemetry.instrumentation.bedrock import _handle_converse

    span = _mock_span()
    response = {
        "output": {"message": {"role": "assistant", "content": [{"text": "Hi"}]}},
        "stopReason": "end_turn",
        "usage": {"inputTokens": 2, "outputTokens": 1},
    }
    kwargs = {
        "modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
        "messages": [{"role": "user", "content": [{"text": "Hello"}]}],
    }
    metric_params = _mock_metric_params()

    _handle_converse(span, kwargs, response, metric_params, MagicMock())
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
rg -n 'from opentelemetry\.instrumentation\.bedrock\.utils import .*should_emit_events' packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
rg -n '@patch\("opentelemetry\.instrumentation\.bedrock\.utils\.should_emit_events"' packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py
python - <<'PY'
from unittest.mock import patch

class Utils:
    def should_emit_events(self):
        return False

utils = Utils()
should_emit_events = utils.should_emit_events
with patch.object(utils, "should_emit_events", return_value=True):
    print(should_emit_events())  # stays False: imported alias is unchanged
PY
```
Repository: traceloop/openllmetry
Length of output: 273
🏁 Script executed:
```shell
# Find how should_emit_events is actually imported/used in bedrock modules
fd . packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock --type f -name "*.py" | head -10
```
Repository: traceloop/openllmetry
Length of output: 295
🏁 Script executed:
```shell
# Check bedrock __init__.py for imports
head -100 packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
```
Repository: traceloop/openllmetry
Length of output: 3601
🏁 Script executed:
```shell
# Find _handle_converse implementation
rg -n "_handle_converse" packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/ -A 5
```
Repository: traceloop/openllmetry
Length of output: 3366
🏁 Script executed:
```shell
# Check how should_emit_events is used within _handle_converse or its module
rg -n "should_emit_events" packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/ -B 2 -A 2
```
Repository: traceloop/openllmetry
Length of output: 6284
Patch the symbol imported by bedrock.__init__, not utils.should_emit_events.
_handle_converse() calls should_emit_events() using the reference imported into opentelemetry.instrumentation.bedrock at lines 42-45. Line 1029 patches utils.should_emit_events, so this test does not actually force the event-mode branch and can pass while the real path regresses.
Suggested fix
```diff
-@patch("opentelemetry.instrumentation.bedrock.utils.should_emit_events", return_value=True)
+@patch("opentelemetry.instrumentation.bedrock.should_emit_events", return_value=True)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@packages/opentelemetry-instrumentation-bedrock/tests/test_semconv_compliance.py`
around lines 1029 - 1048, The test is patching utils.should_emit_events but
_handle_converse calls the symbol imported into the
opentelemetry.instrumentation.bedrock module; change the patch targets to the
names imported by that module so the branch is actually forced. Update the
decorators in test_converse_event_mode_sets_finish_reasons to patch
"opentelemetry.instrumentation.bedrock.should_emit_events" (and likewise
"opentelemetry.instrumentation.bedrock.should_send_prompts") so
_handle_converse(span, ...) uses the patched booleans when exercising the
event-mode branch.
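This is the classic "patch where the name is looked up" rule from `unittest.mock`. A self-contained sketch with two synthetic modules — `fake_utils` and `fake_consumer` are stand-ins for `bedrock.utils` and `bedrock.__init__`, not real packages:

```python
import sys
import types
from unittest.mock import patch

# Defining module: owns should_emit_events.
utils = types.ModuleType("fake_utils")
utils.should_emit_events = lambda: False
sys.modules["fake_utils"] = utils

# Consuming module: imports the function BY NAME, creating its own alias.
consumer = types.ModuleType("fake_consumer")
exec(
    "from fake_utils import should_emit_events\n"
    "def handle():\n"
    "    return should_emit_events()\n",
    consumer.__dict__,
)
sys.modules["fake_consumer"] = consumer

# Patching the defining module leaves the consumer's alias untouched...
with patch("fake_utils.should_emit_events", return_value=True):
    wrong = consumer.handle()

# ...patching the consuming module forces the branch as intended.
with patch("fake_consumer.should_emit_events", return_value=True):
    right = consumer.handle()
```

`wrong` stays `False` while `right` is `True`, which is exactly why the test's decorators must target `opentelemetry.instrumentation.bedrock.should_emit_events` rather than the `utils` module.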
c572d9c to 05734ee
- Bump opentelemetry-semantic-conventions-ai to >=0.5.0,<0.6.0
- GEN_AI_SYSTEM → GEN_AI_PROVIDER_NAME with GenAiSystemValues.AWS_BEDROCK
- LLM_REQUEST_TYPE → GEN_AI_OPERATION_NAME with GenAiOperationNameValues
- LLM_USAGE_TOTAL_TOKENS → GEN_AI_USAGE_TOTAL_TOKENS
- Indexed GEN_AI_PROMPT/COMPLETION → GEN_AI_INPUT/OUTPUT_MESSAGES (JSON)
- Hardcoded "AWS" → "aws.bedrock" via GenAiSystemValues enum
- Add shared semconv compliance test (test_semconv.py)
- Update all trace/metric tests for new attribute names and values

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on mapping
- Messages use parts array format: {"role": "...", "parts": [{"type": "text", "content": "..."}]}
- Output messages include finish_reason with OTel-mapped values
- Set gen_ai.response.finish_reasons span attribute on all responses
- Finish reason mapping: end_turn→stop, tool_use→tool_call, max_tokens→length,
COMPLETE→stop, FINISH→stop, guardrail_intervened→content_filter
- Added helpers for converting Anthropic and Converse content blocks to parts
- Updated all existing tests to assert new format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix _map_finish_reason(None) to return None instead of "stop"
- Capture Anthropic system instructions via gen_ai.system_instructions
- Fix tool call arguments double-encoding
- Map image content to BlobPart/UriPart
- Move system instructions to gen_ai.system_instructions (Converse + Nova)
- Fix deprecated gen_ai.system in prompt_caching.py
- Fix converse stream missing stop reason default
- Handle non-text content block deltas in streaming wrapper
- Capture message_delta stop_reason in streaming wrapper
- Conditionally include finish_reason in output messages
- Remove debug print in streaming wrapper
- Fix hardcoded gen_ai.vendor in guardrail/span_utils
- Fix hardcoded exception counter metric name
- Rename LLM_ prefix constants to GEN_AI_
- Update all tests for new behavior
- #14: _converse_content_to_parts handles image blocks → BlobPart
- #15: _anthropic_content_to_parts maps thinking blocks → ReasoningPart
- #16: ChoiceEvent.finish_reason defaults to None (not "unknown")
- #17: Event emitter maps provider finish reasons via _map_finish_reason()
- #18: _converse_content_to_parts handles video/document blocks
- #19: _anthropic_content_to_parts parses string tool_use input via json.loads
- #22: finish_reasons set unconditionally (not gated by should_send_prompts)

New unit tests: test_content_parts.py, test_event_emitter.py
Updated integration tests: anthropic, cohere, meta, nova, titan
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
aa8e67f to 022b175
Summary
- Messages use parts array format: `{"role": "...", "parts": [{"type": "text", "content": "..."}]}`
- Output messages include `finish_reason` with OTel-mapped values
- Set `gen_ai.response.finish_reasons` span attribute on all responses
- Finish reason mapping: `end_turn`→`stop`, `tool_use`→`tool_call`, `max_tokens`→`length`, `COMPLETE`→`stop`, `FINISH`→`stop`, `guardrail_intervened`→`content_filter`
- `LLM_*`/`SpanAttributes.*` refs to upstream `GenAIAttributes.*`
- imported models
- `invoke_model` and `converse`/`converse_stream` APIs

Test plan
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Tests