fix(anthropic): support with_raw_response wrapper for span generation by nirga · Pull Request #3250 · traceloop/openllmetry

nirga · 2025-08-12T13:15:32Z

Fixes issue where AsyncAnthropicBedrock calls using with_raw_response.create() were not generating proper spans with response attributes and token usage.

The problem was that with_raw_response wraps responses in a LegacyAPIResponse object that needs to be parsed differently than regular response objects.

Changes:

Add _extract_response_data() helper to handle both regular and wrapped responses
Update all response processing functions to use the helper:
- set_response_attributes()
- _set_span_completions()
- shared_metrics_attributes()
- _set_token_usage() and _aset_token_usage()
Add bedrock extras to test dependencies for AsyncAnthropicBedrock testing
Add comprehensive tests for both with_raw_response and regular create calls

Resolves spans not being generated for AsyncAnthropicBedrock with_raw_response calls.

🤖 Generated with Claude Code

I have added tests that cover my changes.
If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
(If applicable) I have updated the documentation accordingly.

Important

Fixes span generation for AsyncAnthropicBedrock with with_raw_response by adding response parsing helpers and updating related functions.

Behavior:
- Adds _extract_response_data() and _aextract_response_data() in utils.py to handle both regular and wrapped responses.
- Updates set_response_attributes(), _set_span_completions(), shared_metrics_attributes(), _set_token_usage(), and _aset_token_usage() to use the new helper functions.
- Adds bedrock extras to test dependencies in pyproject.toml.
Tests:
- Adds tests in test_bedrock_with_raw_response.py for AsyncAnthropicBedrock with with_raw_response and regular create calls.
- Includes VCR cassettes for test scenarios in tests/cassettes/test_bedrock_with_raw_response/.

^{This description was created by}^{for 59a8423. You can customize this summary. It will automatically update as commits are pushed.}

Summary by CodeRabbit

New Features
- Broader support for Anthropic/Bedrock message paths (including with_raw_response and Beta variants) so spans and metrics populate across more SDK flavors.
Bug Fixes
- Centralized response normalization and async-safe handling to ensure correct completions and token usage; improved instrumentation error handling and logging.
Tests
- Added async Bedrock integration tests and VCR cassettes for raw-response, regular create, and beta flows.
Chores
- Test dependency updated to include Bedrock extras for the Anthropic package.

coderabbitai · 2025-08-12T13:15:40Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Note

Currently processing new changes in this PR. This may take a few minutes, please wait...

📥 Commits

Reviewing files that changed from the base of the PR and between 5230ec5 and 71688e1.

📒 Files selected for processing (4)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (9 hunks)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (3 hunks)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1 hunks)
packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (1 hunks)

 _____________________________________
< Please step away from the keyboard! >
 -------------------------------------
  \
   \   (\__/)
       (•ㅅ•)
       / 　 づ

Walkthrough

Centralizes response normalization with a new _extract_response_data used across instrumentation, replaces ad-hoc dict/dict handling for token and span attribute extraction, expands WRAPPED_METHODS to cover Bedrock/beta/with_raw_response surfaces, adds Bedrock async tests and VCR cassettes, and updates test dependency to anthropic[bedrock].

Changes

Cohort / File(s)	Summary
Response normalization utility `packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py`	Adds `_extract_response_data(response)` to normalize responses to a dict-like payload (handles dict, objects with `parse()`, fallback to `__dict__`), and updates `shared_metrics_attributes` to use it.
Token usage normalization `packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py`	Replaces per-call `isinstance`/`.__dict__` fallbacks in `_set_token_usage` and `_aset_token_usage` with calls to `_extract_response_data(response)` before accessing usage fields; expands WRAPPED_METHODS for additional SDK/Beta/with_raw_response message methods; adds wrapping debug logs and broader exception handling.
Span attribute normalization `packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py`	Imports `_extract_response_data` and normalizes `response` in `set_response_attributes` and `_set_span_completions` (removes manual dict fallback logic) before reading model/id/usage/completions.
Bedrock tests and cassettes `packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py`, `packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/*`	Adds async integration tests for `AsyncAnthropicBedrock` (regular create and `with_raw_response`, including beta path) and three VCR cassettes capturing Bedrock invoke interactions; tests assert spans, completions, and token usage attributes.
Test dependency update `packages/opentelemetry-instrumentation-anthropic/pyproject.toml`	Changes test dependency to `anthropic = {extras = ["bedrock"], version = ">=0.36.0"}` under `[tool.poetry.group.test.dependencies]`.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant Client as Anthropic/Bedrock Client
    participant Instr as OTel Anthropic instrumentation
    participant Utils as _extract_response_data
    participant Span as OTel Span

    App->>Client: messages.create / messages.with_raw_response.create(...)
    Client-->>Instr: response (dict / wrapper / object)
    Instr->>Utils: _extract_response_data(response)
    Utils-->>Instr: normalized dict-like response
    Instr->>Span: set_response_attributes, _set_span_completions, _set_token_usage
    Span-->>App: recorded span with attributes and usage

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

fix(anthropic): async stream manager #3220: Overlaps WRAPPED_METHODS expansion and stream/stream-manager handling related to wrapping message/stream APIs.
fix(anthropic): various fixes around tools parsing #3204: Modifies span_utils.py and _set_span_completions, overlapping on response normalization and completion handling.

Suggested reviewers

doronkopit5

Poem

I nibble bytes beneath the log-lit moon,
I parse raw wrappers, humming a tune.
Carrots become dicts, tidy and bright,
Spans hop forward in telemetry light.
Bedrock jokes and tokens — traced just right. 🥕🐇

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/anthropic-with-raw-response-spans

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

ellipsis-dev

Caution

Changes requested ❌

Reviewed everything up to 1a67ed3 in 1 minute and 54 seconds. Click for details.

Reviewed 397 lines of code in 7 files
Skipped 0 files when reviewing.
Skipped posting 6 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:133

Draft comment:
Refactor: Replacing manual conversion with _extract_response_data enhances consistency when handling wrapped responses. Consider checking for cases when an empty dict is returned.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The code already has proper null checks after calling _extract_response_data. The function is used to extract data in a consistent way, and the code is already defensive about handling empty/missing data. The comment seems to be suggesting additional checks that aren't necessary given the existing null handling. I could be wrong about the implementation of _extract_response_data - without seeing that code, I can't be 100% certain about its behavior. Even without seeing _extract_response_data, the calling code clearly handles None/empty cases properly through .get() calls and null checks. Additional empty dict checks would be redundant. The comment should be removed as it suggests unnecessary additional checks when the code already handles empty/null cases appropriately.

2. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py:174

Draft comment:
Nice update: Using _extract_response_data in _set_span_completions and set_response_attributes ensures uniform extraction of response data, including wrapped responses.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

3. packages/opentelemetry-instrumentation-anthropic/pyproject.toml:39

Draft comment:
Good improvement: Specifying the 'bedrock' extras for the anthropic dependency ensures tests run against AsyncAnthropicBedrock.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

4.

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml:1

Draft comment:
Cassette test added for regular create calls. This provides good baseline coverage for span attributes.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

5.

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml:1

Draft comment:
Cassette test for with_raw_response calls correctly includes the x-stainless-raw-response header. This helps verify the span generation for wrapped responses.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

6. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:30

Draft comment:
Comprehensive tests for both with_raw_response and regular create methods validate the span attributes and token usage correctly. Test coverage appears robust.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_3DZZR7zwpoemcbIx

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

ellipsis-dev · 2025-08-12T13:17:35Z

+                parsed_response = parsed_response.__dict__
+            return parsed_response
+        except Exception:
+            pass


The _extract_response_data helper centralizes response extraction. Consider logging the exception inside the try/except block (instead of silently passing) to aid debugging potential parsing issues.

Suggested change

pass

logging.debug("Failed to parse response in _extract_response_data", exc_info=True)

ellipsis-dev

Skipped PR review on 9bd6080 because no changed files had a supported extension. If you think this was in error, please contact us and we'll fix it right away.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)

64-85: Helper function correctly handles response normalization.

The _extract_response_data function provides robust extraction logic to handle both regular responses and with_raw_response wrapped responses. The fallback chain (dict → parse() → dict → empty dict) is well-designed and exception-safe.

Previous review correctly identified that logging the exception would aid debugging. Consider adding debug logging in the exception handler at line 78.

🧹 Nitpick comments (4)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml (1)
22-45: Filter volatile and identifying headers in cassettes to reduce churn and avoid leaking environment details.

Headers like x-amz-date, user-agent, x-stainless-* and runtime versions are not needed for matching and change frequently, causing noisy cassette diffs and potential environment leakage.

Add a test-wide VCR configuration (e.g., in tests/conftest.py) to filter and normalize headers:
# tests/conftest.py
import re
import pytest

SENSITIVE_HEADERS = [
    "authorization",
    "x-amz-date",
    "x-amzn-trace-id",
    "x-stainless-arch",
    "x-stainless-lang",
    "x-stainless-os",
    "x-stainless-package-version",
    "x-stainless-raw-response",
    "x-stainless-read-timeout",
    "x-stainless-retry-count",
    "x-stainless-runtime",
    "x-stainless-runtime-version",
    "x-stainless-timeout",
    "user-agent",
    "content-length",
]

@pytest.fixture(scope="session")
def vcr_config():
    return {
        "filter_headers": [(h, "<REDACTED>") for h in SENSITIVE_HEADERS],
        # Ensure request matching does not depend on transient headers
        "match_on": ["method", "scheme", "host", "port", "path", "query", "body"],
        "decode_compressed_response": True,
    }
This keeps playback deterministic and private.
packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (3)
16-20: Prefer also honoring AWS_DEFAULT_REGION for better local/dev parity.

Some environments set AWS_DEFAULT_REGION instead of AWS_REGION. Fall back to either before defaulting.

Apply this diff:
-    aws_region = os.environ.get("AWS_REGION", "us-east-1")
+    aws_region = os.environ.get("AWS_REGION") or os.environ.get("AWS_DEFAULT_REGION") or "us-east-1"
46-48: Minor assertion simplification.

After asserting len(spans) == 1, checking all(...) is redundant. Access the single span directly.

Apply this diff:
-    assert len(spans) == 1
-    assert all(span.name == "anthropic.chat" for span in spans)
+    assert len(spans) == 1
+    assert spans[0].name == "anthropic.chat"
91-94: Minor assertion simplification (same as above).

Mirror the simplification to avoid redundant all(...).

Apply this diff:
-    assert len(spans) == 1
-    assert all(span.name == "anthropic.chat" for span in spans)
+    assert len(spans) == 1
+    assert spans[0].name == "anthropic.chat"

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba68225 and 9bd6080.

⛔ Files ignored due to path filters (1)

packages/opentelemetry-instrumentation-anthropic/poetry.lock is excluded by !**/*.lock

📒 Files selected for processing (7)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (2 hunks)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (3 hunks)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1 hunks)
packages/opentelemetry-instrumentation-anthropic/pyproject.toml (1 hunks)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml (1 hunks)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml (1 hunks)
packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (3)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (1)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)

_extract_response_data (64-84)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (1)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)

_extract_response_data (64-84)

packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (3)

packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)

SpanAttributes (64-257)

packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)

reader (37-41)

packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)

get_finished_spans (40-43)

🪛 Gitleaks (8.27.2)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml

11-11: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml

11-11: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 Checkov (3.2.334)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml

[HIGH] 11-12: AWS Access Key

(CKV_SECRET_2)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml

[HIGH] 11-12: AWS Access Key

(CKV_SECRET_2)

🔇 Additional comments (11)

packages/opentelemetry-instrumentation-anthropic/pyproject.toml (1)

39-39: LGTM! Bedrock extras added for comprehensive testing.

This change appropriately includes the bedrock extras in the anthropic dependency to enable testing of AsyncAnthropicBedrock functionality, which is directly needed for the new tests in this PR.

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)

87-97: Response normalization properly applied.

The shared_metrics_attributes function now correctly uses the centralized _extract_response_data helper to normalize the response before extracting attributes. This ensures consistent behavior across both regular and wrapped responses.

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (2)

133-134: Centralized response normalization applied correctly.

The async token usage function now properly uses _extract_response_data to normalize responses before accessing usage attributes, replacing the previous inline dict detection logic. This ensures consistent handling of both regular and with_raw_response wrapped responses.

226-227: Consistent normalization in sync token usage function.

The sync version correctly mirrors the async implementation by using _extract_response_data for response normalization. This maintains consistency across both sync and async code paths.

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml (2)

48-51: Test response data looks appropriate.

The cassette contains a realistic Anthropic Claude response with proper token usage information (17 input tokens, 50 output tokens) and a valid joke response about OpenTelemetry. This data will effectively test the response parsing logic.

11-13: Sensitive AWS credentials detected in VCR cassettes
The following cassette files each contain a hard-coded AWS4-HMAC-SHA256 Authorization header with an AKIA… key:

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml

Please confirm that these are disposable, non-production credentials. If they’re real, scrub or mask them (e.g. using VCR’s filter_sensitive_data) before merging.

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (3)

11-11: Import correctly added for response normalization.

The import of _extract_response_data is properly added to support the centralized response extraction functionality.

174-174: Response normalization applied in completion processing.

The _set_span_completions function now correctly normalizes the response before processing completions. This ensures that both regular and with_raw_response wrapped responses are handled consistently when extracting completion data.

241-241: Consistent normalization in response attributes.

The set_response_attributes function properly applies response normalization, replacing the previous manual dict fallback logic with the centralized helper. This ensures consistent attribute extraction across all response types.

packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (2)

28-71: LGTM: Good coverage for the with_raw_response path, including token usage and content extraction.

The assertions validate prompts, completions, and usage metrics. Using response.parse() when available mirrors the LegacyAPIResponse flow under test.

74-115: LGTM: Regular create flow assertions mirror the raw-response test and exercise the normalized path.

Consistent validation of prompts, completions, and token usage ensures both paths are covered.

coderabbitai · 2025-08-12T13:31:18Z

+      authorization:
+      - AWS4-HMAC-SHA256 Credential=AKIAQEMAC2MSQDTITCKK/20250812/us-east-1/bedrock/aws4_request,
+        SignedHeaders=accept;accept-encoding;content-length;content-type;host;x-amz-date;x-stainless-arch;x-stainless-lang;x-stainless-os;x-stainless-package-version;x-stainless-raw-response;x-stainless-read-timeout;x-stainless-retry-count;x-stainless-runtime;x-stainless-runtime-version;x-stainless-timeout,
+        Signature=163e22b87236d5029439f088af26ed28cfa293f95a22e4cdba8789e58e8709c2
+      connection:


💡 Verification agent

🧩 Analysis chain

Redact the AWS Authorization header from the cassette and rotate credentials if these are real.

Static analysis flagged this as an AWS credential pattern (Gitleaks: aws-access-token; Checkov: CKV_SECRET_2). Even in recorded fixtures, committing an AWS4-HMAC-SHA256 Authorization header with an AKIA key ID and Signature is a secret leakage risk and makes replays brittle. Redact it and configure VCR to filter it going forward.

Apply this diff to sanitize the cassette now:

authorization: - - AWS4-HMAC-SHA256 Credential=AKIAQEMAC2MSQDTITCKK/20250812/us-east-1/bedrock/aws4_request, - SignedHeaders=accept;accept-encoding;content-length;content-type;host;x-amz-date;x-stainless-arch;x-stainless-lang;x-stainless-os;x-stainless-package-version;x-stainless-raw-response;x-stainless-read-timeout;x-stainless-retry-count;x-stainless-runtime;x-stainless-runtime-version;x-stainless-timeout, - Signature=163e22b87236d5029439f088af26ed28cfa293f95a22e4cdba8789e58e8709c2 + - "<REDACTED>"

To prevent reintroductions, add VCR config to filter these headers during recording (example below) and re-record cassettes. If these credentials were real, rotate the key pair immediately.

🏁 Script executed:

#!/bin/bash # Find potential AWS credentials or recorded Authorization headers across the repo rg -n --hidden --no-ignore 'AKIA[0-9A-Z]{16}' || true rg -n --hidden --no-ignore 'AWS4-HMAC-SHA256' || true rg -n --hidden --no-ignore 'x-amz-secret|aws_secret_access_key|aws_access_key_id' || true

Length of output: 2676

Redact AWS Authorization header from anthropic cassettes

Static analysis flagged AWS4-HMAC-SHA256 authorization headers in recorded fixtures, leaking credentials and making tests brittle. Remove these headers now and configure VCR to filter them going forward. If these credentials are real, rotate them immediately.

Affected cassettes:

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml (line 11)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml (line 11)

Apply this diff in each file:

authorization: - - AWS4-HMAC-SHA256 Credential=AKIAQEMAC2MSQDTITCKK/20250812/us-east-1/bedrock/aws4_request, - SignedHeaders=accept;accept-encoding;content-length;content-type;host;x-amz-date;x-stainless-arch;x-stainless-lang;x-stainless-os;x-stainless-package-version;x-stainless-raw-response;x-stainless-read-timeout;x-stainless-retry-count;x-stainless-runtime;x-stainless-runtime-version;x-stainless-timeout, - Signature=163e22b87236d5029439f088af26ed28cfa293f95a22e4cdba8789e58e8709c2 + - "<REDACTED>"

Add a vcr_config fixture in
packages/opentelemetry-instrumentation-anthropic/tests/conftest.py to filter the authorization header:

import pytest @pytest.fixture(autouse=True) def vcr_config(): return { "filter_headers": ["authorization"], }

Re-record these cassettes after making these changes to prevent reintroducing secrets.

🧰 Tools

🪛 Gitleaks (8.27.2)

11-11: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 Checkov (3.2.334)

[HIGH] 11-12: AWS Access Key

(CKV_SECRET_2)

🤖 Prompt for AI Agents

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml lines 10-14: the cassette contains a raw AWS4-HMAC-SHA256 authorization header that must be removed; delete the authorization header block from this file (and the same header at the indicated line in packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml), add a vcr_config fixture to packages/opentelemetry-instrumentation-anthropic/tests/conftest.py that filters the "authorization" header for all tests, and then re-record the affected cassettes; if the exposed credentials are real, rotate them immediately.

ellipsis-dev

Important

Looks good to me! 👍

Reviewed d392ee1 in 47 seconds. Click for details.

Reviewed 35 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 3 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:10

Draft comment:
Removed extraneous trailing whitespace between the skip and credentials section. This aligns with clean formatting best practices.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

2. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:71

Draft comment:
Removed trailing whitespace from the pytest.mark.asyncio decorator for consistency.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

3. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:116

Draft comment:
Added a newline at the end of file to follow POSIX newline conventions.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_AjsHngmj8ZL2c4Lw

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml (1)
10-13: Critical: Redact AWS Authorization header in cassette and filter going forward

The cassette includes an AWS4-HMAC-SHA256 Authorization header with an AKIA key ID and signature. This is a secret-leak risk and makes tests brittle. Redact it and configure VCR to filter this header for all recordings. If the credentials are real, rotate them immediately.

Apply this diff to sanitize the cassette now:
       authorization:
-      - AWS4-HMAC-SHA256 Credential=AKIAQEMAC2MSQDTITCKK/20250812/us-east-1/bedrock/aws4_request,
-        SignedHeaders=accept;accept-encoding;content-length;content-type;host;x-amz-date;x-stainless-arch;x-stainless-lang;x-stainless-os;x-stainless-package-version;x-stainless-raw-response;x-stainless-read-timeout;x-stainless-retry-count;x-stainless-runtime;x-stainless-runtime-version;x-stainless-timeout,
-        Signature=163e22b87236d5029439f088af26ed28cfa293f95a22e4cdba8789e58e8709c2
+      - "<REDACTED>"
Additionally, add a VCR config to filter these headers during recording (in packages/opentelemetry-instrumentation-anthropic/tests/conftest.py):
import pytest

@pytest.fixture(autouse=True)
def vcr_config():
    return {
        "filter_headers": ["authorization"],
    }
Run this script to find any other leaked AWS credentials/headers across the repo:
#!/bin/bash
set -euo pipefail
echo "Scanning for potential AWS credential leaks..."
rg -n --hidden --no-ignore 'AKIA[0-9A-Z]{16}' || true
rg -n --hidden --no-ignore 'AWS4-HMAC-SHA256' || true
rg -n --hidden --no-ignore 'x-amz-secret|aws_secret_access_key|aws_access_key_id' || true
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml (1)
10-13: Critical: Redact AWS Authorization header in cassette and filter going forward

Same issue here: the cassette contains an AWS4-HMAC-SHA256 Authorization header with an AKIA key ID and signature. Redact it and filter this header via VCR to avoid future leaks. Rotate keys if real.

Apply this diff:
       authorization:
-      - AWS4-HMAC-SHA256 Credential=AKIAQEMAC2MSQDTITCKK/20250812/us-east-1/bedrock/aws4_request,
-        SignedHeaders=accept;accept-encoding;content-length;content-type;host;x-amz-date;x-stainless-arch;x-stainless-lang;x-stainless-os;x-stainless-package-version;x-stainless-read-timeout;x-stainless-retry-count;x-stainless-runtime;x-stainless-runtime-version;x-stainless-timeout,
-        Signature=7eb3fdf5c7fe741fb5066e245bf529e9b385d7667eedfe0a955bdabec62ae1aa
+      - "<REDACTED>"

🧹 Nitpick comments (1)

packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (1)
12-25: Guard against accidental credential use and improve region fallback

Avoid depending on real AWS credentials during test runs; replays shouldn’t require them. Also prefer a more universal region fallback.

Apply:
 def async_anthropic_bedrock_client():
     if AsyncAnthropicBedrock is None:
         pytest.skip("AsyncAnthropicBedrock not available")

-    # Try to get credentials from environment first
-    aws_access_key = os.environ.get("AWS_ACCESS_KEY_ID", "test-key")
-    aws_secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY", "test-secret")
-    aws_region = os.environ.get("AWS_REGION", "us-east-1")
+    # Try to get credentials from environment first (tests rely on VCR replays)
+    aws_access_key = os.environ.get("AWS_ACCESS_KEY_ID", "dummy")
+    aws_secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY", "dummy")
+    aws_region = os.environ.get("AWS_REGION") or os.environ.get("AWS_DEFAULT_REGION", "us-east-1")
Also ensure VCR filters the Authorization header globally (see cassette review comments) to prevent recording signatures even when real credentials are present during recording.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9bd6080 and d392ee1.

⛔ Files ignored due to path filters (1)

packages/opentelemetry-instrumentation-anthropic/poetry.lock is excluded by !**/*.lock

📒 Files selected for processing (7)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (2 hunks)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (3 hunks)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1 hunks)
packages/opentelemetry-instrumentation-anthropic/pyproject.toml (1 hunks)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml (1 hunks)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml (1 hunks)
packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (4)

packages/opentelemetry-instrumentation-anthropic/pyproject.toml
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/init.py
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py

🧰 Additional context used

🧬 Code Graph Analysis (1)

packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (3)

packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)

SpanAttributes (64-257)

packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)

reader (37-41)

packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)

get_finished_spans (40-43)

🪛 Gitleaks (8.27.2)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml

11-11: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml

11-11: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 Checkov (3.2.334)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml

[HIGH] 11-12: AWS Access Key

(CKV_SECRET_2)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml

[HIGH] 11-12: AWS Access Key

(CKV_SECRET_2)

🔇 Additional comments (1)

packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (1)

65-71: LGTM: usage token assertions validate normalization path

The checks on prompt/completion/total tokens are solid and directly validate the new response-normalization logic for both raw and regular paths.

Also applies to: 109-115

coderabbitai · 2025-08-12T14:41:50Z

+    spans = span_exporter.get_finished_spans()
+    assert len(spans) == 1
+    assert all(span.name == "anthropic.chat" for span in spans)
+
+    anthropic_span = spans[0]
+    assert (


🛠️ Refactor suggestion

Make span assertions resilient to additional spans

Asserting len(spans) == 1 is brittle if future instrumentation emits auxiliary spans. Filter by name and assert exactly one anthropic.chat span.

Apply:

- spans = span_exporter.get_finished_spans() - assert len(spans) == 1 - assert all(span.name == "anthropic.chat" for span in spans) - - anthropic_span = spans[0] + spans = span_exporter.get_finished_spans() + anthropic_spans = [s for s in spans if s.name == "anthropic.chat"] + assert len(anthropic_spans) == 1 + anthropic_span = anthropic_spans[0]

Also applies to: 91-96

🤖 Prompt for AI Agents

In packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py around lines 45-50 (and similarly at 91-96), the test currently asserts len(spans) == 1 which is brittle; change the assertions to filter finished spans by span.name == "anthropic.chat", assert that the filtered list has exactly one element, and then set anthropic_span to that single filtered span (e.g., filtered_spans[0]); apply the same change to the block at lines 91-96 so all checks use the filtered list instead of assuming only one span exists.

Fixes issue where AsyncAnthropicBedrock calls using with_raw_response.create() were not generating proper spans with response attributes and token usage. The problem was that with_raw_response wraps responses in a LegacyAPIResponse object that needs to be parsed differently than regular response objects. Changes: - Add _extract_response_data() helper to handle both regular and wrapped responses - Update all response processing functions to use the helper: - set_response_attributes() - _set_span_completions() - shared_metrics_attributes() - _set_token_usage() and _aset_token_usage() - Add bedrock extras to test dependencies for AsyncAnthropicBedrock testing - Add comprehensive tests for both with_raw_response and regular create calls Resolves spans not being generated for AsyncAnthropicBedrock with_raw_response calls. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

Update lock file to include bedrock extras added to anthropic dependency for AsyncAnthropicBedrock testing support. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

- Remove trailing whitespace - Fix blank line whitespace - Add newline at end of file 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

Add test for AsyncAnthropicBedrock beta.messages.with_raw_response.create() to ensure the fix works for the customer's specific use case. This covers the exact scenario mentioned by the user: self.client.beta.messages.with_raw_response.create(...) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 8d20dd2 in 1 minute and 38 seconds. Click for details.

Reviewed 55 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:161

Draft comment:
Add a newline at end of file to meet style best practices.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While having a newline at the end of files is indeed a common best practice, this is something that should be enforced by linters or EditorConfig rather than manual code review comments. It's a minor style issue that doesn't affect functionality. Additionally, many modern editors automatically add this newline, and git will warn about this. The comment is technically correct about the best practice. Missing newlines can cause issues with some tools and make diffs less clean. While true, this is exactly the kind of minor, obvious issue that should be handled by automation rather than taking up reviewer and author time. This comment should be deleted as it's too minor and obvious to warrant manual review attention, and should be handled by automation instead.

2. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:55

Draft comment:
Consider extracting the repeated raw response parsing logic into a helper function to reduce duplication.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_utINxrBk8m4KTUUw

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

…upport Add instrumentation for with_raw_response wrapper methods across all Anthropic API endpoints: - Regular SDK messages with_raw_response methods - Beta API with_raw_response methods for regular SDK - Beta API with_raw_response methods for Bedrock SDK Improve response data extraction to properly handle LegacyAPIResponse wrappers from with_raw_response calls by implementing parse() method support. This addresses the issue where spans were not generated for AsyncAnthropicBedrock beta.messages.with_raw_response.create calls, which is critical for a major client. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 5230ec5 in 2 minutes and 28 seconds. Click for details.

Reviewed 220 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 6 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:81

Draft comment:
New instrumentation entries for with_raw_response (both regular and async, beta API, and Bedrock SDK) have been added. Ensure these mappings match the current SDK methods.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is asking the PR author to ensure that the mappings match the current SDK methods, which is a form of asking for confirmation or verification. This violates the rule against asking the author to confirm or ensure behavior.

2. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:710

Draft comment:
The exception handling for wrapping methods now catches all Exceptions (with debug logging). This is acceptable for instrumentation but consider if more context or a higher log level might be useful in production.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

3. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py:70

Draft comment:
In _extract_response_data, the added debug logging in the exception block is useful for diagnosing parse failures. Verify that the fallback to dict is appropriate if parse() fails.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

4. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py:83

Draft comment:
The fallback block assigns response.dict to a temporary variable before returning it. This is redundant; consider returning response.dict directly for simplicity.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While the comment is technically correct that the code could be simpler, this is an extremely minor style issue. The temporary variable might have been added for debugging purposes or to make the code more readable. The change doesn't impact functionality or maintainability in any significant way. According to the rules, we should not make purely informative comments or comments about obvious/unimportant issues. The temporary variable might actually serve a purpose - it could make debugging easier or make the code more readable for some developers. Am I being too quick to dismiss potential benefits? Even if the temporary variable has some minor benefits, the comment still falls into the category of being too minor and not clearly requiring a code change. The current code works perfectly fine either way. The comment should be deleted as it addresses an extremely minor style issue that doesn't clearly require a change, violating the rule about not making obvious or unimportant comments.

5.

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml:51

Draft comment:
Typo: The phrase "here''s" appears to use a double apostrophe. Consider changing it to "here's".
Reason this comment was not posted:
Comment was on unchanged code.

6.

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml:57

Draft comment:
Typo: The phrase "you''d" seems to have an extra apostrophe. Consider changing it to "you'd".
Reason this comment was not posted:
Comment looked like it was already resolved.

Workflow ID: wflow_inkoWhzGb7mtiUGX

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

coderabbitai

Actionable comments posted: 1

🔭 Outside diff range comments (1)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (1)
825-826: Inconsistent exception handling between WRAPPED_METHODS and WRAPPED_AMETHODS.

The async methods wrapping still only catches ModuleNotFoundError while the sync methods now catch broader Exception types. This inconsistency could lead to different failure behaviors.

Apply this diff to make the exception handling consistent:
-            except ModuleNotFoundError:
+            except Exception as e:
+                logger.debug(f"Failed to wrap {wrap_package}.{wrap_object}.{wrap_method}: {e}")
                 pass  # that's ok, we don't want to fail if some methods do not exist

🧹 Nitpick comments (1)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml (1)

22-33: Optional: filter volatile headers to reduce cassette churn.

User-Agent, runtime/version, and stainless headers tend to change across environments/versions and can cause unnecessary cassette diffs. Consider filtering them out.

Suggested addition to the vcr_config filter_headers (on top of Authorization):

user-agent

x-stainless-arch

x-stainless-lang

x-stainless-os

x-stainless-package-version

x-stainless-runtime

x-stainless-runtime-version

x-stainless-timeout

x-stainless-read-timeout

x-stainless-retry-count

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a01f46b and 5230ec5.

📒 Files selected for processing (3)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (4 hunks)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1 hunks)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py

🧰 Additional context used

🧬 Code Graph Analysis (1)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (1)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)

_extract_response_data (64-87)

🪛 Gitleaks (8.27.2)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml

11-11: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 Checkov (3.2.334)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml

[HIGH] 11-12: AWS Access Key

(CKV_SECRET_2)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Lint
GitHub Check: Build Packages (3.11)

🔇 Additional comments (7)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml (3)

34-35: Good: Explicitly exercising with_raw_response path.

x-stainless-raw-response: 'true' confirms this cassette covers the wrapped-response scenario the PR fixes.

50-59: Good coverage: usage in body and token counts in headers present.

Both response.usage and X-Amzn-Bedrock-*-Token-Count headers are recorded, allowing tests to validate extraction from either source as updated by _extract_response_data().

Also applies to: 69-74

1-81: Clean AWS Credentials Verification Passed

No occurrences of AWS Access Key IDs (AKIA…), SigV4 Authorization headers (AWS4-HMAC-SHA256 Credential=), or generic Bearer/Basic tokens were found after scanning the entire repository. The codebase is free of leaked AWS identifiers.

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (4)

84-172: LGTM! Comprehensive expansion of instrumented API surface.

The addition of all the new wrapped methods properly covers the gaps for with_raw_response calls and beta API methods across both regular Anthropic SDK and Bedrock SDK. This addresses the core issue where AsyncAnthropicBedrock calls using with_raw_response.create() were not generating proper spans.

222-223: Centralized response normalization correctly implemented.

The import and usage of _extract_response_data properly handles both regular responses and wrapped LegacyAPIResponse objects by normalizing them to dictionary format before token usage extraction.

315-316: Consistent response normalization in sync version.

The same centralized approach is correctly applied to the synchronous token usage function, ensuring consistent handling across both async and sync code paths.

802-805: Improved error handling and logging for wrapped methods.

The enhanced exception handling now catches broader Exception types (not just ModuleNotFoundError) and adds debug logging for both successful and failed method wrapping attempts. This will help with troubleshooting instrumentation issues while gracefully continuing when methods don't exist.

coderabbitai · 2025-08-12T15:41:39Z

+      authorization:
+      - AWS4-HMAC-SHA256 Credential=AKIAQEMAC2MSQDTITCKK/20250812/us-east-1/bedrock/aws4_request,
+        SignedHeaders=accept;accept-encoding;content-length;content-type;host;x-amz-date;x-stainless-arch;x-stainless-lang;x-stainless-os;x-stainless-package-version;x-stainless-raw-response;x-stainless-read-timeout;x-stainless-retry-count;x-stainless-runtime;x-stainless-runtime-version;x-stainless-timeout,
+        Signature=c2c5f7bbfea072d8914f9500ba1d01f219242c57037156147fb496a489d6c514


🛠️ Refactor suggestion

⚠️ Potential issue

Critical: Authorization header leaks AWS Access Key ID (blocker).

Static analysis (Gitleaks/Checkov) correctly flags this. Even in cassettes, persisting an AKIA Access Key ID and SigV4 signature is a secret-scanning violation. Redact it and configure VCR to filter this header to prevent future leaks and flaky matches.

Apply this diff to redact the header in the cassette:

- authorization: - - AWS4-HMAC-SHA256 Credential=AKIAQEMAC2MSQDTITCKK/20250812/us-east-1/bedrock/aws4_request, - SignedHeaders=accept;accept-encoding;content-length;content-type;host;x-amz-date;x-stainless-arch;x-stainless-lang;x-stainless-os;x-stainless-package-version;x-stainless-raw-response;x-stainless-read-timeout;x-stainless-retry-count;x-stainless-runtime;x-stainless-runtime-version;x-stainless-timeout, - Signature=c2c5f7bbfea072d8914f9500ba1d01f219242c57037156147fb496a489d6c514 + authorization: + - "<REDACTED>"

Additionally, add a VCR/pytest-recording filter to strip these headers at record/playback time to avoid reintroducing the leak:

Python (e.g., tests/conftest.py):

import pytest @pytest.fixture(scope="session") def vcr_config(): return { "filter_headers": [ "authorization", "x-amz-date", "x-amzn-requestid", # optionally reduce cassette churn: "user-agent", ] }

🧰 Tools

🪛 Gitleaks (8.27.2)

11-11: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 Checkov (3.2.334)

[HIGH] 11-12: AWS Access Key

(CKV_SECRET_2)

🤖 Prompt for AI Agents

In packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml around lines 10-13, the authorization header contains a real AWS Access Key ID and SigV4 signature which must be removed; replace the header value with a neutral placeholder like "<REDACTED_AUTHORIZATION>" (or remove the header entry entirely) in the cassette file to eliminate the leaked secret, and then add a VCR/pytest-recording config (e.g., in tests/conftest.py) that filters "authorization", "x-amz-date", "x-amzn-requestid" (and optionally "user-agent") during record/playback so future recordings will strip those headers automatically.

…or $1B customer Key fixes implemented: 1. **Moved beta async methods to WRAPPED_AMETHODS**: Beta API async methods were incorrectly in WRAPPED_METHODS instead of WRAPPED_AMETHODS, preventing async wrapper execution 2. **Created async response processing functions**: Added _aextract_response_data, aset_response_attributes, _aset_span_completions, ashared_metrics_attributes, and _ahandle_response to properly handle async response extraction 3. **Fixed coroutine handling**: Async response processing can now properly await coroutines instead of failing with "Cannot extract data from unawaited coroutine" 4. **Comprehensive with_raw_response support**: Added instrumentation for all with_raw_response endpoints across regular SDK and Bedrock SDK This resolves the critical issue where AsyncAnthropicBedrock beta.messages.with_raw_response.create was not generating spans with proper response content and token usage metrics. Tests now pass: ✅ Regular bedrock with_raw_response ✅ Regular bedrock create ✅ Beta API bedrock with_raw_response (CRITICAL for $1B customer) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 0c4b780 in 2 minutes and 3 seconds. Click for details.

Reviewed 337 lines of code in 4 files
Skipped 0 files when reviewing.
Skipped posting 8 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:55

Draft comment:
The reordering/removal and re-addition of beta API async wrappers for regular and Bedrock SDK looks intentional. Please double-check that no duplicate or conflicting wrappers remain.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50% None

2. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:219

Draft comment:
Switching to the async helper _aextract_response_data and awaiting its result ensures consistency. Verify all call sites are updated accordingly.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

3. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:822

Draft comment:
Catching a broad Exception here may mask unexpected errors. Consider narrowing the exception or logging more details for debugging.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

4. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:695

Draft comment:
The async _awrap now awaits ashared_metrics_attributes and _ahandle_response. This ensures correct async handling—good update.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

5. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py:64

Draft comment:
In _aextract_response_data, if response.parse() might be async in future it would be good to check and await it. Currently it handles only synchronous parse().
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50% None

6. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:56

Draft comment:
Tests access raw response content correctly by checking for the parse() method. This test structure adequately verifies the span generation for with_raw_response.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

7. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py:189

Draft comment:
Typo: The comment says "Antrhopic"; should be "Anthropic".
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While the typo fix is technically correct, it's just a misspelling in a comment that doesn't affect functionality. The comment is about documentation, not code. According to the rules, we should not make purely informative comments or comments that are unimportant. A typo in a comment is very low priority. The typo could be confusing to future readers and it appears in multiple places in the code. It's a simple fix that improves code quality. While it's a valid correction, the typo doesn't impact code functionality and fixing it provides minimal value. The rules specifically say to avoid unimportant comments. Delete this comment as it's too minor and doesn't affect code functionality. Comment typos don't meet the bar for required code changes.

8. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py:64

Draft comment:
Typo: The function name _aextract_response_data appears to be missing the 'sync' part. Consider renaming it to _async_extract_response_data for better clarity and consistency with the asynchronous nature of the function.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The codebase consistently uses the 'a' prefix for async versions of functions. This is seen in multiple places like ashared_metrics_attributes and acount_prompt_tokens_from_request. While _async_extract_response_data might be more verbose, changing the existing pattern would reduce consistency. The current name follows the established convention. The comment has a point that _async_extract_response_data would be more immediately clear to readers unfamiliar with the codebase's conventions. However, consistency within a codebase is more important than slightly improved clarity for new readers. The 'a' prefix is a clear pattern here that developers will quickly learn. The comment should be deleted as it suggests breaking an established naming convention in the codebase. The current name is consistent with the project's patterns.

Workflow ID: wflow_e0CyyXAYhGNzO7eC

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

…se and beta API - Add beta API method paths to WRAPPED_AMETHODS for AsyncAnthropicBedrock - Create async response processing functions to handle coroutines properly - Add _aextract_response_data() to handle both coroutines and LegacyAPIResponse objects - Create comprehensive test coverage for with_raw_response and beta API endpoints - Fix issue where spans were not generated for $1B customer use case 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

ellipsis-dev

Important

Looks good to me! 👍

Reviewed cc5f94b in 1 minute and 16 seconds. Click for details.

Reviewed 64 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 3 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:81

Draft comment:
Explicit instrumentation for with_raw_response methods has been removed from the WRAPPED_METHODS list. Ensure that the unified response processing (via _extract_response_data) now correctly handles these raw responses and that backward compatibility is maintained.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the author to ensure that the unified response processing now correctly handles raw responses and maintains backward compatibility. This is a request for confirmation and testing, which violates the rules.

2. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:107

Draft comment:
Removal of with_raw_response wrappers for Beta API (regular Anthropic SDK) appears intentional. Please confirm tests cover these flows using the new unified handling.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment asks the PR author to confirm that tests cover the new flows after removing certain wrappers. This violates the rule against asking the author to ensure that changes are tested.

3. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:133

Draft comment:
Removal of with_raw_response wrappers for the Beta API (Bedrock SDK) methods is noted. Verify that instrumentation now correctly processes these responses.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_9LoULUcRIYO3MsDq

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

- Remove unused exception variable - Fix whitespace in blank lines - Add newline at end of test file 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

ellipsis-dev

Important

Looks good to me! 👍

Reviewed 59a8423 in 54 seconds. Click for details.

Reviewed 63 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 3 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:797

Draft comment:
Removed the unused exception variable in the except clause. This is fine if you don't need the error detail, but consider logging the error if debugging is needed.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

2. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py:64

Draft comment:
Minor whitespace cleanup in async _aextract_response_data and _extract_response_data functions improves readability.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

3. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:158

Draft comment:
Added newline at end of file ensures proper POSIX compliance and avoids potential warnings.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_V1PItxlG0V0nfL4R

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}