
fix(anthropic): support with_raw_response wrapper for span generation #3250

Merged
nirga merged 10 commits into main from fix/anthropic-with-raw-response-spans on Aug 12, 2025

Conversation

@nirga (Member) commented Aug 12, 2025

Fixes an issue where AsyncAnthropicBedrock calls using with_raw_response.create() did not generate proper spans with response attributes and token usage.

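The root cause: with_raw_response returns the message wrapped in a LegacyAPIResponse object rather than the message itself, so fields such as model, id, content, and usage are only available after calling .parse() on the wrapper. The instrumentation read them directly, as it does for regular response objects, and found nothing.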

Changes:

  • Add _extract_response_data() helper to handle both regular and wrapped responses
  • Update all response processing functions to use the helper:
    • set_response_attributes()
    • _set_span_completions()
    • shared_metrics_attributes()
    • _set_token_usage() and _aset_token_usage()
  • Add bedrock extras to test dependencies for AsyncAnthropicBedrock testing
  • Add comprehensive tests for both with_raw_response and regular create calls
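The normalization step described above amounts to trying a few response shapes in order. Below is a minimal sketch, not the PR's actual _extract_response_data: the name extract_response_data and the FakeMessage/FakeRawResponse stand-ins are hypothetical, and it assumes (as this PR describes) that a with_raw_response result exposes the parsed message via .parse().

```python
from typing import Any, Dict


def extract_response_data(response: Any) -> Dict[str, Any]:
    """Hypothetical sketch: normalize a response into a plain dict.

    Fallback chain: dict -> parse() -> __dict__ -> empty dict.
    """
    if response is None:
        return {}
    if isinstance(response, dict):
        return response
    # Raw-response wrappers (assumed LegacyAPIResponse-like) expose parse().
    parse = getattr(response, "parse", None)
    if callable(parse):
        try:
            parsed = parse()
            if isinstance(parsed, dict):
                return parsed
            if hasattr(parsed, "__dict__"):
                return parsed.__dict__
        except Exception:
            pass  # fall through to the __dict__ fallback below
    # Regular SDK objects: read their attributes directly.
    if hasattr(response, "__dict__"):
        return response.__dict__
    return {}


# Hypothetical stand-ins for an SDK message and a raw-response wrapper.
class FakeMessage:
    def __init__(self) -> None:
        self.model = "claude-3"
        self.usage = {"input_tokens": 17, "output_tokens": 50}


class FakeRawResponse:
    def parse(self) -> FakeMessage:
        # The real payload is only reachable through parse().
        return FakeMessage()
```

Returning an empty dict instead of raising keeps instrumentation failures from breaking the caller's request; the callers then read fields with .get() and null checks.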

Resolves missing spans for AsyncAnthropicBedrock with_raw_response calls.

🤖 Generated with Claude Code

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Important

Fixes span generation for AsyncAnthropicBedrock with with_raw_response by adding response parsing helpers and updating related functions.

  • Behavior:
    • Adds _extract_response_data() and _aextract_response_data() in utils.py to handle both regular and wrapped responses.
    • Updates set_response_attributes(), _set_span_completions(), shared_metrics_attributes(), _set_token_usage(), and _aset_token_usage() to use the new helper functions.
    • Adds bedrock extras to test dependencies in pyproject.toml.
  • Tests:
    • Adds tests in test_bedrock_with_raw_response.py for AsyncAnthropicBedrock with with_raw_response and regular create calls.
    • Includes VCR cassettes for test scenarios in tests/cassettes/test_bedrock_with_raw_response/.

This description was created by Ellipsis for 59a8423.


Summary by CodeRabbit

  • New Features
    • Broader support for Anthropic/Bedrock message paths (including with_raw_response and Beta variants) so spans and metrics populate across more SDK flavors.
  • Bug Fixes
    • Centralized response normalization and async-safe handling to ensure correct completions and token usage; improved instrumentation error handling and logging.
  • Tests
    • Added async Bedrock integration tests and VCR cassettes for raw-response, regular create, and beta flows.
  • Chores
    • Test dependency updated to include Bedrock extras for the Anthropic package.

@coderabbitai coderabbitai Bot commented Aug 12, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.


📥 Commits

Reviewing files that changed from the base of the PR and between 5230ec5 and 71688e1.

📒 Files selected for processing (4)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (9 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (3 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (1 hunks)

Walkthrough

Centralizes response normalization with a new _extract_response_data used across instrumentation, replaces ad-hoc dict/__dict__ handling for token and span attribute extraction, expands WRAPPED_METHODS to cover Bedrock/beta/with_raw_response surfaces, adds Bedrock async tests and VCR cassettes, and updates the test dependency to anthropic[bedrock].

Changes

Cohort / File(s) Summary
Response normalization utility
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py
Adds _extract_response_data(response) to normalize responses to a dict-like payload (handles dict, objects with parse(), fallback to __dict__), and updates shared_metrics_attributes to use it.
Token usage normalization
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
Replaces per-call isinstance/.__dict__ fallbacks in _set_token_usage and _aset_token_usage with calls to _extract_response_data(response) before accessing usage fields; expands WRAPPED_METHODS for additional SDK/Beta/with_raw_response message methods; adds wrapping debug logs and broader exception handling.
Span attribute normalization
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
Imports _extract_response_data and normalizes response in set_response_attributes and _set_span_completions (removes manual dict fallback logic) before reading model/id/usage/completions.
Bedrock tests and cassettes
packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py, packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/*
Adds async integration tests for AsyncAnthropicBedrock (regular create and with_raw_response, including beta path) and three VCR cassettes capturing Bedrock invoke interactions; tests assert spans, completions, and token usage attributes.
Test dependency update
packages/opentelemetry-instrumentation-anthropic/pyproject.toml
Changes test dependency to anthropic = {extras = ["bedrock"], version = ">=0.36.0"} under [tool.poetry.group.test.dependencies].
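Once the response is normalized, token usage can be read the same way for both call paths. A minimal sketch, assuming the normalized payload exposes a usage entry with Anthropic's input_tokens/output_tokens fields, either as a dict or as an object; read_token_usage and _get are hypothetical names, not the instrumentation's actual _set_token_usage.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional, Tuple


def _get(obj: Any, key: str) -> Optional[Any]:
    """Read a key from a dict or an attribute from an object; None if missing."""
    if isinstance(obj, dict):
        return obj.get(key)
    return getattr(obj, key, None)


def read_token_usage(normalized: Dict[str, Any]) -> Tuple[int, int, int]:
    """Return (input_tokens, output_tokens, total) from a normalized response."""
    usage = normalized.get("usage")
    if usage is None:
        return (0, 0, 0)
    input_tokens = _get(usage, "input_tokens") or 0
    output_tokens = _get(usage, "output_tokens") or 0
    return (input_tokens, output_tokens, input_tokens + output_tokens)


# Usage: usage as an SDK-style object and as a plain dict give the same result.
as_object = {"usage": SimpleNamespace(input_tokens=17, output_tokens=50)}
as_dict = {"usage": {"input_tokens": 17, "output_tokens": 50}}
```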

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant Client as Anthropic/Bedrock Client
    participant Instr as OTel Anthropic instrumentation
    participant Utils as _extract_response_data
    participant Span as OTel Span

    App->>Client: messages.create / messages.with_raw_response.create(...)
    Client-->>Instr: response (dict / wrapper / object)
    Instr->>Utils: _extract_response_data(response)
    Utils-->>Instr: normalized dict-like response
    Instr->>Span: set_response_attributes, _set_span_completions, _set_token_usage
    Span-->>App: recorded span with attributes and usage

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • doronkopit5

Poem

I nibble bytes beneath the log-lit moon,
I parse raw wrappers, humming a tune.
Carrots become dicts, tidy and bright,
Spans hop forward in telemetry light.
Bedrock jokes and tokens — traced just right. 🥕🐇

@ellipsis-dev ellipsis-dev Bot left a comment

Caution

Changes requested ❌

Reviewed everything up to 1a67ed3 in 1 minute and 54 seconds.
  • Reviewed 397 lines of code in 7 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 draft comments. View those below.
1. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:133
  • Draft comment:
    Refactor: Replacing manual conversion with _extract_response_data enhances consistency when handling wrapped responses. Consider checking for cases when an empty dict is returned.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%). The code already has proper null checks after calling _extract_response_data. The function is used to extract data in a consistent way, and the code is already defensive about handling empty/missing data. The comment seems to suggest additional checks that aren't necessary given the existing null handling. I could be wrong about the implementation of _extract_response_data; without seeing that code, I can't be 100% certain about its behavior. Even without seeing _extract_response_data, the calling code clearly handles None/empty cases properly through .get() calls and null checks, so additional empty-dict checks would be redundant. The comment should be removed, as it suggests unnecessary additional checks when the code already handles empty/null cases appropriately.
2. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py:174
  • Draft comment:
    Nice update: Using _extract_response_data in _set_span_completions and set_response_attributes ensures uniform extraction of response data, including wrapped responses.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
3. packages/opentelemetry-instrumentation-anthropic/pyproject.toml:39
  • Draft comment:
    Good improvement: Specifying the 'bedrock' extras for the anthropic dependency ensures tests run against AsyncAnthropicBedrock.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
4. packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml:1
  • Draft comment:
    Cassette test added for regular create calls. This provides good baseline coverage for span attributes.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
5. packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml:1
  • Draft comment:
    Cassette test for with_raw_response calls correctly includes the x-stainless-raw-response header. This helps verify the span generation for wrapped responses.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
6. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:30
  • Draft comment:
    Comprehensive tests for both with_raw_response and regular create methods validate the span attributes and token usage correctly. Test coverage appears robust.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_3DZZR7zwpoemcbIx


parsed_response = parsed_response.__dict__
return parsed_response
except Exception:
pass

The _extract_response_data helper centralizes response extraction. Consider logging the exception inside the try/except block (instead of silently passing) to aid debugging potential parsing issues.

Suggested change
-            pass
+            logging.debug("Failed to parse response in _extract_response_data", exc_info=True)
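The logging suggestion can be sketched with a module-level logger. This is an illustration under assumptions: parse_or_empty is a hypothetical helper, not code from the PR. One reason to prefer logger.debug on logging.getLogger(__name__) over the module-level logging.debug in the suggestion: logging.debug calls basicConfig() on the root logger when it has no handlers, which a library should avoid doing on the application's behalf.

```python
import logging
from typing import Any, Dict

# Module-level logger: applications can enable this package's debug output
# selectively instead of the library touching the root logger.
logger = logging.getLogger(__name__)


def parse_or_empty(response: Any) -> Dict[str, Any]:
    """Hypothetical helper: try response.parse(), log and fall back on failure."""
    try:
        parsed = response.parse()
        return parsed if isinstance(parsed, dict) else vars(parsed)
    except Exception:
        # exc_info=True attaches the traceback to the debug record without
        # re-raising, so instrumentation never breaks the caller's request.
        logger.debug("Failed to parse response in _extract_response_data", exc_info=True)
        return {}
```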

@ellipsis-dev ellipsis-dev Bot left a comment

Skipped PR review on 9bd6080 because no changed files had a supported extension. If you think this was in error, please contact us and we'll fix it right away.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)

64-85: Helper function correctly handles response normalization.

The _extract_response_data function provides robust extraction logic to handle both regular responses and with_raw_response wrapped responses. The fallback chain (dict → parse() → __dict__ → empty dict) is well-designed and exception-safe.

Previous review correctly identified that logging the exception would aid debugging. Consider adding debug logging in the exception handler at line 78.

🧹 Nitpick comments (4)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml (1)

22-45: Filter volatile and identifying headers in cassettes to reduce churn and avoid leaking environment details.

Headers like x-amz-date, user-agent, x-stainless-* and runtime versions are not needed for matching and change frequently, causing noisy cassette diffs and potential environment leakage.

Add a test-wide VCR configuration (e.g., in tests/conftest.py) to filter and normalize headers:

# tests/conftest.py
import pytest

SENSITIVE_HEADERS = [
    "authorization",
    "x-amz-date",
    "x-amzn-trace-id",
    "x-stainless-arch",
    "x-stainless-lang",
    "x-stainless-os",
    "x-stainless-package-version",
    "x-stainless-raw-response",
    "x-stainless-read-timeout",
    "x-stainless-retry-count",
    "x-stainless-runtime",
    "x-stainless-runtime-version",
    "x-stainless-timeout",
    "user-agent",
    "content-length",
]

@pytest.fixture(scope="session")
def vcr_config():
    return {
        "filter_headers": [(h, "<REDACTED>") for h in SENSITIVE_HEADERS],
        # Ensure request matching does not depend on transient headers
        "match_on": ["method", "scheme", "host", "port", "path", "query", "body"],
        "decode_compressed_response": True,
    }

This keeps playback deterministic and private.

packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (3)

16-20: Prefer also honoring AWS_DEFAULT_REGION for better local/dev parity.

Some environments set AWS_DEFAULT_REGION instead of AWS_REGION. Fall back to either before defaulting.

Apply this diff:

-    aws_region = os.environ.get("AWS_REGION", "us-east-1")
+    aws_region = os.environ.get("AWS_REGION") or os.environ.get("AWS_DEFAULT_REGION") or "us-east-1"

46-48: Minor assertion simplification.

After asserting len(spans) == 1, checking all(...) is redundant. Access the single span directly.

Apply this diff:

-    assert len(spans) == 1
-    assert all(span.name == "anthropic.chat" for span in spans)
+    assert len(spans) == 1
+    assert spans[0].name == "anthropic.chat"

91-94: Minor assertion simplification (same as above).

Mirror the simplification to avoid redundant all(...).

Apply this diff:

-    assert len(spans) == 1
-    assert all(span.name == "anthropic.chat" for span in spans)
+    assert len(spans) == 1
+    assert spans[0].name == "anthropic.chat"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba68225 and 9bd6080.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-anthropic/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (2 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (3 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/pyproject.toml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)
  • _extract_response_data (64-84)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)
  • _extract_response_data (64-84)
packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (3)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-257)
packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)
  • reader (37-41)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
🪛 Gitleaks (8.27.2)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml

11-11: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml

11-11: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 Checkov (3.2.334)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml

[HIGH] 11-12: AWS Access Key

(CKV_SECRET_2)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml

[HIGH] 11-12: AWS Access Key

(CKV_SECRET_2)

🔇 Additional comments (11)
packages/opentelemetry-instrumentation-anthropic/pyproject.toml (1)

39-39: LGTM! Bedrock extras added for comprehensive testing.

This change appropriately includes the bedrock extras in the anthropic dependency to enable testing of AsyncAnthropicBedrock functionality, which is directly needed for the new tests in this PR.

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)

87-97: Response normalization properly applied.

The shared_metrics_attributes function now correctly uses the centralized _extract_response_data helper to normalize the response before extracting attributes. This ensures consistent behavior across both regular and wrapped responses.

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (2)

133-134: Centralized response normalization applied correctly.

The async token usage function now properly uses _extract_response_data to normalize responses before accessing usage attributes, replacing the previous inline dict detection logic. This ensures consistent handling of both regular and with_raw_response wrapped responses.


226-227: Consistent normalization in sync token usage function.

The sync version correctly mirrors the async implementation by using _extract_response_data for response normalization. This maintains consistency across both sync and async code paths.

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml (2)

48-51: Test response data looks appropriate.

The cassette contains a realistic Anthropic Claude response with proper token usage information (17 input tokens, 50 output tokens) and a valid joke response about OpenTelemetry. This data will effectively test the response parsing logic.


11-13: Sensitive AWS credentials detected in VCR cassettes
The following cassette files each contain a hard-coded AWS4-HMAC-SHA256 Authorization header with an AKIA… key:

  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml

Please confirm that these are disposable, non-production credentials. If they’re real, scrub or mask them (e.g. using vcrpy’s filter_headers option) before merging.

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (3)

11-11: Import correctly added for response normalization.

The import of _extract_response_data is properly added to support the centralized response extraction functionality.


174-174: Response normalization applied in completion processing.

The _set_span_completions function now correctly normalizes the response before processing completions. This ensures that both regular and with_raw_response wrapped responses are handled consistently when extracting completion data.


241-241: Consistent normalization in response attributes.

The set_response_attributes function properly applies response normalization, replacing the previous manual dict fallback logic with the centralized helper. This ensures consistent attribute extraction across all response types.

packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (2)

28-71: LGTM: Good coverage for the with_raw_response path, including token usage and content extraction.

The assertions validate prompts, completions, and usage metrics. Using response.parse() when available mirrors the LegacyAPIResponse flow under test.


74-115: LGTM: Regular create flow assertions mirror the raw-response test and exercise the normalized path.

Consistent validation of prompts, completions, and token usage ensures both paths are covered.

Comment on lines +10 to +14
authorization:
- AWS4-HMAC-SHA256 Credential=AKIAQEMAC2MSQDTITCKK/20250812/us-east-1/bedrock/aws4_request,
SignedHeaders=accept;accept-encoding;content-length;content-type;host;x-amz-date;x-stainless-arch;x-stainless-lang;x-stainless-os;x-stainless-package-version;x-stainless-raw-response;x-stainless-read-timeout;x-stainless-retry-count;x-stainless-runtime;x-stainless-runtime-version;x-stainless-timeout,
Signature=163e22b87236d5029439f088af26ed28cfa293f95a22e4cdba8789e58e8709c2
connection:

💡 Verification agent

🧩 Analysis chain

Redact the AWS Authorization header from the cassette and rotate credentials if these are real.

Static analysis flagged this as an AWS credential pattern (Gitleaks: aws-access-token; Checkov: CKV_SECRET_2). Even in recorded fixtures, committing an AWS4-HMAC-SHA256 Authorization header with an AKIA key ID and Signature is a secret leakage risk and makes replays brittle. Redact it and configure VCR to filter it going forward.

Apply this diff to sanitize the cassette now:

       authorization:
-      - AWS4-HMAC-SHA256 Credential=AKIAQEMAC2MSQDTITCKK/20250812/us-east-1/bedrock/aws4_request,
-        SignedHeaders=accept;accept-encoding;content-length;content-type;host;x-amz-date;x-stainless-arch;x-stainless-lang;x-stainless-os;x-stainless-package-version;x-stainless-raw-response;x-stainless-read-timeout;x-stainless-retry-count;x-stainless-runtime;x-stainless-runtime-version;x-stainless-timeout,
-        Signature=163e22b87236d5029439f088af26ed28cfa293f95a22e4cdba8789e58e8709c2
+      - "<REDACTED>"

To prevent reintroductions, add VCR config to filter these headers during recording (example below) and re-record cassettes. If these credentials were real, rotate the key pair immediately.
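To sanitize an already-recorded cassette without re-recording, the same redaction can be applied to the loaded YAML data. A minimal sketch, assuming vcrpy's cassette layout (interactions[*].request.headers, with each header value stored as a list, as in the cassette excerpt above) and that the file is loaded and dumped separately, e.g. with PyYAML; scrub_cassette is a hypothetical helper. Redacting the fixture does not remove the key from git history, so a real key still needs rotating.

```python
import copy
from typing import Any, Dict, Iterable

# Headers that carry credentials in SigV4-signed Bedrock requests.
# x-amz-security-token only appears with temporary (STS) credentials.
DEFAULT_SENSITIVE_HEADERS = ("authorization", "x-amz-security-token")


def scrub_cassette(
    cassette: Dict[str, Any],
    headers: Iterable[str] = DEFAULT_SENSITIVE_HEADERS,
    placeholder: str = "<REDACTED>",
) -> Dict[str, Any]:
    """Return a copy of a loaded vcrpy cassette with sensitive request headers replaced."""
    cleaned = copy.deepcopy(cassette)  # never mutate the caller's data
    wanted = {h.lower() for h in headers}
    for interaction in cleaned.get("interactions", []):
        req_headers = interaction.get("request", {}).get("headers", {})
        for name in list(req_headers):
            # Header names are case-insensitive, so compare lowercased.
            if name.lower() in wanted:
                req_headers[name] = [placeholder]  # values are stored as lists
    return cleaned
```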


🏁 Script executed:

#!/bin/bash
# Find potential AWS credentials or recorded Authorization headers across the repo
rg -n --hidden --no-ignore 'AKIA[0-9A-Z]{16}' || true
rg -n --hidden --no-ignore 'AWS4-HMAC-SHA256' || true
rg -n --hidden --no-ignore 'x-amz-secret|aws_secret_access_key|aws_access_key_id' || true

Length of output: 2676


Redact AWS Authorization header from anthropic cassettes

Static analysis flagged AWS4-HMAC-SHA256 authorization headers in recorded fixtures, leaking credentials and making tests brittle. Remove these headers now and configure VCR to filter them going forward. If these credentials are real, rotate them immediately.

Affected cassettes:

  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml (line 11)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml (line 11)

Apply this diff in each file:

       authorization:
-      - AWS4-HMAC-SHA256 Credential=AKIAQEMAC2MSQDTITCKK/20250812/us-east-1/bedrock/aws4_request,
-        SignedHeaders=accept;accept-encoding;content-length;content-type;host;x-amz-date;x-stainless-arch;x-stainless-lang;x-stainless-os;x-stainless-package-version;x-stainless-raw-response;x-stainless-read-timeout;x-stainless-retry-count;x-stainless-runtime;x-stainless-runtime-version;x-stainless-timeout,
-        Signature=163e22b87236d5029439f088af26ed28cfa293f95a22e4cdba8789e58e8709c2
+      - "<REDACTED>"

Add a vcr_config fixture in
packages/opentelemetry-instrumentation-anthropic/tests/conftest.py to filter the authorization header:

import pytest

@pytest.fixture(scope="module")
def vcr_config():
    return {
        "filter_headers": ["authorization"],
    }

Re-record these cassettes after making these changes to prevent reintroducing secrets.
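The rg scan shown in the analysis chain can also run as a CI guard in Python. A minimal stdlib-only sketch; scan_text, find_leaks, and main are hypothetical names, the default tests/cassettes path is an assumption, and the two patterns mirror the first two rg expressions.

```python
import re
from pathlib import Path
from typing import List, Sequence, Tuple

# Mirrors the rg patterns: AWS access key IDs and SigV4 Authorization headers.
PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"AWS4-HMAC-SHA256"),
]


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs that match any credential pattern."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if any(p.search(line) for p in PATTERNS)
    ]


def find_leaks(root: Path, glob: str = "*.yaml") -> List[Tuple[Path, int, str]]:
    """Scan every cassette under root and collect suspicious lines."""
    if not root.is_dir():
        return []
    hits = []
    for path in sorted(root.rglob(glob)):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits.extend((path, lineno, line) for lineno, line in scan_text(text))
    return hits


def main(argv: Sequence[str]) -> int:
    """Print findings and return a CI-friendly exit code (1 if anything leaked)."""
    root = Path(argv[0]) if argv else Path("tests/cassettes")
    leaks = find_leaks(root)
    for path, lineno, line in leaks:
        # Truncate so the report itself does not echo a full secret.
        print(f"{path}:{lineno}: {line[:40]}...")
    return 1 if leaks else 0
```

Wired into CI as, for example, sys.exit(main(sys.argv[1:])), a non-zero exit fails the job before a leaked key is merged.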

🧰 Tools
🪛 Gitleaks (8.27.2)

11-11: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 Checkov (3.2.334)

[HIGH] 11-12: AWS Access Key

(CKV_SECRET_2)

🤖 Prompt for AI Agents
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml
lines 10-14: the cassette contains a raw AWS4-HMAC-SHA256 authorization header
that must be removed; delete the authorization header block from this file (and
the same header at the indicated line in
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml),
add a vcr_config fixture to
packages/opentelemetry-instrumentation-anthropic/tests/conftest.py that filters
the "authorization" header for all tests, and then re-record the affected
cassettes; if the exposed credentials are real, rotate them immediately.

@nirga nirga force-pushed the fix/anthropic-with-raw-response-spans branch from 9bd6080 to 26c4738 on August 12, 2025 14:01
@ellipsis-dev ellipsis-dev Bot left a comment

Important

Looks good to me! 👍

Reviewed d392ee1 in 47 seconds.
  • Reviewed 35 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
1. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:10
  • Draft comment:
    Removed extraneous trailing whitespace between the skip and credentials section. This aligns with clean formatting best practices.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:71
  • Draft comment:
    Removed trailing whitespace from the pytest.mark.asyncio decorator for consistency.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:116
  • Draft comment:
    Added a newline at the end of file to follow POSIX newline conventions.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_AjsHngmj8ZL2c4Lw


Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml (1)

10-13: Critical: Redact AWS Authorization header in cassette and filter going forward

The cassette includes an AWS4-HMAC-SHA256 Authorization header with an AKIA key ID and signature. This is a secret-leak risk and makes tests brittle. Redact it and configure VCR to filter this header for all recordings. If the credentials are real, rotate them immediately.

Apply this diff to sanitize the cassette now:

       authorization:
-      - AWS4-HMAC-SHA256 Credential=AKIAQEMAC2MSQDTITCKK/20250812/us-east-1/bedrock/aws4_request,
-        SignedHeaders=accept;accept-encoding;content-length;content-type;host;x-amz-date;x-stainless-arch;x-stainless-lang;x-stainless-os;x-stainless-package-version;x-stainless-raw-response;x-stainless-read-timeout;x-stainless-retry-count;x-stainless-runtime;x-stainless-runtime-version;x-stainless-timeout,
-        Signature=163e22b87236d5029439f088af26ed28cfa293f95a22e4cdba8789e58e8709c2
+      - "<REDACTED>"

Additionally, add a VCR config to filter these headers during recording (in packages/opentelemetry-instrumentation-anthropic/tests/conftest.py):

import pytest

@pytest.fixture(autouse=True)
def vcr_config():
    return {
        "filter_headers": ["authorization"],
    }

Run this script to find any other leaked AWS credentials/headers across the repo:

#!/bin/bash
set -euo pipefail
echo "Scanning for potential AWS credential leaks..."
rg -n --hidden --no-ignore 'AKIA[0-9A-Z]{16}' || true
rg -n --hidden --no-ignore 'AWS4-HMAC-SHA256' || true
rg -n --hidden --no-ignore 'x-amz-secret|aws_secret_access_key|aws_access_key_id' || true
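For cassettes that were already recorded, a one-off scrub can complement the `filter_headers` setting. The sketch below is a minimal, hypothetical helper (`redact_cassette` is not part of the PR or of vcrpy). It assumes the YAML has already been loaded into a dict with vcrpy's usual `interactions[].request.headers` layout, where each header maps to a list of values; including `x-amz-security-token` is an assumption for temporary-credential sessions.

```python
import copy
from typing import Any, Dict, Iterable

# Header names to scrub (hypothetical default list), compared case-insensitively.
SENSITIVE_HEADERS = ("authorization", "x-amz-security-token")


def redact_cassette(
    cassette: Dict[str, Any],
    headers: Iterable[str] = SENSITIVE_HEADERS,
    placeholder: str = "<REDACTED>",
) -> Dict[str, Any]:
    """Return a copy of a VCR-style cassette with sensitive request headers replaced."""
    targets = {name.lower() for name in headers}
    redacted = copy.deepcopy(cassette)  # never mutate the caller's data
    for interaction in redacted.get("interactions", []):
        request_headers = interaction.get("request", {}).get("headers", {})
        for name in list(request_headers):
            if name.lower() in targets:
                # VCR stores each header as a list of values.
                request_headers[name] = [placeholder]
    return redacted
```

A script could load each cassette with `yaml.safe_load`, pass it through this function, and write it back with `yaml.safe_dump`. vcrpy's default request matching does not compare headers, so replay should keep working; redaction does not revoke anything, so rotation is still required if the key was real.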
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml (1)

10-13: Critical: Redact AWS Authorization header in cassette and filter going forward

Same issue here: the cassette contains an AWS4-HMAC-SHA256 Authorization header with an AKIA key ID and signature. Redact it and filter this header via VCR to avoid future leaks. Rotate keys if real.

Apply this diff:

       authorization:
-      - AWS4-HMAC-SHA256 Credential=AKIAQEMAC2MSQDTITCKK/20250812/us-east-1/bedrock/aws4_request,
-        SignedHeaders=accept;accept-encoding;content-length;content-type;host;x-amz-date;x-stainless-arch;x-stainless-lang;x-stainless-os;x-stainless-package-version;x-stainless-read-timeout;x-stainless-retry-count;x-stainless-runtime;x-stainless-runtime-version;x-stainless-timeout,
-        Signature=7eb3fdf5c7fe741fb5066e245bf529e9b385d7667eedfe0a955bdabec62ae1aa
+      - "<REDACTED>"
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (1)

12-25: Guard against accidental credential use and improve region fallback

  • Avoid depending on real AWS credentials during test runs; replays shouldn’t require them. Also prefer a more universal region fallback.

Apply:

 def async_anthropic_bedrock_client():
     if AsyncAnthropicBedrock is None:
         pytest.skip("AsyncAnthropicBedrock not available")

-    # Try to get credentials from environment first
-    aws_access_key = os.environ.get("AWS_ACCESS_KEY_ID", "test-key")
-    aws_secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY", "test-secret")
-    aws_region = os.environ.get("AWS_REGION", "us-east-1")
+    # Try to get credentials from environment first (tests rely on VCR replays)
+    aws_access_key = os.environ.get("AWS_ACCESS_KEY_ID", "dummy")
+    aws_secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY", "dummy")
+    aws_region = os.environ.get("AWS_REGION") or os.environ.get("AWS_DEFAULT_REGION", "us-east-1")

Also ensure VCR filters the Authorization header globally (see cassette review comments) to prevent recording signatures even when real credentials are present during recording.
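The fallback order in the suggested diff can be pulled into a pure function so it is testable without touching `os.environ`. This is a sketch; `resolve_bedrock_test_settings` is a hypothetical name, and the fixture would pass the returned values to the `AsyncAnthropicBedrock` constructor the same way the existing fixture does.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_bedrock_test_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str, str]:
    """Return (access_key, secret_key, region) for replay-only test clients."""
    env = os.environ if env is None else env
    # Dummy values are enough when VCR replays recorded responses.
    access_key = env.get("AWS_ACCESS_KEY_ID", "dummy")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY", "dummy")
    # Prefer AWS_REGION, then AWS_DEFAULT_REGION, then a fixed fallback;
    # empty strings are treated as unset.
    region = env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION") or "us-east-1"
    return access_key, secret_key, region
```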

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9bd6080 and d392ee1.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-anthropic/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (2 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py (3 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/pyproject.toml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • packages/opentelemetry-instrumentation-anthropic/pyproject.toml
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (3)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-257)
packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)
  • reader (37-41)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
🪛 Gitleaks (8.27.2)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml

11-11: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml

11-11: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 Checkov (3.2.334)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_regular_create.yaml

[HIGH] 11-12: AWS Access Key

(CKV_SECRET_2)

packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_with_raw_response.yaml

[HIGH] 11-12: AWS Access Key

(CKV_SECRET_2)

🔇 Additional comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py (1)

65-71: LGTM: usage token assertions validate normalization path

The checks on prompt/completion/total tokens are solid and directly validate the new response-normalization logic for both raw and regular paths.

Also applies to: 109-115

Comment on lines +45 to +50
spans = span_exporter.get_finished_spans()
assert len(spans) == 1
assert all(span.name == "anthropic.chat" for span in spans)

anthropic_span = spans[0]
assert (

🛠️ Refactor suggestion

Make span assertions resilient to additional spans

Asserting len(spans) == 1 is brittle if future instrumentation emits auxiliary spans. Filter by name and assert exactly one anthropic.chat span.

Apply:

-    spans = span_exporter.get_finished_spans()
-    assert len(spans) == 1
-    assert all(span.name == "anthropic.chat" for span in spans)
-
-    anthropic_span = spans[0]
+    spans = span_exporter.get_finished_spans()
+    anthropic_spans = [s for s in spans if s.name == "anthropic.chat"]
+    assert len(anthropic_spans) == 1
+    anthropic_span = anthropic_spans[0]

Also applies to: 91-96

🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py
around lines 45-50 (and similarly at 91-96), the test currently asserts
len(spans) == 1 which is brittle; change the assertions to filter finished spans
by span.name == "anthropic.chat", assert that the filtered list has exactly one
element, and then set anthropic_span to that single filtered span (e.g.,
filtered_spans[0]); apply the same change to the block at lines 91-96 so all
checks use the filtered list instead of assuming only one span exists.
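Both test blocks could share one small helper instead of repeating the filter. The sketch below is illustrative only: `spans_named`, `single_span_named`, and `FakeSpan` are hypothetical names, and `FakeSpan` merely stands in for the finished spans returned by `span_exporter.get_finished_spans()`, relying only on their `name` attribute.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


def spans_named(spans: Iterable[Any], name: str) -> List[Any]:
    """Return only the finished spans whose name matches exactly."""
    return [span for span in spans if span.name == name]


def single_span_named(spans: Iterable[Any], name: str) -> Any:
    """Return the one span with this name; fail if there are zero or several."""
    matches = spans_named(spans, name)
    if len(matches) != 1:
        # Raise explicitly so the check also survives `python -O`.
        raise AssertionError(f"expected 1 {name!r} span, got {len(matches)}")
    return matches[0]


@dataclass
class FakeSpan:
    # Hypothetical stand-in for an exported span; only `name` is used.
    name: str
```

In the test this would read `anthropic_span = single_span_named(span_exporter.get_finished_spans(), "anthropic.chat")`, which tolerates unrelated auxiliary spans while still catching duplicates.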

nirga and others added 4 commits August 12, 2025 18:08
Fixes issue where AsyncAnthropicBedrock calls using with_raw_response.create()
were not generating proper spans with response attributes and token usage.

The problem was that with_raw_response wraps responses in a LegacyAPIResponse
object that needs to be parsed differently than regular response objects.

Changes:
- Add _extract_response_data() helper to handle both regular and wrapped responses
- Update all response processing functions to use the helper:
  - set_response_attributes()
  - _set_span_completions()
  - shared_metrics_attributes()
  - _set_token_usage() and _aset_token_usage()
- Add bedrock extras to test dependencies for AsyncAnthropicBedrock testing
- Add comprehensive tests for both with_raw_response and regular create calls

Resolves spans not being generated for AsyncAnthropicBedrock with_raw_response calls.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
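The normalization this commit describes can be pictured with a small, stdlib-only sketch. It is not the PR's `_extract_response_data()`: the function name, the fake classes, and the reliance on `model_dump()` (which pydantic v2 models provide) are assumptions. The idea is that `with_raw_response` returns a wrapper whose `parse()` yields the ordinary message object, so unwrapping first lets every attribute setter read a single shape.

```python
from typing import Any, Dict


def extract_response_data(response: Any) -> Dict[str, Any]:
    """Normalize a regular or raw-wrapped response into a plain dict (sketch)."""
    if response is None:
        return {}
    # Raw wrappers expose parse(); calling it yields the ordinary message object.
    parse = getattr(response, "parse", None)
    if callable(parse):
        try:
            response = parse()
        except Exception:
            # This sketch gives up quietly; the PR's own fallback may differ.
            return {}
    if isinstance(response, dict):
        return response
    # Assumed: SDK message objects are pydantic models exposing model_dump().
    if hasattr(response, "model_dump"):
        return response.model_dump()
    # Last resort: the instance attributes of an arbitrary object.
    return dict(getattr(response, "__dict__", {}))


# Hypothetical stand-ins for an SDK message and its raw-response wrapper.
class FakeMessage:
    def __init__(self) -> None:
        self.id = "msg_1"
        self.usage = {"input_tokens": 3, "output_tokens": 5}

    def model_dump(self) -> Dict[str, Any]:
        return {"id": self.id, "usage": dict(self.usage)}


class FakeRawResponse:
    def __init__(self, message: FakeMessage) -> None:
        self._message = message

    def parse(self) -> FakeMessage:
        return self._message
```

Because both the wrapped and the plain call paths end in the same dict, functions such as `set_response_attributes()` and `_set_token_usage()` need no branching of their own.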
Update lock file to include bedrock extras added to anthropic dependency
for AsyncAnthropicBedrock testing support.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove trailing whitespace
- Fix blank line whitespace
- Add newline at end of file

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add test for AsyncAnthropicBedrock beta.messages.with_raw_response.create()
to ensure the fix works for the customer's specific use case.

This covers the exact scenario mentioned by the user:
self.client.beta.messages.with_raw_response.create(...)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@nirga nirga force-pushed the fix/anthropic-with-raw-response-spans branch from d392ee1 to 8d20dd2 August 12, 2025 15:08
Contributor

@ellipsis-dev ellipsis-dev Bot left a comment

Important

Looks good to me! 👍

Reviewed 8d20dd2 in 1 minute and 38 seconds.
  • Reviewed 55 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
1. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:161
  • Draft comment:
    Add a newline at end of file to meet style best practices.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While having a newline at the end of files is indeed a common best practice, this is something that should be enforced by linters or EditorConfig rather than manual code review comments. It's a minor style issue that doesn't affect functionality. Additionally, many modern editors automatically add this newline, and git will warn about this. The comment is technically correct about the best practice. Missing newlines can cause issues with some tools and make diffs less clean. While true, this is exactly the kind of minor, obvious issue that should be handled by automation rather than taking up reviewer and author time. This comment should be deleted as it's too minor and obvious to warrant manual review attention, and should be handled by automation instead.
2. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:55
  • Draft comment:
    Consider extracting the repeated raw response parsing logic into a helper function to reduce duplication.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_utINxrBk8m4KTUUw

nirga and others added 2 commits August 12, 2025 18:10
…upport

Add instrumentation for with_raw_response wrapper methods across all Anthropic API endpoints:
- Regular SDK messages with_raw_response methods
- Beta API with_raw_response methods for regular SDK
- Beta API with_raw_response methods for Bedrock SDK

Improve response data extraction to properly handle LegacyAPIResponse wrappers
from with_raw_response calls by implementing parse() method support.

This addresses the issue where spans were not generated for AsyncAnthropicBedrock
beta.messages.with_raw_response.create calls, which is critical for a major client.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Contributor

@ellipsis-dev ellipsis-dev Bot left a comment

Important

Looks good to me! 👍

Reviewed 5230ec5 in 2 minutes and 28 seconds.
  • Reviewed 220 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 draft comments. View those below.
1. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:81
  • Draft comment:
    New instrumentation entries for with_raw_response (both regular and async, beta API, and Bedrock SDK) have been added. Ensure these mappings match the current SDK methods.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is asking the PR author to ensure that the mappings match the current SDK methods, which is a form of asking for confirmation or verification. This violates the rule against asking the author to confirm or ensure behavior.
2. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:710
  • Draft comment:
    The exception handling for wrapping methods now catches all Exceptions (with debug logging). This is acceptable for instrumentation but consider if more context or a higher log level might be useful in production.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py:70
  • Draft comment:
    In _extract_response_data, the added debug logging in the exception block is useful for diagnosing parse failures. Verify that the fallback to dict is appropriate if parse() fails.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py:83
  • Draft comment:
    The fallback block assigns response.dict to a temporary variable before returning it. This is redundant; consider returning response.dict directly for simplicity.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While the comment is technically correct that the code could be simpler, this is an extremely minor style issue. The temporary variable might have been added for debugging purposes or to make the code more readable. The change doesn't impact functionality or maintainability in any significant way. According to the rules, we should not make purely informative comments or comments about obvious/unimportant issues. The temporary variable might actually serve a purpose - it could make debugging easier or make the code more readable for some developers. Am I being too quick to dismiss potential benefits? Even if the temporary variable has some minor benefits, the comment still falls into the category of being too minor and not clearly requiring a code change. The current code works perfectly fine either way. The comment should be deleted as it addresses an extremely minor style issue that doesn't clearly require a change, violating the rule about not making obvious or unimportant comments.
5. packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml:51
  • Draft comment:
    Typo: The phrase "here''s" appears to use a double apostrophe. Consider changing it to "here's".
  • Reason this comment was not posted:
    Comment was on unchanged code.
6. packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml:57
  • Draft comment:
    Typo: The phrase "you''d" seems to have an extra apostrophe. Consider changing it to "you'd".
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_inkoWhzGb7mtiUGX

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (1)

825-826: Inconsistent exception handling between WRAPPED_METHODS and WRAPPED_AMETHODS.

The async methods wrapping still only catches ModuleNotFoundError while the sync methods now catch broader Exception types. This inconsistency could lead to different failure behaviors.

Apply this diff to make the exception handling consistent:

-            except ModuleNotFoundError:
+            except Exception as e:
+                logger.debug(f"Failed to wrap {wrap_package}.{wrap_object}.{wrap_method}: {e}")
                 pass  # that's ok, we don't want to fail if some methods do not exist
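A single wrapping loop can apply the same skip-and-log policy to sync and async specs while still choosing the right kind of wrapper, which is the distinction behind moving the beta async methods into `WRAPPED_AMETHODS`. The sketch below is stdlib-only and hypothetical (`wrap_all`, `_make_wrapper`, and the `on_call` hook are illustrative names); it patches with `setattr` and `functools.wraps` rather than whatever wrapping library the real package uses, and the spec keys mirror the `wrap_package`/`wrap_object`/`wrap_method` names in the diff.

```python
import functools
import importlib
import inspect
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


def _make_wrapper(original: Callable[..., Any], on_call: Callable[[str], None]) -> Callable[..., Any]:
    """Return an async wrapper for coroutine functions, a sync one otherwise."""
    name = original.__qualname__
    if inspect.iscoroutinefunction(original):
        @functools.wraps(original)
        async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
            on_call(name)
            # Awaiting here yields the actual response object, which is
            # where an instrumentor would set span attributes.
            return await original(*args, **kwargs)
        return async_wrapper

    @functools.wraps(original)
    def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
        on_call(name)
        return original(*args, **kwargs)
    return sync_wrapper


def wrap_all(specs: List[Dict[str, str]], on_call: Callable[[str], None]) -> int:
    """Wrap each package/object/method spec that exists; log and skip the rest."""
    wrapped = 0
    for spec in specs:
        pkg, obj, method = spec["package"], spec["object"], spec["method"]
        try:
            target = getattr(importlib.import_module(pkg), obj)
            original = getattr(target, method)
            setattr(target, method, _make_wrapper(original, on_call))
            wrapped += 1
        except Exception as exc:  # same policy for sync and async specs
            logger.debug("Failed to wrap %s.%s.%s: %s", pkg, obj, method, exc)
    return wrapped
```

Catching `Exception` keeps instrumentation from breaking on older SDK versions that lack a method, at the cost of hiding genuine bugs; the debug log is the compromise both review bots point at.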
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml (1)

22-33: Optional: filter volatile headers to reduce cassette churn.

User-Agent, runtime/version, and stainless headers tend to change across environments/versions and can cause unnecessary cassette diffs. Consider filtering them out.

Suggested addition to the vcr_config filter_headers (on top of Authorization):

  • user-agent
  • x-stainless-arch
  • x-stainless-lang
  • x-stainless-os
  • x-stainless-package-version
  • x-stainless-runtime
  • x-stainless-runtime-version
  • x-stainless-timeout
  • x-stainless-read-timeout
  • x-stainless-retry-count
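The two review suggestions (always filter `authorization`, optionally filter the volatile headers) can be combined in one place. `filter_headers` is the vcrpy option already used in the earlier comments, and pytest-recording reads it from a `vcr_config` fixture; the helper name `build_vcr_config` is hypothetical.

```python
from typing import Any, Dict, List, Sequence

# Volatile request headers listed in the review comment above.
VOLATILE_HEADERS: Sequence[str] = (
    "user-agent",
    "x-stainless-arch",
    "x-stainless-lang",
    "x-stainless-os",
    "x-stainless-package-version",
    "x-stainless-runtime",
    "x-stainless-runtime-version",
    "x-stainless-timeout",
    "x-stainless-read-timeout",
    "x-stainless-retry-count",
)


def build_vcr_config(extra_headers: Sequence[str] = VOLATILE_HEADERS) -> Dict[str, Any]:
    """Build a vcr_config dict that always filters Authorization first."""
    names: List[str] = []
    for header in ("authorization", *extra_headers):
        lowered = header.lower()
        if lowered not in names:  # keep order, drop duplicates
            names.append(lowered)
    return {"filter_headers": names}
```

A `conftest.py` fixture could simply return `build_vcr_config()`. Lower-casing is a normalization choice in this sketch, not a claim about how vcrpy compares header names.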
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a01f46b and 5230ec5.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (4 hunks)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1 hunks)
  • packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py (1)
  • _extract_response_data (64-87)
🪛 Gitleaks (8.27.2)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml

11-11: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 Checkov (3.2.334)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml

[HIGH] 11-12: AWS Access Key

(CKV_SECRET_2)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Lint
  • GitHub Check: Build Packages (3.11)
🔇 Additional comments (7)
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml (3)

34-35: Good: Explicitly exercising with_raw_response path.

x-stainless-raw-response: 'true' confirms this cassette covers the wrapped-response scenario the PR fixes.


50-59: Good coverage: usage in body and token counts in headers present.

Both response.usage and X-Amzn-Bedrock-*-Token-Count headers are recorded, allowing tests to validate extraction from either source as updated by _extract_response_data().

Also applies to: 69-74
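Since the cassette records token counts in both the body and the headers, a test (or the instrumentation) could read either source. The sketch below prefers `usage` from the parsed body and falls back to the Bedrock headers; the full header names are an assumption based on the `X-Amzn-Bedrock-*-Token-Count` pattern above, `token_usage` is a hypothetical helper, and it does not describe what the PR itself implements.

```python
from typing import Any, Dict, Mapping, Optional

# Assumed full names of the Bedrock token-count response headers.
INPUT_HEADER = "x-amzn-bedrock-input-token-count"
OUTPUT_HEADER = "x-amzn-bedrock-output-token-count"


def token_usage(body: Mapping[str, Any], headers: Mapping[str, str]) -> Dict[str, Optional[int]]:
    """Prefer usage from the response body; fall back to Bedrock headers."""
    usage = body.get("usage") or {}
    input_tokens = usage.get("input_tokens")
    output_tokens = usage.get("output_tokens")
    # HTTP header names are case-insensitive, so compare lower-cased keys.
    lowered = {name.lower(): value for name, value in headers.items()}
    if input_tokens is None and INPUT_HEADER in lowered:
        input_tokens = int(lowered[INPUT_HEADER])
    if output_tokens is None and OUTPUT_HEADER in lowered:
        output_tokens = int(lowered[OUTPUT_HEADER])
    total = None
    if input_tokens is not None and output_tokens is not None:
        total = input_tokens + output_tokens
    return {"input_tokens": input_tokens, "output_tokens": output_tokens, "total_tokens": total}
```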


1-81: Clean AWS Credentials Verification Passed

No occurrences of AWS Access Key IDs (AKIA…), SigV4 Authorization headers (AWS4-HMAC-SHA256 Credential=), or generic Bearer/Basic tokens were found after scanning the entire repository. The codebase is free of leaked AWS identifiers.

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py (4)

84-172: LGTM! Comprehensive expansion of instrumented API surface.

The addition of all the new wrapped methods properly covers the gaps for with_raw_response calls and beta API methods across both regular Anthropic SDK and Bedrock SDK. This addresses the core issue where AsyncAnthropicBedrock calls using with_raw_response.create() were not generating proper spans.


222-223: Centralized response normalization correctly implemented.

The import and usage of _extract_response_data properly handles both regular responses and wrapped LegacyAPIResponse objects by normalizing them to dictionary format before token usage extraction.


315-316: Consistent response normalization in sync version.

The same centralized approach is correctly applied to the synchronous token usage function, ensuring consistent handling across both async and sync code paths.


802-805: Improved error handling and logging for wrapped methods.

The enhanced exception handling now catches broader Exception types (not just ModuleNotFoundError) and adds debug logging for both successful and failed method wrapping attempts. This will help with troubleshooting instrumentation issues while gracefully continuing when methods don't exist.

Comment on lines +10 to +13
authorization:
- AWS4-HMAC-SHA256 Credential=AKIAQEMAC2MSQDTITCKK/20250812/us-east-1/bedrock/aws4_request,
SignedHeaders=accept;accept-encoding;content-length;content-type;host;x-amz-date;x-stainless-arch;x-stainless-lang;x-stainless-os;x-stainless-package-version;x-stainless-raw-response;x-stainless-read-timeout;x-stainless-retry-count;x-stainless-runtime;x-stainless-runtime-version;x-stainless-timeout,
Signature=c2c5f7bbfea072d8914f9500ba1d01f219242c57037156147fb496a489d6c514

🛠️ Refactor suggestion

⚠️ Potential issue

Critical: Authorization header leaks AWS Access Key ID (blocker).

Static analysis (Gitleaks/Checkov) correctly flags this. Even in cassettes, persisting an AKIA Access Key ID and SigV4 signature is a secret-scanning violation. Redact it and configure VCR to filter this header to prevent future leaks and flaky matches.

Apply this diff to redact the header in the cassette:

-      authorization:
-      - AWS4-HMAC-SHA256 Credential=AKIAQEMAC2MSQDTITCKK/20250812/us-east-1/bedrock/aws4_request,
-        SignedHeaders=accept;accept-encoding;content-length;content-type;host;x-amz-date;x-stainless-arch;x-stainless-lang;x-stainless-os;x-stainless-package-version;x-stainless-raw-response;x-stainless-read-timeout;x-stainless-retry-count;x-stainless-runtime;x-stainless-runtime-version;x-stainless-timeout,
-        Signature=c2c5f7bbfea072d8914f9500ba1d01f219242c57037156147fb496a489d6c514
+      authorization:
+      - "<REDACTED>"

Additionally, add a VCR/pytest-recording filter to strip these headers at record/playback time to avoid reintroducing the leak:

Python (e.g., tests/conftest.py):

import pytest

@pytest.fixture(scope="session")
def vcr_config():
    return {
        "filter_headers": [
            "authorization",
            "x-amz-date",
            "x-amzn-requestid",
            # optionally reduce cassette churn:
            "user-agent",
        ]
    }
🧰 Tools
🪛 Gitleaks (8.27.2)

11-11: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 Checkov (3.2.334)

[HIGH] 11-12: AWS Access Key

(CKV_SECRET_2)

🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_with_raw_response/test_async_anthropic_bedrock_beta_with_raw_response.yaml
around lines 10-13, the authorization header contains a real AWS Access Key ID
and SigV4 signature which must be removed; replace the header value with a
neutral placeholder like "<REDACTED_AUTHORIZATION>" (or remove the header entry
entirely) in the cassette file to eliminate the leaked secret, and then add a
VCR/pytest-recording config (e.g., in tests/conftest.py) that filters
"authorization", "x-amz-date", "x-amzn-requestid" (and optionally "user-agent")
during record/playback so future recordings will strip those headers
automatically.

nirga and others added 2 commits August 12, 2025 18:42
…or $1B customer

Key fixes implemented:
1. **Moved beta async methods to WRAPPED_AMETHODS**: Beta API async methods were incorrectly
   in WRAPPED_METHODS instead of WRAPPED_AMETHODS, preventing async wrapper execution
2. **Created async response processing functions**: Added _aextract_response_data,
   aset_response_attributes, _aset_span_completions, ashared_metrics_attributes, and
   _ahandle_response to properly handle async response extraction
3. **Fixed coroutine handling**: Async response processing can now properly await
   coroutines instead of failing with "Cannot extract data from unawaited coroutine"
4. **Comprehensive with_raw_response support**: Added instrumentation for all
   with_raw_response endpoints across regular SDK and Bedrock SDK

This resolves the critical issue where AsyncAnthropicBedrock beta.messages.with_raw_response.create
was not generating spans with proper response content and token usage metrics.

Tests now pass:
✅ Regular bedrock with_raw_response
✅ Regular bedrock create
✅ Beta API bedrock with_raw_response (CRITICAL for $1B customer)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
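The async counterpart described in this commit mainly differs in where it awaits. The sketch below is hypothetical (`aextract_response_data` is not the PR's `_aextract_response_data`, and relying on `model_dump()` is an assumption): it awaits an awaitable response before reading it, and also awaits `parse()` if a wrapper happens to expose an async one, which covers the Ellipsis note about a future async `parse()`.

```python
import inspect
from typing import Any, Dict


async def aextract_response_data(response: Any) -> Dict[str, Any]:
    """Async normalization sketch: await pending values before reading them."""
    # An un-awaited coroutine carries no data yet; await it first.
    if inspect.isawaitable(response):
        response = await response
    if response is None:
        return {}
    parse = getattr(response, "parse", None)
    if callable(parse):
        parsed = parse()
        # Some wrappers may expose an async parse(); await it if so.
        if inspect.isawaitable(parsed):
            parsed = await parsed
        response = parsed
    if isinstance(response, dict):
        return response
    # Assumed: message objects are pydantic models exposing model_dump().
    if hasattr(response, "model_dump"):
        return response.model_dump()
    return dict(getattr(response, "__dict__", {}))
```

One design caveat: a coroutine can only be awaited once, so awaiting it inside instrumentation is only safe when the wrapper owns that coroutine and returns the awaited result to the caller, as an async wrapper around `create()` does.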
Contributor

@ellipsis-dev ellipsis-dev Bot left a comment

Important

Looks good to me! 👍

Reviewed 0c4b780 in 2 minutes and 3 seconds.
  • Reviewed 337 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 draft comments. View those below.
1. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:55
  • Draft comment:
    The reordering/removal and re-addition of beta API async wrappers for regular and Bedrock SDK looks intentional. Please double-check that no duplicate or conflicting wrappers remain.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
2. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:219
  • Draft comment:
    Switching to the async helper _aextract_response_data and awaiting its result ensures consistency. Verify all call sites are updated accordingly.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:822
  • Draft comment:
    Catching a broad Exception here may mask unexpected errors. Consider narrowing the exception or logging more details for debugging.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:695
  • Draft comment:
    The async _awrap now awaits ashared_metrics_attributes and _ahandle_response. This ensures correct async handling—good update.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
5. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py:64
  • Draft comment:
    In _aextract_response_data, if response.parse() might be async in future it would be good to check and await it. Currently it handles only synchronous parse().
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
6. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:56
  • Draft comment:
    Tests access raw response content correctly by checking for the parse() method. This test structure adequately verifies the span generation for with_raw_response.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
7. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/span_utils.py:189
  • Draft comment:
    Typo: The comment says "Antrhopic"; should be "Anthropic".
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While the typo fix is technically correct, it's just a misspelling in a comment that doesn't affect functionality. The comment is about documentation, not code. According to the rules, we should not make purely informative comments or comments that are unimportant. A typo in a comment is very low priority. The typo could be confusing to future readers and it appears in multiple places in the code. It's a simple fix that improves code quality. While it's a valid correction, the typo doesn't impact code functionality and fixing it provides minimal value. The rules specifically say to avoid unimportant comments. Delete this comment as it's too minor and doesn't affect code functionality. Comment typos don't meet the bar for required code changes.
8. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py:64
  • Draft comment:
    Typo: The function name _aextract_response_data appears to be missing the 'sync' part. Consider renaming it to _async_extract_response_data for better clarity and consistency with the asynchronous nature of the function.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. The codebase consistently uses the 'a' prefix for async versions of functions. This is seen in multiple places like ashared_metrics_attributes and acount_prompt_tokens_from_request. While _async_extract_response_data might be more verbose, changing the existing pattern would reduce consistency. The current name follows the established convention. The comment has a point that _async_extract_response_data would be more immediately clear to readers unfamiliar with the codebase's conventions. However, consistency within a codebase is more important than slightly improved clarity for new readers. The 'a' prefix is a clear pattern here that developers will quickly learn. The comment should be deleted as it suggests breaking an established naming convention in the codebase. The current name is consistent with the project's patterns.
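The draft comment on `_awrap` notes that it now awaits `ashared_metrics_attributes` and `_ahandle_response`. That matters because calling an `async def` helper without `await` only creates a coroutine object; its body never runs, so no span attributes are set. The sketch below is a toy model under that assumption, not the instrumentation's actual code: `FakeSpan`, `ahandle_response`, `fake_create`, `awrap`, and `awrap_forgot_await` are hypothetical names, and the `(wrapped, instance, args, kwargs)` parameters only loosely mirror the `wrapt` wrapper convention.

```python
import asyncio


class FakeSpan:
    """Hypothetical stand-in for an OpenTelemetry span; only records attributes."""

    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


async def ahandle_response(span, response):
    """Hypothetical async helper that copies response data onto the span."""
    await asyncio.sleep(0)  # stands in for real async work
    span.set_attribute("response.id", response["id"])


async def fake_create(**kwargs):
    """Hypothetical stand-in for an async SDK create() call."""
    return {"id": "msg_123", "model": kwargs.get("model")}


async def awrap(span, wrapped, instance, args, kwargs):
    # Await the instrumented call, then await the helper so its body runs.
    response = await wrapped(*args, **kwargs)
    await ahandle_response(span, response)
    return response


async def awrap_forgot_await(span, wrapped, instance, args, kwargs):
    response = await wrapped(*args, **kwargs)
    pending = ahandle_response(span, response)  # only creates a coroutine object
    pending.close()  # discard it explicitly; its body never ran
    return response


async def main():
    awaited_span, skipped_span = FakeSpan(), FakeSpan()
    await awrap(awaited_span, fake_create, None, (), {"model": "demo"})
    await awrap_forgot_await(skipped_span, fake_create, None, (), {"model": "demo"})
    return awaited_span, skipped_span


awaited_span, skipped_span = asyncio.run(main())
print(awaited_span.attributes)  # {'response.id': 'msg_123'}
print(skipped_span.attributes)  # {}
```

Both wrappers return the same response to the caller; only the awaited version leaves data on the span, which is why a missing `await` fails silently rather than raising.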

Workflow ID: wflow_e0CyyXAYhGNzO7eC

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

…se and beta API

- Add beta API method paths to WRAPPED_AMETHODS for AsyncAnthropicBedrock
- Create async response processing functions to handle coroutines properly
- Add _aextract_response_data() to handle both coroutines and LegacyAPIResponse objects
- Create comprehensive test coverage for with_raw_response and beta API endpoints
- Fix issue where spans were not generated for $1B customer use case
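The commit message describes `_aextract_response_data()` as handling both coroutines and `LegacyAPIResponse` objects. A minimal sketch of that unwrapping logic follows, assuming only that a wrapped response exposes a `parse()` method (the property the tests check for) and that a plain message object does not. All classes and function names here are hypothetical fakes, not the Anthropic SDK types or the project's helpers. It also folds in the earlier draft suggestion on `utils.py:64` by awaiting `parse()` if it ever returns an awaitable.

```python
import asyncio
import inspect


class FakeMessage:
    """Hypothetical stand-in for a parsed message object (no parse())."""

    def __init__(self, message_id):
        self.id = message_id


class FakeRawResponse:
    """Hypothetical stand-in for a raw-response wrapper exposing parse()."""

    def __init__(self, message):
        self._message = message

    def parse(self):
        return self._message


class FakeAsyncRawResponse(FakeRawResponse):
    """Hypothetical wrapper whose parse() is a coroutine function."""

    async def parse(self):
        return self._message


def extract_response_data(response):
    # Sync path: unwrap via parse() when present, else pass through.
    if hasattr(response, "parse"):
        return response.parse()
    return response


async def aextract_response_data(response):
    # The response may still be a coroutine, so resolve it first.
    if inspect.isawaitable(response):
        response = await response
    if hasattr(response, "parse"):
        parsed = response.parse()
        # Defensive: also await parse() if it returns an awaitable.
        if inspect.isawaitable(parsed):
            parsed = await parsed
        return parsed
    return response


async def demo():
    msg = FakeMessage("msg_1")

    async def pending_raw():
        return FakeRawResponse(msg)

    return [
        await aextract_response_data(msg),
        await aextract_response_data(FakeRawResponse(msg)),
        await aextract_response_data(FakeAsyncRawResponse(msg)),
        await aextract_response_data(pending_raw()),
    ]


print([m.id for m in asyncio.run(demo())])  # ['msg_1', 'msg_1', 'msg_1', 'msg_1']
```

Duck-typing on `parse()` keeps the helper independent of any specific wrapper class; the trade-off is that any object with an unrelated `parse` attribute would also be unwrapped.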

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

@ellipsis-dev (bot) left a comment


Important

Looks good to me! 👍

Reviewed cc5f94b in 1 minute and 16 seconds.
  • Reviewed 64 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:81
  • Draft comment:
    Explicit instrumentation for with_raw_response methods has been removed from the WRAPPED_METHODS list. Ensure that the unified response processing (via _extract_response_data) now correctly handles these raw responses and that backward compatibility is maintained.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the author to ensure that the unified response processing now correctly handles raw responses and maintains backward compatibility. This is a request for confirmation and testing, which violates the rules.
2. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:107
  • Draft comment:
    Removal of with_raw_response wrappers for Beta API (regular Anthropic SDK) appears intentional. Please confirm tests cover these flows using the new unified handling.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment asks the PR author to confirm that tests cover the new flows after removing certain wrappers. This violates the rule against asking the author to ensure that changes are tested.
3. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:133
  • Draft comment:
    Removal of with_raw_response wrappers for the Beta API (Bedrock SDK) methods is noted. Verify that instrumentation now correctly processes these responses.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
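The draft comments above concern removing explicit `with_raw_response` entries from the wrapped-method lists. One plausible reason a single wrapper can still cover raw calls is shown below as a toy model, not the Anthropic SDK's actual code: if the raw-response view delegates to the resource's own `create` method, and the instrumentation has already patched that method, raw calls still pass through the wrapper, which then receives the wrapped result and must unwrap it. `Messages`, `RawView`, `RawResult`, `instrument`, and `seen` are all hypothetical.

```python
import functools


class RawResult:
    """Hypothetical wrapper returned when raw mode is requested."""

    def __init__(self, message):
        self._message = message

    def parse(self):
        return self._message


class Messages:
    """Hypothetical client resource with a single create() method."""

    def create(self, raw=False, **kwargs):
        message = {"id": "msg_1", **kwargs}
        return RawResult(message) if raw else message


class RawView:
    """Hypothetical with_raw_response view that delegates to create()."""

    def __init__(self, messages):
        # The bound method is looked up here, so it picks up any patch
        # applied to Messages.create before this view was built.
        self._create = messages.create

    def create(self, **kwargs):
        return self._create(raw=True, **kwargs)


seen = []  # message ids the instrumentation observed, in call order


def instrument():
    original = Messages.create

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        response = original(self, *args, **kwargs)
        # Unwrap raw results so span data comes from the real message.
        data = response.parse() if hasattr(response, "parse") else response
        seen.append(data["id"])
        return response

    Messages.create = wrapper


instrument()
messages = Messages()
messages.create(model="demo")
RawView(messages).create(model="demo")
print(seen)  # ['msg_1', 'msg_1']
```

The caller of the raw view still gets a `RawResult` back; the wrapper only reads through it, so instrumentation does not change what the application sees.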

Workflow ID: wflow_9LoULUcRIYO3MsDq


- Remove unused exception variable
- Fix whitespace in blank lines
- Add newline at end of test file

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

@ellipsis-dev (bot) left a comment


Important

Looks good to me! 👍

Reviewed 59a8423 in 54 seconds.
  • Reviewed 63 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
1. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py:797
  • Draft comment:
    Removed the unused exception variable in the except clause. This is fine if you don't need the error detail, but consider logging the error if debugging is needed.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/utils.py:64
  • Draft comment:
    Minor whitespace cleanup in async _aextract_response_data and _extract_response_data functions improves readability.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. packages/opentelemetry-instrumentation-anthropic/tests/test_bedrock_with_raw_response.py:158
  • Draft comment:
    Added newline at end of file ensures proper POSIX compliance and avoids potential warnings.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
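The first draft comment in this review suggests logging the error instead of silently discarding it once the unused exception variable is gone. One common pattern for instrumentation code is a decorator that never lets a telemetry failure reach the caller but still records the traceback at debug level. This is a sketch, not the project's code; `swallow_and_log` and `set_usage_attributes` are hypothetical names.

```python
import functools
import logging

logger = logging.getLogger("instrumentation.sketch")


def swallow_and_log(func):
    """Hypothetical decorator: telemetry errors never break the caller."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:  # no unused "as e" binding needed
            # exc_info=True attaches the active traceback to the log record.
            logger.debug("Failed in %s", func.__name__, exc_info=True)
            return None

    return wrapper


@swallow_and_log
def set_usage_attributes(usage):
    # Hypothetical helper that fails on an unexpected response shape.
    return usage["input_tokens"] + usage["output_tokens"]


print(set_usage_attributes({"input_tokens": 3, "output_tokens": 4}))  # 7
print(set_usage_attributes({}))  # None; the KeyError is logged at DEBUG level
```

Because `exc_info=True` pulls the traceback from the active exception, the `except` clause does not need to bind the exception to a name at all.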

Workflow ID: wflow_V1PItxlG0V0nfL4R


@nirga merged commit 1206daf into main on Aug 12, 2025
8 checks passed
@nirga deleted the fix/anthropic-with-raw-response-spans branch on August 12, 2025 at 16:11
