feat(guardrails): chunk large Bedrock guardrail requests to avoid 429 #22561

Closed
giulio-leone wants to merge 9 commits into BerriAI:main from giulio-leone:fix/issue-19501-bedrock-guardrail-chunking

Conversation

@giulio-leone

Problem

AWS Bedrock ApplyGuardrail API has a 25,000 character limit per assessment call. When content exceeds this limit, AWS returns a 429 TooManyRequestsException. This breaks guardrail checks for long conversations or large documents.

Per AWS documentation, large payloads should be batched into smaller chunks.

Solution

Automatically splits content into chunks of ≤ 25,000 characters when the total text exceeds the limit, processes each chunk independently, and merges results.

Key design decisions:

  1. Worst action wins: If ANY chunk triggers GUARDRAIL_INTERVENED, the merged result reflects that
  2. Short-circuit on BLOCKED: If a chunk is blocked, remaining chunks are skipped (fail-fast)
  3. Usage stats summed: topicPolicyUnits, contentPolicyUnits, etc. are aggregated across chunks
  4. Assessments concatenated: All assessment details from all chunks are preserved
  5. Large text items split mid-item: A single text item > 25k chars is split at character boundaries into pieces that are spread across chunks
  6. Zero behavior change for small requests: Requests ≤ 25k chars take the existing single-call path with no overhead
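
For readers who want a concrete picture of the splitting step, here is a minimal sketch of size-bounded chunking. It is not the PR's actual _chunk_content_items(): the function name chunk_content_items, the greedy packing strategy, the plain-dict item shape {"text": {"text": ...}}, and raising ValueError for a non-positive limit are all assumptions made for illustration.

```python
from typing import Dict, List

MAX_CHARS = 25_000  # per-call character limit quoted in the PR description


def chunk_content_items(
    items: List[Dict], max_chars: int = MAX_CHARS
) -> List[List[Dict]]:
    """Greedily pack text items into chunks of at most max_chars characters.

    Hypothetical helper: items are assumed to look like {"text": {"text": "..."}}.
    """
    if max_chars <= 0:
        # Without this guard the while-loop below would never make progress.
        raise ValueError("max_chars must be positive")

    chunks: List[List[Dict]] = []
    current: List[Dict] = []
    current_len = 0

    for item in items:
        text = (item.get("text") or {}).get("text", "")
        if not text:
            # Items without text do not count toward the limit; keep them as-is.
            current.append(item)
            continue

        start = 0
        while start < len(text):
            room = max_chars - current_len
            if room == 0:
                # Current chunk is full: close it and start a new one.
                chunks.append(current)
                current, current_len = [], 0
                room = max_chars
            # Split the item at a character boundary if it does not fit.
            piece = text[start:start + room]
            current.append({"text": {"text": piece}})
            current_len += len(piece)
            start += len(piece)

    if current:
        chunks.append(current)
    return chunks
```

With this sketch, a request whose total text fits within the limit comes back as a single chunk, so a caller could keep the existing single-call path for that case.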

Architecture

Extracted the HTTP call logic into _make_single_bedrock_api_request() and added:

  • _chunk_content_items(): Splits BedrockContentItem lists into size-bounded chunks
  • _merge_guardrail_responses(): Merges multiple BedrockGuardrailResponse objects

make_bedrock_api_request() now orchestrates chunking when content exceeds the limit.
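
The orchestration described above can be sketched roughly as follows. This is an illustrative, synchronous simplification rather than the PR's code: merge_guardrail_responses, apply_guardrail_in_chunks, and the call_one callback are hypothetical names, responses are plain dicts using the ApplyGuardrail response keys (action, usage, assessments, outputs), and any GUARDRAIL_INTERVENED action is treated as the stop condition, which may be broader than the PR's notion of "blocked".

```python
from typing import Any, Callable, Dict, List

INTERVENED = "GUARDRAIL_INTERVENED"


def merge_guardrail_responses(responses: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge per-chunk responses: worst action wins, usage summed, lists concatenated."""
    if len(responses) == 1:
        return responses[0]  # passthrough when only one call was made

    merged: Dict[str, Any] = {
        "action": "NONE",
        "usage": {},
        "assessments": [],
        "outputs": [],
    }
    for resp in responses:
        if resp.get("action") == INTERVENED:
            merged["action"] = INTERVENED
        for key, value in resp.get("usage", {}).items():
            merged["usage"][key] = merged["usage"].get(key, 0) + value
        merged["assessments"].extend(resp.get("assessments", []))
        merged["outputs"].extend(resp.get("outputs", []))
    return merged


def apply_guardrail_in_chunks(
    chunks: List[Any], call_one: Callable[[Any], Dict[str, Any]]
) -> Dict[str, Any]:
    """Call the guardrail once per chunk and stop early after an intervention."""
    responses: List[Dict[str, Any]] = []
    for chunk in chunks:
        resp = call_one(chunk)  # stand-in for the single-request helper
        responses.append(resp)
        if resp.get("action") == INTERVENED:
            break  # fail fast: remaining chunks are skipped
    return merge_guardrail_responses(responses)
```

Passing the per-chunk call in as a callback keeps the sketch testable with a fake; in the PR that role is played by _make_single_bedrock_api_request().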

Changes

  • litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py: Chunking, merging, single-request extraction
  • tests/litellm/proxy/guardrails/test_bedrock_guardrail_chunking.py: 16 unit tests

Tests

All 16 tests pass:

  • 8 tests for chunk splitting logic (under limit, exact limit, multi-item split, large single item, empty, no-text items, mixed sizes)
  • 5 tests for response merging (passthrough, worst action, usage sum, assessments/outputs merge)
  • 3 integration tests for the full flow (no chunking path, chunking triggered, short-circuit on block)

Fixes #19501

Contributor

Copilot AI left a comment

Pull request overview

Adds automatic chunking for oversized AWS Bedrock ApplyGuardrail requests (25k char limit) and merges chunk-level responses to avoid 429 errors on long content.

Changes:

  • Adds content chunking (_chunk_content_items) and response merging (_merge_guardrail_responses) helpers.
  • Extracts single-call HTTP logic into _make_single_bedrock_api_request() and updates make_bedrock_api_request() to orchestrate chunking/merging.
  • Adds a new unit test suite covering chunking, merging, and orchestration behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File: Description

  • litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py: Implements chunking + multi-call orchestration and merges responses for large payloads.
  • tests/litellm/proxy/guardrails/test_bedrock_guardrail_chunking.py: Adds unit/integration-style tests validating chunking, merging, and short-circuiting.

@greptile-apps
Contributor

greptile-apps bot commented Mar 2, 2026

Greptile Summary

This PR adds automatic chunking for AWS Bedrock ApplyGuardrail requests that exceed the 25,000-character text limit, preventing 429 TooManyRequestsException errors on large conversations. It extracts the HTTP call logic into _make_single_bedrock_api_request, adds _chunk_content_items for size-bounded splitting, and _merge_guardrail_responses for combining results using a "worst action wins" strategy.

  • The chunking and merging logic is correct: total_chars is computed accurately, chunk boundaries are respected, short-circuit on blocked content works as intended, and usage/assessment aggregation is properly implemented.
  • PII redaction is applied to both the chunked and single-request debug logging paths.
  • Exception-marker propagation (Output.__type containing "Exception") is correctly forwarded from chunk responses into the merged JSON so _determine_guardrail_status_from_json can detect them.
  • _check_bedrock_response_for_exception (line 794) is now dead code — its sole caller _get_bedrock_guardrail_response_status was replaced by the new _determine_guardrail_status_from_json which inlines the same check. The orphaned method should be removed.
  • The test file imports from fastapi import HTTPException at module level, which technically falls outside the policy of restricting FastAPI imports to the proxy/ folder.

Confidence Score: 4/5

  • Safe to merge after addressing the dead-code cleanup; no functional regressions identified.
  • The core chunking and merging logic is well-implemented and covered by 16 mock-only tests. The zero-overhead path for requests under the limit preserves existing behavior. Two minor issues exist — one dead method that should be removed, and a FastAPI import policy violation in the test file — but neither affects runtime correctness.
  • No files require special attention beyond the orphaned _check_bedrock_response_for_exception method in bedrock_guardrails.py.

Important Files Changed

Filename: Overview

  • litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py: Adds chunking support for large Bedrock guardrail requests. Logic is sound: total_chars is computed correctly, chunks are bounded, short-circuit on block works, and usage/assessments are properly merged. One issue: _check_bedrock_response_for_exception is now dead code — its only caller (_get_bedrock_guardrail_response_status) was removed and replaced by _determine_guardrail_status_from_json which inlines the same check.
  • tests/litellm/proxy/guardrails/test_bedrock_guardrail_chunking.py: Adds 16 unit + integration tests covering chunking, merging, and the full request flow. Tests are mock-only (no real network calls). One minor policy violation: top-level from fastapi import HTTPException import in a file outside the proxy/ folder.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[make_bedrock_api_request] --> B[Load credentials\nBuild bedrock_request_data]
    B --> C{total_chars >\n25,000?}
    C -- No --> D[_make_single_bedrock_api_request\nsingle call]
    D --> E[Redact PII\nLog response]
    E --> F{Blocked?}
    F -- No --> G[add_standard_logging\nReturn response]
    F -- Yes --> H[Raise HTTPException 400]

    C -- Yes --> I[_chunk_content_items\nSplit into ≤25k chunks]
    I --> J[for chunk in chunks]
    J --> K[_make_single_bedrock_api_request\nper-chunk call]
    K --> L{HTTP 200?}
    L -- No --> M[Log error\nRaise HTTPException]
    L -- Yes --> N[Append to responses]
    N --> O{Chunk blocked?}
    O -- Yes --> P[break — short circuit]
    O -- No --> J
    P --> Q[_merge_guardrail_responses\nworst action wins\nsum usage\nconcat assessments]
    J -- all done --> Q
    Q --> R[Redact PII\nLog merged response]
    R --> S[add_standard_logging]
    S --> T{Merged blocked?}
    T -- Yes --> H
    T -- No --> U[Return merged response]

Comments Outside Diff (2)

  1. litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py, line 794-825 (link)

    Dead code: _check_bedrock_response_for_exception is never called

    After this refactoring, _check_bedrock_response_for_exception is defined but never invoked from anywhere in the file. The old _get_bedrock_guardrail_response_status was its only caller, and that method was removed and replaced by _determine_guardrail_status_from_json, which inlines the exception-checking logic directly.

    Since this method is now dead code, it should either be removed or explicitly called from _determine_guardrail_status_from_json to avoid duplication of the same check logic and to prevent confusion for future maintainers.

  2. tests/litellm/proxy/guardrails/test_bedrock_guardrail_chunking.py, line 5 (link)

    FastAPI import outside the proxy/ folder

    fastapi is a proxy dependency, not an SDK one, and imports of it should be confined to files under litellm/proxy/. This test file lives under tests/, not proxy/, so this import technically violates that policy.

    Consider catching HTTPException via the stdlib or a re-exported alias from litellm itself, or alternatively assert on the exception's type by fully-qualifying it after import rather than importing it at module level:

    # Instead of: from fastapi import HTTPException
    import fastapi  # or reference via pytest.raises with a string

    Or, because the tests are verifying proxy behavior, move the import inside the individual test functions so it is not a module-level dependency:

    def test_chunking_short_circuits_on_block(self, guardrail):
        from fastapi import HTTPException
        ...

    Rule Used: What: Do not allow fastapi imports on files outsid... (source)

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Last reviewed commit: 6a6ca60

Contributor

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 20 comments

@greptile-apps
Contributor

greptile-apps bot commented Mar 2, 2026

Additional Comments (1)

litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py
Redundant event_type computation inside single-request helper

event_type is already computed by the caller make_bedrock_api_request (line 637–641) but is not passed to this method. Instead, this method recomputes it from source. Since both compute the same value this is not a bug, but it would be cleaner to pass event_type as a parameter instead of recomputing it, especially since source is only used for this derivation here.

@giulio-leone
Author

@greptile-apps Thanks for the summary — accurate description of the chunking implementation.

@giulio-leone
Author

@greptile-apps Good catch on the event_type redundancy. The event_type is computed in both places because _make_single_bedrock_api_request needs it for its own logging path. Passing it as a parameter would be cleaner — will refactor in a follow-up.

@greptile-apps
Contributor

greptile-apps bot commented Mar 2, 2026

You're welcome! Let me know if you need any clarification on the implementation or have questions about specific parts of the chunking logic.

@greptile-apps
Contributor

greptile-apps bot commented Mar 2, 2026

Actually, since _make_single_bedrock_api_request already receives source as a parameter, you could eliminate event_type entirely by computing it locally:

async def _make_single_bedrock_api_request(
    self,
    bedrock_request_data: dict,
    credentials: Any,
    aws_region_name: str,
    api_key: Optional[str],
    source: Literal["INPUT", "OUTPUT"],
    request_data: Optional[dict],
    start_time: Any,
) -> Tuple[BedrockGuardrailResponse, dict]:
    from datetime import datetime
    
    event_type = (
        GuardrailEventHooks.pre_call
        if source == "INPUT"
        else GuardrailEventHooks.post_call
    )

This removes the parameter completely and moves the computation to where it's used. Since it's a one-liner derived from source, there's no real benefit to passing it separately.

@vercel

vercel bot commented Mar 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Mar 6, 2026 6:56pm

@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (1)

litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py, line 834
Dead method: _get_bedrock_guardrail_response_status is unused

This method is no longer called anywhere in the codebase — all call sites now use the new _determine_guardrail_status_from_json method instead. Consider removing it to avoid confusion about which status-determination method is authoritative.

@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (1)

litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py, line 555
Debug logging lost request URL and headers

The original code logged the prepared request URL and headers ("Bedrock AI request body: %s, url %s, headers: %s"), which is valuable for diagnosing authentication and connectivity issues in production. After the refactoring into _make_single_bedrock_api_request, this detail is no longer logged anywhere — the caller at line 653 only logs the body.

Consider adding a verbose_proxy_logger.debug(...) call after _prepare_request returns that includes prepared_request.url and prepared_request.headers.
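
A minimal sketch of what such a debug log could look like, combined with the header stripping that a later commit message in this PR describes (a dict comprehension that excludes sensitive headers). The names redact_headers and log_prepared_request and the logger name are hypothetical, and redacting x-amz-security-token in addition to authorization is an assumption, not something the PR states.

```python
import logging
from typing import Any, Dict, Mapping

# Hypothetical logger; the PR itself uses litellm's verbose_proxy_logger.
logger = logging.getLogger("bedrock_guardrail_sketch")

# Header names (lowercase) that should never reach the logs.
# "authorization" comes from the PR's commit message; the session token is an assumption.
SENSITIVE_HEADERS = {"authorization", "x-amz-security-token"}


def redact_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of headers without credential-bearing entries."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in SENSITIVE_HEADERS
    }


def log_prepared_request(url: str, headers: Mapping[str, str], body: Any) -> None:
    """Log body, URL, and non-sensitive headers at debug level."""
    logger.debug(
        "Bedrock AI request body: %s, url %s, headers: %s",
        body,
        url,
        redact_headers(headers),
    )
```

The comparison is case-insensitive because HTTP header names are, and the original mapping is left untouched so the signed request can still be sent.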

@giulio-leone
Author

All review feedback addressed in latest commit. PR is ready for maintainer review.

Status: ✅ All review threads resolved | ⚠️ Vercel/CodeQL failures are repo-wide CI config issues, not PR-specific

Would appreciate a review when you get a chance — happy to rebase if needed.

@giulio-leone
Author

recheck

@giulio-leone
Author

All review feedback addressed in latest push. Ready for re-review.

@giulio-leone
Author

All review feedback addressed — remaining unresolved threads are automated bot suggestions that have been acknowledged with replies. No merge conflicts. Ready for maintainer review 🙏


giulio-leone and others added 3 commits March 5, 2026 21:27
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Strip authorization header from debug log output to avoid leaking
credentials. Uses dict comprehension to exclude sensitive headers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Include body alongside URL and headers in the _make_single_bedrock_api_request
debug log, restoring parity with the original logging that included all three.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add max_chars <= 0 guard in _chunk_content_items to prevent infinite loop
- Propagate AWS exception markers through _merge_guardrail_responses so
  _determine_guardrail_status_from_json detects failures in chunked path
- Remove dead _get_bedrock_guardrail_response_status method (replaced by
  _determine_guardrail_status_from_json)
- Add tests for infinite loop guard and exception marker propagation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The refactoring that extracted _make_single_bedrock_api_request left
the top-level debug log with only the request body. Restored the
guardrail identifier, version, and region to the log for debugging
chunked requests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The refactored make_bedrock_api_request lost the request URL and headers
from the top-level debug log. Compute the endpoint URL before logging
and include it along with base headers for full observability parity
with the original code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@giulio-leone
Author

Closing to reduce PR volume. The fix remains valid — happy to resubmit individually if the team finds it useful.

Development

Successfully merging this pull request may close these issues.

[Bug]: Receive 429 Too many request when sending over 25 000 characters request to Bedrock Guardrails

3 participants