feat(guardrails): chunk large Bedrock guardrail requests to avoid 429 by giulio-leone · Pull Request #22561 · BerriAI/litellm

giulio-leone · 2026-03-02T12:40:57Z

Problem

The AWS Bedrock ApplyGuardrail API has a 25,000-character limit per assessment call. When content exceeds this limit, AWS returns a 429 TooManyRequestsException, which breaks guardrail checks for long conversations and large documents.

Per AWS documentation, large payloads should be batched into smaller chunks.

Solution

When the total text exceeds the limit, the hook now splits the content into chunks of at most 25,000 characters, processes each chunk independently, and merges the results.

Key design decisions:

Worst action wins: If ANY chunk triggers GUARDRAIL_INTERVENED, the merged result is GUARDRAIL_INTERVENED
Short-circuit on BLOCKED: If a chunk is blocked, remaining chunks are skipped (fail-fast)
Usage stats summed: topicPolicyUnits, contentPolicyUnits, etc. are aggregated across chunks
Assessments concatenated: All assessment details from all chunks are preserved
Large text items split across chunks: A single text item > 25k chars is split at character boundaries across chunks
Zero behavior change for small requests: Requests ≤ 25k chars take the existing single-call path with no overhead

Architecture

Extracted the HTTP call logic into _make_single_bedrock_api_request() and added:

_chunk_content_items(): Splits BedrockContentItem lists into size-bounded chunks
_merge_guardrail_responses(): Merges multiple BedrockGuardrailResponse objects
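The merge rules (worst action wins, usage summed, assessments and outputs concatenated) can be sketched with plain dicts. This assumes the ApplyGuardrail response keys `action`, `usage`, `assessments`, and `outputs`, with `"NONE"` as the non-intervening action. Any other response fields are ignored here, and `merge_guardrail_responses` is a hypothetical stand-in for the PR's `_merge_guardrail_responses`, not its actual code.

```python
from typing import Any, Dict, List

Response = Dict[str, Any]

INTERVENED = "GUARDRAIL_INTERVENED"


def merge_guardrail_responses(responses: List[Response]) -> Response:
    """Combine per-chunk ApplyGuardrail-style responses into one."""
    if not responses:
        raise ValueError("need at least one response to merge")
    if len(responses) == 1:
        # Pass a lone response through unchanged.
        return responses[0]

    merged: Response = {"action": "NONE", "usage": {}, "assessments": [], "outputs": []}
    for resp in responses:
        # Worst action wins: one intervening chunk makes the whole result intervene.
        if resp.get("action") == INTERVENED:
            merged["action"] = INTERVENED
        # Sum numeric usage counters (topicPolicyUnits, contentPolicyUnits, ...).
        for key, value in (resp.get("usage") or {}).items():
            if isinstance(value, (int, float)):
                merged["usage"][key] = merged["usage"].get(key, 0) + value
        # Keep every chunk's assessments and outputs, in chunk order.
        merged["assessments"].extend(resp.get("assessments") or [])
        merged["outputs"].extend(resp.get("outputs") or [])
    return merged
```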

make_bedrock_api_request() now orchestrates chunking when content exceeds the limit.
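The per-chunk loop and its fail-fast exit can be sketched independently of the HTTP layer. Here `send` stands in for a per-chunk call such as `_make_single_bedrock_api_request`, and `is_blocked` is a caller-supplied predicate, since deciding "blocked" from a real Bedrock response belongs to the PR's status logic. The function name and both parameters are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Chunk = List[Dict[str, Any]]
Response = Dict[str, Any]


async def call_chunks_sequentially(
    chunks: List[Chunk],
    send: Callable[[Chunk], Awaitable[Response]],
    is_blocked: Callable[[Response], bool],
) -> List[Response]:
    """Send chunks one at a time and stop after the first blocked response."""
    responses: List[Response] = []
    for chunk in chunks:
        response = await send(chunk)
        responses.append(response)
        if is_blocked(response):
            # Fail fast: later chunks cannot undo a BLOCKED result.
            break
    return responses


if __name__ == "__main__":

    async def fake_send(chunk: Chunk) -> Response:
        # Stand-in for the real HTTP call; "blocks" any chunk containing "bad".
        text = "".join(item["text"]["text"] for item in chunk)
        return {"blocked": "bad" in text}

    demo_chunks = [
        [{"text": {"text": "fine"}}],
        [{"text": {"text": "bad"}}],
        [{"text": {"text": "never sent"}}],
    ]
    results = asyncio.run(
        call_chunks_sequentially(demo_chunks, fake_send, lambda r: r["blocked"])
    )
    print(len(results))  # 2: the third chunk is never sent
```

Processing chunks sequentially, as the flowchart's for-loop also shows, is what makes the early exit possible. A concurrent `asyncio.gather` would send every chunk before any result is known.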

Changes

litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py: Chunking, merging, single-request extraction
tests/litellm/proxy/guardrails/test_bedrock_guardrail_chunking.py: 16 unit tests

Tests

All 16 tests pass:

8 tests for chunk splitting logic (under limit, exact limit, multi-item split, large single item, empty, no-text items, mixed sizes)
5 tests for response merging (passthrough, worst action, usage sum, assessments/outputs merge)
3 integration tests for the full flow (no chunking path, chunking triggered, short-circuit on block)

Fixes #19501

Copilot

Pull request overview

Adds automatic chunking for oversized AWS Bedrock ApplyGuardrail requests (25k char limit) and merges chunk-level responses to avoid 429 errors on long content.

Changes:

Adds content chunking (_chunk_content_items) and response merging (_merge_guardrail_responses) helpers.
Extracts single-call HTTP logic into _make_single_bedrock_api_request() and updates make_bedrock_api_request() to orchestrate chunking/merging.
Adds a new unit test suite covering chunking, merging, and orchestration behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py | Implements chunking + multi-call orchestration and merges responses for large payloads. |
| tests/litellm/proxy/guardrails/test_bedrock_guardrail_chunking.py | Adds unit/integration-style tests validating chunking, merging, and short-circuiting. |


greptile-apps · 2026-03-02T12:47:06Z

Greptile Summary

This PR adds automatic chunking for AWS Bedrock ApplyGuardrail requests that exceed the 25,000-character text limit, preventing 429 TooManyRequestsException errors on large conversations. It extracts the HTTP call logic into _make_single_bedrock_api_request, adds _chunk_content_items for size-bounded splitting, and _merge_guardrail_responses for combining results using a "worst action wins" strategy.

The chunking and merging logic is correct: total_chars is computed accurately, chunk boundaries are respected, short-circuit on blocked content works as intended, and usage/assessment aggregation is properly implemented.
PII redaction is applied to both the chunked and single-request debug logging paths.
Exception-marker propagation (Output.__type containing "Exception") is correctly forwarded from chunk responses into the merged JSON so _determine_guardrail_status_from_json can detect them.
_check_bedrock_response_for_exception (line 794) is now dead code — its sole caller _get_bedrock_guardrail_response_status was replaced by the new _determine_guardrail_status_from_json which inlines the same check. The orphaned method should be removed.
The test file has a module-level `from fastapi import HTTPException` import, which technically falls outside the policy of restricting FastAPI imports to the proxy/ folder.
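The exception-marker propagation noted above can be sketched as follows. Without it, a chunk that failed would be hidden behind a merged response that looks clean. This assumes the marker sits at `Output.__type` as the summary describes. Both function names are hypothetical, and the `__type` string in the test is only an illustrative value.

```python
from typing import Any, Dict, List, Optional

Response = Dict[str, Any]


def exception_marker(response_json: Response) -> Optional[Dict[str, Any]]:
    """Return the Output block if it carries an AWS exception marker, else None."""
    output = response_json.get("Output")
    if isinstance(output, dict) and "Exception" in str(output.get("__type", "")):
        return output
    return None


def propagate_exception_marker(chunk_jsons: List[Response], merged: Response) -> Response:
    """Copy the first chunk-level exception marker into the merged JSON (in place)."""
    for chunk_json in chunk_jsons:
        marker = exception_marker(chunk_json)
        if marker is not None:
            merged["Output"] = marker
            break
    return merged
```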

Confidence Score: 4/5

Safe to merge after addressing the dead-code cleanup; no functional regressions identified.
The core chunking and merging logic is well-implemented and covered by 16 mock-only tests. The zero-overhead path for requests under the limit preserves existing behavior. Two minor issues exist — one dead method that should be removed, and a FastAPI import policy violation in the test file — but neither affects runtime correctness.
No files require special attention beyond the orphaned _check_bedrock_response_for_exception method in bedrock_guardrails.py.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py | Adds chunking support for large Bedrock guardrail requests. Logic is sound: total_chars is computed correctly, chunks are bounded, short-circuit on block works, and usage/assessments are properly merged. One issue: `_check_bedrock_response_for_exception` is now dead code. Its only caller (`_get_bedrock_guardrail_response_status`) was removed and replaced by `_determine_guardrail_status_from_json`, which inlines the same check. |
| tests/litellm/proxy/guardrails/test_bedrock_guardrail_chunking.py | Adds 16 unit + integration tests covering chunking, merging, and the full request flow. Tests are mock-only (no real network calls). One minor policy violation: top-level `from fastapi import HTTPException` import in a file outside the `proxy/` folder. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[make_bedrock_api_request] --> B[Load credentials\nBuild bedrock_request_data]
    B --> C{total_chars >\n25,000?}
    C -- No --> D[_make_single_bedrock_api_request\nsingle call]
    D --> E[Redact PII\nLog response]
    E --> F{Blocked?}
    F -- No --> G[add_standard_logging\nReturn response]
    F -- Yes --> H[Raise HTTPException 400]

    C -- Yes --> I[_chunk_content_items\nSplit into ≤25k chunks]
    I --> J[for chunk in chunks]
    J --> K[_make_single_bedrock_api_request\nper-chunk call]
    K --> L{HTTP 200?}
    L -- No --> M[Log error\nRaise HTTPException]
    L -- Yes --> N[Append to responses]
    N --> O{Chunk blocked?}
    O -- Yes --> P[break — short circuit]
    O -- No --> J
    P --> Q[_merge_guardrail_responses\nworst action wins\nsum usage\nconcat assessments]
    J -- all done --> Q
    Q --> R[Redact PII\nLog merged response]
    R --> S[add_standard_logging]
    S --> T{Merged blocked?}
    T -- Yes --> H
    T -- No --> U[Return merged response]
```

Comments Outside Diff (2)

litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py, lines 794-825

Dead code: _check_bedrock_response_for_exception is never called

After this refactoring, _check_bedrock_response_for_exception is defined but never invoked from anywhere in the file. The old _get_bedrock_guardrail_response_status was its only caller, and that method was removed and replaced by _determine_guardrail_status_from_json, which inlines the exception-checking logic directly.

Since this method is now dead code, it should either be removed or explicitly called from _determine_guardrail_status_from_json to avoid duplication of the same check logic and to prevent confusion for future maintainers.
tests/litellm/proxy/guardrails/test_bedrock_guardrail_chunking.py, line 5

FastAPI import outside the proxy/ folder

fastapi is a proxy dependency, not an SDK one, and imports of it should be confined to files under litellm/proxy/. This test file lives under tests/, not proxy/, so this import technically violates that policy.

Consider catching HTTPException via a re-exported alias from litellm itself, or reference the class through its module rather than importing the name directly at module level:
```
# Instead of: from fastapi import HTTPException
import fastapi  # then use pytest.raises(fastapi.HTTPException)
```
Or, because the tests are verifying proxy behavior, move the import inside the individual test functions so it is not a module-level dependency:
```
def test_chunking_short_circuits_on_block(self, guardrail):
    from fastapi import HTTPException
    ...
```
Rule Used: What: Do not allow fastapi imports on files outsid...


Last reviewed commit: 6a6ca60

greptile-apps

2 files reviewed, 20 comments



greptile-apps · 2026-03-02T12:47:16Z

Additional Comments (1)

litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py
Redundant event_type computation inside single-request helper

event_type is already computed by the caller make_bedrock_api_request (line 637–641) but is not passed to this method. Instead, this method recomputes it from source. Since both compute the same value this is not a bug, but it would be cleaner to pass event_type as a parameter instead of recomputing it, especially since source is only used for this derivation here.



giulio-leone · 2026-03-02T13:09:37Z

@greptile-apps Thanks for the summary — accurate description of the chunking implementation.

giulio-leone · 2026-03-02T13:09:38Z

@greptile-apps Good catch on the event_type redundancy. The event_type is computed in both places because _make_single_bedrock_api_request needs it for its own logging path. Passing it as a parameter would be cleaner — will refactor in a follow-up.

greptile-apps · 2026-03-02T13:10:02Z

You're welcome! Let me know if you need any clarification on the implementation or have questions about specific parts of the chunking logic.

greptile-apps · 2026-03-02T13:11:23Z

Actually, since _make_single_bedrock_api_request already receives source as a parameter, it does not need an event_type parameter at all; it can compute event_type locally:

```python
async def _make_single_bedrock_api_request(
    self,
    bedrock_request_data: dict,
    credentials: Any,
    aws_region_name: str,
    api_key: Optional[str],
    source: Literal["INPUT", "OUTPUT"],
    request_data: Optional[dict],
    start_time: Any,
) -> Tuple[BedrockGuardrailResponse, dict]:
    from datetime import datetime

    # Derive the hook type from source instead of receiving it as a parameter.
    event_type = (
        GuardrailEventHooks.pre_call
        if source == "INPUT"
        else GuardrailEventHooks.post_call
    )
    ...  # rest of the method body omitted
```

This removes the parameter completely and moves the computation to where it's used. Since it's a one-liner derived from source, there's no real benefit to passing it separately.


vercel · 2026-03-02T16:03:00Z

The latest updates on your projects.

| Project | Deployment | Updated (UTC) |
| --- | --- | --- |
| litellm | Ready | Mar 6, 2026 6:56pm |


greptile-apps · 2026-03-04T21:21:04Z

Additional Comments (1)

litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py, line 834
Dead method: _get_bedrock_guardrail_response_status is unused

This method is no longer called anywhere in the codebase — all call sites now use the new _determine_guardrail_status_from_json method instead. Consider removing it to avoid confusion about which status-determination method is authoritative.

greptile-apps · 2026-03-04T21:49:54Z

Additional Comments (1)

litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py, line 555
Debug logging lost request URL and headers

The original code logged the prepared request URL and headers ("Bedrock AI request body: %s, url %s, headers: %s"), which is valuable for diagnosing authentication and connectivity issues in production. After the refactoring into _make_single_bedrock_api_request, this detail is no longer logged anywhere — the caller at line 653 only logs the body.

Consider adding a verbose_proxy_logger.debug(...) call after _prepare_request returns that includes prepared_request.url and prepared_request.headers.
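A minimal sketch of such a debug line, assuming the URL, headers, and body are available as plain values (a real implementation would read them from the prepared request). It uses the stdlib `logging` module in place of litellm's `verbose_proxy_logger` and drops credential headers before logging, as a later commit in this PR does for the authorization header. Treating `x-amz-security-token` as sensitive is an added assumption, and the function names are hypothetical.

```python
import logging
from typing import Dict, Mapping

logger = logging.getLogger("bedrock_guardrail_sketch")

# Assumed set of credential-bearing headers: the commit history names only
# "authorization"; x-amz-security-token is included here as an assumption.
SENSITIVE_HEADERS = frozenset({"authorization", "x-amz-security-token"})


def redact_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Drop credential headers (case-insensitively) before logging."""
    return {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}


def log_prepared_request(url: str, headers: Mapping[str, str], body: bytes) -> None:
    # Same message shape as the original log line quoted in this comment.
    logger.debug(
        "Bedrock AI request body: %s, url %s, headers: %s",
        body,
        url,
        redact_headers(headers),
    )
```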

giulio-leone · 2026-03-05T15:03:49Z

All review feedback addressed in latest commit. PR is ready for maintainer review.

Status: ✅ All review threads resolved | ⚠️ Vercel/CodeQL failures are repo-wide CI config issues, not PR-specific

Would appreciate a review when you get a chance — happy to rebase if needed.

giulio-leone · 2026-03-05T16:03:35Z

recheck

giulio-leone · 2026-03-05T17:50:03Z

All review feedback addressed in latest push. Ready for re-review.

giulio-leone · 2026-03-05T19:18:09Z

All review feedback addressed — remaining unresolved threads are automated bot suggestions that have been acknowledged with replies. No merge conflicts. Ready for maintainer review 🙏

giulio-leone · 2026-03-05T19:33:11Z

recheck

giulio-leone · 2026-03-05T19:48:36Z

recheck

…B limit

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Strip authorization header from debug log output to avoid leaking credentials. Uses dict comprehension to exclude sensitive headers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Include body alongside URL and headers in the _make_single_bedrock_api_request debug log, restoring parity with the original logging that included all three.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add max_chars <= 0 guard in _chunk_content_items to prevent infinite loop
- Propagate AWS exception markers through _merge_guardrail_responses so _determine_guardrail_status_from_json detects failures in chunked path
- Remove dead _get_bedrock_guardrail_response_status method (replaced by _determine_guardrail_status_from_json)
- Add tests for infinite loop guard and exception marker propagation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The refactoring that extracted _make_single_bedrock_api_request left the top-level debug log with only the request body. Restored the guardrail identifier, version, and region to the log for debugging chunked requests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The refactored make_bedrock_api_request lost the request URL and headers from the top-level debug log. Compute the endpoint URL before logging and include it along with base headers for full observability parity with the original code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

giulio-leone · 2026-03-07T21:40:33Z

Closing to reduce PR volume. The fix remains valid — happy to resubmit individually if the team finds it useful.

Copilot AI reviewed Mar 2, 2026
