feat(guardrails): chunk large Bedrock guardrail requests to avoid 429#22561
feat(guardrails): chunk large Bedrock guardrail requests to avoid 429#22561giulio-leone wants to merge 9 commits intoBerriAI:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds automatic chunking for oversized AWS Bedrock ApplyGuardrail requests (25k char limit) and merges chunk-level responses to avoid 429 errors on long content.
Changes:
- Adds content chunking (
_chunk_content_items) and response merging (_merge_guardrail_responses) helpers. - Extracts single-call HTTP logic into
_make_single_bedrock_api_request()and updatesmake_bedrock_api_request()to orchestrate chunking/merging. - Adds a new unit test suite covering chunking, merging, and orchestration behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py | Implements chunking + multi-call orchestration and merges responses for large payloads. |
| tests/litellm/proxy/guardrails/test_bedrock_guardrail_chunking.py | Adds unit/integration-style tests validating chunking, merging, and short-circuiting. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tests/litellm/proxy/guardrails/test_bedrock_guardrail_chunking.py
Outdated
Show resolved
Hide resolved
Greptile SummaryThis PR adds automatic chunking for AWS Bedrock
Confidence Score: 4/5
|
| Filename | Overview |
|---|---|
| litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py | Adds chunking support for large Bedrock guardrail requests. Logic is sound: total_chars is computed correctly, chunks are bounded, short-circuit on block works, and usage/assessments are properly merged. One issue: _check_bedrock_response_for_exception is now dead code — its only caller (_get_bedrock_guardrail_response_status) was removed and replaced by _determine_guardrail_status_from_json which inlines the same check. |
| tests/litellm/proxy/guardrails/test_bedrock_guardrail_chunking.py | Adds 16 unit + integration tests covering chunking, merging, and the full request flow. Tests are mock-only (no real network calls). One minor policy violation: top-level from fastapi import HTTPException import in a file outside the proxy/ folder. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[make_bedrock_api_request] --> B[Load credentials\nBuild bedrock_request_data]
B --> C{total_chars >\n25,000?}
C -- No --> D[_make_single_bedrock_api_request\nsingle call]
D --> E[Redact PII\nLog response]
E --> F{Blocked?}
F -- No --> G[add_standard_logging\nReturn response]
F -- Yes --> H[Raise HTTPException 400]
C -- Yes --> I[_chunk_content_items\nSplit into ≤25k chunks]
I --> J[for chunk in chunks]
J --> K[_make_single_bedrock_api_request\nper-chunk call]
K --> L{HTTP 200?}
L -- No --> M[Log error\nRaise HTTPException]
L -- Yes --> N[Append to responses]
N --> O{Chunk blocked?}
O -- Yes --> P[break — short circuit]
O -- No --> J
P --> Q[_merge_guardrail_responses\nworst action wins\nsum usage\nconcat assessments]
J -- all done --> Q
Q --> R[Redact PII\nLog merged response]
R --> S[add_standard_logging]
S --> T{Merged blocked?}
T -- Yes --> H
T -- No --> U[Return merged response]
Comments Outside Diff (2)
-
litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py, line 794-825 (link)Dead code:
_check_bedrock_response_for_exceptionis never calledAfter this refactoring,
_check_bedrock_response_for_exceptionis defined but never invoked from anywhere in the file. The old_get_bedrock_guardrail_response_statuswas its only caller, and that method was removed and replaced by_determine_guardrail_status_from_json, which inlines the exception-checking logic directly.Since this method is now dead code, it should either be removed or explicitly called from
_determine_guardrail_status_from_jsonto avoid duplication of the same check logic and to prevent confusion for future maintainers. -
tests/litellm/proxy/guardrails/test_bedrock_guardrail_chunking.py, line 5 (link)FastAPI import outside the
proxy/folderfastapiis a proxy dependency, not an SDK one, and imports of it should be confined to files underlitellm/proxy/. This test file lives undertests/, notproxy/, so this import technically violates that policy.Consider catching
HTTPExceptionvia the stdlib or a re-exported alias from litellm itself, or alternatively assert on the exception's type by fully-qualifying it after import rather than importing it at module level:# Instead of: from fastapi import HTTPException import fastapi # or reference via pytest.raises with a string
Or, because the tests are verifying proxy behavior, move the import inside the individual test functions so it is not a module-level dependency:
def test_chunking_short_circuits_on_block(self, guardrail): from fastapi import HTTPException ...
Rule Used: What: Do not allow fastapi imports on files outsid... (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Last reviewed commit: 6a6ca60
Additional Comments (1)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! |
|
@greptile-apps Thanks for the summary — accurate description of the chunking implementation. |
|
@greptile-apps Good catch on the event_type redundancy. The event_type is computed in both places because |
|
You're welcome! Let me know if you need any clarification on the implementation or have questions about specific parts of the chunking logic. |
|
Actually, since This removes the parameter completely and moves the computation to where it's used. Since it's a one-liner derived from |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Additional Comments (1)
This method is no longer called anywhere in the codebase — all call sites now use the new |
Additional Comments (1)
The original code logged the prepared request URL and headers ( Consider adding a |
|
All review feedback addressed in latest commit. PR is ready for maintainer review. Status: ✅ All review threads resolved | Would appreciate a review when you get a chance — happy to rebase if needed. |
|
recheck |
|
All review feedback addressed in latest push. Ready for re-review. |
|
All review feedback addressed — remaining unresolved threads are automated bot suggestions that have been acknowledged with replies. No merge conflicts. Ready for maintainer review 🙏 |
|
recheck |
1 similar comment
|
recheck |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Strip authorization header from debug log output to avoid leaking credentials. Uses dict comprehension to exclude sensitive headers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Include body alongside URL and headers in the _make_single_bedrock_api_request debug log, restoring parity with the original logging that included all three. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add max_chars <= 0 guard in _chunk_content_items to prevent infinite loop - Propagate AWS exception markers through _merge_guardrail_responses so _determine_guardrail_status_from_json detects failures in chunked path - Remove dead _get_bedrock_guardrail_response_status method (replaced by _determine_guardrail_status_from_json) - Add tests for infinite loop guard and exception marker propagation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The refactoring that extracted _make_single_bedrock_api_request left the top-level debug log with only the request body. Restored the guardrail identifier, version, and region to the log for debugging chunked requests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The refactored make_bedrock_api_request lost the request URL and headers from the top-level debug log. Compute the endpoint URL before logging and include it along with base headers for full observability parity with the original code. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Closing to reduce PR volume. The fix remains valid — happy to resubmit individually if the team finds it useful. |
Problem
AWS Bedrock
ApplyGuardrailAPI has a 25,000 character limit per assessment call. When content exceeds this limit, AWS returns a429 TooManyRequestsException. This breaks guardrail checks for long conversations or large documents.Per AWS documentation, large payloads should be batched into smaller chunks.
Solution
Automatically splits content into chunks of ≤ 25,000 characters when the total text exceeds the limit, processes each chunk independently, and merges results.
Key design decisions:
GUARDRAIL_INTERVENED, the merged result reflects thattopicPolicyUnits,contentPolicyUnits, etc. are aggregated across chunksArchitecture
Extracted the HTTP call logic into
_make_single_bedrock_api_request()and added:_chunk_content_items(): SplitsBedrockContentItemlists into size-bounded chunks_merge_guardrail_responses(): Merges multipleBedrockGuardrailResponseobjectsmake_bedrock_api_request()now orchestrates chunking when content exceeds the limit.Changes
litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py: Chunking, merging, single-request extractiontests/litellm/proxy/guardrails/test_bedrock_guardrail_chunking.py: 16 unit testsTests
All 16 tests pass:
Fixes #19501