I'm experiencing an issue where AWS Bedrock Guardrails blocks all subsequent prompts in a conversation after blocking the first one, even though guardrail_redact_input should prevent this.
Setup:
Using RepositorySessionManager with Redis for session persistence (a custom implementation of the SessionRepository)
guardrail_redact_input defaults to True (not explicitly set)
What happens:
Turn 1: User prompt triggers Guardrails (PROMPT_ATTACK, low confidence)
Turn 2: New, unrelated, valid prompt is sent
Guardrails blocks this prompt as well, this time with a "competing-technologies" topic violation. This happens regardless of the prompt's contents: it could be anything, including prompts that have been tested successfully many times in the past.
Logs show the redacted prompt is still in conversation history:
"messages": [
{"content": [{"text": "[User input redacted.]"}], "role": "user"},
{"content": [{"text": "I'm sorry, I couldn't answer..."}], "role": "assistant"},
{"content": [{"text": "how many APIs do I have?"}], "role": "user"}
]
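To help narrow this down, here is a minimal sketch of a workaround I could try: dropping the redacted turn (and the blocked reply that follows it) from the message list before the next prompt is sent. This is not part of the Strands SDK. The helper names `is_redacted_user_message` and `drop_redacted_turns` are hypothetical, and the code assumes the placeholder is exactly the "[User input redacted.]" string shown in the logs and that messages have the shape shown above.

```python
from typing import Any

# Placeholder text copied from the logs above; assumed to be the exact
# string written into the history when the input is redacted.
REDACTED_PLACEHOLDER = "[User input redacted.]"


def is_redacted_user_message(message: dict[str, Any]) -> bool:
    """Return True if a user message contains only the redaction placeholder."""
    if message.get("role") != "user":
        return False
    texts = [block.get("text") for block in message.get("content", [])]
    return bool(texts) and all(text == REDACTED_PLACEHOLDER for text in texts)


def drop_redacted_turns(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: remove each redacted user message and the
    assistant reply that immediately follows it."""
    cleaned = []
    skip_next_assistant = False
    for message in messages:
        if is_redacted_user_message(message):
            # Drop the placeholder and remember to drop the blocked reply.
            skip_next_assistant = True
            continue
        if skip_next_assistant and message.get("role") == "assistant":
            skip_next_assistant = False
            continue
        skip_next_assistant = False
        cleaned.append(message)
    return cleaned


# Same shape as the logged history above.
history = [
    {"content": [{"text": "[User input redacted.]"}], "role": "user"},
    {"content": [{"text": "I'm sorry, I couldn't answer..."}], "role": "assistant"},
    {"content": [{"text": "how many APIs do I have?"}], "role": "user"},
]
cleaned = drop_redacted_turns(history)
```

If the second prompt passes once the history is cleaned this way, that would suggest the placeholder turn or the blocked assistant reply is what triggers the follow-up block, rather than the new prompt itself.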
Questions:
Should guardrail_redact_input=True completely remove blocked messages from history, or just replace the content with [User input redacted.]?
Does AWS Guardrails re-evaluate the entire conversation history including redacted placeholders?
Is the expected behavior to start a new conversation after a Guardrails block, or should subsequent prompts work in the same session?
The documentation states: "When a guardrail is triggered, the Strands Agents SDK automatically overwrites the user's input in the conversation history to prevent follow-up questions from being blocked" - but this doesn't seem to be happening in practice.
Versions:
strands-agents: 1.12.0
Python: 13.3
Any guidance would be appreciated!