[Bugfix] Fix Responses API instructions leaking through previous_response_id #37727
he-yufeng wants to merge 1 commit into vllm-project:main from
Conversation
…onse_id Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Code Review
This pull request addresses a bug where instructions from a previous response would leak into a new response when using previous_response_id. The change in vllm/entrypoints/openai/responses/utils.py correctly filters out system messages from the previous message history, aligning with the specification that instructions should not be carried over. New unit tests have been added in tests/entrypoints/openai/responses/test_responses_utils.py to validate the fix across various scenarios. The changes appear correct and are appropriately tested.
    # Add the previous messages.
    messages.extend(prev_msg)
    # Filter out system messages from previous conversation -- per the
    # OpenAI spec, instructions should NOT carry over across responses.
Looks good. Is there any related OpenAI spec documentation for this? Could you share the link?
Sure! From the OpenAI API Reference (Create Response), the `instructions` parameter description states:

When used along with `previous_response_id`, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.
The Text Generation guide also reinforces this — the instructions parameter only applies to the current response, and instructions from previous turns will not be present in the context when using previous_response_id.
Fixes #37697
What's the problem
When using `/v1/responses` with `previous_response_id`, the `instructions` from the prior response carry over into the new response. Per the OpenAI spec, instructions should NOT carry over.

Root cause
`construct_input_messages()` in `responses/utils.py` prepends `request_instructions` as a system message, then the full messages list (including that system message) gets stored in `msg_store`. When the next request references `previous_response_id`, those stored messages, old system message included, are retrieved and extended into the new conversation. The new request also adds its own instructions, so you end up with both the old and the new system messages.

Fix
Filter out system messages when pulling `prev_msg` from the store in `construct_input_messages()`. One-line change: `messages.extend(prev_msg)` becomes `messages.extend(m for m in prev_msg if m.get("role") != "system")`.

This ensures each request only uses its own `instructions`, regardless of what the previous response had. Works correctly for all cases: new instructions provided, no instructions provided, or no previous response at all.

Test plan
New unit tests in `tests/entrypoints/openai/responses/test_responses_utils.py` covering:
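One of those tests might look roughly like this (hypothetical names and a simplified stand-in for the helper; the real suite exercises the actual `construct_input_messages()` from `vllm/entrypoints/openai/responses/utils.py`):

```python
# Hypothetical sketch of a test for the no-carry-over behavior.
# filter_previous_messages is an illustrative stand-in, not the real helper.

def filter_previous_messages(prev_msg: list[dict]) -> list[dict]:
    """Drop system messages from a stored previous conversation."""
    return [m for m in prev_msg if m.get("role") != "system"]

def test_system_messages_do_not_carry_over():
    prev = [
        {"role": "system", "content": "old instructions"},
        {"role": "user", "content": "hello"},
        {"role": "assistant", "content": "hi there"},
    ]
    filtered = filter_previous_messages(prev)
    # The old system message must be gone; user/assistant turns survive.
    assert {"role": "system", "content": "old instructions"} not in filtered
    assert len(filtered) == 2
```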