
[feat] Add prompt management support for responses api #23999

Merged
Sameerlite merged 5 commits into BerriAI:litellm_dev_sameer_16_march_week from Sameerlite:litellm_feat_prompt_responses
Mar 20, 2026

Conversation

Contributor

@Sameerlite Sameerlite commented Mar 18, 2026

Relevant issues

Fixes LIT-2135

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

- Fix async path: call async_get_chat_completion_prompt in aresponses()
  before executor dispatch, mirroring acompletion() in main.py. Discard
  merged_optional_params in async path (sync responses() handles them
  via local_vars), avoiding TypeError from duplicate kwargs in partial().
- Fix provider re-resolution: replace "/" in model heuristic with
  model != original_model comparison so bare model names are handled.
- Add 3 async tests covering hook invocation, optional param
  propagation, and non-message item filtering in aresponses().

Made-with: Cursor
…l params

- aresponses() now pops prompt_id from kwargs after the async hook runs
  and passes merged_optional_params via _async_prompt_merged_params.
  responses() checks for this internal kwarg first and skips the sync
  hook entirely when present — eliminating double-merge of template
  messages.
- merged_optional_params from async_get_chat_completion_prompt is no
  longer discarded (_); it flows through to local_vars in responses().
- Async tests now assert get_chat_completion_prompt.assert_not_called()
  to directly detect any double-execution regression.

Made-with: Cursor
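The pop-then-sentinel flow described in these commits can be sketched in isolation. This is a simplified toy, not litellm's actual code: run_async_hook / run_sync_hook are hypothetical stand-ins for the prompt-management hooks, and local_vars is modeled as a plain dict.

```python
import asyncio

SENTINEL = "_async_prompt_merged_params"

async def run_async_hook(model, input, prompt_id):
    # Stub: a real hook would resolve prompt_id to a template and merge it.
    return "merged-model", ["merged-input"], {"temperature": 0.1}

def run_sync_hook(model, input, prompt_id):
    return "merged-model", ["merged-input"], {"temperature": 0.1}

async def aresponses_sketch(model, input, **kwargs):
    if "prompt_id" in kwargs:
        # Run the async hook exactly once, then pop prompt_id so the sync
        # path cannot re-trigger its own hook (the double-merge fix).
        model, input, merged = await run_async_hook(
            model, input, kwargs.pop("prompt_id")
        )
        kwargs[SENTINEL] = merged  # forward merged optional params downstream
    return responses_sketch(model=model, input=input, **kwargs)

def responses_sketch(model, input, **kwargs):
    local_vars = {"model": model, "input": input}
    merged = kwargs.pop(SENTINEL, None)
    if merged is not None:
        # Fast path: async hook already ran; apply its results, skip sync hook.
        local_vars.update(merged)
    elif "prompt_id" in kwargs:
        # Pure-sync path: run the hook here.
        model, input, merged = run_sync_hook(model, input, kwargs.pop("prompt_id"))
        local_vars.update(merged)
        local_vars["model"], local_vars["input"] = model, input
    return local_vars

async_result = asyncio.run(aresponses_sketch("gpt-4o", ["hi"], prompt_id="p1"))
sync_result = responses_sketch("gpt-4o", ["hi"], prompt_id="p1")
```

Both entry points converge on the same merged state, and the async path reaches it without ever invoking the sync hook.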

vercel bot commented Mar 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm; Status: Ready; Actions: Preview, Comment; Updated (UTC): Mar 18, 2026 11:41am


Contributor

codspeed-hq bot commented Mar 18, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing Sameerlite:litellm_feat_prompt_responses (021540b) with main (cec3e9e)

Open in CodSpeed

Contributor

greptile-apps bot commented Mar 18, 2026

Greptile Summary

This PR adds prompt management support (prompt_id / prompt_variables) to the Responses API (/v1/responses), mirroring the existing chat-completions flow. It introduces a synchronized async/sync execution strategy: aresponses() runs the async hook, pops prompt_id from kwargs, and forwards merged results via an internal _async_prompt_merged_params sentinel kwarg so that the sync responses() path can skip its own hook — preventing the double-merge regression identified in the prior review.

Key changes:

  • litellm/responses/main.py: New ASYNC PROMPT MANAGEMENT block in aresponses() runs async_get_chat_completion_prompt, pops prompt_id, and passes merged_optional_params downstream via _async_prompt_merged_params. New PROMPT MANAGEMENT block in responses() checks for this sentinel and either applies async-merged optional params directly (skipping the sync hook) or runs get_chat_completion_prompt itself for the pure-sync path.
  • tests/…/test_responses_prompt_management.py: 9 mock-only unit tests, including the new get_chat_completion_prompt.assert_not_called() assertions in async tests that catch any double-merge regression.
  • Documentation: new prompt_management.md page and updates to the proxy docs.

Issues from the prior review — status:

  • ✅ Double prompt management execution: fixed via kwargs.pop("prompt_id") + _async_prompt_merged_params sentinel.
  • ✅ merged_optional_params discarded in async hook: fixed; now forwarded via _async_prompt_merged_params.
  • ✅ Tests hiding double-merge: fixed; assert_not_called() assertions added to all async tests.

Remaining concerns:

  • prompt_variables is not popped from kwargs alongside prompt_id after the async hook fires, creating an asymmetry that could confuse future maintainers (see inline comment).
  • local_vars["model"] and local_vars["input"] are not explicitly refreshed in the _async_merged fast-path of responses(), unlike the sync path — benign now but inconsistent (see inline comment).
  • The _async_prompt_merged_params sentinel lives in the public **kwargs namespace; an accidental external use would silently suppress the sync prompt-management hook (see inline comment).
  • No async test for model-override + provider re-resolution (async counterpart to sync test [G]).
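The sentinel-namespace concern above can be illustrated with a toy stand-in (responses_sketch is hypothetical, not litellm's responses()): because the sentinel key lives in the public **kwargs namespace, a caller who accidentally passes it suppresses the sync prompt-management hook with no error.

```python
# Toy illustration of the namespace-collision hazard: any truthy-or-empty
# dict under the internal key silently diverts control to the fast path.
def responses_sketch(**kwargs):
    merged = kwargs.pop("_async_prompt_merged_params", None)
    if merged is not None:
        return "sync hook skipped"
    return "sync hook ran"

normal = responses_sketch(model="gpt-4o")
collided = responses_sketch(model="gpt-4o", _async_prompt_merged_params={})
```

Note that even an empty dict (not None) triggers the skip, so there is no runtime signal that the hook was bypassed.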

Confidence Score: 4/5

  • The three P1 issues from the prior review are correctly fixed; the PR is safe to merge with minor code-quality improvements recommended.
  • The double-merge regression is eliminated (prompt_id popped + _async_prompt_merged_params sentinel prevents the sync hook from firing after the async hook), and merged_optional_params are no longer discarded. The sync path is well-tested and the async tests now include assert_not_called() guards against double-merge. Score is 4 rather than 5 because: (1) prompt_variables is not popped from kwargs after the async hook, creating an asymmetry; (2) local_vars["model"] is not explicitly refreshed in the _async_merged fast-path unlike the sync path; (3) the _async_prompt_merged_params sentinel is in the public **kwargs namespace and could be exploited accidentally; (4) there is no async test for model-override + provider re-resolution.
  • litellm/responses/main.py (async prompt management block lines 467–512 and _async_merged fast-path lines 680–683)

Important Files Changed

Filename Overview

  • litellm/responses/main.py: Adds prompt management blocks to both responses() and aresponses(); the double-merge and optional-param discard bugs from the prior review are resolved via the _async_prompt_merged_params sentinel kwarg. One minor inconsistency remains: local_vars["model"] is not explicitly refreshed in the _async_merged fast path (unlike the sync path), though it is benign in practice.
  • tests/test_litellm/responses/test_responses_prompt_management.py: 9 new mock-only unit tests; async tests now correctly assert get_chat_completion_prompt.assert_not_called() to catch double-merge regressions. test_optional_params_from_template_applied checks temperature propagation and relies on the internal ResponsesAPIOptionalRequestParams TypedDict being passable to .get(), which works because TypedDict instances are plain dicts at runtime.
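The reviewer's aside about .get() holds because TypedDict adds static typing only; at runtime the value is an ordinary dict. A minimal check (OptionalRequestParams is a hypothetical stand-in for the internal ResponsesAPIOptionalRequestParams type):

```python
from typing import TypedDict

# Hypothetical mirror of a TypedDict like ResponsesAPIOptionalRequestParams;
# the real litellm class has more fields and may differ.
class OptionalRequestParams(TypedDict, total=False):
    temperature: float
    top_p: float

params: OptionalRequestParams = {"temperature": 0.7}

# TypedDict influences static checkers only; the runtime object is a plain
# dict, so dict methods like .get() behave normally.
temp = params.get("temperature")
missing = params.get("top_p")
```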

Sequence Diagram

sequenceDiagram
    participant Caller
    participant aresponses
    participant LiteLLMLoggingObj
    participant responses
    participant Handler

    Caller->>aresponses: aresponses(input, model, prompt_id, **kwargs)
    aresponses->>aresponses: get_llm_provider(model)
    aresponses->>LiteLLMLoggingObj: should_run_prompt_management_hooks(prompt_id)
    alt prompt management needed
        LiteLLMLoggingObj-->>aresponses: True
        aresponses->>LiteLLMLoggingObj: async_get_chat_completion_prompt(model, messages, prompt_id)
        LiteLLMLoggingObj-->>aresponses: (merged_model, merged_input, merged_optional_params)
        aresponses->>aresponses: kwargs.pop("prompt_id")
        aresponses->>aresponses: kwargs["_async_prompt_merged_params"] = merged_optional_params
        aresponses->>aresponses: input = merged_input, model = merged_model
    else no prompt management
        LiteLLMLoggingObj-->>aresponses: False
    end
    aresponses->>responses: partial(responses, input=input, model=model, **kwargs)
    responses->>responses: local_vars = locals()
    responses->>responses: get_llm_provider(model, custom_llm_provider)
    responses->>responses: _async_merged = kwargs.pop("_async_prompt_merged_params", None)
    alt _async_merged is not None
        responses->>responses: apply _async_merged to local_vars (skip sync hook)
    else no async merged params
        responses->>LiteLLMLoggingObj: should_run_prompt_management_hooks(prompt_id)
        alt prompt management needed
            LiteLLMLoggingObj-->>responses: True
            responses->>LiteLLMLoggingObj: get_chat_completion_prompt(model, messages, prompt_id)
            LiteLLMLoggingObj-->>responses: (merged_model, merged_input, merged_optional_params)
            responses->>responses: update local_vars with merged results
        end
    end
    responses->>responses: get_requested_response_api_optional_param(local_vars)
    responses->>Handler: response_api_handler(model, input, responses_api_request)
    Handler-->>responses: ResponsesAPIResponse
    responses-->>aresponses: ResponsesAPIResponse
    aresponses-->>Caller: ResponsesAPIResponse

Last reviewed commit: "fix: prevent double ..."

Comment on lines +680 to +683

    _async_merged = kwargs.pop("_async_prompt_merged_params", None)
    if _async_merged is not None:
        for k, v in _async_merged.items():
            local_vars[k] = v
Contributor


P2 local_vars["model"] not refreshed in async fast-path (inconsistency with sync path)

In the sync prompt-management block (lines 718–719) the code explicitly writes:

    local_vars["model"] = model
    local_vars["input"] = input

In this _async_merged fast-path only the optional params from the template are applied to local_vars; model and input are not re-written. This is not a functional bug today — model is passed in as a function argument (captured correctly by locals()) and local_vars["input"] is overwritten at line 746 — but the inconsistency makes the code harder to reason about and could silently break if anything upstream reads local_vars["model"] before line 804.

Suggested change

Before:

    _async_merged = kwargs.pop("_async_prompt_merged_params", None)
    if _async_merged is not None:
        for k, v in _async_merged.items():
            local_vars[k] = v

After:

    _async_merged = kwargs.pop("_async_prompt_merged_params", None)
    if _async_merged is not None:
        # Keep model / input in sync with local_vars just as the sync path does
        local_vars["model"] = model
        local_vars["input"] = input
        for k, v in _async_merged.items():
            local_vars[k] = v
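The reviewer's point rests on how locals() behaves: the dict it returns is a snapshot of the frame's locals at the call site, so rebinding a local afterwards does not update it. A toy example (not litellm code) showing why the sync path rewrites local_vars["model"] explicitly:

```python
def demo(model="gpt-4o"):
    local_vars = locals()   # snapshot of the frame's locals at this point
    model = "merged-model"  # rebinding the local does not touch the snapshot
    return local_vars["model"], model

snapshot_model, current_model = demo()
```

This holds both before and after PEP 667 (Python 3.13), which made the snapshot semantics of locals() in function scope explicit.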

@Sameerlite Sameerlite changed the title from "Litellm feat prompt responses" to "[feat] Add prompt management support for responses api" Mar 18, 2026
@Sameerlite Sameerlite changed the base branch from main to litellm_dev_sameer_16_march_week March 20, 2026 10:56
@Sameerlite Sameerlite merged commit aafe9da into BerriAI:litellm_dev_sameer_16_march_week Mar 20, 2026
28 of 72 checks passed
