
[feat] Add prompt management support for responses api #23999

Merged
Sameerlite merged 5 commits into BerriAI:litellm_dev_sameer_16_march_week from Sameerlite:litellm_feat_prompt_responses
Mar 20, 2026

Conversation

Contributor

@Sameerlite Sameerlite commented Mar 18, 2026

Relevant issues

Fixes LIT-2135

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

- Fix async path: call async_get_chat_completion_prompt in aresponses()
  before executor dispatch, mirroring acompletion() in main.py. Discard
  merged_optional_params in async path (sync responses() handles them
  via local_vars), avoiding TypeError from duplicate kwargs in partial().
- Fix provider re-resolution: replace "/" in model heuristic with
  model != original_model comparison so bare model names are handled.
- Add 3 async tests covering hook invocation, optional param
  propagation, and non-message item filtering in aresponses().

Made-with: Cursor
…l params

- aresponses() now pops prompt_id from kwargs after the async hook runs
  and passes merged_optional_params via _async_prompt_merged_params.
  responses() checks for this internal kwarg first and skips the sync
  hook entirely when present — eliminating double-merge of template
  messages.
- merged_optional_params from async_get_chat_completion_prompt is no
  longer discarded (_); it flows through to local_vars in responses().
- Async tests now assert get_chat_completion_prompt.assert_not_called()
  to directly detect any double-execution regression.

Made-with: Cursor
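The pop-then-sentinel flow described in these commits can be sketched in isolation. This is a simplified toy, not litellm's actual code: run_async_hook / run_sync_hook are hypothetical stand-ins for the prompt-management hooks, and local_vars is modeled as a plain dict.

```python
import asyncio

SENTINEL = "_async_prompt_merged_params"

async def run_async_hook(model, input, prompt_id):
    # Stub: a real hook would resolve prompt_id to a template and merge it.
    return "merged-model", ["merged-input"], {"temperature": 0.1}

def run_sync_hook(model, input, prompt_id):
    return "merged-model", ["merged-input"], {"temperature": 0.1}

async def aresponses_sketch(model, input, **kwargs):
    if "prompt_id" in kwargs:
        # Run the async hook exactly once, then pop prompt_id so the sync
        # path cannot re-trigger its own hook (the double-merge fix).
        model, input, merged = await run_async_hook(
            model, input, kwargs.pop("prompt_id")
        )
        kwargs[SENTINEL] = merged  # forward merged optional params downstream
    return responses_sketch(model=model, input=input, **kwargs)

def responses_sketch(model, input, **kwargs):
    local_vars = {"model": model, "input": input}
    merged = kwargs.pop(SENTINEL, None)
    if merged is not None:
        # Fast path: async hook already ran; apply its results, skip sync hook.
        local_vars.update(merged)
    elif "prompt_id" in kwargs:
        # Pure-sync path: run the hook here.
        model, input, merged = run_sync_hook(model, input, kwargs.pop("prompt_id"))
        local_vars.update(merged)
        local_vars["model"], local_vars["input"] = model, input
    return local_vars

async_result = asyncio.run(aresponses_sketch("gpt-4o", ["hi"], prompt_id="p1"))
sync_result = responses_sketch("gpt-4o", ["hi"], prompt_id="p1")
```

Both entry points converge on the same merged state, and the async path reaches it without ever invoking the sync hook.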

vercel bot commented Mar 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm; Status: Ready; Actions: Preview, Comment; Updated (UTC): Mar 18, 2026 11:41am


Contributor

codspeed-hq bot commented Mar 18, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing Sameerlite:litellm_feat_prompt_responses (021540b) with main (cec3e9e)

Open in CodSpeed

Contributor

greptile-apps bot commented Mar 18, 2026

Greptile Summary

This PR adds prompt management support (prompt_id / prompt_variables) to the Responses API (/v1/responses), mirroring the existing chat-completions flow. It introduces a synchronized async/sync execution strategy: aresponses() runs the async hook, pops prompt_id from kwargs, and forwards merged results via an internal _async_prompt_merged_params sentinel kwarg so that the sync responses() path can skip its own hook — preventing the double-merge regression identified in the prior review.

Key changes:

  • litellm/responses/main.py: New ASYNC PROMPT MANAGEMENT block in aresponses() runs async_get_chat_completion_prompt, pops prompt_id, and passes merged_optional_params downstream via _async_prompt_merged_params. New PROMPT MANAGEMENT block in responses() checks for this sentinel and either applies async-merged optional params directly (skipping the sync hook) or runs get_chat_completion_prompt itself for the pure-sync path.
  • tests/…/test_responses_prompt_management.py: 9 mock-only unit tests, including the new get_chat_completion_prompt.assert_not_called() assertions in async tests that catch any double-merge regression.
  • Documentation: new prompt_management.md page and updates to the proxy docs.

Issues from the prior review — status:

  • ✅ Double prompt management execution: fixed via kwargs.pop("prompt_id") + _async_prompt_merged_params sentinel.
  • ✅ merged_optional_params discarded in async hook: fixed; now forwarded via _async_prompt_merged_params.
  • ✅ Tests hiding double-merge: fixed; assert_not_called() assertions added to all async tests.

Remaining concerns:

  • prompt_variables is not popped from kwargs alongside prompt_id after the async hook fires, creating an asymmetry that could confuse future maintainers (see inline comment).
  • local_vars["model"] and local_vars["input"] are not explicitly refreshed in the _async_merged fast-path of responses(), unlike the sync path — benign now but inconsistent (see inline comment).
  • The _async_prompt_merged_params sentinel lives in the public **kwargs namespace; an accidental external use would silently suppress the sync prompt-management hook (see inline comment).
  • No async test for model-override + provider re-resolution (async counterpart to sync test [G]).
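The sentinel-namespace concern above can be illustrated with a toy stand-in (responses_sketch is hypothetical, not litellm's responses()): because the sentinel key lives in the public **kwargs namespace, a caller who accidentally passes it suppresses the sync prompt-management hook with no error.

```python
# Toy illustration of the namespace-collision hazard: any truthy-or-empty
# dict under the internal key silently diverts control to the fast path.
def responses_sketch(**kwargs):
    merged = kwargs.pop("_async_prompt_merged_params", None)
    if merged is not None:
        return "sync hook skipped"
    return "sync hook ran"

normal = responses_sketch(model="gpt-4o")
collided = responses_sketch(model="gpt-4o", _async_prompt_merged_params={})
```

Note that even an empty dict (not None) triggers the skip, so there is no runtime signal that the hook was bypassed.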

Confidence Score: 4/5

  • The three P1 issues from the prior review are correctly fixed; the PR is safe to merge with minor code-quality improvements recommended.
  • The double-merge regression is eliminated (prompt_id popped + _async_prompt_merged_params sentinel prevents the sync hook from firing after the async hook), and merged_optional_params are no longer discarded. The sync path is well-tested and the async tests now include assert_not_called() guards against double-merge. Score is 4 rather than 5 because: (1) prompt_variables is not popped from kwargs after the async hook, creating an asymmetry; (2) local_vars["model"] is not explicitly refreshed in the _async_merged fast-path unlike the sync path; (3) the _async_prompt_merged_params sentinel is in the public **kwargs namespace and could be exploited accidentally; (4) there is no async test for model-override + provider re-resolution.
  • litellm/responses/main.py (async prompt management block lines 467–512 and _async_merged fast-path lines 680–683)

Important Files Changed

Filename Overview

  • litellm/responses/main.py: Adds prompt management blocks to both responses() and aresponses(); the double-merge and optional-param discard bugs from the prior review are resolved via the _async_prompt_merged_params sentinel kwarg. One minor inconsistency remains: local_vars["model"] is not explicitly refreshed in the _async_merged fast path (unlike the sync path), though it is benign in practice.
  • tests/test_litellm/responses/test_responses_prompt_management.py: 9 new mock-only unit tests; async tests now correctly assert get_chat_completion_prompt.assert_not_called() to catch double-merge regressions. test_optional_params_from_template_applied checks temperature propagation and relies on the internal ResponsesAPIOptionalRequestParams TypedDict being passable to .get(), which works because TypedDict instances are plain dicts at runtime.
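The reviewer's aside about .get() holds because TypedDict adds static typing only; at runtime the value is an ordinary dict. A minimal check (OptionalRequestParams is a hypothetical stand-in for the internal ResponsesAPIOptionalRequestParams type):

```python
from typing import TypedDict

# Hypothetical mirror of a TypedDict like ResponsesAPIOptionalRequestParams;
# the real litellm class has more fields and may differ.
class OptionalRequestParams(TypedDict, total=False):
    temperature: float
    top_p: float

params: OptionalRequestParams = {"temperature": 0.7}

# TypedDict influences static checkers only; the runtime object is a plain
# dict, so dict methods like .get() behave normally.
temp = params.get("temperature")
missing = params.get("top_p")
```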

Sequence Diagram

sequenceDiagram
    participant Caller
    participant aresponses
    participant LiteLLMLoggingObj
    participant responses
    participant Handler

    Caller->>aresponses: aresponses(input, model, prompt_id, **kwargs)
    aresponses->>aresponses: get_llm_provider(model)
    aresponses->>LiteLLMLoggingObj: should_run_prompt_management_hooks(prompt_id)
    alt prompt management needed
        LiteLLMLoggingObj-->>aresponses: True
        aresponses->>LiteLLMLoggingObj: async_get_chat_completion_prompt(model, messages, prompt_id)
        LiteLLMLoggingObj-->>aresponses: (merged_model, merged_input, merged_optional_params)
        aresponses->>aresponses: kwargs.pop("prompt_id")
        aresponses->>aresponses: kwargs["_async_prompt_merged_params"] = merged_optional_params
        aresponses->>aresponses: input = merged_input, model = merged_model
    else no prompt management
        LiteLLMLoggingObj-->>aresponses: False
    end
    aresponses->>responses: partial(responses, input=input, model=model, **kwargs)
    responses->>responses: local_vars = locals()
    responses->>responses: get_llm_provider(model, custom_llm_provider)
    responses->>responses: _async_merged = kwargs.pop("_async_prompt_merged_params", None)
    alt _async_merged is not None
        responses->>responses: apply _async_merged to local_vars (skip sync hook)
    else no async merged params
        responses->>LiteLLMLoggingObj: should_run_prompt_management_hooks(prompt_id)
        alt prompt management needed
            LiteLLMLoggingObj-->>responses: True
            responses->>LiteLLMLoggingObj: get_chat_completion_prompt(model, messages, prompt_id)
            LiteLLMLoggingObj-->>responses: (merged_model, merged_input, merged_optional_params)
            responses->>responses: update local_vars with merged results
        end
    end
    responses->>responses: get_requested_response_api_optional_param(local_vars)
    responses->>Handler: response_api_handler(model, input, responses_api_request)
    Handler-->>responses: ResponsesAPIResponse
    responses-->>aresponses: ResponsesAPIResponse
    aresponses-->>Caller: ResponsesAPIResponse

Last reviewed commit: "fix: prevent double ..."

Comment on lines +680 to +683

    _async_merged = kwargs.pop("_async_prompt_merged_params", None)
    if _async_merged is not None:
        for k, v in _async_merged.items():
            local_vars[k] = v
Contributor


P2 local_vars["model"] not refreshed in async fast-path (inconsistency with sync path)

In the sync prompt-management block (lines 718–719) the code explicitly writes:

    local_vars["model"] = model
    local_vars["input"] = input

In this _async_merged fast-path only the optional params from the template are applied to local_vars; model and input are not re-written. This is not a functional bug today — model is passed in as a function argument (captured correctly by locals()) and local_vars["input"] is overwritten at line 746 — but the inconsistency makes the code harder to reason about and could silently break if anything upstream reads local_vars["model"] before line 804.

Suggested change

Before:

    _async_merged = kwargs.pop("_async_prompt_merged_params", None)
    if _async_merged is not None:
        for k, v in _async_merged.items():
            local_vars[k] = v

After:

    _async_merged = kwargs.pop("_async_prompt_merged_params", None)
    if _async_merged is not None:
        # Keep model / input in sync with local_vars just as the sync path does
        local_vars["model"] = model
        local_vars["input"] = input
        for k, v in _async_merged.items():
            local_vars[k] = v
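The reviewer's point rests on how locals() behaves: the dict it returns is a snapshot of the frame's locals at the call site, so rebinding a local afterwards does not update it. A toy example (not litellm code) showing why the sync path rewrites local_vars["model"] explicitly:

```python
def demo(model="gpt-4o"):
    local_vars = locals()   # snapshot of the frame's locals at this point
    model = "merged-model"  # rebinding the local does not touch the snapshot
    return local_vars["model"], model

snapshot_model, current_model = demo()
```

This holds both before and after PEP 667 (Python 3.13), which made the snapshot semantics of locals() in function scope explicit.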

@Sameerlite Sameerlite changed the title from "Litellm feat prompt responses" to "[feat] Add prompt management support for responses api" Mar 18, 2026
@Sameerlite Sameerlite changed the base branch from main to litellm_dev_sameer_16_march_week March 20, 2026 10:56
@Sameerlite Sameerlite merged commit aafe9da into BerriAI:litellm_dev_sameer_16_march_week Mar 20, 2026
28 of 72 checks passed
