
feat(api)!: BREAKING CHANGE: support passing extra_body through to providers #3777

Merged
ehhuang merged 1 commit into main from pr3777 on Oct 10, 2025

Conversation

@ehhuang (Contributor) commented Oct 10, 2025

What does this PR do?

Allows passing through extra_body parameters to inference providers.

With this change, the two vLLM-specific parameters were moved out of the completions API and into `extra_body`.
Before/After (screenshot omitted)

closes #2720

Test Plan

CI and added new test

❯ uv run pytest -s -v tests/integration/ --stack-config=server:starter --inference-mode=record -k 'not( builtin_tool or safety_with_image or code_interpreter or test_rag ) and test_openai_completion_guided_choice' --setup=vllm --suite=base --color=yes
Uninstalled 3 packages in 125ms
Installed 3 packages in 19ms
INFO     2025-10-10 14:29:54,317 tests.integration.conftest:118 tests: Applying setup 'vllm' for suite base
INFO     2025-10-10 14:29:54,331 tests.integration.conftest:47 tests: Test stack config type: server
         (stack_config=server:starter)
============================================================================================================== test session starts ==============================================================================================================
platform darwin -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /Users/erichuang/projects/llama-stack-1/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}}
rootdir: /Users/erichuang/projects/llama-stack-1
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 285 items / 284 deselected / 1 selected

tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
instantiating llama_stack_client
Starting llama stack server with config 'starter' on port 8321...
Waiting for server at http://localhost:8321... (0.0s elapsed)
Waiting for server at http://localhost:8321... (0.5s elapsed)
Waiting for server at http://localhost:8321... (5.1s elapsed)
Waiting for server at http://localhost:8321... (5.6s elapsed)
Waiting for server at http://localhost:8321... (10.1s elapsed)
Waiting for server at http://localhost:8321... (10.6s elapsed)
Server is ready at http://localhost:8321
llama_stack_client instantiated in 11.773s
PASSED
Terminating llama stack server process...
Terminating process 98444 and its group...
Server process and children terminated gracefully


============================================================================================================= slowest 10 durations ==============================================================================================================
11.88s setup    tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
3.02s call     tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
0.01s teardown tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
================================================================================================ 1 passed, 284 deselected, 3 warnings in 16.21s =================================================================================================

@ehhuang ehhuang changed the base branch from main to pr3761 October 10, 2025 20:47
@meta-cla bot added the 'CLA Signed' label (managed by the Meta Open Source bot) Oct 10, 2025
@ehhuang ehhuang changed the title featu: support passing "extra body" throught to providers feat: support passing "extra body" throught to providers Oct 10, 2025
@ehhuang ehhuang changed the title feat: support passing "extra body" throught to providers feat: support passing "extra body" through to providers Oct 10, 2025
@ehhuang ehhuang force-pushed the pr3777 branch 3 times, most recently from d7b57a8 to aa34b11 Compare October 10, 2025 21:36
@ehhuang ehhuang changed the title feat: support passing "extra body" through to providers feat(api)!: support passing "extra body" through to providers BREAKING_CHANGE Oct 10, 2025
@ehhuang ehhuang force-pushed the pr3777 branch 5 times, most recently from 9f50338 to 0ed5949 Compare October 10, 2025 22:42
@ashwinb ashwinb marked this pull request as ready for review October 10, 2025 22:43
@ashwinb (Contributor) left a comment:

yesss

ehhuang added a commit that referenced this pull request Oct 10, 2025
…ct (#3761)

# What does this PR do?

Converts openai(_chat)_completions params to pydantic BaseModel to
reduce code duplication across all providers.

## Test Plan
CI
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3761).
* #3777
* __->__ #3761
Base automatically changed from pr3761 to main October 10, 2025 22:46
@ehhuang ehhuang changed the title feat(api)!: support passing "extra body" through to providers BREAKING_CHANGE feat(api)!: support passing "extra body" through to providers BREAKING CHANGE Oct 10, 2025
@ehhuang ehhuang force-pushed the pr3777 branch 2 times, most recently from 28dff0b to c4dbaa9 Compare October 10, 2025 23:00
@ehhuang ehhuang changed the title feat(api)!: support passing "extra body" through to providers BREAKING CHANGE feat(api)!: BREAKING CHANGE: support passing "extra body" through to providers Oct 10, 2025
@ehhuang ehhuang changed the title feat(api)!: BREAKING CHANGE: support passing "extra body" through to providers feat(api)!: BREAKING CHANGE: support passing extra_body through to providers Oct 10, 2025
@ehhuang ehhuang changed the title feat(api)!: BREAKING CHANGE: support passing extra_body through to providers feat(api)!: BREAKING CHANGE: support passing extra_body through to providers Oct 10, 2025
@ehhuang ehhuang changed the title feat(api)!: BREAKING CHANGE: support passing extra_body through to providers feat(api)!: BREAKING CHANGE: support passing "extra_body" through to providers Oct 10, 2025
@ehhuang ehhuang changed the title feat(api)!: BREAKING CHANGE: support passing "extra_body" through to providers feat(api)!: BREAKING CHANGE: support passing extra_body through to providers Oct 10, 2025
@ehhuang ehhuang merged commit 06e4cd8 into main Oct 10, 2025
44 of 47 checks passed
@ehhuang ehhuang deleted the pr3777 branch October 10, 2025 23:21
franciscojavierarceo pushed a commit to franciscojavierarceo/llama-stack that referenced this pull request Oct 11, 2025
…ct (llamastack#3761)
franciscojavierarceo pushed a commit to franciscojavierarceo/llama-stack that referenced this pull request Oct 11, 2025
…providers (llamastack#3777)
ashwinb added a commit that referenced this pull request Oct 11, 2025
…s APIs

Applies the same pattern from #3777 to embeddings and vector_stores.create() endpoints.

Breaking change: Method signatures now accept a single params object with Pydantic extra="allow" instead of individual parameters. Provider-specific params can be passed via extra_body and accessed through params.model_extra.

Updated APIs: openai_embeddings(), openai_create_vector_store(), openai_create_vector_store_file_batch()
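The mechanism described above can be sketched with Pydantic. This is an illustrative model, not the actual Llama Stack definition (the field set is an assumption); the key point is `extra="allow"`, which keeps unknown fields instead of rejecting them and exposes them via `model_extra`:

```python
from pydantic import BaseModel, ConfigDict

class EmbeddingsParams(BaseModel):
    """Illustrative params object; field names are assumptions, not the real API."""
    model_config = ConfigDict(extra="allow")  # keep unknown fields rather than erroring
    model: str
    input: str

# A provider-specific field passed via extra_body shows up as an extra...
params = EmbeddingsParams(model="m", input="hello", dimensions=128)
# ...and the provider can read it back from model_extra:
print(params.model_extra)  # {'dimensions': 128}
```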
ashwinb added a commit that referenced this pull request Oct 13, 2025
…3794)

Applies the same pattern from
#3777 to embeddings and
vector_stores.create() endpoints.

This should _not_ be a breaking change since (a) our tests were already
using the `extra_body` parameter when calling the backend, but (b) the
backend probably wasn't extracting the parameters correctly. This PR
fixes that.

Updated APIs: `openai_embeddings(), openai_create_vector_store(),
openai_create_vector_store_file_batch()`
jwm4 pushed a commit to jwm4/llama-stack that referenced this pull request Oct 13, 2025
…ct (llamastack#3761)
jwm4 pushed a commit to jwm4/llama-stack that referenced this pull request Oct 13, 2025
…providers (llamastack#3777)
jwm4 pushed a commit to jwm4/llama-stack that referenced this pull request Oct 13, 2025
…lamastack#3794)
codefromthecrypt added a commit to codefromthecrypt/llama-stack that referenced this pull request Feb 14, 2026
Like the equivalent support for chat completions (llamastack#3777), allow
provider-specific parameters to pass through the responses API.
Without this, requests with extra fields fail with
"Extra inputs are not permitted".

Update guided_choice tests to use the structured_outputs API, which
replaced guided_choice in vllm-project/vllm#22772.

Signed-off-by: Adrian Cole <adrian@tetrate.io>
codefromthecrypt added a commit to codefromthecrypt/llama-stack that referenced this pull request Mar 2, 2026
github-merge-queue bot pushed a commit that referenced this pull request Mar 4, 2026
# What does this PR do?

Like the equivalent support for chat completions (#3777), allow
provider-specific parameters to pass through the responses API. Without
this, requests with extra fields fail with `Extra inputs are not
permitted`.

Changes:
- `CreateResponseRequest`: `extra="forbid"` → `extra="allow"`
- Thread `extra_body` (from `request.model_extra`) through `agents.py` →
`openai_responses.py` → `streaming.py` →
`OpenAIChatCompletionRequestWithExtraBody`
- `LiteLLMOpenAIMixin`: forward `model_extra` as `extra_body` in both
`atext_completion` and `acompletion` (matches `OpenAIMixin`)
- Update `guided_choice` tests → `structured_outputs` to match
vllm-project/vllm#22772
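The threading of extra fields described in the bullets above can be sketched as follows. Names and signatures here are assumptions for illustration, not the actual llama-stack code: a request model with `extra="allow"` captures unknown fields, and a handler forwards `model_extra` onward as `extra_body`:

```python
from pydantic import BaseModel, ConfigDict

class ResponseRequest(BaseModel):
    """Illustrative request model with extra='allow', as the PR describes."""
    model_config = ConfigDict(extra="allow")
    model: str
    input: str

def thread_extra_body(request: ResponseRequest) -> dict:
    # Forward provider-specific fields as extra_body, mirroring the
    # agents -> responses -> chat-completion threading described above.
    extra_body = dict(request.model_extra or {})
    return {"model": request.model, "input": request.input, "extra_body": extra_body}

req = ResponseRequest(model="m", input="hi", guided_choice=["a", "b"])
print(thread_extra_body(req)["extra_body"])  # {'guided_choice': ['a', 'b']}
```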

## Test Plan

Unit tests (136 passed):
```
uv run pytest tests/unit/providers/agents/meta_reference/ -v
```

Integration tests recorded against vLLM 0.15.1 (`Qwen/Qwen3-0.6B`):
```
uv run pytest -s -v \
  tests/integration/responses/test_basic_responses.py::test_response_extra_body_guided_choice \
  tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice \
  --stack-config=server:starter --setup=vllm --inference-mode=record \
  --embedding-model="" --color=yes
```

Output:
```
tests/integration/responses/test_basic_responses.py::test_response_extra_body_guided_choice[txt=vllm/Qwen/Qwen3-0.6B] PASSED
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B] PASSED
============================== 2 passed in 3.85s ==============================
```

Pre-commit (all passed):
```
PATH="/opt/homebrew/bin:$PATH" uv run pre-commit run --all-files
```

Signed-off-by: Adrian Cole <adrian@tetrate.io>
Labels: CLA Signed (managed by the Meta Open Source bot)

Successfully merging this pull request may close this issue: client.chat.completions.create() API is ignoring extra_body parameter

2 participants
2 participants