
feat(api)!: BREAKING CHANGE: support passing extra_body through to providers #3777

Merged
ehhuang merged 1 commit into main from pr3777 on Oct 10, 2025

Conversation

@ehhuang (Contributor) commented Oct 10, 2025

What does this PR do?

Allows passing through extra_body parameters to inference providers.

With this change, the two vLLM-specific parameters were moved out of the completions API and into `extra_body`.
Before/After (screenshot omitted)

closes #2720

Test Plan

CI and added new test

❯ uv run pytest -s -v tests/integration/ --stack-config=server:starter --inference-mode=record -k 'not( builtin_tool or safety_with_image or code_interpreter or test_rag ) and test_openai_completion_guided_choice' --setup=vllm --suite=base --color=yes
Uninstalled 3 packages in 125ms
Installed 3 packages in 19ms
INFO     2025-10-10 14:29:54,317 tests.integration.conftest:118 tests: Applying setup 'vllm' for suite base
INFO     2025-10-10 14:29:54,331 tests.integration.conftest:47 tests: Test stack config type: server
         (stack_config=server:starter)
============================================================================================================== test session starts ==============================================================================================================
platform darwin -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /Users/erichuang/projects/llama-stack-1/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}}
rootdir: /Users/erichuang/projects/llama-stack-1
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 285 items / 284 deselected / 1 selected

tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
instantiating llama_stack_client
Starting llama stack server with config 'starter' on port 8321...
Waiting for server at http://localhost:8321... (0.0s elapsed)
Waiting for server at http://localhost:8321... (0.5s elapsed)
Waiting for server at http://localhost:8321... (5.1s elapsed)
Waiting for server at http://localhost:8321... (5.6s elapsed)
Waiting for server at http://localhost:8321... (10.1s elapsed)
Waiting for server at http://localhost:8321... (10.6s elapsed)
Server is ready at http://localhost:8321
llama_stack_client instantiated in 11.773s
PASSED
Terminating llama stack server process...
Terminating process 98444 and its group...
Server process and children terminated gracefully


============================================================================================================= slowest 10 durations ==============================================================================================================
11.88s setup    tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
3.02s call     tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
0.01s teardown tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B]
================================================================================================ 1 passed, 284 deselected, 3 warnings in 16.21s =================================================================================================

@ehhuang ehhuang changed the base branch from main to pr3761 October 10, 2025 20:47
@meta-cla bot added the 'CLA Signed' label (managed by the Meta Open Source bot) Oct 10, 2025
@ehhuang ehhuang changed the title featu: support passing "extra body" throught to providers feat: support passing "extra body" throught to providers Oct 10, 2025
@ehhuang ehhuang changed the title feat: support passing "extra body" throught to providers feat: support passing "extra body" through to providers Oct 10, 2025
@ehhuang ehhuang force-pushed the pr3777 branch 3 times, most recently from d7b57a8 to aa34b11 Compare October 10, 2025 21:36
@ehhuang ehhuang changed the title feat: support passing "extra body" through to providers feat(api)!: support passing "extra body" through to providers BREAKING_CHANGE Oct 10, 2025
@ehhuang ehhuang force-pushed the pr3777 branch 5 times, most recently from 9f50338 to 0ed5949 Compare October 10, 2025 22:42
@ashwinb ashwinb marked this pull request as ready for review October 10, 2025 22:43
@ashwinb (Contributor) left a comment:

yesss

ehhuang added a commit that referenced this pull request Oct 10, 2025
…ct (#3761)

# What does this PR do?

Converts openai(_chat)_completions params to pydantic BaseModel to
reduce code duplication across all providers.

## Test Plan
CI
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3761).
* #3777
* __->__ #3761
Base automatically changed from pr3761 to main October 10, 2025 22:46
@ehhuang ehhuang changed the title feat(api)!: support passing "extra body" through to providers BREAKING_CHANGE feat(api)!: support passing "extra body" through to providers BREAKING CHANGE Oct 10, 2025
@ehhuang ehhuang force-pushed the pr3777 branch 2 times, most recently from 28dff0b to c4dbaa9 Compare October 10, 2025 23:00
@ehhuang ehhuang changed the title feat(api)!: support passing "extra body" through to providers BREAKING CHANGE feat(api)!: BREAKING CHANGE: support passing "extra body" through to providers Oct 10, 2025
@ehhuang ehhuang changed the title feat(api)!: BREAKING CHANGE: support passing "extra body" through to providers feat(api)!: BREAKING CHANGE: support passing extra_body through to providers Oct 10, 2025
@ehhuang ehhuang changed the title feat(api)!: BREAKING CHANGE: support passing extra_body through to providers feat(api)!: BREAKING CHANGE: support passing extra_body through to providers Oct 10, 2025
@ehhuang ehhuang changed the title feat(api)!: BREAKING CHANGE: support passing extra_body through to providers feat(api)!: BREAKING CHANGE: support passing "extra_body" through to providers Oct 10, 2025
@ehhuang ehhuang changed the title feat(api)!: BREAKING CHANGE: support passing "extra_body" through to providers feat(api)!: BREAKING CHANGE: support passing extra_body through to providers Oct 10, 2025
@ehhuang ehhuang merged commit 06e4cd8 into main Oct 10, 2025
44 of 47 checks passed
@ehhuang ehhuang deleted the pr3777 branch October 10, 2025 23:21
franciscojavierarceo pushed a commit to franciscojavierarceo/llama-stack that referenced this pull request Oct 11, 2025
…ct (llamastack#3761)
franciscojavierarceo pushed a commit to franciscojavierarceo/llama-stack that referenced this pull request Oct 11, 2025
…providers (llamastack#3777)
ashwinb added a commit that referenced this pull request Oct 11, 2025
…s APIs

Applies the same pattern from #3777 to embeddings and vector_stores.create() endpoints.

Breaking change: Method signatures now accept a single params object with Pydantic extra="allow" instead of individual parameters. Provider-specific params can be passed via extra_body and accessed through params.model_extra.

Updated APIs: openai_embeddings(), openai_create_vector_store(), openai_create_vector_store_file_batch()
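The mechanism described above can be sketched with Pydantic. This is an illustrative model, not the actual Llama Stack definition (the field set is an assumption); the key point is `extra="allow"`, which keeps unknown fields instead of rejecting them and exposes them via `model_extra`:

```python
from pydantic import BaseModel, ConfigDict

class EmbeddingsParams(BaseModel):
    """Illustrative params object; field names are assumptions, not the real API."""
    model_config = ConfigDict(extra="allow")  # keep unknown fields rather than erroring
    model: str
    input: str

# A provider-specific field passed via extra_body shows up as an extra...
params = EmbeddingsParams(model="m", input="hello", dimensions=128)
# ...and the provider can read it back from model_extra:
print(params.model_extra)  # {'dimensions': 128}
```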
ashwinb added a commit that referenced this pull request Oct 13, 2025
…3794)

Applies the same pattern from
#3777 to embeddings and
vector_stores.create() endpoints.

This should _not_ be a breaking change since (a) our tests were already
using the `extra_body` parameter when calling the backend, but (b) the
backend probably wasn't extracting the parameters correctly. This PR
fixes that.

Updated APIs: `openai_embeddings(), openai_create_vector_store(),
openai_create_vector_store_file_batch()`
jwm4 pushed a commit to jwm4/llama-stack that referenced this pull request Oct 13, 2025
…ct (llamastack#3761)
jwm4 pushed a commit to jwm4/llama-stack that referenced this pull request Oct 13, 2025
…providers (llamastack#3777)
jwm4 pushed a commit to jwm4/llama-stack that referenced this pull request Oct 13, 2025
…lamastack#3794)
codefromthecrypt added a commit to codefromthecrypt/llama-stack that referenced this pull request Feb 14, 2026
Like the equivalent support for chat completions (llamastack#3777), allow
provider-specific parameters to pass through the responses API.
Without this, requests with extra fields fail with
"Extra inputs are not permitted".

Update guided_choice tests to use the structured_outputs API, which
replaced guided_choice in vllm-project/vllm#22772.

Signed-off-by: Adrian Cole <adrian@tetrate.io>
codefromthecrypt added a commit to codefromthecrypt/llama-stack that referenced this pull request Mar 2, 2026
github-merge-queue bot pushed a commit that referenced this pull request Mar 4, 2026
# What does this PR do?

Like the equivalent support for chat completions (#3777), allow
provider-specific parameters to pass through the responses API. Without
this, requests with extra fields fail with `Extra inputs are not
permitted`.

Changes:
- `CreateResponseRequest`: `extra="forbid"` → `extra="allow"`
- Thread `extra_body` (from `request.model_extra`) through `agents.py` →
`openai_responses.py` → `streaming.py` →
`OpenAIChatCompletionRequestWithExtraBody`
- `LiteLLMOpenAIMixin`: forward `model_extra` as `extra_body` in both
`atext_completion` and `acompletion` (matches `OpenAIMixin`)
- Update `guided_choice` tests → `structured_outputs` to match
vllm-project/vllm#22772
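The threading of extra fields described in the bullets above can be sketched as follows. Names and signatures here are assumptions for illustration, not the actual llama-stack code: a request model with `extra="allow"` captures unknown fields, and a handler forwards `model_extra` onward as `extra_body`:

```python
from pydantic import BaseModel, ConfigDict

class ResponseRequest(BaseModel):
    """Illustrative request model with extra='allow', as the PR describes."""
    model_config = ConfigDict(extra="allow")
    model: str
    input: str

def thread_extra_body(request: ResponseRequest) -> dict:
    # Forward provider-specific fields as extra_body, mirroring the
    # agents -> responses -> chat-completion threading described above.
    extra_body = dict(request.model_extra or {})
    return {"model": request.model, "input": request.input, "extra_body": extra_body}

req = ResponseRequest(model="m", input="hi", guided_choice=["a", "b"])
print(thread_extra_body(req)["extra_body"])  # {'guided_choice': ['a', 'b']}
```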

## Test Plan

Unit tests (136 passed):
```
uv run pytest tests/unit/providers/agents/meta_reference/ -v
```

Integration tests recorded against vLLM 0.15.1 (`Qwen/Qwen3-0.6B`):
```
uv run pytest -s -v \
  tests/integration/responses/test_basic_responses.py::test_response_extra_body_guided_choice \
  tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice \
  --stack-config=server:starter --setup=vllm --inference-mode=record \
  --embedding-model="" --color=yes
```

Output:
```
tests/integration/responses/test_basic_responses.py::test_response_extra_body_guided_choice[txt=vllm/Qwen/Qwen3-0.6B] PASSED
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B] PASSED
============================== 2 passed in 3.85s ==============================
```

Pre-commit (all passed):
```
PATH="/opt/homebrew/bin:$PATH" uv run pre-commit run --all-files
```

Signed-off-by: Adrian Cole <adrian@tetrate.io>
Labels: CLA Signed (managed by the Meta Open Source bot)

Successfully merging this pull request may close this issue: client.chat.completions.create() API is ignoring extra_body parameter

2 participants
2 participants