feat(vllm): periodically refresh models #2823
Merged
ashwinb merged 3 commits into llamastack:main on Jul 18, 2025
Conversation
ehhuang approved these changes on Jul 18, 2025
if not self.config.url:
    raise ValueError(
        "You must provide a vLLM URL in the run.yaml file (or set the VLLM_URL environment variable)"
    )
Contributor
> or set the VLLM_URL environment variable
is this correct in general? (non-starter templates)?
Contributor
Author
@ehhuang yes I think it is correct, because any template will have the same config struct derived from sample_run_config()
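For readers outside this thread: the reply holds because every distribution template renders the vLLM provider's config from the same `sample_run_config()`, so the URL always flows from `VLLM_URL`. Below is a minimal sketch of what such a config class might look like; the class name, field names, defaults, and env-substitution syntax are assumptions for illustration, not the actual llama-stack source.

```python
from pydantic import BaseModel


class VLLMProviderConfig(BaseModel):  # hypothetical name for illustration
    # Filled in from run.yaml; every template points it at the VLLM_URL env var,
    # which is why the error message can mention VLLM_URL in general.
    url: str | None = None
    refresh_models: bool = False        # poll the server for served models
    refresh_models_interval: int = 300  # seconds between polls (assumed default)

    @classmethod
    def sample_run_config(cls, **kwargs) -> dict:
        # Every template (starter or otherwise) derives its run.yaml entry from
        # this, so the VLLM_URL reference is identical across templates.
        # Env-substitution syntax shown here is approximate.
        return {"url": "${env.VLLM_URL}"}
```

Since the URL may legitimately be empty (the starter distribution ships the provider without a configured vLLM server), the ValueError above only becomes meaningful once someone actually tries to use the provider.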
Nehanth pushed a commit to Nehanth/llama-stack that referenced this pull request on Jul 23, 2025
Just like llamastack#2805 but for vLLM. We also make the VLLM_URL env variable optional (not required) -- if not specified, the provider silently sits idle and yells eventually if someone tries to call a completion on it. This is done so as to allow this provider to be present in the `starter` distribution.

## Test Plan

Set up vLLM, copy the starter template and set `{ refresh_models: true, refresh_models_interval: 10 }` for the vllm provider and then run:

```
ENABLE_VLLM=vllm VLLM_URL=http://localhost:8000/v1 \
  uv run llama stack run --image-type venv /tmp/starter.yaml
```

Verify that `llama-stack-client models list` brings up the model correctly from vLLM.
ashwinb added a commit that referenced this pull request on Jul 24, 2025
This flips #2823 and #2805 by making the Stack periodically query the providers for models, rather than the providers going behind its back and calling "register" on the registry themselves. This also adds support for model listing for all other providers via `ModelRegistryHelper`. Once this is done, we no longer need to manually list or register models via `run.yaml`, which removes both noise and annoyance (setting `INFERENCE_MODEL` environment variables, for example) from the new-user experience. In addition, it adds a configuration variable `allowed_models` which can be used to optionally restrict the set of models exposed from a provider.
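A rough sketch of the flipped control flow described above (the Stack polling providers, instead of providers registering models themselves) is shown here; the function, attribute, and registry names are assumptions for illustration, not the actual implementation.

```python
import asyncio


async def refresh_models_task(stack, interval_s: float = 300.0) -> None:
    """Hypothetical sketch: the Stack periodically asks each inference provider
    for its models and updates the central registry itself."""
    while True:
        for provider in stack.inference_providers:      # assumed attribute
            try:
                models = await provider.list_models()   # assumed provider API
            except Exception:
                continue  # e.g. vLLM provider idle because VLLM_URL is unset
            allowed = getattr(provider.config, "allowed_models", None)
            for model in models:
                if allowed is not None and model.identifier not in allowed:
                    continue  # allowed_models optionally restricts exposure
                await stack.registry.update(model)      # assumed registry API
        await asyncio.sleep(interval_s)
```

With this in place, models no longer need to be listed by hand in `run.yaml`; the registry simply reflects whatever each provider reports, filtered by `allowed_models` when configured.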
ChristianZaccaria pushed a commit to ChristianZaccaria/llama-stack that referenced this pull request on Jul 28, 2025
…mastack#2862)

This flips llamastack#2823 and llamastack#2805 by making the Stack periodically query the providers for models, rather than the providers going behind its back and calling "register" on the registry themselves. This also adds support for model listing for all other providers via `ModelRegistryHelper`. Once this is done, we no longer need to manually list or register models via `run.yaml`, which removes both noise and annoyance (setting `INFERENCE_MODEL` environment variables, for example) from the new-user experience. In addition, it adds a configuration variable `allowed_models` which can be used to optionally restrict the set of models exposed from a provider.
Just like #2805 but for vLLM.

We also make the VLLM_URL env variable optional (not required) -- if not specified, the provider silently sits idle and yells eventually if someone tries to call a completion on it. This is done so as to allow this provider to be present in the `starter` distribution.

## Test Plan

Set up vLLM, copy the starter template and set `{ refresh_models: true, refresh_models_interval: 10 }` for the vllm provider and then run:

```
ENABLE_VLLM=vllm VLLM_URL=http://localhost:8000/v1 \
  uv run llama stack run --image-type venv /tmp/starter.yaml
```

Verify that `llama-stack-client models list` brings up the model correctly from vLLM.
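For context, the provider-side refresh that the test plan exercises could look roughly like the loop below. This is a sketch, not the adapter's actual code: the `register_model` callback and the exact config field access are assumptions, and vLLM's OpenAI-compatible `/v1` endpoint is queried through the standard `openai` client.

```python
import asyncio

from openai import AsyncOpenAI  # vLLM serves an OpenAI-compatible /v1 API


async def refresh_models_loop(config, register_model) -> None:
    """Hypothetical sketch: poll the vLLM server every refresh_models_interval
    seconds and register whatever models it reports."""
    if not config.url:
        # No VLLM_URL configured: sit idle; errors only surface when a
        # completion is actually requested (see the review thread above).
        return
    client = AsyncOpenAI(base_url=config.url, api_key="not-needed")  # vLLM ignores the key unless configured
    while config.refresh_models:
        page = await client.models.list()
        for model in page.data:
            await register_model(model.id)  # assumed registration callback
        await asyncio.sleep(config.refresh_models_interval)
```

With `{ refresh_models: true, refresh_models_interval: 10 }` from the test plan, a loop like this polls the server every 10 seconds, which is why `llama-stack-client models list` picks up the vLLM model shortly after startup.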