
feat(vllm): periodically refresh models #2823

Merged: ashwinb merged 3 commits into llamastack:main from ashwinb:envsimplify_1 on Jul 18, 2025

Conversation

@ashwinb (Contributor) commented Jul 18, 2025

Just like #2805 but for vLLM.

We also make the `VLLM_URL` env variable optional (not required) -- if it is not specified, the provider silently sits idle and only raises an error when someone tries to call a completion on it. This is done to allow this provider to be present in the `starter` distribution.
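The lazy-validation behavior described above can be sketched as follows. This is a minimal illustration, not the actual llama-stack implementation; the class and field names (`VLLMConfig`, `VLLMInferenceAdapter`) are assumptions:

```python
# Sketch: an optional URL that defers its error until first use, so the
# provider can be listed in a distribution without VLLM_URL being set.
# Names here are illustrative, not the real llama-stack classes.
from dataclasses import dataclass
from typing import Optional


@dataclass
class VLLMConfig:
    url: Optional[str] = None  # VLLM_URL is optional; provider idles without it


class VLLMInferenceAdapter:
    def __init__(self, config: VLLMConfig):
        # No validation here: constructing the provider must succeed even
        # when no URL is configured, so the starter distribution can load it.
        self.config = config

    def completion(self, prompt: str) -> str:
        # "Yell" only when someone actually tries to use the provider.
        if not self.config.url:
            raise ValueError(
                "You must provide a vLLM URL in the run.yaml file "
                "(or set the VLLM_URL environment variable)"
            )
        return f"POST {self.config.url}/completions"  # placeholder for the real call


adapter = VLLMInferenceAdapter(VLLMConfig())  # fine: no URL needed yet
try:
    adapter.completion("hello")  # raises ValueError only at call time
except ValueError as e:
    print("error:", e)
```

The key design point is that the error moves from provider construction to first use, which is what lets the provider sit idle inside `starter`.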

## Test Plan

Set up vLLM, copy the starter template and set `{ refresh_models: true, refresh_models_interval: 10 }` for the vllm provider, then run:

```
ENABLE_VLLM=vllm VLLM_URL=http://localhost:8000/v1 \
  uv run llama stack run --image-type venv /tmp/starter.yaml
```

Verify that `llama-stack-client models list` brings up the model correctly from vLLM.
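For reference, the provider settings from the test plan might look roughly like this inside the copied `run.yaml`. The exact layout of the provider entry is an assumption here; only the `refresh_models` / `refresh_models_interval` keys come from the PR itself:

```yaml
# Illustrative run.yaml fragment; provider layout may differ in practice.
providers:
  inference:
  - provider_id: vllm
    provider_type: remote::vllm
    config:
      url: ${env.VLLM_URL}
      refresh_models: true
      refresh_models_interval: 10  # seconds between model-list refreshes
```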

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jul 18, 2025

A Contributor commented on the new check:

```python
if not self.config.url:
    raise ValueError(
        "You must provide a vLLM URL in the run.yaml file (or set the VLLM_URL environment variable)"
    )
```

> or set the VLLM_URL environment variable

is this correct in general? (non-starter templates)?

@ashwinb (Author) replied:

@ehhuang yes, I think it is correct, because any template will have the same config struct derived from `sample_run_config()`.

@ashwinb ashwinb merged commit 199f859 into llamastack:main Jul 18, 2025
94 of 96 checks passed
@ashwinb ashwinb deleted the envsimplify_1 branch July 18, 2025 22:53
Nehanth pushed a commit to Nehanth/llama-stack that referenced this pull request Jul 23, 2025
ashwinb added a commit that referenced this pull request Jul 24, 2025
This flips #2823 and #2805 by making the Stack periodically query the providers for models, rather than the providers going behind its back and calling "register" on the registry themselves. This also adds support for model listing for all other providers via `ModelRegistryHelper`. Once this is done, we no longer need to manually list or register models via `run.yaml`, which removes both noise and annoyance (setting `INFERENCE_MODEL` environment variables, for example) from the new-user experience.

In addition, it adds a configuration variable `allowed_models` which can
be used to optionally restrict the set of models exposed from a
provider.
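The flipped flow described in this commit message, including the `allowed_models` restriction, can be sketched as follows. All names here (`FakeProvider`, `refresh_models`) are illustrative stand-ins for the real llama-stack machinery:

```python
# Sketch of the flipped refresh flow: the Stack polls each provider's model
# list and updates the registry, instead of providers self-registering.
# Names and signatures are assumptions, not the actual llama-stack API.
import asyncio
from typing import Iterable, Optional


class FakeProvider:
    """Stands in for an inference provider that can enumerate its models."""

    def __init__(self, models: Iterable[str]):
        self._models = list(models)

    async def list_models(self) -> list[str]:
        return self._models


async def refresh_models(
    provider: FakeProvider,
    registry: dict[str, str],
    provider_id: str,
    allowed_models: Optional[set[str]] = None,
    iterations: int = 1,  # bounded here so the sketch terminates
) -> None:
    for _ in range(iterations):
        for model in await provider.list_models():
            # allowed_models optionally restricts what the provider exposes
            if allowed_models is None or model in allowed_models:
                registry[model] = provider_id
        # a real loop would `await asyncio.sleep(refresh_models_interval)`
        await asyncio.sleep(0)


registry: dict[str, str] = {}
provider = FakeProvider(["llama-3.1-8b", "llama-3.1-70b"])
asyncio.run(
    refresh_models(provider, registry, "vllm", allowed_models={"llama-3.1-8b"})
)
print(registry)  # only the allowed model ends up registered
```

Polling from the Stack side keeps the registry as the single writer, which is the point of reversing the earlier provider-initiated registration.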
ChristianZaccaria pushed a commit to ChristianZaccaria/llama-stack that referenced this pull request Jul 28, 2025
…mastack#2862)
