
Listing the current list of models and the loaded model. #61

Closed
FGDumitru opened this issue Mar 7, 2025 · 3 comments

Comments

@FGDumitru
Contributor

Hi again,

I'm working on a library that leverages llama-swap at the remote API level.

In this context:

  1. Is there a way for llama-swap to list the currently configured models in a JSON format?

  2. I could not find a reliable way of detecting the currently loaded model, also in JSON format (for now I'm scrubbing the /logs for that, which, as you may already know, is not ideal).

I think these two improvements would add significant value to llama-swap all around.

Thank you.

@mostlygeek
Owner

Hi,

There is /v1/models, which is OpenAI compatible, but it won’t list any model that is configured with unlisted: true.

There’s currently no API that lists the loaded models. Maybe adding a llama-swap proprietary API to allow programmatic control would be useful.

Take a look at proxy/proxymanager.go. You may be able to add the http handlers you need quickly.

@FGDumitru
Contributor Author

FGDumitru commented Mar 10, 2025

I've added a PR that adds an additional /running endpoint:

The endpoint returns either an empty JSON object if no model has been loaded so far, or the last model loaded (model key) and its current state (state key). Possible state values are: stopped, starting, ready, and stopping.

Example output if the endpoint is called right after llama-swap has been started:
{}

Example output if the endpoint is called and a model is still being loaded:
{"model":"DeepSeek-V3-Q4_K_M","state":"starting"}

Example output if the endpoint is called and a model has been loaded:
{"model":"DeepSeek-V3-Q4_K_M","state":"ready"}

Example output if the endpoint is called and a model is being unloaded due to TTL:
{"model":"DeepSeek-V3-Q4_K_M","state":"stopping"}

Example output if the endpoint is called and a model has been unloaded due to TTL:
{"model":"DeepSeek-V3-Q4_K_M","state":"stopped"}

Note: It returns an empty JSON object if the model is marked as Unlisted.
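A client consuming the responses above needs to handle both the empty-object case and the model/state case. The sketch below is an illustrative parser, not part of llama-swap; `parseRunning` is a hypothetical helper name, and it assumes exactly the response shapes shown in the examples above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseRunning interprets a /running response body as described above.
// An empty JSON object means no model has been loaded yet.
func parseRunning(body []byte) (model, state string, loaded bool, err error) {
	var resp struct {
		Model string `json:"model"`
		State string `json:"state"`
	}
	if err = json.Unmarshal(body, &resp); err != nil {
		return "", "", false, err
	}
	if resp.Model == "" {
		// {} decodes to zero values: nothing has been loaded so far.
		return "", "", false, nil
	}
	return resp.Model, resp.State, true, nil
}

func main() {
	for _, body := range []string{
		`{}`,
		`{"model":"DeepSeek-V3-Q4_K_M","state":"starting"}`,
		`{"model":"DeepSeek-V3-Q4_K_M","state":"ready"}`,
	} {
		model, state, loaded, err := parseRunning([]byte(body))
		if err != nil {
			fmt.Println("bad response:", err)
			continue
		}
		if !loaded {
			fmt.Println("no model loaded")
			continue
		}
		fmt.Printf("%s is %s\n", model, state)
	}
}
```

A caller could poll this until the state reaches ready before sending inference requests.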

mostlygeek pushed a commit that referenced this issue Mar 13, 2025
* Adds an endpoint '/running' that returns either an empty JSON object if no model has been loaded so far, or the last model loaded (model key) and its current state (state key). Possible state values are: stopped, starting, ready and stopping.

* Improves the `/running` endpoint by allowing multiple entries under the `running` key within the JSON response.
Refactors the `/running` method name (listRunningProcessesHandler).
Removes the unlisted filter implementation.

* Adds tests for:
- no model loaded
- one model loaded
- multiple models loaded

* Adds simple comments.

* Simplified code structure as per 250313 comments on PR #65.

---------

Co-authored-by: FGDumitru|B <[email protected]>
@mostlygeek
Owner

fixed in #65
