
Listing the current list of models and the loaded model. #61

Closed
FGDumitru opened this issue Mar 7, 2025 · 3 comments

Comments

@FGDumitru
Contributor

Hi again,

I'm working on a library that leverages llama-swap at the remote API level.

In this context:

  1. Is there a way for llama-swap to list the currently configured models in a JSON format?

  2. I could not find a reliable way of detecting the currently loaded model, also in JSON format (for now I'm scrubbing the /logs for that, which, as you may already know, is not ideal).

I think these two improvements would add significant value to llama-swap all around.

Thank you.

@mostlygeek
Owner

Hi,

There is /v1/models, which is OpenAI compatible, but it won’t list any model that is configured with unlisted: true.

There’s currently no API that lists the loaded models. Maybe adding a llama-swap proprietary API to allow programmatic control would be useful.

Take a look at proxy/proxymanager.go. You may be able to add the http handlers you need quickly.

@FGDumitru
Contributor Author

FGDumitru commented Mar 10, 2025

I've added a PR that adds an additional /running endpoint:

The endpoint returns either an empty JSON object if no model has been loaded so far, or the last model loaded (model key) and its current state (state key). Possible state values are: stopped, starting, ready, and stopping.

Example output if the endpoint is called right after llama-swap has been started:
{}

Example output if the endpoint is called and a model is still being loaded:
{"model":"DeepSeek-V3-Q4_K_M","state":"starting"}

Example output if the endpoint is called and a model has been loaded:
{"model":"DeepSeek-V3-Q4_K_M","state":"ready"}

Example output if the endpoint is called and a model is being unloaded due to TTL:
{"model":"DeepSeek-V3-Q4_K_M","state":"stopping"}

Example output if the endpoint is called and a model has been unloaded due to TTL:
{"model":"DeepSeek-V3-Q4_K_M","state":"stopped"}

Note: It returns an empty JSON object if the model is marked as Unlisted.
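A client consuming the responses above needs to handle both the empty-object case and the model/state case. The sketch below is an illustrative parser, not part of llama-swap; `parseRunning` is a hypothetical helper name, and it assumes exactly the response shapes shown in the examples above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseRunning interprets a /running response body as described above.
// An empty JSON object means no model has been loaded yet.
func parseRunning(body []byte) (model, state string, loaded bool, err error) {
	var resp struct {
		Model string `json:"model"`
		State string `json:"state"`
	}
	if err = json.Unmarshal(body, &resp); err != nil {
		return "", "", false, err
	}
	if resp.Model == "" {
		// {} decodes to zero values: nothing has been loaded so far.
		return "", "", false, nil
	}
	return resp.Model, resp.State, true, nil
}

func main() {
	for _, body := range []string{
		`{}`,
		`{"model":"DeepSeek-V3-Q4_K_M","state":"starting"}`,
		`{"model":"DeepSeek-V3-Q4_K_M","state":"ready"}`,
	} {
		model, state, loaded, err := parseRunning([]byte(body))
		if err != nil {
			fmt.Println("bad response:", err)
			continue
		}
		if !loaded {
			fmt.Println("no model loaded")
			continue
		}
		fmt.Printf("%s is %s\n", model, state)
	}
}
```

A caller could poll this until the state reaches ready before sending inference requests.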

mostlygeek pushed a commit that referenced this issue Mar 13, 2025
* Adds an endpoint '/running' that returns either an empty JSON object if no model has been loaded so far, or the last model loaded (model key) and its current state (state key). Possible state values are: stopped, starting, ready and stopping.

* Improves the `/running` endpoint by allowing multiple entries under the `running` key within the JSON response.
Refactors the `/running` method name (listRunningProcessesHandler).
Removes the unlisted filter implementation.

* Adds tests for:
- no model loaded
- one model loaded
- multiple models loaded

* Adds simple comments.

* Simplified code structure as per 250313 comments on PR #65.

---------

Co-authored-by: FGDumitru|B <[email protected]>
@mostlygeek
Owner

fixed in #65
