feat: add status and ttl to /v1/models endpoint #569
ghostdevv wants to merge 7 commits into mostlygeek:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough: Adds a new GET /api/models endpoint (returns models with name, id, state, ttl, ttlRemaining), moves unload paths under /api/models, implements Process.GetTTL and ProxyManager helpers to surface TTLs, and adds unit tests covering listing and TTL behavior.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 1
🧹 Nitpick comments (1)
proxy/proxymanager_test.go (1)
1317-1365: Consider expanding coverage for contract edge cases. Great baseline test. Please add sub-cases for at least one loaded model (`ready`) and peer-listed models so the `/api/models` contract stays stable for `name`/`state` across all returned entries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proxy/proxymanager_test.go` around lines 1317 - 1365, Update TestProxyManager_ApiGetModels to include sub-cases that exercise a loaded model and peer-listed models: keep the existing setup using New and StopProcesses but add a subtest where you transition one model to the "ready" state (e.g., start the model via the proxy's start/process method or set its state in the proxy's model registry) before calling proxy.ServeHTTP, and another subtest where you inject peer metadata into the proxy (e.g., add entries to the proxy's peer list/peer manager) so /api/models returns entries from both local and peer sources; assert that for all returned entries the "id", "name" and "state" fields are present and that the loaded model shows "state" == "ready" while others remain "stopped". Ensure you reuse CreateTestResponseRecorder, Route through ServeHTTP, and clean up with StopProcesses in each subtest.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@proxy/proxymanager_api.go`:
- Around line 52-63: Normalize the output in ProxyManager.apiGetModels: treat
empty model names by falling back to the model Id (use m.Id when m.Name == ""
for the apiModel.Name), and restrict m.State to the documented contract by
mapping only "starting" and "ready" through; coerce any other state (including
"stopping", "shutdown", "unknown" or empty) to "stopped" when setting
apiModel.State. Update the loop that builds result (refer to apiGetModels,
apiModel and getModelStatus and fields m.Name/m.State/m.Id) to apply these
mappings before calling c.JSON.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: da787c5b-1563-4438-8cb2-29dcd02f803f
📒 Files selected for processing (3)
README.md, proxy/proxymanager_api.go, proxy/proxymanager_test.go
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Line 40: Update the `/api/models` entry in README.md to explicitly document
the full pass-through `state` enum returned by that endpoint: list the values
`ready`, `starting`, `stopping`, `stopped`, `shutdown`, and `unknown`, and state
that the API intentionally passes through these states exactly (no
normalization) to avoid lossy mappings and preserve forward compatibility.
Ensure the description for `/api/models` mentions that clients should handle all
listed states.
📒 Files selected for processing (3)
README.md, proxy/proxymanager_api.go, proxy/proxymanager_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- proxy/proxymanager_api.go
- proxy/proxymanager_test.go
I just pushed the ability for this. This is ready for review whenever you have time @mostlygeek 🙏
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
proxy/proxymanager_api.go (1)
150-158: ⚠️ Potential issue | 🟡 Minor — Peer models may have empty `state` exposed through the new `/api/models` endpoint.

While this code segment is unchanged, the new `/api/models` endpoint now exposes peer models with `state: ""` (empty string), which is not in the documented set of valid states (`ready`, `starting`, `stopping`, `stopped`, `shutdown`, `unknown`). This could cause issues for clients like the Svelte UI that expect `ModelStatus` to be one of the valid string literals. Consider initializing peer models with `state: "unknown"` to maintain consistency:

```diff
 models = append(models, Model{
 	Id:     modelID,
 	PeerID: peerID,
+	State:  "unknown",
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proxy/proxymanager_api.go` around lines 150 - 158, When building the models slice from peer proxies (pm.peerProxy and its ListPeers()), ensure each created Model includes a valid state instead of an empty string; update the construction of Model within the loop that iterates peer.Models to set State: "unknown" (or map empty/zero values to "unknown") so peer-exposed models conform to the documented ModelStatus literals for the /api/models endpoint.
🧹 Nitpick comments (1)
proxy/proxymanager_test.go (1)
1376-1424: TTL test coverage looks comprehensive. The test properly validates:

- TTL value is correctly set to the configured `UnloadAfter` value
- `ttlRemaining` is within expected bounds after a request
- Models without `UnloadAfter` return null for both TTL fields

Minor robustness suggestion: The type assertion on line 1419 could panic if the test setup fails unexpectedly. Consider using a comma-ok idiom or `require` for preceding assertions.

Optional: Safer type assertion

```diff
-assert.Equal(t, float64(60), modelMap["model1"]["ttl"])
-ttlRemaining1 := modelMap["model1"]["ttlRemaining"].(float64)
+require.Equal(t, float64(60), modelMap["model1"]["ttl"])
+ttlRemaining1, ok := modelMap["model1"]["ttlRemaining"].(float64)
+require.True(t, ok, "ttlRemaining should be a float64")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proxy/proxymanager_test.go` around lines 1376 - 1424, The direct type assertion ttlRemaining1 := modelMap["model1"]["ttlRemaining"].(float64) in TestProxyManager_ApiGetModels_WithTTL can panic if the field is missing or of the wrong type; change it to a safe check using the comma-ok idiom (v, ok := modelMap["model1"]["ttlRemaining"].(float64)) and assert/require that ok is true before using v, or use require.IsType/require.NotNil from testify to fail the test cleanly if the value isn't a float64 — update the assertion around ttlRemaining1 accordingly.
📒 Files selected for processing (2)
proxy/proxymanager_api.go, proxy/proxymanager_test.go
Hi, there is the `/v1/models` endpoint that is supported by most clients. Any reason not to use that one? The handler has some metadata code already, and extending that would eliminate having duplicated or similar endpoints. For retrieving the TTL and remaining time I suggest adding a small function to proxy.Process, for example `Process.getTTL() (configured int, remaining int)`. A return of 0,0 would mean no configured TTL. A return of 300,3 would mean 300s configured and 3s until shutdown.
We could; it's just that it's llama-swap specific, so I opted to create another endpoint. The OpenAI OpenAPI spec doesn't give a way for extra metadata, but I guess we're technically breaking it anyway since
moved!
The way I have it now with the
Actionable comments posted: 1
🧹 Nitpick comments (2)
proxy/proxymanager_api.go (1)
63-87: Avoid repeated group scans per model when building response.

`apiGetModels` calls `getModelTTL` for each model, and `getModelTTL` does another `findGroupByModelName` lookup. This duplicates traversal already done in status collection and can become expensive as model count grows. Consider producing state + TTL in one pass (or caching a modelID→process/group map for the request) to avoid repeated lookups.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proxy/proxymanager_api.go` around lines 63 - 87, The current apiGetModels loop calls getModelTTL which calls findGroupByModelName repeatedly causing duplicate traversals; modify apiGetModels (or the loop that builds result) to compute state and TTL in a single pass by either 1) while iterating groups/processes collect a temporary map[string]*Process (or map[string]ttl info) keyed by modelID and then use that map when constructing apiModel entries, or 2) change getModelTTL to accept a precomputed model→process/group map so findGroupByModelName is not invoked per model; update references to getModelTTL and ProxyManager.findGroupByModelName/process.GetTTL accordingly so each model lookup is O(1) per request instead of re-scanning groups.

proxy/process.go (1)
149-149: Use camelCase for the new method name.

Line 149 introduces `GetTTL` (PascalCase). Please rename to camelCase (for example `getTTL`) and update the caller accordingly.

Proposed rename

```diff
-func (p *Process) GetTTL() (ttl *int, ttlRemaining *int) {
+func (p *Process) getTTL() (ttl *int, ttlRemaining *int) {
```

```diff
-	return process.GetTTL()
+	return process.getTTL()
```

As per coding guidelines: `**/*.go`: Use camelCase for function/method naming in Go.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proxy/process.go` at line 149, The method on type Process is named GetTTL (PascalCase) but Go style expects camelCase; rename the method from GetTTL to getTTL and update every caller to the new name. Locate the method declaration "func (p *Process) GetTTL()" and change it to "func (p *Process) getTTL()", then search for usages of "GetTTL(" and replace them with "getTTL(" to ensure compilation and consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@proxy/process.go`:
- Around line 155-157: The current branch that checks p.CurrentState() !=
StateReady returns new(int) which produces a zero-valued pointer and leads to
ttlRemaining: 0; change that to return nil for the ttlRemaining pointer so the
API emits null when the model isn't ready (keep returning &ttlSeconds as the
first return value). Update the return expression in the p.CurrentState() !=
StateReady branch to return &ttlSeconds, nil, ensuring the second return value
is a nil *int to satisfy the function signature (symbols: p.CurrentState(),
StateReady, ttlSeconds, ttlRemaining).
📒 Files selected for processing (2)
proxy/process.go, proxy/proxymanager_api.go
♻️ Duplicate comments (1)
proxy/proxymanager_api.go (1)
65-69: ⚠️ Potential issue | 🟠 Major — Peer models can still emit an invalid empty `state` value.

On Line 68, `State: m.State` forwards `""` for peer entries that lack state in `getModelStatus`, which breaks the documented state contract (`ready|starting|stopping|stopped|shutdown|unknown`).

Proposed fix (boundary-safe default in API mapping)

```diff
 func (pm *ProxyManager) apiGetModels(c *gin.Context) {
 	models := pm.getModelStatus()
 	result := make([]apiModel, len(models))
 	for i, m := range models {
 		name := m.Name
 		if name == "" {
 			name = m.Id
 		}
+		state := m.State
+		if state == "" {
+			state = "unknown"
+		}
 		ttl, ttlRemaining := pm.getModelTTL(m.Id)
 		result[i] = apiModel{
 			Name:         name,
 			Id:           m.Id,
-			State:        m.State,
+			State:        state,
 			TTL:          ttl,
 			TTLRemaining: ttlRemaining,
 		}
 	}
 	c.JSON(http.StatusOK, result)
 }
```
/api/modelsintentionally passes through all model states (stopped,starting,ready,stopping,shutdown,unknown) rather than coercing them.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@proxy/proxymanager_api.go` around lines 65 - 69, The API mapping currently forwards an empty peer model state (result[i] = apiModel{..., State: m.State, ...}) which can violate the documented contract; update the mapping in the code that builds apiModel (the result[i] assignment that uses m.State) to normalize blank or invalid values by substituting a safe default (e.g., "unknown") — ideally check m.State and if it is "" or not one of the allowed tokens (ready|starting|stopping|stopped|shutdown|unknown) set State to "unknown" before assigning to apiModel so all API responses conform to the documented state set; reference the apiModel construction and the getModelStatus/m.State source to make the change.
📒 Files selected for processing (2)
proxy/process.go, proxy/proxymanager_api.go
🚧 Files skipped from review as they are similar to previous changes (1)
- proxy/process.go
Nest it under
It makes sense. I prefer to use an int rather than pointers in the code. I think it's OK to check for `ttl == 0` to determine if it is enabled. If it is greater than 0, then a `ttl_remaining` of 0 (or less) would mean the process is going to be shut down or is in the process of shutting down.
I'm not a fan either but unfortunately I don't know of another way to express
Yea, but you also need to check if the state is ready, because only then does a
@mostlygeek updated 🙏
Closes #541
Closes #564
Update `/v1/models` to return the model `state`, `ttl`, and `ttlRemaining` under the `meta.llamaswap` field. For example:

```json
{
  "data": [
    {
      "created": 1775097337,
      "id": "llama-3.1-8b-instruct",
      "meta": {
        "llamaswap": {
          "state": "stopped",
          "ttl": 30,
          "ttlRemaining": null
        }
      },
      "name": "Llama 3.1 8b Instruct",
      "object": "model",
      "owned_by": "llama-swap"
    }
  ],
  "object": "list"
}
```

The following states are used (from proxymanager_api): `ready`, `starting`, `stopping`, `stopped`, `shutdown`, `unknown`.

I also updated the README for `/api/models/unload` as it was stale.