Update Serverless docs for deprecations and new SDK #33
Conversation
- `target_util` (float): A ratio that determines how much spare capacity (headroom) the serverless engine maintains. Default value is 0.9.
- `cold_mult` (float): A multiplier applied to your target capacity for longer-term planning (1+ hours). This parameter controls how much extra capacity the serverless engine will plan for in the future compared to immediate needs. Default value is 3.0.
- `test_workers` (integer): The number of different physical machines that a workergroup should test during its initial "exploration" phase to gather performance data before transitioning to normal demand-based scaling. Default value is 3.
- `gpu_ram` (integer): The amount of GPU memory (VRAM) in gigabytes that your model or workload requires to run. This parameter tells the serverless engine how much GPU memory your model needs. Default value is 24.
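For illustration, here is a minimal sketch of how these parameters might be passed when creating a workergroup through the Python SDK. The `create_workergroup` method name and its keyword arguments are assumptions modeled on the CLI command discussed below, not confirmed SDK API.

```python
# A minimal sketch, assuming vastai-sdk exposes a create_workergroup method
# mirroring the CLI's "create workergroup" command (unverified assumption).
from vastai import VastAI

client = VastAI(api_key="YOUR_API_KEY")

client.create_workergroup(          # hypothetical method name
    endpoint_name="my-endpoint",    # hypothetical identifier for your endpoint
    template_hash="TEMPLATE_HASH",  # hypothetical template reference
    target_util=0.9,                # headroom ratio (default 0.9)
    cold_mult=3.0,                  # longer-term capacity multiplier (default 3.0)
    test_workers=3,                 # machines probed during exploration (default 3)
    gpu_ram=24,                     # required VRAM in GB (default 24)
)
```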
We should just remove this one too
Including the lines below too
`gpu_ram` is actually used at the per-workergroup level, so no need to remove it. Unless I'm misreading your comment.
Correct, but it looks weird to include that in the CLI commands. It's almost always in the template. I was just pointing it out to simplify.
Correct me if I'm wrong, but either we get the `gpu_ram` field for autogroups from the CLI here, or it defaults to 8. Check `create_autojobs(request)` in client.py and `create__workergroup(args)` in vast.py. I can't find any example of sourcing this parameter from the template.
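To make the fallback concrete, a purely illustrative sketch of the behavior described here; `resolve_gpu_ram` and the `args` attribute are hypothetical, not quoted from vast.py:

```python
# Illustrative only: if the CLI invocation doesn't supply gpu_ram for an
# autogroup, the value falls back to 8, per the behavior described above.
def resolve_gpu_ram(args) -> int:
    gpu_ram = getattr(args, "gpu_ram", None)  # hypothetical CLI args attribute
    return gpu_ram if gpu_ram is not None else 8
```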
## gpu\_ram

The amount of GPU memory (VRAM) in gigabytes that your model or workload requires to run. This parameter tells the serverless engine how much GPU memory your model needs, and is primarily used to detect unusually long model load times.
??
Sure, we can remove this
Looks good, just a few small changes
- Removes old deprecated worker group parameters
- Adds `request_idx` to route payload/response
- Includes instructions for using the vastai-sdk (see the sketch below)
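As a hedged illustration of the last two points: the snippet below assumes the published vastai-sdk package and its VastAI client class; the shape of the route response, including the new `request_idx` field, is inferred from this PR's description rather than verified against the API.

```python
# A minimal sketch, assuming the vastai-sdk package (pip install vastai-sdk)
# and its VastAI client class as the SDK entry point.
from vastai import VastAI

client = VastAI(api_key="YOUR_API_KEY")

# Hypothetical shape of a serverless route response after this PR: the
# request_idx field comes from the PR description; other fields are
# illustrative placeholders, not confirmed API.
example_route_response = {
    "url": "http://worker-address:3000",  # illustrative worker endpoint
    "request_idx": 42,                    # new field added by this PR
}
print(example_route_response["request_idx"])
```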