Conversation

@LucasArmandVast (Contributor) commented Oct 31, 2025:

- Removes old, deprecated worker group parameters
- Adds `request_idx` to the route payload/response (sketched below)
- Includes instructions for using the vastai-sdk
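For reference, a minimal sketch of what a route response carrying the new index might look like; only `request_idx` itself comes from this PR, and every other field here is an assumption for illustration:

```python
# Hypothetical route response shape. Only `request_idx` is introduced by
# this PR; the other fields are placeholders, not the real payload.
route_response = {
    "url": "http://worker.example:8000",  # assumed worker address field
    "request_idx": 42,                    # per-request index added by this PR
}
print(route_response["request_idx"])
```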

- `target_util` (float): A ratio that determines how much spare capacity (headroom) the serverless engine maintains. Default value is 0.9.
- `cold_mult` (float): A multiplier applied to your target capacity for longer-term planning (1+ hours). This parameter controls how much extra capacity the serverless engine will plan for in the future compared to immediate needs (see the sketch after this list). Default value is 3.0.
- `test_workers` (integer): The number of different physical machines that a workergroup should test during its initial "exploration" phase to gather performance data before transitioning to normal demand-based scaling. Default value is 3.
- `gpu_ram` (integer): The amount of GPU memory (VRAM) in gigabytes that your model or workload requires to run. This parameter tells the serverless engine how much GPU memory your model needs. Default value is 24.
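To make the interaction between `target_util` and `cold_mult` concrete, here is a minimal illustrative sketch of the arithmetic the descriptions above imply; it is not the serverless engine's actual code:

```python
# Illustrative sketch only, not the serverless engine's actual code:
# shows how target_util (headroom ratio) and cold_mult (long-term
# multiplier) could combine to size a worker group.

def planned_capacity(current_load: float,
                     target_util: float = 0.9,
                     cold_mult: float = 3.0) -> tuple[float, float]:
    """Return (immediate_target, long_term_target) capacity.

    immediate_target serves current_load while keeping utilization at
    target_util (i.e. 1 - target_util spare headroom); long_term_target
    scales that by cold_mult for 1+ hour planning.
    """
    immediate_target = current_load / target_util
    long_term_target = immediate_target * cold_mult
    return immediate_target, long_term_target

# Example: 18 units of load -> hold ~20 units hot, plan for ~60 long term.
print(planned_capacity(18.0))  # (20.0, 60.0)
```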
Contributor:

We should just remove this one too

Contributor:

Including the lines below too

Contributor Author:

`gpu_ram` actually is used at the per-workergroup level, so there's no need to remove it, unless I'm misreading your comment.

Contributor:

Correct, but it looks odd to include that in the CLI commands. It's almost always in the template; I was just pointing it out to simplify.

Contributor Author:

Correct me if I'm wrong, but either we get the `gpu_ram` field for autogroups from the CLI here, or it defaults to 8. See `create_autojobs(request)` in client.py and `create__workergroup(args)` in vast.py. I can't find any example of sourcing this parameter from the template.
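A hypothetical illustration of the fallback being described here; the real logic lives in `create_autojobs(request)` in client.py, and this dict-lookup form is an assumption, not the actual code:

```python
# Hypothetical sketch of the fallback described above, not the real
# create_autojobs code: if the CLI didn't supply gpu_ram, default to 8 GB.
request = {}  # stand-in for a request with no gpu_ram field
gpu_ram = request.get("gpu_ram", 8)
print(gpu_ram)  # 8
```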


Contributor (commenting on the `gpu_ram` section):

## gpu_ram

Old: The amount of GPU memory (VRAM) in gigabytes that your model or workload requires to run. This parameter tells the serverless engine how much GPU memory your model needs.

New: The amount of GPU memory (VRAM) in gigabytes that your model or workload requires to run. This parameter tells the serverless engine how much GPU memory your model needs, and is primarily used to detect unusually long model load times.
Contributor:

??

Contributor Author:

Sure, we can remove this.

@Colter-Downing (Contributor) left a review:

Looks good, just a few small changes.

@LucasArmandVast merged commit 694ea34 into main on Nov 5, 2025. 1 check passed.