
Add large RAM GPU runner for vllm #1596

Merged
h-vetinari merged 2 commits into conda-forge:main from shermansiu:feat/add_runner_for_vllm
Jul 29, 2025

Conversation

@shermansiu (Contributor) commented Jul 29, 2025

vllm is a package that uses caching to enable high-throughput LLM inference. Building its CUDA wheel takes a lot of RAM and causes the regular CI hosts to crash, which is why I'd like to request a larger runner for building the wheel.

Checklist:

  • I want to request (or revoke) access to an opt-in CI resource:
    • Pinged the relevant feedstock team(s)
    • Added a small description explaining why access is needed

@conda-forge/vllm (I am one of the maintainers)

This can be merged after:

  1. The maintainers (me and @maresb) get added as allowed open-gpu-server users.
  2. The vllm feedstock gets created: Add vllm (staged-recipes#28931)

@shermansiu shermansiu requested a review from a team as a code owner July 29, 2025 05:42
@h-vetinari h-vetinari merged commit d96220e into conda-forge:main Jul 29, 2025
1 check passed
Comment thread on requests/vllm.yml:

    feedstocks:
      - vllm
    resources:
      - cirun-openstack-gpu-2xlarge
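For context, once access like this is granted, a feedstock opts into the runner through its conda-forge.yml. A minimal sketch, assuming the `github_actions.self_hosted` and `provider` keys described in the conda-forge documentation (the exact mechanism for selecting a specific cirun runner label may differ):

```yaml
# Hypothetical conda-forge.yml fragment for the vllm feedstock.
# Key names follow the conda-forge docs for self-hosted runners;
# verify against the current docs before relying on this.
provider:
  linux_64: github_actions   # build linux_64 on GitHub Actions
github_actions:
  self_hosted: true          # route jobs to self-hosted (cirun) runners
```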
Member

I'm fine with starting off like this, but if vllm can be compiled on CPU agents (compiling CUDA doesn't need a GPU), then we should do that.

There'll be one more PR to https://github.com/conda-forge/.cirun, which is also where we can then add further resource policies in the future


Contributor Author

Thanks for creating the cirun PR! Do we need any additional PRs to use the larger CPU runners like ci_2xlarge?

Member

[...] https://github.com/conda-forge/.cirun, which is also where we can then add further resource policies in the future

☝️

Check out the pull request I linked. It creates a policy for vllm and adds it to a list of policies that are enabled for cirun-openstack-gpu-2xlarge runners. To be able to use other (e.g. smaller and/or CPU-only) runners, you need to add the vllm policy to the respective list.
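For illustration only, the per-runner policy lists described above might look roughly like this in conda-forge/.cirun. The file layout, key names, and the CPU runner label are assumptions, not the repository's actual schema; check the linked PR for the real format.

```yaml
# Hypothetical sketch of per-runner policy lists in conda-forge/.cirun.
# All key names and the CPU runner label below are assumed.
cirun-openstack-gpu-2xlarge:
  policies:
    - vllm        # enabled by the linked PR
cirun-openstack-cpu-2xlarge:   # assumed label for a larger CPU runner
  policies:
    - vllm        # adding vllm here would allow CPU-only builds
```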

Contributor Author

Ah, so if I understand correctly, subsequent PRs should go directly to https://github.com/conda-forge/.cirun rather than https://github.com/conda-forge/admin-requests, now that the CI policy has been created?

Member

Yes. Please don't make me repeat myself N times
