Add large RAM GPU runner for vllm #1596
Conversation
```yaml
feedstocks:
  - vllm
resources:
  - cirun-openstack-gpu-2xlarge
```
I'm fine to start off like this, but if vllm can be compiled with CPU-agents (compiling CUDA doesn't need a GPU), then we should do that.
There'll be one more PR to https://github.com/conda-forge/.cirun, which is also where we can then add further resource policies in the future.
Thanks for creating the cirun PR! Do we need any additional PRs to use the larger CPU runners, like ci_2xlarge?
> [...] https://github.com/conda-forge/.cirun, which is also where we can then add further resource policies in the future
☝️
Check out the pull request I linked. It creates a policy for vllm and adds it to a list of policies that are enabled for cirun-openstack-gpu-2xlarge runners. To be able to use other (e.g. smaller and/or CPU-only) runners, you need to add the vllm policy to the respective list.
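To illustrate the mechanism described above, a resource policy list might look roughly like the following. This is a hypothetical sketch only; the actual file name, keys, and layout in conda-forge/.cirun may differ.

```yaml
# Hypothetical sketch of per-runner policy lists (names are assumptions,
# not the actual contents of conda-forge/.cirun).
policies:
  cirun-openstack-gpu-2xlarge:
    - vllm          # the policy enabled by the linked PR
  cirun-openstack-cpu-2xlarge:
    - vllm          # adding the policy here would enable a CPU-only runner too
```

The point is that each runner type carries its own allow-list, so enabling an additional (e.g. CPU-only) runner for vllm means adding the vllm policy to that runner's list, not opening a new admin-requests PR.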
Ah, so as I understand it, subsequent PRs will be made directly to https://github.com/conda-forge/.cirun and not https://github.com/conda-forge/admin-requests, because the CI policy has now been created?
Yes. Please don't make me repeat myself N times
vllm is a package that uses caching to enable high-throughput LLM inference. Building the CUDA wheel takes a lot of RAM, which causes regular hosts to crash, so I'd like to request a bigger one for building the wheel.

Checklist:
@conda-forge/vllm (I am one of the maintainers)
This can be merged after:
- open-gpu-server users:
  - shermansiu: Add shermansiu to the list of allowed users (Quansight/open-gpu-server#63)
  - maresb: Add @maresb to the list of allowed users (Quansight/open-gpu-server#64)
- vllm feedstock gets created: Add vllm (staged-recipes#28931)