Add vllm #28931
Conversation
|
Hi! This is the staged-recipes linter and your PR looks excellent! 🚀 |
|
Hi! This is the friendly automated conda-forge-linting service. I wanted to let you know that I linted all conda-recipes in your PR (recipes/vllm/recipe.yaml). Here's what I've got...
For recipes/vllm/recipe.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12962027449. Examine the logs at this URL for more detail. |
|
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR (recipes/vllm/recipe.yaml) and found it was in an excellent condition. |
|
Interesting, we're getting different results between CUDA 11.8 and 12.0. Both fail in the following command:
['cmake', '$SRC_DIR', '-G', 'Ninja', '-DCMAKE_BUILD_TYPE=RelWithDebInfo',
'-DVLLM_TARGET_DEVICE=cuda', '-DVLLM_PYTHON_EXECUTABLE=$PREFIX/bin/python',
'-DVLLM_PYTHON_PATH=$PREFIX/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process:$PREFIX/lib/python39.zip:$PREFIX/lib/python3.9:$PREFIX/lib/python3.9/lib-dynload:$PREFIX/lib/python3.9/site-packages:$PREFIX/lib/python3.9/site-packages/setuptools/_vendor',
'-DFETCHCONTENT_BASE_DIR=$SRC_DIR/.deps', '-DNVCC_THREADS=1',
'-DCMAKE_JOB_POOL_COMPILE:STRING=compile', '-DCMAKE_JOB_POOLS:STRING=compile=2']
12.0 fails earlier, at CUDA detection:
11.8 gets further: |
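To poke at this locally, it can help to re-run just the failing configure step by hand. The sketch below is only an approximation of the command quoted above (SRC_DIR and PREFIX are stand-ins for the conda-build source and host environment paths, and the VLLM_PYTHON_PATH and job-pool flags are omitted for brevity):

```python
import os
import subprocess

# Stand-ins for the conda-build variables; point these at a local checkout/env.
src_dir = os.environ.get("SRC_DIR", ".")
prefix = os.environ.get("PREFIX", os.environ.get("CONDA_PREFIX", "/opt/conda"))

cmd = [
    "cmake", src_dir, "-G", "Ninja",
    "-DCMAKE_BUILD_TYPE=RelWithDebInfo",
    "-DVLLM_TARGET_DEVICE=cuda",
    f"-DVLLM_PYTHON_EXECUTABLE={prefix}/bin/python",
    f"-DFETCHCONTENT_BASE_DIR={src_dir}/.deps",
    "-DNVCC_THREADS=1",
]
print(" ".join(cmd))
# check=True makes the CUDA-detection failure surface as a Python exception.
subprocess.run(cmd, check=True)
```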
|
Hi! This is the friendly automated conda-forge-linting service. I wanted to let you know that I linted all conda-recipes in your PR (recipes/vllm/recipe.yaml). Here's what I've got...
For recipes/vllm/recipe.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12967644561. Examine the logs at this URL for more detail. |
|
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR (recipes/vllm/recipe.yaml) and found it was in an excellent condition. |
|
Thanks @maresb! I had forgotten that there's already #24710; perhaps @mediocretech would be interested in collaborating? W.r.t. CUDA, we need to move on from 12.0 here, which isn't used anywhere else in conda-forge anymore - it's just that staged-recipes seems to have been forgotten in the context of conda-forge/conda-forge-pinning-feedstock#6630. |
|
Oh, I didn't notice that effort, thanks @h-vetinari! Although that's old, it looks like @rongou is eager to help! 🚀 Do you think that CUDA 12.0 is actually causing a problem here? I was thinking (i.e. wildly guessing) that we need to patch |
|
Mainly I want to avoid redundant work. As soon as #28938 is in and we have merged main here, I'll be happy to take a look at what's going on. |
|
In any case, you'll have to address |
|
Woah, after adding |
|
Hi! This is the staged-recipes linter and I found some lint.
It looks like some changes were made outside the recipes/ directory. If these changes are intentional (and you aren't submitting a recipe), please add a
File-specific lints and/or hints:
|
Ah, hmm, I just added swap to conda-forge.yml. Not sure how that's supposed to work here on staged-recipes. 🤔
EDIT: Oh good, the linter is complaining, so that will help us to remember to revert it before merging.
EDIT2: Hmm, it seems that the swap setting works on |
|
Hi @h-vetinari!
As a brief summary of the above, I merged main.
On CUDA 12.x I'm hitting the error:
On 11.8, after adding
I'd be grateful for any advice you could provide. Thanks! |
We're (now) aware of the CUDA angle of conda-forge/pytorch-cpu-feedstock#333.
|
I would have hoped to get more out of setting
Here's the corresponding Python code to go from the envvar to the flag: |
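(The snippet that followed didn't survive the copy.) As an illustration only, and not the actual code in question, here is roughly how a build script can map such an environment variable onto a CMake define; the variable name NVCC_THREADS is assumed from the configure command quoted earlier:

```python
import os

def nvcc_threads_flag(default: int = 1) -> str:
    """Turn the NVCC_THREADS environment variable into a CMake define string."""
    raw = os.environ.get("NVCC_THREADS", "")
    try:
        threads = int(raw)
    except ValueError:
        threads = default
    # Never pass fewer than one compile thread to nvcc.
    return f"-DNVCC_THREADS={max(threads, 1)}"

# NVCC_THREADS=4 -> '-DNVCC_THREADS=4'; unset or garbage -> '-DNVCC_THREADS=1'
print(nvcc_threads_flag())
```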
|
Hmm, it still appears broken after conda-forge/pytorch-cpu-feedstock#339. Is |
|
The CUDA 11.8 build probably fails because it's out of disk space and/or RAM, but that's just speculation:
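(The log excerpt that followed isn't reproduced above.) If exhaustion of disk or RAM is the suspicion, one cheap check is to print the free resources right before the compile step. A minimal sketch of such a probe (not something the CI does today):

```python
import shutil

# Free disk space in the current build directory, in GiB.
usage = shutil.disk_usage(".")
print(f"disk free: {usage.free / 2**30:.1f} / {usage.total / 2**30:.1f} GiB")

# Available RAM on Linux, read from /proc/meminfo (reported in kB).
with open("/proc/meminfo") as f:
    meminfo = dict(line.split(":", 1) for line in f)
mem_kib = int(meminfo["MemAvailable"].split()[0])
print(f"RAM available: {mem_kib / 2**20:.1f} GiB")
```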
|
Hey @shermansiu, great to have you around!!! I'm a bit lost since I'm not very familiar with CUDA. I was just now having some trouble getting the CI to rerun the CUDA builds, but rebasing seems to have fixed it. Also, post-rebase things seem to be proceeding slightly further for 12.6:
I'm not too sure what this means or how to fix it. I'd be very grateful for any suggestions. |
|
Hmm, I'd like to build the recipe locally to diagnose this further, but at a glance, the following line looks a bit concerning: |
h-vetinari
left a comment
Well, you need more than just {{ compiler("cuda") }} to get all the CUDA components you need.
Looks like you need at minimum
- cuda-version =={{ cuda_compiler_version }}
- cuda-cudart-dev
- cuda-nvrtc-dev
- libcublas-dev
in the host environment. Also note that we're still figuring out an issue with nvtx, see conda-forge/pytorch-cpu-feedstock#357
- cmake
- git
- ${{ stdlib('c') }}
- ${{ compiler('c') }}
- ${{ compiler('cxx') }}
- ${{ compiler('cuda') }}
All this (+ninja) should move to the build environment.
shermansiu
left a comment
This seems to resolve the nvtx issue, but then it complains about not being able to find kineto.
Using USE_KINETO=0 doesn't seem to work because the existing PyTorch .cmake files in the environment already have kineto enabled.
lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake
if(ON)
append_torchlib_if_found(kineto)
endif()
See:
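Since the installed TorchConfig.cmake hard-codes that branch as if(ON), a consumer-side USE_KINETO=0 can't switch it off. A quick way to see which of torch's shipped CMake files reference kineto (just a diagnostic sketch; the site-packages path is an assumption and should be adjusted to the actual environment):

```python
import glob

# Scan torch's shipped CMake config files for the hard-coded kineto lookup.
pattern = "lib/python3.9/site-packages/torch/share/cmake/**/*.cmake"
for path in glob.glob(pattern, recursive=True):
    with open(path) as f:
        if "append_torchlib_if_found(kineto)" in f.read():
            print(f"kineto referenced in {path}")
```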
|
Thanks so much, h-vetinari, for your incredibly detailed review! |
h-vetinari
left a comment
This PR basically LGTM now! Thanks for all the hard work! It's still marked as a draft; are you missing anything else? Some tasks related to the GPU server are still ahead, and some suggestions for further improvements below.
|
Thanks for looking at this, @h-vetinari! I should be able to get to the rest of the things later in the week, hopefully! |
|
If there are no other requested changes, I'm good to have this merged! The pull request is no longer just a draft, but I don't have the permissions to change its status. |
|
Anyways, the CUDA 12.6 build works locally and the tests pass:
│ Installing test environment
│ ✔ Successfully updated the test environment
│ Testing commands:
│ ============================= test session starts ==============================
│ platform linux -- Python 3.10.18, pytest-8.4.1, pluggy-1.6.0
│ rootdir: $PREFIX/etc/conda/test-files/vllm/2
│ plugins: anyio-4.9.0
│ collected 22 items
│ vllm/tests/core/test_scheduler.py ...................... [100%]
│ ============================== 22 passed in 6.97s ==============================
│
╰─────────────────── (took 72 seconds)
✔ all tests passed! |
h-vetinari
left a comment
This is still marked as a draft (intentional?), and you'll have to go through the procedures to get rights to the opengpu server, but the PR itself LGTM!
|
"Only those with write access to this repository can mark a draft pull request as ready for review." - I'm unable to change this! |
Thanks @h-vetinari for adding that in! 😄 |

Very rough draft. I will almost certainly require help.
Opened on the advice of @h-vetinari in conda-forge/xformers-feedstock#42
Direct and transitive dependencies:
Closes #24710
Fixes #29105
Checklist
A tarball (url) rather than a repo (e.g. git_url) is used in your recipe (see here for more details).