[CI/Build] Avoid CUDA IMA on Ada (SM 8.9) by fallback to apply_repetition_penalties_torch #28180
Conversation
Interesting. This means these distributed tests need to run on H100/B200 to be useful. This is causing other entrypoint test failures.
For the Entrypoints Integration Test (API Server) failure, I don't think it is due to this PR. From https://app.hex.tech/533fe68e-dcd8-4a52-a101-aefba762f581/app/vLLM-CI-030kdEgDv6lSlh1UPYOkWP/latest , the test fails randomly.
cc @houseroad @simon-mo to also take a look. We may need a few force merges to get all these fixes in place.
mgoin
left a comment
Please use the current_platform interface to avoid initializing torch.cuda directly. You should be able to do current_platform.is_device_capability(89) as the check
Signed-off-by: Huamin Li <[email protected]>
Force-pushed from 91f7fe8 to eee43cc
Thanks @mgoin for reviewing! I switched to using current_platform.is_device_capability(89) per your suggestion. Please take another look!
Seems that this is just luck; the failure also isn't stable: it appears about 1 time out of [5, 10, 20] runs.
Additional finding: #28220 (comment). In short, the problem seems to be in the Marlin kernel.
Closing this PR as we merged #28324.
Purpose
Recently, evals/gsm8k/test_gsm8k_correctness.py::test_gsm8k_correctness_param[Qwen1.5-MoE-W4A16-CT-tp1] from LM Eval Small Models has been failing in nightly:
https://buildkite.com/vllm/ci/builds/37631/steps/canvas?sid=019a5264-3636-4617-87f8-9867066b7a78
https://buildkite.com/vllm/ci/builds/37251/steps/canvas?sid=019a42b9-e1bc-42a6-8605-7900f4330ffd
https://buildkite.com/vllm/ci/builds/37196/steps/canvas?sid=019a3d93-774a-470b-a899-0f34ed601d55
https://buildkite.com/vllm/ci/builds/37041/steps/canvas?sid=019a386d-1b21-41bf-bb23-9d1a53bb4455
https://buildkite.com/vllm/ci/builds/36869/steps/canvas?sid=019a3346-ceb2-4e6b-ac05-46162dea7b7e
Error Msg
We are only able to repro the CUDA IMA on L4, not on H100/MI300. On L4, the IMA is not deterministic: the same commit can pass or crash.
This PR adds a small helper function _should_use_cuda_repetition_penalties: if running on Ada (SM 8.9), fall back to apply_repetition_penalties_torch to avoid the CUDA IMA.
One caveat to call out: this PR may introduce a performance regression on Ada, since we switch from apply_repetition_penalties_cuda to apply_repetition_penalties_torch.
Test Plan
CI
Test Result
https://buildkite.com/vllm/ci/builds/37789/steps/canvas?sid=019a5740-b01c-4ecf-9809-226c492b8aa4