[Revert] Remove CUDA torch fallbacks for fp8_mqa_logits/fp8_paged_mqa_logits_torch function#37968
Conversation
…orch function Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Code Review
This pull request reverts the addition of PyTorch fallbacks for fp8_mqa_logits and fp8_paged_mqa_logits, making deep_gemm a hard requirement for this functionality. The changes correctly remove the fallback functions and conditional logic. However, the checks for deep_gemm availability have been changed from is_deep_gemm_supported to has_deep_gemm, which only verifies package installation and not hardware compatibility. To better align with the goal of failing clearly on unsupported hardware, I've suggested restoring the use of is_deep_gemm_supported and making the check a hard failure instead of a warning.
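To illustrate the distinction the review draws, here is a minimal sketch of the two checks and the suggested hard failure. The helper names follow the review comment (`has_deep_gemm`, `is_deep_gemm_supported`); the hardware probe is stubbed out and the real vLLM helpers may differ.

```python
import importlib.util


def has_deep_gemm() -> bool:
    """Package-level check only: is deep_gemm importable?"""
    return importlib.util.find_spec("deep_gemm") is not None


def is_deep_gemm_supported() -> bool:
    """Stricter check: package installed AND hardware compatible.

    The hardware probe is stubbed here for illustration; a real
    implementation would inspect the GPU's compute capability.
    """
    hardware_ok = False  # hypothetical stand-in for a CUDA capability probe
    return has_deep_gemm() and hardware_ok


def fp8_mqa_logits(*args, **kwargs):
    # Hard failure instead of a warning, as the review suggests,
    # so unsupported hardware fails clearly rather than silently.
    if not is_deep_gemm_supported():
        raise RuntimeError(
            "fp8_mqa_logits requires deep_gemm on supported hardware; "
            "no PyTorch fallback is provided."
        )
    # ... dispatch to the deep_gemm kernel here ...
```

With `is_deep_gemm_supported`, a machine that merely has the package installed but lacks compatible hardware still raises at call time instead of proceeding.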
I believe the opposite... We're supposed to implement backwards compatibility. One example was the Marlin FP8 E4M3 fallback for sm80, which allows FP8 models to run on Ampere (#17579, #19990). This is not supported in SGLang (sgl-project/sglang#12887, sgl-project/sglang#9754), so all Ampere users of FP8 W8A8 MoE are redirected to vLLM only. Thus,
…_logits_torch function (vllm-project#37968) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Purpose
Revert #35271
The original PR #35271 was intended to allow dsv3.2 to run even when deep_gemm is not installed, or on lower-end GPUs such as the A800.
@youkaichao believes that if the model vendor itself does not support the hardware, we should clearly state that it is not supported. Simply making the model run doesn't really add value.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model. BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)