[LoRA] Disable linear LoRA kernel PDL #31777
Documentation preview: https://vllm--31777.org.readthedocs.build/en/31777/
Code Review
This pull request disables Programmatic Dependent Launch (PDL) for the linear LoRA kernels (lora_expand and lora_shrink) by hardcoding use_gdc to False. This is a reasonable change as the comment explains that these kernels are not launched back-to-back, which is a prerequisite for PDL to be effective and can otherwise harm performance. The documentation is also updated to point to a relevant GitHub issue.
The changes are correct and well-justified. I have one suggestion regarding a related potential issue in the MoE LoRA implementation where PDL might also be used incorrectly under certain conditions (fully_sharded=True). Addressing this would improve the robustness of the LoRA implementation.
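The review's suggested guard can be sketched as follows. This is a hypothetical illustration, not vLLM's actual code: the function name `should_use_pdl` and the parameters `kernels_back_to_back` and `fully_sharded` are assumed names for the conditions described above.

```python
# Hypothetical guard (illustrative names, not vLLM's actual API): PDL only
# pays off when consecutive kernels are launched back-to-back, so it should
# be disabled when they are not, and also in the fully sharded MoE LoRA
# path that the review flags as potentially incorrect.
def should_use_pdl(kernels_back_to_back: bool, fully_sharded: bool) -> bool:
    if fully_sharded:
        # Avoid the suspected incorrect PDL usage in fully sharded MoE LoRA.
        return False
    return kernels_back_to_back
```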
Force-pushed eafde62 to 8791e80
## LoRA Support for Tower and Connector of Multi-Modal Model

Currently, vLLM experimentally supports LoRA for the Tower and Connector components of multi-modal models. To enable this feature, you need to implement the corresponding token helper functions for the tower and connector. For more details on the rationale behind this approach, please refer to [PR 26674](https://github.com/vllm-project/vllm/pull/26674). We welcome contributions to extend LoRA support to additional models' tower and connector. Please refer to [Issue 31479](https://github.com/vllm-project/vllm/issues/31479) to check the current model support status.
BTW, add MM LoRA support status here
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@jeejeelee should PDL/GDC be disabled in function

When I run an MoE model on Blackwell I get this error:

The error is not raised when I set

I can open an issue and/or PR for this problem if required.
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
We found that the base model's GEMM kernel runs between the two linear LoRA kernels (`lora_shrink` and `lora_expand`), so the LoRA kernels are not launched back-to-back. This makes PDL ineffective and actually hurts LoRA performance, so this PR temporarily disables it.


Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.