[Misc] Update TritonLanguagePlaceholder to have attributes that are used by Flash Linear Attention ops. #26853
Conversation
…Flash Linear Attention ops.

Summary: Update TritonLanguagePlaceholder to have attributes that are used by Flash Linear Attention ops.

Signed-off-by: Xudong Ma <mxd@meta.com>
Test Plan: CI signals
Differential Revision: D84470467
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run a small and essential subset of CI tests to quickly catch errors. You can ask your reviewers to trigger select CI tests on top of those.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request adds exp, log, and log2 attributes to TritonLanguagePlaceholder to avoid an AttributeError when Triton is not installed. However, initializing these attributes to None is problematic: if any Triton kernel is accidentally called when Triton is not installed, it will raise TypeError: 'NoneType' object is not callable, which is not informative.

A better approach is to make these placeholder attributes dummy callables that raise an ImportError with a clear message. This makes the placeholder more robust and improves debuggability. The same issue applies to the existing tensor attribute; a full fix would involve refactoring the TritonLanguagePlaceholder class.

Additionally, other Triton-based kernels such as triton_flash_attention.py use tl.math.* functions, which will still raise an AttributeError because the math attribute is not defined in the placeholder. This should also be addressed for a complete fix.
self.exp = None
self.log = None
self.log2 = None
Initializing these placeholder attributes for Triton functions to None is problematic. If this code path is ever executed when Triton is not installed, it will raise TypeError: 'NoneType' object is not callable, which is not very informative. It would be better to initialize them to a dummy function that raises a descriptive ImportError.
For example, the TritonLanguagePlaceholder class could be refactored to use a helper method to create dummy callables that provide a clear error message. This would also be a good opportunity to fix the same issue for the existing tensor attribute on line 100.
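A minimal sketch of what the reviewer is suggesting (this is an illustrative assumption, not the actual vLLM implementation; the class name matches but `_make_stub` and the attribute list are hypothetical):

```python
class TritonLanguagePlaceholder:
    """Stand-in for triton.language when Triton is not installed."""

    def __init__(self):
        # Each mirrored triton.language attribute becomes a stub that
        # fails loudly when called, instead of being None.
        for name in ("exp", "log", "log2", "tensor"):
            setattr(self, name, self._make_stub(name))

    @staticmethod
    def _make_stub(name):
        def _stub(*args, **kwargs):
            raise ImportError(
                f"triton.language.{name} was called, but Triton is not "
                "installed; install triton to use this kernel")
        return _stub


tl = TritonLanguagePlaceholder()
try:
    tl.exp(1.0)  # calling any stub raises a descriptive ImportError
except ImportError as err:
    print(err)
```

Compared with `self.exp = None`, the failure now names the missing function and the missing dependency rather than surfacing as an opaque TypeError.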
@madongfly could you paste the test you are fixing and its result?
vllm/model_executor/layers/fla/ops/op.py has code that references tl.exp at module level, which triggers AttributeError: module 'triton.language' has no attribute 'exp' when triton is not installed.
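The failure mode described above can be reproduced in miniature (hypothetical names; `BarePlaceholder` stands in for a placeholder that lacks the math attributes):

```python
class BarePlaceholder:
    """Stand-in for a triton.language placeholder with no exp/log/log2."""


tl = BarePlaceholder()

try:
    # Mirrors a module-level reference like `exp = tl.exp` in op.py:
    # the attribute lookup itself fails, so merely importing the module
    # crashes before any kernel is ever called.
    exp = tl.exp
except AttributeError as err:
    print(err)
```

This is why the fix lands in the placeholder class itself: the error occurs at import time, not at kernel-launch time.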
Let's double-check which unit tests could have captured this, and update them?
IIUC, this class is a dummy class to work around CI environments that don't have Triton. The code piece I provided above shows that vllm/model_executor/layers/fla/ops/op.py references it, and it is then imported by vllm/model_executor/layers/fla/ops/chunk_delta_h.py through "from .op import exp". I think the logic of why this is needed is pretty clear. Do we really want to artificially add an unused import to a random unit test just to trigger the import chain, for such a minor fix to a dummy class whose whole purpose is to work around other tests?
yeqcharlotte
left a comment
There was a problem hiding this comment.
LGTM. sigh ... we should probably look into why we are importing triton when triton is not available.
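One way to avoid chasing individual attributes like this at all (an alternative sketch, not what this PR does) is a `__getattr__`-based placeholder: every attribute lookup succeeds at import time, and the error is deferred until a stub is actually called:

```python
class TritonLanguagePlaceholder:
    """Placeholder that covers any triton.language attribute lazily."""

    def __getattr__(self, name):
        # Called only for attributes not found normally, so every
        # tl.<anything> reference resolves to a stub at import time.
        def _stub(*args, **kwargs):
            raise ImportError(
                f"triton.language.{name} requires Triton to be installed")
        return _stub


tl = TritonLanguagePlaceholder()
exp = tl.exp      # import-time reference succeeds, no AttributeError
try:
    exp(1.0)      # only actually calling the stub fails
except ImportError as err:
    print(err)
```

The trade-off is that typos in attribute names also resolve silently to stubs, so errors surface later than with an explicit attribute list.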
…sed by Flash Linear Attention ops. (vllm-project#26853) Co-authored-by: Xudong Ma <mxd@meta.com> Signed-off-by: bbartels <benjamin@bartels.dev>
…sed by Flash Linear Attention ops. (vllm-project#26853) Co-authored-by: Xudong Ma <mxd@meta.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
…sed by Flash Linear Attention ops. (vllm-project#26853) Co-authored-by: Xudong Ma <mxd@meta.com>
…sed by Flash Linear Attention ops. (vllm-project#26853) Co-authored-by: Xudong Ma <mxd@meta.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>