
[Misc] Update TritonLanguagePlaceholder to have attributes that are used by Flash Linear Attention ops.#26853

Merged
yeqcharlotte merged 1 commit into vllm-project:main from madongfly:export-D84470467 on Oct 15, 2025
Conversation

@madongfly (Contributor) commented Oct 14, 2025

Summary:
Update TritonLanguagePlaceholder to have attributes that are used by Flash Linear Attention ops.

Signed-off-by: Xudong Ma <mxd@meta.com>

Differential Revision: D84470467

Test Plan: CI signals
@github-actions (bot)

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@gemini-code-assist (bot) left a comment


Code Review

This pull request adds exp, log, and log2 attributes to TritonLanguagePlaceholder to avoid an AttributeError when Triton is not installed. However, initializing these attributes to None is problematic: if any Triton kernel is accidentally called when Triton is not installed, the result is a TypeError: 'NoneType' object is not callable, which is not informative. A better approach is to make these placeholder attributes dummy callables that raise an ImportError with a clear message. That makes the placeholder more robust and improves debuggability. The same issue also applies to the existing tensor attribute; a full fix would involve refactoring the TritonLanguagePlaceholder class.

Additionally, other Triton-based kernels such as triton_flash_attention.py use tl.math.* functions, which will still raise an AttributeError because the math attribute is not defined in the placeholder. This should also be addressed for a complete fix.

Comment on lines +101 to +103:

```python
self.exp = None
self.log = None
self.log2 = None
```


critical

Initializing these placeholder attributes for Triton functions to None is problematic. If this code path is ever executed when Triton is not installed, it will raise TypeError: 'NoneType' object is not callable, which is not very informative. It would be better to initialize them to a dummy function that raises a descriptive ImportError.

For example, the TritonLanguagePlaceholder class could be refactored to use a helper method to create dummy callables that provide a clear error message. This would also be a good opportunity to fix the same issue for the existing tensor attribute on line 100.
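A minimal sketch of what the reviewer suggests, assuming a helper that builds stub callables (the class name mirrors vLLM's TritonLanguagePlaceholder, but this is an illustration, not the merged implementation):

```python
class TritonLanguagePlaceholder:
    """Stand-in for triton.language when Triton is not installed."""

    def __init__(self):
        # Instead of None, each function-like attribute becomes a stub
        # that fails loudly with an actionable message when called.
        self.tensor = self._make_stub("tensor")
        self.exp = self._make_stub("exp")
        self.log = self._make_stub("log")
        self.log2 = self._make_stub("log2")

    @staticmethod
    def _make_stub(name):
        def _stub(*args, **kwargs):
            raise ImportError(
                f"triton.language.{name} was called, but Triton is not "
                "installed. Install triton to use this kernel."
            )
        return _stub


tl = TritonLanguagePlaceholder()
exp = tl.exp  # module-level aliasing, as in fla/ops/op.py, now succeeds
try:
    exp(1.0)   # calling the stub fails with a descriptive error
except ImportError as e:
    print(e)
```

With this shape, `from .op import exp` succeeds at import time, and the informative error is deferred to the (unsupported) call site.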

@yeqcharlotte (Collaborator)

@madongfly could you paste the test you are fixing and its result?

@madongfly (Contributor, Author) commented Oct 14, 2025

> @madongfly could you paste the test you are fixing and its result?

vllm/model_executor/layers/fla/ops/op.py has code pieces like:

```python
@triton.jit
def div_normal(x, y):
    return x / y

div = div_normal
exp = tl.exp
```

which triggers AttributeError: module 'triton.language' has no attribute 'exp' when Triton is not installed.
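The failure mode can be reproduced without vLLM. Here `types.SimpleNamespace` is a stand-in for the old placeholder, which defined no `exp` attribute; the error fires at import time of the aliasing module, before any kernel is ever called:

```python
import types

# Stand-in for the old TritonLanguagePlaceholder: no `exp` defined.
tl = types.SimpleNamespace(constexpr=None, dtype=None, tensor=None)

try:
    exp = tl.exp  # mirrors `exp = tl.exp` in fla/ops/op.py
    err = None
except AttributeError as e:
    err = e

print(err)
```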

@yeqcharlotte (Collaborator)

> > @madongfly could you paste the test you are fixing and its result?
>
> vllm/model_executor/layers/fla/ops/op.py has code pieces like:
>
> ```python
> @triton.jit
> def div_normal(x, y):
>     return x / y
>
> div = div_normal
> exp = tl.exp
> ```
>
> which triggers AttributeError: module 'triton.language' has no attribute 'exp' when Triton is not installed.

Let's double-check which unit tests could have captured this and update them?

@madongfly (Contributor, Author)

> > > @madongfly could you paste the test you are fixing and its result?
> >
> > vllm/model_executor/layers/fla/ops/op.py has code pieces like:
> >
> > ```python
> > @triton.jit
> > def div_normal(x, y):
> >     return x / y
> >
> > div = div_normal
> > exp = tl.exp
> > ```
> >
> > which triggers AttributeError: module 'triton.language' has no attribute 'exp' when Triton is not installed.
>
> let's double check which unit tests could have captured this and update it?

IIUC, this class is a dummy class to work around CI environments that don't have Triton. The code piece above shows that vllm/model_executor/layers/fla/ops/op.py references it, and that file is in turn imported by vllm/model_executor/layers/fla/ops/chunk_delta_h.py through `from .op import exp`.

I think the logic of why this is needed is pretty clear. Do we really want to artificially add an unused import to trigger the import chain in a random unit test, for such a minor fix on a dummy class whose whole purpose is to work around other tests?
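For what it's worth, one common technique for exercising the "package missing" path in a unit test, without an extra CI environment, is to poison sys.modules: CPython treats a None entry as "not importable" and raises ModuleNotFoundError (a subclass of ImportError). This is a generic sketch using a stdlib module as the stand-in, not a test from the vLLM suite:

```python
import importlib
import sys

# Setting sys.modules[name] = None makes any subsequent `import name`
# raise ModuleNotFoundError, simulating an environment where the
# package (e.g. triton) is absent. Demonstrated with `json` here.
sys.modules["json"] = None
try:
    importlib.import_module("json")
    missing = False
except ImportError:
    missing = True
finally:
    del sys.modules["json"]  # restore normal import behavior

print(missing)  # True
```

A hypothetical regression test could apply the same trick to "triton" before importing the fla ops module, asserting that the import succeeds against the placeholder.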

@yeqcharlotte yeqcharlotte changed the title Update TritonLanguagePlaceholder to have attributes that are used by Flash Linear Attention ops. [Misc] Update TritonLanguagePlaceholder to have attributes that are used by Flash Linear Attention ops. Oct 15, 2025
@yeqcharlotte (Collaborator) left a comment


LGTM. sigh ... we should probably look into why we are importing triton when triton is not available.

@yeqcharlotte yeqcharlotte enabled auto-merge (squash) October 15, 2025 04:51
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 15, 2025
@yeqcharlotte yeqcharlotte merged commit 5210dc3 into vllm-project:main Oct 15, 2025
51 checks passed
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
…sed by Flash Linear Attention ops. (vllm-project#26853)

Co-authored-by: Xudong Ma <mxd@meta.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 16, 2025
…sed by Flash Linear Attention ops. (vllm-project#26853)

Co-authored-by: Xudong Ma <mxd@meta.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…sed by Flash Linear Attention ops. (vllm-project#26853)

Co-authored-by: Xudong Ma <mxd@meta.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…sed by Flash Linear Attention ops. (vllm-project#26853)

Co-authored-by: Xudong Ma <mxd@meta.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…sed by Flash Linear Attention ops. (vllm-project#26853)

Co-authored-by: Xudong Ma <mxd@meta.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…sed by Flash Linear Attention ops. (vllm-project#26853)

Co-authored-by: Xudong Ma <mxd@meta.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…sed by Flash Linear Attention ops. (vllm-project#26853)

Co-authored-by: Xudong Ma <mxd@meta.com>
