@nvchenghaoz


This PR makes three changes:

  1. Change the fake op to return float32 so that its output dtype matches the eager implementation.
  2. Patch the attention mask and set it to None.
  3. Hack: update the attention pattern matcher to handle enable_gqa.
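For readers unfamiliar with the grouped-query attention (GQA) case that item 3 refers to, here is a minimal pure-Python sketch. It is not code from this PR. It assumes the semantics documented for PyTorch's `scaled_dot_product_attention` with `enable_gqa=True`, where each key/value head is shared by a contiguous group of query heads, and it applies no attention mask (the mask is None, as in item 2). The helper names `attend`, `gqa_grouped`, and `gqa_repeated` are hypothetical and exist only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, k, v):
    """Single-head attention without a mask. q: [Tq][D], k and v: [Tk][D]."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        w = softmax(scores)
        out.append([sum(wj * vj[d] for wj, vj in zip(w, v)) for d in range(len(v[0]))])
    return out


def gqa_grouped(q_heads, k_heads, v_heads):
    """GQA by index mapping: query head h reads KV head h // group."""
    group = len(q_heads) // len(k_heads)
    return [attend(q, k_heads[h // group], v_heads[h // group])
            for h, q in enumerate(q_heads)]


def gqa_repeated(q_heads, k_heads, v_heads):
    """GQA by explicitly repeating each KV head `group` times, then plain MHA."""
    group = len(q_heads) // len(k_heads)
    k_rep = [k for k in k_heads for _ in range(group)]
    v_rep = [v for v in v_heads for _ in range(group)]
    return [attend(q, k, v) for q, k, v in zip(q_heads, k_rep, v_rep)]


# Illustrative data: 4 query heads, 2 KV heads, 2 tokens, head dimension 2.
Q = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
    [[1.0, -1.0], [-1.0, 1.0]],
]
K = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [1.0, -1.0]],
]
V = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.2, 0.8]],
]

grouped = gqa_grouped(Q, K, V)
repeated = gqa_repeated(Q, K, V)
print(grouped == repeated)
```

The two functions compute the same result. A pattern matcher written only for the explicit-repeat form would therefore need to treat the grouped form as equivalent, which is presumably the situation the hack in item 3 works around.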

Test results:

tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_bamba.py
[09/22/2025-14:18:28] [TRT-LLM AUTO-DEPLOY] [I] Pre-fetching checkpoint directory from HF repo.
/lustre/fs1/portfolios/coreai/users/chengzhang/cache/huggingface/hub/models--ibm-ai-platform--Bamba-9B-v2/snapshots/b42852dc9eb96c8ae3359dc8df0e4c3f5c37eb21
[09/22/2025-14:18:33] [TRT-LLM AUTO-DEPLOY] [I] Loading and initializing weights.
[09/22/2025-14:18:33] [TRT-LLM AUTO-DEPLOY] [I] Pre-fetching checkpoint directory from HF repo.
[09/22/2025-14:18:42] [TRT-LLM AUTO-DEPLOY] [I] Loading and initializing weights.
[09/22/2025-14:18:42] [TRT-LLM AUTO-DEPLOY] [I] Pre-fetching checkpoint directory from HF repo.
====== WITHOUT PATCH ======
msg='Mamba is a snake with the following properties:', num_tokens=11
msg='Tiger is a cat with the following properties:', num_tokens=11
Mamba is a snake with the following properties: it  pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot
Tiger is a cat with the following properties: -  pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot pilot
====== WITH PATCH ======
msg='Mamba is a snake with the following properties:', num_tokens=11
msg='Tiger is a cat with the following properties:', num_tokens=11
Mamba is a snake with the following properties: it is a reptile, it is venomous, and it is a predator. It is also a member of the family of snakes known as the Elapidae. The mamba is a large snake, and it can grow to be up to 10 feet long. It is also a very fast snake, and
Tiger is a cat with the following properties: - Tiger is a cat. - Tiger is a mammal. - Tiger is a carnivore. - Tiger is a predator. - Tiger is a big cat. - Tiger is a wild cat. - Tiger is a striped cat. - Tiger is a large cat. - Tiger is a fierce cat. - Tiger is
====== EXPORTING GRAPH MODULE ======
====== COMPARISON (patched) ======
Passed!
====== COMPARISON (gm) ======
Passed!

Signed-off-by: Chenghao Zhang <[email protected]>
nvchenghaoz merged commit 2d601e6 into feat/ad_linear_attention on Sep 23, 2025
2 of 3 checks passed
lucaslie pushed a commit that referenced this pull request Sep 29, 2025
Signed-off-by: Chenghao Zhang <[email protected]>
nvchenghaoz added a commit that referenced this pull request Oct 1, 2025
Signed-off-by: Chenghao Zhang <[email protected]>
nvchenghaoz added a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Chenghao Zhang <[email protected]>