Fix the sampler and update the triton/cuda kernels #146

nvchenghaoz · 2025-09-26T22:50:39Z

@coderabbitai summary

Signed-off-by: Chenghao Zhang <[email protected]>

Signed-off-by: Suyog Gupta <[email protected]>


* Fix the bamba unit test
  Signed-off-by: Chenghao Zhang <[email protected]>
* none: Add triton backend for ssm_transform and cuda backend for conv
  Signed-off-by: Chenghao Zhang <[email protected]>
* Fully Use the TRT LLM kernels
  Signed-off-by: Chenghao Zhang <[email protected]>
* Add fake version for ssm transform op
  Signed-off-by: Chenghao Zhang <[email protected]>
* Fix the datatype error in fake op
  Signed-off-by: Chenghao Zhang <[email protected]>
* Fix the conv test error
  Signed-off-by: Chenghao Zhang <[email protected]>
* Fix the triton ssm error
  Signed-off-by: Chenghao Zhang <[email protected]>
* Fix the DemoLLM sampler mismatch
  Signed-off-by: Chenghao Zhang <[email protected]>
* Update the implementation for triton/cuda kernels
  Signed-off-by: Chenghao Zhang <[email protected]>
* Fix the d2d memcpy for decode
  Signed-off-by: Chenghao Zhang <[email protected]>
* Revert the generator and remove the redundant code
  Signed-off-by: Chenghao Zhang <[email protected]>

---------

Signed-off-by: Chenghao Zhang <[email protected]>
Signed-off-by: Suyog Gupta <[email protected]>
Co-authored-by: Suyog Gupta <[email protected]>

nvchenghaoz and others added 12 commits September 22, 2025 14:16

Fix the bamba unit test

22ade41

Signed-off-by: Chenghao Zhang <[email protected]>

none: Add triton backend for ssm_transform and cuda backend for conv

2344404

Signed-off-by: Chenghao Zhang <[email protected]>

Fully Use the TRT LLM kernels

1bbcf19

Signed-off-by: Chenghao Zhang <[email protected]>

Add fake version for ssm transform op

65083c2

Signed-off-by: Chenghao Zhang <[email protected]>

Fix the datatype error in fake op

8cfb07b

Signed-off-by: Chenghao Zhang <[email protected]>

Fix the conv test error

f6c7aec

Signed-off-by: Chenghao Zhang <[email protected]>

Fix the triton ssm error

08aada6

Signed-off-by: Chenghao Zhang <[email protected]>

Fix the DemoLLM sampler mismatch

199cdcb

Signed-off-by: Chenghao Zhang <[email protected]>

Update the implementation for triton/cuda kernels

a4307d3

Signed-off-by: Chenghao Zhang <[email protected]>

ensure cudagraph compatibility

2d02923

Signed-off-by: Suyog Gupta <[email protected]>

Fix the d2d memcpy for decode

33b6206

Signed-off-by: Chenghao Zhang <[email protected]>

Revert the generator and remove the redundant code

920fa1e

Signed-off-by: Chenghao Zhang <[email protected]>

lucaslie approved these changes Sep 26, 2025


nvchenghaoz merged commit 4b50b3e into feat/ad_linear_attention Sep 26, 2025
2 of 3 checks passed
