Merged
25 commits
1ab66bd
Add GDN decode kernel and tests
HongliMi Jan 18, 2026
48690c4
Fix the issues raised by the robot
HongliMi Jan 19, 2026
9c73933
Fix compilation and other minor issues
HongliMi Jan 19, 2026
828b35e
mypy
yzh119 Jan 19, 2026
c71f5cf
Fix bw test and details
HongliMi Jan 20, 2026
2674c0d
fix error
HongliMi Jan 20, 2026
30e0ac0
Added support for correctness verification under the Blackwell archit…
Jan 21, 2026
24f04cb
mypy lint
yzh119 Jan 21, 2026
29595fb
correct skip conditions
yzh119 Jan 22, 2026
5cfbad0
Merge remote-tracking branch 'origin/main' into GDN_decode_kernel
yzh119 Jan 22, 2026
4c39083
Optimize GDN decode: remove redundant state.copy_ and cache aux tensors
yzh119 Jan 22, 2026
5d11d96
Merge origin/main into improve-gdn-decode
yzh119 Jan 22, 2026
02b9f38
merge benchmark
yzh119 Jan 22, 2026
7df911a
upd baseline
yzh119 Jan 23, 2026
667245b
upd
yzh119 Jan 27, 2026
a1855b7
upd
yzh119 Jan 27, 2026
3259a12
slight optimizations
yzh119 Jan 27, 2026
1bdff9d
upd
yzh119 Jan 27, 2026
6aa598a
upd
yzh119 Jan 27, 2026
c8d37ce
upd
yzh119 Jan 27, 2026
a481527
GDN MTP kernel: use contiguous access with autovec_copy
yzh119 Jan 28, 2026
038de94
GDN MTP kernel: simplify local_tile to direct multi-dimensional form
yzh119 Jan 28, 2026
e183286
GDN MTP kernel: use autovec_copy for q, k loading
yzh119 Jan 28, 2026
b365433
GDN pretranspose kernels: use contiguous access with autovec_copy
yzh119 Jan 28, 2026
ffa7b60
GDN MTP kernel: use full warp shuffle and tune tile_v
yzh119 Jan 28, 2026