Open
44 commits
323392a
Added moe gluon gfx1250 implementation and fixed bench script
farlukas Mar 27, 2026
c39f8f5
Added kernel metadata for benchmarks to work
farlukas Mar 27, 2026
559dd61
Applied black formatter
farlukas Mar 27, 2026
0f62a75
Changed scale swizzling condition in test for gfx1250
farlukas Mar 27, 2026
aa19fbd
Added config parameter for easier testing
farlukas Mar 30, 2026
9a0c2c7
Removed unneeded quant scales
farlukas Apr 6, 2026
152a60e
Removed old xcd swizzling to be replaced with gfx1250
farlukas Apr 6, 2026
f152d99
Disabled split k functionality for now
farlukas Apr 6, 2026
d07a0ee
x is always microscaled
farlukas Apr 6, 2026
36e01ed
Changed global load for gather indices to use buffer load
farlukas Apr 6, 2026
7c8cb08
Use unified moe activations
farlukas Apr 6, 2026
74e0322
Use unified grouped reduce
farlukas Apr 6, 2026
33d7a86
Removed backend argument from grouped reduce
farlukas Apr 6, 2026
745d139
Added residual parameters
farlukas Apr 6, 2026
1636c0c
Added residual argument to swiglu call
farlukas Apr 6, 2026
d7a74e7
Buffer load x scales instead of using TDM
farlukas Apr 6, 2026
057a448
Fixed x scales buffer load
farlukas Apr 7, 2026
78c970e
Modified w scales layout when swizzled
farlukas Apr 7, 2026
cf1a569
enabled gather across multiple waves
farlukas Apr 10, 2026
ac75b39
removed old idx layout
farlukas Apr 10, 2026
755adf0
bypass lds for w scales
farlukas Apr 15, 2026
f0afbd6
Fixed buffer loading w scales
farlukas Apr 15, 2026
816a2b6
Updated w scales blocked layout
farlukas Apr 16, 2026
6274219
changed w scales layout
farlukas Apr 16, 2026
af6250f
moved unshuffled logic to offset calculation to eliminate convert layout
farlukas Apr 27, 2026
4569c15
refactored kernel and applied LDS pipeline
farlukas Apr 27, 2026
2ab9759
Added upcast indices logic
farlukas Apr 28, 2026
8c09bfb
Removed unshuffle scale function as unshuffling is now done in the of…
farlukas Apr 28, 2026
7d5e7b1
Removed reduce grouped function
farlukas Apr 28, 2026
ddace95
Added comments to explain constants
farlukas Apr 28, 2026
fca10be
x scales will never be None
farlukas Apr 28, 2026
55b002d
Increment scales pointer instead of using index
farlukas Apr 28, 2026
b34a192
Removed unused variable
farlukas Apr 28, 2026
38d70ab
Fixed pytest skip condition
farlukas Apr 28, 2026
494001d
Merge branch 'main' into farlukas/moe-a4w4-gfx1250
farlukas Apr 28, 2026
52c7929
Removed unneeded imports
farlukas Apr 29, 2026
28338e9
Formatted with black
farlukas Apr 29, 2026
d80c849
Added fix for swiglu when preshuffling scales
farlukas Apr 29, 2026
6c34ac3
removed quant_static_scale as a4w4 do not use it
farlukas Apr 29, 2026
b113179
Skipped gluon tests if not supported
farlukas Apr 29, 2026
462c272
Merge branch 'main' into farlukas/moe-a4w4-gfx1250
farlukas Apr 29, 2026
646e66e
Merge branch 'main' into farlukas/moe-a4w4-gfx1250
farlukas Apr 30, 2026
cb6a88e
Changed scales loading to TDM
farlukas May 8, 2026
af3537e
Added gfx1250 block configurations in routing function
farlukas May 8, 2026