Open
Changes from 5 commits
Commits
62 commits
323392a
Added moe gluon gfx1250 implementation and fixed bench script
farlukas Mar 27, 2026
c39f8f5
Added kernel metadata for benchmarks to work
farlukas Mar 27, 2026
559dd61
Applied black formatter
farlukas Mar 27, 2026
0f62a75
Changed scale swizzling condition in test for gfx1250
farlukas Mar 27, 2026
aa19fbd
Added config parameter for easier testing
farlukas Mar 30, 2026
9a0c2c7
Removed unneeded quant scales
farlukas Apr 6, 2026
152a60e
Removed old xcd swizzling to be replaced with gfx1250
farlukas Apr 6, 2026
f152d99
Disabled split k functionality for now
farlukas Apr 6, 2026
d07a0ee
x is always microscaled
farlukas Apr 6, 2026
36e01ed
Changed global load for gather indices to use buffer load
farlukas Apr 6, 2026
7c8cb08
Use unified moe activations
farlukas Apr 6, 2026
74e0322
Use unified grouped reduce
farlukas Apr 6, 2026
33d7a86
Removed backend argument from grouped reduce
farlukas Apr 6, 2026
745d139
Added residual parameters
farlukas Apr 6, 2026
1636c0c
Added residual argument to swiglu call
farlukas Apr 6, 2026
d7a74e7
Buffer load x scales instead of using TDM
farlukas Apr 6, 2026
057a448
Fixed x scales buffer load
farlukas Apr 7, 2026
78c970e
Modified w scales layout when swizzled
farlukas Apr 7, 2026
cf1a569
enabled gather across multiple waves
farlukas Apr 10, 2026
ac75b39
removed old idx layout
farlukas Apr 10, 2026
755adf0
bypass lds for w scales
farlukas Apr 15, 2026
f0afbd6
Fixed buffer loading w scales
farlukas Apr 15, 2026
816a2b6
Updated w scales blocked layout
farlukas Apr 16, 2026
6274219
changed w scales layout
farlukas Apr 16, 2026
af6250f
moved unshuffled logic to offset calculation to eliminate convert layout
farlukas Apr 27, 2026
4569c15
refactored kernel and applied LDS pipeline
farlukas Apr 27, 2026
2ab9759
Added upcast indices logic
farlukas Apr 28, 2026
8c09bfb
Removed unshuffle scale function as unshuffling is now done in the of…
farlukas Apr 28, 2026
7d5e7b1
Removed reduce grouped function
farlukas Apr 28, 2026
ddace95
Added comments to explain constants
farlukas Apr 28, 2026
fca10be
x scales will never be None
farlukas Apr 28, 2026
55b002d
Increment scales pointer instead of using index
farlukas Apr 28, 2026
b34a192
Removed unused variable
farlukas Apr 28, 2026
38d70ab
Fixed pytest skip condition
farlukas Apr 28, 2026
494001d
Merge branch 'main' into farlukas/moe-a4w4-gfx1250
farlukas Apr 28, 2026
52c7929
Removed unneeded imports
farlukas Apr 29, 2026
28338e9
Formatted with black
farlukas Apr 29, 2026
d80c849
Added fix for swiglu when preshuffling scales
farlukas Apr 29, 2026
6c34ac3
removed quant_static_scale as a4w4 does not use it
farlukas Apr 29, 2026
b113179
Skipped gluon tests if not supported
farlukas Apr 29, 2026
462c272
Merge branch 'main' into farlukas/moe-a4w4-gfx1250
farlukas Apr 29, 2026
646e66e
Merge branch 'main' into farlukas/moe-a4w4-gfx1250
farlukas Apr 30, 2026
cb6a88e
Changed scales loading to TDM
farlukas May 8, 2026
af3537e
Added gfx1250 block configurations in routing function
farlukas May 8, 2026
5614cdd
Implemented prefetching
farlukas May 13, 2026
69cdbd8
Changed preshuffled scales to do b128 loads
farlukas May 15, 2026
a176023
Implemented weight preshuffling
farlukas May 20, 2026
4305d0b
Arranged the tdm instructions to enable tdm fusion
farlukas May 21, 2026
e6b5e70
Fixed unutilized buffers
farlukas May 27, 2026
0af551e
Changed x scales to use async copy
farlukas May 28, 2026
372e968
Removed loop carried load percent flag
farlukas May 28, 2026
bd496a2
Refactored directory structure
farlukas May 28, 2026
8a8e3a0
Merge branch 'main' into farlukas/moe-a4w4-gfx1250
farlukas May 28, 2026
e8771d2
Fixed segmentation fault
farlukas May 29, 2026
9008d5e
Fixed scale preshuffling
farlukas Jun 1, 2026
6297b1e
Moved WMMA layouts inside
farlukas Jun 1, 2026
31843b4
Implemented TDM store
farlukas Jun 4, 2026
4bc5b23
optimized wmma layouts
farlukas Jun 5, 2026
79ae362
Formatted files
farlukas Jun 5, 2026
94249bd
Removed unused imports
farlukas Jun 5, 2026
b1bd4b4
Implemented multicast
farlukas Jun 15, 2026
bad75ec
Removed masking for x scales m dimension
farlukas Jun 15, 2026