Merged
33 commits
de09e4e
check cu count for gfx942
micmelesse Sep 30, 2025
94df7bd
create get_cu_count
micmelesse Sep 30, 2025
0149646
update repo root
micmelesse Oct 1, 2025
6e6b4a5
update forward tune
micmelesse Oct 1, 2025
e967c14
clean up load
micmelesse Oct 1, 2025
371bec5
use float8_e4m3fnuz
micmelesse Oct 1, 2025
1f2aaa0
save
micmelesse Oct 2, 2025
56eba61
show bwd mode
micmelesse Oct 2, 2025
0218cd2
recommend fp8
micmelesse Oct 4, 2025
baa6330
use torch.float32 for fp8 kernel
micmelesse Oct 4, 2025
f3ed846
add both best fp16 and fp8 config
micmelesse Oct 4, 2025
efa901b
tune fp8 backward
micmelesse Oct 7, 2025
0038a5c
descale factors should be b, hk
micmelesse Oct 7, 2025
449471f
fp8 bwd working on all primus configs
micmelesse Oct 8, 2025
88b37e9
tune bwd configs
micmelesse Oct 8, 2025
b7e3e48
fa v3 tests passing
micmelesse Oct 9, 2025
1f9510d
better warning
micmelesse Oct 9, 2025
d6dcef4
clean up bwd launcher
micmelesse Oct 9, 2025
948df01
v3 passing
micmelesse Oct 10, 2025
cc4cbf9
tune more
micmelesse Oct 10, 2025
f5f67c9
improve perf
micmelesse Oct 10, 2025
91abd99
clean up
micmelesse Oct 10, 2025
f4224dd
lint
micmelesse Oct 10, 2025
c478b8f
clean
micmelesse Oct 10, 2025
f25ff54
start tuning gfx950
micmelesse Oct 11, 2025
d03c5a0
tune non causal path
micmelesse Oct 11, 2025
73755aa
fix bug
micmelesse Oct 11, 2025
6e36a82
save
micmelesse Oct 11, 2025
3b14461
Skip configs where BLOCK_M2 % BLOCK_N2 != 0
micmelesse Oct 11, 2025
75aa291
skip more
micmelesse Oct 11, 2025
fb53a00
stop tuning
micmelesse Oct 12, 2025
0c17f2f
fix varlen bug
micmelesse Oct 12, 2025
1d67065
fix dropout & causal/swa segfault
micmelesse Oct 13, 2025