
Fix precision issue for 32x256 kernel, attempt #3 (#2587)

Merged: JohnNikolay84 merged 2 commits into main from precision_issue on Apr 9, 2026

JohnNikolay84 (Contributor) commented Apr 2, 2026

Motivation

Fix a precision issue in the asm fmoe kernels.

Technical Details

The wait counters in cl_gemm0_withup were not being used correctly.

Test Plan

DeepSeek model accuracy on GSM8K is ~95%.
The local 2-stage fmoe unit test shows ~50% matching values, and the max diff should be ~40% lower than before.
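
The PR does not define "matching values" or "max diff". The following minimal Python sketch assumes that "matching" means elementwise agreement within a tolerance between a reference output and the asm kernel output. It also assumes that "max diff" means the largest absolute elementwise difference. The names `match_stats` and `max_diff_reduction` and the default `rtol`/`atol` values are hypothetical and are not taken from aiter's test suite.

```python
import math
from typing import Sequence, Tuple


def match_stats(
    ref: Sequence[float],
    out: Sequence[float],
    rtol: float = 1e-2,  # hypothetical relative tolerance
    atol: float = 1e-2,  # hypothetical absolute tolerance
) -> Tuple[float, float]:
    """Return (fraction of elements within tolerance, max absolute difference)."""
    if len(ref) != len(out):
        raise ValueError("ref and out must have the same length")
    if not ref:
        return 1.0, 0.0
    matches = 0
    max_diff = 0.0
    for r, o in zip(ref, out):
        # Track the worst-case absolute error across all elements.
        max_diff = max(max_diff, abs(r - o))
        # Count an element as "matching" if it is within rtol/atol of the reference.
        if math.isclose(r, o, rel_tol=rtol, abs_tol=atol):
            matches += 1
    return matches / len(ref), max_diff


def max_diff_reduction(before: float, after: float) -> float:
    """Relative reduction of the max diff; 0.4 means the new max diff is 40% lower."""
    return (before - after) / before
```

For example, `match_stats([1.0, 2.0, 3.0, 4.0], [1.0, 2.5, 3.001, 5.0])` returns `(0.5, 1.0)`. A max diff that drops from 1.0 to 0.6 is a 40% reduction.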

Test Result

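DS (DeepSeek):

| Tasks | Version | Filter           | n-shot | Metric      | Value  | Stderr   |
|-------|--------:|------------------|-------:|-------------|-------:|---------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | 0.9591 | ± 0.0055 |
|       |         | strict-match     |      5 | exact_match | 0.9568 | ± 0.0056 |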
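
The Stderr column is consistent with the standard error of a mean of 0/1 scores, sqrt(p(1 - p) / (n - 1)). The sketch below rests on two assumptions, because the PR does not state the sample count. First, it assumes the table comes from an lm-evaluation-harness style run. Second, it assumes the run covered the full 1,319-problem GSM8K test split. `exact_match_stderr` is a hypothetical helper name.

```python
import math

# Assumption: the full GSM8K test split (1,319 problems) was evaluated.
GSM8K_TEST_SIZE = 1319


def exact_match_stderr(accuracy: float, n: int = GSM8K_TEST_SIZE) -> float:
    """Standard error of the mean of n 0/1 scores, using the (n - 1) sample variance."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# Recompute the Stderr column from the reported accuracies.
for filt, acc in [("flexible-extract", 0.9591), ("strict-match", 0.9568)]:
    print(f"{filt}: {acc:.4f} ± {exact_match_stderr(acc):.4f}")
```

Under these assumptions the script reproduces the reported values: 0.0055 for flexible-extract and 0.0056 for strict-match.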

Submission Checklist

JohnNikolay84 requested a review from a team on April 2, 2026 00:53
JohnNikolay84 self-assigned this on Apr 2, 2026
github-actions bot (Contributor) commented Apr 2, 2026

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

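| Label         | Tests                                          |
|---------------|------------------------------------------------|
| ci:triton-355 | Run Triton tests on MI355 in addition to MI325 |
| ci:sglang     | SGLang integration tests                       |
| ci:atom       | ATOM benchmark (DeepSeek-R1 + GPT-OSS)         |
| ci:vllm       | vLLM benchmark                                 |
| ci:all        | All of the above                               |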

Add labels via the sidebar or with gh pr edit 2587 --add-label <label>.

JohnNikolay84 requested a review from valarLip on April 2, 2026 09:13
nholmber added a commit to nholmber/aiter that referenced this pull request Apr 5, 2026
Adds tuned_fmoe.csv (1857 lines, all TP1/TP2/TP4 shapes) and
untuned_fmoe.csv from reference image. Updated 76 kernel symbol
entries to match renamed symbols from PR ROCm#2587 precision fix.
nholmber added a commit to nholmber/aiter that referenced this pull request Apr 5, 2026
PR ROCm#2587 renamed kernel symbols in the silu blockscale CSV:
- fmoe_bf16_blockscaleFp8_g1u1_silu_64x256 -> vs_ps_silu_64x256
- fmoe_bf16_blockscaleFp8_g1u1_silu_64x128 -> vs_ps_silu_64x128
Updated 38 references in tuned_fmoe.csv to match.
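
The following minimal Python sketch shows the kind of symbol rename described in the commit message, assuming the CSV stores kernel names as plain tokens. `rename_symbols` and `RENAMES` are hypothetical names, and this is not the script used in the commit. The old and new names are copied verbatim from the commit message, so the real symbols may carry extra suffixes.

```python
import re
from typing import Dict, Tuple

# Old -> new kernel symbol names, copied from the commit message.
RENAMES: Dict[str, str] = {
    "fmoe_bf16_blockscaleFp8_g1u1_silu_64x256": "vs_ps_silu_64x256",
    "fmoe_bf16_blockscaleFp8_g1u1_silu_64x128": "vs_ps_silu_64x128",
}


def rename_symbols(csv_text: str, renames: Dict[str, str] = RENAMES) -> Tuple[str, int]:
    """Replace whole-token kernel names; return the new text and the replacement count."""
    total = 0
    for old, new in renames.items():
        # \b on both sides stops "..._64x256" from matching inside a longer
        # token such as "..._64x2560" or "..._64x256_suffix".
        csv_text, n = re.subn(rf"\b{re.escape(old)}\b", new, csv_text)
        total += n
    return csv_text, total
```

Word-boundary matching avoids rewriting a longer symbol that merely starts with an old name. The returned count can be checked against the number of references expected to change, which is 38 in the commit above.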
JohnNikolay84 (Contributor, Author) commented:

@valarLip do you have any objections to this commit? Should I run more tests?

JohnNikolay84 merged commit 63c224d into main on Apr 9, 2026
24 checks passed
JohnNikolay84 deleted the precision_issue branch on April 9, 2026 23:41
sunway513 pushed a commit that referenced this pull request Apr 21, 2026
* Fix precision issue for 32x256, 64x128, 64x256 kernels silu and gelu variants
---------

Co-authored-by: Sergey Solo <ssolovye@amd.com>
ClementLinCF pushed a commit that referenced this pull request Apr 25, 2026
* Fix precision issue for 32x256, 64x128, 64x256 kernels silu and gelu variants
---------

Co-authored-by: Sergey Solo <ssolovye@amd.com>
Liang-jianhao97 pushed a commit that referenced this pull request Apr 30, 2026
* Fix precision issue for 32x256, 64x128, 64x256 kernels silu and gelu variants
---------

Co-authored-by: Sergey Solo <ssolovye@amd.com>