[Bugfix] [Config] Retune ptpc fmoe deepseek-r1 for MI308 (#1418)
Conversation
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
I fixed a problem with tuning the CK solutions of fmoe in #1405. You can retune the shapes with this procedure:
1. Clean `untuned_fmoe.csv` and add the shapes you want to tune.
2. Run `AITER_REBUILD=1 python3 hsa/gfx942/fmoe_2stages/tune.py --all` at the root of the repository. It will update the tuned shapes in `tuned_fmoe.csv`.
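Step 1 above can be sketched as a small script that resets the untuned CSV to exactly the shapes you want to tune. The column names and shape values here are assumptions for illustration; check the real header of `untuned_fmoe.csv` in the repository before using this.

```python
import csv
from pathlib import Path

# Hypothetical column names -- the real header of untuned_fmoe.csv may differ.
HEADER = ["token", "model_dim", "inter_dim", "expert", "topk", "dtype"]

# Illustrative deepseek-r1-like MoE shapes to retune (values are examples only).
shapes = [
    (1, 7168, 2048, 256, 8, "fp8"),
    (32, 7168, 2048, 256, 8, "fp8"),
]

def reset_untuned(path: str, rows) -> None:
    """Overwrite the untuned CSV so it contains only the shapes to tune."""
    with Path(path).open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)

reset_untuned("untuned_fmoe.csv", shapes)
```

After this, running the `tune.py` command from step 2 would pick up only these entries.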
OK, let me try and get back to you.
@yzhou103 I built from a clean build and I still encounter this problem. Did you also encounter issue #1417?
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
@yzhou103 Thank you for sharing the tuning results you obtained. They match my tuning results, so could we proceed with the review and get this merged to fix the accuracy issue? Thank you.
@yzhou103 do I need to do anything to unblock the merge? Thank you. |
Nothing; it seems CI has not been stable recently.
@yzhou103 all the tests passed now. |
* retune ptpc fmoe deepseek-r1
  Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
* fix formatting
  Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
* update tuning config
  Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>

---------

Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
Co-authored-by: yzhou103 <Ying.Zhou2@amd.com>
Motivation
This also addresses issue #1417.
Since commit a7f63e3 only adds preshuffle for mxfp4, and gfx942 does not support fp4, the `bpreshuffle` argument has been set to `False`. The fused MoE has been retuned because the original configuration has an accuracy issue: we were getting an lm_eval score of 0. Please let me know if the kernel usage or tuning procedure is incorrect, as the generated tuning file only has one kernel entry.
Technical Details
Retuning procedure that we have executed:
1. Clean `untuned_fmoe.csv` and `tuned_fmoe.csv`.
2. Add the following entries into `untuned_fmoe.csv`.
3. Run `AITER_REBUILD=1 python3 hsa/gfx942/fmoe_2stages/tune.py` at the root of the repository.

Test Plan
Run the E2E lm_eval test for ptpc deepseek-r1.
Test Result
Submission Checklist