Skip to content

Optimize SM120 NVFP4 GEMM kernel with small-M tile config#4

Draft
Copilot wants to merge 2 commits intomainfrom
copilot/optimize-vllm-sm120-nvfp4-gemm-kernel
Draft

Optimize SM120 NVFP4 GEMM kernel with small-M tile config#4
Copilot wants to merge 2 commits intomainfrom
copilot/optimize-vllm-sm120-nvfp4-gemm-kernel

Conversation

Copy link
Copy Markdown

Copilot AI commented Apr 15, 2026

  • Add swap_ab_ template parameter to Fp4GemmSm120 with transposed layout types
  • Conditionally swap A/B in CollectiveMainloop and C/D layouts in CollectiveEpilogue
  • Add sm120_fp4_config_swapab tile configuration (128×128×256)
  • Update args_from_options to handle swapAB (swap problem shape, strides, data/SF pointers)
  • Update runGemm to use GemmConfig template parameter
  • Update dispatch functions: M ≤ 64 → swapAB, M ≤ 256 → M256, M > 256 → default
  • Apply clang-format
  • Run validation (passed)

Reference: sglang PR vllm-project#21314
- New tile config sm120_fp4_config_small_m with MmaTileShape 128x128x256
  for small M values (M ≤ 32), doubling K tile for better throughput
- Updated dispatch: M≤32 → small_m, M≤256 → M256, M>256 → default
- ~20% speedup for decode-phase small-batch GEMM operations

Co-authored-by: GitHub Copilot

Agent-Logs-Url: https://github.com/Nekofish-L/vllm/sessions/66285e45-f69c-404b-975a-4afc5d3edb4e

Co-authored-by: Nekofish-L <29830327+Nekofish-L@users.noreply.github.com>
When M is small (≤64), swap A/B operands so the small M dimension
becomes the N dimension in the CUTLASS GEMM. This improves GPU
utilization during decode by providing better CTA scheduling and
memory access patterns. Follows the same pattern used in FP8 SM90,
SM100, and SM120 blockwise kernels.

Co-authored-by: GitHub Copilot

Agent-Logs-Url: https://github.com/Nekofish-L/vllm/sessions/86332631-5db7-485e-8d7f-3f51fce66977

Co-authored-by: Nekofish-L <29830327+Nekofish-L@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants