Use maximum number of batched tokens to autotune MoE #28106
nvjullin wants to merge 2 commits into vllm-project:main
Conversation
Code Review
This pull request updates the Mixture-of-Experts (MoE) autotuning logic to use the maximum number of batched tokens from the scheduler configuration, which is a more appropriate parameter for this purpose than the CUDA graph capture size. The changes are logical, but the refactoring is incomplete: a removed attribute is still accessed elsewhere in the code, which is a critical issue.
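The idea behind the change can be illustrated with a small sketch. All names below are hypothetical and not taken from the PR's diff; it only shows the general shape of deriving autotune token-count buckets from the scheduler's `max_num_batched_tokens` cap instead of from a CUDA graph capture size:

```python
# Illustrative sketch, not the PR's actual code: build the list of
# token-count buckets to autotune the MoE kernels over, capped by the
# scheduler's max_num_batched_tokens (a hypothetical standalone helper).

def autotune_token_buckets(max_num_batched_tokens: int) -> list[int]:
    """Power-of-two token counts up to and including the scheduler cap."""
    buckets = []
    n = 1
    while n < max_num_batched_tokens:
        buckets.append(n)
        n *= 2
    # Always tune at the cap itself, even if it is not a power of two.
    buckets.append(max_num_batched_tokens)
    return buckets

print(autotune_token_buckets(8192))
```

Using the scheduler cap guarantees the tuned shapes cover every batch the scheduler can actually produce, whereas the CUDA graph capture size is tied to a different concern (graph replay) and may not match the real upper bound.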
💡 Codex Review
Here are some automated review suggestions for this pull request.
Signed-off-by: Julien Lin <jullin@nvidia.com>
@nvjullin could you rebase so that we can keep driving this PR? Thanks!
Purpose
Follow up on #27904.
CC @varun-sundar-rabindranath.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.