Oracle improvements by robertgshaw2-redhat · Pull Request #32122 · vllm-project/vllm

robertgshaw2-redhat · 2026-01-11T14:33:31Z

Purpose

This PR does two interrelated things:

Oracle

Updates the oracle to query the mk (modular kernel) experts in a standard way for the features they support, enabling auto-selection.
Updates the mk experts to declare which features they support.
Creates a standard interface for constructing experts, which simplifies the factory function.
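
The capability-query idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `QuantRequest`, `ExpertsBase`, `BlockFp8Experts`, `FallbackExperts`, and `select_experts` are all hypothetical names, and the real `supports_*` methods in vLLM may have different signatures.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple, Type


@dataclass(frozen=True)
class QuantRequest:
    """Hypothetical description of what a layer needs from a kernel."""
    per_act_token: bool = False
    block_shape: Optional[Tuple[int, int]] = None


class ExpertsBase:
    """Hypothetical base class: each kernel reports its own capabilities."""

    @staticmethod
    def supports_current_device() -> bool:
        return True

    @staticmethod
    def supports_quant(req: QuantRequest) -> bool:
        return False


class BlockFp8Experts(ExpertsBase):
    """Illustrative kernel that only handles 128x128 block quantization."""

    @staticmethod
    def supports_quant(req: QuantRequest) -> bool:
        return not req.per_act_token and req.block_shape == (128, 128)


class FallbackExperts(ExpertsBase):
    """Illustrative catch-all kernel that accepts every request."""

    @staticmethod
    def supports_quant(req: QuantRequest) -> bool:
        return True


def select_experts(
    candidates: Sequence[Type[ExpertsBase]], req: QuantRequest
) -> Type[ExpertsBase]:
    """Return the first candidate, in priority order, that supports the request."""
    for cls in candidates:
        if cls.supports_current_device() and cls.supports_quant(req):
            return cls
    raise ValueError("no experts implementation supports this configuration")


priority = [BlockFp8Experts, FallbackExperts]
chosen_block = select_experts(priority, QuantRequest(block_shape=(128, 128)))
chosen_token = select_experts(priority, QuantRequest(per_act_token=True))
print(chosen_block.__name__, chosen_token.__name__)
```

Because every kernel answers the same questions, the selection function needs no per-kernel branches; adding a kernel means adding a class to the priority list.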

Unifies DP/EP and TP cases

Adds support for running naive and ag/rs dispatch/combine via modular kernels.
Allows the QuantMethod to "own" the kernel, so FusedMoEModularMethod is no longer needed.

Follow Ups

Apply to the other quant methods (this PR only covers fp8 and nvfp4).
Remove do_naive_dispatch_combine from fused moe (once everything above is migrated).

TODOs

Figure out flashinfer all2all
Make sure models with shared expert are working properly (including SBO)
Routing Tables
Kernels --- Batched Triton
Kernels --- vLLM CUTLASS FP4
Kernels --- Batched vLLM CUTLASS
Split into 2 PRs?

Test Plan

Test Result



mergify · 2026-01-11T14:34:13Z

Documentation preview: https://vllm--32122.org.readthedocs.build/en/32122/

gemini-code-assist

Code Review

This pull request introduces a valuable refactoring by adding a set of supports_* methods to the MoE expert kernels. This acts as an 'oracle' for kernel capabilities, which will allow for a more robust and modular kernel selection mechanism. The changes are well-structured and implement the new interface across several kernel files. I've found one critical issue in the implementation that needs to be addressed.

gemini-code-assist · 2026-01-11T14:34:55Z

vllm/model_executor/layers/fused_moe/flashinfer_cutlass_moe.py

+            elif quant_config.is_per_act_token:
+                return False
+            elif quant_config.is_block_quantized:
+                if (current_platform.is_cuda and current_platform.is_device_capability(9,0) and quant_config.block_shape[0] == 128 and quant_config.block_shape[1] == 128):


current_platform.is_cuda is a method and must be called as current_platform.is_cuda(). Without the parentheses, the expression refers to the method object itself, which is always truthy, so the condition never actually checks the platform and could lead to incorrect behavior.

Additionally, this check is redundant because supports_current_device is expected to be called before this method, and it already performs the is_cuda() check. Removing the redundant check simplifies the code.

Suggested change

-                if (current_platform.is_cuda and current_platform.is_device_capability(9,0) and quant_config.block_shape[0] == 128 and quant_config.block_shape[1] == 128):
+                if current_platform.is_device_capability(9, 0) and quant_config.block_shape[0] == 128 and quant_config.block_shape[1] == 128:
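
The truthiness problem described in this review comment is easy to reproduce in isolation. The sketch below uses a stand-in `FakePlatform` class (hypothetical, not vLLM's `current_platform`) whose `is_cuda()` returns `False`, so the difference between referencing the method and calling it is visible.

```python
class FakePlatform:
    """Stand-in for a platform object; not vLLM's real class."""

    def is_cuda(self) -> bool:
        # Pretend we are running on a non-CUDA device.
        return False


platform = FakePlatform()

# Referencing the method without calling it yields a bound method object,
# which is truthy regardless of what the method would return.
not_called = bool(platform.is_cuda)

# Calling the method yields the actual answer.
called = platform.is_cuda()

print(not_called, called)  # prints "True False"
```

A condition written as `platform.is_cuda and ...` therefore behaves as if the platform check were not there at all.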


mergify · 2026-01-15T05:17:34Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @robertgshaw2-redhat.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


robertgshaw2-redhat · 2026-01-19T03:28:51Z

replaced by:

Robert Shaw and others added 17 commits January 7, 2026 19:01

stash

f8851e0

Signed-off-by: Robert Shaw <robshaw@redhat.com>

stash

085adf7

Signed-off-by: Robert Shaw <robshaw@redhat.com>

update interface

a6b039d

Signed-off-by: Robert Shaw <robshaw@redhat.com>

stash

f8052ce

Signed-off-by: Robert Shaw <robshaw@redhat.com>

stash

13b619f

Signed-off-by: Robert Shaw <robshaw@redhat.com>

first correctness!

04bb010

Signed-off-by: Robert Shaw <robshaw@redhat.com>

updated

b1320de

Signed-off-by: Robert Shaw <robshaw@redhat.com>

comments

4d47206

Signed-off-by: Robert Shaw <robshaw@redhat.com>

updated

f86fad8

Signed-off-by: Robert Shaw <robshaw@redhat.com>

Merge branch 'main' into naive-dispatch-combine

5601b95

Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

updateds

8c1a530

Signed-off-by: Robert Shaw <robshaw@redhat.com>

nit changes

7d7d5a6

Signed-off-by: Robert Shaw <robshaw@redhat.com>

support apply router weight on input

63357f7

Signed-off-by: Robert Shaw <robshaw@redhat.com>

attempt to get everything working for llama scout modelopt flashinfer cutlass

3886cfb

Signed-off-by: Robert Shaw <robshaw@redhat.com>

updated

2284b59

Signed-off-by: Robert Shaw <robshaw@redhat.com>

apply to batched deep gemm

e131054

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

updated

77c7b05

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

mergify bot added the documentation and nvidia labels Jan 11, 2026

github-project-automation bot added this to NVIDIA Jan 11, 2026

gemini-code-assist bot reviewed Jan 11, 2026


Robert Shaw added 5 commits January 11, 2026 09:36

stash

477d699

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

remove NaiveBatchedExperts

9f2e10b

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

stash

ef5e664

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

stash

f6e85bc

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

added back moe torch iterative

0db0b11

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

mergify bot added the gpt-oss label Jan 11, 2026

github-project-automation bot added this to gpt-oss Issues & Enhancements Jan 11, 2026

github-project-automation bot moved this to To Triage in gpt-oss Issues & Enhancements Jan 11, 2026

Robert Shaw added 6 commits January 14, 2026 17:27

reject usage for things that have not migrated over yet

8cacf5a

Signed-off-by: Robert Shaw <robshaw@redhat.com>

reject usage for things that have not migrated over yet

9518c97

Signed-off-by: Robert Shaw <robshaw@redhat.com>

reject usage for things that have not migrated over yet

ab090c1

Signed-off-by: Robert Shaw <robshaw@redhat.com>

reject usage for things that have not migrated over yet

1db50dc

Signed-off-by: Robert Shaw <robshaw@redhat.com>

differentiate static vs dynamic quantization

6090a06

Signed-off-by: Robert Shaw <robshaw@redhat.com>

remove newline

6a3a75b

Signed-off-by: Robert Shaw <robshaw@redhat.com>

mergify bot added the v1 label Jan 14, 2026

Robert Shaw added 14 commits January 14, 2026 17:56

fix static vs dynamic

89911ea

Signed-off-by: Robert Shaw <robshaw@redhat.com>

get things working again

ad8fe2e

Signed-off-by: Robert Shaw <robshaw@redhat.com>

updated

7bc3674

Signed-off-by: Robert Shaw <robshaw@redhat.com>

updatred

f8d3af7

Signed-off-by: Robert Shaw <robshaw@redhat.com>

attempt to get sp working

10b957c

Signed-off-by: Robert Shaw <robshaw@redhat.com>

nit

e8ab545

Signed-off-by: Robert Shaw <robshaw@redhat.com>

appears to be working properly

50a8c97

Signed-off-by: Robert Shaw <robshaw@redhat.com>

fix pre commit

ddc2eb1

Signed-off-by: Robert Shaw <robshaw@redhat.com>

updated

5f913ea

Signed-off-by: Robert Shaw <robshaw@redhat.com>

remove flashinfer constructors

e58e783

Signed-off-by: Robert Shaw <robshaw@redhat.com>

nits

a8de4da

Signed-off-by: Robert Shaw <robshaw@redhat.com>

remove do naive dispach combine comment

2c9e9e6

Signed-off-by: Robert Shaw <robshaw@redhat.com>

update backend names

9a907e0

Signed-off-by: Robert Shaw <robshaw@redhat.com>

update fallback experts

0670758

Signed-off-by: Robert Shaw <robshaw@redhat.com>

robertgshaw2-redhat added the ready label Jan 15, 2026

mergify bot added the needs-rebase label Jan 15, 2026

danisereb reviewed Jan 15, 2026

vllm/model_executor/layers/fused_moe/flashinfer_cutlass_moe.py (resolved)

robertgshaw2-redhat closed this Jan 19, 2026

github-project-automation bot moved this to Done in NVIDIA Jan 19, 2026

github-project-automation bot moved this from To Triage to Done in gpt-oss Issues & Enhancements Jan 19, 2026

robertgshaw2-redhat deleted the oracle-improvements branch January 19, 2026 03:29

robertgshaw2-redhat wants to merge 118 commits into vllm-project:main from robertgshaw2-redhat:oracle-improvements

3 participants
