Oracle improvements #32122

Closed
robertgshaw2-redhat wants to merge 118 commits into vllm-project:main from robertgshaw2-redhat:oracle-improvements

@robertgshaw2-redhat robertgshaw2-redhat commented Jan 11, 2026

Purpose

This PR does two inter-related things:

Oracle

  • updates the oracle to query the modular kernel (mk) experts for feature support in a standard way, which enables auto-selection
  • updates the mk experts to declare which features they support
  • creates a standard interface for constructing experts, which simplifies the factory function
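
The capability-query pattern described in the bullets above can be sketched roughly as follows. This is a minimal illustration only, not the PR's actual code: `QuantConfig`, `ExpertsBase`, `BlockFp8Experts`, `PerTensorExperts`, and `select_experts` are hypothetical names, and the real interface may differ in both method names and arguments.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple, Type


@dataclass(frozen=True)
class QuantConfig:
    # Hypothetical stand-in for the real quantization config.
    is_per_act_token: bool = False
    block_shape: Optional[Tuple[int, int]] = None


class ExpertsBase:
    """Hypothetical base class: each expert kernel declares what it supports."""

    @classmethod
    def supports_current_device(cls) -> bool:
        return True

    @classmethod
    def supports_quant_config(cls, quant_config: QuantConfig) -> bool:
        return False


class BlockFp8Experts(ExpertsBase):
    @classmethod
    def supports_quant_config(cls, quant_config: QuantConfig) -> bool:
        # Only 128x128 block quantization, no per-token activation scales.
        return (not quant_config.is_per_act_token
                and quant_config.block_shape == (128, 128))


class PerTensorExperts(ExpertsBase):
    @classmethod
    def supports_quant_config(cls, quant_config: QuantConfig) -> bool:
        # Only per-tensor quantization (no block shape, no per-token scales).
        return (not quant_config.is_per_act_token
                and quant_config.block_shape is None)


def select_experts(candidates: Sequence[Type[ExpertsBase]],
                   quant_config: QuantConfig) -> Type[ExpertsBase]:
    """Return the first candidate whose capability checks all pass."""
    for experts_cls in candidates:
        if (experts_cls.supports_current_device()
                and experts_cls.supports_quant_config(quant_config)):
            return experts_cls
    raise ValueError(f"no expert kernel supports {quant_config}")
```

Because every candidate answers the same questions, the selection function needs no per-kernel branching; adding a kernel means adding a class to the candidate list.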

Unifies the DP/EP and TP cases

  • adds support for running naive and ag/rs (all-gather/reduce-scatter) dispatch/combine via modular kernels
  • allows the QuantMethod to "own" the kernel, so FusedMoEModularMethod is no longer needed

Follow Ups

  • apply to the other quant methods (this PR only covers fp8 and nvfp4)
  • remove do_naive_dispatch_combine from fused moe (once everything above is migrated)

TODOs

  • Figure out flashinfer all2all
  • Make sure models with shared expert are working properly (including SBO)
  • Routing Tables
  • Kernels: Batched Triton
  • Kernels: vLLM CUTLASS FP4
  • Kernels: Batched vLLM CUTLASS
  • Split into 2 PRs?

Test Plan

Test Result


Robert Shaw and others added 17 commits January 7, 2026 19:01
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
… cutlass

Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

mergify bot commented Jan 11, 2026

Documentation preview: https://vllm--32122.org.readthedocs.build/en/32122/

@mergify mergify bot added the documentation and nvidia labels Jan 11, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable refactoring by adding a set of supports_* methods to the MoE expert kernels. This acts as an 'oracle' for kernel capabilities, which will allow for a more robust and modular kernel selection mechanism. The changes are well-structured and implement the new interface across several kernel files. I've found one critical issue in the implementation that needs to be addressed.

    elif quant_config.is_per_act_token:
        return False
    elif quant_config.is_block_quantized:
        if (current_platform.is_cuda and current_platform.is_device_capability(9,0) and quant_config.block_shape[0] == 128 and quant_config.block_shape[1] == 128):

critical

current_platform.is_cuda is a method and must be called as current_platform.is_cuda(). Without the call, the condition tests the method object itself, which is always truthy, so the check can pass on a non-CUDA platform and lead to incorrect behavior.

Additionally, this check is redundant because supports_current_device is expected to be called before this method, and it already performs the is_cuda() check. Removing the redundant check simplifies the code.

Suggested change
- if (current_platform.is_cuda and current_platform.is_device_capability(9,0) and quant_config.block_shape[0] == 128 and quant_config.block_shape[1] == 128):
+ if current_platform.is_device_capability(9, 0) and quant_config.block_shape[0] == 128 and quant_config.block_shape[1] == 128:
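
The truthiness pitfall flagged in this review comment can be reproduced in isolation. The sketch below uses a hypothetical `Platform` class as a stand-in; it is not vLLM's `current_platform` object and only illustrates the general Python behavior.

```python
class Platform:
    """Hypothetical stand-in for a platform object; not vLLM's class."""

    def is_cuda(self) -> bool:
        return False  # pretend we are on a non-CUDA device


platform = Platform()

# Missing parentheses: this tests the bound method object, which is truthy.
unchecked = bool(platform.is_cuda)

# With the call, the method's actual return value is tested.
checked = bool(platform.is_cuda())

print(unchecked, checked)  # prints: True False
```

A linter or type checker that flags conditions on function objects can catch this class of mistake before review.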

Robert Shaw added 5 commits January 11, 2026 09:36
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Robert Shaw added 6 commits January 14, 2026 17:27
Signed-off-by: Robert Shaw <robshaw@redhat.com>
@mergify mergify bot added the v1 label Jan 14, 2026
Robert Shaw added 14 commits January 14, 2026 17:56
Signed-off-by: Robert Shaw <robshaw@redhat.com>
nit
@robertgshaw2-redhat robertgshaw2-redhat added the ready label Jan 15, 2026

mergify bot commented Jan 15, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @robertgshaw2-redhat.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 15, 2026

@github-project-automation github-project-automation bot moved this to Done in NVIDIA Jan 19, 2026
@github-project-automation github-project-automation bot moved this from To Triage to Done in gpt-oss Issues & Enhancements Jan 19, 2026
@robertgshaw2-redhat robertgshaw2-redhat deleted the oracle-improvements branch January 19, 2026 03:29

Labels

  • documentation: Improvements or additions to documentation
  • gpt-oss: Related to GPT-OSS models
  • llama: Related to Llama models
  • needs-rebase
  • nvidia
  • ready: ONLY add when PR is ready to merge/full CI is needed
  • rocm: Related to AMD ROCm
  • v1

Projects

Status: Done (NVIDIA)
Status: Done (gpt-oss Issues & Enhancements)

3 participants