[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority by robertgshaw2-redhat · Pull Request #32414 · vllm-project/vllm

robertgshaw2-redhat · 2026-01-15T14:00:42Z

Purpose

- Update the oracles to have a notion of kernel priority.
- Update the kernels to have a standard interface for construction.
- Update the kernels to register the features they support.

This allows us to:

- Auto-select kernels across hardware and model architectures (enabling us to opt into kernels like TRTLLM).
- Verify at init time, rather than at runtime, that the selected kernel supports the deployment (enabling clearer error messages).
- Improve code reuse.
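The idea above can be illustrated with a minimal sketch. It is not the code from this PR: `Deployment`, `KernelSpec`, `select_kernel`, and the kernel names in `PRIORITY` are hypothetical, chosen only to show how a priority-ordered list, a uniform constructor, and registered feature support combine to give init-time validation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Deployment:
    """What the model and hardware require (hypothetical fields)."""
    device: str        # e.g. "cuda", "cpu", "tpu"
    quant_scheme: str  # e.g. "fp8", "nvfp4"
    activation: str    # e.g. "silu"


@dataclass(frozen=True)
class KernelSpec:
    """A kernel's registered features plus a uniform constructor."""
    name: str
    devices: frozenset
    quant_schemes: frozenset
    activations: frozenset
    build: Callable[[Deployment], object]

    def unsupported_reason(self, d: Deployment) -> Optional[str]:
        """Return None if the deployment is supported, else a short reason."""
        if d.device not in self.devices:
            return f"device {d.device!r} not supported"
        if d.quant_scheme not in self.quant_schemes:
            return f"quantization {d.quant_scheme!r} not supported"
        if d.activation not in self.activations:
            return f"activation {d.activation!r} not supported"
        return None


def select_kernel(priority: List[KernelSpec], d: Deployment) -> KernelSpec:
    """Pick the first kernel in priority order that supports the deployment.

    Raises at init time, listing why every candidate was rejected.
    """
    reasons = []
    for spec in priority:
        reason = spec.unsupported_reason(d)
        if reason is None:
            return spec
        reasons.append(f"{spec.name}: {reason}")
    raise ValueError(
        "No MoE kernel supports this deployment:\n  " + "\n  ".join(reasons)
    )


# Hypothetical priority list: preferred kernel first, portable fallback last.
PRIORITY = [
    KernelSpec(
        name="fast_gpu_kernel",
        devices=frozenset({"cuda"}),
        quant_schemes=frozenset({"fp8", "nvfp4"}),
        activations=frozenset({"silu"}),
        build=lambda d: ("fast_gpu_kernel", d),
    ),
    KernelSpec(
        name="portable_fallback",
        devices=frozenset({"cuda", "cpu"}),
        quant_schemes=frozenset({"fp8"}),
        activations=frozenset({"silu", "gelu"}),
        build=lambda d: ("portable_fallback", d),
    ),
]

if __name__ == "__main__":
    deployment = Deployment(device="cuda", quant_scheme="fp8", activation="silu")
    spec = select_kernel(PRIORITY, deployment)
    kernel = spec.build(deployment)  # every kernel is constructed the same way
    print(spec.name)
```

Because each candidate reports why it was rejected, an unsupported deployment fails during initialization with a message naming every kernel that was tried, instead of failing later inside a forward pass.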

Test Plan

refactor tests in the CI/CD

Signed-off-by: Robert Shaw <robshaw@redhat.com>

fadara01 · 2026-01-22T13:05:19Z

This PR causes: #32840

I fixed the issue in: #32855

Could you please take a look?

vanbasten23 · 2026-01-23T04:09:12Z

This PR broke the TPU fp8 weight loading: https://gist.github.com/vanbasten23/03a9580d499fa3205ee30762e527b528. cc: @QiliangCui @kyuyeunk @robertgshaw2-redhat

vanbasten23 · 2026-01-23T04:36:27Z

Created a fix: #32908. @robertgshaw2-redhat could you help take a look?

### What this PR does / why we need it?

1. ✅ Upgrade vllm commit to: 0115 (8471b27df97c3eb79f891802fc0e858f8f7ac6a0)
   Modify import paths due to the refactors: vllm-project/vllm#32245, vllm-project/vllm#32060
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21034239336/job/60490156965?pr=5913
2. ✅ Upgrade vllm commit to: 0119 (9a1f16da1e423ede2c2f52a9850cbfbb39cefe96)
   Fix `WorkerProc.__init__() missing 1 required positional argument: 'is_driver_worker'` due to vllm-project/vllm#28506
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21156263050/job/60841668755?5569
3. ✅ Upgrade vllm commit to: 0120 (148117ea2e689cd43df4be6892671a17cdae5833)
   1. Add `skip_compiled` param in `set_forward_context` due to vllm-project/vllm#30385
   2. Modify `tests/ut/spec_decode/test_eagle_proposer.py` due to vllm-project/vllm#24322 change `self.max_num_tokens = vllm_config.scheduler_config.max_num_batched_tokens + max_batch_size`
   3. Modify UT import paths due to the refactors: vllm-project/vllm#32060
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21204851770/job/60999046946
4. ✅ Upgrade vllm commit to: 0121 (f23fb5a7c1b61350c5c40ca1115d3bf8cf2b8cc9)
   1. vLLM switched `uses_mrope` from target to draft model config, making `positions`/`mrope_positions` mutually exclusive, breaking vllm-ascend's direct self.positions access and tests missing `draft_model_config.uses_mrope`. vllm-project/vllm#32048
   2. Moved bs_to_padded_graph_size from CompilationConfig to CudagraphDispatcher due to the refactor vllm-project/vllm#30143
   3. Remove unused `maybe_setup_kv_connector` due to vllm-project/vllm#32077
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21217728738/job/61043738834
6. ✅ Upgrade vllm commit to: 0122 (8ebf271bb6d1e7e9b1a55be73d755ef1a57dbbe5)
   Updating FusedMoEParallelConfig (added enable_eplb) and FusedMoEConfig due to vllm-project/vllm#32414
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21249922546/job/61148613054
8. ✅ Upgrade vllm commit to: 0123 (dc917cceb877dfd13f98c538c4c96158047d98bd)
   Setting temperature=0.0 due to the removal of the default temperature value in vllm-project/vllm#32723
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21280796875

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.14.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Co-authored-by: wjunLu <wjunlu217@gmail.com>

Robert Shaw and others added 30 commits January 7, 2026 20:20

- 13b619f stash (Signed-off-by: Robert Shaw <robshaw@redhat.com>)
- 04bb010 first correctness! (Signed-off-by: Robert Shaw <robshaw@redhat.com>)
- b1320de updated (Signed-off-by: Robert Shaw <robshaw@redhat.com>)
- 4d47206 comments (Signed-off-by: Robert Shaw <robshaw@redhat.com>)
- f86fad8 updated (Signed-off-by: Robert Shaw <robshaw@redhat.com>)
- 5601b95 Merge branch 'main' into naive-dispatch-combine (Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>)
- 8c1a530 updateds (Signed-off-by: Robert Shaw <robshaw@redhat.com>)
- 7d7d5a6 nit changes (Signed-off-by: Robert Shaw <robshaw@redhat.com>)
- 63357f7 support apply router weight on input (Signed-off-by: Robert Shaw <robshaw@redhat.com>)
- 3886cfb attempt to get everything working for llama scout modelopt flashinfer cutlass (Signed-off-by: Robert Shaw <robshaw@redhat.com>)
- 2284b59 updated (Signed-off-by: Robert Shaw <robshaw@redhat.com>)
- e131054 apply to batched deep gemm (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- 77c7b05 updated (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- 477d699 stash (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- 9f2e10b remove NaiveBatchedExperts (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- ef5e664 stash (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- f6e85bc stash (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- 0db0b11 added back moe torch iterative (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- 09dc4f5 revert changes (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- 3311e9e re add (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- 755a3a2 add back iterative (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- a8bb9d0 add methodology to kernels (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- eb571a2 restructure kernel selection logic (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- 8f0a969 remove is_cuda (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- e461b6f added renamed file (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- 93bd28b improve quant scheme (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- 2b24d70 improve validation (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- 312b767 improve platform selection logic (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- 0f37e95 nit newline (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)
- c187335 nit newline (Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>)

mgoin deleted the oracle-part-a branch January 21, 2026 13:22

github-project-automation bot moved this from Ready to Done in NVIDIA Jan 21, 2026

github-project-automation bot moved this from Ready to Done in gpt-oss Issues & Enhancements Jan 21, 2026

kyuyeunk mentioned this pull request Jan 22, 2026

[Bugfix] Fix fused moe vllm-project/tpu-inference#1512

Merged

fadara01 mentioned this pull request Jan 22, 2026

[Bug]: [CPU Backend] Engine crashed due to error on flashinfer op registration #32840

Closed

vanbasten23 mentioned this pull request Jan 23, 2026

[Bugfix][TPU] Return a Default fp8 MoE Backend #32908

Merged

monajafi-amd pushed a commit to monajafi-amd/vllm that referenced this pull request Jan 23, 2026

[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (vllm-project#32414)

b31b208

Signed-off-by: mohammad najafi <mohammad.najafi@amd.com>

cwazai pushed a commit to cwazai/vllm that referenced this pull request Jan 25, 2026

[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (vllm-project#32414)

7899196

Signed-off-by: 陈建华 <1647430658@qq.com>

Meihan-chen mentioned this pull request Jan 26, 2026

[Main2Main] Upgrade vllm commit to 0123 vllm-project/vllm-ascend#6169

Merged

yewentao256 mentioned this pull request Jan 26, 2026

[Feature] Enable flashinfer moe fp4 by default #32622

Closed

lapy pushed a commit to lapy/vllm that referenced this pull request Jan 27, 2026

[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (vllm-project#32414)

21fe6fe

renehonig mentioned this pull request Jan 30, 2026

fix: Add SM120 (RTX Blackwell) support for FlashInfer CUTLASS NVFP4 MoE kernels #33417

Merged

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

[MoE Refactor] Oracle Select FP8+NVFP4 Kernels In Priority (vllm-project#32414)

057334c

Duyi-Wang mentioned this pull request Mar 18, 2026

[Bugfix][ROCm] Fix MoRI + AITER FP8 dispatch compatibility for defer_input_quant #37418

Merged

mgoin merged 254 commits into main from oracle-part-a
