Added qwen3 vision language moe support for speculative decoding by shanjiaz · Pull Request #32048 · vllm-project/vllm

shanjiaz · 2026-01-09T20:41:03Z

Purpose

Add support for speculator (draft) models for Qwen3 vision-language MoE targets such as Qwen/Qwen3-VL-30B-A3B-Instruct.

Test Plan

Test command:

python examples/offline_inference/spec_decode.py   --method "eagle3"   --model-dir "Qwen/Qwen3-VL-30B-A3B-Instruct"   --eagle-dir "shanjiaz/qwen3-vl-5k-epoch9"   --dataset_name "hf" --dataset_path "philschmid/mt-bench" --num-spec-tokens 3
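For readers who prefer the offline Python API, the test command corresponds roughly to the following sketch. It assumes vLLM's LLM(..., speculative_config=...) entry point with the "method", "model", and "num_speculative_tokens" keys. The prompt and max_tokens value are placeholders, and unlike spec_decode.py the sketch does not load the MT-Bench dataset or print acceptance metrics.

```python
"""Rough Python-API equivalent of the spec_decode.py test command (sketch only)."""

TARGET_MODEL = "Qwen/Qwen3-VL-30B-A3B-Instruct"
DRAFT_MODEL = "shanjiaz/qwen3-vl-5k-epoch9"


def build_speculative_config(num_spec_tokens: int = 3) -> dict:
    # Mirrors the --method, --eagle-dir and --num-spec-tokens flags of the command.
    return {
        "method": "eagle3",
        "model": DRAFT_MODEL,
        "num_speculative_tokens": num_spec_tokens,
    }


def main() -> None:
    # Imported lazily so build_speculative_config() works without vLLM or a GPU.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model=TARGET_MODEL,
        speculative_config=build_speculative_config(),
    )
    # Greedy decoding; the prompt is a placeholder, not taken from MT-Bench.
    params = SamplingParams(temperature=0.0, max_tokens=256)
    outputs = llm.generate(["Explain speculative decoding in one paragraph."], params)
    print(outputs[0].outputs[0].text)
```

Call main() on a machine with enough GPU memory to hold the 30B target model plus the draft.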

Test Result

The model produces reasonable results. (It is a test model trained on only 5k examples, so its acceptance rate is low.)

total_num_output_tokens: 213421
num_drafts: 154015
num_draft_tokens: 462045
num_accepted_tokens: 57143
mean acceptance length: 1.37
--------------------------------------------------
acceptance at token 0: 0.29
acceptance at token 1: 0.07
acceptance at token 2: 0.01
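The reported numbers are internally consistent. Assuming mean acceptance length is defined as 1 + num_accepted_tokens / num_drafts (each draft step always yields one token from the target model), and that each per-position rate is that position's accepted count divided by num_drafts, the following sketch reproduces the arithmetic:

```python
def mean_acceptance_length(num_accepted_tokens: int, num_drafts: int) -> float:
    # One token per draft step always comes from the target model,
    # plus however many drafted tokens were accepted on average.
    return 1 + num_accepted_tokens / num_drafts if num_drafts > 0 else 1.0


num_drafts = 154015
num_draft_tokens = 462045
num_accepted_tokens = 57143

mal = mean_acceptance_length(num_accepted_tokens, num_drafts)
# Fraction of all drafted tokens that the target model accepted.
draft_token_acceptance_rate = num_accepted_tokens / num_draft_tokens

print(f"mean acceptance length: {mal:.2f}")
print(f"draft-token acceptance rate: {draft_token_acceptance_rate:.3f}")
print(f"draft tokens per draft: {num_draft_tokens // num_drafts}")
```

With 3 speculative tokens, num_draft_tokens is exactly 3 × num_drafts, and the per-position rates (0.29 + 0.07 + 0.01 = 0.37) add up to the mean acceptance length minus 1. Only about 12% of all drafted tokens are accepted, which matches the caveat that this is a lightly trained test model.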

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

gemini-code-assist

Code Review

This pull request adds support for Qwen3 Vision-Language MoE models in speculative decoding. The changes correctly handle M-RoPE position adjustments for text-only draft models and add the necessary model-specific configurations. However, I found a critical bug in qwen3_vl_moe.py that would cause a TypeError at runtime due to an unhandled None value for the residual connection in the first layer. I've provided a code suggestion to fix this issue.
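The residual issue is easy to reproduce in isolation. Many vLLM decoder layers receive residual=None in the first layer and let the input layernorm start the residual stream; adding a tensor to None instead raises the TypeError described. The following pure-Python sketch shows that convention using hypothetical helpers (rms_norm, norm_with_residual). It is not the actual suggestion for qwen3_vl_moe.py, which is not reproduced on this page.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def rms_norm(x: Vector, eps: float = 1e-6) -> Vector:
    # Plain RMSNorm without a learned weight, for illustration only.
    scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * scale for v in x]


def norm_with_residual(
    hidden: Vector, residual: Optional[Vector]
) -> Tuple[Vector, Vector]:
    """Return (normed_input, new_residual) for one decoder layer."""
    if residual is None:
        # First layer: no residual stream exists yet, so the input itself
        # becomes the residual. Computing hidden + residual here would
        # raise TypeError because residual is None.
        return rms_norm(hidden), list(hidden)
    # Later layers: add the residual, then normalize the sum.
    summed = [h + r for h, r in zip(hidden, residual)]
    return rms_norm(summed), summed
```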

vllm/model_executor/models/qwen3_vl_moe.py

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

dsikka · 2026-01-12T22:03:22Z

vllm/v1/spec_decode/eagle.py

-        self.uses_mrope = self.vllm_config.model_config.uses_mrope
+        # Use draft model's M-RoPE setting, not target model's
+        # Draft models may be text-only even if target is multimodal
+        self.uses_mrope = self.draft_model_config.uses_mrope


This should be fine to use, since it should support both multimodal and text-only draft models, correct?
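The practical effect of reading uses_mrope from the draft config is on which position buffer the proposer fills. The vllm-ascend upgrade notes on this page describe positions and mrope_positions as mutually exclusive after this change. A minimal sketch of that idea, using hypothetical ModelConfigStub and PositionBuffers classes rather than vLLM's real types:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelConfigStub:
    """Hypothetical stand-in for a model config; only uses_mrope matters here."""

    uses_mrope: bool


class PositionBuffers:
    """Hypothetical sketch of a proposer's position buffers.

    Exactly one of `positions` (1D) or `mrope_positions` (3 x N) is allocated,
    and the choice follows the draft model's config, not the target's.
    """

    def __init__(self, draft_config: ModelConfigStub, max_num_tokens: int):
        self.uses_mrope = draft_config.uses_mrope
        self.positions: Optional[List[int]] = None
        self.mrope_positions: Optional[List[List[int]]] = None
        if self.uses_mrope:
            # One row per M-RoPE section (temporal, height, width).
            self.mrope_positions = [[0] * max_num_tokens for _ in range(3)]
        else:
            self.positions = [0] * max_num_tokens
```

With a multimodal (M-RoPE) target and a text-only Eagle3 draft, PositionBuffers(ModelConfigStub(uses_mrope=False), n) allocates only the 1D buffer. Conversely, code that reads positions unconditionally could break when the draft itself uses M-RoPE.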

dsikka · 2026-01-12T22:04:36Z

vllm/v1/spec_decode/eagle.py

+        # Convert M-RoPE positions to 1D if draft model is text-only
+        if not self.uses_mrope and target_positions.dim() == 2:
+            # For text inputs, all M-RoPE dimensions are identical
+            target_positions = target_positions[0]


Shouldn't the first check be sufficient? Won't the positions always be 1D when M-RoPE is not used? We could also just assert that the dimension is > 1 if the first condition is true?

I think we need the second condition, since we only want to convert the 2D target_positions to 1D when the draft model does not use M-RoPE but the target model does. If we only check self.uses_mrope, non-VL models, whose target_positions are 1D anyway, might error out. I can change this to something more specific, like if not self.uses_mrope and self.vllm_config.model_config.uses_mrope?
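The refined condition proposed here (collapse only when the draft model lacks M-RoPE but the target uses it) can be sketched with plain lists. draft_positions is a hypothetical helper; a later commit moved this logic into set_positions, whose exact form is not shown on this page. M-RoPE positions are modeled as three rows (temporal, height, width) of length N.

```python
from typing import List, Union

Positions = Union[List[int], List[List[int]]]


def draft_positions(
    target_positions: Positions,
    draft_uses_mrope: bool,
    target_uses_mrope: bool,
) -> Positions:
    """Return positions in the layout the draft model expects (sketch)."""
    if not draft_uses_mrope and target_uses_mrope:
        # The target produced 3 x N M-RoPE positions but the draft is text-only.
        # Per the diff comment, all M-RoPE rows are identical for text inputs,
        # so row 0 is taken as the 1D position sequence.
        return list(target_positions[0])
    # Otherwise the layouts already match (both 1D, or both M-RoPE).
    return target_positions
```

Non-VL models (both flags false) pass their 1D positions through untouched, which is exactly the case the second condition protects.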

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

benchislett

A few quick questions.

benchislett · 2026-01-19T15:13:28Z

tests/v1/spec_decode/test_speculators_eagle3.py

            "nm-testing/Speculator-Qwen3-8B-Eagle3-converted-071-quantized",
            id="qwen3-eagle3-speculator",
        ),
+        pytest.param(


Is this test run nightly and/or as part of the per-PR CI suite? I wonder whether adding more and more models to this test might cause issues, or whether it runs infrequently enough not to drain resources.

It seems this test runs as part of the per-PR CI suite. I can revert this change for now and open a follow-up PR to move these models to an optional test suite, similar to the setup for the large lm-eval tests. Let me know if that's more appropriate!

Will ask the other committers for their opinions and report back.

Unfortunately, 30B is too large a model size. Should be good to merge as-is.

I can use a smaller model for testing

That would work

vllm/v1/spec_decode/eagle.py

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

yewentao256

[2026-01-20T20:20:47Z] #58 651.6 error: Request failed after 3 retries

Unrelated CI failure; try merging from main.

…m-project#32048) Signed-off-by: shanjiaz <zsjwpianpian@gmail.com> Signed-off-by: shanjiaz <43143795+shanjiaz@users.noreply.github.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

### What this PR does / why we need it?

1. ✅ Upgrade vllm commit to: 0115 (8471b27df97c3eb79f891802fc0e858f8f7ac6a0)
   Modify import paths due to the refactors: vllm-project/vllm#32245 vllm-project/vllm#32060
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21034239336/job/60490156965?pr=5913
2. ✅ Upgrade vllm commit to: 0119 (9a1f16da1e423ede2c2f52a9850cbfbb39cefe96)
   Fix `WorkerProc.__init__() missing 1 required positional argument: 'is_driver_worker'` due to vllm-project/vllm#28506
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21156263050/job/60841668755?5569
3. ✅ Upgrade vllm commit to: 0120 (148117ea2e689cd43df4be6892671a17cdae5833)
   1. Add `skip_compiled` param in `set_forward_context` due to vllm-project/vllm#30385
   2. Modify `tests/ut/spec_decode/test_eagle_proposer.py` due to vllm-project/vllm#24322 change `self.max_num_tokens = vllm_config.scheduler_config.max_num_batched_tokens + max_batch_size`
   3. Modify UT import paths due to the refactors: vllm-project/vllm#32060
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21204851770/job/60999046946
4. ✅ Upgrade vllm commit to: 0121 (f23fb5a7c1b61350c5c40ca1115d3bf8cf2b8cc9)
   1. vLLM switched `uses_mrope` from target to draft model config, making `positions`/`mrope_positions` mutually exclusive, breaking vllm-ascend's direct self.positions access and tests missing `draft_model_config.uses_mrope`. vllm-project/vllm#32048
   2. Moved bs_to_padded_graph_size from CompilationConfig to CudagraphDispatcher due to the refactor vllm-project/vllm#30143
   3. Remove unused `maybe_setup_kv_connector` due to vllm-project/vllm#32077
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21217728738/job/61043738834
6. ✅ Upgrade vllm commit to: 0122 (8ebf271bb6d1e7e9b1a55be73d755ef1a57dbbe5)
   Updating FusedMoEParallelConfig (added enable_eplb) and FusedMoEConfig due to vllm-project/vllm#32414
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21249922546/job/61148613054
8. ✅ Upgrade vllm commit to: 0123 (dc917cceb877dfd13f98c538c4c96158047d98bd)
   Setting temperature=0.0 due to the removal of the default temperature value in vllm-project/vllm#32723
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21280796875

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.14.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Co-authored-by: wjunLu <wjunlu217@gmail.com>
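Item 4.1 in the vllm-ascend notes says tests broke because their mocks lacked draft_model_config.uses_mrope. Depending on how a mock is built, a missing attribute either raises AttributeError (spec'd mocks) or silently evaluates as truthy (a plain MagicMock), which would send the proposer down the M-RoPE path. A hedged sketch of a test helper that sets the flag explicitly; the helper name is hypothetical, and the speculative_config.draft_model_config attribute path is an assumption about where the proposer reads the flag:

```python
from unittest.mock import MagicMock


def make_mock_vllm_config(draft_uses_mrope: bool = False) -> MagicMock:
    """Build a mock config for proposer unit tests (hypothetical helper)."""
    vllm_config = MagicMock()
    # Assumed attribute path. Set the flag explicitly: an unset MagicMock
    # attribute is itself a MagicMock, and MagicMock instances are truthy.
    vllm_config.speculative_config.draft_model_config.uses_mrope = draft_uses_mrope
    return vllm_config
```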


shanjiaz and others added 2 commits January 9, 2026 20:37

Added qwen3 vision language moe support for speculative decoding

44f7715

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

Merge branch 'main' into qwen3-vl-moe-spec-update

9612a1a

mergify bot added the qwen (Related to Qwen models), speculative-decoding, and v1 labels Jan 9, 2026

gemini-code-assist bot reviewed Jan 9, 2026

vllm/model_executor/models/qwen3_vl_moe.py (resolved)

shanjiaz and others added 6 commits January 9, 2026 21:32

min diff

bbef7e7

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

min diff

86e804f

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

white space

35a1024

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

Merge branch 'main' into qwen3-vl-moe-spec-update

4ab9986

Merge branch 'main' into qwen3-vl-moe-spec-update

4f8160c

Merge branch 'main' into qwen3-vl-moe-spec-update

5ee93e0

dsikka reviewed Jan 12, 2026

shanjiaz and others added 5 commits January 13, 2026 10:41

Merge branch 'main' into qwen3-vl-moe-spec-update

0ba1e92

Merge branch 'main' into qwen3-vl-moe-spec-update

a65da8e

Added test and refined conditions.

4bef2f9

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

Merge branch 'main' into qwen3-vl-moe-spec-update

3b035ba

Merge branch 'main' into qwen3-vl-moe-spec-update

3a71574

shanjiaz marked this pull request as ready for review January 14, 2026 17:36

shanjiaz requested review from benchislett, luccafong and sighingnow as code owners January 14, 2026 17:36

shanjiaz added 2 commits January 14, 2026 12:36

Merge branch 'main' into qwen3-vl-moe-spec-update

de8b289

Merge branch 'main' into qwen3-vl-moe-spec-update

5256ed9

dsikka mentioned this pull request Jan 15, 2026

Feature: eagle3 support for qwen2 and qwen2-vl vllm-project/speculators#86

Closed

shanjiaz added 2 commits January 15, 2026 10:26

Merge branch 'main' into qwen3-vl-moe-spec-update

75bd33c

Merge branch 'main' into qwen3-vl-moe-spec-update

b63cadc

benchislett reviewed Jan 19, 2026

shanjiaz and others added 2 commits January 19, 2026 11:50

Merge branch 'main' into qwen3-vl-moe-spec-update

3fb773f

move logic to set_positions

b27e6c4

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

shanjiaz added 4 commits January 20, 2026 13:27

Merge branch 'main' into qwen3-vl-moe-spec-update

e121abe

Merge branch 'main' into qwen3-vl-moe-spec-update

ea62713

Merge branch 'main' into qwen3-vl-moe-spec-update

ab352e4

Merge branch 'main' into qwen3-vl-moe-spec-update

8f2b1e1

yewentao256 reviewed Jan 20, 2026

shanjiaz added 3 commits January 20, 2026 15:39

Merge branch 'main' into qwen3-vl-moe-spec-update

0a59c88

Merge branch 'main' into qwen3-vl-moe-spec-update

46521b5

Merge branch 'main' into qwen3-vl-moe-spec-update

aacde22

dsikka mentioned this pull request Jan 20, 2026

Q1 2026 Roadmap vllm-project/speculators#251

Open

16 tasks

shanjiaz added 2 commits January 20, 2026 18:51

Merge branch 'main' into qwen3-vl-moe-spec-update

42db6eb

Merge branch 'main' into qwen3-vl-moe-spec-update

44e3fbe

benchislett merged commit 7ab80a8 into vllm-project:main Jan 21, 2026
53 checks passed

Meihan-chen mentioned this pull request Jan 26, 2026

[Main2Main] Upgrade vllm commit to 0123 vllm-project/vllm-ascend#6169

Merged

rahul-tuli mentioned this pull request Jan 28, 2026

RuntimeError when running Qwen3-VL Eagle3 speculator with vLLM vllm-project/speculators#266

Closed

4 participants
