[0.13.0][Bugfix] Fixed a problem related to embeddings sharing #5972
zzzzwwjj merged 1 commit into vllm-project:releases/v0.13.0
Conversation
Code Review
This pull request correctly addresses a bug in embedding sharing by introducing a check that the embedding weights are identical before sharing them between the main and eagle models. The implementation in `vllm_ascend/spec_decode/eagle_proposer.py` is sound. However, the associated unit test only covers the case where embeddings are shared. To prevent regressions, it is crucial to also test the scenario where the embeddings differ and should not be shared. I have added a high-severity comment with a suggestion for this missing test case.
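A minimal sketch of what such a guard looks like (hypothetical names; plain Python lists stand in for torch tensors to keep the example self-contained — the real code would compare tensors, e.g. via `torch.equal`):

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for an embedding module; `weight` models the
# embedding matrix as a nested list instead of a torch tensor.
@dataclass
class Embed:
    weight: list = field(default_factory=list)

def maybe_share_embeddings(draft: Embed, target: Embed) -> Embed:
    # Share the target model's embedding only when the weights are
    # identical; otherwise the draft model keeps its own embedding.
    if draft.weight == target.weight:
        return target
    return draft

# Identical weights: sharing is safe, the target's module is reused.
same_a = Embed([[0.0, 1.0], [2.0, 3.0]])
same_b = Embed([[0.0, 1.0], [2.0, 3.0]])
assert maybe_share_embeddings(same_a, same_b) is same_b

# Different weights: sharing is cancelled, the draft keeps its own.
other = Embed([[9.0, 9.0], [9.0, 9.0]])
assert maybe_share_embeddings(same_a, other) is same_a
```

The key point is that the decision is based on weight equality, not object identity, so two independently loaded but identical embeddings can still be shared safely.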
```python
weight = torch.zeros(0)

mock_model = MagicMock()
mock_model.model.embed_tokens = MagicMock()
mock_model.model.embed_tokens.weight = weight
mock_model.lm_head = MagicMock()
mock_model.multimodal_cpu_fields = None
mock_model.merge_by_field_config = None

self.proposer.name = SpecDcodeType.EAGLE
mock_get_model.return_value = MagicMock()
mock_get_model.return_value.model.embed_tokens.weight = weight

self.proposer.load_model(mock_model)
mock_get_model.assert_called_once()
```
The updated test `test_load_model_pp1` correctly verifies that embeddings are shared when their weights are identical. However, to make the bugfix robust against regressions, it is important to also test the negative case, where the embedding weights differ and sharing must not happen.
Please consider adding a new test case to cover this scenario. Here's a possible implementation:
```python
@patch("vllm_ascend.spec_decode.eagle_proposer.get_layers_from_vllm_config")
@patch("vllm_ascend.spec_decode.eagle_proposer.get_model")
@patch("vllm_ascend.spec_decode.eagle_proposer.get_pp_group")
def test_load_model_pp1_no_share_embeddings(self, mock_pp_group, mock_get_model,
                                            mock_get_layers):
    mock_pp_group.return_value.world_size = 1
    # Mock layers as in the other test
    mock_get_layers.side_effect = [
        {"layer1": MagicMock()},
        {},
        {},
        {"layer3": MagicMock()},
    ]
    mock_model = MagicMock()
    mock_model.lm_head = MagicMock()
    mock_model.multimodal_cpu_fields = None
    mock_model.merge_by_field_config = None
    mock_model.model.embed_tokens = MagicMock()
    mock_model.model.embed_tokens.weight = torch.randn(1)  # Target model weight
    self.proposer.name = SpecDcodeType.EAGLE
    mock_get_model.return_value = MagicMock()
    # Draft model has a different embedding weight
    mock_get_model.return_value.model.embed_tokens.weight = torch.randn(1)
    self.proposer.load_model(mock_model)
    mock_get_model.assert_called_once()
    # Verify that the embeddings are NOT shared
    self.assertIsNot(self.proposer.model.model.embed_tokens,
                     mock_model.model.embed_tokens)
```
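For context on the final assertion: `assertIsNot` checks object identity, which is exactly what distinguishes a shared embedding module from a separate one. A tiny stdlib-only illustration, independent of the proposer code:

```python
from unittest.mock import MagicMock

target_embed = MagicMock(name="target.embed_tokens")
draft_embed = MagicMock(name="draft.embed_tokens")

# When embeddings are shared, both references point at the same object.
shared = target_embed
assert shared is target_embed

# When sharing is cancelled, the draft keeps a distinct module, so an
# identity check (`is` / `assertIsNot`) tells the two cases apart.
assert draft_embed is not target_embed
```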
Signed-off-by: drslark <slarksblood@qq.com>
…lm-ascend into FIA_v0.13.0 — merges 'releases/v0.13.0' of https://github.com/vllm-project/vllm-ascend, pulling in:
* [0.13.0][Bugfix] Fix setting of `speculative_config.enforce_eager` for dsv32 (vllm-project#5958)
* [v0.13.0][Bugfix] Fix XliteModelRunner init failed when aclgraph is enabled (vllm-project#5887)
* [0.13.0][Bugfix] Fixed a problem related to embeddings sharing (vllm-project#5972)
* [Bugfix] Fixed precision issues caused by pooled request pooling (vllm-project#6057)
* [0.13.0][Bugfix] fix pcp aclgraph qwen FIA bug (vllm-project#6038)
* [0.13.0][cherry-pick][bugfix] fix bug of triton mrope (vllm-project#6009)
* [0.13.0][bugfix] Resolved memory deallocation failure in the pooling layer under re-computation workloads (vllm-project#6056)
…-project#5972)

### What this PR does / why we need it?
Cancel the embeddings sharing when the embeddings of the main model and the embeddings of the eagle model are different.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
Since `Meta-Llama-3.1-8B-Instruct` is not available locally, I commented it out and ran:

```shell
pytest -s tests/e2e/singlecard/spec_decode/test_v1_spec_decode.py::test_llama_qwen_eagle_acceptance
```

The output is fine:

```text
=========================== warnings summary ===========================
<frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute
<frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========= 3 passed, 1 skipped, 2 warnings in 196.19s (0:03:16) =========
```

* vLLM version: v0.13.0
* vLLM main: vllm-project/vllm@2f4e654

Signed-off-by: drslark <slarksblood@qq.com>