
[0.13.0][Bugfix] Fixed a problem related to embeddings sharing #5972

Merged
zzzzwwjj merged 1 commit into vllm-project:releases/v0.13.0 from drslark:v0.13.0
Jan 20, 2026

Conversation

@drslark (Contributor) commented Jan 17, 2026

### What this PR does / why we need it?

Cancel embedding sharing when the embeddings of the main model and the embeddings of the eagle model differ.
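The fix described above can be sketched as follows. This is a hedged, illustrative example, not the actual `eagle_proposer` code: the function name and variables are hypothetical, but it shows the core idea of sharing the target model's embedding module with the draft model only when the weights are identical.

```python
# Illustrative sketch only: names like maybe_share_embeddings are
# hypothetical and do not appear in vllm-ascend.
import torch
from torch import nn

def maybe_share_embeddings(target_embed: nn.Embedding,
                           draft_embed: nn.Embedding) -> nn.Embedding:
    """Return the embedding module the draft (eagle) model should use."""
    if (target_embed.weight.shape == draft_embed.weight.shape
            and torch.equal(target_embed.weight, draft_embed.weight)):
        # Identical weights: safe to share the target model's module.
        return target_embed
    # Different weights: the draft model must keep its own embeddings.
    return draft_embed

torch.manual_seed(0)
shared = nn.Embedding(4, 2)
same = maybe_share_embeddings(shared, shared)
different = maybe_share_embeddings(shared, nn.Embedding(4, 2))
print(same is shared, different is shared)  # prints: True False
```

Sharing unconditionally is what caused the bug: if the eagle model was trained with its own embedding table, overwriting it with the target model's embeddings silently changes the draft model's outputs.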

### Does this PR introduce any user-facing change?

N/A

### How was this patch tested?

Since I don't have `Meta-Llama-3.1-8B-Instruct` locally, I commented it out and ran:

```shell
pytest -s tests/e2e/singlecard/spec_decode/test_v1_spec_decode.py::test_llama_qwen_eagle_acceptance
```

The output is fine:

```text
.

=== warnings summary ===
<frozen importlib._bootstrap>:241
  <frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:241
  <frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=== 3 passed, 1 skipped, 2 warnings in 196.19s (0:03:16) ===
```

@gemini-code-assist (bot) left a comment


Code Review

This pull request correctly addresses a bug in embedding sharing by introducing a check to ensure embedding weights are identical before sharing them between the main and eagle models. The implementation in vllm_ascend/spec_decode/eagle_proposer.py is sound. However, the associated unit test only covers the case where embeddings are shared. To prevent regressions, it's crucial to also test the scenario where embeddings differ and should not be shared. I have added a high-severity comment with a suggestion for this missing test case.

Comment on lines +158 to 172

```python
weight = torch.zeros(0)

mock_model = MagicMock()
mock_model.model.embed_tokens = MagicMock()
mock_model.lm_head = MagicMock()
mock_model.multimodal_cpu_fields = None
mock_model.merge_by_field_config = None
mock_get_model.return_value = MagicMock()
mock_model.model.embed_tokens = MagicMock()
mock_model.model.embed_tokens.weight = weight

self.proposer.name = SpecDcodeType.EAGLE
mock_get_model.return_value = MagicMock()
mock_get_model.return_value.model.embed_tokens.weight = weight

self.proposer.load_model(mock_model)
mock_get_model.assert_called_once()
```
Severity: high

The updated test test_load_model_pp1 correctly verifies that embeddings are shared when their weights are identical. However, to ensure the bugfix is robust and prevent regressions, it's important to also test the negative case where the embedding weights are different and should not be shared.

Please consider adding a new test case to cover this scenario. Here's a possible implementation:

```python
@patch("vllm_ascend.spec_decode.eagle_proposer.get_layers_from_vllm_config")
@patch("vllm_ascend.spec_decode.eagle_proposer.get_model")
@patch("vllm_ascend.spec_decode.eagle_proposer.get_pp_group")
def test_load_model_pp1_no_share_embeddings(self, mock_pp_group, mock_get_model,
                                            mock_get_layers):
    mock_pp_group.return_value.world_size = 1
    # Mock layers as in the other test
    mock_get_layers.side_effect = [
        {"layer1": MagicMock()},
        {},
        {},
        {"layer3": MagicMock()},
    ]

    mock_model = MagicMock()
    mock_model.lm_head = MagicMock()
    mock_model.multimodal_cpu_fields = None
    mock_model.merge_by_field_config = None
    mock_model.model.embed_tokens = MagicMock()
    mock_model.model.embed_tokens.weight = torch.randn(1)  # Target model weight

    self.proposer.name = SpecDcodeType.EAGLE
    mock_get_model.return_value = MagicMock()
    # Draft model has a different embedding weight
    mock_get_model.return_value.model.embed_tokens.weight = torch.randn(1)

    self.proposer.load_model(mock_model)

    mock_get_model.assert_called_once()
    # Verify that the embeddings are NOT shared
    self.assertIsNot(self.proposer.model.model.embed_tokens,
                     mock_model.model.embed_tokens)
```

@drslark force-pushed the v0.13.0 branch 4 times, most recently from 8c65ab8 to f46949d on January 19, 2026 09:53
@Yikun added the `ready` (read for review) and `ready-for-test` (start test by label for PR) labels Jan 19, 2026
Signed-off-by: drslark <slarksblood@qq.com>
@zzzzwwjj zzzzwwjj merged commit c008426 into vllm-project:releases/v0.13.0 Jan 20, 2026
12 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 21, 2026
…lm-ascend into FIA_v0.13.0

* 'releases/v0.13.0' of https://github.com/vllm-project/vllm-ascend:
  [0.13.0][Bugfix] Fix setting of `speculative_config.enforce_eager` for dsv32 (vllm-project#5958)
  [v0.13.0][Bugfix] Fix XliteModelRunner init failed when aclgraph is enabled (vllm-project#5887)
  [0.13.0][Bugfix] Fixed a problem related to embeddings sharing (vllm-project#5972)
  [Bugfix]Fixed precision issues caused by pooled request pooling (vllm-project#6057)
  [0.13.0][Bugfix] fix pcp aclgraph qwen FIA bug (vllm-project#6038)
  [0.13.0][cherry-pick][bugfix] fix bug of triton mrope (vllm-project#6009)
  [0.13.0][bugfix] Resolved memory deallocation failure in the pooling layer under re-computation workloads. (vllm-project#6056)
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
…-project#5972)

### What this PR does / why we need it?

Cancel embedding sharing when the embeddings of the main model and the
embeddings of the eagle model differ.

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Since I don't have `Meta-Llama-3.1-8B-Instruct` locally, I commented it
out and ran:

```shell
pytest -s tests/e2e/singlecard/spec_decode/test_v1_spec_decode.py::test_llama_qwen_eagle_acceptance
```

The output is fine:

```text
.

=== warnings summary ===
<frozen importlib._bootstrap>:241
  <frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:241
  <frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=== 3 passed, 1 skipped, 2 warnings in 196.19s (0:03:16) ===

```

* vLLM version: v0.13.0
* vLLM main:
vllm-project/vllm@2f4e654

Signed-off-by: drslark <slarksblood@qq.com>
tangtiangu pushed a commit to tangtiangu/jiusi-vllm-ascend that referenced this pull request Feb 24, 2026

Labels: `ready` (read for review), `ready-for-test` (start test by label for PR)


3 participants