[0.13.0][Bugfix] Fixed a problem related to embeddings sharing #5972
zzzzwwjj merged 1 commit into vllm-project:releases/v0.13.0
Conversation
Code Review
This pull request correctly addresses a bug in embedding sharing by introducing a check that the embedding weights are identical before sharing them between the main and eagle models. The implementation in `vllm_ascend/spec_decode/eagle_proposer.py` is sound. However, the associated unit test only covers the case where embeddings are shared. To prevent regressions, it is crucial to also test the scenario where the embeddings differ and should not be shared. I have added a high-severity comment with a suggestion for this missing test case.
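A minimal sketch of what such a guard looks like (hypothetical names; plain Python lists stand in for torch tensors to keep the example self-contained — the real code would compare tensors, e.g. via `torch.equal`):

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for an embedding module; `weight` models the
# embedding matrix as a nested list instead of a torch tensor.
@dataclass
class Embed:
    weight: list = field(default_factory=list)

def maybe_share_embeddings(draft: Embed, target: Embed) -> Embed:
    # Share the target model's embedding only when the weights are
    # identical; otherwise the draft model keeps its own embedding.
    if draft.weight == target.weight:
        return target
    return draft

# Identical weights: sharing is safe, the target's module is reused.
same_a = Embed([[0.0, 1.0], [2.0, 3.0]])
same_b = Embed([[0.0, 1.0], [2.0, 3.0]])
assert maybe_share_embeddings(same_a, same_b) is same_b

# Different weights: sharing is cancelled, the draft keeps its own.
other = Embed([[9.0, 9.0], [9.0, 9.0]])
assert maybe_share_embeddings(same_a, other) is same_a
```

The key point is that the decision is based on weight equality, not object identity, so two independently loaded but identical embeddings can still be shared safely.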
```python
weight = torch.zeros(0)

mock_model = MagicMock()
mock_model.model.embed_tokens = MagicMock()
mock_model.model.embed_tokens.weight = weight
mock_model.lm_head = MagicMock()
mock_model.multimodal_cpu_fields = None
mock_model.merge_by_field_config = None

self.proposer.name = SpecDcodeType.EAGLE
mock_get_model.return_value = MagicMock()
mock_get_model.return_value.model.embed_tokens.weight = weight

self.proposer.load_model(mock_model)
mock_get_model.assert_called_once()
```
The updated test `test_load_model_pp1` correctly verifies that embeddings are shared when their weights are identical. However, to make the bugfix robust against regressions, it is important to also test the negative case, where the embedding weights differ and sharing must not happen.
Please consider adding a new test case to cover this scenario. Here's a possible implementation:
```python
@patch("vllm_ascend.spec_decode.eagle_proposer.get_layers_from_vllm_config")
@patch("vllm_ascend.spec_decode.eagle_proposer.get_model")
@patch("vllm_ascend.spec_decode.eagle_proposer.get_pp_group")
def test_load_model_pp1_no_share_embeddings(self, mock_pp_group, mock_get_model,
                                            mock_get_layers):
    mock_pp_group.return_value.world_size = 1
    # Mock layers as in the other test
    mock_get_layers.side_effect = [
        {"layer1": MagicMock()},
        {},
        {},
        {"layer3": MagicMock()},
    ]
    mock_model = MagicMock()
    mock_model.lm_head = MagicMock()
    mock_model.multimodal_cpu_fields = None
    mock_model.merge_by_field_config = None
    mock_model.model.embed_tokens = MagicMock()
    mock_model.model.embed_tokens.weight = torch.randn(1)  # Target model weight
    self.proposer.name = SpecDcodeType.EAGLE
    mock_get_model.return_value = MagicMock()
    # Draft model has a different embedding weight
    mock_get_model.return_value.model.embed_tokens.weight = torch.randn(1)
    self.proposer.load_model(mock_model)
    mock_get_model.assert_called_once()
    # Verify that the embeddings are NOT shared
    self.assertIsNot(self.proposer.model.model.embed_tokens,
                     mock_model.model.embed_tokens)
```
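For context on the final assertion: `assertIsNot` checks object identity, which is exactly what distinguishes a shared embedding module from a separate one. A tiny stdlib-only illustration, independent of the proposer code:

```python
from unittest.mock import MagicMock

target_embed = MagicMock(name="target.embed_tokens")
draft_embed = MagicMock(name="draft.embed_tokens")

# When embeddings are shared, both references point at the same object.
shared = target_embed
assert shared is target_embed

# When sharing is cancelled, the draft keeps a distinct module, so an
# identity check (`is` / `assertIsNot`) tells the two cases apart.
assert draft_embed is not target_embed
```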
Signed-off-by: drslark <slarksblood@qq.com>
…lm-ascend into FIA_v0.13.0 — merges 'releases/v0.13.0' of https://github.com/vllm-project/vllm-ascend, pulling in:
* [0.13.0][Bugfix] Fix setting of `speculative_config.enforce_eager` for dsv32 (vllm-project#5958)
* [v0.13.0][Bugfix] Fix XliteModelRunner init failed when aclgraph is enabled (vllm-project#5887)
* [0.13.0][Bugfix] Fixed a problem related to embeddings sharing (vllm-project#5972)
* [Bugfix] Fixed precision issues caused by pooled request pooling (vllm-project#6057)
* [0.13.0][Bugfix] fix pcp aclgraph qwen FIA bug (vllm-project#6038)
* [0.13.0][cherry-pick][bugfix] fix bug of triton mrope (vllm-project#6009)
* [0.13.0][bugfix] Resolved memory deallocation failure in the pooling layer under re-computation workloads (vllm-project#6056)
…-project#5972)

### What this PR does / why we need it?
Cancel the embeddings sharing when the embeddings of the main model and the embeddings of the eagle model are different.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
Since `Meta-Llama-3.1-8B-Instruct` is not available locally, I commented it out and ran:

```shell
pytest -s tests/e2e/singlecard/spec_decode/test_v1_spec_decode.py::test_llama_qwen_eagle_acceptance
```

The output is fine:

```text
=========================== warnings summary ===========================
<frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute
<frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========= 3 passed, 1 skipped, 2 warnings in 196.19s (0:03:16) =========
```

* vLLM version: v0.13.0
* vLLM main: vllm-project/vllm@2f4e654

Signed-off-by: drslark <slarksblood@qq.com>