MLA Based Eagle3 #30574
Conversation
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Code Review
This pull request adds support for MLA-based Eagle3 speculative decoding for DeepseekV2/V3 models. The changes include adding a new deepseek_eagle3.py model implementation, updating configurations, and modifying the DeepseekV2 model to support auxiliary hidden state extraction for Eagle3.
I've found a critical issue in vllm/model_executor/models/deepseek_v2.py: a syntax error in an enumerate call, and a likely logic error where an argument to a layer call was dropped. I've provided a code suggestion to fix both. The rest of the changes look good and correctly integrate the new model.
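For context, here is a minimal sketch of how a drafter like this would be launched through vLLM's `speculative_config`; the model paths and token count below are placeholders for illustration, not values from this PR:

```python
# Hedged sketch: model names/paths are hypothetical placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",            # target model (example)
    speculative_config={
        "method": "eagle3",                      # Eagle3 speculative decoding
        "model": "path/to/mla-eagle3-drafter",   # hypothetical drafter checkpoint
        "num_speculative_tokens": 3,
    },
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
```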
```python
for idx, layer in enumerate(
    islice(self.layers, self.start_layer, self.end_layer)
    start=self.start_layer,
):
    if idx in self.aux_hidden_state_layers:
        aux_hidden_states.append(hidden_states + residual)
    hidden_states, residual = layer(positions, hidden_states, residual)
```
There are a couple of issues in this loop:

- There's a syntax error in the `enumerate` call: the `start` argument is misplaced and a comma is missing. It should be `enumerate(islice(...), start=self.start_layer)`.
- The `llama_4_scaling` argument is no longer passed to the `layer` call, but the logic to compute it is still present. This seems like an accidental omission and could lead to incorrect behavior.

I've provided a suggestion to fix both issues.
Suggested change:

```diff
 for idx, layer in enumerate(
-    islice(self.layers, self.start_layer, self.end_layer)
+    islice(self.layers, self.start_layer, self.end_layer),
     start=self.start_layer,
 ):
     if idx in self.aux_hidden_state_layers:
         aux_hidden_states.append(hidden_states + residual)
-    hidden_states, residual = layer(positions, hidden_states, residual)
+    hidden_states, residual = layer(positions, hidden_states, residual,
+                                    llama_4_scaling)
```
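As a standalone illustration of the corrected iteration pattern, here is a short runnable sketch (toy layer names, not vLLM code) showing why `start=` must be a keyword argument to `enumerate`, not to `islice`, when you need absolute layer indices:

```python
# Toy demo of enumerate-over-islice with absolute indexing.
from itertools import islice

layers = ["layer0", "layer1", "layer2", "layer3", "layer4"]
start_layer, end_layer = 1, 4

for idx, layer in enumerate(
    islice(layers, start_layer, end_layer),  # yields layer1..layer3
    start=start_layer,                       # idx matches absolute position
):
    print(idx, layer)  # -> 1 layer1, 2 layer2, 3 layer3
```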
Hi @IzzyPutterman, the pre-commit checks have failed. Please run:

```
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
Force-pushed from 1c61f04 to f39721e (Compare)
Hi @IzzyPutterman, do you have a model or use case where this is useful? It would be helpful for review if you filled out the description and the testing you did.
This allows for Eagle drafters that share MLA instead of GQA for attention, so one can train Eagle3s for Kimi and Deepseek and use them across TRTLLM, SGL, and vLLM. We have a Kimi K2 Thinking Eagle3 internally, which might get released (not sure here).
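For intuition, here is a toy sketch of the Eagle3-style auxiliary-hidden-state fusion this enables: hidden states captured from a few target-model layers are concatenated and projected before the drafter consumes them. The shapes, layer count, and projection below are illustrative assumptions, not this PR's implementation:

```python
# Toy illustration only; dimensions and the fc projection are assumptions.
import torch
import torch.nn as nn

hidden_size, num_aux_layers = 1024, 3
# One hidden-state tensor per captured target layer: (batch, seq, hidden)
aux_hidden_states = [torch.randn(2, 8, hidden_size) for _ in range(num_aux_layers)]

# Project the concatenated aux states back down to the drafter's hidden size.
fc = nn.Linear(num_aux_layers * hidden_size, hidden_size)
fused = fc(torch.cat(aux_hidden_states, dim=-1))  # (2, 8, hidden_size)
```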
Force-pushed from f39721e to b704ffb (Compare)
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Force-pushed from b704ffb to 0c838b5 (Compare)
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.