Prevent overwriting drafter's lm_head and embed_tokens #27737
eldarkurtic wants to merge 6 commits into vllm-project:main from
Conversation
Code Review
This pull request introduces a more robust mechanism for handling the lm_head and embed_tokens layers in EAGLE3 drafter models. By replacing the fragile shape-matching heuristic with explicit flags set during weight loading, the change correctly prevents overwriting these layers when the drafter model provides its own. This is a significant improvement for correctness and maintainability. My review includes a suggestion to further strengthen the flag-setting logic to prevent potential edge cases.
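The flag-based mechanism the review describes can be sketched roughly as follows. This is an illustrative stand-in, not the actual vLLM code: the `DrafterStub` class and the weight-name list are hypothetical, while the `has_own_lm_head` / `has_own_embed_tokens` flags come from the diff.

```python
# Hypothetical sketch: during weight loading, the drafter records whether
# its checkpoint ships its own lm_head / embed_tokens, instead of
# inferring that later from tensor shapes.

class DrafterStub:
    def __init__(self):
        # Defaults: assume the drafter reuses the target model's layers.
        self.has_own_lm_head = False
        self.has_own_embed_tokens = False

    def load_weights(self, weight_names):
        for name in weight_names:
            if "lm_head" in name:
                self.has_own_lm_head = True
            if "embed_tokens" in name:
                self.has_own_embed_tokens = True

# A checkpoint that includes its own lm_head but no embedding:
drafter = DrafterStub()
drafter.load_weights(["model.layers.0.self_attn.q_proj.weight",
                      "lm_head.weight"])
print(drafter.has_own_lm_head, drafter.has_own_embed_tokens)  # True False
```

Because the flags are set from the checkpoint's actual contents, a drafter with a coincidentally matching vocabulary size can no longer be mistaken for one that shares the target model's layers.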
Moved all attributes into the SupportsEagle3 interface. Could you please re-review? @NickLucche @rahul-tuli @dsikka
NickLucche left a comment
Thanks for the fix, @eldarkurtic! Let's get CI green real quick before merging this one.
```python
# To prevent overriding with target model's layers
if "lm_head" in name:
    self.has_own_lm_head = True
if "embed_tokens" in name:
```
Is this a change from the default? I thought that EAGLE3 heads usually share the embedding with the base model?
Eagle3 by default doesn't train these layers, but there is no reason not to train them. This doesn't affect the standard Eagle3 flow; it just extends it to support this new use case.
There are other EAGLE1-based models which do not have the EAGLE3 mixin, causing the CI failures. Please update the logic to cover EAGLE1 as well.
vllm/v1/spec_decode/eagle.py (Outdated)
```diff
 logger.info(
-    "Assuming the EAGLE head shares the same vocab embedding"
-    " with the target model."
+    "Draft model embed_tokens are uninitialized. "
```
This was originally done as a memory optimization, since the original EAGLE3 models are released with an embedding layer included but with the same weights as the base model.
Ideally we would have another check here that deletes the drafter's weights when they are present but identical to those of the base model, though that's tricky to implement cleanly. I'm fine with doing it this way for now, but it should be noted that the behaviour is changing.
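The identical-weights check suggested here could look roughly like the sketch below. This is a hypothetical illustration: the function name is invented, and plain nested lists stand in for tensors (real code would compare `torch` tensors, e.g. with `torch.equal`).

```python
# Hypothetical sketch: keep the old memory optimization by dropping a
# drafter's embed_tokens when they merely duplicate the base model's,
# while preserving genuinely trained drafter embeddings.

def prune_duplicate_embedding(drafter_weights, target_embed):
    drafter_embed = drafter_weights.get("embed_tokens.weight")
    if drafter_embed is not None and drafter_embed == target_embed:
        # Identical to the base model: delete the copy and share instead.
        del drafter_weights["embed_tokens.weight"]
        return True   # shared with the target model
    return False      # drafter keeps its own (or ships none)

target = [[0.1, 0.2], [0.3, 0.4]]
duplicate = {"embed_tokens.weight": [[0.1, 0.2], [0.3, 0.4]]}
trained = {"embed_tokens.weight": [[9.9, 9.9], [9.9, 9.9]]}
print(prune_duplicate_embedding(duplicate, target))  # True  (pruned)
print(prune_duplicate_embedding(trained, target))    # False (kept)
```

The tricky part the comment alludes to is doing this comparison cleanly at load time, when the target model's embedding may live on a different device or be sharded.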
Do they use some other mixin similar to SupportsEagle3?
```diff
@@ -922,6 +922,16 @@ class SupportsEagle3(Protocol):
     MRO of your model class.
     """

+    has_own_lm_head: ClassVar[bool] = False
```
These are not class variables, they are instance variables that are set per-model based on the weight loading.
The proper fix here is to add a base SupportsEagle mixin and have SupportsEagle3 inherit from that. Then the other EAGLE classes can inherit from the new base mixin that will give them a reasonable default.
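The hierarchy this suggests might be sketched as follows. Only `SupportsEagle3` and the flag names appear in the diff; the base `SupportsEagle` protocol and the `Eagle1Model` class are assumptions for illustration.

```python
# Hypothetical sketch of the suggested mixin hierarchy: a base
# SupportsEagle protocol carries the default flags, SupportsEagle3
# inherits them, and EAGLE1-style models can inherit the base mixin to
# get the same reasonable defaults.
from typing import Protocol


class SupportsEagle(Protocol):
    # Defaults mean "reuse the target model's layers"; in practice these
    # are set per instance during weight loading.
    has_own_lm_head: bool = False
    has_own_embed_tokens: bool = False


class SupportsEagle3(SupportsEagle, Protocol):
    """EAGLE3-specific members would live here."""


class Eagle1Model(SupportsEagle):
    """An EAGLE1 drafter; inherits the defaults without the EAGLE3 mixin."""


m = Eagle1Model()
print(m.has_own_lm_head, m.has_own_embed_tokens)  # False False
```

A concrete model would then flip the instance attributes while loading its checkpoint, and the shared copy-from-target logic could consult the flags uniformly for both EAGLE1 and EAGLE3.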
```diff
@@ -328,6 +328,12 @@ def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]):
         includes_embed_tokens = True
         model_weights[name] = loaded_weight

+        # To prevent overriding with target model's layers
+        if "lm_head" in name:
```
This should be added to all the EAGLE classes, not just llama_eagle3.
You might want to refactor this into the mixin to reuse the code between them.
Hi @eldarkurtic @robertgshaw2-redhat, just wondering what is the status of this PR? Are we close to getting it merged? Thanks!
Head branch was pushed to by a user without write access
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed 4495667 to e69c88f (Compare)
Documentation preview: https://vllm--27737.org.readthedocs.build/en/27737/
…ialized Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
…ration (vllm-project#27670) Signed-off-by: KevinCheung2259 <2651309292@qq.com> Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Force-pushed e69c88f to f766651 (Compare)
Closed in favor of a slightly cleaner approach in #28549
Some EAGLE3 drafters may have their own lm_head and/or embed_tokens layers. The existing codebase ignores this and always overrides them with the target model's layers.
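The resulting load-time decision can be sketched as below. The function and variable names are illustrative, not the actual vLLM code: with the flags set during weight loading, the target model's layers are copied in only when the drafter's checkpoint did not provide its own.

```python
# Hypothetical sketch: copy lm_head / embed_tokens from the target model
# into the drafter only for layers the drafter did not ship itself.

def resolve_shared_layers(drafter_flags, drafter_layers, target_layers):
    resolved = dict(drafter_layers)
    for layer in ("lm_head", "embed_tokens"):
        if not drafter_flags.get(f"has_own_{layer}", False):
            # Drafter ships no trained copy: reuse the target model's.
            resolved[layer] = target_layers[layer]
    return resolved

target = {"lm_head": "target_lm_head", "embed_tokens": "target_embed"}
# A drafter that trained its own lm_head but not embed_tokens:
flags = {"has_own_lm_head": True, "has_own_embed_tokens": False}
out = resolve_shared_layers(flags, {"lm_head": "drafter_lm_head"}, target)
print(out["lm_head"], out["embed_tokens"])  # drafter_lm_head target_embed
```

With no flags set (the standard EAGLE3 case), both layers fall through to the target model's copies, which is why test-case 1 below should be unaffected.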
Test-case 1: for a model which needs copying of lm_head and embed_tokens from the verifier model, the behavior should not change.
Eval command to verify:
Before this PR:
After this PR:
Test-case 2: for a model which has its own lm_head and embed_tokens, and therefore does not require copying from the target model's layers, acceptance rates look significantly better.
Before this PR:
After this PR:
Note: idea for lm_head check inspired by #27688