[EAGLE] [Quantization] turn quantization off for draft model initialization #26411
jmkuebler wants to merge 2 commits into vllm-project:main
Conversation
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
Will this break support for draft models that are quantized? I don't know of any used in practice, but I can't be sure that a quantized EAGLE head isn't being used somewhere.
Pretty sure the answer is no. As it currently stands, the draft model is initialized with the target model's quantization config, so any quantization config in the draft model checkpoint has no effect as of now.
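A minimal sketch of the behavior described above. `ModelConfig` and `build_draft_config` are illustrative stand-ins for this discussion, not the actual vLLM internals:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    name: str
    quant_config: Optional[str] = None  # e.g. "fp8"; None means unquantized


def build_draft_config(target: ModelConfig, draft_name: str) -> ModelConfig:
    # Before the fix: the draft model inherited target.quant_config, so a
    # quantized target forced the (typically unquantized) EAGLE head to be
    # loaded with the target's quantization scheme.
    # After the fix: quantization is turned off for draft initialization.
    return ModelConfig(name=draft_name, quant_config=None)


target = ModelConfig(name="llama-3-70b", quant_config="fp8")
draft = build_draft_config(target, "eagle-head")
assert draft.quant_config is None    # draft loads unquantized
assert target.quant_config == "fp8"  # target config is left untouched
```

The point is that the target's config is only read, never shared with or mutated by the drafter.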
@benchislett note that #25883 introduced a quant_config overwrite specifically for the EAGLE3 case. But that approach still works with my changes.
@rahul-tuli can you take a look at this?
I think the approach I take should propagate to all EAGLE variants (and we could thus undo the previous fix). With the other approach we would have to fix it in each variant. It would probably also require a fix for Llama-4 EAGLE if we follow the per-EAGLE patching.
You are correct, in the sense that the previous fix would have to be applied to each variant.
@rahul-tuli but I don't think the drafter's quant config is ever used to overwrite the target model's quant_config, because that would then actually change the target model's config as well. So my change will never overwrite a draft model's quant config. We propagate the draft model config separately to the model; any application of the draft model's quant config would presumably happen within the specific drafter model class, and then my change is not a problem either. If you prefer, I can also patch
To be a bit more crisp:
@rahul-tuli wdyt how to proceed?
I looked for a better way to do this, but so far the approach from #25883 seems the most reasonable. We will have some duplicate code, though, which is not great; I will put up a PR for that.
I don't think this is correct. We can have a separate drafter config with @rahul-tuli's approach from #25883, where we override (not overwrite) the quant config. To avoid the duplication, the better way is to define a
Also, we should have the flexibility to use a different quant config for the drafter and the verifier; there are use cases where we need it. Also, the current implementation in
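The "override, not overwrite" idea could be sketched roughly as follows; `resolve_draft_quant` is a hypothetical helper, not actual vLLM code:

```python
from typing import Optional


def resolve_draft_quant(
    target_quant: Optional[str],
    draft_quant: Optional[str],
) -> Optional[str]:
    # Override semantics: the drafter uses its own quant config if it ships
    # one; otherwise it runs unquantized. The target's quant config is
    # read-only here and is never copied onto (or mutated by) the drafter.
    return draft_quant


# A quantized verifier with an unquantized EAGLE head:
assert resolve_draft_quant("fp8", None) is None
# A drafter shipping its own quantization keeps it:
assert resolve_draft_quant("fp8", "int4") == "int4"
```

This also gives the flexibility mentioned above: drafter and verifier can carry different quant configs without either one clobbering the other.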
I made a fix PR here: #26590. Could you review? Happy to make any changes.
Closing this PR in favor of #26590 |
Purpose
Fixes #26402
As an alternative, I have also considered passing a deepcopy of the VllmConfig to the drafter, but this results in errors, as the drafter makes some modifications that would not be flushed back in this case.
Test Plan
I ran the same commands as detailed in #26402 and verified that drafting now works correctly even when the target model is quantized.
Test Result
Fixes the issue.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.