[0.13.0][cherry-pick][BugFix] Support setting tp=1 for the Eagle draft model to take effect#5804
Conversation
Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com> Co-authored-by: drslark <slarksblood@qq.com>
There was a problem hiding this comment.
Code Review
This pull request aims to fix a bug where setting tensor_parallel_size=1 for the Eagle draft model was not taking effect. The approach of creating a temporary tensor parallel group and patching the global TP group during the draft model loading is sound. However, I've identified a critical issue in the implementation that prevents it from working as intended. The new tensor parallel group is being created with group_name="tp", which is the same name as the main tensor parallel group. This causes init_model_parallel_group to return the existing main TP group instead of creating a new one, rendering the patch ineffective. I have provided a suggestion to resolve this. The accompanying test additions and an unrelated but correct fix for UniformTypeKVCacheSpecs are well-implemented.
What this PR does / why we need it?
According to the official documentation, the parameter "draft_tensor_parallel_size": 1 is supposed to be applied to the Eagle3 model. However, based on actual debugging, it was found that the number of tensor parallelisms (tp) of the Eagle model is consistent with that of the target model. The setting of tp for the draft model did not take effect as expected.
Note: This feature has not been superimposed and tested with
spanddp. It will be adapted laterDoes this PR introduce any user-facing change?
No
How was this patch tested?
Fixes vllm-project/vllm#31345