[Core] Consolidate RoPE-related parsing into ModelArchitectureConfig #32989
charlotte12l wants to merge 8 commits into vllm-project:main
Conversation
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Code Review
This pull request is a well-executed refactoring that consolidates RoPE-related configuration parsing into the ModelArchitectureConfig. By moving this logic from vllm/config/model.py to vllm/transformers_utils/model_arch_config_convertor.py, it centralizes all RoPE-related decisions, improving maintainability and reducing direct access to HuggingFace configs in model.py. A key improvement is the correction of the application order for RoPE scaling and context length caps, which resolves a potential bug where hard architectural limits like sliding_window were incorrectly scaled. The changes are clean, logical, and enhance the robustness of the configuration handling. The introduction of the MaxModelLenInfo named tuple provides a clear and well-documented interface for maximum length information. Overall, this is a high-quality contribution.
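The review above mentions a `MaxModelLenInfo` named tuple as the new interface for maximum-length information, but its actual fields aren't shown in this thread. The following is only an illustrative sketch; every field name is an assumption:

```python
from typing import NamedTuple, Optional

class MaxModelLenInfo(NamedTuple):
    """Hypothetical shape of the max-length info returned by the convertor.

    Field names are illustrative assumptions, not the PR's actual API.
    """
    derived_max_model_len: int            # length derived from the HF config
    rope_scaling_factor: float = 1.0      # applied before hard caps
    sliding_window: Optional[int] = None  # hard architectural cap, never scaled

info = MaxModelLenInfo(derived_max_model_len=4096,
                       rope_scaling_factor=8.0,
                       sliding_window=2048)
print(info.sliding_window)  # 2048
```

A named tuple keeps the interface immutable and self-documenting, which matches the review's point about a "clear and well-documented interface".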
This pull request has merge conflicts that must be resolved before it can be merged.
Purpose
We are introducing model_arch_config (#28454), which defines explicitly what information the vLLM engine needs from the HuggingFace config / user-defined config, so we can avoid passing hf_config / getattr(hf_config, xxx) around in the engine.
Before this PR, _get_and_verify_max_len in model.py directly accessed multiple HuggingFace config fields. By moving this logic into the convertor, we centralize those accesses in one place.
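A minimal sketch of the before/after pattern described here. Only `ModelArchitectureConfig` and the `getattr(hf_config, ...)` idiom come from this PR; the function names and field set below are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Any, Optional

def read_fields_scattered(hf_config: Any):
    # "Before" pattern: each call site reads raw HF config fields itself
    rope_scaling = getattr(hf_config, "rope_scaling", None)
    max_pos = getattr(hf_config, "max_position_embeddings", None)
    return rope_scaling, max_pos

@dataclass
class ModelArchitectureConfig:
    # Simplified stand-in; the real class lives in vLLM and has more fields
    rope_scaling: Optional[dict]
    max_position_embeddings: Optional[int]

def convert(hf_config: Any) -> ModelArchitectureConfig:
    # "After" pattern: the convertor parses the HF config once,
    # and the rest of the engine only sees the typed config object
    return ModelArchitectureConfig(
        rope_scaling=getattr(hf_config, "rope_scaling", None),
        max_position_embeddings=getattr(
            hf_config, "max_position_embeddings", None),
    )
```

With the second pattern, a missing or renamed HF field only has to be handled in the convertor, not at every call site.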
Behavior Differences/ Potential Bug Fix
Order of RoPE Scaling vs Caps
Before: RoPE scaling was applied AFTER sliding_window and tokenizer caps
After: RoPE scaling is applied BEFORE caps
Rationale: The new order is more correct. sliding_window is a hard architectural limit that shouldn't be scaled up. The old behavior could result in incorrect values like sliding_window(2048) * factor(8) = 16384.
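The ordering difference can be sketched with the toy numbers above (the helper functions are hypothetical illustrations, not vLLM code):

```python
def old_order(max_pos: int, sliding_window: int, factor: int) -> int:
    # Old behavior: caps applied first, then RoPE scaling,
    # so the hard architectural cap gets scaled up too
    capped = min(max_pos, sliding_window)
    return capped * factor

def new_order(max_pos: int, sliding_window: int, factor: int) -> int:
    # New behavior: RoPE scaling first, then hard caps,
    # so sliding_window remains a hard limit
    scaled = max_pos * factor
    return min(scaled, sliding_window)

# sliding_window=2048, rope factor=8, base max_position_embeddings=2048
print(old_order(2048, 2048, 8))  # 16384 (exceeds the sliding window)
print(new_order(2048, 2048, 8))  # 2048  (window cap is respected)
```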
Test Plan
Select several models, record the ground-truth max_model_len before this PR, and compare whether the values after this PR are the same.

Test Result
passed