
[Core] Consolidate RoPE-related parsing into ModelArchitectureConfig #32989

Open
charlotte12l wants to merge 8 commits into vllm-project:main from charlotte12l:rope_refactor

Conversation


@charlotte12l charlotte12l commented Jan 24, 2026

Purpose

We are introducing model_arch_config (#28454), which explicitly defines what information the vLLM engine needs from the HuggingFace config / user-defined config, so we can avoid passing hf_config / getattr(hf_config, xxx) around inside the engine.

Before this PR, _get_and_verify_max_len in model.py directly accessed multiple HuggingFace config fields:

  • rope_parameters
  • original_max_position_embeddings
  • model_max_length

By moving this logic into the convertor, we:

  • Centralize all RoPE-related decisions in one place
  • Reduce hf_config access in model.py
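
A minimal sketch of the consolidation pattern described above (the field set and the `convert` helper name here are illustrative, not vLLM's actual definitions):

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ModelArchitectureConfig:
    # Hypothetical subset of fields; the real class defines more.
    rope_parameters: Optional[dict]
    original_max_position_embeddings: Optional[int]
    model_max_length: Optional[int]


def convert(hf_config: Any) -> ModelArchitectureConfig:
    # All getattr(hf_config, ...) access happens here, once,
    # instead of being scattered through model.py.
    return ModelArchitectureConfig(
        rope_parameters=getattr(hf_config, "rope_parameters", None),
        original_max_position_embeddings=getattr(
            hf_config, "original_max_position_embeddings", None),
        model_max_length=getattr(hf_config, "model_max_length", None),
    )
```

Downstream code then reads typed attributes from the converted object rather than probing the raw HuggingFace config.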

Behavior Differences / Potential Bug Fix

Order of RoPE Scaling vs Caps
Before: RoPE scaling was applied AFTER sliding_window and tokenizer caps

  1. Get base max_len from config
  2. Apply sliding_window cap
  3. Apply tokenizer cap
  4. Apply RoPE scaling (yarn override + scaling_factor)

After: RoPE scaling is applied BEFORE caps

  1. Get base max_len from config
  2. Apply RoPE scaling (yarn override + scaling_factor) <- Moved here
  3. Apply sliding_window cap
  4. Apply tokenizer cap

Rationale: The new order is more correct. sliding_window is a hard architectural limit that shouldn't be scaled up. The old behavior could produce incorrect values, e.g. sliding_window (2048) * factor (8) = 16384.
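
The effect of the reordering can be sketched as follows (helper names and numbers are illustrative only, not vLLM's actual code):

```python
def old_order(base_max_len: int, sliding_window: int, factor: int) -> int:
    # Old flow: apply the sliding_window cap first, then RoPE scaling.
    # The cap itself gets multiplied, so the hard limit is exceeded.
    max_len = min(base_max_len, sliding_window)
    return max_len * factor


def new_order(base_max_len: int, sliding_window: int, factor: int) -> int:
    # New flow: apply RoPE scaling first, then the sliding_window cap.
    # The hard architectural limit always wins.
    max_len = base_max_len * factor
    return min(max_len, sliding_window)


print(old_order(2048, 2048, 8))  # 16384 -- hard limit was scaled up
print(new_order(2048, 2048, 8))  # 2048  -- hard limit still holds
```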

Test Plan

Select several models, run them to record max_model_len ground truth before this PR, then verify the values after this PR are the same.

pytest tests/config/test_model_arch_config.py

Test Result

passed


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>

@gemini-code-assist bot left a comment


Code Review

This pull request is a well-executed refactoring that consolidates RoPE-related configuration parsing into the ModelArchitectureConfig. By moving this logic from vllm/config/model.py to vllm/transformers_utils/model_arch_config_convertor.py, it centralizes all RoPE-related decisions, improving maintainability and reducing direct access to HuggingFace configs in model.py. A key improvement is the correction of the application order for RoPE scaling and context length caps, which resolves a potential bug where hard architectural limits like sliding_window were incorrectly scaled. The changes are clean, logical, and enhance the robustness of the configuration handling. The introduction of the MaxModelLenInfo named tuple provides a clear and well-documented interface for maximum length information. Overall, this is a high-quality contribution.

@charlotte12l charlotte12l changed the title [Misc]Consolidate RoPE-related parsing into ModelArchitectureConfig [Core]Consolidate RoPE-related parsing into ModelArchitectureConfig Jan 25, 2026
@charlotte12l charlotte12l marked this pull request as ready for review January 25, 2026 08:10

mergify bot commented Jan 31, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @charlotte12l.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 31, 2026