
Conversation

@Wangmerlyn (Contributor) commented on Jul 12, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after comparison of the results, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

This PR addresses a compatibility issue when using legacy rope_scaling parameters via --hf-overrides.

When specifying rope_scaling directly in the model's config.json, legacy parameters like:

"rope_scaling": {
  "type": "yarn",
  "factor": 4.0,
  "original_max_position_embeddings": 35268
}

are handled correctly by the patch_rope_scaling function, which maps "type" to "rope_type".

However, when passing the same configuration via --hf-overrides, a KeyError: 'rope_type' is raised:

rope_type = rope_scaling["rope_type"]
KeyError: 'rope_type'

This PR ensures that patch_rope_scaling is invoked after applying hf-overrides, so that legacy parameters are properly converted even when passed via command-line overrides.

This improves backward compatibility and prevents runtime errors when using the --hf-overrides mechanism with legacy rope_scaling configs.
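For illustration, the ordering the fix relies on can be sketched as follows. This is a minimal, self-contained sketch, not vLLM's actual implementation; the helper names apply_hf_overrides and normalize_rope_scaling are hypothetical and only mirror the behavior described above: merge the --hf-overrides dictionary into the HF config first, then normalize the legacy "type" key to "rope_type".

# Minimal sketch (hypothetical helpers, not vLLM's actual code): legacy-key
# normalization has to run *after* the --hf-overrides dict is merged in.
from typing import Any


def apply_hf_overrides(config: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Shallow-merge command-line overrides into the HF config dict."""
    merged = dict(config)
    merged.update(overrides)
    return merged


def normalize_rope_scaling(config: dict[str, Any]) -> dict[str, Any]:
    """Map the legacy 'type' key to 'rope_type', as patch_rope_scaling does."""
    rope_scaling = config.get("rope_scaling")
    if rope_scaling and "type" in rope_scaling and "rope_type" not in rope_scaling:
        rope_scaling["rope_type"] = rope_scaling.pop("type")
    return config


config = {"max_position_embeddings": 32768}
overrides = {"rope_scaling": {"type": "yarn", "factor": 4.0,
                              "original_max_position_embeddings": 35268}}

# Normalizing after the merge is what this PR guarantees; normalizing only the
# on-disk config would leave the override carrying 'type' and trigger the
# KeyError: 'rope_type' shown above.
config = normalize_rope_scaling(apply_hf_overrides(config, overrides))
assert config["rope_scaling"]["rope_type"] == "yarn"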

Test Plan

Use the following command to serve the Qwen2.5 model with yarn rope_scaling.

# File: test_legacy_yarn_override.sh
vllm serve Qwen/Qwen2.5-32B-Instruct \
    --tensor-parallel-size 8 \
    --max-model-len 141072 \
    --gpu-memory-utilization 0.5 \
    --hf-overrides '{"rope_scaling": {"factor": 4.0, "type":"yarn", "original_max_position_embeddings":35268}}'
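
The same scenario can also be reproduced offline through the Python API, assuming the LLM constructor accepts an hf_overrides argument equivalent to the CLI flag; this is a sketch, not part of the tested plan above.

# Hypothetical offline variant of the test above; assumes LLM() accepts
# hf_overrides and forwards it like the --hf-overrides CLI flag.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct",
    tensor_parallel_size=8,
    max_model_len=141072,
    gpu_memory_utilization=0.5,
    hf_overrides={"rope_scaling": {"factor": 4.0, "type": "yarn",
                                   "original_max_position_embeddings": 35268}},
)
# Before the fix this raised KeyError: 'rope_type' during config validation.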

Test Result

Before the fix:

$ bash test_legacy_yarn_override.sh
DEBUG 07-12 08:53:25 [__init__.py:31] No plugins for group vllm.platform_plugins found.
DEBUG 07-12 08:53:25 [__init__.py:35] Checking if TPU platform is available.
DEBUG 07-12 08:53:25 [__init__.py:45] TPU platform is not available because: No module named 'libtpu'
DEBUG 07-12 08:53:25 [__init__.py:52] Checking if CUDA platform is available.
DEBUG 07-12 08:53:25 [__init__.py:72] Confirmed CUDA platform is available.
DEBUG 07-12 08:53:25 [__init__.py:100] Checking if ROCm platform is available.
DEBUG 07-12 08:53:25 [__init__.py:114] ROCm platform is not available because: No module named 'amdsmi'
DEBUG 07-12 08:53:25 [__init__.py:121] Checking if HPU platform is available.
DEBUG 07-12 08:53:25 [__init__.py:128] HPU platform is not available because habana_frameworks is not found.
DEBUG 07-12 08:53:25 [__init__.py:138] Checking if XPU platform is available.
DEBUG 07-12 08:53:25 [__init__.py:157] XPU platform is not available because: No module named 'intel_extension_for_pytorch'
DEBUG 07-12 08:53:25 [__init__.py:164] Checking if CPU platform is available.
DEBUG 07-12 08:53:25 [__init__.py:186] Checking if Neuron platform is available.
DEBUG 07-12 08:53:25 [__init__.py:52] Checking if CUDA platform is available.
DEBUG 07-12 08:53:25 [__init__.py:72] Confirmed CUDA platform is available.
INFO 07-12 08:53:25 [__init__.py:253] Automatically detected platform cuda.
DEBUG 07-12 08:53:29 [utils.py:162] Setting VLLM_WORKER_MULTIPROC_METHOD to 'spawn'
DEBUG 07-12 08:53:29 [__init__.py:39] Available plugins for group vllm.general_plugins:
DEBUG 07-12 08:53:29 [__init__.py:41] - lora_filesystem_resolver -> vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
DEBUG 07-12 08:53:29 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 07-12 08:53:29 [api_server.py:1639] vLLM API server version 0.1.dev7658+gb639327
INFO 07-12 08:53:29 [cli_args.py:325] non-default args: {'model': '/mnt/longcontext/models/siyuan/llama3/Qwen2.5-32B-Instruct', 'max_model_len': 141072, 'hf_overrides': {'rope_scaling': {'factor': 4.0, 'type': 'yarn', 'original_max_position_embeddings': 35268}}, 'tensor_parallel_size': 8, 'gpu_memory_utilization': 0.5}
WARNING 07-12 08:53:29 [__init__.py:2703] Found ulimit of 4096 and failed to automatically increase with error current limit exceeds maximum limit. This can cause fd limit errors like `OSError: [Errno 24] Too many open files`. Consider increasing with ulimit -n
DEBUG 07-12 08:53:30 [config.py:541] Overriding HF config with {'rope_scaling': {'factor': 4.0, 'type': 'yarn', 'original_max_position_embeddings': 35268}}
INFO 07-12 08:53:37 [config.py:852] This model supports multiple tasks: {'classify', 'reward', 'embed', 'generate'}. Defaulting to 'generate'.
Traceback (most recent call last):
  File "/home/aiscuser/.conda/envs/vllm_test/bin/vllm", line 8, in <module>
    sys.exit(main())
  File "/home/aiscuser/vllm/vllm/entrypoints/cli/main.py", line 65, in main
    args.dispatch_function(args)
  File "/home/aiscuser/vllm/vllm/entrypoints/cli/serve.py", line 57, in cmd
    uvloop.run(run_server(args))
  File "/home/aiscuser/.conda/envs/vllm_test/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/home/aiscuser/.conda/envs/vllm_test/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/home/aiscuser/vllm/vllm/entrypoints/openai/api_server.py", line 1675, in run_server
    await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
  File "/home/aiscuser/vllm/vllm/entrypoints/openai/api_server.py", line 1695, in run_server_worker
    async with build_async_engine_client(args, client_config) as engine_client:
  File "/home/aiscuser/.conda/envs/vllm_test/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/home/aiscuser/vllm/vllm/entrypoints/openai/api_server.py", line 158, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/home/aiscuser/.conda/envs/vllm_test/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/home/aiscuser/vllm/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client_from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context=usage_context)
  File "/home/aiscuser/vllm/vllm/engine/arg_utils.py", line 1104, in create_engine_config
    model_config = self.create_model_config()
  File "/home/aiscuser/vllm/vllm/engine/arg_utils.py", line 976, in create_model_config
    return ModelConfig(
  File "/home/aiscuser/.conda/envs/vllm_test/lib/python3.10/site-packages/pydantic/_internal/_dataclasses.py", line 123, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
  File "/home/aiscuser/vllm/vllm/config.py", line 617, in __post_init__
    self.max_model_len = self.get_and_verify_max_len(self.max_model_len)
  File "/home/aiscuser/vllm/vllm/config.py", line 1492, in get_and_verify_max_len
    max_model_len = _get_and_verify_max_len(
  File "/home/aiscuser/vllm/vllm/config.py", line 3512, in _get_and_verify_max_len
    rope_type = rope_scaling["rope_type"]
KeyError: 'rope_type'

After the fix:

$ bash test_legacy_yarn_override.sh
DEBUG 07-12 08:55:26 [__init__.py:31] No plugins for group vllm.platform_plugins found.
DEBUG 07-12 08:55:26 [__init__.py:35] Checking if TPU platform is available.
DEBUG 07-12 08:55:26 [__init__.py:45] TPU platform is not available because: No module named 'libtpu'
DEBUG 07-12 08:55:26 [__init__.py:52] Checking if CUDA platform is available.
DEBUG 07-12 08:55:26 [__init__.py:72] Confirmed CUDA platform is available.
DEBUG 07-12 08:55:26 [__init__.py:100] Checking if ROCm platform is available.
DEBUG 07-12 08:55:26 [__init__.py:114] ROCm platform is not available because: No module named 'amdsmi'
DEBUG 07-12 08:55:26 [__init__.py:121] Checking if HPU platform is available.
DEBUG 07-12 08:55:26 [__init__.py:128] HPU platform is not available because habana_frameworks is not found.
DEBUG 07-12 08:55:26 [__init__.py:138] Checking if XPU platform is available.
DEBUG 07-12 08:55:26 [__init__.py:157] XPU platform is not available because: No module named 'intel_extension_for_pytorch'
DEBUG 07-12 08:55:26 [__init__.py:164] Checking if CPU platform is available.
DEBUG 07-12 08:55:26 [__init__.py:186] Checking if Neuron platform is available.
DEBUG 07-12 08:55:26 [__init__.py:52] Checking if CUDA platform is available.
DEBUG 07-12 08:55:26 [__init__.py:72] Confirmed CUDA platform is available.
INFO 07-12 08:55:26 [__init__.py:253] Automatically detected platform cuda.
DEBUG 07-12 08:55:29 [utils.py:162] Setting VLLM_WORKER_MULTIPROC_METHOD to 'spawn'
DEBUG 07-12 08:55:29 [__init__.py:39] Available plugins for group vllm.general_plugins:
DEBUG 07-12 08:55:29 [__init__.py:41] - lora_filesystem_resolver -> vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
DEBUG 07-12 08:55:29 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 07-12 08:55:30 [api_server.py:1639] vLLM API server version 0.1.dev7658+gb639327
INFO 07-12 08:55:30 [cli_args.py:325] non-default args: {'model': '/mnt/longcontext/models/siyuan/llama3/Qwen2.5-32B-Instruct', 'max_model_len': 141072, 'hf_overrides': {'rope_scaling': {'factor': 4.0, 'type': 'yarn', 'original_max_position_embeddings': 35268}}, 'tensor_parallel_size': 8, 'gpu_memory_utilization': 0.5}
WARNING 07-12 08:55:30 [__init__.py:2703] Found ulimit of 4096 and failed to automatically increase with error current limit exceeds maximum limit. This can cause fd limit errors like `OSError: [Errno 24] Too many open files`. Consider increasing with ulimit -n
DEBUG 07-12 08:55:30 [config.py:541] Overriding HF config with {'rope_scaling': {'factor': 4.0, 'type': 'yarn', 'original_max_position_embeddings': 35268}}
INFO 07-12 08:55:30 [config.py:241] Replacing legacy 'type' key with 'rope_type'

(Optional) Documentation Update

@github-actions (bot) commented:

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀


@gemini-code-assist (bot) left a comment


Summary of Changes

Hello @Wangmerlyn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical compatibility issue related to how rope_scaling parameters are handled when provided through command-line overrides. By adjusting the order of configuration processing, it ensures that legacy rope_scaling formats are correctly interpreted, thereby improving backward compatibility and preventing runtime errors during model initialization.

Highlights

  • Bugfix for rope_scaling with --hf-overrides: This PR resolves a KeyError: 'rope_type' that occurred when legacy rope_scaling parameters (e.g., using 'type' instead of 'rope_type') were passed via the --hf-overrides command-line argument, preventing the model from loading correctly.
  • Improved Configuration Processing Order: The patch_rope_scaling function, which converts legacy rope_scaling formats, is now explicitly invoked after any Hugging Face configuration overrides (--hf-overrides) have been applied. This ensures that the configuration is correctly normalized regardless of its source.


@gemini-code-assist (bot) left a comment


Code Review

This pull request correctly fixes a KeyError that occurred when using legacy rope_scaling parameters with --hf-overrides. The change to invoke patch_rope_scaling after applying the Hugging Face overrides is logical and directly addresses the issue, as demonstrated by the provided test case.

I've added one comment regarding a pre-existing circular dependency that this change touches upon. Addressing it would improve the long-term maintainability of the configuration logic.

Overall, this is a good fix. Thank you for contributing!

@DarkLight1337 (Member) commented:

I think it would be cleaner to move the application of hf_overrides into the get_config function so we can handle this in a unified manner.

@Wangmerlyn (Contributor, Author) replied:

I think it would be cleaner to move the application of hf_overrides into the get_config function so we can handle this in a unified manner.

Thank you for the kind suggestion. I'll try moving the hf_overrides to get_config asap!
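
A rough sketch of that direction (the function and parameter names are assumptions for illustration, not the actual get_config signature):

# Illustrative only: apply hf_overrides inside a single config-loading entry
# point so legacy-key normalization always runs after overrides. The names
# below are assumptions, not vLLM's real API.
from typing import Any, Optional


def _load_hf_config(model: str) -> dict[str, Any]:
    # Stand-in for reading the model's config.json from disk or the hub.
    return {"model_type": "qwen2", "max_position_embeddings": 32768}


def _patch_rope_scaling(config: dict[str, Any]) -> None:
    # Stand-in for the legacy 'type' -> 'rope_type' normalization.
    rope_scaling = config.get("rope_scaling")
    if rope_scaling and "type" in rope_scaling and "rope_type" not in rope_scaling:
        rope_scaling["rope_type"] = rope_scaling.pop("type")


def get_config(model: str,
               hf_overrides: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    config = _load_hf_config(model)
    if hf_overrides:
        config.update(hf_overrides)   # overrides applied in one central place
    _patch_rope_scaling(config)       # normalization always runs last
    return config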

@DarkLight1337 (Member) left a comment


LGTM, thanks!

@DarkLight1337 enabled auto-merge (squash) July 13, 2025 02:36
@github-actions (bot) added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jul 13, 2025
@vllm-bot vllm-bot merged commit 247102f into vllm-project:main Jul 13, 2025
71 of 73 checks passed
x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 27, 2025
