
Conversation

@Wangmerlyn (Contributor) commented on Jul 12, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after comparison of the results, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

This PR addresses a compatibility issue when using legacy rope_scaling parameters via --hf-overrides.

When specifying rope_scaling directly in the model's config.json, legacy parameters like:

"rope_scaling": {
  "type": "yarn",
  "factor": 4.0,
  "original_max_position_embeddings": 35268
}

are handled correctly by the patch_rope_scaling function, which maps "type" to "rope_type".

However, when passing the same configuration via --hf-overrides, a KeyError: 'rope_type' is raised:

rope_type = rope_scaling["rope_type"]
KeyError: 'rope_type'

This PR ensures that patch_rope_scaling is invoked after applying hf-overrides, so that legacy parameters are properly converted even when passed via command-line overrides.

This improves backward compatibility and prevents runtime errors when using the --hf-overrides mechanism with legacy rope_scaling configs.
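For illustration, the ordering the fix relies on can be sketched as follows. This is a minimal, self-contained sketch, not vLLM's actual implementation; the helper names apply_hf_overrides and normalize_rope_scaling are hypothetical and only mirror the behavior described above: merge the --hf-overrides dictionary into the HF config first, then normalize the legacy "type" key to "rope_type".

# Minimal sketch (hypothetical helpers, not vLLM's actual code): legacy-key
# normalization has to run *after* the --hf-overrides dict is merged in.
from typing import Any


def apply_hf_overrides(config: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Shallow-merge command-line overrides into the HF config dict."""
    merged = dict(config)
    merged.update(overrides)
    return merged


def normalize_rope_scaling(config: dict[str, Any]) -> dict[str, Any]:
    """Map the legacy 'type' key to 'rope_type', as patch_rope_scaling does."""
    rope_scaling = config.get("rope_scaling")
    if rope_scaling and "type" in rope_scaling and "rope_type" not in rope_scaling:
        rope_scaling["rope_type"] = rope_scaling.pop("type")
    return config


config = {"max_position_embeddings": 32768}
overrides = {"rope_scaling": {"type": "yarn", "factor": 4.0,
                              "original_max_position_embeddings": 35268}}

# Normalizing after the merge is what this PR guarantees; normalizing only the
# on-disk config would leave the override carrying 'type' and trigger the
# KeyError: 'rope_type' shown above.
config = normalize_rope_scaling(apply_hf_overrides(config, overrides))
assert config["rope_scaling"]["rope_type"] == "yarn"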

Test Plan

Use the following command to serve the Qwen2.5 model with yarn rope_scaling.

# File: test_legacy_yarn_override.sh
vllm serve Qwen/Qwen2.5-32B-Instruct \
    --tensor-parallel-size 8 \
    --max-model-len 141072 \
    --gpu-memory-utilization 0.5 \
    --hf-overrides '{"rope_scaling": {"factor": 4.0, "type":"yarn", "original_max_position_embeddings":35268}}'
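
The same scenario can also be reproduced offline through the Python API, assuming the LLM constructor accepts an hf_overrides argument equivalent to the CLI flag; this is a sketch, not part of the tested plan above.

# Hypothetical offline variant of the test above; assumes LLM() accepts
# hf_overrides and forwards it like the --hf-overrides CLI flag.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct",
    tensor_parallel_size=8,
    max_model_len=141072,
    gpu_memory_utilization=0.5,
    hf_overrides={"rope_scaling": {"factor": 4.0, "type": "yarn",
                                   "original_max_position_embeddings": 35268}},
)
# Before the fix this raised KeyError: 'rope_type' during config validation.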

Test Result

Before the fix:

$ bash test_legacy_yarn_override.sh
DEBUG 07-12 08:53:25 [__init__.py:31] No plugins for group vllm.platform_plugins found.
DEBUG 07-12 08:53:25 [__init__.py:35] Checking if TPU platform is available.
DEBUG 07-12 08:53:25 [__init__.py:45] TPU platform is not available because: No module named 'libtpu'
DEBUG 07-12 08:53:25 [__init__.py:52] Checking if CUDA platform is available.
DEBUG 07-12 08:53:25 [__init__.py:72] Confirmed CUDA platform is available.
DEBUG 07-12 08:53:25 [__init__.py:100] Checking if ROCm platform is available.
DEBUG 07-12 08:53:25 [__init__.py:114] ROCm platform is not available because: No module named 'amdsmi'
DEBUG 07-12 08:53:25 [__init__.py:121] Checking if HPU platform is available.
DEBUG 07-12 08:53:25 [__init__.py:128] HPU platform is not available because habana_frameworks is not found.
DEBUG 07-12 08:53:25 [__init__.py:138] Checking if XPU platform is available.
DEBUG 07-12 08:53:25 [__init__.py:157] XPU platform is not available because: No module named 'intel_extension_for_pytorch'
DEBUG 07-12 08:53:25 [__init__.py:164] Checking if CPU platform is available.
DEBUG 07-12 08:53:25 [__init__.py:186] Checking if Neuron platform is available.
DEBUG 07-12 08:53:25 [__init__.py:52] Checking if CUDA platform is available.
DEBUG 07-12 08:53:25 [__init__.py:72] Confirmed CUDA platform is available.
INFO 07-12 08:53:25 [__init__.py:253] Automatically detected platform cuda.
DEBUG 07-12 08:53:29 [utils.py:162] Setting VLLM_WORKER_MULTIPROC_METHOD to 'spawn'
DEBUG 07-12 08:53:29 [__init__.py:39] Available plugins for group vllm.general_plugins:
DEBUG 07-12 08:53:29 [__init__.py:41] - lora_filesystem_resolver -> vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
DEBUG 07-12 08:53:29 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 07-12 08:53:29 [api_server.py:1639] vLLM API server version 0.1.dev7658+gb639327
INFO 07-12 08:53:29 [cli_args.py:325] non-default args: {'model': '/mnt/longcontext/models/siyuan/llama3/Qwen2.5-32B-Instruct', 'max_model_len': 141072, 'hf_overrides': {'rope_scaling': {'factor': 4.0, 'type': 'yarn', 'original_max_position_embeddings': 35268}}, 'tensor_parallel_size': 8, 'gpu_memory_utilization': 0.5}
WARNING 07-12 08:53:29 [__init__.py:2703] Found ulimit of 4096 and failed to automatically increase with error current limit exceeds maximum limit. This can cause fd limit errors like `OSError: [Errno 24] Too many open files`. Consider increasing with ulimit -n
DEBUG 07-12 08:53:30 [config.py:541] Overriding HF config with {'rope_scaling': {'factor': 4.0, 'type': 'yarn', 'original_max_position_embeddings': 35268}}
INFO 07-12 08:53:37 [config.py:852] This model supports multiple tasks: {'classify', 'reward', 'embed', 'generate'}. Defaulting to 'generate'.
Traceback (most recent call last):
  File "/home/aiscuser/.conda/envs/vllm_test/bin/vllm", line 8, in <module>
    sys.exit(main())
  File "/home/aiscuser/vllm/vllm/entrypoints/cli/main.py", line 65, in main
    args.dispatch_function(args)
  File "/home/aiscuser/vllm/vllm/entrypoints/cli/serve.py", line 57, in cmd
    uvloop.run(run_server(args))
  File "/home/aiscuser/.conda/envs/vllm_test/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/home/aiscuser/.conda/envs/vllm_test/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/home/aiscuser/vllm/vllm/entrypoints/openai/api_server.py", line 1675, in run_server
    await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
  File "/home/aiscuser/vllm/vllm/entrypoints/openai/api_server.py", line 1695, in run_server_worker
    async with build_async_engine_client(args, client_config) as engine_client:
  File "/home/aiscuser/.conda/envs/vllm_test/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/home/aiscuser/vllm/vllm/entrypoints/openai/api_server.py", line 158, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/home/aiscuser/.conda/envs/vllm_test/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/home/aiscuser/vllm/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client_from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context=usage_context)
  File "/home/aiscuser/vllm/vllm/engine/arg_utils.py", line 1104, in create_engine_config
    model_config = self.create_model_config()
  File "/home/aiscuser/vllm/vllm/engine/arg_utils.py", line 976, in create_model_config
    return ModelConfig(
  File "/home/aiscuser/.conda/envs/vllm_test/lib/python3.10/site-packages/pydantic/_internal/_dataclasses.py", line 123, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
  File "/home/aiscuser/vllm/vllm/config.py", line 617, in __post_init__
    self.max_model_len = self.get_and_verify_max_len(self.max_model_len)
  File "/home/aiscuser/vllm/vllm/config.py", line 1492, in get_and_verify_max_len
    max_model_len = _get_and_verify_max_len(
  File "/home/aiscuser/vllm/vllm/config.py", line 3512, in _get_and_verify_max_len
    rope_type = rope_scaling["rope_type"]
KeyError: 'rope_type'

After the fix:

$ bash test_legacy_yarn_override.sh
DEBUG 07-12 08:55:26 [__init__.py:31] No plugins for group vllm.platform_plugins found.
DEBUG 07-12 08:55:26 [__init__.py:35] Checking if TPU platform is available.
DEBUG 07-12 08:55:26 [__init__.py:45] TPU platform is not available because: No module named 'libtpu'
DEBUG 07-12 08:55:26 [__init__.py:52] Checking if CUDA platform is available.
DEBUG 07-12 08:55:26 [__init__.py:72] Confirmed CUDA platform is available.
DEBUG 07-12 08:55:26 [__init__.py:100] Checking if ROCm platform is available.
DEBUG 07-12 08:55:26 [__init__.py:114] ROCm platform is not available because: No module named 'amdsmi'
DEBUG 07-12 08:55:26 [__init__.py:121] Checking if HPU platform is available.
DEBUG 07-12 08:55:26 [__init__.py:128] HPU platform is not available because habana_frameworks is not found.
DEBUG 07-12 08:55:26 [__init__.py:138] Checking if XPU platform is available.
DEBUG 07-12 08:55:26 [__init__.py:157] XPU platform is not available because: No module named 'intel_extension_for_pytorch'
DEBUG 07-12 08:55:26 [__init__.py:164] Checking if CPU platform is available.
DEBUG 07-12 08:55:26 [__init__.py:186] Checking if Neuron platform is available.
DEBUG 07-12 08:55:26 [__init__.py:52] Checking if CUDA platform is available.
DEBUG 07-12 08:55:26 [__init__.py:72] Confirmed CUDA platform is available.
INFO 07-12 08:55:26 [__init__.py:253] Automatically detected platform cuda.
DEBUG 07-12 08:55:29 [utils.py:162] Setting VLLM_WORKER_MULTIPROC_METHOD to 'spawn'
DEBUG 07-12 08:55:29 [__init__.py:39] Available plugins for group vllm.general_plugins:
DEBUG 07-12 08:55:29 [__init__.py:41] - lora_filesystem_resolver -> vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
DEBUG 07-12 08:55:29 [__init__.py:44] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 07-12 08:55:30 [api_server.py:1639] vLLM API server version 0.1.dev7658+gb639327
INFO 07-12 08:55:30 [cli_args.py:325] non-default args: {'model': '/mnt/longcontext/models/siyuan/llama3/Qwen2.5-32B-Instruct', 'max_model_len': 141072, 'hf_overrides': {'rope_scaling': {'factor': 4.0, 'type': 'yarn', 'original_max_position_embeddings': 35268}}, 'tensor_parallel_size': 8, 'gpu_memory_utilization': 0.5}
WARNING 07-12 08:55:30 [__init__.py:2703] Found ulimit of 4096 and failed to automatically increase with error current limit exceeds maximum limit. This can cause fd limit errors like `OSError: [Errno 24] Too many open files`. Consider increasing with ulimit -n
DEBUG 07-12 08:55:30 [config.py:541] Overriding HF config with {'rope_scaling': {'factor': 4.0, 'type': 'yarn', 'original_max_position_embeddings': 35268}}
INFO 07-12 08:55:30 [config.py:241] Replacing legacy 'type' key with 'rope_type'

(Optional) Documentation Update

@github-actions (bot) commented:

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀


@gemini-code-assist (bot) left a comment


Summary of Changes

Hello @Wangmerlyn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical compatibility issue related to how rope_scaling parameters are handled when provided through command-line overrides. By adjusting the order of configuration processing, it ensures that legacy rope_scaling formats are correctly interpreted, thereby improving backward compatibility and preventing runtime errors during model initialization.

Highlights

  • Bugfix for rope_scaling with --hf-overrides: This PR resolves a KeyError: 'rope_type' that occurred when legacy rope_scaling parameters (e.g., using 'type' instead of 'rope_type') were passed via the --hf-overrides command-line argument, preventing the model from loading correctly.
  • Improved Configuration Processing Order: The patch_rope_scaling function, which converts legacy rope_scaling formats, is now explicitly invoked after any Hugging Face configuration overrides (--hf-overrides) have been applied. This ensures that the configuration is correctly normalized regardless of its source.


@gemini-code-assist (bot) left a comment


Code Review

This pull request correctly fixes a KeyError that occurred when using legacy rope_scaling parameters with --hf-overrides. The change to invoke patch_rope_scaling after applying the Hugging Face overrides is logical and directly addresses the issue, as demonstrated by the provided test case.

I've added one comment regarding a pre-existing circular dependency that this change touches upon. Addressing it would improve the long-term maintainability of the configuration logic.

Overall, this is a good fix. Thank you for contributing!

@DarkLight1337 (Member) commented:

I think it would be cleaner to move the application of hf_overrides into the get_config function so we can handle this in a unified manner.

@Wangmerlyn (Contributor, Author) replied:

I think it would be cleaner to move the application of hf_overrides into the get_config function so we can handle this in a unified manner.

Thank you for the kind suggestion. I'll try moving the hf_overrides to get_config asap!
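
A rough sketch of that direction (the function and parameter names are assumptions for illustration, not the actual get_config signature):

# Illustrative only: apply hf_overrides inside a single config-loading entry
# point so legacy-key normalization always runs after overrides. The names
# below are assumptions, not vLLM's real API.
from typing import Any, Optional


def _load_hf_config(model: str) -> dict[str, Any]:
    # Stand-in for reading the model's config.json from disk or the hub.
    return {"model_type": "qwen2", "max_position_embeddings": 32768}


def _patch_rope_scaling(config: dict[str, Any]) -> None:
    # Stand-in for the legacy 'type' -> 'rope_type' normalization.
    rope_scaling = config.get("rope_scaling")
    if rope_scaling and "type" in rope_scaling and "rope_type" not in rope_scaling:
        rope_scaling["rope_type"] = rope_scaling.pop("type")


def get_config(model: str,
               hf_overrides: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    config = _load_hf_config(model)
    if hf_overrides:
        config.update(hf_overrides)   # overrides applied in one central place
    _patch_rope_scaling(config)       # normalization always runs last
    return config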

@DarkLight1337 (Member) left a comment


LGTM, thanks!

@DarkLight1337 enabled auto-merge (squash) July 13, 2025 02:36
@github-actions (bot) added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jul 13, 2025
@vllm-bot vllm-bot merged commit 247102f into vllm-project:main Jul 13, 2025
71 of 73 checks passed
x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 27, 2025
