[Frontend] Dynamic RoPE scaling #4638
Merged
mgoin merged 5 commits into vllm-project:main on May 22, 2024
Conversation
mgoin reviewed May 14, 2024

Could this get merged? Especially with Llama 3 being very tolerant of RoPE scaling, this is very useful.
Member

Sure, this would be great to get in. Currently we don't have a test for it, though, and it might be too intensive with a real model, so I would like to see at least a unit test implemented. @sasha0552, could you add a test to
Contributor
Author

@mgoin, test added. Can you review? The failures are not related to this PR.
mgoin approved these changes May 21, 2024
Member
mgoin left a comment

Thank you @sasha0552! This is great. We just merged a fix for the failing tests, so please rebase and the tests should pass (#4944).
auto-merge was automatically disabled May 21, 2024 19:45
Head branch was pushed to by a user without write access
Contributor
Author

@mgoin, can you merge? All tests passed.
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request May 31, 2024
robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 8, 2024
joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024
robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jul 14, 2024
In #555, @WoosukKwon removed the ability to dynamically specify RoPE scaling, with the comment:
I don't understand why this feature was removed, so this PR brings it back. Specifying the RoPE scaling on the command line is very useful, because otherwise we have to manually modify the config.json, which may be managed on Hugging Face, so each model has to be forked just to set a different RoPE scaling.

FIX #4334
RoPE scaling allows using a larger context without further fine-tuning.
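As a rough illustration (a minimal sketch, not vLLM's actual implementation; the helper name and dimensions here are made up), linear RoPE scaling with type=linear simply divides position indices by the factor, so a doubled context maps back into the position range the model was trained on:

```python
def rope_angles(position, dim=8, base=10000.0, factor=1.0):
    """Rotary angles for one position (hypothetical helper, tiny dim for demo).

    Linear RoPE scaling: divide the position index by the factor before
    computing the per-dimension rotation angles.
    """
    scaled = position / factor
    return [scaled / base ** (2 * i / dim) for i in range(dim // 2)]

# With factor=2.0, token position 16383 produces the same angles that
# position 8191.5 would without scaling -- i.e. it stays inside the
# 0..8191 range an 8192-context model saw during training.
assert rope_angles(16383, factor=2.0) == rope_angles(8191.5, factor=1.0)
```

This is why the model tolerates a context of roughly factor times its native length without retraining: no position ever rotates further than it did during training.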
Summarization using meta-llama/Meta-Llama-3-8B-Instruct (which has a native context of 8192 tokens) with type=linear and factor=2.0: