[Frontend] Dynamic RoPE scaling #4638
Merged
mgoin merged 5 commits into vllm-project:main on May 22, 2024
Conversation
mgoin reviewed May 14, 2024

Could this get merged? Especially with Llama 3 being very tolerant of RoPE scaling, this is very useful.
Member

Sure, this would be great to get in. Currently we don't have a test for it, though, and it might be too intensive with a real model, so I would like to see at least a unit test implemented. @sasha0552, could you add a test to
Contributor
Author

@mgoin, test added. Can you review? The failures are not related to this PR.
mgoin approved these changes May 21, 2024
Member
mgoin left a comment

Thank you @sasha0552! This is great. We just merged a fix for the failing tests, so please rebase and the tests should pass (#4944).
auto-merge was automatically disabled May 21, 2024 19:45
Head branch was pushed to by a user without write access
Contributor
Author

@mgoin, can you merge? All tests passed.
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request May 31, 2024
robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 8, 2024
joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024
robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jul 14, 2024
In #555, @WoosukKwon removed the ability to dynamically specify RoPE scaling, with the comment:
I don't understand why this feature was removed, so this PR brings it back. Specifying the RoPE scaling on the command line is very useful, because otherwise we have to manually modify the config.json, which may be managed on Hugging Face, so each model has to be forked just to set a different RoPE scaling.

FIX #4334
RoPE scaling allows using a larger context without further fine-tuning.
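As a rough illustration (a minimal sketch, not vLLM's actual implementation; the helper name and dimensions here are made up), linear RoPE scaling with type=linear simply divides position indices by the factor, so a doubled context maps back into the position range the model was trained on:

```python
def rope_angles(position, dim=8, base=10000.0, factor=1.0):
    """Rotary angles for one position (hypothetical helper, tiny dim for demo).

    Linear RoPE scaling: divide the position index by the factor before
    computing the per-dimension rotation angles.
    """
    scaled = position / factor
    return [scaled / base ** (2 * i / dim) for i in range(dim // 2)]

# With factor=2.0, token position 16383 produces the same angles that
# position 8191.5 would without scaling -- i.e. it stays inside the
# 0..8191 range an 8192-context model saw during training.
assert rope_angles(16383, factor=2.0) == rope_angles(8191.5, factor=1.0)
```

This is why the model tolerates a context of roughly factor times its native length without retraining: no position ever rotates further than it did during training.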
Summarization using meta-llama/Meta-Llama-3-8B-Instruct (which has a native context of 8192 tokens) with type=linear and factor=2.0: