Adds vLLM as Option for Local App #693
Merged
Commits
13 commits
7dbd31d
Update local-apps.ts
EliMCosta 32560d8
Update packages/tasks/src/local-apps.ts
EliMCosta f97747a
Update packages/tasks/src/local-apps.ts
EliMCosta 82233de
Update local-apps.ts
EliMCosta 2123430
Validation for `config.quantization_config.quant_method`
julien-c 43de0e9
Update local-apps.ts
EliMCosta 8557110
Update packages/tasks/src/local-apps.ts
EliMCosta 2c3c6c2
Merge branch 'main' into patch-4
krampstudio 17ad182
fix: udpate snippets
krampstudio 2bb1bc1
Update packages/tasks/src/local-apps.ts
krampstudio d63b7cb
fix: rely only on gguf tag
krampstudio 6fd56ef
Merge branch 'main' into patch-4
krampstudio 0308303
Merge branch 'main' into patch-4
pcuenca
how would you define those methods?
In fact, the suggested vLLM method deploys the non-quantized version from the Hugging Face repository. All the "text-generation" examples in the code are GGUF. Any suggestions?
Thank you for the PR! Concretely, we support a set of `architectures`, which is readable from the model data:
huggingface.js/packages/tasks/src/model-data.ts, line 40 in 04a0eb4
https://github.com/vllm-project/vllm/blob/757b62c49560baa6f294310a53032348a0d95939/vllm/model_executor/models/__init__.py#L13-L63
And for the quantization method, we can read `config.quantization_config.quant_method`, for which we support awq, gptq, aqlm, and marlin:
https://huggingface.co/TheBloke/zephyr-7B-alpha-AWQ/blob/main/config.json#L28
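To make the two criteria above concrete, here is a rough sketch of how such a check could look. It is not the code from this PR: `ModelInfo` is a simplified, hypothetical stand-in for the real model data type, `isVllmCandidate` is an illustrative name, and the architecture set is only a small sample of the list in the vLLM file linked above. The quantization methods are the four named in the comment.

```typescript
// Hypothetical, simplified shape; the real type lives in
// packages/tasks/src/model-data.ts and has many more fields.
interface ModelInfo {
	config?: {
		architectures?: string[];
		quantization_config?: { quant_method?: string };
	};
}

// Assumption: illustrative subset only. The authoritative list is the
// vLLM model registry linked in the comment above.
const VLLM_ARCHITECTURES = new Set(["LlamaForCausalLM", "MistralForCausalLM"]);

// The quantization methods named in the comment above.
const VLLM_QUANT_METHODS = new Set(["awq", "gptq", "aqlm", "marlin"]);

function isVllmCandidate(model: ModelInfo): boolean {
	const architectures = model.config?.architectures ?? [];
	// At least one declared architecture must be in the supported set.
	if (!architectures.some((arch) => VLLM_ARCHITECTURES.has(arch))) {
		return false;
	}
	const quantMethod = model.config?.quantization_config?.quant_method;
	// Unquantized models pass; quantized ones need a supported method.
	return quantMethod === undefined || VLLM_QUANT_METHODS.has(quantMethod);
}
```

Treating a missing `quant_method` as acceptable is a design assumption here, on the grounds that an unquantized checkpoint has no `quantization_config` at all.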
awesome @simon-mo, super clear!
i've pushed 2123430 on this PR to type `config.quantization_config.quant_method`, which we now parse & pass from the Hub
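For readers wondering what "parse" might involve, here is a minimal sketch of pulling `quant_method` out of an untyped `config.json` payload. The helper name `readQuantMethod` is hypothetical, and this is not the Hub's actual parsing code, which is not shown in this thread. It only illustrates narrowing an `unknown` value down to the typed field.

```typescript
// Hypothetical helper: extract quantization_config.quant_method from an
// untyped config.json payload, returning undefined if it is absent or malformed.
function readQuantMethod(rawConfig: unknown): string | undefined {
	if (typeof rawConfig !== "object" || rawConfig === null) {
		return undefined;
	}
	const quantConfig = (rawConfig as Record<string, unknown>).quantization_config;
	if (typeof quantConfig !== "object" || quantConfig === null) {
		return undefined;
	}
	const method = (quantConfig as Record<string, unknown>).quant_method;
	// Only accept string values; anything else is treated as missing.
	return typeof method === "string" ? method : undefined;
}
```

Called on the parsed contents of a `config.json` that contains `"quantization_config": {"quant_method": "awq"}`, this returns `"awq"`.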
I made some changes and need your help reviewing them.