Add default 'auto' MODEL_IMPL_TYPE that resolves based on architecture #1255
kyuyeunk merged 3 commits into vllm-project:main
Conversation
@kyuyeunk Please review.
kyuyeunk left a comment
Wouldn't it be possible to move 'auto' into the 'match/case' as well?
It is possible to move it into the match-case, but then the code would be duplicated, including get_vllm_model, get_flax_model, and the fallback check. I think resolving first and then using the same code path is cleaner.
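The "resolve first, then dispatch" approach described above could look roughly like the sketch below. The `_VLLM_REQUIRED_ARCHITECTURES` name and the resolution rules come from this PR's description, but the function name and signature here are hypothetical, not the merged code:

```python
import os

# Architectures that require the vLLM implementation; per the PR
# description, this frozenset lives in model_loader.py.
_VLLM_REQUIRED_ARCHITECTURES = frozenset({"GptOssForCausalLM"})


def resolve_model_impl_type(architecture: str) -> str:
    """Resolve MODEL_IMPL_TYPE, expanding 'auto' to a concrete value.

    Hypothetical helper: reads the env var, and when it is 'auto',
    picks 'vllm' only for architectures that need it; everything
    else falls back to the flax_nnx implementation.
    """
    impl = os.environ.get("MODEL_IMPL_TYPE", "auto").lower()
    if impl != "auto":
        return impl
    if architecture in _VLLM_REQUIRED_ARCHITECTURES:
        return "vllm"
    return "flax_nnx"
```

With the value resolved up front, the rest of get_model() can use a single dispatch path for both explicit and 'auto' settings.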
I don't see any updates. Can you verify that the changes were pushed?
kyuyeunk left a comment
LGTM. Thank you for working on it!
Seems like CI is failing. Have you rebased the branch to HEAD?
I believe so, let me look into it.
Looks like tests/layers/vllm/test_awq.py is failing on the main branch as well.
Seems like it's due to an upstream change. Let me create a quick fix for this.
Please wait until this is merged: #1284
The PR has been merged. Please update the branch and try again.
- Add 'auto' as default value for MODEL_IMPL_TYPE env var
- For GptOssForCausalLM, 'auto' resolves to 'vllm' for better performance
- For all other architectures, 'auto' resolves to 'flax_nnx'
- Add _VLLM_REQUIRED_ARCHITECTURES frozenset in model_loader.py
- Use match/case pattern in get_model() for implementation selection
- Add tests for 'auto' resolution behavior

Signed-off-by: Xing Liu <xingliu14@gmail.com>
Please fix pre-commit failure.
Thank you so much for making this feature!
Description
- `auto` as default value for MODEL_IMPL_TYPE env var
- For GptOssForCausalLM, `auto` resolves to `vllm` for better performance
- For all other architectures, `auto` resolves to `flax_nnx`

Tests
pytest
Checklist
Before submitting this PR, please make sure: