MoE support for turbomind #2621
Conversation
@lzhangzz it seems the conversation cannot be stopped on 4-bit models during OC evaluation, for example internlm2_5-7b-chat-4bits
@lzhangzz convert error: `lmdeploy convert internlm /nvme/qa_test_models/internlm/internlm-chat-20b --dst-path /nvme/qa_test_models/autotest_model/workspace_internlm/internlm-chat-20b --tp 2`
internvl models raise error on V100
```shell
lmdeploy chat /nvme/qa_test_models/openbmb/MiniCPM-V-2_6 --backend turbomind --session-len 4096 --tp 2
lmdeploy chat /nvme/qa_test_models/internlm/internlm2_5-7b-chat-inner-gptq --backend turbomind --session-len 4096 --tp 1 --model-format gptq
```
Now that MoE is supported, could turbomind also support Multi-LoRA? Many thanks 🙏
* initial moe support
* dynamic grouped gemm
* benchmark
* moe benchmark
* moe sampling
* split-k
* refactor tuning
* simplify
* n-major weight
* add `num` for `MatrixLayout`
* packed rows
* packed cols
* dispatch for packed rows
* w4a16 moe
* refactor model loading
* fix pytorch loader
* refactor
* dispatch w4a16 moe
* fix loader
* add comment
* fix msvc build
* fix msvc build
* fix msvc build
* fix ut
* fix ut
* fix p-lora
* add all support arches
* minor
* fix lint
* fix lint
* fix lint
* fix ut
* bf16 support
* minor
* refactor
* fix lint
* fix ut
* minor
* minor
* minor
* fix inter_size config
* load with non-standard filenames
* fix loader
* fix missing default param
* defer the loading of misc weights for safetensors
* fix conversion
* fix deepseek-vl
* verify model config
* pad inter size by group size and tp
* fix minicpm attn bias & ignore un-needed bias
* set `attn_bias` based on minicpm version
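For readers unfamiliar with the pattern behind "moe sampling" and "dynamic grouped gemm", the core idea of an MoE layer is to route each token to its top-k experts and batch all tokens assigned to the same expert into one GEMM. The following is a minimal NumPy sketch of that routing, not turbomind's actual CUDA implementation; the function name `moe_forward` and all shapes are illustrative assumptions.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=2):
    """Illustrative MoE forward pass (not turbomind's implementation).

    x:         (n_tokens, d)            token activations
    gate_w:    (d, n_experts)           router/gating weights
    expert_ws: (n_experts, d, d)        one weight matrix per expert
    """
    n_experts = gate_w.shape[1]
    logits = x @ gate_w                                   # (n_tokens, n_experts)
    topk_idx = np.argsort(-logits, axis=1)[:, :top_k]     # chosen experts per token
    # Softmax over only the selected experts to get combine weights.
    sel = np.take_along_axis(logits, topk_idx, axis=1)
    weights = np.exp(sel - sel.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    out = np.zeros_like(x)
    # "Grouped" dispatch: gather every token routed to expert e and run
    # a single matmul for that group, then scatter the weighted results back.
    for e in range(n_experts):
        rows, cols = np.nonzero(topk_idx == e)
        if rows.size:
            out[rows] += weights[rows, cols, None] * (x[rows] @ expert_ws[e])
    return out
```

A real kernel fuses the gather/scatter and runs the per-expert GEMMs as one grouped GEMM on the GPU; the sketch only shows the token-to-expert bookkeeping.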
No description provided.