
Request: add multi-GPU inference #23

Open
grizxlyzx opened this issue Jul 4, 2023 · 3 comments

Comments

@grizxlyzx

Thank you very much for your work.
The current version does not seem to support single-machine multi-GPU inference. When loading the unquantized chatglm2-6b-f16 weights on a machine with 8 x RTX 2080 Ti 11G, ggml detects all 8 GPUs, but every weight is dumped onto the first card, which leads to CUDA out of memory. The same main binary runs the q_4 weights without any problem.
OS: Ubuntu 20.04
Running `echo $CUDA_VISIBLE_DEVICES` in bash returns `0,1,2,3,4,5,6,7`.
Looking forward to a change that supports multi-GPU inference. Thanks!

@grizxlyzx (Author)

The reason it cannot use multiple GPUs is in chatglm.cpp: inside the `#ifdef GGML_USE_CUBLAS` blocks (there are two, one for glm and one for glm2), the loader sets `tensor->backend = GGML_BACKEND_GPU;`. This keeps ggml from splitting the model across multiple devices. Change both occurrences to `tensor->backend = GGML_BACKEND_GPU_SPLIT;` and ggml will split the model across the devices proportionally, according to each device's VRAM and performance, as sketched below.
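A minimal sketch of that change, for illustration only: the wrapper function and its name are hypothetical (chatglm.cpp does the assignment inline in its model loader); only the `ggml_tensor` field and the `GGML_BACKEND_*` enum values come from ggml itself.

```cpp
#include "ggml.h"

// Illustrative helper: decide where a freshly loaded weight tensor lives.
static void place_weight(struct ggml_tensor * tensor) {
#ifdef GGML_USE_CUBLAS
    // Before the fix: GGML_BACKEND_GPU pins the whole tensor to the primary
    // CUDA device, so a single 11 GB card must hold the entire f16 model,
    // which is what triggers the CUDA out-of-memory error.
    // tensor->backend = GGML_BACKEND_GPU;

    // After the fix: GGML_BACKEND_GPU_SPLIT lets ggml's cuBLAS backend shard
    // the tensor across all visible devices, in proportion to each device's
    // memory and performance.
    tensor->backend = GGML_BACKEND_GPU_SPLIT;
#else
    tensor->backend = GGML_BACKEND_CPU;  // CPU-only build: nothing to change
#endif
}
```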

@3wweiweiwu

The repo was updated recently, and now there seems to be only one occurrence of `tensor->backend = GGML_BACKEND_GPU`. I made the change and recompiled, but it now crashes every time I run it. Does the OP have any ideas? Thanks!

```
ggml_init_cublas: found 4 CUDA devices:
Device 0: Tesla M60, compute capability 5.2
Device 1: Tesla M60, compute capability 5.2
Device 2: Tesla M60, compute capability 5.2
Device 3: Tesla M60, compute capability 5.2
GGML_ASSERT: /home/xxs/chatglm.cpp/third_party/ggml/src/ggml-cuda.cu:6100: false
Aborted (core dumped)
```
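One plausible cause of that assert, offered purely as an assumption (the thread never confirms it): in ggml's CUDA backend of that period, split tensors were only handled by the matrix-multiplication path, so blanketly marking every tensor `GGML_BACKEND_GPU_SPLIT` (including 1-D norm weights and biases) can drive other operators into an unsupported branch that ends in `GGML_ASSERT(false)`. A hypothetical guard would shard only the 2-D weight matrices:

```cpp
#include "ggml.h"

// Hypothetical guard (an assumption, not a confirmed fix for this crash):
// shard only the 2-D matmul weights; keep everything else on one device.
static void place_weight_guarded(struct ggml_tensor * tensor) {
#ifdef GGML_USE_CUBLAS
    if (tensor->n_dims == 2) {
        tensor->backend = GGML_BACKEND_GPU_SPLIT;  // shard across all GPUs
    } else {
        tensor->backend = GGML_BACKEND_GPU;        // e.g. norm weights, biases
    }
#else
    tensor->backend = GGML_BACKEND_CPU;
#endif
}
```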

@XiaoYangWu

Did you get it working in the end?
