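Thank you very much for your work.
The current version does not seem to support multi-GPU inference on a single machine. On a machine with 8 x RTX 2080 TI 11G, when loading the unquantized chatglm2-6b-f16 weights, ggml detects all 8 GPUs, but all the weights are placed on the first card, which causes a CUDA out of memory error. The same main binary runs the q_4 weights without any problem.
System: Ubuntu 20.04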
Running this in bash:
$ echo $CUDA_VISIBLE_DEVICES
returns:
0,1,2,3,4,5,6,7
I look forward to a change that adds multi-GPU inference support. Thanks!
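For context, a rough back-of-the-envelope estimate suggests why the f16 weights would overflow a single 11 GB card while the 4-bit weights fit. This is only a sketch: the parameter count (about 6.2 billion) and the 4.5 bits per weight for the quantized file (the ggml q4_0 block layout) are assumptions, not values measured from the files above, and the KV cache and scratch buffers are ignored.

```python
# Rough estimate of weight memory versus the capacity of one GPU.
# Assumptions (not taken from the report): ~6.2e9 parameters, and 4.5 bits
# per weight for the 4-bit file (q4_0-style blocks: 32 weights + one f16 scale).

GIB = 1024 ** 3


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Return the approximate size of the weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


N_PARAMS = 6.2e9   # assumed parameter count for chatglm2-6b
CARD_GIB = 11.0    # memory of one RTX 2080 Ti, as stated in the report

f16_gib = weight_gib(N_PARAMS, 16.0)
q4_gib = weight_gib(N_PARAMS, 4.5)

print(f"f16 weights: {f16_gib:.2f} GiB, fits on one card: {f16_gib < CARD_GIB}")
print(f"q4 weights:  {q4_gib:.2f} GiB, fits on one card: {q4_gib < CARD_GIB}")
```

Under these assumptions the f16 weights alone are larger than one card's memory, so they can only load if they are split across several GPUs.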