Error during inference when loading the model on multiple GPUs #109
Comments
Looks like the socket communication broke down.
I also get an error with multi-GPU inference: chatglm2 fails, while llama2 seems fine. The log shows `/lightllm/lightllm/models/chatglm2/layer_infer/transformer_layer_infer.py:30: UserWarning: An output with one or more elements was resized since it had shape [6, 128], which does not match the required output shape [6, 256]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /opt/conda/conda-bld/pytorch_1678402412426/work/aten/src/ATen/native/Resize.cpp:26.)` followed by a remote traceback (`========= Remote Traceback (1) =========`).
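The warning above is PyTorch's generic complaint about an `out=` tensor whose shape does not match the computed result. A minimal sketch that reproduces it (illustrative only, not lightllm's actual kernel code; the shapes mimic a 256-wide result being written into a buffer sized for a tensor-parallel half):

```python
import torch

# Minimal reproduction of the deprecation warning: a non-empty `out=` tensor
# with the wrong shape gets silently resized by PyTorch, which then emits
# "An output with one or more elements was resized ..." exactly as in the log.
a = torch.randn(6, 64)
b = torch.randn(64, 256)
out = torch.empty(6, 128)   # sized as if the 256-wide output were split across 2 ranks
torch.mm(a, b, out=out)     # UserWarning is raised, then `out` becomes [6, 256]
```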
@UncleFB chatglm2's dimensions are a bit unusual for a small model, so multi-GPU isn't supported for it yet. Besides, for a model of this size, single-GPU performance is already quite good.
@llehtahw What do you make of this error?
It looks like the rpyc process crashed during prefill. The exception may have been swallowed, or it could be a segfault.
Running llama2 70b inference on 2 A800s, I hit this problem once.
@ChristineSeven It looks like the inference backend died. My blind guess is that the parameters were misconfigured and the inference process was killed by running out of GPU memory. Of course, if it's a shared machine, it's even more likely that someone killed your inference process by hand.
`--tp 2 --max_total_token_num 12000 --max_req_input_len 3000 --max_req_total_len 8192 --max_new_tokens 4096 --top_k 30 --top_p 0.85 --temperature 0.5 --do_sample True`
@ChristineSeven With this configuration there shouldn't be a GPU memory problem; something else is probably the cause. Could someone have done something in the background? In my experience, processes on shared machines get killed by accident quite easily.
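As a rough illustration of why this token budget should fit comfortably on two A800s, here is a back-of-the-envelope KV-cache estimate. It assumes llama2-70b's published architecture (80 layers, grouped-query attention with 8 KV heads of head_dim 128, fp16 cache) and is only a sketch, not lightllm's exact memory accounting:

```python
# Back-of-the-envelope KV-cache size for --max_total_token_num 12000 at --tp 2.
# Assumed llama2-70b dims: 80 layers, 8 KV heads, head_dim 128, fp16 (2 bytes).
layers, kv_heads, head_dim, bytes_per_elem = 80, 8, 128, 2
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V
max_total_token_num = 12000
total_gib = kv_bytes_per_token * max_total_token_num / 2**30
print(f"KV cache: ~{total_gib:.1f} GiB total, ~{total_gib / 2:.1f} GiB per GPU at tp=2")
# -> roughly 3.7 GiB in total, i.e. under 2 GiB per GPU for the KV cache alone.
```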
We have a scheduling system: once GPUs are allocated they are not reassigned to anyone else, and everyone submits jobs through the system. Unless a few people log into the machine directly, sharing shouldn't be an issue in theory.
@ChristineSeven Understood. There may well be some other problem, but we need a way to reproduce it before we can pin it down. Also, for long-running deployments I'd recommend triton 2.1.0; triton 2.0.0 has a memory-leak bug that can lead to crashes.
@hiworldwzj That could be it. I've noticed that GPU memory usage tends to grow over time while the server is running.
Adding the --shm-size option when starting the docker container fixes it.
Right, NCCL multi-GPU communication does need a fairly large shm-size when the container is started. Thanks for the correction; it's been so long I'd forgotten about this environment constraint.
@hiworldwzj @wx971025 It doesn't look like an shm-size problem. I gave llama 70b a 3T shm-size at startup and still hit this: `10-12 22:27:39: Task exception was never retrieved`
@ChristineSeven Sorry, in my case it really was the default shm-size being too small. I'd suggest running `free` to check the memory usage.
@wx971025 Right, the same symptom can have different causes. At the moment, with two GPUs, the process on one GPU has died and the other looks about the same. shm-size usage is around 258g.
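For reference, the container flag being discussed is docker's `--shm-size`, and a quick way to see how much shared memory a running container actually has is to look at `/dev/shm`. A minimal sketch, assuming the standard mount point:

```python
import shutil

# Report the size of the shared-memory filesystem that NCCL uses for
# intra-node communication; in most containers this is mounted at /dev/shm.
total, used, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm: {total / 2**30:.1f} GiB total, "
      f"{used / 2**30:.1f} GiB used, {free / 2**30:.1f} GiB free")
```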
Looking at the source code, it seems this spot was already expected to fail; there's a comment there: `# raise if exception`
@hiworldwzj Any suggestions for a fix?
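One possible direction, sketched here on the assumption that `ans` is the rpyc `AsyncResult` that model_rpc.py waits on: rpyc re-raises a remote exception when `.value` is read from an errored result, so reading the value after `wait()` would surface the failure instead of letting it disappear. This is only an illustration, not the project's actual fix:

```python
import asyncio

async def wait_and_raise(ans):
    """Hypothetical helper: wait for an rpyc AsyncResult off the event loop
    (the same asyncio.to_thread pattern the traceback shows), then read
    .value, which re-raises any remote exception locally instead of losing it."""
    await asyncio.to_thread(ans.wait)
    return ans.value
```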
Has anyone managed to solve this problem?
@CXH19940504 Because chatglm2's architecture is a bit unusual, multi-GPU isn't supported for it yet; single-GPU should work fine.
@CXH19940504 I took a look at the chatglm2 code today and it does look like there may be a problem; give me a bit of time to confirm and fix it.
@hiworldwzj Has the chatglm2 multi-GPU issue been fixed? On a machine with 8x 3090, the model loads successfully with two GPUs (but errors out during inference), while loading with 4 or 8 GPUs fails outright.
@chaizhongming chatglm2 can currently only run on a single GPU. There's a recent fix, not yet merged, that adds two-GPU support; supporting more GPUs will require further adaptation, so please wait for an update. That said, for a model of chatglm2's size, one or two GPUs should already be the most cost-effective option.
Hello,
Deployment and inference work fine on a single A800, but multi-GPU inference fails with the following error:
```
Task exception was never retrieved
future: <Task finished name='Task-6' coro=<RouterManager.loop_for_fwd() done, defined at /lightllm/lightllm/server/router/manager.py:88> exception=EOFError(ConnectionResetError(104, 'Connection reset by peer'))>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/rpyc/core/stream.py", line 268, in read
    buf = self.sock.recv(min(self.MAX_IO_CHUNK, count))
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lightllm/lightllm/server/router/manager.py", line 91, in loop_for_fwd
    await self._step()
  File "/lightllm/lightllm/server/router/manager.py", line 112, in _step
    await self._prefill_batch(self.running_batch)
  File "/lightllm/lightllm/server/router/manager.py", line 149, in _prefill_batch
    ans = await asyncio.gather(*rets)
  File "/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 227, in prefill_batch
    return await ans
  File "/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 187, in func
    await asyncio.to_thread(ans.wait)
  File "/opt/conda/lib/python3.9/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/opt/conda/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/lib/python3.9/site-packages/rpyc/core/async.py", line 51, in wait
    self._conn.serve(self._ttl)
  File "/opt/conda/lib/python3.9/site-packages/rpyc/core/protocol.py", line 438, in serve
    data = self._channel.poll(timeout) and self._channel.recv()
  File "/opt/conda/lib/python3.9/site-packages/rpyc/core/channel.py", line 55, in recv
    header = self.stream.read(self.FRAME_HEADER.size)
  File "/opt/conda/lib/python3.9/site-packages/rpyc/core/stream.py", line 277, in read
    raise EOFError(ex)
EOFError: [Errno 104] Connection reset by peer
```
What could be causing this?