Error when using the Qwen model; not sure whether it is model-related. Happens with both Docker and source deployments #166

Closed
k2o333 opened this issue Aug 17, 2024 · 10 comments

Comments


k2o333 commented Aug 17, 2024

==========
== CUDA ==
==========

CUDA Version 12.4.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .

Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/root/mindsearch/app.py", line 11, in
from lagent.schema import AgentStatusCode
File "/opt/py3/lib/python3.10/site-packages/lagent/init.py", line 2, in
from .actions import * # noqa: F401, F403
File "/opt/py3/lib/python3.10/site-packages/lagent/actions/init.py", line 3, in
from .action_executor import ActionExecutor
File "/opt/py3/lib/python3.10/site-packages/lagent/actions/action_executor.py", line 4, in
from .base_action import BaseAction
File "/opt/py3/lib/python3.10/site-packages/lagent/actions/base_action.py", line 16, in
from griffe.enumerations import DocstringSectionKind
ModuleNotFoundError: No module named 'griffe.enumerations'

@findziliao

The griffe version needs to be downgraded. Add a line to backend.dockerfile: RUN pip install --no-cache-dir -U griffe==0.48.0
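
A quick way to confirm the pin works (my own sketch, not from the repo): the `griffe.enumerations` module that lagent imports was dropped in griffe 1.0, so a 0.x release such as 0.48.0 should still expose it. Run this inside the rebuilt container:

```python
# Sanity check: mirrors the exact import that lagent's base_action.py performs.
# On griffe >= 1.0 this import raises ModuleNotFoundError; on 0.48.0 it succeeds.
from importlib.metadata import version
from griffe.enumerations import DocstringSectionKind

print(version("griffe"))               # expect 0.48.0 after the pin
print(list(DocstringSectionKind)[:3])  # enum members load cleanly on 0.x
```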

k2o333 (Author) commented Aug 17, 2024

> The griffe version needs to be downgraded. Add a line to backend.dockerfile: RUN pip install --no-cache-dir -U griffe==0.48.0

Thanks, that fixed the griffe issue. But after opening the frontend and entering a question, the backend log shows no change:

INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8002 (Press CTRL+C to quit)

lcolok (Contributor) commented Aug 19, 2024

> The griffe version needs to be downgraded. Add a line to backend.dockerfile: RUN pip install --no-cache-dir -U griffe==0.48.0

> Thanks, that fixed the griffe issue. But after opening the frontend and entering a question, the backend log shows no change:
>
> INFO: Started server process [1]
> INFO: Waiting for application startup.
> INFO: Application startup complete.
> INFO: Uvicorn running on http://0.0.0.0:8002 (Press CTRL+C to quit)

Thanks for reporting the issue. Please check whether it is a cross-origin (CORS) problem:

https://github.com/InternLM/MindSearch/blob/main/docker/README_zh-CN.md#%E8%B7%A8%E5%9F%9F%E8%AE%BF%E9%97%AE%E6%B3%A8%E6%84%8F%E4%BA%8B%E9%A1%B9

If convenient, please attach the error messages from your browser's developer console so we can investigate.
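
For reference, a cross-origin failure would leave the backend log silent exactly like this, since the browser blocks the request before it ever reaches the server. Below is a minimal illustrative sketch of the kind of FastAPI CORS setup the linked notes describe; it is not MindSearch's actual app.py, and the allow_origins value is an assumption to replace with your frontend's real address:

```python
# Illustrative only -- not the project's actual code.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:8080"],  # assumption: your frontend's origin
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```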

@xs818818

[TM][WARNING] [LlamaTritonModel] max_context_token_num = 32776.
2024-08-19 21:33:39,395 - lmdeploy - WARNING - get 227 model params
[WARNING] gemm_config.in is not found; using default GEMM algo
HINT: Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:23333 in a browser for detailed api usage!!!
INFO: Started server process [2992081]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:23333 (Press CTRL+C to quit)
INFO: 127.0.0.1:49862 - "GET /v1/models HTTP/1.1" 200 OK
Launched the api_server in process 2992081, user can kill the server by:
import os,signal
os.kill(2992081, signal.SIGKILL)
INFO: 127.0.0.1:49870 - "POST /v1/completions HTTP/1.1" 200 OK
terminate called after throwing an instance of 'std::runtime_error'
what(): [TM][ERROR] Assertion fail: /lmdeploy/src/turbomind/kernels/attention/attention.cu:35

/usr/lib/python3/dist-packages/apport/report.py:13: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import fnmatch, glob, traceback, errno, sys, atexit, imp, stat
Traceback (most recent call last):
File "/home/xs/.local/lib/python3.8/site-packages/requests/models.py", line 820, in generate
yield from self.raw.stream(chunk_size, decode_content=True)
File "/home/xs/.local/lib/python3.8/site-packages/urllib3/response.py", line 1057, in stream
yield from self.read_chunked(amt, decode_content=decode_content)
File "/home/xs/.local/lib/python3.8/site-packages/urllib3/response.py", line 1206, in read_chunked
self._update_chunk_length()
File "/home/xs/.local/lib/python3.8/site-packages/urllib3/response.py", line 1136, in _update_chunk_length
raise ProtocolError("Response ended prematurely") from None
urllib3.exceptions.ProtocolError: Response ended prematurely

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/xs/MindSearch/mindsearch/terminal.py", line 49, in
for agent_return in agent.stream_chat('上海今天适合穿什么衣服'):
File "/home/xs/MindSearch/mindsearch/agent/mindsearch_agent.py", line 214, in stream_chat
for model_state, response, _ in self.llm.stream_chat(
File "/home/xs/.local/lib/python3.8/site-packages/lagent/llms/lmdeploy_wrapper.py", line 411, in stream_chat
for text in self.client.completions_v1(
File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/serve/openai/api_client.py", line 299, in completions_v1
for chunk in response.iter_lines(chunk_size=8192,
File "/home/xs/.local/lib/python3.8/site-packages/requests/models.py", line 869, in iter_lines
for chunk in self.iter_content(
File "/home/xs/.local/lib/python3.8/site-packages/requests/models.py", line 822, in generate
raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: Response ended prematurely
This is the problem I hit; I am using the Qwen interface.

@xs818818

This happens when running: python3 -m mindsearch.app --lang cn --model_format qwen --search_engine BingSearch

@xs818818

python3 -m mindsearch.terminal
[TM][WARNING] [LlamaTritonModel] max_context_token_num = 32776.
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/serve/openai/api_server.py", line 1285, in serve
VariableInterface.async_engine = pipeline_class(
File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/serve/async_engine.py", line 190, in init
self._build_turbomind(model_path=model_path,
File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/serve/async_engine.py", line 235, in _build_turbomind
self.engine = tm.TurboMind.from_pretrained(
File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 340, in from_pretrained
return cls(model_path=pretrained_model_name_or_path,
File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 144, in init
self.model_comm = self._from_hf(model_source=model_source,
File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 251, in _from_hf
self._create_weight(model_comm)
File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 170, in _create_weight
future.result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/xs/.local/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 163, in _create_weight_func
model_comm.create_shared_weights(device_id, rank)
RuntimeError: [TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/utils/memory_utils.cu:32
I am using the Qwen API, yet by default it still loads a local model.
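
The traceback shows why: `mindsearch.terminal` constructs a local LMDeploy/TurboMind engine, which loads model weights onto the GPU regardless of any remote endpoint. A rough sketch of the distinction (hypothetical config, not MindSearch's actual models.py; parameter names can vary across lagent versions, so check lagent/llms for the exact signature):

```python
# Sketch only: the agent must be handed an API client, not a local engine,
# otherwise TurboMind still tries to allocate GPU memory for weights at startup.
from lagent.llms import GPTAPI  # OpenAI-style HTTP client shipped with lagent

llm = GPTAPI(
    model_type='qwen-max',  # assumption: whatever model your endpoint serves
    key='YOUR_API_KEY',     # placeholder
)

# By contrast, the failing path builds a local engine, roughly equivalent to:
#   from lmdeploy import pipeline
#   pipe = pipeline('Qwen/Qwen2-7B-Instruct')  # loads weights onto the GPU -> OOM
```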

k2o333 (Author) commented Aug 19, 2024

> This happens when running: python3 -m mindsearch.app --lang cn --model_format qwen --search_engine BingSearch

> python3 -m mindsearch.terminal [...] RuntimeError: [TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/utils/memory_utils.cu:32 [...] I am using the Qwen API, yet by default it still loads a local model.

I started it with Docker, and since I use an external model I deleted the GPU deployment section. Could that be why there is no response?

lcolok (Contributor) commented Aug 19, 2024

@k2o333 The latest optimization should solve your problem: #170. Just wait for the repo maintainers to test and merge the code.

lcolok (Contributor) commented Aug 19, 2024

@xs818818 I have not been able to get the Qwen model working either; it is probably a logic issue in the mindsearch/agent module. Even when using SiliconFlow's API, the only model I could get working was internlm/internlm2_5-7b-chat.

@mengrennwpu

> @xs818818 I have not been able to get the Qwen model working either; it is probably a logic issue in the mindsearch/agent module. Even when using SiliconFlow's API, the only model I could get working was internlm/internlm2_5-7b-chat.

@lcolok It is normal that other models do not work, because internlm/internlm2_5-7b-chat has been fine-tuned specifically for this search-RAG scenario.
