
[Bug] Event loop error when serving #2101

Open · 3 tasks done

cmpute opened this issue Jul 22, 2024 · 31 comments

@cmpute (Contributor) commented Jul 22, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if your bug report lacks the corresponding environment info and a minimal reproducible demo, it will be hard for us to reproduce and resolve the issue, which reduces the likelihood of feedback.

Describe the bug

When using a VLM through gradio, the first round of image-plus-text conversation completes normally, but the second round (also image plus text) raises an error; I tried several times with the same result. It seems that supplying images more than once in a multi-turn conversation causes the problem.

Reproduction

lmdeploy 0.5.1 installed from wheel; the model is InternVL2-2B-AWQ.

Environment

sys.platform: linux
Python: 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: Quadro RTX 5000
CUDA_HOME: None
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 2.2.2+cu118
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.7
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.17.2+cu118
LMDeploy: 0.5.1+
transformers: 4.42.4
gradio: 4.38.1
fastapi: 0.111.1
pydantic: 2.8.2
triton: 2.2.0
NVIDIA Topology: 
        GPU0    CPU Affinity    NUMA Affinity
GPU0     X      0-15            N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Error traceback

Traceback (most recent call last):
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/gradio/blocks.py", line 1897, in process_api
    result = await self.call_function(
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/gradio/blocks.py", line 1495, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/gradio/utils.py", line 661, in async_iteration
    return await iterator.__anext__()
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/gradio/utils.py", line 654, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/gradio/utils.py", line 637, in run_sync_iterator_async
    return next(iterator)
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/gradio/utils.py", line 799, in gen_wrapper
    response = next(iterator)
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/serve/gradio/vl.py", line 119, in chat
    inputs = _run_until_complete(
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/pytorch/engine/request.py", line 78, in _run_until_complete
    return event_loop.run_until_complete(future)
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/serve/vl_async_engine.py", line 66, in _get_prompt_input
    features = await self.vl_encoder.async_infer(images)
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/vl/engine.py", line 171, in async_infer
    self.req_que.put_nowait(item)
  File "/home/jacobz/.conda/envs/lmdeploy/lib/python3.8/site-packages/lmdeploy/vl/engine.py", line 124, in req_que
    raise RuntimeError('Current event loop is different from'
RuntimeError: Current event loop is different from the one bound to loop task!
@iWasOmen

To add: in my case the first round (image + question) works, and subsequent text-only rounds are fine, but after clicking reset and uploading an image again to ask a question, the same error appears.

lvhan028 assigned AllentDan and unassigned irexyc Jul 22, 2024
@irexyc (Collaborator) commented Jul 22, 2024

It looks like gradio is using a different event loop; the PyTorch backend will probably have a similar problem.

@AllentDan (Collaborator) commented Jul 22, 2024

Gradio 4.0 introduced several issues. #2103 changes reset to start a new session, which fixes it.

@iWasOmen give it a try as a stopgap.

@yaaisinile

> Gradio 4.0 introduced several issues. #2103 changes reset to start a new session, which fixes it.
>
> @iWasOmen give it a try as a stopgap.

I tried this with internlm-xcomposer2d5-7b-4bit and it didn't help.

@AllentDan (Collaborator)

It works on my side. What exactly did you do, @yaaisinile?

@yaaisinile

> It works on my side. What exactly did you do, @yaaisinile?

I applied your committed changes to the file and then ran `python gradio_demo/gradio_demo_chat.py --code_path /home/ai/Documents/InternLM-XComposer/internlm-xcomposer2d5-7b-4bit/`.
Opening the demo at 127.0.0.1:6006, the first round (upload an image, ask a question) works fine, but the error appears right after I upload a second image.

@AllentDan (Collaborator)

It seems that after `pip uninstall uvloop`, everything runs.

@yaaisinile

> It seems that after `pip uninstall uvloop`, everything runs.

I changed line 101 of engine.py from `asyncio.set_event_loop(self._loop)` to `asyncio.set_event_loop(asyncio.new_event_loop())`, and multi-turn conversation no longer errors.
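
For reference, the change described above amounts to this one-line edit in lmdeploy/vl/engine.py (only the two quoted calls come from the comment; the before/after framing is reconstructed):

```python
# lmdeploy/vl/engine.py, line 101 (lmdeploy 0.5.x)
# before:
asyncio.set_event_loop(self._loop)
# after: bind a freshly created event loop instead of the stored one
asyncio.set_event_loop(asyncio.new_event_loop())
```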

@yaaisinile

> It seems that after `pip uninstall uvloop`, everything runs.

Thanks for the reply; I can confirm that uninstalling uvloop also works.

@cmpute (Contributor, Author) commented Jul 23, 2024

> Gradio 4.0 introduced several issues. #2103 changes reset to start a new session, which fixes it.
>
> @iWasOmen give it a try as a stopgap.

Does that PR fix multi-turn, multi-image conversation? After uninstalling uvloop, multi-turn multi-image conversation still errors for me.

@yaaisinile

> Does that PR fix multi-turn, multi-image conversation? After uninstalling uvloop, multi-turn multi-image conversation still errors for me.

Try changing line 101 of engine.py from `asyncio.set_event_loop(self._loop)` to `asyncio.set_event_loop(asyncio.new_event_loop())`.

@cmpute (Contributor, Author) commented Jul 23, 2024

> Try changing line 101 of engine.py from `asyncio.set_event_loop(self._loop)` to `asyncio.set_event_loop(asyncio.new_event_loop())`.

That doesn't work either; it still errors.

@AllentDan (Collaborator)

What is the error message?

@cmpute (Contributor, Author) commented Jul 23, 2024

The error is the same.

@cmpute (Contributor, Author) commented Jul 23, 2024

Could this be related to the Python version? I see that the `asyncio.Queue` constructor changed in 3.10 (the `loop` parameter was removed).

@yaaisinile

> That doesn't work either; it still errors.

After trying that, uninstalling uvloop no longer works for me either, and reinstalling uvloop doesn't help.

@yaaisinile

> That doesn't work either; it still errors.

Editing engine.py to comment out lines 128-129 and add a call to `self._create_event_loop_task()` works around it for now; I'm not sure whether that affects multi-user usage.

@77h2l commented Jul 29, 2024

Same error here: when deploying InternVL2-4B as an online service, it raises "Current event loop is different from the one bound to loop task!".

@AllentDan (Collaborator)

The once-and-for-all fix is to revert the changes from #1930 and use an older version of gradio.

@fabro66 commented Jul 30, 2024

> The once-and-for-all fix is to revert the changes from #1930 and use an older version of gradio.

I reverted them and used gradio 3.50.2, but it still raises "RuntimeError: Current event loop is different from the one bound to loop task!".

@AllentDan (Collaborator)

@irexyc could you take a look?

@77h2l commented Jul 30, 2024

I'm not using gradio. I use lmdeploy to run InternVL2 models; whether the backend is set to torch or turbomind, deploying it as a service and calling it hits this error. Could this have been introduced by some lmdeploy version update? I've tried the latest 0.5.2 and 0.5.2.post, and both have the problem.

@irexyc (Collaborator) commented Jul 31, 2024

@77h2l
If you are not using lmdeploy's own serving features but wrapping the pipeline interface into a service yourself, you need to pass an extra argument when creating the pipeline: `pipe = pipeline('...', vision_config=VisionConfig(thread_safe=True))`. The pytorch backend has a similar issue.

Also, if you call the `call` / `stream_infer` interfaces, multiple requests may not actually run concurrently, because these interfaces currently do not expose a `session_id` parameter.
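
A minimal sketch of that suggestion, assuming the top-level imports of the lmdeploy 0.5.x releases (the model path is a placeholder):

```python
from lmdeploy import VisionConfig, pipeline

# thread_safe=True runs the vision encoder's event loop on a dedicated
# thread, so calls from other threads don't trip the loop check.
pipe = pipeline('OpenGVLab/InternVL2-4B',  # placeholder model path
                vision_config=VisionConfig(thread_safe=True))
```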

@77h2l commented Jul 31, 2024

> @77h2l
> If you are not using lmdeploy's own serving features but wrapping the pipeline interface into a service yourself, you need to pass an extra argument when creating the pipeline: `pipe = pipeline('...', vision_config=VisionConfig(thread_safe=True))`. The pytorch backend has a similar issue.
>
> Also, if you call the `call` / `stream_infer` interfaces, multiple requests may not actually run concurrently, because these interfaces currently do not expose a `session_id` parameter.

Right, our current setup uses the pipeline interface and deploys it through a separate serving framework on top. Several versions all show this problem; I'll try the parameter you mentioned. Also, would downgrading lmdeploy to an older version solve it?

@irexyc (Collaborator) commented Jul 31, 2024

@77h2l
As far as the event loop is concerned, both PytorchEngineConfig and VisionConfig need this parameter set. Downgrading is pointless, because the parameter simply restores the previous behavior.

The problem most likely shows up because you are calling from multiple threads. If you can use coroutines instead, you can call the `pipeline.generate` entry point directly.
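
Putting both thread-safe settings together, a minimal hedged sketch (the top-level import locations and the model path are assumptions, not taken from this thread):

```python
from lmdeploy import PytorchEngineConfig, VisionConfig, pipeline

# Both the engine and the vision encoder are asked to run thread-safe,
# so the pipeline can be driven from worker threads of an outer server.
pipe = pipeline('OpenGVLab/InternVL2-4B',  # placeholder model path
                backend_config=PytorchEngineConfig(thread_safe=True),
                vision_config=VisionConfig(thread_safe=True))
```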

@77h2l commented Jul 31, 2024

@irexyc Hi, after setting the vision_config parameter like this:

```python
self.pipe = pipeline(self.model, model_name=self.model_name,
                     chat_template_config=self.chat_template_config,
                     backend_config=self.backend_config,
                     vision_config=VisionConfig(thread_safe=True))
```

and redeploying the service, the inference endpoint simply times out and never returns a result. In a multi-threaded setting like this, how can both the error and the timeout be avoided?

@HelloWarcraft

Same problem here; hoping the next release fixes it.

@ltt-gddxz commented Nov 20, 2024

I hit the same problem: simply calling the pipeline for prediction from two Python threads locally reproduces exactly this error. I tried everything above without success. Is there a solution now?

@AllentDan (Collaborator)

@ltt-gddxz could you provide a minimal reproduction script?

@ltt-gddxz commented Nov 21, 2024

@AllentDan
Sorry, I can't share the complete code, but here is the main program. The `inference` function simply runs prediction with the lmdeploy pipeline (using InternVL2-1B; see the official inference code — a hedged sketch of such a function is given after the error below). This should be enough to reproduce the issue.

  • demo.py

```python
import threading

# `inference` runs one prediction through the lmdeploy pipeline
# (InternVL2-1B, following the official inference code); defined elsewhere.

if __name__ == '__main__':
    thread_num = 2
    threads = []
    for i in range(thread_num):
        thread = threading.Thread(target=inference)
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    print("Main finished.")
```

  • main requirements

```
torch==2.3.1
transformers==4.46.3
lmdeploy==0.6.3
```

  • error

```
File "/usr/local/lib/python3.8/dist-packages/lmdeploy/vl/engine.py", line 129, in req_que
    raise RuntimeError('Current event loop is different from'
RuntimeError: Current event loop is different from the one bound to loop task!
```
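
A hypothetical sketch of the elided `inference` function, following the usual lmdeploy VLM pipeline usage; the model path, image URL, and prompt are placeholders rather than details taken from the report above:

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

# One shared pipeline instance, called concurrently by both threads
# in demo.py above -- which is what triggers the event loop error.
pipe = pipeline('OpenGVLab/InternVL2-1B')  # placeholder model path

def inference():
    image = load_image('https://example.com/demo.jpeg')  # placeholder image
    response = pipe(('describe this image', image))
    print(response.text)
```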

@AllentDan (Collaborator)

The VL code here is not meant to be driven from multiple threads; coroutines are preferred. Calling the `generate` interface is also more efficient.
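
A hedged sketch of that coroutine-based usage, assuming the async `generate` entry point takes the request and a session id as described earlier in the thread (argument names may differ between versions; the model path and prompts are placeholders):

```python
import asyncio

from lmdeploy import pipeline

pipe = pipeline('OpenGVLab/InternVL2-1B')  # placeholder model path

async def infer(session_id: int, prompt: str):
    # generate is an async generator; iterate it to consume the outputs
    async for out in pipe.generate(prompt, session_id):
        print(session_id, out.response)

async def main():
    # Two concurrent requests share one event loop instead of two threads.
    await asyncio.gather(infer(0, 'describe image A'),
                         infer(1, 'describe image B'))

asyncio.run(main())
```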
