Qwen3-Reranker-8B model does not release GPU memory after use #4045

@wangchengxiang

Description

System Info

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.51.03              Driver Version: 575.51.03      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000               Off |   00000000:01:00.0 Off |                  Off |
| 48%   69C    P2            153W /  300W |   43193MiB /  49140MiB |     14%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A          226747      C   Model: Qwen3-Embedding-8B-0           14890MiB |
|    0   N/A  N/A          468632      C   ...party/bin/llama-box/llama-box        898MiB |
|    0   N/A  N/A          469101      C   /usr/bin/python3                       1434MiB |
|    0   N/A  N/A         2499695      C   ...lama-box/llama-box-rpc-server        260MiB |
|    0   N/A  N/A         2807504      C   Model: Qwen3-Reranker-8B-0            19392MiB |
|    0   N/A  N/A         3292392      C   python3                                1228MiB |
|    0   N/A  N/A         3292394      C   python3                                2852MiB |
|    0   N/A  N/A         3695965      C   /usr/bin/python3                       2190MiB |
+-----------------------------------------------------------------------------------------+

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

xinference:v1.9.0

The command used to start Xinference

xinference launch --model-name Qwen3-Reranker-8B --model-type rerank --replica 1 --n-gpu auto --model-engine vllm --model-format pytorch --quantization none

Reproduction

I deployed xinference:v1.9.0 with Docker and loaded the reranking model Qwen3-Reranker-8B. After it had been in use for a while, I noticed a GPU memory leak.
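To make the growth easier to document, the memory held by the reranker process could be logged over time. The following is a minimal sketch, not part of Xinference: the helper names `parse_compute_apps` and `watch` are hypothetical, and it assumes `nvidia-smi` is run somewhere the process is visible (typically the host, since PIDs inside a container may differ).

```python
import subprocess
import time

# Standard nvidia-smi query flags; everything else here is illustrative.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Map pid -> (process_name, used MiB) from nvidia-smi CSV output."""
    usage = {}
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3:
            continue
        # pid is the first field and memory the last; the name sits between.
        pid, name, mem = parts[0], ",".join(parts[1:-1]), parts[-1]
        if pid.isdigit() and mem.isdigit():  # skip rows that report [N/A]
            usage[int(pid)] = (name, int(mem))
    return usage


def watch(pid, interval_s=60.0, samples=10):
    """Print the memory held by one PID at a fixed interval and return the readings."""
    history = []
    for _ in range(samples):
        out = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
        name, mib = parse_compute_apps(out).get(pid, ("<not found>", 0))
        history.append(mib)
        print(f"{time.strftime('%H:%M:%S')} pid={pid} {name}: {mib} MiB")
        time.sleep(interval_s)
    return history
```

Calling `watch(2807504)` would track the `Model: Qwen3-Reranker-8B-0` process from the table above while rerank requests are being sent; the interval and sample count are arbitrary defaults.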


Expected behavior

GPU memory usage should stay stable within a bounded range.
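One way to state this expectation concretely is as a check over a series of memory readings. The sketch below is only an illustration: the function names, the warm-up length, and the 512 MiB tolerance are assumptions chosen for the example, not values defined by Xinference or vLLM.

```python
def stays_bounded(samples_mib, warmup=2, tolerance_mib=512):
    """True if readings after the warm-up period vary by at most tolerance_mib."""
    steady = samples_mib[warmup:]
    if len(steady) < 2:
        raise ValueError("need at least two samples after warm-up")
    return max(steady) - min(steady) <= tolerance_mib


def growth_per_sample(samples_mib, warmup=2):
    """Average change in MiB between consecutive readings after warm-up."""
    steady = samples_mib[warmup:]
    if len(steady) < 2:
        raise ValueError("need at least two samples after warm-up")
    return (steady[-1] - steady[0]) / (len(steady) - 1)
```

Given a list of MiB readings for the reranker process taken at a fixed interval, `stays_bounded` returning `False` together with a consistently positive `growth_per_sample` would match the behavior reported here.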
