Your current environment
```python
from vllm import LLM
import torch

llm = LLM(model=model1_path, tensor_parallel_size=torch.cuda.device_count())
llm = LLM(model=model2_path, tensor_parallel_size=torch.cuda.device_count())
```
Executing the second line causes a CUDA out-of-memory error.
How would you like to use vllm
I want to run two models as a pipeline in a single Python script. After finishing inference with the first model, how can I release it and free its GPU memory before loading the second one? Loading the second model directly causes CUDA out of memory, because the first model is never released.
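For illustration, here is a minimal sketch of the kind of teardown sequence I am looking for. The import path of destroy_model_parallel, the exact cleanup steps, and the model paths are assumptions on my side (they seem to vary between vLLM versions), so this is only a sketch, not a confirmed API:

```python
import gc
import torch
from vllm import LLM
# Assumption: in recent vLLM versions destroy_model_parallel lives here;
# older versions expose it under a different module path.
from vllm.distributed.parallel_state import destroy_model_parallel

model1_path = "path/to/first/model"   # placeholder paths for illustration
model2_path = "path/to/second/model"

# Load and run the first model.
llm = LLM(model=model1_path, tensor_parallel_size=torch.cuda.device_count())
outputs1 = llm.generate(["Hello from model 1"])

# Tear down the first model before loading the second one.
destroy_model_parallel()     # release the parallel process groups
del llm                      # drop the reference to the engine
gc.collect()                 # let Python collect the engine objects
torch.cuda.empty_cache()     # return cached GPU memory to the driver

# Ideally the second model can now be loaded without CUDA OOM.
llm = LLM(model=model2_path, tensor_parallel_size=torch.cuda.device_count())
outputs2 = llm.generate(["Hello from model 2"])
```

I am also not sure whether this in-process cleanup is reliable when tensor_parallel_size > 1, since vLLM may spawn separate worker processes in that case; if so, running each model in its own subprocess might be the safer alternative.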