While trying to deploy the server with:
python -m ochat.serving.openai_api_server --model openchat/openchat_3.5
I got the following error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU 0 has a total capacty of 11.74 GiB of which 140.00 MiB is free. Including non-PyTorch memory, this process has 11.38 GiB memory in use. Of the allocated memory 10.94 GiB is allocated by PyTorch, and 9.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Do I need to specify quantization somehow?
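For what it's worth, the only mitigation the error message itself points to is the allocator setting; assuming the same launch command, that would look roughly like:
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python -m ochat.serving.openai_api_server --model openchat/openchat_3.5
(the 128 MiB value is just an example). That only addresses fragmentation, though; the unquantized model plus its KV cache may simply not fit in ~12 GiB, so a quantization or memory-limit option, if the server supports one, seems like the real fix.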
Thanks!
@imoneoi Given that the online demo has been broken for a while, is there any chance this problem could get some attention, so people can run the server locally instead?