Server cannot run with 3080 Ti #107

Open
fanglioc opened this issue Dec 1, 2023 · 4 comments

Comments

@fanglioc

fanglioc commented Dec 1, 2023

While trying to deploy the server with:

python -m ochat.serving.openai_api_server --model openchat/openchat_3.5

I got the following error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU 0 has a total capacty of 11.74 GiB of which 140.00 MiB is free. Including non-PyTorch memory, this process has 11.38 GiB memory in use. Of the allocated memory 10.94 GiB is allocated by PyTorch, and 9.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Do I need to specify quantization somehow?

Thanks!
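The numbers in the error are consistent with the 16-bit weights simply not fitting on the card. The sketch below assumes openchat/openchat_3.5 has about 7.24 billion parameters (it is a fine-tune of Mistral 7B) and the usual byte sizes per weight format. It estimates weight memory at a few precisions and compares it with the VRAM figures reported in this thread. Activations, KV cache and the CUDA context are ignored, so real usage is higher than these estimates.

```python
# Rough lower bound on the GPU memory needed just to hold the model weights.
# ASSUMPTION: openchat/openchat_3.5 has ~7.24e9 parameters (Mistral 7B base).
# Activations, KV cache and the CUDA context are NOT counted.

PARAMS = 7.24e9          # assumed parameter count
GIB = 1024 ** 3          # bytes per GiB

# Assumed bytes per parameter for common weight formats.
BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

# VRAM per GPU as reported in this thread (GiB). The 3080 Ti value is the
# usable total from the error log; the others are nominal capacities.
GPUS = {
    "RTX 3080 Ti": 11.74,
    "RTX 4060": 8.0,
    "GTX 1060": 6.0,
    "Tesla T4": 16.0,
}


def weight_gib(params: float, bytes_per_param: float) -> float:
    """Return the memory taken by the weights alone, in GiB."""
    return params * bytes_per_param / GIB


if __name__ == "__main__":
    for fmt, bpp in BYTES_PER_PARAM.items():
        need = weight_gib(PARAMS, bpp)
        # GPUs whose reported VRAM exceeds the weight size (weights only).
        room = [name for name, vram in GPUS.items() if need < vram]
        print(f"{fmt:>9}: ~{need:.1f} GiB of weights; above that: {room}")
```

Under these assumptions, 16-bit weights (about 13.5 GiB) exceed the 11.74 GiB on the 3080 Ti, and even 8-bit weights (about 6.7 GiB) exceed the GTX 1060's 6 GB. Whether the ochat server exposes a quantization option is not established in this thread.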

@mfortini

mfortini commented Dec 4, 2023

Same thing here with a 4060 and 8 GB of VRAM.

@sffranke

Same here with a GTX 1060 (6 GB) on Ubuntu 22.04.

@Altenrion

Same with a Tesla T4 (16 GB) on Ubuntu 22.04.
Any updates from anyone?

@Hi-Angel

@imoneoi Given that the online demo has been broken for a while, is there any chance this problem could get some attention, so that people can run the server locally instead?

Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants