While trying to deploy the server with:
python -m ochat.serving.openai_api_server --model openchat/openchat_3.5
I got the following error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU 0 has a total capacty of 11.74 GiB of which 140.00 MiB is free. Including non-PyTorch memory, this process has 11.38 GiB memory in use. Of the allocated memory 10.94 GiB is allocated by PyTorch, and 9.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Do I need to specify quantization somehow?
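For what it's worth, the only mitigation the error message itself points to is the allocator setting; assuming the same launch command, that would look roughly like:
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python -m ochat.serving.openai_api_server --model openchat/openchat_3.5
(the 128 MiB value is just an example). That only addresses fragmentation, though; the unquantized model plus its KV cache may simply not fit in ~12 GiB, so a quantization or memory-limit option, if the server supports one, seems like the real fix.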
Thanks!
@imoneoi Given that the online demo has been broken for a while, is there any chance this problem could get some attention, so people can run the server locally instead?