CUDA out of memory during training #69
UPDATE: I tried using a single A100 GPU with 40GB of GPU memory and the same error happened. It seems there's a memory leak somewhere, because the process just used up all the available memory. Here's the updated error message:
I'm renting my GPUs from brev.dev, by the way.
Is this issue resolved? I'm encountering exactly the same issue.
@CodeWithOz @zazabap Was the GPU running only the training? Can you provide the command you used to run the training? Additionally, what libraries and versions are you using?
Hello, did you find a solution? I am on an NVIDIA L4 with 24GB RAM.
For the record, I managed to get it working by setting `seq_len: 4192` (down from 32768).
Hi! For me, setting `export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` worked.
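For anyone who prefers to set this from Python rather than the shell, here is a minimal sketch. The only assumption is that the variable needs to be in the environment before the first CUDA allocation, so it is set before anything touches the GPU; the tensor at the end is just there to show an allocation going through the reconfigured allocator.

```python
import os

# The CUDA caching allocator reads PYTORCH_CUDA_ALLOC_CONF when it is first
# initialized, so set it before any code places tensors on the GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch

# Illustrative allocation: from here on, CUDA memory is managed with
# expandable segments, which can reduce fragmentation-related OOM errors.
x = torch.zeros(1024, 1024, device="cuda")
print(f"reserved after allocation: {torch.cuda.memory_reserved()} bytes")
```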
I keep getting "CUDA out of memory" errors while fine-tuning Mistral 7B. My hardware is a single NVIDIA A10G GPU with 24GB of GPU memory. The error message looks like this:
The reserved-but-unallocated memory reported in the error ranges between 82MB and 1.08GB, so the error appears to happen both when there seems to be enough memory and when there isn't.
I've tried the following measures with no success:

- setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` as recommended by the error message
- reducing `max_steps` from the default 300 to 200
- setting `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512` (from here)
- reducing `max_steps` down to first 20, then 16, and ultimately 1
- calling `torch.cuda.empty_cache()` before triggering the PyTorch run
- cutting the training data down to a single example: `{"messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "What is the capital of Portugal?"},{"role": "assistant", "content": "Lisbon"}]}`
- reducing `seq_len` to 5000 following this comment

The README says best results require an A100 or H100, but single-GPU machines can work with Mistral 7B. Given how many parameters I've already minimized, is it really the case that bigger hardware is the only way forward?
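One thing that can help narrow this down is logging allocated versus reserved CUDA memory at each step: allocated memory that climbs monotonically points to a genuine leak (something holding on to tensor references), while flat allocated memory with ballooning reserved memory points to fragmentation, which is what the allocator settings above target. Here is a minimal sketch, assuming a standard PyTorch training loop; `model`, `dataloader`, and `optimizer` are placeholders, not names from this repo.

```python
import torch

def log_cuda_memory(tag: str) -> None:
    """Print allocated vs. reserved CUDA memory, in GiB, for the current device."""
    alloc = torch.cuda.memory_allocated() / 2**30        # memory held by live tensors
    reserved = torch.cuda.memory_reserved() / 2**30      # memory held by the caching allocator
    peak = torch.cuda.max_memory_allocated() / 2**30     # high-water mark since last reset
    print(f"[{tag}] allocated={alloc:.2f} GiB  reserved={reserved:.2f} GiB  peak={peak:.2f} GiB")

# Hypothetical usage inside a training loop:
# torch.cuda.reset_peak_memory_stats()
# for step, batch in enumerate(dataloader):
#     loss = model(batch).loss
#     loss.backward()
#     optimizer.step()
#     optimizer.zero_grad()
#     log_cuda_memory(f"step {step}")
```

`torch.cuda.memory_summary()` gives a more detailed per-pool breakdown if the simple counters aren't conclusive.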