CUDA out of memory during training #69
UPDATE: I tried using a single A100 GPU with 40GB of GPU memory and the same error happened. It seems there's a memory leak somewhere, because the process just used up all the available memory. Here's the updated error message:
I'm renting my GPUs from brev.dev, by the way.
Is this issue resolved? I'm encountering exactly the same issue.
@CodeWithOz @zazabap Was the GPU running only the training? Can you provide the command you used to run the training? Additionally, what libraries and versions are you using?
Hello, did you find a solution? I am on an NVIDIA L4 with 24GB RAM.
For the record, I managed to get it working by setting `seq_len: 4192` (down from 32768).
Hi! For me, setting `export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` worked.
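For anyone who prefers to set this from Python rather than the shell, here is a minimal sketch. The only assumption is that the variable needs to be in the environment before the first CUDA allocation, so it is set before anything touches the GPU; the tensor at the end is just there to show an allocation going through the reconfigured allocator.

```python
import os

# The CUDA caching allocator reads PYTORCH_CUDA_ALLOC_CONF when it is first
# initialized, so set it before any code places tensors on the GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch

# Illustrative allocation: from here on, CUDA memory is managed with
# expandable segments, which can reduce fragmentation-related OOM errors.
x = torch.zeros(1024, 1024, device="cuda")
print(f"reserved after allocation: {torch.cuda.memory_reserved()} bytes")
```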
I keep getting "CUDA out of memory" errors while fine-tuning Mistral 7B. My hardware is a single NVIDIA A10G GPU with 24GB of GPU memory. The error message looks like this:
The reserved-but-unallocated memory reported in the error ranges between 82MB and 1.08GB, so the error appears to happen both when there seems to be enough memory and when there isn't.
I've tried the following measures with no success:

- setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` as recommended by the error message
- reducing `max_steps` from the default 300 to 200
- setting `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512` (from here)
- reducing `max_steps` down to first 20, then 16, and ultimately 1
- calling `torch.cuda.empty_cache()` before triggering the PyTorch run
- cutting the training data down to a single example: `{"messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "What is the capital of Portugal?"},{"role": "assistant", "content": "Lisbon"}]}`
- reducing `seq_len` to 5000 following this comment

The README says best results require an A100 or H100, but single-GPU machines can work with Mistral 7B. Given how many parameters I've already minimized, is it really the case that bigger hardware is the only way forward?
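One thing that can help narrow this down is logging allocated versus reserved CUDA memory at each step: allocated memory that climbs monotonically points to a genuine leak (something holding on to tensor references), while flat allocated memory with ballooning reserved memory points to fragmentation, which is what the allocator settings above target. Here is a minimal sketch, assuming a standard PyTorch training loop; `model`, `dataloader`, and `optimizer` are placeholders, not names from this repo.

```python
import torch

def log_cuda_memory(tag: str) -> None:
    """Print allocated vs. reserved CUDA memory, in GiB, for the current device."""
    alloc = torch.cuda.memory_allocated() / 2**30        # memory held by live tensors
    reserved = torch.cuda.memory_reserved() / 2**30      # memory held by the caching allocator
    peak = torch.cuda.max_memory_allocated() / 2**30     # high-water mark since last reset
    print(f"[{tag}] allocated={alloc:.2f} GiB  reserved={reserved:.2f} GiB  peak={peak:.2f} GiB")

# Hypothetical usage inside a training loop:
# torch.cuda.reset_peak_memory_stats()
# for step, batch in enumerate(dataloader):
#     loss = model(batch).loss
#     loss.backward()
#     optimizer.step()
#     optimizer.zero_grad()
#     log_cuda_memory(f"step {step}")
```

`torch.cuda.memory_summary()` gives a more detailed per-pool breakdown if the simple counters aren't conclusive.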