
cuda: Out of memory issue on rtx3090 (24GB vram) #3

Closed
dikasterion opened this issue Nov 30, 2022 · 3 comments

dikasterion commented Nov 30, 2022

Hi,
I successfully managed to run your repo with the SD 1.5 model.
Now I'm trying to run the SD 2.0 768 model, but I get a CUDA out of memory error.

I have 23 training images (768*768) in the 20_person folder under the train_person folder.
I tried lowering the batch size and disabling cache latents (set to 0).
Here are the settings that I ran in PowerShell with the venv (virtual environment):

# variable values

$pretrained_model_name_or_path = "D:\stable-diffusion-webui\models\Stable-diffusion\768-v-ema.ckpt"
$data_dir = "D:\kohya_ss\zwx_person_db\train_person"
$logging_dir = "D:\kohya_ss\log"
$output_dir = "D:\kohya_ss\output"
$resolution = "768,768"
$lr_scheduler="polynomial"
$cache_latents = 0 # 1 = true, 0 = false

$image_num = Get-ChildItem $data_dir -Recurse -File -Include *.png, *.jpg, *.webp | Measure-Object | %{$_.Count}

Write-Output "image_num: $image_num"

$dataset_repeats = 2000
$learning_rate = 1e-6
$train_batch_size = 1
$epoch = 1
$save_every_n_epochs=1
$mixed_precision="bf16"
$num_cpu_threads_per_process=6

# You should not have to change values past this point

if ($cache_latents -eq 1) {
$cache_latents_value="--cache_latents"
}
else {
$cache_latents_value=""
}

$repeats = $image_num * $dataset_repeats
$mts = [Math]::Ceiling($repeats / $train_batch_size * $epoch)

Write-Output "Repeats: $repeats"

cd D:\kohya_ss
.\venv\Scripts\activate

accelerate launch --num_cpu_threads_per_process $num_cpu_threads_per_process train_db_fixed.py --v2 `
--v_parameterization --pretrained_model_name_or_path=$pretrained_model_name_or_path `
--train_data_dir=$data_dir --output_dir=$output_dir `
--resolution=$resolution --train_batch_size=$train_batch_size `
--learning_rate=$learning_rate --max_train_steps=$mts `
--use_8bit_adam --xformers `
--mixed_precision=$mixed_precision $cache_latents_value `
--save_every_n_epochs=$save_every_n_epochs --logging_dir=$logging_dir `
--save_precision="fp16" --seed=494481440 `
--lr_scheduler=$lr_scheduler

# Add the inference 768v yaml file along with the model for proper loading. It needs to have the same name as the model... Most likely "last.yaml" in our case.

cp v2_inference\v2-inference-v.yaml $output_dir"\last.yaml"
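
For reference, here is a quick by-hand check of the step count those variables produce (all values taken from the settings above); it matches the 0/46000 shown in the error output below:

# step-count sanity check, using only the values from the settings above
$image_num = 23                  # 23 training images in 20_person
$dataset_repeats = 2000
$train_batch_size = 1
$epoch = 1
$repeats = $image_num * $dataset_repeats                        # 46000
$mts = [Math]::Ceiling($repeats / $train_batch_size * $epoch)   # ceil(46000 / 1 * 1) = 46000
Write-Output "max_train_steps: $mts"                            # 46000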

And here's the error message I got:

steps: 0%| | 0/46000 [00:00<?, ?it/s]epoch 1/100
Traceback (most recent call last):
File "D:\kohya_ss[train_db_fixed.py](http://train_db_fixed.py/)", line 2098, in
train(args)
File "D:\kohya_ss[train_db_fixed.py](http://train_db_fixed.py/)", line 1948, in train
optimizer.step()
File "D:\kohya_ss\venv\lib\site-packages\accelerate[optimizer.py](http://optimizer.py/)", line 134, in step
self.scaler.step(self.optimizer, closure)
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\amp[grad_scaler.py](http://grad_scaler.py/)", line 338, in step
retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\amp[grad_scaler.py](http://grad_scaler.py/)", line 285, in _maybe_opt_step
retval = optimizer.step(*args, **kwargs)
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\optim[lr_scheduler.py](http://lr_scheduler.py/)", line 65, in wrapper
return wrapped(*args, **kwargs)
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\optim[optimizer.py](http://optimizer.py/)", line 113, in wrapper
return func(*args, **kwargs)
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd[grad_mode.py](http://grad_mode.py/)", line 27, in decorate_context
return func(*args, **kwargs)
File "D:\kohya_ss\venv\lib\site-packages\bitsandbytes\optim[optimizer.py](http://optimizer.py/)", line 263, in step
self.init_state(group, p, gindex, pindex)
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd[grad_mode.py](http://grad_mode.py/)", line 27, in decorate_context
return func(*args, **kwargs)
File "D:\kohya_ss\venv\lib\site-packages\bitsandbytes\optim[optimizer.py](http://optimizer.py/)", line 401, in init_state
state["state2"] = torch.zeros_like(
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 24.00 GiB total capacity; 10.13 GiB already allocated; 8.91 GiB free; 10.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
steps: 0%| | 0/46000 [00:30<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib[runpy.py](http://runpy.py/)", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib[runpy.py](http://runpy.py/)", line 86, in _run_code
exec(code, run_globals)
File "D:\kohya_ss\venv\Scripts\accelerate.exe[main.py](http://main.py/)", line 7, in
File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands[accelerate_cli.py](http://accelerate_cli.py/)", line 45, in main
args.func(args)
File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands[launch.py](http://launch.py/)", line 1069, in launch_command
simple_launcher(args)
File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands[launch.py](http://launch.py/)", line 551, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\kohya_ss\venv\Scripts\python.exe', 'train_db_fixed.py', '--v2', '--v_parameterization', '--pretrained_model_name_or_path=D:\stable-diffusion-webui\models\Stable-diffusion\768-v-ema.ckpt', '--train_data_dir=D:\kohya_ss\zwx_person_db\train_person', '--output_dir=D:\kohya_ss\output', '--resolution=768,768', '--train_batch_size=1', '--learning_rate=1E-06', '--max_train_steps=46000', '--use_8bit_adam', '--xformers', '--mixed_precision=bf16', '--save_every_n_epochs=1', '--logging_dir=D:\kohya_ss\log', '--save_precision=fp16', '--seed=494481440', '--lr_scheduler=polynomial']' returned non-zero exit status 1.
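
Based on the max_split_size_mb hint in the error message itself, one thing that could be tried before re-launching is setting the allocator config in the same PowerShell session. I haven't verified whether it helps here, and the 512 value below is only an illustrative choice:

# assumption: 512 MiB is just an example split size, following the fragmentation hint in the error message
$env:PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:512"
# then re-run the accelerate launch command above in the same session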

dikasterion (Author) commented

Oh, I think it's running with the global packages, even though I've followed all the steps on the readme page...
Does anyone have any idea why?
I activated ".\venv\Scripts\activate" and ran the settings above inside the virtual environment...
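
A quick diagnostic sketch for this: with the venv activated, check which python.exe and which torch install the session actually resolves to, since the traceback above loads accelerate and bitsandbytes from the venv but torch from the global AppData install:

# check which interpreter and which torch the activated venv actually uses
Get-Command python | Select-Object -ExpandProperty Source
python -c "import sys, torch; print(sys.executable); print(torch.__file__)"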

toyxyz commented Nov 30, 2022

So how about using 8-bit Adam and xformers? You can reduce VRAM usage.

dikasterion (Author) commented

> So how about using 8-bit Adam and xformers? You can reduce VRAM usage.

Oh, as in the settings above, I already enabled 8-bit Adam and xformers.

I finally managed to run the v2 768 model successfully. I ran Windows PowerShell as administrator, and then I could train with a batch size of 1.
It fails with a batch size of 2 or above, but I'm content with what I got.
Thank you, guys.
