
cuda: Out of memory issue on rtx3090 (24GB vram) #3

Closed
dikasterion opened this issue Nov 30, 2022 · 3 comments

dikasterion commented Nov 30, 2022

Hi,
I successfully managed to run your repo with the SD 1.5 model.
Now I'm trying to run the SD 2.0 768 model, but I get a CUDA out of memory error.

I have 23 training images (768*768) in the 20_person folder under the train_person folder.
I tried lowering the batch size and disabling cache latents (set to 0).
Here are the settings that I ran in PowerShell with the venv (virtual environment):

# variable values

$pretrained_model_name_or_path = "D:\stable-diffusion-webui\models\Stable-diffusion\768-v-ema.ckpt"
$data_dir = "D:\kohya_ss\zwx_person_db\train_person"
$logging_dir = "D:\kohya_ss\log"
$output_dir = "D:\kohya_ss\output"
$resolution = "768,768"
$lr_scheduler="polynomial"
$cache_latents = 0 # 1 = true, 0 = false

$image_num = Get-ChildItem $data_dir -Recurse -File -Include *.png, *.jpg, *.webp | Measure-Object | %{$_.Count}

Write-Output "image_num: $image_num"

$dataset_repeats = 2000
$learning_rate = 1e-6
$train_batch_size = 1
$epoch = 1
$save_every_n_epochs=1
$mixed_precision="bf16"
$num_cpu_threads_per_process=6

# You should not have to change values past this point

if ($cache_latents -eq 1) {
$cache_latents_value="--cache_latents"
}
else {
$cache_latents_value=""
}

$repeats = $image_num * $dataset_repeats
$mts = [Math]::Ceiling($repeats / $train_batch_size * $epoch)

Write-Output "Repeats: $repeats"

cd D:\kohya_ss
.\venv\Scripts\activate

accelerate launch --num_cpu_threads_per_process $num_cpu_threads_per_process train_db_fixed.py --v2 `
--v_parameterization --pretrained_model_name_or_path=$pretrained_model_name_or_path `
--train_data_dir=$data_dir --output_dir=$output_dir `
--resolution=$resolution --train_batch_size=$train_batch_size `
--learning_rate=$learning_rate --max_train_steps=$mts `
--use_8bit_adam --xformers `
--mixed_precision=$mixed_precision $cache_latents_value `
--save_every_n_epochs=$save_every_n_epochs --logging_dir=$logging_dir `
--save_precision="fp16" --seed=494481440 `
--lr_scheduler=$lr_scheduler

# Add the inference 768v yaml file along with the model for proper loading. It needs to have the same name as the model... Most likely "last.yaml" in our case.

cp v2_inference\v2-inference-v.yaml $output_dir"\last.yaml"
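
For reference, here is a quick by-hand check of the step count those variables produce (all values taken from the settings above); it matches the 0/46000 shown in the error output below:

# step-count sanity check, using only the values from the settings above
$image_num = 23                  # 23 training images in 20_person
$dataset_repeats = 2000
$train_batch_size = 1
$epoch = 1
$repeats = $image_num * $dataset_repeats                        # 46000
$mts = [Math]::Ceiling($repeats / $train_batch_size * $epoch)   # ceil(46000 / 1 * 1) = 46000
Write-Output "max_train_steps: $mts"                            # 46000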

And here's the error message I got:

steps: 0%| | 0/46000 [00:00<?, ?it/s]epoch 1/100
Traceback (most recent call last):
File "D:\kohya_ss[train_db_fixed.py](http://train_db_fixed.py/)", line 2098, in
train(args)
File "D:\kohya_ss[train_db_fixed.py](http://train_db_fixed.py/)", line 1948, in train
optimizer.step()
File "D:\kohya_ss\venv\lib\site-packages\accelerate[optimizer.py](http://optimizer.py/)", line 134, in step
self.scaler.step(self.optimizer, closure)
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\amp[grad_scaler.py](http://grad_scaler.py/)", line 338, in step
retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\amp[grad_scaler.py](http://grad_scaler.py/)", line 285, in _maybe_opt_step
retval = optimizer.step(*args, **kwargs)
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\optim[lr_scheduler.py](http://lr_scheduler.py/)", line 65, in wrapper
return wrapped(*args, **kwargs)
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\optim[optimizer.py](http://optimizer.py/)", line 113, in wrapper
return func(*args, **kwargs)
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd[grad_mode.py](http://grad_mode.py/)", line 27, in decorate_context
return func(*args, **kwargs)
File "D:\kohya_ss\venv\lib\site-packages\bitsandbytes\optim[optimizer.py](http://optimizer.py/)", line 263, in step
self.init_state(group, p, gindex, pindex)
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd[grad_mode.py](http://grad_mode.py/)", line 27, in decorate_context
return func(*args, **kwargs)
File "D:\kohya_ss\venv\lib\site-packages\bitsandbytes\optim[optimizer.py](http://optimizer.py/)", line 401, in init_state
state["state2"] = torch.zeros_like(
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 24.00 GiB total capacity; 10.13 GiB already allocated; 8.91 GiB free; 10.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
steps: 0%| | 0/46000 [00:30<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib[runpy.py](http://runpy.py/)", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Donny\AppData\Local\Programs\Python\Python310\lib[runpy.py](http://runpy.py/)", line 86, in _run_code
exec(code, run_globals)
File "D:\kohya_ss\venv\Scripts\accelerate.exe[main.py](http://main.py/)", line 7, in
File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands[accelerate_cli.py](http://accelerate_cli.py/)", line 45, in main
args.func(args)
File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands[launch.py](http://launch.py/)", line 1069, in launch_command
simple_launcher(args)
File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands[launch.py](http://launch.py/)", line 551, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\kohya_ss\venv\Scripts\python.exe', 'train_db_fixed.py', '--v2', '--v_parameterization', '--pretrained_model_name_or_path=D:\stable-diffusion-webui\models\Stable-diffusion\768-v-ema.ckpt', '--train_data_dir=D:\kohya_ss\zwx_person_db\train_person', '--output_dir=D:\kohya_ss\output', '--resolution=768,768', '--train_batch_size=1', '--learning_rate=1E-06', '--max_train_steps=46000', '--use_8bit_adam', '--xformers', '--mixed_precision=bf16', '--save_every_n_epochs=1', '--logging_dir=D:\kohya_ss\log', '--save_precision=fp16', '--seed=494481440', '--lr_scheduler=polynomial']' returned non-zero exit status 1.
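
Based on the max_split_size_mb hint in the error message itself, one thing that could be tried before re-launching is setting the allocator config in the same PowerShell session. I haven't verified whether it helps here, and the 512 value below is only an illustrative choice:

# assumption: 512 MiB is just an example split size, following the fragmentation hint in the error message
$env:PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:512"
# then re-run the accelerate launch command above in the same session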

dikasterion (Author) commented

Oh, I think it's running with the global packages, even though I've followed all the steps on the readme page...
Does anyone have any idea why?
I activated ".\venv\Scripts\activate" and ran the settings above inside the virtual environment...
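
A quick diagnostic sketch for this: with the venv activated, check which python.exe and which torch install the session actually resolves to, since the traceback above loads accelerate and bitsandbytes from the venv but torch from the global AppData install:

# check which interpreter and which torch the activated venv actually uses
Get-Command python | Select-Object -ExpandProperty Source
python -c "import sys, torch; print(sys.executable); print(torch.__file__)"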

toyxyz commented Nov 30, 2022

So how about using 8-bit Adam and xformers? You can reduce VRAM usage.

dikasterion (Author) commented

> So how about using 8-bit Adam and xformers? You can reduce VRAM usage.

Oh, as in the settings above, I already enabled 8-bit Adam and xformers.

I finally managed to run the v2 768 model successfully. I ran Windows PowerShell as administrator, and then I could train with a batch size of 1.
It fails with a batch size of 2 or above, but I'm content with what I got.
Thank you, guys.
