Deep speed #1139
…ccelerate settings
support deepspeed
I tested the new branch with several settings. It seems that even if new SD variants (Cascade, SD-3, etc.) come out later, they will work well with the wrapping. |
Hey @BootsofLagrangian |
Can you attach your bash script or toml config file? |
@BootsofLagrangian
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
Here is how I run fine-tuning:
accelerate launch --gpu_ids="0,1" --multi_gpu --num_processes=2 --num_cpu_threads_per_process=2 "./sdxl_train.py" \
--ddp_timeout='1000' \
--bucket_no_upscale \
--bucket_reso_steps=64 \
--cache_latents \
--cache_latents_to_disk \
--caption_extension=".txt" \
--dataset_repeats="20" \
--enable_bucket \
--min_bucket_reso=64 \
--max_bucket_reso=1024 \
--in_json="/home/storuky/ml/train/meta_cap.json" \
--gradient_checkpointing \
--learning_rate="1.2e-06" \
--learning_rate_te1="5e-07" \
--learning_rate_te2="5e-07" \
--logging_dir="/home/storuky/ml/train/log" \
--lr_scheduler="constant" \
--lr_scheduler_args \
--lr_scheduler_type "CosineAnnealingLR" \
--lr_scheduler_args "T_max=10" \
--max_data_loader_n_workers="0" \
--resolution="1024,1024" \
--max_timestep=900 \
--max_token_length=225 \
--max_train_epochs=10 \
--max_train_steps="979575" \
--min_snr_gamma=5 \
--min_timestep=100 \
--mixed_precision="bf16" \
--no_half_vae \
--noise_offset=0.0375 \
--adaptive_noise_scale=0.00375 \
--optimizer_args scale_parameter=False relative_step=False warmup_init=False weight_decay=0.01 \
--optimizer_type="Adafactor" \
--output_dir="/home/storuky/ml/out" \
--output_name="TrainingModel" \
--pretrained_model_name_or_path="/home/storuky/ml/sd_xl_base_1.0.safetensors" \
--save_every_n_epochs="1" \
--save_model_as=safetensors \
--save_precision="bf16" \
--save_state \
--seed="1234" \
--train_batch_size="1" \
--train_data_dir="/home/storuky/ml/train/dataset" \
--train_text_encoder \
--v_pred_like_loss="0.5" \
--xformers \
--deepspeed \
--zero_stage 2 \
--offload_optimizer_device cpu |
When you use CPU offloading with offload_optimizer_device=cpu, DeepSpeed builds and uses CPUAdam, which is also a kind of Adam. Can you change the optimizer? When I use Adafactor, I get a different error. There is no error with AdamW. |
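A pre-flight check along the lines of this advice could look like the minimal sketch below. The function name `check_offload_optimizer` and the set of Adam-like names are hypothetical; neither is part of sd-scripts or DeepSpeed. The only rule it encodes is the one reported in this thread: Adafactor failed with optimizer offload, while AdamW did not.

```python
from typing import Optional

# Assumption: optimizer names treated as Adam-like for this sketch only.
ADAM_LIKE = {"adam", "adamw"}


def check_offload_optimizer(optimizer_type: str, offload_optimizer_device: str) -> Optional[str]:
    """Return a warning string for a risky combination, or None if it looks fine."""
    # No optimizer offload requested: nothing to check.
    if offload_optimizer_device.lower() == "none":
        return None
    # Adam-like optimizers were reported to work with offload in this thread.
    if optimizer_type.lower() in ADAM_LIKE:
        return None
    return (
        f"optimizer_type={optimizer_type!r} with "
        f"offload_optimizer_device={offload_optimizer_device!r} may fail; "
        "consider an Adam-type optimizer"
    )


print(check_offload_optimizer("Adafactor", "cpu"))
print(check_offload_optimizer("AdamW", "cpu"))
```

The first call prints a warning string and the second prints `None`. Whether other optimizers are compatible with offload is not covered by this thread, so the sketch warns on anything outside its small set.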
@BootsofLagrangian Yeah, I tried AdamW as well, but no luck so far... Here is a full trace of the issue with AdamW as the optimizer (spoiler: it happens with any offload_optimizer_device value... none, nvme, cpu – doesn't matter):
|
@BootsofLagrangian Even if I copy your toml config from here, change only the paths, and run it as you described, I still get this error. I tried reconfiguring accelerate and installing other versions of DeepSpeed – no effect. |
@BootsofLagrangian Ah, I just switched to your version and it's working! The issue is only with this branch. |
Thanks for your report! |
- we have to prepare optimizer and ds_model at the same time.
- pull/1139#issuecomment-1986790007
Signed-off-by: BootsofLagrangian <[email protected]>
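The first point of this commit message can be illustrated with a minimal sketch. It assumes `accelerator` is an `accelerate.Accelerator` configured for DeepSpeed; `prepare_together` and its argument names are hypothetical and are not the code in this PR. The idea is that the wrapped model and the optimizer go through a single `prepare` call rather than separate ones.

```python
def prepare_together(accelerator, ds_model, optimizer, train_dataloader, lr_scheduler):
    """Prepare the DeepSpeed-wrapped model and the optimizer in one call.

    `accelerator.prepare` returns the prepared objects in the same order
    in which they were passed.
    """
    ds_model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
        ds_model, optimizer, train_dataloader, lr_scheduler
    )
    return ds_model, optimizer, train_dataloader, lr_scheduler
```

Passing everything in one call lets the DeepSpeed integration see the model and the optimizer together when it builds the engine.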
Fix sdxl_train.py in deepspeed branch
Edit: see comment below for reason (missing
config.toml
pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5"
dataset_config = "/home/ml/checkpoints/sd15/dataset.toml"
xformers = true
deepspeed = true
zero_stage = 2
mixed_precision = "bf16"
save_precision = "bf16"
full_bf16 = true
no_half_vae = true
train_batch_size = 24
max_data_loader_n_workers = 4
persistent_data_loader_workers = true
optimizer_type = "AdamW8bit"
optimizer_args = [ "weight_decay=1e-1", ]
lr_scheduler = "constant"
max_train_steps = 78452
gradient_checkpointing = true
gradient_accumulation_steps = 16
learning_rate = 4e-5
unet_lr = 4e-5
text_encoder_lr = 2e-5
max_grad_norm = 1.0
max_token_length = 225
network_alpha = 64
network_dim = 128
network_module = "networks.lora"
cache_latents = true
cache_latents_to_disk = true |
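For reference, the batch-related values in this config combine as shown in the minimal sketch below. It uses the usual definition of effective batch size for data-parallel training and assumes `train_batch_size` is counted per process. The number of processes is not stated for this config, so the value 2 is only an assumption borrowed from the accelerate config posted earlier in the thread.

```python
def effective_batch_size(train_batch_size: int, gradient_accumulation_steps: int, num_processes: int) -> int:
    """Samples contributing to one optimizer step across all processes."""
    return train_batch_size * gradient_accumulation_steps * num_processes


# Values from the config above: train_batch_size = 24, gradient_accumulation_steps = 16.
per_process = effective_batch_size(24, 16, 1)
two_processes = effective_batch_size(24, 16, 2)  # num_processes=2 is an assumption

print(per_process)    # 384
print(two_processes)  # 768
```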
Original PR #1101
I think it is not necessary to assign unet or text_encoder back from the result of prepare_deepspeed_model. Because the model is not a list, they are not changed in the function.
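The point about object identity can be illustrated in plain Python. This is a minimal sketch: `wrap_models` and `Model` are hypothetical stand-ins, not the actual `prepare_deepspeed_model` from the PR. It assumes, as the comment describes, that only list inputs are replaced by a new container, while a single model is stored as the same object.

```python
class Model:
    """Hypothetical stand-in for a network such as the U-Net or a text encoder."""

    def __init__(self, name: str):
        self.name = name


def wrap_models(**models):
    """Collect models in a dict; only list inputs are replaced by a new object."""
    wrapped = {}
    for key, model in models.items():
        if isinstance(model, list):
            # A list is converted into a new container, so the caller's
            # list is no longer the object held by the wrapper.
            wrapped[key] = tuple(model)
        else:
            # A single model is stored as-is: same object, no copy.
            wrapped[key] = model
    return wrapped


unet = Model("unet")
text_encoders = [Model("te1"), Model("te2")]
wrapped = wrap_models(unet=unet, text_encoder=text_encoders)

same_unet = wrapped["unet"] is unet                    # single model: identical object
same_te = wrapped["text_encoder"] is text_encoders     # list: replaced by a new container
print(same_unet, same_te)
```

Under these assumptions the caller's `unet` variable already refers to the object inside the wrapper, so assigning it back would change nothing.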