(env) root:~/tharun/Flux-HF# accelerate launch --config_file=accelerate.yaml \
train_dreambooth_lora_flux_miniature.py \
--pretrained_model_name_or_path="/root/tharun/black-forest-labs/FLUX.1-dev" \
--data_df_path="embeddings.parquet" \
--output_dir="yarn_art_lora_flux_nf4" \
--mixed_precision="fp16" \
--use_8bit_adam \
--weighting_scheme="none" \
--resolution=1024 \
--train_batch_size=1 \
--repeats=1 \
--learning_rate=1e-4 \
--guidance_scale=1 \
--report_to="wandb" \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--cache_latents \
--rank=4 \
--max_train_steps=700 \
--seed="0"
10/29/2024 16:58:01 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
Merged sharded checkpoints as `hf_quantizer` is not None.
{'axes_dims_rope'} was not found in config. Values will be initialized to default values.
Caching latents: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:01<00:00, 10.35it/s]
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: tharunsivamani (tharunsivamani-student). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.18.5
wandb: Run data is saved locally in /root/tharun/Flux-HF/wandb/run-20241029_165827-66cke5nx
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run feasible-bird-3
wandb: ⭐️ View project at https://wandb.ai/tharunsivamani-student/dreambooth-flux-dev-lora-nf4
wandb: 🚀 View run at https://wandb.ai/tharunsivamani-student/dreambooth-flux-dev-lora-nf4/runs/66cke5nx
10/29/2024 16:58:28 - INFO - __main__ - ***** Running training *****
10/29/2024 16:58:28 - INFO - __main__ - Num examples = 18
10/29/2024 16:58:28 - INFO - __main__ - Num batches each epoch = 18
10/29/2024 16:58:28 - INFO - __main__ - Num Epochs = 140
10/29/2024 16:58:28 - INFO - __main__ - Instantaneous batch size per device = 1
10/29/2024 16:58:28 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 4
10/29/2024 16:58:28 - INFO - __main__ - Gradient Accumulation steps = 4
10/29/2024 16:58:28 - INFO - __main__ - Total optimization steps = 700
Steps: 0%|| 0/700 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/tharun/Flux-HF/train_dreambooth_lora_flux_miniature.py", line 1183, in <module>
main(args)
File "/root/tharun/Flux-HF/train_dreambooth_lora_flux_miniature.py", line 1072, in main
model_pred = transformer(
File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 820, in forward
return model_forward(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 808, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/root/tharun/env/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
return func(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 490, in forward
encoder_hidden_states, hidden_states = torch.utils.checkpoint.checkpoint(
File "/root/tharun/env/lib/python3.10/site-packages/torch/_compile.py", line 32, in inner
return disable_fn(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
return fn(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 496, in checkpoint
ret = function(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 485, in custom_forward
return module(*inputs)
File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 175, in forward
attn_output, context_attn_output = self.attn(
File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 495, in forward
return self.processor(
File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 1872, in __call__
query = apply_rotary_emb(query, image_rotary_emb)
File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 770, in apply_rotary_emb
out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
RuntimeError: The size of tensor a (4173) must match the size of tensor b (16461) at non-singleton dimension 2
wandb: 🚀 View run feasible-bird-3 at: https://wandb.ai/tharunsivamani-student/dreambooth-flux-dev-lora-nf4/runs/66cke5nx
wandb: Find logs at: wandb/run-20241029_165827-66cke5nx/logs
Traceback (most recent call last):
File "/root/tharun/env/bin/accelerate", line 8, in<module>sys.exit(main())
File "/root/tharun/env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/root/tharun/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1168, in launch_command
simple_launcher(args)
File "/root/tharun/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 763, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/tharun/env/bin/python', 'train_dreambooth_lora_flux_miniature.py', '--pretrained_model_name_or_path=/root/tharun/black-forest-labs/FLUX.1-dev', '--data_df_path=embeddings.parquet', '--output_dir=yarn_art_lora_flux_nf4', '--mixed_precision=fp16', '--use_8bit_adam', '--weighting_scheme=none', '--resolution=1024', '--train_batch_size=1', '--repeats=1', '--learning_rate=1e-4', '--guidance_scale=1', '--report_to=wandb', '--gradient_accumulation_steps=4', '--gradient_checkpointing', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--cache_latents', '--rank=4', '--max_train_steps=700', '--seed=0']' returned non-zero exit status 1.
Describe the bug
I am trying to run the Flux LoRA quantization example from
https://github.com/huggingface/diffusers/tree/main/examples/research_projects/flux_lora_quantization
but training fails immediately at the first step with:

RuntimeError: The size of tensor a (4173) must match the size of tensor b (16461) at non-singleton dimension 2
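The error is raised in apply_rotary_emb, where the attention query's sequence length (text tokens plus packed image tokens) has to match the length of the rotary embeddings built from the precomputed text ids and the image ids. Since this script trains from precomputed embeddings, a first debugging step could be to inspect what compute_embeddings.py actually wrote to embeddings.parquet. A minimal sketch (it just prints the shape of whatever is stored; I am not assuming any particular column names):

import numpy as np
import pandas as pd

# Hypothetical debugging step: dump the shape of every value stored in the parquet
# produced by compute_embeddings.py, to compare against what the training script expects.
df = pd.read_parquet("embeddings.parquet")
print("columns:", df.columns.tolist(), "rows:", len(df))

for col in df.columns:
    first = df[col].iloc[0]
    try:
        arr = np.asarray(first)
        print(f"{col}: dtype={arr.dtype}, shape={arr.shape}")
    except Exception:
        print(f"{col}: type={type(first)!r}")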
Reproduction
Steps to reproduce:
python compute_embeddings.py
accelerate launch --config_file=accelerate.yaml \
train_dreambooth_lora_flux_miniature.py \
--pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
--data_df_path="embeddings.parquet" \
--output_dir="yarn_art_lora_flux_nf4" \
--mixed_precision="fp16" \
--use_8bit_adam \
--weighting_scheme="none" \
--resolution=1024 \
--train_batch_size=1 \
--repeats=1 \
--learning_rate=1e-4 \
--guidance_scale=1 \
--report_to="wandb" \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--cache_latents \
--rank=4 \
--max_train_steps=700 \
--seed="0"

(For --pretrained_model_name_or_path I actually pass a locally downloaded copy of FLUX.1-dev.)
Logs
(Full command output and traceback are pasted at the top of this report.)
System Info
NVIDIA L40S, 46068 MiB
Who can help?
No response