RuntimeError: The size of tensor a (4173) must match the size of tensor b (16461) at non-singleton dimension 2 #9799

Closed
TharunSivamani opened this issue Oct 29, 2024 · 3 comments
Labels: bug (Something isn't working)

Comments

@TharunSivamani

Describe the bug

I am trying to run the Flux LoRA quantization example from
https://github.com/huggingface/diffusers/tree/main/examples/research_projects/flux_lora_quantization

but training fails with: RuntimeError: The size of tensor a (4173) must match the size of tensor b (16461) at non-singleton dimension 2
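
For what it's worth, the two lengths in the error seem to decompose into text tokens plus image tokens (my own reading of the numbers, not something the script reports): 4173 = 77 + 64^2, i.e. a 77-token text embedding plus 2x2-packed latents for a 1024px image, while 16461 = 77 + 128^2, i.e. the same text length but unpacked latent positions. A quick sketch of that arithmetic (all inputs are assumptions: 1024px resolution, VAE downscale factor 8, 2x2 packing, 77 text tokens):

# Assumed values, for illustration only
resolution = 1024
latent_side = resolution // 8          # 128 latent positions per side
packed_side = latent_side // 2         # 64 after 2x2 packing
text_tokens = 77

print(text_tokens + packed_side ** 2)  # 4173  -> length of the packed hidden states
print(text_tokens + latent_side ** 2)  # 16461 -> length the rotary embeddings appear to be built for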

Reproduction

Steps to reproduce:

python compute_embeddings.py

accelerate launch --config_file=accelerate.yaml \
  train_dreambooth_lora_flux_miniature.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
  --data_df_path="embeddings.parquet" \
  --output_dir="yarn_art_lora_flux_nf4" \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --weighting_scheme="none" \
  --resolution=1024 \
  --train_batch_size=1 \
  --repeats=1 \
  --learning_rate=1e-4 \
  --guidance_scale=1 \
  --report_to="wandb" \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --cache_latents \
  --rank=4 \
  --max_train_steps=700 \
  --seed="0"

(Note: I pointed --pretrained_model_name_or_path at a locally downloaded copy of FLUX.1-dev.)

Logs

(env) root:~/tharun/Flux-HF# accelerate launch --config_file=accelerate.yaml \
  train_dreambooth_lora_flux_miniature.py \
  --pretrained_model_name_or_path="/root/tharun/black-forest-labs/FLUX.1-dev" \
  --data_df_path="embeddings.parquet" \
  --output_dir="yarn_art_lora_flux_nf4" \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --weighting_scheme="none" \
  --resolution=1024 \
  --train_batch_size=1 \
  --repeats=1 \
  --learning_rate=1e-4 \
  --guidance_scale=1 \
  --report_to="wandb" \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --cache_latents \
  --rank=4 \
  --max_train_steps=700 \
  --seed="0"
10/29/2024 16:58:01 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

Merged sharded checkpoints as `hf_quantizer` is not None.
{'axes_dims_rope'} was not found in config. Values will be initialized to default values.
Caching latents: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:01<00:00, 10.35it/s]
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: tharunsivamani (tharunsivamani-student). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.18.5
wandb: Run data is saved locally in /root/tharun/Flux-HF/wandb/run-20241029_165827-66cke5nx
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run feasible-bird-3
wandb: ⭐️ View project at https://wandb.ai/tharunsivamani-student/dreambooth-flux-dev-lora-nf4
wandb: 🚀 View run at https://wandb.ai/tharunsivamani-student/dreambooth-flux-dev-lora-nf4/runs/66cke5nx
10/29/2024 16:58:28 - INFO - __main__ - ***** Running training *****
10/29/2024 16:58:28 - INFO - __main__ -   Num examples = 18
10/29/2024 16:58:28 - INFO - __main__ -   Num batches each epoch = 18
10/29/2024 16:58:28 - INFO - __main__ -   Num Epochs = 140
10/29/2024 16:58:28 - INFO - __main__ -   Instantaneous batch size per device = 1
10/29/2024 16:58:28 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 4
10/29/2024 16:58:28 - INFO - __main__ -   Gradient Accumulation steps = 4
10/29/2024 16:58:28 - INFO - __main__ -   Total optimization steps = 700
Steps:   0%|                                                                                                                                                                                                  | 0/700 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/root/tharun/Flux-HF/train_dreambooth_lora_flux_miniature.py", line 1183, in <module>
    main(args)
  File "/root/tharun/Flux-HF/train_dreambooth_lora_flux_miniature.py", line 1072, in main
    model_pred = transformer(
  File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 820, in forward
    return model_forward(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 808, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/root/tharun/env/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 490, in forward
    encoder_hidden_states, hidden_states = torch.utils.checkpoint.checkpoint(
  File "/root/tharun/env/lib/python3.10/site-packages/torch/_compile.py", line 32, in inner
    return disable_fn(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
    return fn(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 496, in checkpoint
    ret = function(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 485, in custom_forward
    return module(*inputs)
  File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 175, in forward
    attn_output, context_attn_output = self.attn(
  File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 495, in forward
    return self.processor(
  File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 1872, in __call__
    query = apply_rotary_emb(query, image_rotary_emb)
  File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 770, in apply_rotary_emb
    out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
RuntimeError: The size of tensor a (4173) must match the size of tensor b (16461) at non-singleton dimension 2
wandb: 🚀 View run feasible-bird-3 at: https://wandb.ai/tharunsivamani-student/dreambooth-flux-dev-lora-nf4/runs/66cke5nx
wandb: Find logs at: wandb/run-20241029_165827-66cke5nx/logs
Traceback (most recent call last):
  File "/root/tharun/env/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/tharun/env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/root/tharun/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1168, in launch_command
    simple_launcher(args)
  File "/root/tharun/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 763, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/tharun/env/bin/python', 'train_dreambooth_lora_flux_miniature.py', '--pretrained_model_name_or_path=/root/tharun/black-forest-labs/FLUX.1-dev', '--data_df_path=embeddings.parquet', '--output_dir=yarn_art_lora_flux_nf4', '--mixed_precision=fp16', '--use_8bit_adam', '--weighting_scheme=none', '--resolution=1024', '--train_batch_size=1', '--repeats=1', '--learning_rate=1e-4', '--guidance_scale=1', '--report_to=wandb', '--gradient_accumulation_steps=4', '--gradient_checkpointing', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--cache_latents', '--rank=4', '--max_train_steps=700', '--seed=0']' returned non-zero exit status 1.
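
The traceback bottoms out in apply_rotary_emb (diffusers/models/embeddings.py), where the query is multiplied by cos/sin tables built for a longer sequence. A standalone sketch of the same shape conflict, using an assumed head count and head dimension (only the sequence lengths come from the error message), reproduces the identical error outside the training script:

import torch

# Assumed shapes for illustration: 24 heads and head_dim 128 are guesses;
# the sequence lengths (4173 vs 16461) are taken from the error message.
query = torch.randn(1, 24, 4173, 128)  # packed text+image hidden states
cos = torch.randn(1, 1, 16461, 128)    # rotary table built for 16461 positions
sin = torch.randn(1, 1, 16461, 128)

try:
    out = query.float() * cos + query.float() * sin
except RuntimeError as e:
    print(e)
# -> The size of tensor a (4173) must match the size of tensor b (16461) at non-singleton dimension 2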

System Info

  • 🤗 Diffusers version: 0.32.0.dev0
  • Platform: Linux-5.15.0-119-generic-x86_64-with-glibc2.35
  • Running on Google Colab?: No
  • Python version: 3.10.12
  • PyTorch version (GPU?): 2.5.0+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.24.7
  • Transformers version: 4.46.1
  • Accelerate version: 1.0.1
  • PEFT version: 0.13.2
  • Bitsandbytes version: 0.44.1
  • Safetensors version: 0.4.5
  • xFormers version: not installed
  • Accelerator: NVIDIA L40S, 46068 MiB
    NVIDIA L40S, 46068 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

TharunSivamani added the bug (Something isn't working) label on Oct 29, 2024
@a-r-r-o-w (Member)

cc @sayakpaul

@sayakpaul (Member)

#9806

@sayakpaul (Member)

#9806 should have fixed it. If you still experience issues, please let us know.
