RuntimeError: The size of tensor a (4173) must match the size of tensor b (16461) at non-singleton dimension 2 #9799

Closed
TharunSivamani opened this issue Oct 29, 2024 · 3 comments
Labels: bug (Something isn't working)

Comments

@TharunSivamani

Describe the bug

I am trying to run the Flux LoRA quantization example from
https://github.com/huggingface/diffusers/tree/main/examples/research_projects/flux_lora_quantization

but training fails with: RuntimeError: The size of tensor a (4173) must match the size of tensor b (16461) at non-singleton dimension 2
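
For what it's worth, the two lengths in the error seem to decompose into text tokens plus image tokens (my own reading of the numbers, not something the script reports): 4173 = 77 + 64^2, i.e. a 77-token text embedding plus 2x2-packed latents for a 1024px image, while 16461 = 77 + 128^2, i.e. the same text length but unpacked latent positions. A quick sketch of that arithmetic (all inputs are assumptions: 1024px resolution, VAE downscale factor 8, 2x2 packing, 77 text tokens):

# Assumed values, for illustration only
resolution = 1024
latent_side = resolution // 8          # 128 latent positions per side
packed_side = latent_side // 2         # 64 after 2x2 packing
text_tokens = 77

print(text_tokens + packed_side ** 2)  # 4173  -> length of the packed hidden states
print(text_tokens + latent_side ** 2)  # 16461 -> length the rotary embeddings appear to be built for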

Reproduction

Steps to reproduce:

python compute_embeddings.py

accelerate launch --config_file=accelerate.yaml \
  train_dreambooth_lora_flux_miniature.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
  --data_df_path="embeddings.parquet" \
  --output_dir="yarn_art_lora_flux_nf4" \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --weighting_scheme="none" \
  --resolution=1024 \
  --train_batch_size=1 \
  --repeats=1 \
  --learning_rate=1e-4 \
  --guidance_scale=1 \
  --report_to="wandb" \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --cache_latents \
  --rank=4 \
  --max_train_steps=700 \
  --seed="0"

(Note: I pointed --pretrained_model_name_or_path at a locally downloaded copy of FLUX.1-dev.)

Logs

(env) root:~/tharun/Flux-HF# accelerate launch --config_file=accelerate.yaml \
  train_dreambooth_lora_flux_miniature.py \
  --pretrained_model_name_or_path="/root/tharun/black-forest-labs/FLUX.1-dev" \
  --data_df_path="embeddings.parquet" \
  --output_dir="yarn_art_lora_flux_nf4" \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --weighting_scheme="none" \
  --resolution=1024 \
  --train_batch_size=1 \
  --repeats=1 \
  --learning_rate=1e-4 \
  --guidance_scale=1 \
  --report_to="wandb" \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --cache_latents \
  --rank=4 \
  --max_train_steps=700 \
  --seed="0"
10/29/2024 16:58:01 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

Merged sharded checkpoints as `hf_quantizer` is not None.
{'axes_dims_rope'} was not found in config. Values will be initialized to default values.
Caching latents: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:01<00:00, 10.35it/s]
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: tharunsivamani (tharunsivamani-student). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.18.5
wandb: Run data is saved locally in /root/tharun/Flux-HF/wandb/run-20241029_165827-66cke5nx
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run feasible-bird-3
wandb: ⭐️ View project at https://wandb.ai/tharunsivamani-student/dreambooth-flux-dev-lora-nf4
wandb: 🚀 View run at https://wandb.ai/tharunsivamani-student/dreambooth-flux-dev-lora-nf4/runs/66cke5nx
10/29/2024 16:58:28 - INFO - __main__ - ***** Running training *****
10/29/2024 16:58:28 - INFO - __main__ -   Num examples = 18
10/29/2024 16:58:28 - INFO - __main__ -   Num batches each epoch = 18
10/29/2024 16:58:28 - INFO - __main__ -   Num Epochs = 140
10/29/2024 16:58:28 - INFO - __main__ -   Instantaneous batch size per device = 1
10/29/2024 16:58:28 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 4
10/29/2024 16:58:28 - INFO - __main__ -   Gradient Accumulation steps = 4
10/29/2024 16:58:28 - INFO - __main__ -   Total optimization steps = 700
Steps:   0%|                                                                                                                                                                                                  | 0/700 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/root/tharun/Flux-HF/train_dreambooth_lora_flux_miniature.py", line 1183, in <module>
    main(args)
  File "/root/tharun/Flux-HF/train_dreambooth_lora_flux_miniature.py", line 1072, in main
    model_pred = transformer(
  File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 820, in forward
    return model_forward(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 808, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/root/tharun/env/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 490, in forward
    encoder_hidden_states, hidden_states = torch.utils.checkpoint.checkpoint(
  File "/root/tharun/env/lib/python3.10/site-packages/torch/_compile.py", line 32, in inner
    return disable_fn(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
    return fn(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 496, in checkpoint
    ret = function(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 485, in custom_forward
    return module(*inputs)
  File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 175, in forward
    attn_output, context_attn_output = self.attn(
  File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 495, in forward
    return self.processor(
  File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 1872, in __call__
    query = apply_rotary_emb(query, image_rotary_emb)
  File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 770, in apply_rotary_emb
    out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
RuntimeError: The size of tensor a (4173) must match the size of tensor b (16461) at non-singleton dimension 2
wandb: 🚀 View run feasible-bird-3 at: https://wandb.ai/tharunsivamani-student/dreambooth-flux-dev-lora-nf4/runs/66cke5nx
wandb: Find logs at: wandb/run-20241029_165827-66cke5nx/logs
Traceback (most recent call last):
  File "/root/tharun/env/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/tharun/env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/root/tharun/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1168, in launch_command
    simple_launcher(args)
  File "/root/tharun/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 763, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/tharun/env/bin/python', 'train_dreambooth_lora_flux_miniature.py', '--pretrained_model_name_or_path=/root/tharun/black-forest-labs/FLUX.1-dev', '--data_df_path=embeddings.parquet', '--output_dir=yarn_art_lora_flux_nf4', '--mixed_precision=fp16', '--use_8bit_adam', '--weighting_scheme=none', '--resolution=1024', '--train_batch_size=1', '--repeats=1', '--learning_rate=1e-4', '--guidance_scale=1', '--report_to=wandb', '--gradient_accumulation_steps=4', '--gradient_checkpointing', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--cache_latents', '--rank=4', '--max_train_steps=700', '--seed=0']' returned non-zero exit status 1.
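
The traceback bottoms out in apply_rotary_emb (diffusers/models/embeddings.py), where the query is multiplied by cos/sin tables built for a longer sequence. A standalone sketch of the same shape conflict, using an assumed head count and head dimension (only the sequence lengths come from the error message), reproduces the identical error outside the training script:

import torch

# Assumed shapes for illustration: 24 heads and head_dim 128 are guesses;
# the sequence lengths (4173 vs 16461) are taken from the error message.
query = torch.randn(1, 24, 4173, 128)  # packed text+image hidden states
cos = torch.randn(1, 1, 16461, 128)    # rotary table built for 16461 positions
sin = torch.randn(1, 1, 16461, 128)

try:
    out = query.float() * cos + query.float() * sin
except RuntimeError as e:
    print(e)
# -> The size of tensor a (4173) must match the size of tensor b (16461) at non-singleton dimension 2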

System Info

  • 🤗 Diffusers version: 0.32.0.dev0
  • Platform: Linux-5.15.0-119-generic-x86_64-with-glibc2.35
  • Running on Google Colab?: No
  • Python version: 3.10.12
  • PyTorch version (GPU?): 2.5.0+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.24.7
  • Transformers version: 4.46.1
  • Accelerate version: 1.0.1
  • PEFT version: 0.13.2
  • Bitsandbytes version: 0.44.1
  • Safetensors version: 0.4.5
  • xFormers version: not installed
  • Accelerator: NVIDIA L40S, 46068 MiB
    NVIDIA L40S, 46068 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

TharunSivamani added the bug (Something isn't working) label on Oct 29, 2024
@a-r-r-o-w (Member)

cc @sayakpaul

@sayakpaul (Member)

#9806

@sayakpaul (Member)

#9806 should have fixed it. If you still experience issues, please let us know.
