Bug in Custom Diffusion training with concept list #6709

@AIshutin

Description

Describe the bug

Context: the train_custom_diffusion.py script in the Custom Diffusion example, used with a concepts list
Description: The script fails when a concepts list is used and class images need to be synthesized: it tries to generate the synthetic images with a wrong or missing prompt. Note that the primary use of a concepts list is multi-concept training; this example uses a single concept for simplicity.

Bug: https://github.com/huggingface/diffusers/blob/main/examples/custom_diffusion/train_custom_diffusion.py#L756
Fix: 1b8972d
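
The root cause, as far as I can tell (paraphrased from the linked line; see 1b8972d for the actual change): when --concepts_list is used, args.class_prompt stays None, but the class-image sampling loop still builds its prompt dataset from it.

# buggy: reads the top-level CLI argument, which is None when --concepts_list is used
sample_dataset = PromptDataset(args.class_prompt, num_new_images)

# fixed: take the prompt from the concept currently being processed
sample_dataset = PromptDataset(concept["class_prompt"], num_new_images)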

Reproduction

# move to the example directory
cd examples/custom_diffusion/
# download the example data referenced in the README
wget https://www.cs.cmu.edu/~custom-diffusion/assets/data.zip
unzip data.zip

Create a concept_list.json file:

[
    {
        "instance_prompt": "photo of a <new1> cat",
        "class_prompt": "cat",
        "instance_data_dir": "data/cat",
        "class_data_dir": "synth-dataset/cat"
    }
]

Run the script:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_custom_diffusion.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --output_dir=$OUTPUT_DIR \
  --concepts_list=./concept_list.json \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --resolution=512  \
  --train_batch_size=2  \
  --learning_rate=1e-5  \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --num_class_images=200 \
  --scale_lr --hflip  \
  --modifier_token "<new1>" 

Logs

Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: bf16

{'image_encoder', 'requires_safety_checker'} was not found in config. Values will be initialized to default values.
Loading pipeline components...:   0%|                                                                                                                                                                                                                        | 0/6 [00:00<?, ?it/s]{'resnet_out_scale_factor', 'encoder_hid_dim_type', 'class_embeddings_concat', 'addition_embed_type', 'mid_block_only_cross_attention', 'projection_class_embeddings_input_dim', 'time_embedding_type', 'addition_time_embed_dim', 'resnet_time_scale_shift', 'transformer_layers_per_block', 'time_embedding_dim', 'dual_cross_attention', 'timestep_post_act', 'resnet_skip_time_act', 'reverse_transformer_layers_per_block', 'time_cond_proj_dim', 'class_embed_type', 'upcast_attention', 'only_cross_attention', 'encoder_hid_dim', 'attention_type', 'conv_in_kernel', 'cross_attention_norm', 'num_attention_heads', 'use_linear_projection', 'mid_block_type', 'time_embedding_act_fn', 'num_class_embeds', 'conv_out_kernel', 'addition_embed_type_num_heads', 'dropout'} was not found in config. Values will be initialized to default values.
Loaded unet as UNet2DConditionModel from `unet` subfolder of CompVis/stable-diffusion-v1-4.
Loading pipeline components...:  17%|██████████████████████████████████▋                                                                                                                                                                             | 1/6 [00:01<00:07,  1.54s/it]{'prediction_type', 'timestep_spacing'} was not found in config. Values will be initialized to default values.
Loaded scheduler as PNDMScheduler from `scheduler` subfolder of CompVis/stable-diffusion-v1-4.
Loaded feature_extractor as CLIPImageProcessor from `feature_extractor` subfolder of CompVis/stable-diffusion-v1-4.
Loaded text_encoder as CLIPTextModel from `text_encoder` subfolder of CompVis/stable-diffusion-v1-4.
Loading pipeline components...:  67%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                                                                     | 4/6 [00:01<00:00,  2.53it/s]{'force_upcast', 'norm_num_groups'} was not found in config. Values will be initialized to default values.
Loaded vae as AutoencoderKL from `vae` subfolder of CompVis/stable-diffusion-v1-4.
Loading pipeline components...:  83%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                                  | 5/6 [00:02<00:00,  2.95it/s]Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of CompVis/stable-diffusion-v1-4.
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00,  2.71it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.38.1-py3.10.egg/bitsandbytes/libbitsandbytes_cuda117.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes-0.38.1-py3.10.egg/bitsandbytes/libbitsandbytes_cuda117.so...
01/25/2024 15:31:14 - INFO - __main__ - Number of class images to sample: 200.
Generating class images:   0%|                                                                                                                                                                                                                              | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/aishutin/.local/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 127, in collate
    return elem_type({key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem})
  File "/home/aishutin/.local/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 127, in <dictcomp>
    return elem_type({key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem})
  File "/home/aishutin/.local/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 150, in collate
    raise TypeError(default_collate_err_msg_format.format(elem_type))
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'NoneType'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aishutin/contribute/diffusers/examples/custom_diffusion/train_custom_diffusion.py", line 1350, in <module>
    main(args)
  File "/home/aishutin/contribute/diffusers/examples/custom_diffusion/train_custom_diffusion.py", line 762, in main
    for example in tqdm(
  File "/home/aishutin/.local/lib/python3.10/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/home/aishutin/.local/lib/python3.10/site-packages/accelerate/data_loader.py", line 451, in __iter__
    current_batch = next(dataloader_iter)
  File "/home/aishutin/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/home/aishutin/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 678, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/aishutin/.local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/home/aishutin/.local/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 264, in default_collate
    return collate(batch, collate_fn_map=default_collate_fn_map)
  File "/home/aishutin/.local/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 130, in collate
    return {key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem}
  File "/home/aishutin/.local/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 130, in <dictcomp>
    return {key: collate([d[key] for d in batch], collate_fn_map=collate_fn_map) for key in elem}
  File "/home/aishutin/.local/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 150, in collate
    raise TypeError(default_collate_err_msg_format.format(elem_type))
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'NoneType'>
Traceback (most recent call last):
  File "/home/aishutin/.local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/aishutin/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/aishutin/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "/home/aishutin/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_custom_diffusion.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--output_dir=path-to-save-model', '--concepts_list=./concept_list.json', '--with_prior_preservation', '--prior_loss_weight=1.0', '--resolution=512', '--train_batch_size=2', '--learning_rate=1e-5', '--lr_warmup_steps=0', '--max_train_steps=500', '--num_class_images=200', '--scale_lr', '--hflip', '--modifier_token', '<new1>']' returned non-zero exit status 1.
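
The TypeError itself comes from torch's default_collate, which cannot batch None values. A minimal sketch that reproduces it (this is a simplified stand-in for the PromptDataset in train_custom_diffusion.py, not the actual script):

from torch.utils.data import DataLoader, Dataset

class PromptDataset(Dataset):
    # Simplified stand-in: each example is a dict carrying the shared
    # prompt and its index, mirroring the shape used by the script.
    def __init__(self, prompt, num_samples):
        self.prompt = prompt
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, index):
        return {"prompt": self.prompt, "index": index}

# With --concepts_list, args.class_prompt is None, so every example carries
# a None prompt and default_collate raises exactly the error above:
loader = DataLoader(PromptDataset(None, 4), batch_size=2)
next(iter(loader))  # TypeError: default_collate: ... found <class 'NoneType'>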

System Info

  • diffusers version: 0.26.0.dev0
  • Platform: Linux-6.5.0-14-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • PyTorch version (GPU?): 2.0.0+cu117 (True)
  • Huggingface_hub version: 0.20.3
  • Transformers version: 4.37.0
  • Accelerate version: 0.26.1
  • xFormers version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help?

@sayakpaul
@nupurkmr9
