Exception: No images were discovered by the bucket manager in the dataset #1035

Closed
a-l-e-x-d-s-9 opened this issue Oct 8, 2024 · 9 comments

@a-l-e-x-d-s-9

My dataset is a single image:
dataset.zip
Settings:
s01_multidatabackend.json
s01_config_01.json
Log:

No dependencies to install or update
INFO:root:lm_eval is not installed, GPTQ may not be usable
/home/alexds9/Documents/stable_diffusion/SimpleTuner/.venv/lib/python3.11/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
/home/alexds9/Documents/stable_diffusion/SimpleTuner/.venv/lib/python3.11/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
2024-10-08 04:09:41,385 [INFO] Using json configuration backend.
2024-10-08 04:09:41,385 [INFO] [CONFIG.JSON] Loaded configuration from config/config.json
2024-10-08 04:09:41,385 [WARNING] Skipping false argument: --push_to_hub
2024-10-08 04:09:41,385 [WARNING] Skipping false argument: --push_checkpoints_to_hub
2024-10-08 04:09:41,385 [WARNING] Skipping false argument: --validation_torch_compile
2024-10-08 04:09:41,385 [WARNING] Skipping false argument: --disable_benchmark
--model_type=lora
--lora_type=lycoris
--lycoris_config=/home/alexds9/Documents/stable_diffusion/Models_2024_10/simple_image_test/lycoris_config_03.json
--pretrained_model_name_or_path=black-forest-labs/FLUX.1-dev
--model_family=flux
--data_backend_config=/home/alexds9/Documents/stable_diffusion/Models_2024_10/simple_image_test/s01_multidatabackend.json
--output_dir=/home/alexds9/stable-diffusion-webui/models/Lora/My/Flux/Training/Models_2024_10/simple_image_test/tr_01/
--user_prompt_library=/home/alexds9/Documents/stable_diffusion/Models_2024_10/simple_image_test/s01_prompt_library.json
--hub_model_id=simpletuner-lora-01_simple_image_test_tr_01
--tracker_project_name=simpletuner-lora-01_simple_image_test_tr_01
--tracker_run_name=tr_01
--seed=5612103
--lora_rank=8
--lora_alpha=8
--mixed_precision=bf16
--optimizer=adamw_bf16
--learning_rate=7.5e-3
--train_batch_size=2
--gradient_accumulation_steps=2
--lr_scheduler=cosine
--lr_warmup_steps=20
--max_train_steps=1000
--num_train_epochs=0
--checkpointing_steps=100
--base_model_precision=int8-quanto
--base_model_default_dtype=bf16
--keep_vae_loaded
--flux_lora_target=all+ffs
--gradient_precision=fp32
--noise_offset=0.15
--noise_offset_probability=0.5
--checkpoints_total_limit=20
--aspect_bucket_rounding=2
--minimum_image_size=0
--resume_from_checkpoint=latest
--report_to=wandb
--metadata_update_interval=60
--gradient_checkpointing
--caption_dropout_probability=0.20
--resolution_type=pixel_area
--resolution=256
--validation_seed=10
--validation_steps=100
--validation_resolution=512x768
--validation_guidance=3.5
--validation_guidance_rescale=0.0
--validation_num_inference_steps=20
--validation_prompt=woman, brown hair, blue eyes, white shirt, upper body, indoors,
--num_validation_images=1
--snr_gamma=5
--inference_scheduler_timestep_spacing=trailing
--training_scheduler_timestep_spacing=trailing
--max_workers=32
--read_batch_size=25
--write_batch_size=64
--torch_num_threads=8
--image_processing_batch_size=32
--vae_batch_size=4
--compress_disk_cache
--max_grad_norm=0.02
--disable_bucket_pruning
--override_dataset_config
--quantize_via=cpu
2024-10-08 04:09:41,390 [WARNING] The VAE model madebyollin/sdxl-vae-fp16-fix is not compatible. Please use a compatible VAE to eliminate this warning. The baked-in VAE will be used, instead.
2024-10-08 04:09:41,391 [INFO] VAE Model: black-forest-labs/FLUX.1-dev
2024-10-08 04:09:41,391 [INFO] Default VAE Cache location: 
2024-10-08 04:09:41,391 [INFO] Text Cache location: cache
2024-10-08 04:09:41,391 [WARNING] Updating T5 XXL tokeniser max length to 512 for Flux.
2024-10-08 04:09:41,391 [WARNING] Gradient accumulation steps are enabled, but gradient precision is set to 'unmodified'. This may lead to numeric instability. Consider disabling gradient accumulation steps. Continuing in 10 seconds..
2024-10-08 04:09:51,391 [INFO] Enabled NVIDIA TF32 for faster training on Ampere GPUs. Use --disable_tf32 if this causes any problems.
2024-10-08 04:09:51,912 [INFO] Load VAE: black-forest-labs/FLUX.1-dev
2024-10-08 04:09:52,464 [INFO] Loading VAE onto accelerator, converting from torch.float32 to torch.bfloat16
2024-10-08 04:09:52,603 [INFO] Load tokenizers
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
2024-10-08 04:09:53,495 [INFO] Loading OpenAI CLIP-L text encoder from black-forest-labs/FLUX.1-dev/text_encoder..
2024-10-08 04:09:53,895 [INFO] Loading T5 XXL v1.1 text encoder from black-forest-labs/FLUX.1-dev/text_encoder_2..
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 7876.63it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.17it/s]
2024-10-08 04:09:57,514 [INFO] Moving text encoder to GPU.
2024-10-08 04:09:57,707 [INFO] Moving text encoder 2 to GPU.
2024-10-08 04:10:06,404 [INFO] Loading data backend config from /home/alexds9/Documents/stable_diffusion/Models_2024_10/simple_image_test/s01_multidatabackend.json
2024-10-08 04:10:06,405 [INFO] Configuring text embed backend: alt-embed-cache
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 593.91it/s]
2024-10-08 04:10:06,757 [INFO] (Rank: 0) (id=alt-embed-cache) Listing all text embed cache entries
2024-10-08 04:10:06,758 [INFO] Pre-computing null embedding
2024-10-08 04:10:13,232 [INFO] Completed loading text embed services.                                                        
2024-10-08 04:10:13,232 [INFO] Configuring data backend: all_dataset_768
2024-10-08 04:10:13,232 [INFO] (id=all_dataset_768) Loading bucket manager.                                                  
2024-10-08 04:10:13,243 [WARNING] No cache file found, creating new one.
2024-10-08 04:10:13,243 [INFO] (id=all_dataset_768) Refreshing aspect buckets on main process.
2024-10-08 04:10:13,243 [INFO] Discovering new files...
2024-10-08 04:10:13,245 [INFO] Compressed 0 existing files from 0.
Generating aspect bucket cache:   0%|                                         | 0/1 [00:00<?, ?it/s]2024-10-08 04:10:13,267 [ERROR] Error processing image: Aspect buckets must be a list of floats or dictionaries.
2024-10-08 04:10:13,268 [ERROR] Error traceback: Traceback (most recent call last):
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/metadata/backends/discovery.py", line 237, in _process_for_bucket
    prepared_sample = training_sample.prepare()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/image_manipulation/training_sample.py", line 314, in prepare
    self.crop()
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/image_manipulation/training_sample.py", line 529, in crop
    self.calculate_target_size()
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/image_manipulation/training_sample.py", line 484, in calculate_target_size
    self.aspect_ratio = self._select_random_aspect()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/image_manipulation/training_sample.py", line 280, in _select_random_aspect
    available_aspects = self._trim_aspect_bucket_list()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/image_manipulation/training_sample.py", line 198, in _trim_aspect_bucket_list
    raise ValueError(
ValueError: Aspect buckets must be a list of floats or dictionaries.

2024-10-08 04:10:13,270 [INFO] Image processing statistics: {'total_processed': 0, 'skipped': {'already_exists': 0, 'metadata_missing': 0, 'not_found': 0, 'too_small': 0, 'other': 0}}
2024-10-08 04:10:13,270 [INFO] Enforcing minimum image size of 0.013225. This could take a while for very-large datasets.
2024-10-08 04:10:13,270 [INFO] Completed aspect bucket update.
2024-10-08 04:10:13,271 [INFO] Configured backend: {'id': 'all_dataset_768', 'config': {'vae_cache_clear_each_epoch': False, 'probability': 1.0, 'repeats': 5, 'crop': True, 'crop_aspect': 'random', 'crop_aspect_buckets': [0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0, 1.125, 1.25, 1.375, 1.5, 1.625, 1.75, 1.875, 2], 'crop_style': 'random', 'disable_validation': False, 'resolution': 0.589824, 'resolution_type': 'area', 'caption_strategy': 'textfile', 'instance_data_dir': '/home/alexds9/Documents/stable_diffusion/Models_2024_10/simple_image_test/dataset', 'maximum_image_size': 1.048576, 'target_downsample_size': 0.589824, 'config_version': 2}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x732528accc90>, 'instance_data_dir': '/home/alexds9/Documents/stable_diffusion/Models_2024_10/simple_image_test/dataset', 'metadata_backend': <helpers.metadata.backends.discovery.DiscoveryMetadataBackend object at 0x732528a8a690>}
(Rank: 0)  | Bucket     | Image Count (per-GPU)
------------------------------
2024-10-08 04:10:13,272 [ERROR] No images were discovered by the bucket manager in the dataset: all_dataset_768., traceback: Traceback (most recent call last):
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/training/trainer.py", line 605, in init_data_backend
    configure_multi_databackend(
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/data_backend/factory.py", line 823, in configure_multi_databackend
    raise Exception(
Exception: No images were discovered by the bucket manager in the dataset: all_dataset_768.

No images were discovered by the bucket manager in the dataset: all_dataset_768.
Traceback (most recent call last):
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/train.py", line 30, in <module>
    trainer.init_data_backend()
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/training/trainer.py", line 631, in init_data_backend
    raise e
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/training/trainer.py", line 605, in init_data_backend
    configure_multi_databackend(
  File "/home/alexds9/Documents/stable_diffusion/SimpleTuner/helpers/data_backend/factory.py", line 823, in configure_multi_databackend
    raise Exception(
Exception: No images were discovered by the bucket manager in the dataset: all_dataset_768.

@a-l-e-x-d-s-9

The dataset includes a single image, and it uses two settings for resolution: 512px and 768px. For 768px, it seems to remove the file and assume there is nothing to train.

@a-l-e-x-d-s-9

I don't use "--delete_problematic_images" or "--delete_unwanted_images". I cleared the cache and all the JSON files the script had generated in the dataset and output folders, and tried again - it crashed again.
When I used "crop": false, the image was discovered and training worked, so it seems that the crop option is deleting the image and causing the issue.
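
This is consistent with the traceback above: the failing aspect-bucket check sits inside the crop path (prepare -> crop -> calculate_target_size -> _select_random_aspect -> _trim_aspect_bucket_list), so with cropping disabled that code is presumably never reached. For reference, a minimal sketch of the workaround, written as a Python dict mirroring the crop-related keys of the dataset entry in s01_multidatabackend.json (the remaining keys are assumed unchanged):

    # Hypothetical excerpt of the dataset entry, shown as a Python dict
    # (the JSON file uses lowercase true/false).
    dataset_entry_workaround = {
        "crop": False,           # with cropping disabled, the image is discovered again
        "crop_style": "random",  # presumably only consulted while cropping is enabled
        "crop_aspect": "random",
    }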

@bghira

bghira commented Oct 8, 2024

clearly the system is haunted by poltergeist

@a-l-e-x-d-s-9

The image size: 852 × 480
Crop settings that caused the image to be deleted:

        "crop": true,
        "crop_style": "random",
        "crop_aspect": "random",
        "crop_aspect_buckets": [0.125, 0.250, 0.375, 0.500, 0.625, 0.750, 0.875, 1.0, 1.125, 1.250, 1.375, 1.500, 1.625, 1.750, 1.875, 2],
        "resolution": 768,
        "resolution_type": "pixel_area",
        "minimum_image_size": 115,
        "maximum_image_size": 1024,
        "target_downsample_size": 768,

@a-l-e-x-d-s-9

I removed the 768px resolution from the dataset settings and tried with only 512px; it crashed with a similar error for 512px:

[ERROR] No images were discovered by the bucket manager in the dataset: all_dataset_512., traceback: Traceback

So the crop option removes the image even at the smaller resolution.

@a-l-e-x-d-s-9

a-l-e-x-d-s-9 commented Oct 8, 2024

@bghira
The problem was caused by having an integer in crop_aspect_buckets without a decimal point.
For example, you can reproduce the problem with: "crop_aspect_buckets": [1],
But if you change it to 1.0 - it will work: "crop_aspect_buckets": [1.0],
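
That matches the ValueError in the traceback above ("Aspect buckets must be a list of floats or dictionaries."): JSON's 1 is parsed as a Python int, which is not a float, while 1.0 is parsed as a float. Note that in the settings quoted above, the last bucket entry is a bare 2. A minimal sketch of that kind of strict type check, for illustration only (the function name is made up and this is not the actual SimpleTuner code):

    import json

    def validate_aspect_buckets(buckets):
        """Reject entries that are neither floats nor dicts, mirroring the error above."""
        for bucket in buckets:
            # isinstance(1, float) is False, so a bare integer like 1 or 2 fails
            # this check, while 1.0 passes.
            if not isinstance(bucket, (float, dict)):
                raise ValueError("Aspect buckets must be a list of floats or dictionaries.")
        return buckets

    validate_aspect_buckets(json.loads('[1.0, 1.25, 1.5]'))  # OK: all floats
    validate_aspect_buckets(json.loads('[1]'))               # raises ValueError: 1 is an int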

@bghira

bghira commented Oct 8, 2024

thanks for figuring that part out. i looked into the file deletions, and really every call to data_backend.delete(...) is wrapped by a check for delete_problematic_images etc., so those flags might be lurking somewhere?

@bghira bghira closed this as completed Oct 8, 2024
@a-l-e-x-d-s-9

Thank you. I meant to say that the image was deleted from the list of recognized/used images, not from the file system itself. So there is no problem in this regard.

@bghira

bghira commented Oct 8, 2024

oh, that is a relief
