Strange issues using local model #793

bidilun · 2024-08-17T10:57:49Z

bidilun
Aug 17, 2024

I'm on a machine where i do not have often access to internet and there are strange behaviors when I try to run in local. Hardware is 3080

At some point I had this error:

2024-08-17 10:38:36,223 [INFO] (__main__) Load tokenizers
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'CLIPTokenizer'. 
The class this function is called from is 'T5Tokenizer'.
2024-08-17 10:38:36,535 [WARNING] (helpers.training.text_encoding) Could not load secondary tokenizer (T5 XXL). Cannot continue: not a string
not a string

But now it is not doing it anymore but I have not understood what I changed.

Now the Pre-computing null embedding is extremely slow but i pases it
more importantly Initialize text embed pre-computation > 1000s/it

so it would take more than 160 days to complete!!!!!!

Here is the log:


/home/bidilun/github/SimpleTuner/.venv/lib/python3.11/site-packages/nvidia/nvjitlink/lib
DEBUG_EXTRA_ARGS not set, defaulting to empty.
2024-08-17 11:13:16,164 [WARNING] (ArgsParser) The VAE model madebyollin/sdxl-vae-fp16-fix is not compatible. Please use a compatible VAE to eliminate this warning. The baked-in VAE will be used, instead.
2024-08-17 11:13:16,164 [INFO] (ArgsParser) VAE Model: models/FLUX.1-dev
2024-08-17 11:13:16,164 [INFO] (ArgsParser) Default VAE Cache location: 
2024-08-17 11:13:16,164 [INFO] (ArgsParser) Text Cache location: cache
2024-08-17 11:13:16,164 [WARNING] (ArgsParser) Updating T5 XXL tokeniser max length to 512 for Flux.
2024-08-17 11:13:16,164 [WARNING] (ArgsParser) Gradient accumulation steps are enabled, but gradient precision is set to 'unmodified'. This may lead to numeric instability. Consider disabling gradient accumulation steps. Continuing in 10 seconds..
2024-08-17 11:13:26,184 [ERROR] (__main__) Failed to log into Hugging Face Hub: Token is required (`token=True`), but no token found. You need to provide a token or be logged in to Hugging Face with `huggingface-cli login` or `huggingface_hub.login`. See https://huggingface.co/settings/tokens.
2024-08-17 11:13:26,185 [INFO] (__main__) Load tokenizers
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
2024-08-17 11:13:26,412 [INFO] (helpers.training.text_encoding) Loading OpenAI CLIP-L text encoder from models/FLUX.1-dev/text_encoder..
2024-08-17 11:13:26,434 [INFO] (helpers.training.text_encoding) Loading T5 XXL v1.1 text encoder from models/FLUX.1-dev/text_encoder_2..
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  9.08it/s]
2024-08-17 11:13:28,910 [INFO] (__main__) Load VAE: models/FLUX.1-dev
2024-08-17 11:13:28,976 [INFO] (__main__) Moving text encoder to GPU.
2024-08-17 11:13:28,977 [INFO] (__main__) Moving text encoder 2 to GPU.
2024-08-17 11:13:28,980 [INFO] (__main__) Loading VAE onto accelerator, converting from torch.float32 to torch.bfloat16
2024-08-17 11:13:29,002 [INFO] (DataBackendFactory) Loading data backend config from config/multidatabackend.json
2024-08-17 11:13:29,002 [INFO] (DataBackendFactory) Configuring text embed backend: text-embeds
Loading pipeline components...:   0%|                                                                                                                                                                  | 0/5 [00:00<?, ?it/s]Loaded scheduler as FlowMatchEulerDiscreteScheduler from `scheduler` subfolder of models/FLUX.1-dev.
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 1317.97it/s]
2024-08-17 11:13:29,009 [INFO] (TextEmbeddingCache) (Rank: 0) (id=text-embeds) Listing all text embed cache entries
2024-08-17 11:13:29,009 [INFO] (DataBackendFactory) Pre-computing null embedding
2024-08-17 11:32:47,020 [INFO] (DataBackendFactory) Completed loading text embed services.                                   
2024-08-17 11:32:47,021 [INFO] (DataBackendFactory) Configuring data backend: pseudo-camera-10k-flux
2024-08-17 11:32:47,021 [INFO] (DataBackendFactory) (id=pseudo-camera-10k-flux) Loading bucket manager.                      
2024-08-17 11:32:47,022 [INFO] (JsonMetadataBackend) Checking for cache file: datasets/pseudo-camera-10k/aspect_ratio_bucket_indices.json
2024-08-17 11:32:47,022 [WARNING] (JsonMetadataBackend) No cache file found, creating new one.
2024-08-17 11:32:47,022 [INFO] (DataBackendFactory) (id=pseudo-camera-10k-flux) Refreshing aspect buckets on main process.
2024-08-17 11:32:47,022 [INFO] (BaseMetadataBackend) Discovering new files...
2024-08-17 11:33:03,844 [INFO] (BaseMetadataBackend) Compressed 0 existing files from 0.
2024-08-17 11:36:11,472 [INFO] (BaseMetadataBackend) Image processing statistics: {'total_processed': 12926, 'skipped': {'already_exists': 0, 'metadata_missing': 0, 'not_found': 0, 'too_small': 0, 'other': 0}}
2024-08-17 11:36:11,531 [INFO] (BaseMetadataBackend) Enforcing minimum image size of 512. This could take a while for very-large datasets.
2024-08-17 11:36:11,553 [INFO] (BaseMetadataBackend) Completed aspect bucket update.                                                                                                                                         
2024-08-17 11:36:11,568 [INFO] (DataBackendFactory) Configured backend: {'id': 'pseudo-camera-10k-flux', 'config': {'crop': True, 'crop_aspect': 'square', 'crop_aspect_buckets': None, 'crop_style': 'center', 'disable_validation': False, 'resolution': 512, 'resolution_type': 'pixel', 'caption_strategy': 'filename', 'instance_data_dir': 'datasets/pseudo-camera-10k', 'maximum_image_size': 512, 'target_downsample_size': 512, 'config_version': 2}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x7ccf6beccc50>, 'instance_data_dir': 'datasets/pseudo-camera-10k', 'metadata_backend': <helpers.metadata.backends.json.JsonMetadataBackend object at 0x7ccf6bece1d0>}
(Rank: 0)  | Bucket     | Image Count (per-GPU)
------------------------------
(Rank: 0)  | 1.0        | 13926       
2024-08-17 11:36:11,569 [INFO] (DataBackendFactory) (id=pseudo-camera-10k-flux) Collecting captions.
2024-08-17 11:36:11,611 [INFO] (DataBackendFactory) (id=pseudo-camera-10k-flux) Initialise text embed pre-computation using the filename caption strategy. We have 14102 captions to process.
Write embeds to disk:   0%|                                                         | 3/14102 [57:29<4499:44:06, 1148.95s/it]

Processing prompts:   0%|                                                           | 3/14102 [57:29<4499:29:35, 1148.89s/it]

the conf.env is as follows:

RESUME_CHECKPOINT='latest'
DATALOADER_CONFIG='config/multidatabackend.json'
ASPECT_BUCKET_ROUNDING='2'
TRAINING_SEED='42'
USE_EMA='false'
USE_XFORMERS='false'
MINIMUM_RESOLUTION='0'
OUTPUT_DIR='output/models'
USE_DORA='false'
USE_BITFIT='false'
PUSH_TO_HUB='false'
PUSH_CHECKPOINTS='false'
MAX_NUM_STEPS='1000'
NUM_EPOCHS='0'
CHECKPOINTING_STEPS='50'
CHECKPOINTING_LIMIT='5'
DEBUG_EXTRA_ARGS=''
MODEL_TYPE='lora'
MODEL_NAME='models/FLUX.1-dev'
FLUX='true'
KOLORS='false'
STABLE_DIFFUSION_3='false'
STABLE_DIFFUSION_LEGACY='false'
FLUX_LORA_TARGET='all'
TRAIN_BATCH_SIZE='1'
USE_GRADIENT_CHECKPOINTING='true'
GRADIENT_ACCUMULATION_STEPS='2'
CAPTION_DROPOUT_PROBABILITY='0.1'
RESOLUTION_TYPE='area'
RESOLUTION='1.0'
VALIDATION_SEED='42'
VALIDATION_STEPS='50'
VALIDATION_RESOLUTION='1024x1024'
VALIDATION_GUIDANCE='7.5'
VALIDATION_GUIDANCE_RESCALE='0.0'
VALIDATION_NUM_INFERENCE_STEPS='20'
VALIDATION_PROMPT='A photo-realistic image of a cat'
ALLOW_TF32='false'
MIXED_PRECISION='bf16'
OPTIMIZER='adamw_bf16'
LEARNING_RATE='8e-5'
LR_SCHEDULE='polynomial'
LR_WARMUP_STEPS='100'
ACCELERATE_EXTRA_ARGS=''
TRAINING_NUM_PROCESSES='1'
TRAINING_NUM_MACHINES='1'
VALIDATION_TORCH_COMPILE='false'
TRAINER_DYNAMO_BACKEND='no'
TRAINER_EXTRA_ARGS='--lora_rank=64 --lr_end=1e-8 --compress_disk_cache'

and the multidatabackend is here

   {
      "id": "pseudo-camera-10k-flux",
      "type": "local",
      "crop": true,
      "crop_aspect": "square",
      "crop_style": "center",
      "resolution": 512,
      "minimum_image_size": 512,
      "maximum_image_size": 512,
      "target_downsample_size": 512,
      "resolution_type": "pixel",
      "cache_dir_vae": "cache/vae/flux/pseudo-camera-10k",
      "instance_data_dir": "datasets/pseudo-camera-10k",
      "disabled": false,
      "skip_file_discovery": "",
      "caption_strategy": "filename",
      "metadata_backend": "json"
    },
    {
      "id": "text-embeds",
      "type": "local",
      "dataset_type": "text_embeds",
      "default": true,
      "cache_dir": "cache/text/flux/pseudo-camera-10k",
      "disabled": false,
      "write_batch_size": 128
    }
  ]

mhirki · 2024-08-17T11:16:09Z

mhirki
Aug 17, 2024

It's probably running on CPU instead of GPU.

Also, RTX 3080 doesn't have enough VRAM to train a Flux LoRA without quantization. Most RTX 3080 cards are either 10 GB or 12 GB. You would need int4 quantization to squeeze the model down to that size. Or maybe wait and see if NF4 quantization support eventually lands here.

0 replies

bidilun · 2024-08-17T11:23:03Z

bidilun
Aug 17, 2024
Author

sorry I put the wrong number it is a 3090 with 24GB so normally OK for flux and I have a second older Nvidia card with 8Gb on the machine but that should not affects things

0 replies

mhirki · 2024-08-17T11:46:36Z

mhirki
Aug 17, 2024

So this is a multi-GPU machine? That could also be causing these issues if it's trying to use both GPUs and the slower GPU is holding back everything. You can run accelerate config to tell it to use only one GPU.

And yes, RTX 3090 is better but you still need either fp8 or int8 quantization.

0 replies

mhirki · 2024-08-17T11:54:01Z

mhirki
Aug 17, 2024

And since you don't always have internet access, you should probably run wandb offline. Or alternatively, just switch to tensorboard which works locally.

0 replies

bidilun · 2024-08-17T15:55:09Z

bidilun
Aug 17, 2024
Author

not using wandb or tensorboard but maybe I will try tensorboard if I manage to pass this embed pre-computation there is only using one core 100% and not doing much on the GPU

0 replies

mhirki · 2024-08-17T16:33:34Z

mhirki
Aug 17, 2024

One CPU core at 100% is normal. GPU should be busy when pre-computing the text embeds. There's probably something wrong with your system specifically.

SimpleTuner is using CUDA 12.4 so the minimum Linux driver version is 550.54.14.
https://docs.nvidia.com/cuda/archive/12.4.0/cuda-toolkit-release-notes/

0 replies

bidilun · 2024-08-17T20:36:45Z

bidilun
Aug 17, 2024
Author

ok I'm on 555.58.02 and CUDA Version: 12.5

Made different tests with --base_model_precision=int8-quanto
but no effects...
For some reason it is not using the GPU but as it dose not say anything it is hard to guess what is the problem...

what i do not understand is that it seems to be happily moving the text encoders on the GPU:
2024-08-17 22:16:43,899 [INFO] (__main__) Load VAE: black-forest-labs/FLUX.1-dev 2024-08-17 22:16:44,198 [INFO] (__main__) Moving text encoder to GPU. 2024-08-17 22:16:44,199 [INFO] (__main__) Moving text encoder 2 to GPU. 2024-08-17 22:16:44,202 [INFO] (__main__) Loading VAE onto accelerator, converting from torch.float32 to torch.bfloat16

but then the GPU is not used
and looking at the vram during the loading there is no change! it is not loading anything

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strange issues using local model #793

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 7 comments

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

Select a reply

Strange issues using local model #793

bidilun Aug 17, 2024

Replies: 7 comments

mhirki Aug 17, 2024

bidilun Aug 17, 2024 Author

mhirki Aug 17, 2024

mhirki Aug 17, 2024

bidilun Aug 17, 2024 Author

mhirki Aug 17, 2024

bidilun Aug 17, 2024 Author

bidilun
Aug 17, 2024

mhirki
Aug 17, 2024

bidilun
Aug 17, 2024
Author

mhirki
Aug 17, 2024

mhirki
Aug 17, 2024

bidilun
Aug 17, 2024
Author

mhirki
Aug 17, 2024

bidilun
Aug 17, 2024
Author