
Textual Inversion code giving error #16

Open
anandvarrier opened this issue Jul 24, 2023 · 13 comments
Labels
bug Something isn't working

Comments

@anandvarrier

Description

Hi,
@lukemelas, great work. I have wanted something like this for a while. Your model's accuracy is better than earlier 2D-to-3D models.

I am running all my code on Google Colab (free version). I am following the README; however, I encountered the following error at the textual inversion step. I edited a few lines to make it run, but to no avail. @lukemelas, or anyone else, could you kindly help me set up the code?

I am uploading 2 screenshots for reference.

Thank you
Screenshot (32)
Screenshot (33)

Steps to Reproduce

.

Expected Behavior

I expected the given code to run as described in the README document.

Environment

Google Colab, Python 3.10

@anandvarrier anandvarrier added the bug Something isn't working label Jul 24, 2023
@lukemelas
Owner

Hello!

Perhaps I missed it in your post, but I don't see the error message. Can you provide the error message?

Luke

@anandvarrier
Author

Yes, I missed that.
Error message:
text_config_dict is provided which will be used to initialize CLIPTextConfig. The value text_config["id2label"] will be overriden.
text_config_dict is provided which will be used to initialize CLIPTextConfig. The value text_config["bos_token_id"] will be overriden.
text_config_dict is provided which will be used to initialize CLIPTextConfig. The value text_config["eos_token_id"] will be overriden.
2023-07-24 11:20:51.936670: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/content/realfusion/textual-inversion/textual_inversion.py", line 925, in <module>
main()
File "/content/realfusion/textual-inversion/textual_inversion.py", line 574, in main
accelerator = Accelerator(
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 369, in __init__
trackers = filter_trackers(log_with, self.logging_dir)
File "/usr/local/lib/python3.10/dist-packages/accelerate/tracking.py", line 725, in filter_trackers
raise ValueError(
ValueError: Logging with tensorboard requires a logging_dir to be passed in.

@anandvarrier
Author

anandvarrier commented Jul 25, 2023

Hi @lukemelas,
When I am running the code below:

from transformers.pipelines.base import Pipeline
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

! export DATA_DIR="/content/realfusion/examples/natural-images/banana_1"
! export OUTPUT_DIR="/content/realfusion/examples/Output_Folder"

!python /content/realfusion/textual-inversion/textual_inversion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir= DATA_DIR \
  --learnable_property="object" \
  --placeholder_token="banana" \
  --initializer_token="banana" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=3000 \
  --learning_rate=5.0e-04 --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir=OUTPUT_DIR \
  --use_augmentations

I am getting the following error:
text_config_dict is provided which will be used to initialize CLIPTextConfig. The value text_config["id2label"] will be overriden.
text_config_dict is provided which will be used to initialize CLIPTextConfig. The value text_config["bos_token_id"] will be overriden.
text_config_dict is provided which will be used to initialize CLIPTextConfig. The value text_config["eos_token_id"] will be overriden.
2023-07-25 10:34:47.636451: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
usage: textual_inversion.py [-h] [--save_steps SAVE_STEPS] [--only_save_embeds]
                            --pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH
                            [--revision REVISION] [--tokenizer_name TOKENIZER_NAME]
                            --train_data_dir TRAIN_DATA_DIR
                            --placeholder_token PLACEHOLDER_TOKEN
                            --initializer_token INITIALIZER_TOKEN
                            [--learnable_property LEARNABLE_PROPERTY] [--repeats REPEATS]
                            [--output_dir OUTPUT_DIR] [--seed SEED]
                            [--resolution RESOLUTION] [--center_crop]
                            [--train_batch_size TRAIN_BATCH_SIZE]
                            [--num_train_epochs NUM_TRAIN_EPOCHS]
                            [--max_train_steps MAX_TRAIN_STEPS]
                            [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
                            [--gradient_checkpointing] [--learning_rate LEARNING_RATE]
                            [--scale_lr] [--lr_scheduler LR_SCHEDULER]
                            [--lr_warmup_steps LR_WARMUP_STEPS]
                            [--dataloader_num_workers DATALOADER_NUM_WORKERS]
                            [--adam_beta1 ADAM_BETA1] [--adam_beta2 ADAM_BETA2]
                            [--adam_weight_decay ADAM_WEIGHT_DECAY]
                            [--adam_epsilon ADAM_EPSILON] [--push_to_hub]
                            [--hub_token HUB_TOKEN] [--hub_model_id HUB_MODEL_ID]
                            [--logging_dir LOGGING_DIR] [--mixed_precision {no,fp16,bf16}]
                            [--allow_tf32] [--report_to REPORT_TO]
                            [--validation_prompt VALIDATION_PROMPT]
                            [--num_validation_images NUM_VALIDATION_IMAGES]
                            [--validation_steps VALIDATION_STEPS]
                            [--validation_epochs VALIDATION_EPOCHS]
                            [--local_rank LOCAL_RANK]
                            [--checkpointing_steps CHECKPOINTING_STEPS]
                            [--checkpoints_total_limit CHECKPOINTS_TOTAL_LIMIT]
                            [--resume_from_checkpoint RESUME_FROM_CHECKPOINT]
                            [--enable_xformers_memory_efficient_attention]
                            [--use_augmentations]
textual_inversion.py: error: unrecognized arguments: DATA_DIR

Could you help?

@lukemelas
Owner

Are you using $'s with your environment variables? You have to pass them as $DATA_DIR if you export them using export DATA_DIR=...
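To illustrate the point above, here is a minimal sketch of the pattern (paths taken from this thread). One extra caveat on Colab: each "!" line runs in its own subshell, so an export in one cell is not visible to a later cell; the export and the command that uses it must share the same shell invocation.

```shell
# Set the variables (assumed paths from this thread):
export DATA_DIR="/content/realfusion/examples/natural-images/banana_1"
export OUTPUT_DIR="/content/realfusion/examples/Output_Folder"

# Reference them with a leading "$" so the shell substitutes the value;
# a bare DATA_DIR is passed through literally and argparse rejects it.
echo "--train_data_dir=$DATA_DIR --output_dir=$OUTPUT_DIR"
```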

@anandvarrier
Author

Thank you @lukemelas.
As you said, I corrected the above error. But now I am getting a new error.
2023-07-26 10:27:32.459138: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/content/realfusion/textual-inversion/textual_inversion.py", line 925, in <module>
main()
File "/content/realfusion/textual-inversion/textual_inversion.py", line 574, in main
accelerator = Accelerator(
TypeError: Accelerator.__init__() got an unexpected keyword argument 'logging_dir'

I appreciate your assistance @lukemelas.
Thank you

@lukemelas
Owner

lukemelas commented Jul 26, 2023

Hello, this is because accelerate had a breaking change in a recent update. You can either downgrade accelerate or change logging_dir to match the new API (see https://huggingface.co/docs/accelerate/v0.21.0/en/usage_guides/tracking#integrated-trackers).

Hope this helps!

@anandvarrier
Author

Hello @lukemelas,
I did not quite follow how to do the above steps.
How do I downgrade accelerate, or change logging_dir, to fix this?

I checked my accelerate version. It is:
Name: accelerate
Version: 0.21.0

  1. How do I downgrade this version?

  2. How do I change logging_dir to match the new API? I tried reading the document you referred to above, but could not understand much.

Thank you for your replies.

@anandvarrier
Author

anandvarrier commented Jul 27, 2023

Hello @lukemelas,

  1. As you mentioned, I downgraded accelerate to version 0.18.0. That gave a version error; the code asked me to use a version <= 0.20.3.

  2. I then upgraded to version 0.20.3, but it still gave the same error, i.e. TypeError: Accelerator.__init__() got an unexpected keyword argument 'logging_dir'.
    This problem persists whether I use version 0.21.0 or 0.20.3.

I tried resolving this error by doing the following:
A) I commented out line 572, i.e. logging_dir = os.path.join(args.output_dir, args.logging_dir)
B) I then added an argument on line 574, i.e. accelerator_project_config = ProjectConfiguration(total_limit=args.checkpoints_total_limit, logging_dir=args.logging_dir)
Earlier it was just accelerator_project_config = ProjectConfiguration(total_limit=args.checkpoints_total_limit)

After running this, the model went into training mode; however, there is now a new error:
07/27/2023 06:48:52 - INFO - main - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: no

{'timestep_spacing', 'prediction_type', 'thresholding', 'clip_sample_range', 'dynamic_thresholding_ratio', 'sample_max_value', 'variance_type'} was not found in config. Values will be initialized to default values.
{'force_upcast', 'scaling_factor'} was not found in config. Values will be initialized to default values.
{'cross_attention_norm', 'time_embedding_dim', 'use_linear_projection', 'addition_embed_type', 'conv_out_kernel', 'resnet_out_scale_factor', 'encoder_hid_dim', 'dual_cross_attention', 'transformer_layers_per_block', 'resnet_time_scale_shift', 'num_class_embeds', 'time_embedding_act_fn', 'encoder_hid_dim_type', 'time_embedding_type', 'class_embed_type', 'addition_time_embed_dim', 'conv_in_kernel', 'mid_block_only_cross_attention', 'projection_class_embeddings_input_dim', 'time_cond_proj_dim', 'only_cross_attention', 'addition_embed_type_num_heads', 'class_embeddings_concat', 'upcast_attention', 'mid_block_type', 'num_attention_heads', 'timestep_post_act', 'resnet_skip_time_act'} was not found in config. Values will be initialized to default values.
07/27/2023 06:49:11 - INFO - main - ***** Running training *****
07/27/2023 06:49:11 - INFO - main - Num examples = 500
07/27/2023 06:49:11 - INFO - main - Num Epochs = 24
07/27/2023 06:49:11 - INFO - main - Instantaneous batch size per device = 1
07/27/2023 06:49:11 - INFO - main - Total train batch size (w. parallel, distributed & accumulation) = 4
07/27/2023 06:49:11 - INFO - main - Gradient Accumulation steps = 4
07/27/2023 06:49:11 - INFO - main - Total optimization steps = 3000
Steps: 0% 1/3000 [00:11<9:55:00, 11.90s/it, loss=0.00327, lr=0.002]Traceback (most recent call last):
File "/content/realfusion/textual-inversion/textual_inversion.py", line 925, in <module>
main()
File "/content/realfusion/textual-inversion/textual_inversion.py", line 823, in main
for step, batch in enumerate(train_dataloader):
File "/usr/local/lib/python3.10/dist-packages/accelerate/data_loader.py", line 394, in __iter__
next_batch = next(dataloader_iter)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 633, in __next__
data = self._next_data()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 677, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/content/realfusion/textual-inversion/textual_inversion.py", line 514, in __getitem__
image = Image.open(self.image_paths[i % self.num_images])
File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 3227, in open
fp = builtins.open(filename, "rb")
IsADirectoryError: [Errno 21] Is a directory: '/content/realfusion/examples/natural-images/sofa/.ipynb_checkpoints'
Steps: 0% 1/3000 [00:12<10:08:58, 12.18s/it, loss=0.00327, lr=0.002]
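As an editorial aside, the IsADirectoryError in the traceback above occurs because the dataset picks up the .ipynb_checkpoints folder that Colab drops inside the image directory. One way to sidestep it, sketched below with stdlib only, is to collect only regular files with image extensions (list_image_paths and IMAGE_EXTS are illustrative names, not code from the repo):

```python
from pathlib import Path

# Illustrative helper (not from the repo): collect only image *files*,
# skipping subdirectories such as Colab's ".ipynb_checkpoints".
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".webp"}

def list_image_paths(data_dir):
    """Return sorted image file paths, ignoring directories and non-image files."""
    return sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
```

Alternatively, simply deleting the stray folder (e.g. with rm -r on the .ipynb_checkpoints directory inside the data folder) before training should have the same effect.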

How do I get past this error, and was the change around line 572 the correct way to go ahead?

  1. I then read the link you provided. As far as I understand, tensorboard is used in the log_with parameter, and accelerator.init_trackers, accelerator.log, and accelerator.end_training() are all called correctly.

  2. However, the learned_embeds.bin file is not being created, and I do not understand why.

  3. Also, regarding line 570 of textual_inversion.py: logging_dir = os.path.join(args.output_dir, args.logging_dir)
    Nowhere in the documented code do we pass a logging_dir path the way we pass output_dir and data_dir, so how will line 570 concatenate logging_dir if we never supply it? Is my understanding right? Kindly correct me if I am wrong.

  4. Also, I did not understand the part where you mentioned changing logging_dir to match the new API.

  5. How do I get past this step?

Kindly assist.

Thank you

@anandvarrier
Author

Hi @lukemelas,

I have been getting this error for the last 3 days:

  1. If I just run the code from your GitHub repo as it is, Google Colab gives the error below:
    2023-07-31 05:50:26.676773: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
    Traceback (most recent call last):
    File "/content/realfusion/textual-inversion/textual_inversion.py", line 926, in <module>
    main()
    File "/content/realfusion/textual-inversion/textual_inversion.py", line 574, in main
    accelerator = Accelerator(
    TypeError: Accelerator.__init__() got an unexpected keyword argument 'logging_dir'

  2. When I change the code in the way I have done below:
    def main():
        args = parse_args()
        # logging_dir = os.path.join(args.output_dir, args.logging_dir)

        accelerator_project_config = ProjectConfiguration(total_limit=args.checkpoints_total_limit, logging_dir=os.path.join(args.output_dir, args.logging_dir))

        accelerator = Accelerator(
            gradient_accumulation_steps=args.gradient_accumulation_steps,
            mixed_precision=args.mixed_precision,
            log_with=args.report_to,
            # logging_dir=logging_dir,
            project_config=accelerator_project_config,
        )

It gives another error:
2023-07-31 06:16:03.209058: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
07/31/2023 06:16:06 - INFO - main - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: no

Traceback (most recent call last):
File "/content/realfusion/textual-inversion/textual_inversion.py", line 926, in <module>
main()
File "/content/realfusion/textual-inversion/textual_inversion.py", line 621, in main
os.makedirs(args.output_dir, exist_ok=True)
File "/usr/lib/python3.10/os.py", line 225, in makedirs
mkdir(name, mode)
FileNotFoundError: [Errno 2] No such file or directory: ''
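For what it is worth, the empty string in that FileNotFoundError suggests args.output_dir arrived as '' (for example, when $OUTPUT_DIR was exported in a different subshell and expanded to nothing on the command line). A small guard of the kind sketched below would surface that earlier with a clearer message (ensure_output_dir is an illustrative name, not part of the repo):

```python
import os

# Illustrative guard (not from the repo): fail with a clear message when
# the output directory came through as an empty string, e.g. because an
# unset $OUTPUT_DIR expanded to nothing on the command line.
def ensure_output_dir(path: str) -> str:
    if not path:
        raise ValueError("output_dir is empty; was $OUTPUT_DIR actually set?")
    os.makedirs(path, exist_ok=True)
    return path
```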

  1. Since I was not able to solve these 2 issues, I went ahead and ran the 'Side note: Textual Inversion Initialization' code snippet. It generated tokens. Then I ran python main.py --0. It took some time to execute, and towards the end it gave the following message:
    Traceback (most recent call last):
    File "/content/realfusion/main.py", line 163, in <module>
    main()
    File "/content/realfusion/main.py", line 102, in main
    add_tokens_to_model_from_path(
    File "/content/realfusion/sd/utils.py", line 40, in add_tokens_to_model_from_path
    add_tokens_to_model(learned_embeds, text_encoder, tokenizer, override_token)
    File "/content/realfusion/sd/utils.py", line 15, in add_tokens_to_model
    embedding = embedding.to(text_encoder.get_input_embeddings().weight.dtype)
    AttributeError: 'tuple' object has no attribute 'get_input_embeddings'

I am not able to work out how to solve these 2 issues. I tried to understand your code, but without success.

@lukemelas I would really appreciate your assistance with these questions, or help from anyone who has implemented the code from this repo.

Thank you

Warm regards,
Anand Varrier

@lilyuam

lilyuam commented Aug 10, 2023

Hi @anandvarrier and @lukemelas . I am getting the same error as you with the accelerate package and the logging directory error. Did you manage to get it to work?

@lilyuam

lilyuam commented Aug 11, 2023

@anandvarrier For your point 3, I think you need to upgrade diffusers to at least 0.15.0. If you run pip install diffusers==0.15.0, it should work properly and resolve the error.

@anandvarrier
Author

Hi @lilyuam,
No, I was not able to get past that error. I will try your solution. Were you able to go ahead and run the code?
Thank you for the response.

@ScarletGospel

@anandvarrier @lilyuam @lukemelas
I found the solution for this: delete logging_dir=logging_dir on line 578 of textual_inversion.py, and add it as a parameter on line 572. The code should look like this:
def main():
    args = parse_args()
    logging_dir = os.path.join(args.output_dir, args.logging_dir)

    accelerator_project_config = ProjectConfiguration(total_limit=args.checkpoints_total_limit, logging_dir=logging_dir)

    accelerator = Accelerator(
        gradient_accumulation_steps=args.gradient_accumulation_steps,
        mixed_precision=args.mixed_precision,
        log_with=args.report_to,
        project_config=accelerator_project_config,
    )
