Making the frame size bigger #4

Open
Don-Chad opened this issue Jul 29, 2023 · 10 comments

Comments

@Don-Chad

It works!

Thanks for sharing this.

Any idea how we could change the video length to something like 32 or 48? Longer motion would be great. At the moment it seems to be capped at 24.

It would be fine to start over instead of using the existing motion data set.

The error I am getting now is:

File "g:\content\animatediff\animatediff\models\motion_module.py", line 244, in forward
x = x + self.pe[:, :x.size(1)]
RuntimeError: The size of tensor a (32) must match the size of tensor b (24) at non-singleton dimension 1
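
The message says the positional-encoding buffer holds 24 positions while the input has 32 frames. Below is a minimal sketch of that behaviour, assuming a sinusoidal table of the kind used in NLP transformers. Plain Python lists stand in for tensors so it runs without PyTorch, and `build_pe` and `add_pe` are hypothetical names, not functions from the repository.

```python
import math


def build_pe(max_len, d_model):
    """Sinusoidal positional-encoding table with max_len rows of d_model values."""
    table = []
    for pos in range(max_len):
        row = []
        for i in range(d_model):
            # Each sin/cos channel pair shares one frequency.
            freq = math.exp(-math.log(10000.0) * (2 * (i // 2)) / d_model)
            row.append(math.sin(pos * freq) if i % 2 == 0 else math.cos(pos * freq))
        table.append(row)
    return table


def add_pe(frames, pe):
    """Add positional encodings to frames, mirroring x + pe[:, :x.size(1)]."""
    sliced = pe[:len(frames)]  # slicing past the end silently returns fewer rows
    if len(sliced) != len(frames):
        raise ValueError(
            f"frames ({len(frames)}) must match positional encodings ({len(sliced)})"
        )
    return [[x + p for x, p in zip(f, s)] for f, s in zip(frames, sliced)]


pe = build_pe(max_len=24, d_model=8)
ok = add_pe([[0.0] * 8 for _ in range(24)], pe)  # 24 frames fit the table
try:
    add_pe([[0.0] * 8 for _ in range(32)], pe)   # 32 frames exceed it
except ValueError as err:
    message = str(err)
```

The slice never fails by itself; it just returns 24 rows, and the mismatch only surfaces at the addition. That is why the traceback points at the `x + self.pe[...]` line rather than at the place where the table was built.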

@tumurzakov
Owner

tumurzakov commented Jul 29, 2023

I cherry-picked an awesome idea from https://github.com/dajes/AnimateDiff. It is in the devel branch. I am still working on it.

PR: guoyww/AnimateDiff#25

Changing the pe size requires retraining the model, which is too expensive for me.

@Don-Chad
Author

Don-Chad commented Jul 31, 2023

Yes, this combination would be a perfect approach! I would be happy to do new training runs and provide the GPU power for them. We could also start with smaller models.

Would you be able to make a model which does 52 motion frames? It would be very dope to have longer videos! @tumurzakov

@tumurzakov
Owner

tumurzakov commented Aug 2, 2023

@Don-Chad I increased it to 48 (24*2) by doubling the pe tensors from the original module and trained for 1000 steps. It works well. It is better than training from scratch.

The main problem is not GPU power but the dataset.
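
A rough sketch of what "doubling the pe tensors" could look like, assuming it means repeating the existing positional encodings along the frame axis. Whether the devel branch does exactly this is an assumption; `expand_pe` and the shortened key are hypothetical, and nested lists stand in for tensors of shape [1, frames, channels].

```python
def expand_pe(state_dict, multiplier):
    """Return a copy of state_dict with positional encodings tiled along the frame axis."""
    expanded = {}
    for key, value in state_dict.items():
        if key.endswith("pos_encoder.pe"):
            # value is [1][frames][channels]; repeat the list of frame rows.
            expanded[key] = [frames * multiplier for frames in value]
        else:
            expanded[key] = value
    return expanded


key = "motion_modules.0.pos_encoder.pe"  # hypothetical, shortened key
checkpoint = {key: [[[float(pos)] * 4 for pos in range(24)]]}  # shape [1, 24, 4]
doubled = expand_pe(checkpoint, multiplier=2)
new_frames = len(doubled[key][0])
```

With real PyTorch tensors the same tiling would be `tensor.repeat(1, multiplier, 1)`. Note that tiling gives frame 24 the same encoding as frame 0, which is presumably why some fine-tuning is still needed afterwards.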

@Don-Chad
Author

Don-Chad commented Aug 3, 2023

Wow! Would you be willing to share the doubled pipeline_animation? (Sorry, I cannot find how to do this.)

I would love to work on the dataset. I have a lot of good varied content with labels. Happy to share a new motion module.

@tumurzakov
Owner

@Don-Chad It is very simple. The code is in the devel branch.

@tumurzakov
Owner

tumurzakov commented Aug 3, 2023

Trained 96 frames on an A100 for 1000 steps (20 minutes). It took 21 GB of VRAM. It seems an A100 can train up to 184 frames. Inference on the A100 took 20 GB of VRAM.
[image: 96frames-1000]

But at that frame count there could be problems with the pe. In AnimateDiff the pe is taken from NLP transformers. Possibly we could try ViT positional encodings there to encode longer videos.

Just for fun, 48 frames on the 96-frame model:
[image: 48on96model]

@Don-Chad
Author

Don-Chad commented Aug 4, 2023

Thanks kindly for sharing! Just one line makes a difference :-)

Good to see it works. Let me give it a try.

@Don-Chad
Author

Don-Chad commented Aug 7, 2023

@tumurzakov What difference do you think ViT positional encodings would make here?

@ezra-ch

ezra-ch commented Sep 1, 2023

I can't seem to use the motion_module_pe_multiplier feature. Here are my config and the error:

motion_module: models\Motion_Module\mmv1.5.pth
output_dir: models\Motion_Module\fff2
train_data:
  video_path: data/fff2.mp4
  prompt: girl
  n_sample_frames: 48
  width: 512
  height: 512
  sample_start_idx: 0
  sample_frame_rate: 1 #rate of sampler (how many frames it skips like sample_frame_rate 4 would make the loop +4 frames in front)
validation_data:
  prompts:
  - girl 
  video_length: 48
  temporal_context: 200
  width: 512
  height: 512
  num_inference_steps: 20
  guidance_scale: 5
  use_inv_latent: true
  num_inv_steps: 40
learning_rate: 3.0e-05
train_batch_size: 1
max_train_steps: 1000
checkpointing_steps: 100
validation_steps: 100
train_whole_module: false
trainable_modules:
- to_q
seed: 34
mixed_precision: fp16
use_8bit_adam: false
gradient_checkpointing: true
enable_xformers_memory_efficient_attention: true
motion_module_pe_multiplier: 2

  File "G:\tuneavid\AnimateDiff\train.py", line 417, in <module>
    main(**OmegaConf.load(args.config))
  File "G:\tuneavid\AnimateDiff\train.py", line 133, in main
    missing, unexpected = unet.load_state_dict(motion_module_state_dict, strict=False)
  File "G:\anaconda3\envs\tuneavid\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet3DConditionModel:
        size mismatch for down_blocks.0.motion_modules.0.temporal_transformer.transformer_blocks.0.attention_blocks.0.pos_encoder.pe: copying a param with shape torch.Size([1, 48, 320]) from checkpoint, the shape in current model is torch.Size([1, 24, 320]).
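
The traceback shows the two sides disagreeing: the checkpoint's pe has 48 positions (after the multiplier) while the model was still built for 24. A small sketch for spotting such mismatches before loading; `find_shape_mismatches` is a hypothetical helper, and the shapes are typed in by hand here rather than read from real tensors.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return (key, checkpoint_shape, model_shape) for every shared key whose shapes differ."""
    return [
        (key, shape, model_shapes[key])
        for key, shape in checkpoint_shapes.items()
        if key in model_shapes and model_shapes[key] != shape
    ]


pe_key = (
    "down_blocks.0.motion_modules.0.temporal_transformer."
    "transformer_blocks.0.attention_blocks.0.pos_encoder.pe"
)
checkpoint_shapes = {pe_key: (1, 48, 320)}  # after motion_module_pe_multiplier: 2
model_shapes = {pe_key: (1, 24, 320)}       # model still built for 24 frames
mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
```

With real PyTorch objects, the shape dictionaries could be built from `tuple(t.shape)` for each entry of the checkpoint and of `unet.state_dict()`.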

@tumurzakov
Owner

Here is my config for 264 frames:

pretrained_model_path: /content/animatediff/models/StableDiffusion/
motion_module: /content/animatediff/models/Motion_Module/mm_sd_v15.ckpt
motion_module_pe_multiplier: 11
inference_config_path: /content/drive/MyDrive/AI/video/videos/couplet2/train-full-256/valid.yaml
start_global_step: 0
output_dir: /content/drive/MyDrive/AI/video/videos/couplet2/train-full-256
dataset_class: FramesDataset
train_data:
  samples_dir: /content/drive/MyDrive/AI/video/videos/couplet2/dataset256
  prompt_map_path: /content/drive/MyDrive/AI/video/videos/couplet2/prompt_map.json
  video_length: 264
  width: 480
  height: 272
validation_data:
  prompts:
  - standing face girl
  video_length: 264
  width: 480
  height: 272
  temporal_context: 264
  num_inference_steps: 10
  guidance_scale: 12.5
  use_inv_latent: true
  num_inv_steps: 50
learning_rate: 3.0e-05
train_batch_size: 1
max_train_steps: 2000
checkpointing_steps: 100
validation_steps: 10000
train_whole_module: true
trainable_modules:
- to_q
seed: 33
mixed_precision: fp16
use_8bit_adam: false
gradient_checkpointing: true
enable_xformers_memory_efficient_attention: true

Take a look at the train_data section:

train_data:
  samples_dir: /content/drive/MyDrive/AI/video/videos/couplet2/dataset256
  prompt_map_path: /content/drive/MyDrive/AI/video/videos/couplet2/prompt_map.json
  video_length: 264 <---- missing in your config
  width: 480
  height: 272
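
A small sanity check along these lines can catch the problem before training starts. This is only a sketch under stated assumptions: the base length of 24 comes from the error messages in this thread, the rule that `train_data.video_length` should equal 24 times the multiplier is inferred from the 264-frame config (11 * 24), and `check_frame_config` is a hypothetical helper that takes the config as a plain dict.

```python
BASE_FRAMES = 24  # pe length of the original motion module, per the errors in this thread


def check_frame_config(config):
    """Return a list of problems with the frame-length settings in a training config dict."""
    problems = []
    multiplier = config.get("motion_module_pe_multiplier", 1)
    capacity = BASE_FRAMES * multiplier

    train_length = config.get("train_data", {}).get("video_length")
    if train_length is None:
        problems.append("train_data.video_length is missing")
    elif train_length != capacity:
        problems.append(
            f"train_data.video_length is {train_length}, expected {capacity}"
        )

    val_length = config.get("validation_data", {}).get("video_length")
    if val_length is not None and val_length > capacity:
        problems.append(
            f"validation_data.video_length is {val_length}, above {capacity}"
        )
    return problems


broken = {
    "motion_module_pe_multiplier": 2,
    "train_data": {"n_sample_frames": 48},
    "validation_data": {"video_length": 48},
}
working = {
    "motion_module_pe_multiplier": 11,
    "train_data": {"video_length": 264},
    "validation_data": {"video_length": 264},
}
broken_problems = check_frame_config(broken)
working_problems = check_frame_config(working)
```

The first dict mimics the config that failed above (it sets `n_sample_frames` but not `video_length`), and the second mimics the 264-frame config.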
