TheDenk/cogvideox-controlnet

CogVideoX ControlNet Extension

This repo contains the code for a simple ControlNet module for the CogVideoX model.

ComfyUI

ComfyUI-CogVideoXWrapper supports the ControlNet pipeline. See the example file.

Models

Supported models for 5B:

Supported models for 2B:

How to

Clone repo

git clone https://github.com/TheDenk/cogvideox-controlnet.git
cd cogvideox-controlnet

Create venv

python -m venv venv
source venv/bin/activate

Install requirements

pip install -r requirements.txt

Simple examples

Inference with cli

python -m inference.cli_demo \
    --video_path "resources/car.mp4" \
    --prompt "The camera follows behind red car. Car is surrounded by a panoramic view of the vast, azure ocean. Seagulls soar overhead, and in the distance, a lighthouse stands sentinel, its beam cutting through the twilight. The scene captures a perfect blend of adventure and serenity, with the car symbolizing freedom on the open sea." \
    --controlnet_type "canny" \
    --base_model_path THUDM/CogVideoX-5b \
    --controlnet_model_path TheDenk/cogvideox-5b-controlnet-canny-v1

Inference with Gradio

python -m inference.gradio_web_demo \
    --controlnet_type "canny" \
    --base_model_path THUDM/CogVideoX-5b \
    --controlnet_model_path TheDenk/cogvideox-5b-controlnet-canny-v1

Detailed inference

CUDA_VISIBLE_DEVICES=0 python -m inference.cli_demo \
    --video_path "resources/car.mp4" \
    --prompt "The camera follows behind red car. Car is surrounded by a panoramic view of the vast, azure ocean. Seagulls soar overhead, and in the distance, a lighthouse stands sentinel, its beam cutting through the twilight. The scene captures a perfect blend of adventure and serenity, with the car symbolizing freedom on the open sea." \
    --controlnet_type "canny" \
    --base_model_path THUDM/CogVideoX-5b \
    --controlnet_model_path TheDenk/cogvideox-5b-controlnet-canny-v1 \
    --num_inference_steps 50 \
    --guidance_scale 6.0 \
    --controlnet_weights 1.0 \
    --controlnet_guidance_start 0.0 \
    --controlnet_guidance_end 0.5 \
    --output_path "./output.mp4" \
    --seed 42
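The --controlnet_guidance_start and --controlnet_guidance_end flags read as fractions of the denoising schedule. The sketch below shows how such a window maps to step indices, assuming this repo follows the same convention as control_guidance_start / control_guidance_end in the diffusers ControlNet pipelines (a step is active only if it lies fully inside the window). The helper name controlnet_active_steps is hypothetical and is not part of this repo.

```python
def controlnet_active_steps(num_inference_steps, guidance_start, guidance_end):
    """Return the denoising step indices on which ControlNet would be applied.

    Assumption: a step i covers the schedule fraction [i/N, (i+1)/N] and is
    active only when that whole interval lies inside [guidance_start, guidance_end].
    """
    active = []
    for i in range(num_inference_steps):
        begins_in_window = i / num_inference_steps >= guidance_start
        ends_in_window = (i + 1) / num_inference_steps <= guidance_end
        if begins_in_window and ends_in_window:
            active.append(i)
    return active


# Values from the detailed inference command: 50 steps, window 0.0 to 0.5.
steps = controlnet_active_steps(50, 0.0, 0.5)
print(len(steps), steps[0], steps[-1])  # 25 0 24
```

Under that assumption, the command above would apply ControlNet during the first 25 of the 50 steps and leave the remaining steps to the base model alone.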

Training

Training the 2B model requires 48 GB of VRAM (for example, an A6000), and the 5B model requires 80 GB. The exact requirement depends on the number of transformer blocks, which defaults to 8 (the controlnet_transformer_num_layers parameter in the config).

Dataset

The OpenVid-1M dataset was used as the base variant. CSV files for the dataset can be found here.

Train script

To start training, fill in the config files accelerate_config_machine_single.yaml and finetune_single_rank.sh.
In accelerate_config_machine_single.yaml, change the value in num_processes: 1 to your GPU count.
In finetune_single_rank.sh:

  1. Set MODEL_PATH to the base CogVideoX model. The default is THUDM/CogVideoX-2b.
  2. Set CUDA_VISIBLE_DEVICES (the default is 0).
  3. (For the OpenVid dataset) Set video_root_dir to the directory with the video files, and set csv_path to the dataset CSV file.
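The two settings above need to agree: num_processes in the accelerate config should match the number of GPUs exposed through CUDA_VISIBLE_DEVICES. A minimal sketch for deriving that count from the environment variable's value follows; count_visible_gpus is a hypothetical helper, not part of this repo, and it assumes the usual comma-separated list of device IDs.

```python
def count_visible_gpus(cuda_visible_devices):
    """Count device IDs in a CUDA_VISIBLE_DEVICES-style string such as "0,1"."""
    ids = [part.strip() for part in cuda_visible_devices.split(",")]
    # Ignore empty entries so that "" or a trailing comma does not count as a GPU.
    return len([device_id for device_id in ids if device_id])


print(count_visible_gpus("0"))        # 1
print(count_visible_gpus("0,1,2,3"))  # 4
```

The printed number is the value to put in num_processes for a single-machine run.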

Run training

cd training
bash finetune_single_rank.sh

Acknowledgements

Original code and models: CogVideoX.

Contacts

Issues should be raised directly in the repository. For professional support and recommendations, please contact [email protected].
