
feat add & bugfix #642

Merged: 26 commits, Jan 8, 2025
Conversation

@OleehyO (Collaborator) commented Jan 6, 2025

  1. Added prompt embedding pre-caching in the dataset to reduce memory usage during training, and adapted it for the T2V and I2V models.
  2. Modified training details for the T2V and I2V models.
  3. Deleted the computation graph before evaluation and enabled offloading to further reduce memory usage.
  4. Fixed some minor issues in the SAT-related code.
  5. Adjusted default parameters in certain scripts and updated the README.

OleehyO and others added 15 commits January 2, 2025 03:07
- Add validation check to ensure number of frames is multiple of 8
- Add format validation for train_resolution string (frames x height x width)
- Add caching for prompt embeddings
- Store cached files using safetensors format
- Add cache directory structure under data_root/cache
- Optimize memory usage by moving tensors to CPU after caching
- Add debug logging for cache hits
- Add info logging for cache writes

The caching system helps reduce redundant computation and memory usage during training by:
1. Caching prompt embeddings based on prompt text hash
2. Caching encoded video latents based on video filename
3. Moving tensors to CPU after caching to free GPU memory
…lution

- Add validation to ensure (frames - 1) is multiple of 8
- Add specific resolution check (480x720) for cogvideox-5b models
- Add error handling for invalid resolution format
- Add support for cached prompt embeddings in dataset
- Fix bug where first frame wasn't properly padded in latent space
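The resolution checks listed above might look roughly like the sketch below. The function name `validate_train_resolution` and the exact error messages are hypothetical; the two rules it enforces, (frames - 1) divisible by 8 and a fixed 480x720 resolution for cogvideox-5b models, are taken from the commit messages.

```python
def validate_train_resolution(train_resolution: str, model_name: str) -> tuple[int, int, int]:
    """Parse and validate a 'frames x height x width' resolution string."""
    try:
        frames, height, width = (int(x) for x in train_resolution.split("x"))
    except ValueError:
        raise ValueError(
            f"train_resolution must be 'frames x height x width', got {train_resolution!r}"
        ) from None
    # The temporal VAE compresses (frames - 1) by a factor of 8,
    # so (frames - 1) must be a multiple of 8.
    if (frames - 1) % 8 != 0:
        raise ValueError(f"(frames - 1) must be a multiple of 8, got frames={frames}")
    # Per this PR, cogvideox-5b models only support 480x720 training.
    if "cogvideox-5b" in model_name and (height, width) != (480, 720):
        raise ValueError(f"{model_name} requires 480x720, got {height}x{width}")
    return frames, height, width
```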
This change enables caching of prompt embeddings in the CogVideoX text-to-video
LoRA trainer, which can improve training efficiency by avoiding redundant text
encoding operations.
Add docstring to train_frames field in State schema to explicitly indicate
that it includes one image padding frame
- Add pipe.remove_all_hooks() after validation to prevent memory leaks
- Clean up validation pipeline properly to avoid potential issues in subsequent training steps
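The post-validation cleanup could take roughly this shape. `remove_all_hooks()` is a real diffusers `DiffusionPipeline` method that detaches accelerate offload hooks; the wrapper function itself and the exact teardown order are assumptions, not the PR's code.

```python
import gc

import torch


def cleanup_after_validation(pipe) -> None:
    """Tear down the validation pipeline so the next training step starts clean."""
    # Detach offload hooks registered while building the validation pipeline;
    # leaving them in place can keep module references alive and leak memory.
    pipe.remove_all_hooks()
    del pipe
    gc.collect()
    if torch.cuda.is_available():
        # Return freed blocks to the CUDA driver so training can reuse them.
        torch.cuda.empty_cache()
```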
- Add text embedding support in dataset collation
- Pad 2 random noise frames at the beginning of latent space during training
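The noise-frame padding can be sketched as below, assuming a `[batch, frames, channels, height, width]` latent layout (the actual tensor layout in the repo may differ; the helper name `pad_noise_frames` is also hypothetical). The count of 2 padded frames follows the commit message.

```python
import torch


def pad_noise_frames(latent: torch.Tensor, num_pad: int = 2) -> torch.Tensor:
    """Prepend random-noise frames along the temporal axis of a latent tensor."""
    noise = torch.randn(
        latent.shape[0], num_pad, *latent.shape[2:],
        dtype=latent.dtype, device=latent.device,
    )
    # Concatenate along the frame dimension so the original latents follow
    # the noise frames unchanged.
    return torch.cat([noise, latent], dim=1)
```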
@OleehyO OleehyO requested a review from zRzRzRzRzRzRzR January 6, 2025 10:54
@zRzRzRzRzRzRzR (Member) left a comment
Today I will proceed with the operation, thank you.

zRzRzRzRzRzRzR and others added 11 commits January 7, 2025 13:16
When training i2v models without specifying image_column, automatically extract
and use first frames from training videos as conditioning images. This includes:

- Add load_images_from_videos() utility function to extract and cache first frames
- Update BaseI2VDataset to support auto-extraction when image_column is None
- Add validation and warning message in Args schema for i2v without image_column

The first frames are extracted once and cached to avoid repeated video loading.
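A decoder-agnostic sketch of that behavior is shown below. The real `load_images_from_videos()` utility presumably calls a specific video decoder directly; here the decoding and saving steps are injected as hypothetical `extract_first_frame` / `save_frame` hooks so only the caching logic from the commit message is illustrated.

```python
from pathlib import Path


def load_images_from_videos(video_paths, cache_dir, extract_first_frame, save_frame):
    """Cache the first frame of each video for use as the conditioning image.

    `extract_first_frame(video_path)` and `save_frame(frame, out_path)` stand in
    for the repo's actual video decoding and image writing.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    image_paths = []
    for video_path in video_paths:
        out = cache / (Path(video_path).stem + ".png")
        if not out.exists():  # decode each video only once, then reuse the cached frame
            frame = extract_first_frame(video_path)
            save_frame(frame, out)
        image_paths.append(out)
    return image_paths
```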
…ache

Before precomputing the latent cache and text embeddings, cast the VAE and
text encoder to the target training dtype (fp16/bf16) instead of keeping them
in fp32. This reduces memory usage during the precomputation phase.

The change occurs in prepare_dataset() where the models are moved to device
and cast to weight_dtype before being used to generate the cache.
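The casting step might look like this. The function name `prepare_models_for_precompute` is an assumption (the commit refers to `prepare_dataset()`); the substance, moving the frozen VAE and text encoder to the device and casting them to the training dtype before cache generation, is from the commit message.

```python
import torch


def prepare_models_for_precompute(vae, text_encoder, device, weight_dtype=torch.bfloat16):
    """Move the frozen VAE and text encoder to the device and cast them to the
    training dtype before generating the latent/embedding cache.

    Casting to bf16/fp16 roughly halves their memory footprint versus fp32;
    since both models stay frozen during LoRA training, the precision loss
    during precomputation is generally acceptable.
    """
    vae = vae.to(device, dtype=weight_dtype)
    text_encoder = text_encoder.to(device, dtype=weight_dtype)
    # Frozen during training: disable grads and put in eval mode.
    vae.requires_grad_(False)
    text_encoder.requires_grad_(False)
    vae.eval()
    text_encoder.eval()
    return vae, text_encoder
```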
Add a table in README files showing hardware requirements for training
different CogVideoX models, including:
- Memory requirements for each model variant
- Supported training types (LoRA)
- Training resolutions
- Mixed precision settings

Updated in all language versions (EN/ZH/JA).