feat add & bugfix #642
Merged
OleehyO (Collaborator) commented Jan 6, 2025 • edited
- Added prompt embedding pre-caching in the dataset to reduce memory usage during training, and adapted it for the T2V and I2V models.
- Modified training details for the T2V and I2V models.
- Deleted the computation graph before evaluation and enabled offloading to further reduce memory usage.
- Fixed some minor issues in the SAT-related code.
- Adjusted default parameters in certain scripts and updated the README.
- Add validation check to ensure number of frames is multiple of 8
- Add format validation for train_resolution string (frames x height x width)
- Add caching for prompt embeddings
- Store cached files using safetensors format
- Add cache directory structure under data_root/cache
- Optimize memory usage by moving tensors to CPU after caching
- Add debug logging for cache hits
- Add info logging for cache writes

The caching system helps reduce redundant computation and memory usage during training by:
1. Caching prompt embeddings based on prompt text hash
2. Caching encoded video latents based on video filename
3. Moving tensors to CPU after caching to free GPU memory
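The cache-or-compute pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper names (`prompt_cache_path`, `video_latent_cache_path`, `load_or_compute`), the subdirectory names `prompt_embeddings` and `video_latents`, and the choice of SHA-256 are all assumptions. Serialization is injected through `save` and `load` callables so the sketch stays dependency-free; in a real trainer these would presumably wrap `safetensors.torch.save_file` and `safetensors.torch.load_file`, and `compute` would return tensors already moved to CPU.

```python
import hashlib
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def prompt_cache_path(data_root: Path, prompt: str) -> Path:
    """Derive a cache file path from a hash of the prompt text.

    The 'prompt_embeddings' subdirectory and SHA-256 are assumptions.
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return Path(data_root) / "cache" / "prompt_embeddings" / f"{digest}.safetensors"


def video_latent_cache_path(data_root: Path, video_path: Path) -> Path:
    """Derive a cache file path from the video filename (assumed layout)."""
    stem = Path(video_path).stem
    return Path(data_root) / "cache" / "video_latents" / f"{stem}.safetensors"


def load_or_compute(
    cache_file: Path,
    compute: Callable[[], T],
    save: Callable[[T, Path], None],
    load: Callable[[Path], T],
) -> T:
    """Return the cached value if present; otherwise compute, store, and return it."""
    if cache_file.exists():
        # Cache hit: skip the expensive encoder forward pass.
        return load(cache_file)
    value = compute()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    save(value, cache_file)  # Cache write.
    return value
```

Keying prompt embeddings by a content hash means identical prompts share one cache file regardless of which sample they belong to, while keying latents by filename assumes video filenames are unique within `data_root`.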
…lution
- Add validation to ensure (frames - 1) is multiple of 8
- Add specific resolution check (480x720) for cogvideox-5b models
- Add error handling for invalid resolution format
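The checks listed in this commit could look something like the sketch below. The function name `parse_train_resolution`, the error messages, and the substring match on the model name are assumptions made for illustration; the PR's actual validator may be structured differently.

```python
import re
from typing import Tuple


def parse_train_resolution(resolution: str, model_name: str = "") -> Tuple[int, int, int]:
    """Parse and validate a 'frames x height x width' string such as '49x480x720'."""
    match = re.fullmatch(r"\s*(\d+)\s*x\s*(\d+)\s*x\s*(\d+)\s*", resolution)
    if match is None:
        raise ValueError(
            f"Invalid resolution {resolution!r}: expected 'frames x height x width'"
        )
    frames, height, width = (int(group) for group in match.groups())

    # (frames - 1) must be a multiple of 8, e.g. 9, 17, 49, 81.
    if (frames - 1) % 8 != 0:
        raise ValueError(f"(frames - 1) must be a multiple of 8, got frames={frames}")

    # Assumption: the model family is detected by a substring of its name.
    if "cogvideox-5b" in model_name.lower() and (height, width) != (480, 720):
        raise ValueError(
            f"cogvideox-5b models require 480x720, got {height}x{width}"
        )
    return frames, height, width
```

For example, `parse_train_resolution("49x480x720", "cogvideox-5b")` passes all three checks, while `"48x480x720"` is rejected because 47 is not a multiple of 8.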
- Add support for cached prompt embeddings in dataset
- Fix bug where first frame wasn't properly padded in latent space
This change enables caching of prompt embeddings in the CogVideoX text-to-video LoRA trainer, which can improve training efficiency by avoiding redundant text encoding operations.
Add docstring to train_frames field in State schema to explicitly indicate that it includes one image padding frame
- Add pipe.remove_all_hooks() after validation to prevent memory leaks
- Clean up validation pipeline properly to avoid potential issues in subsequent training steps
- Add text embedding support in dataset collation
- Pad 2 random noise frames at the beginning of latent space during training
Today I will proceed with the operation, thank you.
When training i2v models without specifying image_column, automatically extract and use first frames from training videos as conditioning images. This includes:
- Add load_images_from_videos() utility function to extract and cache first frames
- Update BaseI2VDataset to support auto-extraction when image_column is None
- Add validation and warning message in Args schema for i2v without image_column

The first frames are extracted once and cached to avoid repeated video loading.
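The fallback behaviour described above can be approximated by the sketch below. It is not the PR's `load_images_from_videos()`; the function name `resolve_conditioning_images`, the `.png` output format, and the injected `extract_first_frame` callable (which stands in for whatever video decoder the trainer uses) are assumptions made to keep the example self-contained.

```python
from pathlib import Path
from typing import Callable, List, Optional, Sequence


def resolve_conditioning_images(
    video_paths: Sequence[Path],
    image_paths: Optional[Sequence[Path]],
    cache_dir: Path,
    extract_first_frame: Callable[[Path, Path], None],
) -> List[Path]:
    """Return one conditioning image per video.

    If explicit images are given (image_column is set), use them unchanged.
    Otherwise extract each video's first frame once and reuse the cached file.
    """
    if image_paths is not None:
        return list(image_paths)

    cache_dir.mkdir(parents=True, exist_ok=True)
    resolved: List[Path] = []
    for video in video_paths:
        target = cache_dir / f"{Path(video).stem}.png"
        if not target.exists():
            # Only decode the video when no cached first frame exists yet.
            extract_first_frame(Path(video), target)
        resolved.append(target)
    return resolved
```

Calling the function a second time with the same `cache_dir` performs no further extraction, which mirrors the "extracted once and cached" behaviour.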
…ache

Before precomputing the latent cache and text embeddings, cast the VAE and text encoder to the target training dtype (fp16/bf16) instead of keeping them in fp32. This reduces memory usage during the precomputation phase.

The change occurs in prepare_dataset() where the models are moved to device and cast to weight_dtype before being used to generate the cache.
Add a table in README files showing hardware requirements for training different CogVideoX models, including:
- Memory requirements for each model variant
- Supported training types (LoRA)
- Training resolutions
- Mixed precision settings

Updated in all language versions (EN/ZH/JA).