Act offload lora fix#2928
Conversation
📝 WalkthroughWalkthroughThe changes introduce specialized activation offloading for PEFT/LoRA models, refine activation offloading context management, simplify schema validation logic by removing special-case handling for LoRA adapters, update documentation with a new FAQ entry, and add end-to-end tests for activation offloading. Several obsolete unit tests are removed. Changes
Estimated code review effort3 (~45 minutes) Possibly related PRs
Suggested labels
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (5)
💤 Files with no reviewable changes (1)
🚧 Files skipped from review as they are similar to previous changes (4)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/e2e/test_activation_offloading.py (1)
46-46: Consider reducing the dataset split size for faster tests.Using 10% of the dataset might be excessive for a test that only runs 2 steps. Consider reducing to 1% or using a fixed small number of examples.
- "split": "train[:10%]", + "split": "train[:1%]",src/axolotl/core/trainers/mixins/activation_checkpointing.py (1)
120-193: Comprehensive output head detection with room for extensibility.The heuristic detection covers common output head patterns. However, consider documenting how users can extend this for custom model architectures.
Consider exposing a configuration option or hook mechanism for users to specify custom output head module names for non-standard model architectures.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
docs/faq.qmd(1 hunks)src/axolotl/core/trainers/mixins/activation_checkpointing.py(3 hunks)src/axolotl/utils/schemas/validation.py(1 hunks)tests/e2e/test_activation_offloading.py(1 hunks)tests/utils/schemas/validation/test_activation_offloading.py(0 hunks)
💤 Files with no reviewable changes (1)
- tests/utils/schemas/validation/test_activation_offloading.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
tests/e2e/test_activation_offloading.py (4)
src/axolotl/utils/config/__init__.py (1)
validate_config(261-305)src/axolotl/utils/dict.py (1)
DictDefault(6-38)tests/e2e/utils.py (1)
check_model_output_exists(162-183)tests/utils/schemas/validation/test_activation_offloading.py (1)
TestActivationOffloading(7-35)
src/axolotl/core/trainers/mixins/activation_checkpointing.py (1)
src/axolotl/utils/logging.py (1)
get_logger(42-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.7.0)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: pre-commit
- GitHub Check: pre-commit
- GitHub Check: preview
🔇 Additional comments (10)
docs/faq.qmd (1)
140-142: LGTM! Clear and helpful FAQ entry.The documentation provides a practical workaround for users encountering edge cases with the new activation offloading implementation.
src/axolotl/utils/schemas/validation.py (1)
1076-1079: Good simplification aligned with the new LoRA-aware activation offloading.The removal of special-case handling for LoRA adapters is appropriate since the new implementation in
activation_checkpointing.pynow handles PEFT/LoRA models with a specialized context manager.tests/e2e/test_activation_offloading.py (2)
17-20: Well-structured test class for activation offloading.Good approach to test activation offloading across different adapter configurations.
22-26: Good parameterization coverage.Testing with
lora,qlora, and no adapter ensures the new activation offloading logic works correctly across all supported configurations.src/axolotl/core/trainers/mixins/activation_checkpointing.py (6)
7-23: Appropriate imports for PEFT support.The new imports properly support the specialized activation offloading for PEFT/LoRA models.
33-40: Clean implementation of PEFT-specific activation offloading.The conditional logic correctly identifies PEFT models and applies the specialized context manager, ensuring proper handling of LoRA wrapped models.
54-94: Well-documented function with comprehensive parameters.The function signature and documentation clearly explain the purpose and parameters, making it easy for future maintainers to understand the activation offloading behavior for LoRA models.
110-118: Good model unwrapping logic.The code properly handles different wrapper types including PEFT models, ensuring the base model is accessible for output head detection.
194-200: Helpful warning message for debugging.The warning provides clear guidance when no output head is detected, helping users understand potential performance implications.
202-216: Smart handling of Liger modules and checkpoint-wrapped modules.The logic correctly identifies and excludes modules that shouldn't be offloaded, ensuring compatibility with LoRA training.
|
📖 Documentation Preview: https://687f779976a8840c25274406--resonant-treacle-0fd729.netlify.app Deployed on Netlify from commit 0225747 |
Codecov ReportAttention: Patch coverage is
📢 Thoughts on this report? Let us know! |
eb0db6b to
0225747
Compare
Follow up to #2927 so that CUDA streams for activation offloading now works.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests