Conversation
|
Warning Rate limit exceeded@djsaunde has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 4 minutes and 47 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis change introduces two new monkeypatches for the Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~15–20 minutes Assessment against linked issues
Assessment against linked issues: Out-of-scope changesNo out-of-scope changes detected. Possibly related PRs
Suggested reviewers
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Actionable comments posted: 4
🔭 Outside diff range comments (1)
src/axolotl/monkeypatch/transformers/trainer_loss_calc.py (1)
1-144: Add validation tests for the actual behavior change.While the unit tests verify that patches are applied, they don't test the actual behavior change (handling NaN values correctly).
Consider adding integration tests that verify NaN handling:
def test_evaluation_handles_nan(): """Test that patched evaluation_loop handles NaN values correctly.""" import numpy as np import torch from unittest.mock import MagicMock, patch # Apply the patch patch_evaluation_loop() # Create a mock trainer with NaN losses trainer = Trainer(model=MagicMock(), args=MagicMock()) # Simulate losses with NaN values with patch.object(trainer, 'prediction_loop') as mock_pred: mock_pred.return_value = ( None, # predictions None, # label_ids {'loss': torch.tensor([1.0, float('nan'), 2.0])} # metrics with NaN ) metrics = trainer.evaluation_loop( dataloader=MagicMock(), description="test", metric_key_prefix="eval" ) # Should handle NaN and return 1.5 (mean of 1.0 and 2.0) assert abs(metrics['eval_loss'] - 1.5) < 1e-6
🧹 Nitpick comments (1)
tests/monkeypatch/test_trainer_loss_calc.py (1)
13-39: Consider adding cleanup in tearDown to restore original state.While the tests clean up before running, they don't restore the original state after completion. This could affect other tests if they depend on an unpatched Trainer class.
Consider adding a tearDown method to restore the original state:
class TestTrainerLossCalc(unittest.TestCase): """ Unit test class for trainer loss calc monkeypatch """ + + def tearDown(self): + """Clean up patches after each test.""" + if hasattr(Trainer, "_original_evaluation_loop"): + delattr(Trainer, "_original_evaluation_loop") + if hasattr(Trainer, "_original_maybe_log_save_evaluate"): + delattr(Trainer, "_original_maybe_log_save_evaluate") def test_patch_evaluation_loop_applies(self):
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/axolotl/loaders/model.py(1 hunks)src/axolotl/loaders/patch_manager.py(1 hunks)src/axolotl/monkeypatch/transformers/trainer_loss_calc.py(1 hunks)tests/monkeypatch/test_trainer_loss_calc.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/monkeypatch/test_trainer_loss_calc.py (1)
src/axolotl/monkeypatch/transformers/trainer_loss_calc.py (2)
patch_evaluation_loop(20-88)patch_maybe_log_save_evaluate(92-143)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.7.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: pre-commit
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: pre-commit
🔇 Additional comments (2)
src/axolotl/loaders/model.py (1)
1-2: LGTM! Docstring formatting improvement.Good fix to consolidate the split text into a single line for better readability.
src/axolotl/loaders/patch_manager.py (1)
79-86: LGTM! Proper integration of trainer loss calculation patches.The new patches are correctly imported and invoked in the appropriate sequence within the transformers patches section.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
|
||
| PATCHED_FSDP2_CODE = """ | ||
| if hasattr(model, "eval") and callable(model.eval): | ||
| self.model.eval() |
There was a problem hiding this comment.
should this be model.eval() since that's what you checked?
There was a problem hiding this comment.
I'm just copying @NanoCode012's patch here... but it looks like you're right. I'll test
winglian
left a comment
There was a problem hiding this comment.
lgtm! Let's open some upstream PRs for these changes too.
Description
Title. If a sub-sequence (via context parallel) is fully masked, we get NaN loss on it.
Motivation and Context
Closes #3026.
Ideally this would just go upstream, but it's not needed there since HF trainer doesn't support context parallelism.
How has this been tested?
Screenshots (if appropriate)
Types of changes
Social Handles (Optional)
Summary by CodeRabbit
Summary by CodeRabbit