fix: fail early in CI on CUDA errors #3737
📝 Walkthrough (CodeRabbit): adds 29 lines for CUDA fatal error detection.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Description
I'm seeing a few failures on main and in unrelated PRs (for example, https://github.com/axolotl-ai-cloud/axolotl/actions/runs/27540583170/job/81407934320?pr=3736) where the Modal GPU Docker e2e job fails due to device flakiness. This change makes those runs fail fast, so we can restart them when convenient and save cost instead of leaving them stuck in an error loop.
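To illustrate the fail-fast idea, here is a minimal sketch, not the code from this PR. It assumes the error loop comes from retrying a step whose failure is really an unrecoverable CUDA fault. The names `FATAL_CUDA_PATTERNS`, `is_fatal_cuda_error`, and `run_with_fail_fast` are hypothetical, and the pattern list is only a guess at which messages should count as fatal.

```python
import re

# Hypothetical list of messages treated as unrecoverable device faults.
# The strings are common CUDA runtime error texts; the PR may match others.
FATAL_CUDA_PATTERNS = [
    r"CUDA error",
    r"an illegal memory access was encountered",
    r"unspecified launch failure",
    r"device-side assert triggered",
]
_FATAL_RE = re.compile("|".join(FATAL_CUDA_PATTERNS), re.IGNORECASE)


def is_fatal_cuda_error(message: str) -> bool:
    """Return True if the message looks like an unrecoverable CUDA fault."""
    return _FATAL_RE.search(message) is not None


def run_with_fail_fast(step, max_attempts: int = 3):
    """Call step(), retrying ordinary failures up to max_attempts times.

    A failure whose message matches a fatal CUDA pattern is re-raised
    immediately, since retrying on a broken device only burns GPU time.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_exc = None
    for _ in range(max_attempts):
        try:
            return step()
        except Exception as exc:
            if is_fatal_cuda_error(str(exc)):
                raise RuntimeError("fatal CUDA error, aborting without retry") from exc
            last_exc = exc  # ordinary failure: try again
    raise last_exc
```

With this shape, a flaky device ends the job on the first matching error, and the run can be restarted manually later.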
AI Usage Disclaimer
Claude