MLflow issue: Every single run is marked as FINISHED, never FAILED #20827
Closed · Unanswered

Saya47 asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule

Replies: 1 comment
Figured out my code had a bug. Sorry if I took anyone's time. I had struggled with this for too long before posting, and then realized what the bug was 12 hours later.
-
Hello, good day to everybody.
I track my experiments using MLflow. My issue is that even when my code has bugs and raises an exception during a run, Lightning marks every run as FINISHED in MLflow. Below, I asked an LLM to demonstrate this:
The run is marked with status 3 (FINISHED) even when it ends in an exception, which is really bad.
I use the status to filter out bad runs, because I use MLflow to aggregate metrics/losses across epochs and runs and to resume from checkpoints.
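The behavior the question expects can be sketched without MLflow or Lightning: a run that exits through an exception should end as FAILED (4), not FINISHED (3), so that filtering on status drops it. This is a minimal stdlib-only sketch under stated assumptions: `Run`, `tracked_run` and `good_runs` are hypothetical helpers, and only the integer values in `RunStatus` are meant to match MLflow's run status codes. With real MLflow, using `mlflow.start_run()` as a `with` block should end the run as FAILED when an exception escapes; verify that against the MLflow version you use.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from enum import IntEnum


class RunStatus(IntEnum):
    """Integer codes chosen to mirror MLflow's run statuses (3 = FINISHED, 4 = FAILED)."""
    RUNNING = 1
    SCHEDULED = 2
    FINISHED = 3
    FAILED = 4
    KILLED = 5


@dataclass
class Run:
    """Hypothetical stand-in for a tracked run; not an MLflow class."""
    run_id: str
    status: RunStatus = RunStatus.RUNNING
    metrics: dict = field(default_factory=dict)


@contextmanager
def tracked_run(runs: list, run_id: str):
    """Hypothetical helper: register a run and set its terminal status from how the block exits."""
    run = Run(run_id)
    runs.append(run)
    try:
        yield run
    except BaseException:
        # Any exception escaping the block marks the run FAILED, then is re-raised.
        run.status = RunStatus.FAILED
        raise
    else:
        # Normal exit: the run completed.
        run.status = RunStatus.FINISHED


def good_runs(runs: list) -> list:
    """Keep only runs that completed without an exception."""
    return [r for r in runs if r.status == RunStatus.FINISHED]


if __name__ == "__main__":
    runs = []
    with tracked_run(runs, "ok") as run:
        run.metrics["loss"] = 0.1

    try:
        with tracked_run(runs, "buggy"):
            raise ValueError("bug in training code")
    except ValueError:
        pass

    print([(r.run_id, r.status.name, int(r.status)) for r in runs])
    # [('ok', 'FINISHED', 3), ('buggy', 'FAILED', 4)]
    print([r.run_id for r in good_runs(runs)])
    # ['ok']
```

Putting the status update in a context manager ties the terminal status to how the block exits, so a crashing run can never be left looking FINISHED to a status-based filter.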