
when stop training and resume from epoch 13 (for example) --auto-resume and resume_from #10438

Open
soonyoung-hwang opened this issue Jun 4, 2023 · 1 comment
@soonyoung-hwang

Hi, I have a general question on how to resume training.

After searching I found there are two options to resume training: --auto-resume option in command line, and resume_from in the code.

My question here is,

  1. When using --auto-resume, do I need to set pretrained = None and resume_from = None, or can I leave them as they are?
  2. When using resume_from in the code, will pretrained and load_from be ignored?

Thank you for your time in advance.

@emvollmer

I've been looking into this a little for both the current v3 and an older v2.21 of MMDetection, so I'll share my observations.

In v2.21, you could either set a --resume-from /path/to/ckp.pth flag (start from the provided path) or an --auto-resume flag (start from the latest checkpoint). Both flags would then change the resume_from parameter in the config from None to the path in question. Additionally, you could use --cfg-options to define the load_from config parameter. I've found that:

  • If you define both load_from and resume_from (through either flag), the resume_from path takes precedence.
  • If you want to use e.g. COCO pre-trained weights, you have to use load_from for it to work, so you should ensure you don't define resume_from through flags or config settings.
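The precedence described above can be sketched as a small function. This is a hypothetical, simplified reimplementation of the v2.21 decision logic for illustration only, not the actual MMDetection/mmcv code:

```python
def choose_checkpoint_action(resume_from=None, load_from=None):
    """Decide how training starts, mirroring the v2.21 behaviour
    described above (simplified sketch, not the real mmdet code)."""
    if resume_from:
        # set via --resume-from or --auto-resume; wins over load_from
        return ("resume", resume_from)
    if load_from:
        # e.g. COCO pre-trained weights set via --cfg-options
        return ("load_weights", load_from)
    return ("from_scratch", None)

# resume_from takes precedence when both are defined:
print(choose_checkpoint_action(resume_from="epoch_13.pth",
                               load_from="coco_weights.pth"))
# load_from is only used when resume_from is unset:
print(choose_checkpoint_action(load_from="coco_weights.pth"))
```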

In the current v3, you only have the --resume flag, which equates to the previous --auto-resume when you don't provide any further information, or to --resume-from when you add a /path/to/ckp.pth. You can still use --cfg-options to define e.g. the load_from parameter. A look at the current train.py script shows that adding a --resume flag automatically defines the load_from parameter with the latest / provided path.

  • In other words: you can either add the flag (with or without a path) or change the load_from parameter directly. Defining a resume_from parameter has no effect, because it is no longer used (compare configs from v2.21 and v3).
  • Again, when you both define load_from directly and add the --resume flag, the flag's value will replace the cfg-defined value, as the cfg is created and merged before the flag values are read.
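The merge order in the last point can be sketched with argparse. This is a simplified, hypothetical illustration of how a v3-style train.py might map the flag onto an already-built config; names and structure are assumptions, not the exact script:

```python
import argparse

def parse_and_merge(argv, cfg):
    """Sketch: the cfg dict is built first, then the --resume flag
    overwrites cfg['load_from'] (simplified assumption, not real mmdet)."""
    parser = argparse.ArgumentParser()
    # nargs='?' lets --resume appear alone (auto-resume)
    # or followed by a checkpoint path
    parser.add_argument("--resume", nargs="?", const="auto", default=None)
    args = parser.parse_args(argv)

    if args.resume == "auto":
        cfg["resume"] = True            # resume from the latest checkpoint
    elif args.resume is not None:
        cfg["resume"] = True
        cfg["load_from"] = args.resume  # flag replaces any cfg-defined value
    return cfg

# a cfg-defined load_from is replaced by the flag's path:
cfg = parse_and_merge(["--resume", "epoch_13.pth"],
                      {"load_from": "coco_weights.pth"})
print(cfg)  # {'load_from': 'epoch_13.pth', 'resume': True}
```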

Regarding setting pretrained=None, I'm not quite sure what you're referring to. If you mean the model backbone's init_cfg=dict(type='Pretrained', checkpoint='torchvision://...'), that is independent of resuming training. You can see that by looking at the logs. When you comment out that line, you'll see the warning message

mmdet - WARNING - No pre-trained weights for <model>, training start from scratch

but this only refers to the backbone.
If you resume from a previous training or use, for example, COCO pre-trained weights, you should see the following message appear in your logs later on:

 mmdet - INFO - load checkpoint from local path: /path/to/checkpoint/or/pretrained/weights.pth

This is the case for both v2.21 and v3.
