
when stop training and resume from epoch 13 (for example) --auto-resume and resume_from #10438

Open
soonyoung-hwang opened this issue Jun 4, 2023 · 1 comment
@soonyoung-hwang

Hi, I have a general question on how to resume training.

After searching I found there are two options to resume training: --auto-resume option in command line, and resume_from in the code.

My question here is,

  1. When using --auto-resume, do I need to set pretrained = None and resume_from = None, or can I leave them as they are?
  2. When using resume_from in the code, will pretrained and load_from be ignored?

Thank you for your time in advance.

@emvollmer

I've been looking into this a little for both the current v3 and an older v2.21 of MMDetection, so I'll share my observations.

In v2.21, you could either set a --resume-from /path/to/ckp.pth flag (start from the provided path) or an --auto-resume flag (start from the latest checkpoint). Both flags would then change the resume_from parameter in the config from None to the path in question. Additionally, you could use --cfg-options to define the load_from config parameter. I've found that:

  • If you define both load_from and resume_from (through either flag), the resume_from path takes precedence.
  • If you want to use e.g. COCO pre-trained weights, you have to use load_from for it to work, so you should ensure you don't define resume_from through flags or config settings.
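The precedence described above can be sketched as a small function. This is a hypothetical, simplified reimplementation of the v2.21 decision logic for illustration only, not the actual MMDetection/mmcv code:

```python
def choose_checkpoint_action(resume_from=None, load_from=None):
    """Decide how training starts, mirroring the v2.21 behaviour
    described above (simplified sketch, not the real mmdet code)."""
    if resume_from:
        # set via --resume-from or --auto-resume; wins over load_from
        return ("resume", resume_from)
    if load_from:
        # e.g. COCO pre-trained weights set via --cfg-options
        return ("load_weights", load_from)
    return ("from_scratch", None)

# resume_from takes precedence when both are defined:
print(choose_checkpoint_action(resume_from="epoch_13.pth",
                               load_from="coco_weights.pth"))
# load_from is only used when resume_from is unset:
print(choose_checkpoint_action(load_from="coco_weights.pth"))
```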

In the current v3, you only have the --resume flag, which equates to the previous --auto-resume when you don't provide any further information, or to --resume-from when you add a /path/to/ckp.pth. You can still use --cfg-options to define e.g. the load_from parameter. A look at the current train.py script shows that adding a --resume flag automatically defines the load_from parameter with the latest / provided path.

  • In other words: you can either add the flag (with or without a path) or change the load_from parameter directly. Defining a resume_from parameter has no effect, because it is no longer used (compare configs from v2.21 and v3).
  • Again, when you both define load_from directly and add the --resume flag, the flag's value will replace the cfg-defined value, as the cfg is created and merged before the flag values are read.
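The merge order in the last point can be sketched with argparse. This is a simplified, hypothetical illustration of how a v3-style train.py might map the flag onto an already-built config; names and structure are assumptions, not the exact script:

```python
import argparse

def parse_and_merge(argv, cfg):
    """Sketch: the cfg dict is built first, then the --resume flag
    overwrites cfg['load_from'] (simplified assumption, not real mmdet)."""
    parser = argparse.ArgumentParser()
    # nargs='?' lets --resume appear alone (auto-resume)
    # or followed by a checkpoint path
    parser.add_argument("--resume", nargs="?", const="auto", default=None)
    args = parser.parse_args(argv)

    if args.resume == "auto":
        cfg["resume"] = True            # resume from the latest checkpoint
    elif args.resume is not None:
        cfg["resume"] = True
        cfg["load_from"] = args.resume  # flag replaces any cfg-defined value
    return cfg

# a cfg-defined load_from is replaced by the flag's path:
cfg = parse_and_merge(["--resume", "epoch_13.pth"],
                      {"load_from": "coco_weights.pth"})
print(cfg)  # {'load_from': 'epoch_13.pth', 'resume': True}
```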

Regarding setting pretrained=None, I'm not quite sure what you're referring to. If you mean the model backbone's init_cfg=dict(type='Pretrained', checkpoint='torchvision://...'), that is independent of resuming training. You can see that by looking at the logs. When you comment out that line, you'll see the warning message

mmdet - WARNING - No pre-trained weights for <model>, training start from scratch

but this only refers to the backbone.
If you resume from a previous training or use, for example, COCO pre-trained weights, you should see the following message appear in your logs later on:

 mmdet - INFO - load checkpoint from local path: /path/to/checkpoint/or/pretrained/weights.pth

This is the case for both v2.21 and v3.
