Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix eval hook save wrong best checkpoint bug #5341

Closed
wants to merge 1 commit into from
Closed

fix eval hook save wrong best checkpoint bug #5341

wants to merge 1 commit into from

Conversation

shenmishajing
Copy link

Motivation

When we enable save best in eval hook, it will link last ckpt as best ckpt. Related code as follow:
/mmdetection/mmdet/core/evaluation/eval_hooks.py

    def save_best_checkpoint(self, runner, key_score):
        best_score = runner.meta['hook_msgs'].get(
            'best_score', self.init_value_map[self.rule])
        if self.compare_func(key_score, best_score) and 'last_ckpt' in runner.meta['hook_msgs']:
            best_score = key_score
            runner.meta['hook_msgs']['best_score'] = best_score
            last_ckpt = runner.meta['hook_msgs']['last_ckpt']
            runner.meta['hook_msgs']['best_ckpt'] = last_ckpt
            mmcv.symlink(
                last_ckpt,
                osp.join(runner.work_dir, f'best_{self.key_indicator}.pth'))
            time_stamp = runner.epoch + 1 if self.by_epoch else runner.iter + 1
            self.logger.info(f'Now best checkpoint is epoch_{time_stamp}.pth.'
                             f'Best {self.key_indicator} is {best_score:0.4f}')

But, when we add checkpoint hook to runner, we use priority 70. We add eval hook use priority 50. Related code:
/mmcv/runner/base_runner.py

    def register_checkpoint_hook(self, checkpoint_config):
        if checkpoint_config is None:
            return
        if isinstance(checkpoint_config, dict):
            checkpoint_config.setdefault('type', 'CheckpointHook')
            hook = mmcv.build_from_cfg(checkpoint_config, HOOKS)
        else:
            hook = checkpoint_config
        self.register_hook(hook, priority=70)

    def register_hook(self, hook, priority='NORMAL'):
        """Register a hook into the hook list.

        The hook will be inserted into a priority queue, with the specified
        priority (See :class:`Priority` for details of priorities).
        For hooks with the same priority, they will be triggered in the same
        order as they are registered.

        Args:
            hook (:obj:`Hook`): The hook to be registered.
            priority (int or str or :obj:`Priority`): Hook priority.
                Lower value means higher priority.
        """
        assert isinstance(hook, Hook)
        if hasattr(hook, 'priority'):
            raise ValueError('"priority" is a reserved attribute for hooks')
        priority = get_priority(priority)
        hook.priority = priority
        # insert the hook to a sorted list
        inserted = False
        for i in range(len(self._hooks) - 1, -1, -1):
            if priority >= self._hooks[i].priority:
                self._hooks.insert(i + 1, hook)
                inserted = True
                break
        if not inserted:
            self._hooks.insert(0, hook)

class Priority(Enum):
    """Hook priority levels.

    +------------+------------+
    | Level      | Value      |
    +============+============+
    | HIGHEST    | 0          |
    +------------+------------+
    | VERY_HIGH  | 10         |
    +------------+------------+
    | HIGH       | 30         |
    +------------+------------+
    | NORMAL     | 50         |
    +------------+------------+
    | LOW        | 70         |
    +------------+------------+
    | VERY_LOW   | 90         |
    +------------+------------+
    | LOWEST     | 100        |
    +------------+------------+
    """

    HIGHEST = 0
    VERY_HIGH = 10
    HIGH = 30
    NORMAL = 50
    LOW = 70
    VERY_LOW = 90
    LOWEST = 100

/mmdetection/mmdet/api/train.py

        runner.register_hook(eval_hook(val_dataloader, **eval_cfg))

So, eval hook with a higher pirority than checkpoint hook, and it will be called before checkpoint hook. Therefore, it will save the last epoch ckpt instead of cur epoch ckpt when save best ckpt. Example, cur epoch is 9 epoch, and eval hook will test model with 9 epoch's weight but link 8 epoch's ckpt as best ckpt.

Modification

change
/mmdetection/mmdet/api/train.py

        runner.register_hook(eval_hook(val_dataloader, **eval_cfg))

to
/mmdetection/mmdet/api/train.py

        runner.register_hook(eval_hook(val_dataloader, **eval_cfg), priority = 80)

@CLAassistant
Copy link

CLAassistant commented Jun 12, 2021

CLA assistant check
All committers have signed the CLA.

@hhaAndroid
Copy link
Collaborator

Hi @shenmishajing Thanks for the contribution. At present, eval_hookdirectly uses the code in mmcv. Can you consider changing to the latest version and testing it?

@hhaAndroid
Copy link
Collaborator

Since this problem is fixed by open-mmlab/mmcv#1120, it can be turned off.

@hhaAndroid hhaAndroid closed this Jun 30, 2021
@shenmishajing shenmishajing deleted the fix_eval_hook_save_wrong_best_ckpt_bug branch August 25, 2021 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants