Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic seeding when using the LightningCLI #12804

Closed
Schinkikami opened this issue Apr 19, 2022 · 4 comments · Fixed by #12822
Closed

Automatic seeding when using the LightningCLI #12804

Schinkikami opened this issue Apr 19, 2022 · 4 comments · Fixed by #12822
Labels
design Includes a design discussion feature Is an improvement or enhancement lightningcli pl.cli.LightningCLI
Milestone

Comments

@Schinkikami
Copy link
Contributor

Schinkikami commented Apr 19, 2022

🚀 Feature

Currently the LightningCLI offers the argument seed_everything_default: Optional[int] = None.

As it is good practice in scientific research to provide reproducible experiments, I propose to have an option to automatically set a randomly chosen seed at runtime. This then has the advantage that every experiment with the CLI is automatically reproducible as the seed is logged in the config.yaml via the SafeConfigCallback.

Pitch

I propose to extend the signature to
seed_everything_default: Optional[Union[bool,int]] = False, with the Optional being kept for backwards compatibility. Here the argument None is then equivalent to False.

In this case, setting seed_everything_default to True, automatically generate's a seed that is logged to the argument seed_everything, which like before is then saved to the config.yaml by the SafeConfigCallback.

The code change could be as following:


    def add_default_arguments_to_parser(self, parser: LightningArgumentParser) -> None:
        """Adds default arguments to the parser."""
        parser.add_argument(
            "--seed_everything",
            type=Optional[int],
            default=self.seed_everything_default,
            help="Set to an int to run seed_everything with this value before classes instantiation",
        )

to

    def add_default_arguments_to_parser(self, parser: LightningArgumentParser) -> None:
        """Adds default arguments to the parser."""

        default_seed = self.seed_everything_default

        # If self.seed_everything_default is a bool, choose a random seed or none at all
        if isinstance(self.seed_everything_default, bool) and self.seed_everything_default:
            default_seed = random.randint(int(torch.iinfo(torch.int32).min), int(torch.iinfo(torch.int32).max))
        else:
            default_seed = None

        parser.add_argument(
            "--seed_everything",
            type=Optional[int],
            default=default_seed,
            help="Set to an int to run seed_everything with this value before classes instantiation. Set to True to automatically choose a seed at runtime",
        )


This change also has the advantage that it prevents passing an int greater than pytorchs int32.max, which results, without raising a warning, in using a different random seed (same behaviour for numpy).


If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.

  • Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.

  • Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra.

cc @Borda @tchaton @justusschock @awaelchli @carmocca @mauvilsa

@Schinkikami Schinkikami added the needs triage Waiting to be triaged by maintainers label Apr 19, 2022
@carmocca carmocca added design Includes a design discussion lightningcli pl.cli.LightningCLI feature Is an improvement or enhancement and removed needs triage Waiting to be triaged by maintainers labels Apr 19, 2022
@carmocca
Copy link
Contributor

I think it would be best that we have it as int | bool = False where passing True would call seed_everything() without an explicit seed that already contains logic to select the seed randomly. https://github.com/PyTorchLightning/pytorch-lightning/blob/7816e122d0d37bbd9d27d024380001db69b8e129/pytorch_lightning/utilities/seed.py#L40

Are you interested in opening a PR with this change?

@carmocca carmocca added this to the 1.7 milestone Apr 19, 2022
@Schinkikami
Copy link
Contributor Author

Schinkikami commented Apr 19, 2022

I knew that seed_everything() had this logic already implemented, but I didn't know it returned the seed. This is of course a possibility. However, seed_everything() is called after the ArgumentParser parsing in the __init__ function, while for a clean implementation (?) the seed should be set as a default argument for the --seed_everything argument prior to the parsing, as otherwise, it would be necessary to modify the config object of the CLI by inserting the chosen seed, which is done after the parsing. Additionally, seed_everything() throws a rank_zero_warning() if no seed is provided, which we would need to catch.

If I should re-use code, I could use
https://github.com/PyTorchLightning/pytorch-lightning/blob/7816e122d0d37bbd9d27d024380001db69b8e129/pytorch_lightning/utilities/seed.py#L83
As far as I can see the method isn't used without arguments somewhere else. Would it make sense to change the default arguments min_seed_value and max_seed_value to np.iinfo(np.int32).{min,max}?

Regarding the Optional flag in the argument, for most features that are deprecated, Lightning throws a LightningDeprecationWarning ( ".. is deprecated and will be removed in version 1.X ..."), which could be done here. So completely removing the Optional should be done at a later stage?

Hopefully the textblock made sense.
I would like to create a PR with this change.

@mauvilsa
Copy link
Contributor

I didn't know about the seed range. Instead of having in the type int might be better to do:

from jsonargparse.typing import restricted_number_type

int32 = restricted_number_type('int32', int, [('>=', np.iinfo(np.int32).min), ('<=', np.iinfo(np.int32).max)])

Then in the add_argument it would be type=Union[int32, bool]. This way there is a parse error when someone gives as seed an int outside the range.

However, seed_everything() is called after the ArgumentParser parsing in the init function, while for a clean implementation (?) the seed should be set as a default argument for the --seed_everything argument prior to the parsing, as otherwise, it would be necessary to modify the config object of the CLI by inserting the chosen seed, which is done after the parsing.

I think it would be fine to insert the value in the config after the parsing. I don't think there is a need for a deprecation warning for removing the Optional. Saving the seed I agree is a good practice, so maybe the default value should be True.

@carmocca
Copy link
Contributor

I agree with @mauvilsa's thoughts. You can incorporate both his and my suggestions in the same PR.

As far as I can see the method isn't used without arguments somewhere else. Would it make sense to change the default arguments min_seed_value and max_seed_value to np.iinfo(np.int32).{min,max}?

Yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Includes a design discussion feature Is an improvement or enhancement lightningcli pl.cli.LightningCLI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants