Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

precision parameter in Trainer does not accept "16" passed as str #16011

Closed
pkubik opened this issue Dec 12, 2022 · 6 comments · Fixed by #14857
Closed

precision parameter in Trainer does not accept "16" passed as str #16011

pkubik opened this issue Dec 12, 2022 · 6 comments · Fixed by #14857
Assignees
Labels
bug Something isn't working docs Documentation related help wanted Open to be worked on precision: amp Automatic Mixed Precision
Milestone

Comments

@pkubik
Copy link

pkubik commented Dec 12, 2022

Bug description

When you try to create Trainer with wrong value for precision it provides correct values as:

MisconfigurationException("Precision '<wrong value>' is invalid. Allowed precision values: ('16', '32', '64', 'bf16')")

This indicate that the values are supposed to be string. However, later when I pass "16" as a value. It fails with:

RuntimeError('No precision set')

Only value 16 passed as an integer works right now. It causes problems with some configuration frameworks or hyperparam optimization libraries. They often do not work with unions.

How to reproduce the bug

import pytorch_lightning as pl
pl.Trainer(precision="16")

Error messages and logs

File ~/myenv/python3.9/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:226, in AcceleratorConnector.__init__(self, devices, num_nodes, accelerator, strategy, plugins, precision, amp_type, amp_level, sync_batchnorm, benchmark, replace_sampler_ddp, deterministic, auto_select_gpus, num_processes, tpu_cores, ipus, gpus)
    223 self._init_strategy()
    225 # 5. Instantiate Precision Plugin
--> 226 self.precision_plugin = self._check_and_init_precision()
    228 # 6. Instantiate Strategy - Part 2
    229 self._lazy_init_strategy()

File ~/myenv/python3.9/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:735, in AcceleratorConnector._check_and_init_precision(self)
    732         self._amp_level_flag = self._amp_level_flag or "O2"
    733         return ApexMixedPrecisionPlugin(self._amp_level_flag)
--> 735 raise RuntimeError("No precision set")

RuntimeError: No precision set

Environment

  • CUDA:
    • GPU: None
    • available: False
    • version: None
  • Lightning:
    • lightning-utilities: 0.3.0
    • pytorch-lightning: 1.8.3.post1
    • torch: 1.13.0
    • torchaudio: 0.13.0
    • torchmetrics: 0.10.3
    • torchvision: 0.14.0
  • System:
    • OS: Darwin
    • architecture:
      • 64bit
    • processor: arm
    • python: 3.9.15
    • version: Darwin Kernel Version 21.6.0

More info

Also checked on newest pytorch_lightning pytorch_lightning-1.8.4.post0.

Works fine for following snippet:

import pytorch_lightning as pl
pl.Trainer(precision=16)

cc @Borda @carmocca @justusschock @awaelchli

@pkubik pkubik added the needs triage Waiting to be triaged by maintainers label Dec 12, 2022
@pkubik
Copy link
Author

pkubik commented Dec 12, 2022

I guess this is due to this line:
https://github.com/Lightning-AI/lightning/blob/6aaac8b910e6c101ea3d62e3f2a5576c07a214b6/src/pytorch_lightning/trainer/connectors/accelerator_connector.py#L708

if self._precision_flag in (16, "bf16"):

It should probably be something like:

if str(self._precision_flag) in ("16", "bf16"):

@awaelchli
Copy link
Contributor

awaelchli commented Dec 13, 2022

Hi @pkubik
The intended type is indeed int. String was not meant to be supported. The addition of "bf16" was added at a later time when it was introduced in PyTorch.

The error message can be misleading, I agree. We can consider to change it to remove the quotes.
We can also consider adding the string values as accepted types. In fact, we wanted to do this in Lightning Lite already. cc @carmocca

It causes problems with some configuration frameworks or hyperparam optimization libraries. They often do not work with unions.

What library are you working with?

Another related issue that proposes an overhaul: #9956

@awaelchli awaelchli added bug Something isn't working docs Documentation related precision: amp Automatic Mixed Precision and removed needs triage Waiting to be triaged by maintainers labels Dec 13, 2022
@Borda
Copy link
Member

Borda commented Dec 13, 2022

for consistency I would extend support for int and str 🦦

@pkubik
Copy link
Author

pkubik commented Dec 13, 2022

What library are you working with?

We actually tried to use this with hydra with schema defined in a dataclass. We probably could find some nasty workaround.

I do not claim that my case is sufficient justification to make a decision here :). I just saw this inconsistency with the error message and thought that accepting str was expected here.

@carmocca
Copy link
Contributor

We can also consider adding the string values as accepted types. In fact, we wanted to do this in Lightning Lite already. cc @carmocca

Yes. A contribution is welcome!

@carmocca carmocca added the help wanted Open to be worked on label Dec 13, 2022
@carmocca carmocca added this to the future milestone Dec 13, 2022
@awaelchli awaelchli self-assigned this Dec 25, 2022
@awaelchli
Copy link
Contributor

@pkubik The string value is now also supported as input (with the next release 1.9.0).

@carmocca carmocca modified the milestones: future, v1.9 Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working docs Documentation related help wanted Open to be worked on precision: amp Automatic Mixed Precision
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants