Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is precision="mixed" redundant? #9956

Closed
carmocca opened this issue Oct 15, 2021 · 26 comments · Fixed by #16783
Closed

Is precision="mixed" redundant? #9956

carmocca opened this issue Oct 15, 2021 · 26 comments · Fixed by #16783
Assignees
Labels
design Includes a design discussion precision: amp Automatic Mixed Precision refactor
Milestone

Comments

@carmocca
Copy link
Contributor

carmocca commented Oct 15, 2021

Proposed refactoring or deprecation

Does precision="mixed" act differently to precision=16 in any way?

I understand that "mixed" is more correct as 16-bit precision can still run some computations in 32-bit.

Motivation

In #9763 I noticed we did not even have a PrecisionType for "mixed".

There's a single test in the codebase passing the "mixed" value:

https://github.com/PyTorchLightning/pytorch-lightning/blob/master/tests/plugins/test_deepspeed_plugin.py#L153

And no mention at all of this value in the docs.

Pitch

Have one value to set this, whether it is 16 or "mixed". Most likely 16 since its the one widely used.

Otherwise, add tests for passing "mixed"

Additional context

$ grep -iIrn '"mixed"' pytorch_lightning
pytorch_lightning/plugins/training_type/sharded.py:62:                    is_fp16 = precision in ("mixed", 16)
pytorch_lightning/plugins/training_type/fully_sharded.py:135:            mixed_precision=precision == "mixed",
pytorch_lightning/plugins/training_type/ipu.py:42:        if self.precision in ("mixed", 16):
pytorch_lightning/plugins/training_type/deepspeed.py:405:            dtype = torch.float16 if self.precision in (16, "mixed") else torch.float32
pytorch_lightning/plugins/training_type/deepspeed.py:473:            dtype = torch.float16 if self.precision in (16, "mixed") else torch.float32
pytorch_lightning/plugins/training_type/deepspeed.py:602:        if precision in (16, "mixed"):
pytorch_lightning/plugins/precision/mixed.py:26:    precision: Union[str, int] = "mixed"
pytorch_lightning/plugins/precision/fully_sharded_native_amp.py:26:    precision = "mixed"
$ grep -iIrn '"mixed"' tests
tests/plugins/test_deepspeed_plugin.py:153:@pytest.mark.parametrize("precision", [16, "mixed"])
$ grep -Irn 'mixed' docs | grep 'precision='
# no mention in the docs!

If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, finetuning and solving problems with deep learning

  • Bolts: Pretrained SOTA Deep Learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch

  • Lightning Transformers: Flexible interface for high performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra.

cc @justusschock @awaelchli @akihironitta @rohitgr7 @tchaton @Borda @carmocca

@carmocca carmocca added refactor design Includes a design discussion labels Oct 15, 2021
@carmocca carmocca mentioned this issue Oct 22, 2021
23 tasks
@justusschock
Copy link
Member

@carmocca during the first accelerator refactor we have introduced the term mixed (before it was 16 always) because we were actually planning to deprecate the alias of 16 for AMP and have real half precision training (like we have for double precision) with that flag instead.

@jzazo
Copy link

jzazo commented Nov 22, 2021

What's the status of this discussion? I wanted to train a BERT-like model with half precision, where the model is also half precision and not only the inputs. What would be the correct way to train not using mixed precision? Thanks!

@justusschock
Copy link
Member

That's not yet decided.

For a true half precision training the current way to go would be to use a custom PrecisionPlugin similar to what we do for double precision

@stale
Copy link

stale bot commented Dec 27, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix This will not be worked on label Dec 27, 2021
@awaelchli awaelchli added precision: amp Automatic Mixed Precision and removed won't fix This will not be worked on labels Dec 27, 2021
@stale
Copy link

stale bot commented Feb 18, 2022

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix This will not be worked on label Feb 18, 2022
@carmocca carmocca added this to the future milestone Feb 18, 2022
@stale stale bot removed the won't fix This will not be worked on label Feb 18, 2022
@carmocca
Copy link
Contributor Author

carmocca commented Apr 6, 2022

A user in Slack got confused by this

How do I do mixed precision training (CPU / GPU)? [...] Allowed precision values: ('16', '32', '64', 'bf16', 'mixed') suggest that 16 trains pure 16 -bit and mixed trains amp . However, the documentation states that 16 trains with mixed precision. When calling Trainer(precision='mixed') I get an error message No precision set . So how do I do pure half-precision, mixed and full-precision training?

Do we want to support true half-precision? I think we should.

The problem is the API. Since precision=16 is used for mixed 16bit, we would need a deprecation phase so that precision=16 becomes true half-precision and precision=mixed_16 for mixed half-precision.

The disadvantage of this is that the instinct from users will be to pass precision=16 because many people don't understand the difference between true precision and mixed precision. On the other hand, having precision=16 be mixed half-precision and requiring precision='true_16' would be inconsistent with the behavior of precision=16.

One more option is to add one extra flag precision_type="full"|"mixed" where we remove precision="mixed". We automatically choose the best type given the precision value.

@SeanNaren
Copy link
Contributor

Thanks, @carmocca I think this is a great issue to bring up.

I too was confused by the naming convention originally and required me to go through Pytorch Lightning to understand what the flag means.

I believe that changing precision=16 from being AMP is very dangerous, and could lead to many angry users that do not heed the deprecation warning (unless it was incredible loud :P). I love the idea of adding a precision_type of some sort to control whether we'd assume mixed or full. Maybe even just mixed_precision=True | False.

@wisecornelius
Copy link

My suggestion for the API would be to have arguments

  • amp_backend as one argument ("apex" vs "native")
  • precision, which is either [00, 01, 02, 03] for apex or [16, mixed, 32, 64] for native.

In terms of naming for the native backend, I would suggest

Full precision: "full" and 32
Mixed precision: "mixed"
Half precision: "half" and 16
Double precision: "double" and 64
Bfloat precision: "bf16" or "bfloat16"

If there wont be half precision, 16 should be deprecated because it is a misnomer.

@akihironitta
Copy link
Contributor

I believe that changing precision=16 from being AMP is very dangerous, and could lead to many angry users that do not heed the deprecation warning (unless it was incredible loud :P). I love the idea of adding a precision_type of some sort to control whether we'd assume mixed or full. Maybe even just mixed_precision=True | False.

I agree with @SeanNaren. Even though users will need to remember to set two flags (precision and mixed_precision) for doing one thing (true16, ...), I don't think it's worth changing it from being amp, just for a backward-compatibility reason as SeanNaren mentioned above.

I also like @carmocca's suggestion for adding a flag mixed_precision=True | False even though that's adding one more thing to remember for users. I think it may be fine if we document it properly so that the new flag is discoverable.

@rohitgr7
Copy link
Contributor

rohitgr7 commented Apr 7, 2022

also like @carmocca's suggestion for adding a flag mixed_precision=True | False even though that's adding one more thing to remember for users. I think it may be fine if we document it properly so that the new flag is discoverable.

with another flag, we will end up with 4 flags that determine precision for different configurations.

off-topic: I was thinking if we could deprecate amp_level btw and let users choose a different amp_level by passing Trainer(..., plugins=[ApexMixedPrecisionPlugin(amp_level=...)]).

@carmocca
Copy link
Contributor Author

carmocca commented Apr 7, 2022

My suggestion for the API would be to have arguments...

@wisecornelius I think that is very correct and concise but prone to errors from inexperienced users who don't understand all the nuances and differences between backends.

Maybe even just mixed_precision=True | False.

This works now, I just wonder if there'll ever be a new precision algorithm. We would want to deprecate this bool if that ever happens.
If we go with the bool, it should be amp_enabled=True|False for consistency

So we would have precision, amp_backend, amp_level, precision_type/amp_enabled

What I don't really like is that the product of options is very sparse. It does feel boilerplate-y to me but there's no better solution without going through deprecations.

off-topic: I was thinking if we could deprecate amp_level...

This makes sense to me, but you should open a new issue for that.

@wisecornelius
Copy link

What I don't really like is that the product of options is very sparse. It does feel boilerplate-y to me but there's no better solution without going through deprecations.

@carmocca I agree. I do not like the sparsity. It seems that the strategy repository is currently a project to make the strategy selection using dense arguments. I just learned that amp_level=01 only works when precision=16. I did that wrong for 6 months...

@otaj
Copy link
Contributor

otaj commented Apr 8, 2022

I feel like having four different options controlling the precision AND, as @carmocca pointed out, the product of these to be very sparse is not a good approach.

How about having "only" three of them and folding the precision_type/amp_enabled suggested in the last comment into amp_level, where None would indicate something along the lines of amp_enabled=False and anything other would indicate amp_enabled=True?

@awaelchli
Copy link
Contributor

awaelchli commented Apr 11, 2022

Looking at the history, newly added trainer arguments usually don't survive that long 😅

I believe we should take in the learnings from the accelerator/strategy/plugin syntax, where we introduced the concept of registries for combinations of selections under one name.
This helped us push trainer arguments to the individual plugins while still enabling an easy to use interface in code and CLI. Since amp_level and amp_backend only apply to a subset of available plugins, these could be moved to the individual implementations and registered as names for the precision argument (introducing a precision registry).
This solves the sparsity problem as we would only expose the most common combinations as names and makes it easier to discover and understand the Trainer arguments. It also makes it easier to validate the selection since only the registry and the strategy are involved.

# before
Trainer(precision=32|16|64)
# after
Trainer(precision="full"|"mixed"|"double")

# new: true half precision everything
Trainer(precision="half")

# same
Trainer(precision="bf16")

# before
Trainer(precision=16, amp_backend="apex", amp_level="02")
# after
Trainer(precision="apex02")

# for specific customization
Trainer(plugins=ThisSpecialPrecisionPlugin(backend=..., level="X", ...)

These are my random thoughts.

@justusschock
Copy link
Member

justusschock commented Apr 11, 2022

@awaelchli I like that approach! However, this implies that when having precision="mixed" it will always use native amp, right? What do we do if new precisions come up? Do we always refer to them by the fraction of fp32? E.g. 8-bit would be 'quarter'? I think that's not very clear to many people and referring to them by something like precision='16bit' would be more self-explaining

@awaelchli
Copy link
Contributor

I think that's not very clear to many people and referring to them by something like precision='16bit' would be more self-explaining

Yes, I like it. In my proposal the main motivation was to use a different name than for the the (int) 16 to be able to introduce true half precision. So yours works too.

@carmocca
Copy link
Contributor Author

carmocca commented Aug 6, 2022

Trying to revive this discussion, here's what I find most logical

Flags Before Precision After
precision=16 "16-mixed"
precision=64 "64-true" (true)
precision=32 "32-true" (true)
precision=bf16 "bf16-mixed"
Not supported "16-true" (true)
Not supported "bf16-true" (true)

The old precision values show a warning explaining the historical change

@rohitgr7
Copy link
Contributor

rohitgr7 commented Aug 6, 2022

I like it... just not sure precision-type-amp_backend is a good way to get the configuration.

Should we do dict?

{'precision: 16, 'mixed': False, amp_type: 'apex'}

key names can be improved.

also, amp_level is deprecated, so you can remove that, we need to make more changes to this once this is removed.

@carmocca
Copy link
Contributor Author

carmocca commented Aug 6, 2022

I prefer the simplicity of the string options, they are consistent with our other registry based trainer arguments.

Also, passing a dict is very similar to passing separate arguments, and has the disadvantages of #9956 (comment)

@rohitgr7
Copy link
Contributor

rohitgr7 commented Aug 6, 2022

they are consistent with our other registry based trainer arguments

Well, sort of. They are indicated by the argument name in their respective strategy class ("deepspeed_stage_3"), but here one needs to remember the order (precision-type-amp_backend). That's why I suggested a little more explicit configuration. But if others think it's fine, then 👍

@justusschock
Copy link
Member

@carmocca do we really want to append "-true" to all non-mixed precision stuff? intuitively, when I see precision="16", I would assume that this is always the case and there would be no need to append "true"

@Borda Borda moved this to In Review in Frameworks Planning Aug 8, 2022
@Borda Borda moved this from In Review to In Progress in Frameworks Planning Aug 8, 2022
@carmocca
Copy link
Contributor Author

carmocca commented Aug 9, 2022

@carmocca do we really want to append "-true" to all non-mixed precision stuff? intuitively, when I see precision="16", I would assume that this is always the case and there would be no need to append "true"

Okay. Updated the table above. However this is less explicit which can be bad for those who don't notice there's a difference between "true" and "mixed"

@awaelchli
Copy link
Contributor

awaelchli commented Sep 22, 2022

Two notes I want to bring up:

  • I would develop this in parallel with Deprecate the nvidia/apex integration #14416 and release together to minimize friction
  • I would prefer to leave the "native" postfix out of the picture. I think it is fair to assume that anything, unless explicitly mentioned, is native to what torch does. This has the benefit that the name becomes shorter, but does not prevent us from future integrations with new backends that are not native, as we would postfix the name.

@carmocca
Copy link
Contributor Author

I agree @awaelchli. I believe I wrote the above proposal before we decided to deprecate Apex

@awaelchli
Copy link
Contributor

awaelchli commented Sep 22, 2022

The comment about "native" in the name is also something I'd like to push for removing in the class names in the future. We already have a plugin now that has NativeNative (twice!) in the name, which is beyond ridiculous IMO, considering that most of the precision plugins we have are actually native. Although this is not as big of a user impact compared to these trainer arguments.

@carmocca
Copy link
Contributor Author

The removal of apex has simplified this proposal greatly. I edited my proposed API in #9956 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Includes a design discussion precision: amp Automatic Mixed Precision refactor
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

9 participants