add --precision option to ketos train and ketos segtrain #453
Conversation
You're running pytorch-lightning master, right? Because the values supported by the latest stable release are …
As I understand it, Mixed is used to configure AMP in the plugin (Lines 80 to 83 in 8bf17e1), as per the last stable release. Maybe a sanity check on the device should be done, though.
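For context, a plugin-based AMP setup with PL <= 1.9 looks roughly like the sketch below (illustrative only, not the actual code at lines 80 to 83; the `precision` variable is a hypothetical ketos-level setting):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import NativeMixedPrecisionPlugin

precision = 'mixed'  # hypothetical ketos-level setting

# With PL <= 1.9, AMP can be configured explicitly through a precision plugin
# instead of only passing precision=16 to the Trainer.
plugins = []
if precision == 'mixed':
    plugins.append(NativeMixedPrecisionPlugin(precision=16, device='cuda'))

trainer = Trainer(accelerator='gpu', devices=1, plugins=plugins)
```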
You are right. Should I limit mixed precision to Ada and later GPUs? Maybe PL already has a fallback mechanism.
No, I actually meant it would be great to check that the device used is CUDA (in case someone does something weird, such as mixed on CPU).
Adding onto what I just said: actually, mixed should be the default only if you use CUDA, no?
You are right. I will add it.
Not in stable. EDIT: Pure half-precision training on master is still not possible. The semantics are explained here. There's no …
By the way, mixed precision also works on CPU, so it can be left enabled without CUDA as well. The question is whether other accelerators like MPS support it, so it might be best to filter it out for any device that isn't cuda/cpu.
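A minimal sketch of what such a device-dependent default and sanity check could look like (the `device` and `precision` names are assumptions, not kraken's actual code):

```python
from typing import Optional


def resolve_precision(device: str, precision: Optional[str]) -> str:
    """Pick a training precision, defaulting to mixed only on CUDA.

    Illustrative sketch only; the actual option handling may differ.
    """
    if precision is None:
        # Mixed precision is a sensible default on CUDA; everything else
        # (CPU, MPS, ...) falls back to full precision.
        return '16-mixed' if device.startswith('cuda') else '32'
    if precision.endswith('mixed') and not (device.startswith('cuda') or device == 'cpu'):
        # Mixed precision is known to work on CUDA and CPU; filter it out
        # for other accelerators such as MPS.
        raise ValueError(f'mixed precision is not supported on device {device!r}')
    return precision
```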
I know, but the semantics you are referring to are only implemented in master: Lightning-AI/pytorch-lightning#16783 (comment). With PL<=1.9, if you set `precision=16`, CUDA will issue the following warning:

> Using 16bit None Automatic Mixed Precision (AMP)
It sounds like true half-precision to me.
As soon as PL 2.0 gets released, we can get rid of: Lines 80 to 83 in 8bf17e1
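Under the PL 2.0 naming scheme discussed above, the plugin block could plausibly be replaced by a plain `precision` argument (a sketch assuming PL 2.x):

```python
from pytorch_lightning import Trainer

# In PL 2.x the '-mixed' suffix makes the AMP semantics explicit:
# '16-mixed' and 'bf16-mixed' enable autocast-based mixed precision,
# while '32-true' keeps the model in full precision.
trainer = Trainer(accelerator='gpu', devices=1, precision='16-mixed')
```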
As I said, there's no true 16bit precision training in PTL, neither in stable nor in master. The plugin is completely unnecessary.
So why does CUDA say "Using 16bit None Automatic Mixed Precision (AMP)"?
The blame for this specific print could be https://github.com/Lightning-AI/lightning/blame/5fafe10a2598bb455aa387f0f123b328b9be7177/src/pytorch_lightning/trainer/connectors/accelerator_connector.py#L745. We used to have to provide an AMP mode in Lightning, I think: https://pytorch-lightning.readthedocs.io/en/1.8.1/common/trainer.html#amp-backend. Try setting …
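For reference, the `amp_backend` argument documented at that link is passed to the Trainer alongside `precision` in PL <= 1.9 (this only illustrates the documented API, not necessarily the exact setting suggested above):

```python
from pytorch_lightning import Trainer

# PL <= 1.9: amp_backend selects between native torch.cuda.amp ('native')
# and NVIDIA Apex ('apex') when 16-bit precision is requested.
trainer = Trainer(accelerator='gpu', devices=1, precision=16, amp_backend='native')
```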
The format string is …
My bad, I thought it was a CUDA warning.
Any suggestions?
If you could add it to the pretraining command as well, I'd merge it today.
Thanks!
Add a `--precision` option to `ketos train` and `ketos segtrain` to choose the numerical precision to use during training, as discussed in #451. It can be set to `32`, `bf16`, `16`, `16-mixed`, or `bf16-mixed`. The default is `16-mixed`.
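A minimal sketch of how such an option could be wired through to the Lightning trainer (illustrative only, assuming a click-based command and a Lightning version that accepts these precision strings; the actual ketos code may differ):

```python
import click
from pytorch_lightning import Trainer


@click.command()
@click.option('--precision', default='16-mixed',
              type=click.Choice(['32', 'bf16', '16', '16-mixed', 'bf16-mixed']),
              help='Numerical precision to use during training.')
def train(precision):
    # The chosen value is forwarded verbatim to the Lightning Trainer.
    trainer = Trainer(accelerator='auto', devices=1, precision=precision)
    ...
```

From the command line this would then be invoked as, for example, `ketos train --precision bf16-mixed` plus the usual training arguments.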