add --precision option to ketos train and ketos segtrain #453

Merged: 7 commits merged into mittagessen:master on Feb 23, 2023

Conversation

colibrisson
Contributor

Add a --precision option to ketos train and ketos segtrain to choose the numerical precision to use during training, as discussed in #451. It can be set to '32', 'bf16', '16', '16-mixed', or 'bf16-mixed'; the default is '16-mixed'.
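For illustration, a minimal sketch of how such an option could be declared in ketos' click-based CLI; the decorator placement, help text, and command body are assumptions, not the PR's actual diff:

```python
import click

@click.command()
@click.option('--precision',
              default='16-mixed',
              type=click.Choice(['32', 'bf16', '16', '16-mixed', 'bf16-mixed']),
              help='Numerical precision to use during training.')
def train(precision):
    # Placeholder body: the real command forwards the value to the trainer setup.
    click.echo(f'training with precision={precision}')

if __name__ == '__main__':
    train()
```

Once wired into the real commands, this would be invoked as e.g. `ketos train --precision bf16-mixed`.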

@mittagessen
Owner

You're running pytorch-lightning master, right? Because the values supported by the latest stable release are ('64', '32', '16', 'bf16'). I'd change them to the current values and pin PTL to >=1.9.0,<2.0.

@PonteIneptique
Contributor

As I understand it, the *-mixed values are split before being passed to the Trainer, which indeed only accepts 64, 32, bf16 and 16.

Mixed is used to configure AMP via the plugin:

```python
if 'mixed' in kwargs['precision']:
    precision = kwargs['precision'].split('-')[0]
    kwargs['precision'] = precision
    kwargs['plugins'] = [pl.plugins.precision.MixedPrecisionPlugin(precision, 'cuda')]
```

This is the behaviour as of the last stable release.

Maybe a sanity check on the device should be done though.

@colibrisson
Contributor Author

The precision argument in Trainer only sets the numerical precision: precision=16 is just half precision. To use AMP you need to provide a plugin to the Trainer object. That's why I added 16-mixed.

@colibrisson
Contributor Author

colibrisson commented Feb 23, 2023

Maybe a sanity check on the device should be done though.

You are right. Should I limit mixed precision to Ada and later GPUs? Maybe PL already has a fallback mechanism.

@PonteIneptique
Contributor

No, I actually meant it would be great to check that the device used is CUDA (in case someone does something weird such as mixed precision on CPU).

@PonteIneptique
Contributor

Adding to what I just said: actually, mixed should be the default only if you use CUDA, no?

@colibrisson
Contributor Author

colibrisson commented Feb 23, 2023

Adding to what I just said: actually, mixed should be the default only if you use CUDA, no?

You are right. I will add it.

@mittagessen
Owner

mittagessen commented Feb 23, 2023

The precision argument in Trainer only sets the numerical precision: precision=16 is just half precision. To use AMP you need to provide a plugin to the Trainer object. That's why I added 16-mixed.

Not in stable: precision=16 is enough to enable AMP (pure half precision training isn't supported). Master/2.0 changes (or will change) the behavior to what you describe. See Lightning-AI/pytorch-lightning#9956 (comment).

EDIT: Pure half precision training on master is still not possible. The semantics are explained here. There's no 16-true value.
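In other words, on the stable 1.9 line something like the following already trains with native AMP and no extra plugin (a minimal sketch assuming a CUDA device; model and data omitted):

```python
import pytorch_lightning as pl

# On PTL 1.9.x, precision=16 alone enables native automatic mixed precision
# for CUDA training; there is no separate "true" fp16 mode to select.
trainer = pl.Trainer(accelerator='gpu', devices=1, precision=16, max_epochs=1)
```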

@mittagessen
Owner

By the way, mixed precision also works on CPU, so it can be left enabled without CUDA as well. The question is whether other accelerators like MPS support it, so it might be best to filter it out for any device that isn't cuda/cpu.
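A sketch of that filter, assuming the kwargs dict passed to the Trainer as in the snippet above (the function name and device string are illustrative, not kraken's actual code):

```python
def sanitize_precision(kwargs: dict, device: str) -> dict:
    # Fall back to full precision on accelerators other than CUDA or CPU
    # (e.g. MPS), where mixed-precision support is unclear.
    if 'mixed' in str(kwargs.get('precision', '')) and not device.startswith(('cuda', 'cpu')):
        kwargs['precision'] = '32'
    return kwargs
```

For example, `sanitize_precision({'precision': '16-mixed'}, 'mps')` would drop back to '32' while leaving cuda/cpu setups untouched.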

@colibrisson
Contributor Author

colibrisson commented Feb 23, 2023

I know, but the semantics you are referring to are only implemented in master: Lightning-AI/pytorch-lightning#16783 (comment). With PL<=1.9, if you set precision=16, CUDA will issue the following warning:

Using 16bit None Automatic Mixed Precision (AMP)

It sounds like true half-precision to me.

@colibrisson
Contributor Author

As soon as PL 2.0 gets released, we can get rid of:

```python
if 'mixed' in kwargs['precision']:
    precision = kwargs['precision'].split('-')[0]
    kwargs['precision'] = precision
    kwargs['plugins'] = [pl.plugins.precision.MixedPrecisionPlugin(precision, 'cuda')]
```
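Under the 2.0 semantics referenced above, the mixed values could then be passed straight through (a sketch, not verified against a released 2.0):

```python
import pytorch_lightning as pl

# With the new precision semantics, strings like '16-mixed' or 'bf16-mixed'
# are accepted directly, so no manual MixedPrecisionPlugin is required.
trainer = pl.Trainer(accelerator='gpu', devices=1, precision='16-mixed')
```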

@mittagessen
Owner

mittagessen commented Feb 23, 2023

As I said, there's no true 16bit precision training in PTL, neither stable nor master. The plugin is completely unnecessary.

@colibrisson
Contributor Author

As I said, there's no true 16bit precision training in PTL, neither stable nor master. The plugin is completely unnecessary.

So why does CUDA say "Using 16bit None Automatic Mixed Precision (AMP)"?

@PonteIneptique
Contributor

The blame for this specific print could be https://github.com/Lightning-AI/lightning/blame/5fafe10a2598bb455aa387f0f123b328b9be7177/src/pytorch_lightning/trainer/connectors/accelerator_connector.py#L745

We used to have to provide an AMP mode in Lightning, I think: https://pytorch-lightning.readthedocs.io/en/1.8.1/common/trainer.html#amp-backend

Try setting Trainer(amp_backend="native") just to see if this is the issue :)
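For instance (a sketch; the amp_backend argument exists on the 1.x Trainer only, and this is purely to see whether the None in the message goes away):

```python
import pytorch_lightning as pl

# Explicitly select the native AMP backend alongside 16-bit precision.
trainer = pl.Trainer(accelerator='gpu', devices=1, precision=16, amp_backend='native')
```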

@mittagessen
Owner

mittagessen commented Feb 23, 2023

The format string is f"Using 16bit {self._amp_type_flag} Automatic Mixed Precision (AMP)". The None refers to the AMP implementation flag that can optionally be given to the trainer (apex or native). It defaults to native if none is given. It isn't a warning, just an info message.

@colibrisson
Contributor Author

My bad, I thought it was a CUDA warning.

@colibrisson
Contributor Author

Any suggestions?

@mittagessen
Owner

If you could add it to the pretraining command as well, I'd merge it today.

@mittagessen
Owner

Thanks!

@mittagessen mittagessen merged commit 50d7860 into mittagessen:master Feb 23, 2023