Update hapi to support AMP #31417

Merged
23 commits merged into PaddlePaddle:develop on Apr 15, 2021

Conversation

@LiuChiachi (Contributor) commented on Mar 3, 2021

PR types

New features

PR changes

APIs

Describe

Add AMP support for High-level API.

API

def prepare(self,
            optimizer=None, loss=None, metrics=None,
            amp_configs=None):  # `amp_configs` is the new parameter for `Model.prepare()`

amp_configs (str|dict|None): AMP configurations. If AMP or pure float16 training is used, the key 'level' of 'amp_configs' should be set to 'O1' or 'O2' respectively. Otherwise, the value of 'level' defaults to 'O0', which means float32 training. In addition to 'level', users could pass in more parameters consistent with the mixed precision API. The supported keys are: 'init_loss_scaling', 'incr_ratio', 'decr_ratio', 'incr_every_n_steps', 'decr_every_n_nan_or_inf', 'use_dynamic_loss_scaling', 'custom_white_list', 'custom_black_list', 'custom_black_varnames', and 'use_fp16_guard'; 'custom_black_varnames' and 'use_fp16_guard' are only supported in static mode. Users could refer to the mixed precision API documentation :ref:`api_paddle_amp_auto_cast` and :ref:`api_paddle_amp_GradScaler` for details. For convenience, 'amp_configs' could be set to 'O1' or 'O2' if no other parameters are needed. 'amp_configs' could be None in float32 training. Default: None.
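For instance, when the default loss scaling and white/black lists are acceptable, the level can be passed as a plain string (a minimal sketch, assuming `model` and `optim` are built as in the example code below):

model.prepare(optim,
              paddle.nn.CrossEntropyLoss(),
              paddle.metric.Accuracy(),
              amp_configs='O1')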

API description

When training on GPU, auto mixed precision (AMP) training is supported, and pure float16 training is also supported in static mode when using the Adam, AdamW or Momentum optimizer. Before using pure float16 training, multi_precision should be set to True when creating the optimizer, which helps avoid poor accuracy or slow convergence, and inputs of dtype float32 should be cast to float16 by users. Users should also use the paddle.static.amp.fp16_guard API to limit the scope of pure float16 training; otherwise, 'use_fp16_guard' should be set to False. Limiting the scope with fp16_guard is not supported during AMP ('O1') training.

Example code:

import paddle
import paddle.nn as nn
import paddle.vision.transforms as T

def run_example_code():
    device = paddle.set_device('gpu')

    net = nn.Sequential(nn.Flatten(1), nn.Linear(784, 200), nn.Tanh(),
                        nn.Linear(200, 10))

    model = paddle.Model(net)
    optim = paddle.optimizer.SGD(learning_rate=1e-3, parameters=model.parameters())

    amp_configs = {
        "level": "O1",
        "custom_white_list": {'conv2d'},
        "use_dynamic_loss_scaling": True
    }
    model.prepare(optim,
                  paddle.nn.CrossEntropyLoss(),
                  paddle.metric.Accuracy(),
                  amp_configs=amp_configs)

    transform = T.Compose([T.Transpose(), T.Normalize([127.5], [127.5])])
    data = paddle.vision.datasets.MNIST(mode='train', transform=transform)
    model.fit(data, epochs=2, batch_size=32, verbose=1)

# mixed precision training is only supported on GPU now.
if paddle.is_compiled_with_cuda():
    run_example_code()
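In addition to the 'O1' example above, a hedged sketch of what an 'O2' (pure float16) configuration could look like in static graph mode is shown below; the InputSpec shapes and the optimizer settings here are illustrative assumptions, not taken from this PR:

import paddle
import paddle.nn as nn

paddle.enable_static()
paddle.set_device('gpu')

net = nn.Sequential(nn.Flatten(1), nn.Linear(784, 200), nn.Tanh(),
                    nn.Linear(200, 10))
# Static mode needs input/label specs when constructing Model.
inputs = [paddle.static.InputSpec([None, 1, 28, 28], 'float32', 'image')]
labels = [paddle.static.InputSpec([None, 1], 'int64', 'label')]
model = paddle.Model(net, inputs, labels)

# multi_precision keeps float32 master weights, which helps avoid poor
# accuracy or slow convergence with pure float16 training.
optim = paddle.optimizer.Momentum(learning_rate=1e-3, multi_precision=True)

model.prepare(optim,
              paddle.nn.CrossEntropyLoss(),
              paddle.metric.Accuracy(),
              amp_configs={'level': 'O2', 'use_fp16_guard': False})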

@paddle-bot-old commented:

Sorry to inform you that 15e1f1b's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@@ -1255,7 +1348,19 @@ def prepare(self, optimizer=None, loss=None, metrics=None):
It can be None when there is no loss.
metrics (Metric|list of Metric|None): If metrics is set, all
metrics will be calculated and output in train/eval mode.

amp_configs (dict|None): AMP configurations. If AMP or pure
Contributor:

I think it is better to make a class for amp_configs, which can limit the keys and values that it accepts.

Contributor Author:

Thanks. If some keys or values are not accepted, the program would crash here, so it is necessary to catch invalid key-values earlier.
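A possible shape of such an early check (a sketch only; the function name, `accepted` set and error message are illustrative, not the exact code added in this PR):

def _check_amp_configs(amp_configs):
    # Hypothetical early validation: reject unknown keys before they reach
    # paddle.amp.auto_cast / paddle.amp.GradScaler and fail deep inside.
    accepted = {
        'level', 'init_loss_scaling', 'incr_ratio', 'decr_ratio',
        'incr_every_n_steps', 'decr_every_n_nan_or_inf',
        'use_dynamic_loss_scaling', 'custom_white_list',
        'custom_black_list', 'custom_black_varnames', 'use_fp16_guard'
    }
    unknown = set(amp_configs) - accepted
    if unknown:
        raise ValueError(
            "Unsupported keys in amp_configs: {}".format(sorted(unknown)))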

else:
outputs = self.model.network.forward(
* [to_variable(x) for x in inputs])
scaler = paddle.amp.GradScaler(**self._amp_configs)
Contributor:

Are all key-values in self._amp_configs accepted by GradScaler?

Contributor Author:

Thanks! Done. I added an input check for amp_configs :)
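One way to guarantee this, sketched below: filter amp_configs down to the keyword arguments that paddle.amp.GradScaler's constructor accepts before unpacking (the helper name is hypothetical, not code from this PR):

import paddle

# Keyword arguments accepted by paddle.amp.GradScaler's constructor.
_GRAD_SCALER_KEYS = {
    'init_loss_scaling', 'incr_ratio', 'decr_ratio',
    'incr_every_n_steps', 'decr_every_n_nan_or_inf',
    'use_dynamic_loss_scaling'
}

def make_scaler(amp_configs):
    # Drop keys such as 'level' or the custom lists so they never reach
    # GradScaler(**kwargs).
    scaler_kwargs = {k: v for k, v in amp_configs.items()
                     if k in _GRAD_SCALER_KEYS}
    return paddle.amp.GradScaler(**scaler_kwargs)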

scaler = paddle.amp.GradScaler(**self._amp_configs)
with paddle.amp.auto_cast(
enable=False if self._amp_level == 'O0' else True,
**self._amp_custom_lists):
Contributor:

Similar to the comment above.


self.model._optimizer.minimize(final_loss)
self.model.network.clear_gradients()
if self._amp_level != "O0":
Contributor:

Dygraph does not support O2 yet?

Contributor Author:

When O2 is chosen in dygraph, it would be treated as O1, and users would be warned. Is this correct?
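A minimal sketch of the fallback being described (function name and warning message are illustrative):

import warnings

def _resolve_amp_level(level, in_dygraph):
    # Dygraph had no pure float16 ('O2') path at the time, so 'O2' is
    # downgraded to AMP ('O1') with a warning instead of failing.
    if in_dygraph and level == 'O2':
        warnings.warn("Pure float16 ('O2') is not supported in dynamic "
                      "graph mode; falling back to AMP ('O1').")
        return 'O1'
    return level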

model.prepare(optim,
paddle.nn.CrossEntropyLoss(),
paddle.metric.Accuracy())
paddle.metric.Accuracy(),
amp_configs=amp_configs)
Contributor:

Better to add another example for AMP.

Contributor Author:

Thanks! Done.

'use_dynamic_loss_scaling', 'custom_white_list',
'custom_black_list', and 'custom_black_varnames',
'use_pure_fp16' or 'use_fp16_guard' is only supported in static
mode. Users could refer to low-level API documentations
Contributor:

remove 'low-level'

Contributor Author:

Thanks! Done.

]:
if param_name in self._adapter._amp_configs:
self._adapter._amp_custom_lists[
param_name] = amp_configs.pop(param_name)
Contributor:

Better not to pop from the input dict amp_configs; treat it as constant, since the user-defined input might be reused elsewhere, such as for another AMP model.

Contributor Author:

Thanks! Done.
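One way to address this, as a sketch (the helper name is hypothetical, not the exact change in this PR): copy the dict first so the caller's amp_configs is never mutated and can be reused for another model:

def split_amp_custom_lists(amp_configs):
    # Work on a shallow copy; the user's dict stays untouched.
    configs = dict(amp_configs)
    custom_lists = {}
    for param_name in ('custom_white_list', 'custom_black_list',
                       'custom_black_varnames'):
        if param_name in configs:
            custom_lists[param_name] = configs.pop(param_name)
    return configs, custom_lists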

amp_configs (dict|None): AMP configurations. If AMP or pure
float16 training is used, the key 'level' of 'amp_configs'
should be set to 'O1' or 'O2' respectively. Otherwise, the
value of 'level' defaults to 'O0', which means training without
Contributor:

Maybe it should also accept the strings 'O1' and 'O2' directly to keep things concise, since that is the most common case.

Contributor Author:

Thanks! Done.
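A sketch of how the string form could be normalized internally (function name and messages are illustrative, not the exact code in this PR):

def _normalize_amp_configs(amp_configs):
    # Accept 'O0'/'O1'/'O2' directly and expand them into a dict with a
    # 'level' key, so later code only handles dicts.
    if amp_configs is None:
        return {'level': 'O0'}
    if isinstance(amp_configs, str):
        if amp_configs not in ('O0', 'O1', 'O2'):
            raise ValueError(
                "amp_configs should be 'O0', 'O1' or 'O2' when passed "
                "as a string.")
        return {'level': amp_configs}
    return dict(amp_configs)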

amp_configs (str|dict|None): AMP configurations. If AMP or pure
float16 training is used, the key 'level' of 'amp_configs'
should be set to 'O1' or 'O2' respectively. Otherwise, the
value of 'level' defaults to 'O0', which means training without
Contributor:

Maybe "which means float32 training." is more suitable.

Contributor Author:

Thanks! Done.

@@ -598,6 +620,10 @@ def _compile_and_initialize(self, prog, mode):
startup_prog = self._startup_prog._prune(uninitialized)
self._executor.run(startup_prog)

if self._amp_level == "O2" and mode == 'train' and core.is_compiled_with_cuda(
) and self._nranks <= 1:
Contributor:

If self._nranks > 1, self.model._optimizer.amp_init(place) is also required under "O2".

Contributor Author:

Thanks. Done. I upgraded the usage of the Fleet API and fixed this. Previously, the old Fleet APIs we used did not have an amp_init function, and I had worked around it in the wrong way. Thanks :)

accepted_param_set = {
'init_loss_scaling', 'incr_ratio', 'decr_ratio',
'incr_every_n_steps', 'decr_every_n_nan_or_inf',
'use_dynamic_loss_scaling', 'use_fp16_guard', 'use_pure_fp16'
Contributor:

If the user sets level to 'O2' but sets use_pure_fp16 to False, what will happen? Or sets level to 'O1' but use_pure_fp16 to True?

Contributor Author:

The value of use_pure_fp16 would be ignored; the actual level depends only on what the user passes via amp_configs['level'] (or amp_configs as a string). I will disable the use_pure_fp16 parameter:

if 'use_pure_fp16' in amp_configs:
    raise ValueError(
        "'use_pure_fp16' is an invalid parameter; the level of mixed "
        "precision training only depends on 'O1' or 'O2'.")

@saxon-zh left a comment:

LGTM

@zhiqiu (Contributor) left a comment:

LGTM

@XiaoguangHu01 (Contributor) left a comment:

LG API

@qingqing01 (Contributor) left a comment:

LGTM

@guoshengCS merged commit fabdb43 into PaddlePaddle:develop on Apr 15, 2021