Update hapi to support AMP #31417

Merged
23 commits merged into PaddlePaddle:develop on Apr 15, 2021

Conversation

@LiuChiachi (Contributor) commented on Mar 3, 2021

PR types

New features

PR changes

APIs

Describe

Add AMP support for High-level API.

API

def prepare(self,
            optimizer=None, loss=None, metrics=None,
            amp_configs=None):  # `amp_configs` is the new parameter for `Model.prepare()`

amp_configs (str|dict|None): AMP configurations. If AMP or pure float16 training is used, the key 'level' of 'amp_configs' should be set to 'O1' or 'O2' respectively. Otherwise, the value of 'level' defaults to 'O0', which means float32 training. In addition to 'level', users could pass in more parameters consistent with the mixed precision API. The supported keys are: 'init_loss_scaling', 'incr_ratio', 'decr_ratio', 'incr_every_n_steps', 'decr_every_n_nan_or_inf', 'use_dynamic_loss_scaling', 'custom_white_list', 'custom_black_list', 'custom_black_varnames', and 'use_fp16_guard'; 'custom_black_varnames' and 'use_fp16_guard' are only supported in static mode. Users could refer to the mixed precision API documentation :ref:`api_paddle_amp_auto_cast` and :ref:`api_paddle_amp_GradScaler` for details. For convenience, 'amp_configs' could be set to 'O1' or 'O2' if no other parameters are needed. 'amp_configs' could be None in float32 training. Default: None.
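For instance, when the default loss scaling and white/black lists are acceptable, the level can be passed as a plain string (a minimal sketch, assuming `model` and `optim` are built as in the example code below):

model.prepare(optim,
              paddle.nn.CrossEntropyLoss(),
              paddle.metric.Accuracy(),
              amp_configs='O1')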

API description

When training on GPU, auto mixed precision (AMP) training is supported, and pure float16 training is also supported in static mode when using the Adam, AdamW or Momentum optimizer. Before using pure float16 training, multi_precision should be set to True when creating the optimizer, which helps avoid poor accuracy or slow convergence, and inputs of dtype float32 should be cast to float16 by users. Users should also use the paddle.static.amp.fp16_guard API to limit the scope of pure float16 training; otherwise, 'use_fp16_guard' should be set to False. Limiting the scope with fp16_guard is not supported during AMP ('O1') training.

Example code:

import paddle
import paddle.nn as nn
import paddle.vision.transforms as T

def run_example_code():
    device = paddle.set_device('gpu')

    net = nn.Sequential(nn.Flatten(1), nn.Linear(784, 200), nn.Tanh(),
                        nn.Linear(200, 10))

    model = paddle.Model(net)
    optim = paddle.optimizer.SGD(learning_rate=1e-3, parameters=model.parameters())

    amp_configs = {
        "level": "O1",
        "custom_white_list": {'conv2d'},
        "use_dynamic_loss_scaling": True
    }
    model.prepare(optim,
                  paddle.nn.CrossEntropyLoss(),
                  paddle.metric.Accuracy(),
                  amp_configs=amp_configs)

    transform = T.Compose([T.Transpose(), T.Normalize([127.5], [127.5])])
    data = paddle.vision.datasets.MNIST(mode='train', transform=transform)
    model.fit(data, epochs=2, batch_size=32, verbose=1)

# mixed precision training is only supported on GPU now.
if paddle.is_compiled_with_cuda():
    run_example_code()
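In addition to the 'O1' example above, a hedged sketch of what an 'O2' (pure float16) configuration could look like in static graph mode is shown below; the InputSpec shapes and the optimizer settings here are illustrative assumptions, not taken from this PR:

import paddle
import paddle.nn as nn

paddle.enable_static()
paddle.set_device('gpu')

net = nn.Sequential(nn.Flatten(1), nn.Linear(784, 200), nn.Tanh(),
                    nn.Linear(200, 10))
# Static mode needs input/label specs when constructing Model.
inputs = [paddle.static.InputSpec([None, 1, 28, 28], 'float32', 'image')]
labels = [paddle.static.InputSpec([None, 1], 'int64', 'label')]
model = paddle.Model(net, inputs, labels)

# multi_precision keeps float32 master weights, which helps avoid poor
# accuracy or slow convergence with pure float16 training.
optim = paddle.optimizer.Momentum(learning_rate=1e-3, multi_precision=True)

model.prepare(optim,
              paddle.nn.CrossEntropyLoss(),
              paddle.metric.Accuracy(),
              amp_configs={'level': 'O2', 'use_fp16_guard': False})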

@paddle-bot-old commented:

Sorry to inform you that 15e1f1b's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@@ -1255,7 +1348,19 @@ def prepare(self, optimizer=None, loss=None, metrics=None):
It can be None when there is no loss.
metrics (Metric|list of Metric|None): If metrics is set, all
metrics will be calculated and output in train/eval mode.

amp_configs (dict|None): AMP configurations. If AMP or pure
Contributor:

I think it is better to make a class for amp_configs, which can limit the keys and values that it accepts.

Contributor Author:

Thanks. If some keys or values are not accepted, the program would crash here, so it is necessary to catch invalid key-values earlier.
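A possible shape of such an early check (a sketch only; the function name, `accepted` set and error message are illustrative, not the exact code added in this PR):

def _check_amp_configs(amp_configs):
    # Hypothetical early validation: reject unknown keys before they reach
    # paddle.amp.auto_cast / paddle.amp.GradScaler and fail deep inside.
    accepted = {
        'level', 'init_loss_scaling', 'incr_ratio', 'decr_ratio',
        'incr_every_n_steps', 'decr_every_n_nan_or_inf',
        'use_dynamic_loss_scaling', 'custom_white_list',
        'custom_black_list', 'custom_black_varnames', 'use_fp16_guard'
    }
    unknown = set(amp_configs) - accepted
    if unknown:
        raise ValueError(
            "Unsupported keys in amp_configs: {}".format(sorted(unknown)))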

else:
outputs = self.model.network.forward(
* [to_variable(x) for x in inputs])
scaler = paddle.amp.GradScaler(**self._amp_configs)
Contributor:

Are all key-values in self._amp_configs accepted by GradScaler?

Contributor Author:

Thanks! Done. I added an input check for amp_configs :)
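One way to guarantee this, sketched below: filter amp_configs down to the keyword arguments that paddle.amp.GradScaler's constructor accepts before unpacking (the helper name is hypothetical, not code from this PR):

import paddle

# Keyword arguments accepted by paddle.amp.GradScaler's constructor.
_GRAD_SCALER_KEYS = {
    'init_loss_scaling', 'incr_ratio', 'decr_ratio',
    'incr_every_n_steps', 'decr_every_n_nan_or_inf',
    'use_dynamic_loss_scaling'
}

def make_scaler(amp_configs):
    # Drop keys such as 'level' or the custom lists so they never reach
    # GradScaler(**kwargs).
    scaler_kwargs = {k: v for k, v in amp_configs.items()
                     if k in _GRAD_SCALER_KEYS}
    return paddle.amp.GradScaler(**scaler_kwargs)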

scaler = paddle.amp.GradScaler(**self._amp_configs)
with paddle.amp.auto_cast(
enable=False if self._amp_level == 'O0' else True,
**self._amp_custom_lists):
Contributor:

Similar to the comment above.


self.model._optimizer.minimize(final_loss)
self.model.network.clear_gradients()
if self._amp_level != "O0":
Contributor:

Dygraph does not support O2 yet?

Contributor Author:

When O2 is chosen in dygraph, it would be treated as O1, and users would be warned. Is this correct?
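A minimal sketch of the fallback being described (function name and warning message are illustrative):

import warnings

def _resolve_amp_level(level, in_dygraph):
    # Dygraph had no pure float16 ('O2') path at the time, so 'O2' is
    # downgraded to AMP ('O1') with a warning instead of failing.
    if in_dygraph and level == 'O2':
        warnings.warn("Pure float16 ('O2') is not supported in dynamic "
                      "graph mode; falling back to AMP ('O1').")
        return 'O1'
    return level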

model.prepare(optim,
paddle.nn.CrossEntropyLoss(),
paddle.metric.Accuracy())
paddle.metric.Accuracy(),
amp_configs=amp_configs)
Contributor:

Better to add another example for AMP.

Contributor Author:

Thanks! Done.

'use_dynamic_loss_scaling', 'custom_white_list',
'custom_black_list', and 'custom_black_varnames',
'use_pure_fp16' or 'use_fp16_guard' is only supported in static
mode. Users could refer to low-level API documentations
Contributor:

remove 'low-level'

Contributor Author:

Thanks! Done.

]:
if param_name in self._adapter._amp_configs:
self._adapter._amp_custom_lists[
param_name] = amp_configs.pop(param_name)
Contributor:

Better not to pop from the input dict amp_configs; treat it as constant, since the user-defined input might be reused elsewhere, such as for another AMP model.

Contributor Author:

Thanks! Done.
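One way to address this, as a sketch (the helper name is hypothetical, not the exact change in this PR): copy the dict first so the caller's amp_configs is never mutated and can be reused for another model:

def split_amp_custom_lists(amp_configs):
    # Work on a shallow copy; the user's dict stays untouched.
    configs = dict(amp_configs)
    custom_lists = {}
    for param_name in ('custom_white_list', 'custom_black_list',
                       'custom_black_varnames'):
        if param_name in configs:
            custom_lists[param_name] = configs.pop(param_name)
    return configs, custom_lists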

amp_configs (dict|None): AMP configurations. If AMP or pure
float16 training is used, the key 'level' of 'amp_configs'
should be set to 'O1' or 'O2' respectively. Otherwise, the
value of 'level' defaults to 'O0', which means training without
Contributor:

Maybe it should also accept the strings 'O1' and 'O2' directly to keep things concise, since that is the most common case.

Contributor Author:

Thanks! Done.
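A sketch of how the string form could be normalized internally (function name and messages are illustrative, not the exact code in this PR):

def _normalize_amp_configs(amp_configs):
    # Accept 'O0'/'O1'/'O2' directly and expand them into a dict with a
    # 'level' key, so later code only handles dicts.
    if amp_configs is None:
        return {'level': 'O0'}
    if isinstance(amp_configs, str):
        if amp_configs not in ('O0', 'O1', 'O2'):
            raise ValueError(
                "amp_configs should be 'O0', 'O1' or 'O2' when passed "
                "as a string.")
        return {'level': amp_configs}
    return dict(amp_configs)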

amp_configs (str|dict|None): AMP configurations. If AMP or pure
float16 training is used, the key 'level' of 'amp_configs'
should be set to 'O1' or 'O2' respectively. Otherwise, the
value of 'level' defaults to 'O0', which means training without
Contributor:

Maybe "which means float32 training." is more suitable.

Contributor Author:

Thanks! Done.

@@ -598,6 +620,10 @@ def _compile_and_initialize(self, prog, mode):
startup_prog = self._startup_prog._prune(uninitialized)
self._executor.run(startup_prog)

if self._amp_level == "O2" and mode == 'train' and core.is_compiled_with_cuda(
) and self._nranks <= 1:
Contributor:

If self._nranks > 1, self.model._optimizer.amp_init(place) is also required under "O2".

Contributor Author:

Thanks. Done. I upgraded the usage of the Fleet API and fixed this. Previously, the old Fleet APIs we used did not have an amp_init function, and I had worked around it in the wrong way. Thanks :)

accepted_param_set = {
'init_loss_scaling', 'incr_ratio', 'decr_ratio',
'incr_every_n_steps', 'decr_every_n_nan_or_inf',
'use_dynamic_loss_scaling', 'use_fp16_guard', 'use_pure_fp16'
Contributor:

If the user sets level to 'O2' but sets use_pure_fp16 to False, what will happen? Or sets level to 'O1' but use_pure_fp16 to True?

Contributor Author:

The value of use_pure_fp16 would be ignored; the actual level depends only on what the user passes via amp_configs['level'] (or amp_configs as a string). I will disable the use_pure_fp16 parameter:

if 'use_pure_fp16' in amp_configs:
    raise ValueError(
        "'use_pure_fp16' is an invalid parameter; the level of mixed "
        "precision training only depends on 'O1' or 'O2'.")

@saxon-zh left a comment:

LGTM

@zhiqiu (Contributor) left a comment:

LGTM

@XiaoguangHu01 (Contributor) left a comment:

LG API

@qingqing01 (Contributor) left a comment:

LGTM

@guoshengCS merged commit fabdb43 into PaddlePaddle:develop on Apr 15, 2021