Improve docs for AMP (#15455)
anirudh2290 authored Jul 11, 2019
1 parent 6191dd7 commit 7a83883
Showing 1 changed file with 47 additions and 2 deletions.
49 changes: 47 additions & 2 deletions docs/tutorials/amp/amp_tutorial.md
@@ -17,7 +17,7 @@

# Using AMP (Automatic Mixed Precision) in MXNet

Training Deep Learning networks is a very computationally intensive task. Novel model architectures tend to have an increasing number of layers and parameters, which slows down training. Fortunately, new generations of training hardware as well as software optimizations make this a feasible task.

However, most of the optimization opportunities (in both hardware and software) lie in exploiting lower precision (like FP16), for example to utilize the Tensor Cores available on new Volta and Turing GPUs. While training in FP16 showed great success in image classification tasks, other, more complicated neural networks typically stayed in FP32 due to difficulties in applying the FP16 training guidelines.

@@ -253,7 +253,10 @@ We got a 60% speed increase from 3 additional lines of code!

## Inference with AMP

To do inference with mixed precision for a model trained in FP32, you can use the conversion APIs: `amp.convert_model` for symbolic models and `amp.convert_hybrid_block` for Gluon models. The conversion APIs take the FP32 model as input and return a mixed precision model, which can be used to run inference.
Below, we demonstrate the following for a Gluon model and a symbolic model:
- Conversion from an FP32 model to a mixed precision model.
- Running inference with the mixed precision model.

@@ -289,6 +292,48 @@

```python
with mx.Context(mx.gpu(0)):
    # ... (middle of this code block collapsed in the diff view) ...
    print("Conversion and Inference completed successfully")
```
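
The diff view above collapses the body of the tutorial's Gluon and symbolic inference example. As a rough sketch of the Gluon path only, conversion with `amp.convert_hybrid_block` can look like the following; the `resnet18_v1` model from `mxnet.gluon.model_zoo`, the input shape, and the forward pass before conversion are illustrative assumptions, not the tutorial's exact code.

```python
import mxnet as mx
from mxnet.contrib import amp
from mxnet.gluon.model_zoo import vision

with mx.Context(mx.gpu(0)):
    # Load a pretrained FP32 model onto the GPU (illustrative choice of network)
    net = vision.resnet18_v1(pretrained=True, ctx=mx.gpu(0))
    net.hybridize()
    # convert_hybrid_block works on the cached graph, so run one forward pass first
    net(mx.nd.ones((1, 3, 224, 224)))

    # Convert the FP32 HybridBlock into a mixed precision block
    net_fp16 = amp.convert_hybrid_block(net)

    # Run inference with the converted model; inputs can stay in FP32
    out = net_fp16(mx.nd.ones((1, 3, 224, 224)))
    out.wait_to_read()
    print("Gluon conversion and inference completed successfully")
```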

You can also customize which operators run in FP16 and which run in FP32, or have some operators conditionally run in FP32.
You can also force-cast the params to FP16 wherever possible. Below is an example that demonstrates both of these use cases
for a symbolic model. You can do the same for a Gluon hybrid block with the `amp.convert_hybrid_block` API and its `cast_optional_params` flag (a sketch of the Gluon variant follows the symbolic example below).

```python
with mx.Context(mx.gpu(0)):
    # Below is an example of converting a symbolic model to a mixed precision model
    # with only the Convolution op being force-cast to FP16.
    dir_path = os.path.dirname(os.path.realpath(__file__))
    model_path = os.path.join(dir_path, 'model')
    if not os.path.isdir(model_path):
        os.mkdir(model_path)
    prefix, epoch = mx.test_utils.download_model("imagenet1k-resnet-18", dst_dir=model_path)
    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)

    # All Convolution ops should run in FP16, SoftmaxOutput and FullyConnected should run in FP32
    # cast_optional_params=True: force-cast params to FP16 wherever possible
    result_sym, result_arg_params, result_aux_params = amp.convert_model(sym,
                                                                         arg_params,
                                                                         aux_params,
                                                                         target_dtype_ops=["Convolution"],
                                                                         fp32_ops=["SoftmaxOutput", "FullyConnected"],
                                                                         cast_optional_params=True)

    # Run dummy inference with the converted symbolic model
    mod = mx.mod.Module(result_sym, data_names=["data"], label_names=["softmax_label"], context=mx.current_context())
    mod.bind(data_shapes=[['data', (1, 3, 224, 224)]], label_shapes=[['softmax_label', (1,)]])
    mod.set_params(result_arg_params, result_aux_params)
    mod.forward(mx.io.DataBatch(data=[mx.nd.ones((1, 3, 224, 224))],
                                label=[mx.nd.ones((1,))]))
    mod.get_outputs()[0].wait_to_read()

    # Assert that the Convolution params are in FP16 because cast_optional_params is set to True
    assert mod._arg_params["conv0_weight"].dtype == np.float16
    # FullyConnected params stay in FP32
    assert mod._arg_params["fc1_bias"].dtype == np.float32

    print("Conversion and Inference completed successfully")

    # Serialize the AMP model and save it to disk
    mod.save_checkpoint("amp_tutorial_model", 0, remove_amp_cast=False)
```
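
For the Gluon hybrid block mentioned above, a minimal sketch of the same customization, reusing the illustrative `resnet18_v1` setup from the earlier sketch, might look like this; `target_dtype_ops`, `fp32_ops`, and `cast_optional_params` are the analogous arguments on `amp.convert_hybrid_block`. `fp32_ops` only lists FullyConnected here because this Gluon model contains no SoftmaxOutput op.

```python
import mxnet as mx
from mxnet.contrib import amp
from mxnet.gluon.model_zoo import vision

with mx.Context(mx.gpu(0)):
    net = vision.resnet18_v1(pretrained=True, ctx=mx.gpu(0))
    net.hybridize()
    net(mx.nd.ones((1, 3, 224, 224)))  # populate the cached graph before conversion

    # Only Convolution ops are targeted for FP16, FullyConnected stays in FP32,
    # and params are force-cast to FP16 wherever possible
    net_fp16 = amp.convert_hybrid_block(net,
                                        target_dtype_ops=["Convolution"],
                                        fp32_ops=["FullyConnected"],
                                        cast_optional_params=True)

    out = net_fp16(mx.nd.ones((1, 3, 224, 224)))
    out.wait_to_read()
```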


## Current limitations of AMP
