This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Improve docs for AMP #15455

Merged
merged 1 commit into from
Jul 11, 2019
49 changes: 47 additions & 2 deletions docs/tutorials/amp/amp_tutorial.md
@@ -17,7 +17,7 @@

# Using AMP (Automatic Mixed Precision) in MXNet

Training deep learning networks is a very computationally intensive task. Novel model architectures tend to have an increasing number of layers and parameters, which slows down training. Fortunately, new generations of training hardware, as well as software optimizations, make it a feasible task.

However, most of the optimization opportunities (in both hardware and software) lie in exploiting lower precision (like FP16) to, for example, utilize Tensor Cores available on new Volta and Turing GPUs. While training in FP16 showed great success in image classification tasks, other more complicated neural networks typically stayed in FP32 due to difficulties in applying the FP16 training guidelines.
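
Part of what makes FP16 harder to apply is its narrow range and coarse precision compared with FP32. The following is a minimal NumPy sketch of that numeric behavior only; it does not use MXNet or AMP, and the sample values are arbitrary illustrations, not figures from this tutorial.

```python
import numpy as np

# Basic FP16 limits as reported by NumPy.
fp16_info = np.finfo(np.float16)
fp16_max = float(fp16_info.max)   # largest finite FP16 value
fp16_eps = float(fp16_info.eps)   # gap between 1.0 and the next FP16 value

# Silence the floating-point warnings that the casts below may raise.
with np.errstate(over="ignore", under="ignore"):
    # Overflow: an FP32 value above the FP16 maximum becomes infinity.
    overflowed = np.float32(70000.0).astype(np.float16)
    # Underflow: a small FP32 value (e.g. a tiny gradient) becomes zero.
    underflowed = np.float32(1e-8).astype(np.float16)

print("FP16 max:", fp16_max)
print("FP16 eps:", fp16_eps)
print("70000.0 cast to FP16:", overflowed)
print("1e-8 cast to FP16:", underflowed)
```

Values that overflow or underflow like this are one reason a network cannot always be cast to FP16 wholesale, and why some operators are kept in FP32.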

@@ -253,7 +253,10 @@ We got 60% speed increase from 3 additional lines of code!

## Inference with AMP

To run mixed precision inference with a model trained in FP32, you can use the conversion APIs: `amp.convert_model` for symbolic models and `amp.convert_hybrid_block` for Gluon models. The conversion APIs take the FP32 model as input and return a mixed precision model, which can be used to run inference.
Below, we demonstrate the following for a Gluon model and a symbolic model:
- Converting the FP32 model to a mixed precision model.
- Running inference on the mixed precision model.

```python
with mx.Context(mx.gpu(0)):
    # @@ -289,6 +292,48 @@ (unchanged lines collapsed in the diff view)
    print("Conversion and Inference completed successfully")
```

You can also customize which operators run in FP16, which run in FP32, and which conditionally run in FP32.
You can also force-cast the params to FP16 wherever possible. Below is an example that demonstrates both of these use cases
for a symbolic model. You can do the same for a Gluon hybrid block with the `amp.convert_hybrid_block` API and its `cast_optional_params` flag.

```python
with mx.Context(mx.gpu(0)):
    # Below is an example of converting a symbolic model to a mixed precision model
    # with only Convolution op being force casted to FP16.
    dir_path = os.path.dirname(os.path.realpath(__file__))
    model_path = os.path.join(dir_path, 'model')
    if not os.path.isdir(model_path):
        os.mkdir(model_path)
    prefix, epoch = mx.test_utils.download_model("imagenet1k-resnet-18", dst_dir=model_path)
    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)

    # All Convolution ops should run in FP16, SoftmaxOutput and FullyConnected should run in FP32
    # cast_optional_params=True: Force cast params to FP16 wherever possible
    result_sym, result_arg_params, result_aux_params = amp.convert_model(sym,
                                                                         arg_params,
                                                                         aux_params,
                                                                         target_dtype_ops=["Convolution"],
                                                                         fp32_ops=["SoftmaxOutput", "FullyConnected"],
                                                                         cast_optional_params=True)

    # Run dummy inference with the converted symbolic model
    mod = mx.mod.Module(result_sym, data_names=["data"], label_names=["softmax_label"], context=mx.current_context())
    mod.bind(data_shapes=[['data', (1, 3, 224, 224)]], label_shapes=[['softmax_label', (1,)]])
    mod.set_params(result_arg_params, result_aux_params)
    mod.forward(mx.io.DataBatch(data=[mx.nd.ones((1, 3, 224, 224))],
                                label=[mx.nd.ones((1,))]))
    mod.get_outputs()[0].wait_to_read()

    # Assert that the params for conv are in FP16, this is because cast_optional_params is set to True
    assert mod._arg_params["conv0_weight"].dtype == np.float16
    # FullyConnected params stay in FP32
    assert mod._arg_params["fc1_bias"].dtype == np.float32

    print("Conversion and Inference completed successfully")

    # Serialize AMP model and save to disk
    mod.save_checkpoint("amp_tutorial_model", 0, remove_amp_cast=False)
```
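
To make the two dtype assertions in the example above more concrete, here is a small NumPy-only sketch of what selectively casting params to FP16 looks like. It is an illustration under stated assumptions, not how MXNet implements `cast_optional_params`: the `cast_params` helper is hypothetical, and the array shapes are arbitrary. Only the param names are borrowed from the example above.

```python
import numpy as np

def cast_params(params, fp16_names):
    """Hypothetical helper: cast the named arrays to FP16, leave the rest as-is."""
    return {
        name: (arr.astype(np.float16) if name in fp16_names else arr)
        for name, arr in params.items()
    }

# Toy FP32 params; shapes are arbitrary and chosen only for illustration.
params = {
    "conv0_weight": np.ones((64, 3, 7, 7), dtype=np.float32),
    "fc1_bias": np.zeros((1000,), dtype=np.float32),
}

# Cast only the convolution weight, mirroring the split asserted above.
converted = cast_params(params, fp16_names={"conv0_weight"})

fp32_bytes = params["conv0_weight"].nbytes
fp16_bytes = converted["conv0_weight"].nbytes
print(converted["conv0_weight"].dtype, converted["fc1_bias"].dtype)
print("conv0_weight bytes (FP32 vs FP16):", fp32_bytes, fp16_bytes)
```

The cast param takes half the storage of its FP32 original, while the param left out of the FP16 set keeps its dtype.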


## Current limitations of AMP