apache · anirudh2290 · Jul 11, 2019 · Jul 9, 2019
@@ -17,7 +17,7 @@
 
 # Using AMP (Automatic Mixed Precision) in MXNet
 
-Training Deep Learning networks is a very computationally intensive task. Novel model architectures tend to have increasing number of layers and parameters, which slows down training. Fortunately, new generations of training hardware as well as software optimizations, make it a feasible task. 
+Training Deep Learning networks is a very computationally intensive task. Novel model architectures tend to have an increasing number of layers and parameters, which slows down training. Fortunately, new generations of training hardware, as well as software optimizations, make it a feasible task.
 
 However, most of the optimization opportunities (in both hardware and software) lie in exploiting lower precision (like FP16) to, for example, utilize Tensor Cores available on new Volta and Turing GPUs. While training in FP16 showed great success in image classification tasks, other more complicated neural networks typically stayed in FP32 due to difficulties in applying the FP16 training guidelines.
 
@@ -253,7 +253,10 @@ We got 60% speed increase from 3 additional lines of code!
 
 ## Inference with AMP
 
-To do inference with mixed precision for a trained model in FP32, you can use the conversion APIs: `amp.convert_model` for symbolic model and `amp.convert_hybrid_block` for gluon models. The conversion APIs will take the FP32 model as input and will return a mixed precision model, which can be used to run inference. Below, we demonstrate for a gluon model and a symbolic model: 1. Conversion from FP32 model to mixed precision model 2. Run inference on the mixed precision model.
+To run inference with mixed precision on a model trained in FP32, you can use the conversion APIs: `amp.convert_model` for symbolic models and `amp.convert_hybrid_block` for gluon models. The conversion APIs take the FP32 model as input and return a mixed precision model, which can be used to run inference.
+Below, we demonstrate the following for a gluon model and a symbolic model:
+- Conversion from FP32 model to mixed precision model.
+- Inference on the mixed precision model.
 
 ```python
 with mx.Context(mx.gpu(0)):
@@ -289,6 +292,48 @@ with mx.Context(mx.gpu(0)):
     print("Conversion and Inference completed successfully")
 ```
 
+You can also customize which operators run in FP16, which run in FP32, and which conditionally run in FP32.
+In addition, you can force cast the params to FP16 wherever possible. Below is an example that demonstrates both of these use cases
+for a symbolic model. You can do the same for a gluon hybrid block with the `amp.convert_hybrid_block` API and its `cast_optional_params` flag.
+
+```python
+with mx.Context(mx.gpu(0)):
+    # Below is an example of converting a symbolic model to a mixed precision model
+    # with only the Convolution op being force cast to FP16.
+    dir_path = os.path.dirname(os.path.realpath(__file__))
+    model_path = os.path.join(dir_path, 'model')
+    if not os.path.isdir(model_path):
+        os.mkdir(model_path)
+    prefix, epoch = mx.test_utils.download_model("imagenet1k-resnet-18", dst_dir=model_path)
+    sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
+
+    # All Convolution ops should run in FP16, SoftmaxOutput and FullyConnected should run in FP32
+    # cast_optional_params=True: Force cast params to FP16 wherever possible
+    result_sym, result_arg_params, result_aux_params = amp.convert_model(sym,
+                                                                         arg_params,
+                                                                         aux_params,
+                                                                         target_dtype_ops=["Convolution"],
+                                                                         fp32_ops=["SoftmaxOutput", "FullyConnected"],
+                                                                         cast_optional_params=True)
+
+    # Run dummy inference with the converted symbolic model
+    mod = mx.mod.Module(result_sym, data_names=["data"], label_names=["softmax_label"], context=mx.current_context())
+    mod.bind(data_shapes=[['data', (1, 3, 224, 224)]], label_shapes=[['softmax_label', (1,)]])
+    mod.set_params(result_arg_params, result_aux_params)
+    mod.forward(mx.io.DataBatch(data=[mx.nd.ones((1, 3, 224, 224))],
+                                label=[mx.nd.ones((1,))]))
+    mod.get_outputs()[0].wait_to_read()
+
+    # Assert that the conv params are in FP16, since cast_optional_params is set to True
+    assert mod._arg_params["conv0_weight"].dtype == np.float16
+    # FullyConnected params stay in FP32
+    assert mod._arg_params["fc1_bias"].dtype == np.float32
+
+    print("Conversion and Inference completed successfully")
+
+    # Serialize AMP model and save to disk
+    mod.save_checkpoint("amp_tutorial_model", 0, remove_amp_cast=False)
+```
 
 
 ## Current limitations of AMP