From 3d317f8fd82a88f1562c65ade735065455927c28 Mon Sep 17 00:00:00 2001
From: zha0q1
Date: Fri, 28 Jun 2019 13:55:18 -0700
Subject: [PATCH 01/14] update profiler tutorial

---
 docs/tutorials/python/profiler.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/docs/tutorials/python/profiler.md b/docs/tutorials/python/profiler.md
index 9eed452c2e27..da2582bdba08 100644
--- a/docs/tutorials/python/profiler.md
+++ b/docs/tutorials/python/profiler.md
@@ -206,6 +206,15 @@ Let's zoom in to check the time taken by operators
 
 The above picture visualizes the sequence in which the operators were executed and the time taken by each operator.
 
+### Profiling Custom Operators
+Should the existing NDArray operators fail to meet all your model's needs, MXNet supports [`custom operators`](https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/customop.html) that you can define in python. In forward() and backward()of a custom operator, there are two kinds of code: `pure python` code (Numpy operators inclued) and `sub-operators` (NDArray operators called within foward() and backward()). With that said, MXNet can profile the execution time of both kinds without additional setup. More specifically, the MXNet profiler will break a single custom operator call into a `pure python` event and several `sub-operator` events if there is any. Furthermore, all those events will have a prefix in their names, which is conviniently the name of the custom operator you called.
+
+![Custom Operator Profiling Screenshot](https://cwiki.apache.org/confluence/download/attachments/118172065/image2019-6-14_15-23-42.png?version=1&modificationDate=1560551022000&api=v2)
+
+As shown by the sreenshot, in the `Custom Operator` domain where all the custom-operator-related events fall into, you can easily visualize the execution time of each segment of your custom operator.
+
+Please note that: to be able to see the above-dscribed information, you need to set `profile_imperative` to `True` even when you are using custom operators in [`symbolic mode`](https://mxnet.incubator.apache.org/versions/master/tutorials/basic/symbol.html). The reason is that within custom operators, `pure python code` and `sub-operators` are still called imperatively.
+
 ## Advanced: Using NVIDIA Profiling Tools
 
 MXNet's Profiler is the recommended starting point for profiling MXNet code, but NVIDIA also provides a couple of tools for low-level profiling of CUDA code: [NVProf](https://devblogs.nvidia.com/cuda-pro-tip-nvprof-your-handy-universal-gpu-profiler/), [Visual Profiler](https://developer.nvidia.com/nvidia-visual-profiler) and [Nsight Compute](https://developer.nvidia.com/nsight-compute). You can use these tools to profile all kinds of executables, so they can be used for profiling Python scripts running MXNet. And you can use these in conjunction with the MXNet Profiler to see high-level information from MXNet alongside the low-level CUDA kernel information.

From 8b23a03d9ea24411a365302b4c5a6ef2d127f0f9 Mon Sep 17 00:00:00 2001
From: Zhaoqi Zhu
Date: Fri, 28 Jun 2019 14:01:56 -0700
Subject: [PATCH 02/14] Update profiler.md

---
 docs/tutorials/python/profiler.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/tutorials/python/profiler.md b/docs/tutorials/python/profiler.md
index da2582bdba08..dbc3a3c8d808 100644
--- a/docs/tutorials/python/profiler.md
+++ b/docs/tutorials/python/profiler.md
@@ -207,13 +207,13 @@ Let's zoom in to check the time taken by operators
 The above picture visualizes the sequence in which the operators were executed and the time taken by each operator.
 
 ### Profiling Custom Operators
-Should the existing NDArray operators fail to meet all your model's needs, MXNet supports [`custom operators`](https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/customop.html) that you can define in python. In forward() and backward()of a custom operator, there are two kinds of code: `pure python` code (Numpy operators inclued) and `sub-operators` (NDArray operators called within foward() and backward()). With that said, MXNet can profile the execution time of both kinds without additional setup. More specifically, the MXNet profiler will break a single custom operator call into a `pure python` event and several `sub-operator` events if there is any. Furthermore, all those events will have a prefix in their names, which is conviniently the name of the custom operator you called.
+Should the existing NDArray operators fail to meet all your model's needs, MXNet supports [Custom Operators](https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/customop.html) that you can define in python. In forward() and backward()of a custom operator, there are two kinds of code: `pure python` code (Numpy operators inclued) and `sub-operators` (NDArray operators called within foward() and backward()). With that said, MXNet can profile the execution time of both kinds without additional setup. More specifically, the MXNet profiler will break a single custom operator call into a `pure python` event and several `sub-operator` events if there is any. Furthermore, all those events will have a prefix in their names, which is conviniently the name of the custom operator you called.
 
 ![Custom Operator Profiling Screenshot](https://cwiki.apache.org/confluence/download/attachments/118172065/image2019-6-14_15-23-42.png?version=1&modificationDate=1560551022000&api=v2)
 
-As shown by the sreenshot, in the `Custom Operator` domain where all the custom-operator-related events fall into, you can easily visualize the execution time of each segment of your custom operator.
+As shown by the sreenshot, in the `Custom Operator` domain where all the custom-operator-related events fall into, you can easily visualize the execution time of each segment of your custom operator. Forexample, we know that "CustomAddTwo::sqrt" is a sub operator of custom operator "CustomAddTwo", and we also know when it is exectued accurately.
 
-Please note that: to be able to see the above-dscribed information, you need to set `profile_imperative` to `True` even when you are using custom operators in [`symbolic mode`](https://mxnet.incubator.apache.org/versions/master/tutorials/basic/symbol.html). The reason is that within custom operators, `pure python code` and `sub-operators` are still called imperatively.
+Please note that: to be able to see the above-dscribed information, you need to set `profile_imperative` to `True` even when you are using custom operators in [symbolic mode](https://mxnet.incubator.apache.org/versions/master/tutorials/basic/symbol.html). The reason is that within custom operators, `pure python code` and `sub-operators` are still called imperatively.
 
 ## Advanced: Using NVIDIA Profiling Tools
 

From a2e5d9f7ddd024b66080b9ab7b6f12da7d43d883 Mon Sep 17 00:00:00 2001
From: Zhaoqi Zhu
Date: Fri, 28 Jun 2019 14:02:33 -0700
Subject: [PATCH 03/14] Update profiler.md

---
 docs/tutorials/python/profiler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/tutorials/python/profiler.md b/docs/tutorials/python/profiler.md
index dbc3a3c8d808..875c25b5fc59 100644
--- a/docs/tutorials/python/profiler.md
+++ b/docs/tutorials/python/profiler.md
@@ -207,7 +207,7 @@ Let's zoom in to check the time taken by operators
 The above picture visualizes the sequence in which the operators were executed and the time taken by each operator.
 
 ### Profiling Custom Operators
-Should the existing NDArray operators fail to meet all your model's needs, MXNet supports [Custom Operators](https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/customop.html) that you can define in python. In forward() and backward()of a custom operator, there are two kinds of code: `pure python` code (Numpy operators inclued) and `sub-operators` (NDArray operators called within foward() and backward()). With that said, MXNet can profile the execution time of both kinds without additional setup. More specifically, the MXNet profiler will break a single custom operator call into a `pure python` event and several `sub-operator` events if there is any. Furthermore, all those events will have a prefix in their names, which is conviniently the name of the custom operator you called.
+Should the existing NDArray operators fail to meet all your model's needs, MXNet supports [Custom Operators](https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/customop.html) that you can define in python. In forward() and backward() of a custom operator, there are two kinds of code: `pure python` code (Numpy operators inclued) and `sub-operators` (NDArray operators called within foward() and backward()). With that said, MXNet can profile the execution time of both kinds without additional setup. More specifically, the MXNet profiler will break a single custom operator call into a `pure python` event and several `sub-operator` events if there is any. Furthermore, all those events will have a prefix in their names, which is conviniently the name of the custom operator you called.
 
 ![Custom Operator Profiling Screenshot](https://cwiki.apache.org/confluence/download/attachments/118172065/image2019-6-14_15-23-42.png?version=1&modificationDate=1560551022000&api=v2)
 

From 44d8017fa8b305a0ef8b314e2447420d8ffb6ba4 Mon Sep 17 00:00:00 2001
From: Zhaoqi Zhu
Date: Fri, 28 Jun 2019 14:10:48 -0700
Subject: [PATCH 04/14] Update profiler.md

---
 docs/tutorials/python/profiler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/tutorials/python/profiler.md b/docs/tutorials/python/profiler.md
index 875c25b5fc59..70a272e4083f 100644
--- a/docs/tutorials/python/profiler.md
+++ b/docs/tutorials/python/profiler.md
@@ -211,7 +211,7 @@ Should the existing NDArray operators fail to meet all your model's needs, MXNet
 
 ![Custom Operator Profiling Screenshot](https://cwiki.apache.org/confluence/download/attachments/118172065/image2019-6-14_15-23-42.png?version=1&modificationDate=1560551022000&api=v2)
 
-As shown by the sreenshot, in the `Custom Operator` domain where all the custom-operator-related events fall into, you can easily visualize the execution time of each segment of your custom operator. Forexample, we know that "CustomAddTwo::sqrt" is a sub operator of custom operator "CustomAddTwo", and we also know when it is exectued accurately.
+As shown by the sreenshot, in the `Custom Operator` domain where all the custom-operator-related events fall into, you can easily visualize the execution time of each segment of your custom operator. For example, we know that "CustomAddTwo::sqrt" is a `sub-operator` of custom operator "CustomAddTwo", and we also know when it is exectued accurately.
 
 Please note that: to be able to see the above-dscribed information, you need to set `profile_imperative` to `True` even when you are using custom operators in [symbolic mode](https://mxnet.incubator.apache.org/versions/master/tutorials/basic/symbol.html). The reason is that within custom operators, `pure python code` and `sub-operators` are still called imperatively.
 

From 59d99755d84f9664c97f029eb4c383c445dedbad Mon Sep 17 00:00:00 2001
From: Zhaoqi Zhu
Date: Fri, 28 Jun 2019 16:08:55 -0700
Subject: [PATCH 05/14] Update docs/tutorials/python/profiler.md

Co-Authored-By: Aaron Markham
---
 docs/tutorials/python/profiler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/tutorials/python/profiler.md b/docs/tutorials/python/profiler.md
index 70a272e4083f..d56520a5425a 100644
--- a/docs/tutorials/python/profiler.md
+++ b/docs/tutorials/python/profiler.md
@@ -207,7 +207,7 @@ Let's zoom in to check the time taken by operators
 The above picture visualizes the sequence in which the operators were executed and the time taken by each operator.
 
 ### Profiling Custom Operators
-Should the existing NDArray operators fail to meet all your model's needs, MXNet supports [Custom Operators](https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/customop.html) that you can define in python. In forward() and backward() of a custom operator, there are two kinds of code: `pure python` code (Numpy operators inclued) and `sub-operators` (NDArray operators called within foward() and backward()). With that said, MXNet can profile the execution time of both kinds without additional setup. More specifically, the MXNet profiler will break a single custom operator call into a `pure python` event and several `sub-operator` events if there is any. Furthermore, all those events will have a prefix in their names, which is conviniently the name of the custom operator you called.
+Should the existing NDArray operators fail to meet all your model's needs, MXNet supports [Custom Operators](https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/customop.html) that you can define in Python. In `forward()` and `backward()` of a custom operator, there are two kinds of code: "pure Python" code (NumPy operators included) and "sub-operators" (NDArray operators called within `forward()` and `backward()`). With that said, MXNet can profile the execution time of both kinds without additional setup. Specifically, the MXNet profiler will break a single custom operator call into a pure Python event and several sub-operator events if there are any. Furthermore, all of those events will have a prefix in their names, which is, conveniently, the name of the custom operator you called.
 
 ![Custom Operator Profiling Screenshot](https://cwiki.apache.org/confluence/download/attachments/118172065/image2019-6-14_15-23-42.png?version=1&modificationDate=1560551022000&api=v2)
 

From 4dc7b66dec0a9eace3cfd743095b830778b19214 Mon Sep 17 00:00:00 2001
From: Zhaoqi Zhu
Date: Fri, 28 Jun 2019 16:09:57 -0700
Subject: [PATCH 06/14] Update docs/tutorials/python/profiler.md

Co-Authored-By: Aaron Markham
---
 docs/tutorials/python/profiler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/tutorials/python/profiler.md b/docs/tutorials/python/profiler.md
index d56520a5425a..e1d5d0f90ab7 100644
--- a/docs/tutorials/python/profiler.md
+++ b/docs/tutorials/python/profiler.md
@@ -211,7 +211,7 @@ Should the existing NDArray operators fail to meet all your model's needs, MXNet
 
 ![Custom Operator Profiling Screenshot](https://cwiki.apache.org/confluence/download/attachments/118172065/image2019-6-14_15-23-42.png?version=1&modificationDate=1560551022000&api=v2)
 
-As shown by the sreenshot, in the `Custom Operator` domain where all the custom-operator-related events fall into, you can easily visualize the execution time of each segment of your custom operator. For example, we know that "CustomAddTwo::sqrt" is a `sub-operator` of custom operator "CustomAddTwo", and we also know when it is exectued accurately.
+As shown by the screenshot, in the **Custom Operator** domain where all the custom operator-related events fall into, you can easily visualize the execution time of each segment of your custom operator. For example, we know that `CustomAddTwo::sqrt` is a sub-operator of custom operator `CustomAddTwo`, and we also know when it is executed accurately.
 
 Please note that: to be able to see the above-dscribed information, you need to set `profile_imperative` to `True` even when you are using custom operators in [symbolic mode](https://mxnet.incubator.apache.org/versions/master/tutorials/basic/symbol.html). The reason is that within custom operators, `pure python code` and `sub-operators` are still called imperatively.
 

From c548bb0b4d785a13ab905a58cdb981c499d0d6ed Mon Sep 17 00:00:00 2001
From: Zhaoqi Zhu
Date: Fri, 28 Jun 2019 16:10:03 -0700
Subject: [PATCH 07/14] Update docs/tutorials/python/profiler.md

Co-Authored-By: Aaron Markham
---
 docs/tutorials/python/profiler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/tutorials/python/profiler.md b/docs/tutorials/python/profiler.md
index e1d5d0f90ab7..a036819428cf 100644
--- a/docs/tutorials/python/profiler.md
+++ b/docs/tutorials/python/profiler.md
@@ -213,7 +213,7 @@ Should the existing NDArray operators fail to meet all your model's needs, MXNet
 
 As shown by the screenshot, in the **Custom Operator** domain where all the custom operator-related events fall into, you can easily visualize the execution time of each segment of your custom operator. For example, we know that `CustomAddTwo::sqrt` is a sub-operator of custom operator `CustomAddTwo`, and we also know when it is executed accurately.
 
-Please note that: to be able to see the above-dscribed information, you need to set `profile_imperative` to `True` even when you are using custom operators in [symbolic mode](https://mxnet.incubator.apache.org/versions/master/tutorials/basic/symbol.html). The reason is that within custom operators, `pure python code` and `sub-operators` are still called imperatively.
+Please note that: to be able to see the previously described information, you need to set `profile_imperative` to `True` even when you are using custom operators in [symbolic mode](https://mxnet.incubator.apache.org/versions/master/tutorials/basic/symbol.html). The reason is that within custom operators, pure python code and sub-operators are still called imperatively.
 
 ## Advanced: Using NVIDIA Profiling Tools
 

From da7d782cbaced46aed484d6f0ea825a15941c0fb Mon Sep 17 00:00:00 2001
From: Zhaoqi Zhu
Date: Sat, 29 Jun 2019 20:25:21 -0700
Subject: [PATCH 08/14] Update profiler.md

change image url to dmlc and add a code example
---
 docs/tutorials/python/profiler.md | 48 +++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/docs/tutorials/python/profiler.md b/docs/tutorials/python/profiler.md
index a036819428cf..6411753765c8 100644
--- a/docs/tutorials/python/profiler.md
+++ b/docs/tutorials/python/profiler.md
@@ -209,9 +209,53 @@ The above picture visualizes the sequence in which the operators were executed a
 ### Profiling Custom Operators
 Should the existing NDArray operators fail to meet all your model's needs, MXNet supports [Custom Operators](https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/customop.html) that you can define in Python. In `forward()` and `backward()` of a custom operator, there are two kinds of code: "pure Python" code (NumPy operators included) and "sub-operators" (NDArray operators called within `forward()` and `backward()`). With that said, MXNet can profile the execution time of both kinds without additional setup. Specifically, the MXNet profiler will break a single custom operator call into a pure Python event and several sub-operator events if there are any. Furthermore, all of those events will have a prefix in their names, which is, conveniently, the name of the custom operator you called.
 
-![Custom Operator Profiling Screenshot](https://cwiki.apache.org/confluence/download/attachments/118172065/image2019-6-14_15-23-42.png?version=1&modificationDate=1560551022000&api=v2)
+Let's try profiling custom operators with the following code example:
+'''python
+import mxnet as mx
+from mxnet import nd
+from mxnet import profiler
+
+class MyAddOne(mx.operator.CustomOp):
+    def forward(self, is_train, req, in_data, out_data, aux):
+        self.assign(out_data[0], req[0], in_data[0]+1)
+
+    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
+        self.assign(in_grad[0], req[0], out_grad[0])
+
+@mx.operator.register('MyAddOne')
+class CustomAddOneProp(mx.operator.CustomOpProp):
+    def __init__(self):
+        super(CustomAddOneProp, self).__init__(need_top_grad=True)
+
+    def list_arguments(self):
+        return ['data']
+
+    def list_outputs(self):
+        return ['output']
+
+    def infer_shape(self, in_shape):
+        return [in_shape[0]], [in_shape[0]], []
+
+    def create_operator(self, ctx, shapes, dtypes):
+        return MyAddOne()
+
+
+inp = mx.nd.zeros(shape=(500, 500))
+
+profiler.set_config(profile_all=True, continuous_dump = True)
+profiler.set_state('run')
+
+w = nd.Custom(inp, op_type="MyAddOne")
+
+mx.nd.waitall()
+
+profiler.set_state('stop')
+profiler.dump()
 
-As shown by the screenshot, in the **Custom Operator** domain where all the custom operator-related events fall into, you can easily visualize the execution time of each segment of your custom operator. For example, we know that `CustomAddTwo::sqrt` is a sub-operator of custom operator `CustomAddTwo`, and we also know when it is executed accurately.
+'''
+Here, we have created a custom operator called `MyAddOne`, and within its `foward()` function, we simply add one to the input. We can visualize the dump file in `chrome://tracing/`:
+![Custom Operator Profiling Screenshot](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tutorials/python/profiler/profiler_output_custom_operator_chrome.png.png)
+As shown by the screenshot, in the **Custom Operator** domain where all the custom operator-related events fall into, we can easily visualize the execution time of each segment of `MyAddOne`. We can tell that `MyAddOne::pure_python` is executed first. We also know that `CopyCPU2CPU` and `_plus_scalr` are two "sub-operators" of `MyAddOne` and the sequence in which they are exectued.
 
 Please note that: to be able to see the previously described information, you need to set `profile_imperative` to `True` even when you are using custom operators in [symbolic mode](https://mxnet.incubator.apache.org/versions/master/tutorials/basic/symbol.html). The reason is that within custom operators, pure python code and sub-operators are still called imperatively.
 

From 1aa75cd699e20b327054839d56a00af551099685 Mon Sep 17 00:00:00 2001
From: Zhaoqi Zhu
Date: Sat, 29 Jun 2019 20:26:48 -0700
Subject: [PATCH 09/14] Update profiler.md

---
 docs/tutorials/python/profiler.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/tutorials/python/profiler.md b/docs/tutorials/python/profiler.md
index 6411753765c8..707ec7a8e04c 100644
--- a/docs/tutorials/python/profiler.md
+++ b/docs/tutorials/python/profiler.md
@@ -210,6 +210,7 @@ The above picture visualizes the sequence in which the operators were executed a
 Should the existing NDArray operators fail to meet all your model's needs, MXNet supports [Custom Operators](https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/customop.html) that you can define in Python. In `forward()` and `backward()` of a custom operator, there are two kinds of code: "pure Python" code (NumPy operators included) and "sub-operators" (NDArray operators called within `forward()` and `backward()`). With that said, MXNet can profile the execution time of both kinds without additional setup. Specifically, the MXNet profiler will break a single custom operator call into a pure Python event and several sub-operator events if there are any. Furthermore, all of those events will have a prefix in their names, which is, conveniently, the name of the custom operator you called.
 
 Let's try profiling custom operators with the following code example:
+
 '''python
 import mxnet as mx
 from mxnet import nd
 from mxnet import profiler
@@ -253,8 +254,11 @@ profiler.set_state('stop')
 profiler.dump()
 
 '''
+
 Here, we have created a custom operator called `MyAddOne`, and within its `foward()` function, we simply add one to the input. We can visualize the dump file in `chrome://tracing/`:
+
 ![Custom Operator Profiling Screenshot](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tutorials/python/profiler/profiler_output_custom_operator_chrome.png.png)
+
 As shown by the screenshot, in the **Custom Operator** domain where all the custom operator-related events fall into, we can easily visualize the execution time of each segment of `MyAddOne`. We can tell that `MyAddOne::pure_python` is executed first. We also know that `CopyCPU2CPU` and `_plus_scalr` are two "sub-operators" of `MyAddOne` and the sequence in which they are exectued.
 
 Please note that: to be able to see the previously described information, you need to set `profile_imperative` to `True` even when you are using custom operators in [symbolic mode](https://mxnet.incubator.apache.org/versions/master/tutorials/basic/symbol.html). The reason is that within custom operators, pure python code and sub-operators are still called imperatively.

From f5ff2e10526b7ae8e302af548d3212ece85a383f Mon Sep 17 00:00:00 2001
From: Zhaoqi Zhu
Date: Sat, 29 Jun 2019 20:28:11 -0700
Subject: [PATCH 10/14] Update profiler.md

---
 docs/tutorials/python/profiler.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/tutorials/python/profiler.md b/docs/tutorials/python/profiler.md
index 707ec7a8e04c..f835d16f7a23 100644
--- a/docs/tutorials/python/profiler.md
+++ b/docs/tutorials/python/profiler.md
@@ -211,7 +211,8 @@ Should the existing NDArray operators fail to meet all your model's needs, MXNet
 
 Let's try profiling custom operators with the following code example:
 
-'''python
+```python
+
 import mxnet as mx
 from mxnet import nd
 from mxnet import profiler
@@ -252,8 +253,7 @@ mx.nd.waitall()
 
 profiler.set_state('stop')
 profiler.dump()
-
-'''
+```
 
 Here, we have created a custom operator called `MyAddOne`, and within its `foward()` function, we simply add one to the input. We can visualize the dump file in `chrome://tracing/`:
 

From fcb5facb1bb678e0b56d767f12536aaa72b1c846 Mon Sep 17 00:00:00 2001
From: Zhaoqi Zhu
Date: Sat, 29 Jun 2019 20:29:31 -0700
Subject: [PATCH 11/14] Update profiler.md

---
 docs/tutorials/python/profiler.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/tutorials/python/profiler.md b/docs/tutorials/python/profiler.md
index f835d16f7a23..6e792a6cc333 100644
--- a/docs/tutorials/python/profiler.md
+++ b/docs/tutorials/python/profiler.md
@@ -257,7 +257,7 @@ profiler.dump()
 
 Here, we have created a custom operator called `MyAddOne`, and within its `foward()` function, we simply add one to the input. We can visualize the dump file in `chrome://tracing/`:
 
-![Custom Operator Profiling Screenshot](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tutorials/python/profiler/profiler_output_custom_operator_chrome.png.png)
+![Custom Operator Profiling Screenshot](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tutorials/python/profiler/profiler_output_custom_operator_chrome.png)
 
 As shown by the screenshot, in the **Custom Operator** domain where all the custom operator-related events fall into, we can easily visualize the execution time of each segment of `MyAddOne`. We can tell that `MyAddOne::pure_python` is executed first. We also know that `CopyCPU2CPU` and `_plus_scalr` are two "sub-operators" of `MyAddOne` and the sequence in which they are exectued.
 

From 7f2c9db39bec0d9d60e3f62291db9dc159907243 Mon Sep 17 00:00:00 2001
From: Zhaoqi Zhu
Date: Mon, 1 Jul 2019 17:41:14 -0700
Subject: [PATCH 12/14] Update profiler.md

---
 docs/tutorials/python/profiler.md | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/docs/tutorials/python/profiler.md b/docs/tutorials/python/profiler.md
index 6e792a6cc333..8c98f880b426 100644
--- a/docs/tutorials/python/profiler.md
+++ b/docs/tutorials/python/profiler.md
@@ -255,13 +255,30 @@ profiler.set_state('stop')
 profiler.dump()
 ```
 
-Here, we have created a custom operator called `MyAddOne`, and within its `foward()` function, we simply add one to the input. We can visualize the dump file in `chrome://tracing/`:
+Here, we have created a custom operator called `MyAddOne`, and within its `forward()` function, we simply add one to the input. We can visualize the dump file in `chrome://tracing/`:
 
 ![Custom Operator Profiling Screenshot](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tutorials/python/profiler/profiler_output_custom_operator_chrome.png)
 
-As shown by the screenshot, in the **Custom Operator** domain where all the custom operator-related events fall into, we can easily visualize the execution time of each segment of `MyAddOne`. We can tell that `MyAddOne::pure_python` is executed first. We also know that `CopyCPU2CPU` and `_plus_scalr` are two "sub-operators" of `MyAddOne` and the sequence in which they are exectued.
+As shown by the screenshot, in the **Custom Operator** domain where all the custom operator-related events fall into, we can easily visualize the execution time of each segment of `MyAddOne`. We can tell that `MyAddOne::pure_python` is executed first. We also know that `CopyCPU2CPU` and `_plus_scalr` are two "sub-operators" of `MyAddOne` and the sequence in which they are executed.
 
-Please note that: to be able to see the previously described information, you need to set `profile_imperative` to `True` even when you are using custom operators in [symbolic mode](https://mxnet.incubator.apache.org/versions/master/tutorials/basic/symbol.html). The reason is that within custom operators, pure python code and sub-operators are still called imperatively.
+Please note that: to be able to see the previously described information, you need to set `profile_imperative` to `True` even when you are using custom operators in [symbolic mode](https://mxnet.incubator.apache.org/versions/master/tutorials/basic/symbol.html) (refer to the code snippet below, which is the symbolic-mode equivelent of the code example above). The reason is that within custom operators, pure python code and sub-operators are still called imperatively.
+```python +# Set profile_all to True +profiler.set_config(profile_all=True, aggregate_stats=True, continuous_dump = True) +# OR, Explicitly Set profile_symbolic and profile_imperative to True +profiler.set_config(profile_symbolic = False, profile_imperative = False, \ + aggregate_stats=True, continuous_dump = True) + +profiler.set_state('run') +# Use Symbolic Mode +a = mx.symbol.Variable('a') +b = mx.symbol.Custom(data=a, op_type='MyAddOne') +c = b.bind(mx.cpu(), {'a': inp}) +y = c.forward() +mx.nd.waitall() +profiler.set_state('stop') +profiler.dump() +``` ## Advanced: Using NVIDIA Profiling Tools From aec603946a7898f4a567ebe3e765871df7961518 Mon Sep 17 00:00:00 2001 From: Zhaoqi Zhu Date: Mon, 1 Jul 2019 22:26:15 -0700 Subject: [PATCH 13/14] Re-trigger build --- docs/tutorials/python/profiler.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/tutorials/python/profiler.md b/docs/tutorials/python/profiler.md index 8c98f880b426..b6b5efff766e 100644 --- a/docs/tutorials/python/profiler.md +++ b/docs/tutorials/python/profiler.md @@ -262,6 +262,7 @@ Here, we have created a custom operator called `MyAddOne`, and within its `forwa As shown by the screenshot, in the **Custom Operator** domain where all the custom operator-related events fall into, we can easily visualize the execution time of each segment of `MyAddOne`. We can tell that `MyAddOne::pure_python` is executed first. We also know that `CopyCPU2CPU` and `_plus_scalr` are two "sub-operators" of `MyAddOne` and the sequence in which they are executed. Please note that: to be able to see the previously described information, you need to set `profile_imperative` to `True` even when you are using custom operators in [symbolic mode](https://mxnet.incubator.apache.org/versions/master/tutorials/basic/symbol.html) (refer to the code snippet below, which is the symbolic-mode equivelent of the code example above). 
The reason is that within custom operators, pure python code and sub-operators are still called imperatively. + ```python # Set profile_all to True profiler.set_config(profile_all=True, aggregate_stats=True, continuous_dump = True) From 9ef94be830c18ced87f61028aea24e4fca7eb52a Mon Sep 17 00:00:00 2001 From: Zhaoqi Zhu Date: Tue, 2 Jul 2019 10:25:04 -0700 Subject: [PATCH 14/14] Update profiler.md --- docs/tutorials/python/profiler.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/tutorials/python/profiler.md b/docs/tutorials/python/profiler.md index b6b5efff766e..f2da62833fcf 100644 --- a/docs/tutorials/python/profiler.md +++ b/docs/tutorials/python/profiler.md @@ -267,7 +267,7 @@ Please note that: to be able to see the previously described information, you ne # Set profile_all to True profiler.set_config(profile_all=True, aggregate_stats=True, continuous_dump = True) # OR, Explicitly Set profile_symbolic and profile_imperative to True -profiler.set_config(profile_symbolic = False, profile_imperative = False, \ +profiler.set_config(profile_symbolic = True, profile_imperative = True, \ aggregate_stats=True, continuous_dump = True) profiler.set_state('run')
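
The tutorial text added in this series says that every event belonging to a custom operator carries the operator's name as a prefix (for example `MyAddOne::pure_python`). That convention makes the dump easy to post-process outside `chrome://tracing/`. The following is only a minimal sketch, not part of the patches. It assumes the dump is JSON in the Chrome Trace Event layout, with a top-level `traceEvents` list whose entries use `"ph": "B"` / `"ph": "E"` begin/end pairs and a numeric `ts` field. The helper name `custom_op_durations` and the sample trace are hypothetical.

```python
import json
from collections import defaultdict


def custom_op_durations(trace, op_name):
    """Sum begin/end durations per event name for one custom operator."""
    prefix = op_name + "::"          # prefix convention described in the tutorial text
    begin_stack = defaultdict(list)  # event name -> timestamps of unmatched "B" events
    totals = defaultdict(float)      # event name -> accumulated duration
    for event in trace.get("traceEvents", []):
        name = event.get("name", "")
        if not name.startswith(prefix):
            continue  # skip events that do not belong to this custom operator
        phase = event.get("ph")
        if phase == "B":
            begin_stack[name].append(event["ts"])
        elif phase == "E" and begin_stack[name]:
            totals[name] += event["ts"] - begin_stack[name].pop()
    return dict(totals)


# Hypothetical trace for illustration; real event sets and timestamps will differ.
sample_trace = json.loads("""
{"traceEvents": [
  {"name": "zeros", "ph": "B", "ts": 10},
  {"name": "zeros", "ph": "E", "ts": 20},
  {"name": "MyAddOne::pure_python", "ph": "B", "ts": 100},
  {"name": "MyAddOne::pure_python", "ph": "E", "ts": 160},
  {"name": "MyAddOne::CopyCPU2CPU", "ph": "B", "ts": 170},
  {"name": "MyAddOne::CopyCPU2CPU", "ph": "E", "ts": 190}
]}
""")

durations = custom_op_durations(sample_trace, "MyAddOne")
for name, total in sorted(durations.items()):
    print(name, total)
```

To try this on a real run, load the file the profiler wrote with `json.load` instead of the inline sample. If the dump uses a different event layout than assumed here, the phase handling would need to be adapted.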