This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Conversion from FP32 model to Mixed Precision model #15118

Merged
merged 45 commits into from
Jun 28, 2019

Conversation


@anirudh2290 (Member) commented Jun 1, 2019

Description

Users want to bring an FP32 model and convert it to a mixed precision model to run inference with it. This work leverages the existing AMP work by @ptrendx and @Caenorst, and provides users with conversion APIs to convert their symbolic or Gluon model to a mixed precision model. It also adds the necessary C APIs so that similar conversion support can be added in other frontends.

Thanks to all involved in prior discussions, suggestions and design review of the project (sincere apologies if I missed someone):
@ptrendx, @rahul003, Sudipta Sengupta (AWS) (@sudiptasengupta), Poorna Chand Srinivas Perumalla (@bhagatindia) (AWS), Wei Xiao (AWS), @Vikas89, @lupesko, @pengzhao-intel , @ZhennanQin

API Additions (Python)

  1. Converting a symbolic model:
convert_model(sym, arg_params, aux_params, target_dtype="float16",
              target_dtype_ops=None, fp32_ops=None,
              conditional_fp32_ops=None, excluded_sym_names=None,
              cast_optional_params=False)
  2. Converting a gluon model (hybrid block):
convert_hybrid_block(block, target_dtype="float16", target_dtype_ops=None,
                     fp32_ops=None, conditional_fp32_ops=None, excluded_sym_names=None,
                     ctx=mx.gpu(0), cast_optional_params=False)
  3. Converting a symbol:
convert_symbol(sym, target_dtype="float16", target_dtype_ops=None,
               fp32_ops=None, conditional_fp32_ops=None,
               excluded_sym_names=None, data_names=None, cast_optional_params=False)
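To make the interplay of the op lists above concrete, here is a hedged, pure-Python sketch (not the actual MXNet implementation; `classify_op` is a hypothetical helper) of how each node's dtype could be decided from `target_dtype_ops`, `fp32_ops`, `conditional_fp32_ops`, and `excluded_sym_names`:

```python
# Hypothetical sketch of the per-node dtype decision the conversion APIs
# make. Names and the conditional_fp32_ops tuple layout
# (op_name, attr_name, list_of_values) are assumptions for illustration.

def classify_op(op_name, attrs, target_dtype_ops, fp32_ops,
                conditional_fp32_ops, excluded_sym_names, node_name):
    """Return 'float16', 'float32', or 'widest' for a single node."""
    # Nodes excluded by symbol name always stay in FP32.
    if node_name in excluded_sym_names:
        return "float32"
    # Ops explicitly forced to FP32 stay in FP32.
    if op_name in fp32_ops:
        return "float32"
    # Conditional ops stay in FP32 only when a given attribute takes one
    # of the listed values, e.g. ("Activation", "act_type", ["softrelu"]).
    for cond_op, attr, values in conditional_fp32_ops:
        if op_name == cond_op and attrs.get(attr) in values:
            return "float32"
    # Ops on the target-dtype list run in FP16.
    if op_name in target_dtype_ops:
        return "float16"
    # Everything else runs in the widest dtype among its inputs.
    return "widest"
```

Calling it with a few sample nodes shows the precedence: exclusion by name beats everything, then explicit FP32 lists, then the target-dtype list, and unlisted ops fall through to widest-dtype handling.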

Refactoring / Existing Code Changes

Module API (executor_group.py)

  1. Added support for retrieving input_types from the symbol and updating the input_types dict used in simple_bind.
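The idea behind this change can be sketched as follows (assumed helper and attribute names, not the executor_group.py code): read per-input dtype attributes stored on the symbol and use them to fill the type dict handed to simple_bind.

```python
# Illustrative sketch: build the {input_name: dtype} dict for simple_bind
# from dtype attributes stored on the symbol's inputs. The "__dtype__"
# attribute name and string dtypes are assumptions for illustration.

def gather_input_types(input_names, symbol_attrs, default_dtype="float32"):
    """Return {input_name: dtype}, falling back to default_dtype."""
    input_types = {}
    for name in input_names:
        # A converted symbol may carry an explicit dtype attribute per input.
        attrs = symbol_attrs.get(name, {})
        input_types[name] = attrs.get("__dtype__", default_dtype)
    return input_types
```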

Gluon API (parameter.py)

  1. Refactored code to support loading params from param_dict as well as file. This is used in the convert_hybrid_block API.
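A minimal sketch of the dict-or-file loading pattern this refactor enables (assumed function names; the real logic lives in gluon/parameter.py):

```python
# Sketch: accept either an in-memory {name: array} dict or a filename.
# convert_hybrid_block already holds params in memory, so the dict path
# avoids a round-trip through the filesystem.

def load_params(source):
    """Return a {name: array} dict from a dict or a .params file path."""
    if isinstance(source, dict):
        return dict(source)  # shallow copy; arrays are shared
    # Fall back to the original file-path behaviour.
    return _load_from_file(source)

def _load_from_file(filename):
    # Placeholder for the original .params deserialization.
    raise NotImplementedError(filename)
```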

Test Utils API (test_utils.py)

  1. Copied download_model from example/image-classification/common/modelzoo.py to test_utils, since this is a common use case and is used in the tests.

AMP Tests

  1. Moved the AMP tests to the gpu directory, since we wanted to limit AMP tests to one file and to add some additional tests that work only on GPU.

Additions

  1. Added the APIs mentioned above in python/mxnet/contrib/amp/amp.py.
  2. Added a C API for symbol conversion.
  3. Added an nnvm pass in src/nnvm/low_precision_pass.cc.
  4. Added an example, tutorials and tests.
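As a toy illustration of what a low-precision pass like the one in src/nnvm/low_precision_pass.cc does (this is a hypothetical pure-Python model, not the nnvm code): walk the graph and insert an amp_cast node wherever an edge crosses between an FP32 node and an FP16 node.

```python
# Toy model of a low-precision graph pass. nodes maps each node name to
# its list of input node names; dtypes maps each node to its decided
# dtype. Mixed-dtype edges get an amp_cast node spliced in; the node
# naming scheme here is an assumption for illustration.

def insert_amp_casts(nodes, dtypes):
    """Return a new edge list with amp_cast nodes on mixed-dtype edges."""
    edges = []
    for node, inputs in nodes.items():
        for inp in inputs:
            if dtypes[inp] != dtypes[node]:
                # Cast the producer's output to the consumer's dtype.
                cast = "amp_cast_%s_to_%s" % (inp, dtypes[node])
                edges.append((inp, cast))
                edges.append((cast, node))
            else:
                edges.append((inp, node))
    return edges
```

Running this on a tiny FP32 → FP16 → FP32 chain shows a cast inserted at each dtype boundary, which is the essence of what the real pass does on the nnvm graph.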

Fixes: #14584
Doc: https://tinyurl.com/y42kx9hl

Other Flaky Tests/Bug Fixes:

  1. Attempts to fix: Flaky Test: test_tensorrt_lenet5.test_tensorrt_inference #14978

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
      • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
      • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
      • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
      • For user-facing API changes, the API doc string has been updated.
      • For new C++ functions in header files, their functionality and arguments are documented.
      • For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To my best knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is backward incompatible, why must it be made?
  • Interesting edge cases to note here

@anirudh2290 anirudh2290 changed the title [WIP] Conversion from FP32 model to Mixed Precision model Conversion from FP32 model to Mixed Precision model Jun 6, 2019
@anirudh2290 anirudh2290 marked this pull request as ready for review June 6, 2019 22:12
@anirudh2290 (Member Author) commented:

@ptrendx @pengzhao-intel @samskalicky @larroy @ZhennanQin Thank you for your review ! I have addressed your comments.

@larroy (Contributor) left a comment:
I think the approach and architecture look good; just a few C++ formalisms remain to be refined, in my view.

src/nnvm/low_precision_pass.cc (review thread resolved)
@@ -810,6 +810,191 @@ int MXQuantizeSymbol(SymbolHandle sym_handle,
API_END_HANDLE_ERROR(delete s);
}

// helper function to add mapping of node_name -> dtype map
// for the given indexed graph and inferred_dtypes
inline void _SetInputDTypes(
Contributor comment:

please use anon namespace or static function for helpers, not inline.

const std::unordered_map<std::string, int>& node_name_dtype_map,
const std::unordered_map<std::string, int>& node_without_dtype_map,
const std::unordered_set<std::string>& model_params,
const std::vector<nnvm::NodePtr>& args) {
Contributor comment:

args being const and modified through NodePtr is misleading, a documentation bit would help.

}

// add amp_cast node between curr_node and input
void AddCastNode(const nnvm::NodeEntry &e, const std::string &suffix,
Contributor comment:

static or anon namespace to avoid additional unnecessary linker symbols.

}

// get suffix for a node entry so that it can be used for amp_cast/amp_multicast node name
std::string GetSuffix(const nnvm::NodeEntry &node_entry,
Contributor comment:

Same comment about anon static, and I guess applies to other places.

std::unordered_set<std::string> widest_dtype_ops;
std::unordered_set<std::string> excluded_syms;
std::unordered_set<std::string> model_params;
std::unordered_map<std::string,
Contributor comment:

Suggest to add comment on what this thing represents, or maybe a typedef with a doc. It would help with code maintainability.

@larroy (Contributor) left a comment:

Nice PR.

@anirudh2290 (Member Author) commented:

From an offline review by sudipta@, feedback was provided that it is important for users to be able to obtain models with params cast wherever possible. After additional discussion with @ptrendx, we decided to add an additional graph pass, which goes through all inputs of amp_cast and amp_multicast to infer the dtypes of the input nodes wherever possible. I have added support for this in the recent commits.
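The idea behind that extra pass can be sketched as follows (assumed data structures, not the nnvm code): if a parameter feeds only amp_cast nodes that cast to FP16, the parameter itself can be stored in FP16, so no runtime cast (and no FP32 copy of the weights) is needed.

```python
# Hypothetical sketch of inferring parameter dtypes from their consumers.
# consumers maps each param name to a list of (op_name, target_dtype)
# pairs, one per use of the param in the graph.

def infer_param_dtypes(params, consumers):
    """Return {param: dtype}; FP16 only when every use is an FP16 amp_cast."""
    result = {}
    for p in params:
        uses = consumers.get(p, [])
        if uses and all(op == "amp_cast" and dt == "float16" for op, dt in uses):
            result[p] = "float16"  # safe to pre-cast the stored weights
        else:
            result[p] = "float32"  # some consumer still needs FP32
    return result
```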

Labels: FP16, pr-awaiting-review (PR is waiting for code review)
Successfully merging this pull request may close these issues.

  • Flaky Test: test_tensorrt_lenet5.test_tensorrt_inference
  • Conversion from FP32 to Mixed Precision Models
9 participants