This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Conversion from FP32 model to Mixed Precision model #15118

Merged
merged 45 commits into from
Jun 28, 2019

Conversation


@anirudh2290 (Member) commented Jun 1, 2019

Description

Users want to bring an FP32 model and convert it to a mixed precision model to run inference with it. This work leverages the existing AMP work by @ptrendx and @Caenorst, and provides users with conversion APIs to convert their symbolic or Gluon model to a mixed precision model. It also adds the necessary C APIs so that similar conversion support can be added in other frontends.

Thanks to all involved in prior discussions, suggestions and design review of the project (sincere apologies if I missed someone):
@ptrendx, @rahul003, Sudipta Sengupta (AWS) (@sudiptasengupta), Poorna Chand Srinivas Perumalla (@bhagatindia) (AWS), Wei Xiao (AWS), @Vikas89, @lupesko, @pengzhao-intel , @ZhennanQin

API Additions (Python)

  1. Converting a symbolic model:
convert_model(sym, arg_params, aux_params, target_dtype="float16",
              target_dtype_ops=None, fp32_ops=None,
              conditional_fp32_ops=None, excluded_sym_names=None,
              cast_optional_params=False)
  2. Converting a gluon model (hybrid block):
convert_hybrid_block(block, target_dtype="float16", target_dtype_ops=None,
                     fp32_ops=None, conditional_fp32_ops=None, excluded_sym_names=None,
                     ctx=mx.gpu(0), cast_optional_params=False)
  3. Converting a symbol:
convert_symbol(sym, target_dtype="float16", target_dtype_ops=None,
               fp32_ops=None, conditional_fp32_ops=None,
               excluded_sym_names=None, data_names=None, cast_optional_params=False)
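To make the interplay of the op lists above concrete, here is a hedged, pure-Python sketch (not the actual MXNet implementation; `classify_op` is a hypothetical helper) of how each node's dtype could be decided from `target_dtype_ops`, `fp32_ops`, `conditional_fp32_ops`, and `excluded_sym_names`:

```python
# Hypothetical sketch of the per-node dtype decision the conversion APIs
# make. Names and the conditional_fp32_ops tuple layout
# (op_name, attr_name, list_of_values) are assumptions for illustration.

def classify_op(op_name, attrs, target_dtype_ops, fp32_ops,
                conditional_fp32_ops, excluded_sym_names, node_name):
    """Return 'float16', 'float32', or 'widest' for a single node."""
    # Nodes excluded by symbol name always stay in FP32.
    if node_name in excluded_sym_names:
        return "float32"
    # Ops explicitly forced to FP32 stay in FP32.
    if op_name in fp32_ops:
        return "float32"
    # Conditional ops stay in FP32 only when a given attribute takes one
    # of the listed values, e.g. ("Activation", "act_type", ["softrelu"]).
    for cond_op, attr, values in conditional_fp32_ops:
        if op_name == cond_op and attrs.get(attr) in values:
            return "float32"
    # Ops on the target-dtype list run in FP16.
    if op_name in target_dtype_ops:
        return "float16"
    # Everything else runs in the widest dtype among its inputs.
    return "widest"
```

Calling it with a few sample nodes shows the precedence: exclusion by name beats everything, then explicit FP32 lists, then the target-dtype list, and unlisted ops fall through to widest-dtype handling.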

Refactoring / Existing Code Changes

Module API (executor_group.py)

  1. Added support for retrieving input_types from the symbol and updating the input_types dict used in simple_bind.
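The idea behind this change can be sketched as follows (assumed helper and attribute names, not the executor_group.py code): read per-input dtype attributes stored on the symbol and use them to fill the type dict handed to simple_bind.

```python
# Illustrative sketch: build the {input_name: dtype} dict for simple_bind
# from dtype attributes stored on the symbol's inputs. The "__dtype__"
# attribute name and string dtypes are assumptions for illustration.

def gather_input_types(input_names, symbol_attrs, default_dtype="float32"):
    """Return {input_name: dtype}, falling back to default_dtype."""
    input_types = {}
    for name in input_names:
        # A converted symbol may carry an explicit dtype attribute per input.
        attrs = symbol_attrs.get(name, {})
        input_types[name] = attrs.get("__dtype__", default_dtype)
    return input_types
```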

Gluon API (parameter.py)

  1. Refactored code to support loading params from param_dict as well as file. This is used in the convert_hybrid_block API.
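A minimal sketch of the dict-or-file loading pattern this refactor enables (assumed function names; the real logic lives in gluon/parameter.py):

```python
# Sketch: accept either an in-memory {name: array} dict or a filename.
# convert_hybrid_block already holds params in memory, so the dict path
# avoids a round-trip through the filesystem.

def load_params(source):
    """Return a {name: array} dict from a dict or a .params file path."""
    if isinstance(source, dict):
        return dict(source)  # shallow copy; arrays are shared
    # Fall back to the original file-path behaviour.
    return _load_from_file(source)

def _load_from_file(filename):
    # Placeholder for the original .params deserialization.
    raise NotImplementedError(filename)
```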

Test Utils API (test_utils.py)

  1. Copied download_model from example/image-classification/common/modelzoo.py to test_utils, since this is a common use case and is used in the tests.

AMP Tests

  1. Moved the AMP tests to the gpu directory, since we wanted to limit AMP tests to one file and to add some additional tests that work only on GPU.

Additions

  1. Added the APIs mentioned above in python/mxnet/contrib/amp/amp.py.
  2. Added a C API for symbol conversion.
  3. Added an nnvm pass in src/nnvm/low_precision_pass.cc.
  4. Added an example, tutorials and tests.
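As a toy illustration of what a low-precision pass like the one in src/nnvm/low_precision_pass.cc does (this is a hypothetical pure-Python model, not the nnvm code): walk the graph and insert an amp_cast node wherever an edge crosses between an FP32 node and an FP16 node.

```python
# Toy model of a low-precision graph pass. nodes maps each node name to
# its list of input node names; dtypes maps each node to its decided
# dtype. Mixed-dtype edges get an amp_cast node spliced in; the node
# naming scheme here is an assumption for illustration.

def insert_amp_casts(nodes, dtypes):
    """Return a new edge list with amp_cast nodes on mixed-dtype edges."""
    edges = []
    for node, inputs in nodes.items():
        for inp in inputs:
            if dtypes[inp] != dtypes[node]:
                # Cast the producer's output to the consumer's dtype.
                cast = "amp_cast_%s_to_%s" % (inp, dtypes[node])
                edges.append((inp, cast))
                edges.append((cast, node))
            else:
                edges.append((inp, node))
    return edges
```

Running this on a tiny FP32 → FP16 → FP32 chain shows a cast inserted at each dtype boundary, which is the essence of what the real pass does on the nnvm graph.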

Fixes: #14584
Doc: https://tinyurl.com/y42kx9hl

Other Flaky Tests/Bug Fixes:

  1. Attempts to fix: Flaky Test: test_tensorrt_lenet5.test_tensorrt_inference #14978

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
      • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
      • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
      • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
      • For user-facing API changes, the API doc string has been updated.
      • For new C++ functions in header files, their functionality and arguments are documented.
      • For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To my best knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is backward incompatible, why must it be made?
  • Interesting edge cases to note here

@anirudh2290 anirudh2290 changed the title [WIP] Conversion from FP32 model to Mixed Precision model Conversion from FP32 model to Mixed Precision model Jun 6, 2019
@anirudh2290 anirudh2290 marked this pull request as ready for review June 6, 2019 22:12
@anirudh2290 (Member Author) commented:

@ptrendx @pengzhao-intel @samskalicky @larroy @ZhennanQin Thank you for your review ! I have addressed your comments.

@larroy (Contributor) left a comment:
I think the approach and architecture look good; just a few C++ formalisms remain to be refined, in my view.

src/nnvm/low_precision_pass.cc (review thread resolved)
@@ -810,6 +810,191 @@ int MXQuantizeSymbol(SymbolHandle sym_handle,
API_END_HANDLE_ERROR(delete s);
}

// helper function to add mapping of node_name -> dtype map
// for the given indexed graph and inferred_dtypes
inline void _SetInputDTypes(
Contributor comment:

please use anon namespace or static function for helpers, not inline.

const std::unordered_map<std::string, int>& node_name_dtype_map,
const std::unordered_map<std::string, int>& node_without_dtype_map,
const std::unordered_set<std::string>& model_params,
const std::vector<nnvm::NodePtr>& args) {
Contributor comment:

args being const and modified through NodePtr is misleading, a documentation bit would help.

}

// add amp_cast node between curr_node and input
void AddCastNode(const nnvm::NodeEntry &e, const std::string &suffix,
Contributor comment:

static or anon namespace to avoid additional unnecessary linker symbols.

}

// get suffix for a node entry so that it can be used for amp_cast/amp_multicast node name
std::string GetSuffix(const nnvm::NodeEntry &node_entry,
Contributor comment:

Same comment about anon static, and I guess applies to other places.

std::unordered_set<std::string> widest_dtype_ops;
std::unordered_set<std::string> excluded_syms;
std::unordered_set<std::string> model_params;
std::unordered_map<std::string,
Contributor comment:

Suggest to add comment on what this thing represents, or maybe a typedef with a doc. It would help with code maintainability.

@larroy (Contributor) left a comment:

Nice PR.

@anirudh2290 (Member Author) commented:

From an offline review by sudipta@, feedback was provided that it is important for users to be able to obtain models with params cast wherever possible. After additional discussion with @ptrendx, we decided to add an additional graph pass, which goes through all inputs of amp_cast and amp_multicast to infer the dtypes of the input nodes wherever possible. I have added support for this in the recent commits.
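The idea behind that extra pass can be sketched as follows (assumed data structures, not the nnvm code): if a parameter feeds only amp_cast nodes that cast to FP16, the parameter itself can be stored in FP16, so no runtime cast (and no FP32 copy of the weights) is needed.

```python
# Hypothetical sketch of inferring parameter dtypes from their consumers.
# consumers maps each param name to a list of (op_name, target_dtype)
# pairs, one per use of the param in the graph.

def infer_param_dtypes(params, consumers):
    """Return {param: dtype}; FP16 only when every use is an FP16 amp_cast."""
    result = {}
    for p in params:
        uses = consumers.get(p, [])
        if uses and all(op == "amp_cast" and dt == "float16" for op, dt in uses):
            result[p] = "float16"  # safe to pre-cast the stored weights
        else:
            result[p] = "float32"  # some consumer still needs FP32
    return result
```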

Labels: FP16, pr-awaiting-review (PR is waiting for code review)
Successfully merging this pull request may close these issues.

  • Flaky Test: test_tensorrt_lenet5.test_tensorrt_inference
  • Conversion from FP32 to Mixed Precision Models
9 participants