This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Add unit tests for TensorRT integration and fix some bugs #15399

Merged
merged 20 commits into from
Oct 20, 2019

Conversation

Caenorst
Contributor

Description

The TensorRT integration lacked unit tests; instead we relied on comparing the output of a full network, which is not very pertinent, makes it difficult to choose a tolerance, and is not very helpful when it fails.

This PR has two purposes:

  1. Add unit tests for all operations:
    Since we only partition subgraphs of at least 2 ops, we always append an identity to each output. We then compare against MXNet, using both TRT FP32 and FP16 computation.
  2. Fix a number of edge-case bugs exposed by the unit tests:
  • NHWC layout is currently not compatible with TensorRT
  • Pooling is currently not compatible with count_include_pad and is only compatible with the pooling_convention Valid
  • FullyConnected without bias cannot be converted to MatMul (a transpose would be needed first); for the moment we decided to drop support, since FullyConnected without bias is quite rare anyway
  • Concat is not supported if the concatenation axis is the batch axis
  • Dropout must only be enabled in training mode (it acts as identity at inference)
  • BatchNorm with fix_gamma can still have gamma != 1, so we force the value to 1 when loading it
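The unit-test comparison described above can be sketched as follows. This is a hypothetical helper with illustrative tolerance values, not the actual test code from this PR; the point is only that FP16 needs a much looser bound than FP32.

```python
import numpy as np

def outputs_match(mx_out, trt_out, dtype):
    """Compare an MXNet reference output against a TensorRT output.

    Tolerances are illustrative: FP16 accumulation needs a looser
    bound than FP32.
    """
    if dtype == np.float16:
        rtol, atol = 1e-2, 1e-2   # looser bound for half precision
    else:
        rtol, atol = 1e-4, 1e-5   # tighter bound for single precision
    return np.allclose(np.asarray(mx_out, dtype=np.float32),
                       np.asarray(trt_out, dtype=np.float32),
                       rtol=rtol, atol=atol)
```

Per-operator tolerances like these are far easier to justify than a single tolerance over a full network's output.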

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Comments

  • the full-network output-comparison test should probably be replaced by an accuracy comparison over a full dataset

@Caenorst Caenorst requested a review from szha as a code owner June 28, 2019 08:14
@anirudhacharya
Member

@mxnet-label-bot add [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Jun 28, 2019
@Caenorst
Contributor Author

@KellenSunderland
Contributor

Looks like CI caught a few issues. For example http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-15399/1/pipeline seems like it should be relevant to this PR. I'll have a look to see if there's anything else that jumps out at me.

@@ -157,6 +157,12 @@ std::string ConvertNnvmGraphToOnnx(
return serialized_onnx_graph;
}

void ConvertIdentity(NodeProto* node_proto, const NodeAttrs& attrs,
Contributor

Any idea if TRT actually optimizes this out? I've seen this in a few prod services :-/

Contributor Author

I believe this should be optimized by ONNX-TRT

return (param.dim != 0);
}

if (op_name == "Dropout") {
Contributor

Again, will TensorRT optimize this out? We don't want it at inference time right?

Contributor Author

Dropout has always been treated as an identity function in the MXNet-TensorRT integration, so I don't see any change here. As for whether identity actually performs a copy or not, I'm not quite sure; here is the onnx-tensorrt conversion: https://github.com/onnx/onnx-tensorrt/blob/0ab159579551cabfa05fd66f338357f116e96835/trt_utils.hpp#L169-L180
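For reference, inference-mode dropout reduces to a pass-through, which is why mapping it to Identity is safe. A minimal sketch of inverted dropout (a hypothetical helper, not MXNet's implementation):

```python
import numpy as np

def dropout(x, p, training):
    """Inverted dropout: masks and rescales at training time,
    acts as identity at inference time.

    The inference branch returning the input unchanged is what lets
    a converter map Dropout to an Identity node.
    """
    if not training:
        return x  # identity: no masking, no rescaling
    # Zero out elements with probability p, rescale survivors by 1/(1-p)
    mask = (np.random.rand(*x.shape) >= p).astype(x.dtype)
    return x * mask / (1.0 - p)
```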

Contributor

Ok, non-blocking comment for this PR. I'm just thinking about adding a warning in the future if people are using TRT with operations that don't make sense at inference time (Dropout, Ident, Empty Concats or Copies, etc.)

Contributor

@KellenSunderland KellenSunderland left a comment

A few small changes requested. Looks like CI caught a few issues as well.

@karan6181
Contributor

@Caenorst Could you please address the review comments? Thanks!

@piyushghai
Contributor

@Caenorst Gentle ping...

@Caenorst
Contributor Author

Caenorst commented Oct 7, 2019

I don't understand the error on windows-gpu; it doesn't seem related to my modifications...

@Caenorst
Contributor Author

@KellenSunderland can we merge it? (I made a bunch of modifications since the last review that you may want to review too)

@KellenSunderland KellenSunderland merged commit 746cbc5 into apache:master Oct 20, 2019