
[BYOC][TensorRT] Add TensorRT own int8 calibration support to TensorRT BYOC integration #8808

Merged
merged 60 commits into apache:main on Sep 15, 2021

Conversation

@tiandiao123 (Contributor) commented Aug 23, 2021

Hi, this PR adds TensorRT int8 mode with calibration support to the existing src/runtime/contrib/tensorrt integration (thanks to @trevor-m for the help). I have tested several models, including mobilenet_v2 and resnet34, and all of them show a performance gain over FP16 TRT. Here is my GitHub gist for testing the TRT int8 results: https://gist.github.com/tiandiao123/67adb11ab3d73df8e83a1469707d7db4

First, users need to provide calibration data, then run their program with the following command line:

TVM_TENSORRT_USE_INT8=1 TENSORRT_NUM_CALI_INT8=10 python test_trt.py

Here TVM_TENSORRT_USE_INT8 controls whether to use TRT int8, and TENSORRT_NUM_CALI_INT8 sets how many calibration batches to feed (10 in this example).
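To show how the two environment variables and the calibration runs fit together, here is a minimal end-to-end sketch, not taken from this PR: it assumes a TensorRT-enabled TVM build, uses a toy conv2d network and random calibration batches purely for illustration, and relies on the TVM 0.8-era partition_for_tensorrt / graph_executor API.

    import os
    import numpy as np
    import tvm
    from tvm import relay
    from tvm.relay.op.contrib.tensorrt import partition_for_tensorrt
    from tvm.contrib import graph_executor

    # Enable TensorRT int8 mode and request 10 calibration batches (env vars from this PR).
    os.environ["TVM_TENSORRT_USE_INT8"] = "1"
    os.environ["TENSORRT_NUM_CALI_INT8"] = "10"

    # A toy conv2d+relu network stands in for resnet18 / mobilenet_v2 / etc.
    data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
    weight = relay.var("weight", shape=(16, 3, 3, 3), dtype="float32")
    net = relay.nn.relu(relay.nn.conv2d(data, weight, padding=(1, 1)))
    mod = tvm.IRModule.from_expr(relay.Function([data, weight], net))
    params = {"weight": np.random.uniform(-1, 1, (16, 3, 3, 3)).astype("float32")}

    # Partition TensorRT-compatible subgraphs and build (TVM 0.8-era API, assumed here).
    mod, config = partition_for_tensorrt(mod, params)
    with tvm.transform.PassContext(opt_level=3, config={"relay.ext.tensorrt.options": config}):
        lib = relay.build(mod, target="cuda", params=params)

    dev = tvm.cuda(0)
    m = graph_executor.GraphModule(lib["default"](dev))

    # Calibration phase: the first TENSORRT_NUM_CALI_INT8 runs feed the int8 calibrator.
    # Random data is used here only for demonstration; real projects should feed
    # representative samples from the model's training or evaluation set.
    for _ in range(10):
        m.set_input("data", np.random.uniform(-1, 1, (1, 3, 224, 224)).astype("float32"))
        m.run()

    # After calibration the engine is rebuilt in int8; subsequent runs do real inference.
    m.set_input("data", np.random.uniform(-1, 1, (1, 3, 224, 224)).astype("float32"))
    m.run()
    print(m.get_output(0).numpy().shape)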

Here are some test results (on a T4 GPU with TensorRT 7):

                                   INT8 TRT                    FP16 TRT
resnet18                    0.38039684295654297 ms             0.5254888534545898 ms
super_resolution            0.2799248695373535 ms              0.44 ms

resnet18 python testing script: https://gist.github.com/tiandiao123/c535b62918677d0a7c45fc005a44f27d
super_resolution testing script: https://gist.github.com/tiandiao123/de52ef96f574645c2dccb3544b291487

Also, in my script I fed a single image ten times as the calibration data (just for demonstration). In a real project, users should calibrate with their own data.

Also, I used this test file (https://github.com/tiandiao123/tvm/blob/pr_trt_int8/tests/python/contrib/test_tensorrt_int8_exp.py) to test the TRT int8 precision for several ResNet variants. I measured the cosine distance between the outputs of the original torch models and the corresponding TVM TRT int8 outputs, and I think all of them are within expectation since the similarities are greater than 0.99. The calibration quality depends on which model we use, what sample data we calibrate with, and how many calibration batches we run; in my experience the number of calibration batches also affects the final result. We need to make sure the calibration data comes from the model's actual training or evaluation data, otherwise the cosine distance will suffer. @jcf94

                      cosine distance
resnet34             0.005518794059753418
resnet18             0.005487024784088135
resnet50             0.002280294895172119

I also used the Python script above to test densenet161 and mobilenet_v2; both have a cosine distance of around 0.02–0.03. Since the image I used is from the tutorial (https://tvm.apache.org/docs/tutorials/frontend/from_pytorch.html#sphx-glr-tutorials-frontend-from-pytorch-py) and actually comes from the ResNet evaluation data, we would probably need sample data appropriate for densenet161 and mobilenet_v2 to get representative cosine distances, but overall the results are within expectation.
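For reference, the metric above is just one minus the cosine similarity of the flattened outputs; a minimal sketch follows (torch_out and trt_out are placeholder names, not variables from the test script):

    import numpy as np

    def cosine_distance(a, b):
        """1 - cosine similarity between two flattened output tensors."""
        a = a.ravel().astype("float64")
        b = b.ravel().astype("float64")
        return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # torch_out = torch_model(sample).detach().numpy()   # original PyTorch output
    # trt_out = m.get_output(0).numpy()                  # TVM + TensorRT int8 output
    # print(cosine_distance(torch_out, trt_out))         # e.g. ~0.005 for the ResNets above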

@comaniac (Contributor) commented Aug 23, 2021

@tiandiao123 thanks for adding this support. Please add details to the PR description.

@trevor-m it would be great if you could help sharpen this PR :)

@comaniac changed the title from "Add TRT int8 mode in byoc TRT" to "[BYOC][TRT] Support int8" on Aug 23, 2021
@vinx13 (Member) commented Aug 23, 2021

cc @FrozenGene

@Laurawly (Contributor) commented

> [quotes the PR description above]

Please add some performance numbers for the models you tested.

@Laurawly changed the title from "[BYOC][TRT] Support int8" to "[BYOC][TensorRT] Support int8" on Aug 23, 2021
@Laurawly changed the title from "[BYOC][TensorRT] Support int8" to "[BYOC][TensorRT] Add Int8 support to TensorRT BYOC integration" on Aug 23, 2021
@FrozenGene (Member) commented

@jcf94 please help review this too, as you have implemented similar functionality.

@jcf94 (Contributor) previously requested changes on Aug 24, 2021

Thanks! @tiandiao123 @FrozenGene

It's great to see more people working on TVM's TensorRT support. The overall implementation is good, but let's make it better before it is accepted.

Also, a basic unit test is needed in tests/python/contrib/test_tensorrt.py. 😄

@@ -125,17 +147,26 @@ class TensorRTRuntime : public JSONRuntimeBase {

/*! \brief Run inference using built engine. */
void Run() override {

Inline review comment:

Suggest creating the calibrator and processing the input information here; if there are still remaining calibration batches, just return.

This would avoid building an engine, destroying it, and then building another one.

Reply (Contributor):

We have to build the engine first to know the input binding order that TensorRT assigns to the inputs; it might not match the TVM input signature directly. This is the same strategy used by TF-TRT.

@jcf94 (Contributor) replied on Aug 26, 2021, quoting the reply above:

Yeah, I mean we can also put all of the input-information handling here, using something like unordered_map<string, void*> and unordered_map<string, size_t> to create the calibrator and call set_batch here. During real inference, this information is not needed.

The TRT get_batch function has a parameter const char* names[] that can be used when fetching the data, so the binding order does not matter with this approach.

At least in TF-2.4.1, they used:

  // Construct a calibrator for future calibration.
  TRTInt8Calibrator(
      const std::unordered_map<string, std::pair<void*, size_t>>& dev_buffers,
      int batch_size, string engine_name);

  // Feed calibration data to the calibrator, and return true if the data is
  // accepted. Return false if the calibrator has been terminated.
  bool setBatch(const std::unordered_map<string, void*>& data,
                const cudaStream_t stream);

I think that would be a good example for reference.
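As an illustration of this design (not code from this PR or from TF-TRT), here is a minimal name-keyed calibrator sketch using the TensorRT Python API, showing why looking buffers up by name makes the binding order irrelevant; the class name, the layout of the batches argument, and the use of pycuda for device buffers are assumptions:

    import numpy as np
    import pycuda.autoinit  # noqa: F401  creates a CUDA context
    import pycuda.driver as cuda
    import tensorrt as trt

    class NameKeyedCalibrator(trt.IInt8EntropyCalibrator2):
        """Feeds calibration batches keyed by input name, so binding order does not matter."""

        def __init__(self, batches, batch_size=1):
            super().__init__()
            self.batches = iter(batches)      # each item: {input_name: np.ndarray}
            self.batch_size = batch_size
            self.device_buffers = {}          # input_name -> device allocation

        def get_batch_size(self):
            return self.batch_size

        def get_batch(self, names):
            try:
                batch = next(self.batches)
            except StopIteration:
                return None                   # no more data: calibration ends
            pointers = []
            for name in names:                # TensorRT asks for each buffer by name
                arr = np.ascontiguousarray(batch[name])
                if name not in self.device_buffers:
                    self.device_buffers[name] = cuda.mem_alloc(arr.nbytes)
                cuda.memcpy_htod(self.device_buffers[name], arr)
                pointers.append(int(self.device_buffers[name]))
            return pointers

        def read_calibration_cache(self):
            return None                       # no cached scales available

        def write_calibration_cache(self, cache):
            pass                              # skip persisting the calibration cache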

@FrozenGene changed the title from "[BYOC][TensorRT] Add Int8 support to TensorRT BYOC integration" to "[BYOC][TensorRT] Add TensorRT own int8 calibration support to TensorRT BYOC integration" on Aug 24, 2021
@Laurawly requested a review from jcf94 on September 8, 2021 at 22:30
@Laurawly (Contributor) commented Sep 9, 2021

@trevor-m @jcf94 The CI is green now, please give another round of review, thanks!

@jcf94 (Contributor) left a comment

Thanks! @tiandiao123 LGTM now.

@Laurawly (Contributor) commented Sep 9, 2021

@FrozenGene @trevor-m Could you update your reviews?

@Laurawly removed the "status: need update" label on Sep 9, 2021
@comaniac (Contributor) left a comment

Overall LGTM. Left two nits and I think they can be fixed in a follow-up PR.

@@ -153,6 +153,9 @@ class TensorRTBuilder {
/*! \brief Whether to automatically convert model to 16-bit floating point precision. */
bool use_fp16_;

/*! \brief whether to automatically convert model to int8 precision */

IIUC, use_int8_ is mutually exclusive with use_fp16_? If so, we should combine them into a single variable like target_dtype.

@comaniac dismissed trevor-m's stale review on September 15, 2021:

I dismissed this change request to unblock this PR first as @trevor-m is on PTO. We could fix additional comments in follow-up PRs if needed.

@comaniac merged commit 6f5b674 into apache:main on Sep 15, 2021

AndrewZhaoLuo added a commit to AndrewZhaoLuo/tvm that referenced this pull request Sep 16, 2021
* main: (102 commits)
  Implementation of relay_to_tir target hook (apache#8423)
  [Onnx] Fix NLL Loss tests (apache#8971)
  [Bugfix] Fix other div zero errors also in rewrite_simplify (apache#8983)
  [ONNX] enable the onnx tests after PR apache#8274 merged (apache#9019)
  [Hexagon] Disable `thread_local` on Hexagon (apache#9025)
  [Hexagon] Allow undefined symbols in libtvm_runtime.so on Hexagon (apache#9024)
  [Onnx] Add momentum (apache#9000)
  fix (apache#9021)
  [Community] @AndrewZhaoLuo -> Reviewer (apache#9020)
  [Hexagon] Implement model launcher (apache#8986)
  [Relay][Pass] Add ExtractOperators pass (apache#8996)
  [BYOC][TensorRT] Add TensorRT own int8 calibration support to TensorRT BYOC integration (apache#8808)
  [ONNX] Add Einsum converter (apache#8985)
  Add standalone_crt/ to be part of the wheel package, when available. (apache#9005)
  [Relay] Remove memory planing from LowerTEPass  (apache#8974)
  [Hexagon] Treat floats as float32 when passing args to offloaded kernels (apache#9010)
  [Runtime] Pipeline Executor Initial patch. (apache#8702)
  [Hexagon] `llvm-options` attribute is an array of strings (apache#9011)
  disable cuda int8 schedule for non-cuda gpu target (apache#9014)
  [Torch] Add an option to make imported models compatible with the Relay text parser (apache#9015)
  ...
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
…T BYOC integration (apache#8808)

* update trt

* clean codes

* tetsing running trt

* clean data

* clean codes?

* remove env func

* fix num_bings

* add buildfromjson func

* change condition

* reset input and output func

* re-config func

* re-added trt version check

* checking sanity

* try to fix sanity issue

* checking sainity

* fixing sanity issue

* fixing sainity issue

* fixing sanity

* clang format fixed

* clang format fixing

* clean trt cali

* try to fix clang format

* fixed some comments

* remove double destroy engine codes

* modify comments

* add checking function

* add trt int8 test

* update trt int8 test file

* Update test_tensorrt_int8_exp.py

* update trt int8 fikle

* change a little

* upate trt int8 file

* upate trt int8 file

* fixing ci

* fixing ci

* fixing ci

* fixing ci

* fixing ci

* fixing ci issue

* fixing ci issue

* fixing ci

* fixing ci issue

* fixing ci

* fixing ci problem

* fixing ci

* upate trt python int8 test file

* fixed ci

* fixed ci

* fix gpu build

* fixed ci

* update trt int8 test file

* fix bug

* fix bug

* update trtint8 file

* reformat

* update trt int8 file

* update

* modify
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
…T BYOC integration (apache#8808)
