[BYOC][TensorRT] Add TensorRT own int8 calibration support to TensorRT BYOC integration #8808
Conversation
@tiandiao123 thanks for adding this support. Please add details to the PR description. @trevor-m it would be great if you could help sharpen this PR :)
cc @FrozenGene
Please add some performance numbers for the models you tested.
@jcf94 please help to review it too as you have implemented similar functionality
Thanks! @tiandiao123 @FrozenGene
It's great to see more people working on TVM's TensorRT integration. The overall implementation is good, but let's make it better before it gets accepted.
Also, a basic unit test is needed in tests/python/contrib/test_tensorrt.py. 😄
```
@@ -125,17 +147,26 @@ class TensorRTRuntime : public JSONRuntimeBase {
  /*! \brief Run inference using built engine. */
  void Run() override {
```
I suggest creating the calibrator and processing the input information here; if there are still remaining calibration batches, just return.
This avoids building an engine, destroying it, and then building it again.
We have to build the engine first to know the input binding order that TensorRT assigns to the inputs. It might not match TVM's input signature directly.
This is the same strategy used by TF-TRT.
> We have to build the engine first to know the input binding order that TensorRT assigns to the inputs. It might not match TVM's input signature directly.
> This is the same strategy used by TF-TRT.
Yeah, I mean we can also put all of the input-information handling here, using something like `unordered_map<string, void*>` and `unordered_map<string, size_t>` to create the calibrator and do the set_batch here. During real inference, that information is no longer needed.
The TRT `get_batch` function has a `const char* names[]` parameter that can be used when fetching the data, so the binding order does not matter in this approach.
At least in TF 2.4.1, they used:

```cpp
// Construct a calibrator for future calibration.
TRTInt8Calibrator(
    const std::unordered_map<string, std::pair<void*, size_t>>& dev_buffers,
    int batch_size, string engine_name);

// Feed calibration data to the calibrator, and return true if the data is
// accepted. Return false if the calibrator has been terminated.
bool setBatch(const std::unordered_map<string, void*>& data,
              const cudaStream_t stream);
```
I think that would be a good example for reference.
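For comparison, here is a minimal sketch of the same name-keyed idea against TensorRT's Python API (illustrative only, not the code in this PR; it assumes pycuda for device buffers and that each calibration batch is a dict keyed by input name):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt


class NamedBatchCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds calibration data keyed by input name, so TensorRT's binding
    order never has to match the framework's input order."""

    def __init__(self, batches):
        super().__init__()
        self.batches = batches  # list of {input_name: np.ndarray}
        self.dev_bufs = {}      # input_name -> reused device allocation
        self.idx = 0

    def get_batch_size(self):
        return next(iter(self.batches[0].values())).shape[0]

    def get_batch(self, names):
        if self.idx >= len(self.batches):
            return None  # no batches left: calibration is finished
        batch = self.batches[self.idx]
        self.idx += 1
        ptrs = []
        for name in names:  # TensorRT asks for each input by name
            arr = np.ascontiguousarray(batch[name])
            if name not in self.dev_bufs:
                # buffers are reused across batches (shapes assumed fixed)
                self.dev_bufs[name] = cuda.mem_alloc(arr.nbytes)
            cuda.memcpy_htod(self.dev_bufs[name], arr)
            ptrs.append(int(self.dev_bufs[name]))
        return ptrs

    def read_calibration_cache(self):
        return None  # always recalibrate in this sketch

    def write_calibration_cache(self, cache):
        pass
```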
Thanks! @tiandiao123 LGTM now.
@FrozenGene @trevor-m Could you update your reviews?
Overall LGTM. Left two nits and I think they can be fixed in a follow-up PR.
```
@@ -153,6 +153,9 @@ class TensorRTBuilder {
  /*! \brief Whether to automatically convert model to 16-bit floating point precision. */
  bool use_fp16_;

  /*! \brief whether to automatically convert model to int8 precision */
```
IIUC, `use_int8_` is mutually exclusive with `use_fp16_`? If so, we should combine them into a single variable like `target_dtype`.
I dismissed this change request to unblock this PR, as @trevor-m is on PTO. We can address the remaining comments in follow-up PRs if needed.
Hi, this commit adds calibration and TensorRT int8 mode to the existing src/runtime/contrib/tensorrt runtime (thanks to @trevor-m for the help). I tested several models, including mobilenet_v2 and resnet34, and all of them show a clear performance gain. Here is my GitHub gist for testing the TRT int8 results: https://gist.github.com/tiandiao123/67adb11ab3d73df8e83a1469707d7db4
First, users need to provide calibration data, then run their program with two environment variables set: TVM_TENSORRT_USE_INT8 controls whether TRT int8 is used, and TENSORRT_NUM_CALI_INT8 sets how many rounds of data calibration to run (10 in the example).
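For illustration, these could be set from Python before building and running the module (a hypothetical setup; the exact command line and script are your own):

```python
import os

# Assumed usage, matching the variable names in the description above.
os.environ["TVM_TENSORRT_USE_INT8"] = "1"    # enable TensorRT int8 mode
os.environ["TENSORRT_NUM_CALI_INT8"] = "10"  # run 10 calibration rounds
```

Per the description, the first TENSORRT_NUM_CALI_INT8 runs feed calibration data, and later runs use the calibrated int8 engine.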
Here are some testing results (measured on a T4 machine with TRT 7):
resnet18 python testing script: https://gist.github.com/tiandiao123/c535b62918677d0a7c45fc005a44f27d
super_resolution testing script: https://gist.github.com/tiandiao123/de52ef96f574645c2dccb3544b291487
Also, in my script I used a single image for calibration, repeated 10 times (just for demonstration). In a real project, users should calibrate with their own data.
Also, I used this test file (https://github.com/tiandiao123/tvm/blob/pr_trt_int8/tests/python/contrib/test_tensorrt_int8_exp.py) to test the TRT int8 precision on several kinds of ResNet. I measured the cosine distance between the outputs of the original Torch models and the corresponding TRT int8 TVM outputs, and all of them are within expectation, with similarities greater than 0.99. The calibration results depend on which model we choose, what sample data we calibrate with, and how many calibration rounds we run; in my experience, the number of calibration rounds also affects the final result. We need to make sure the calibration data comes from the model's actual training or evaluation data, otherwise the cosine distance suffers. @jcf94
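For reference, the similarity check described above can be as simple as the following (a minimal sketch, not the exact code from the gist):

```python
import numpy as np

def cosine_similarity(a, b):
    """Compare the flattened fp32 reference output against the int8
    output; values near 1.0 (distance near 0) mean the int8 engine
    closely tracks the reference."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```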
I also used the Python script above to test densenet161 and mobilenet_v2; both show a cosine distance around 0.02~0.03. Since the image I used comes from the tutorial (https://tvm.apache.org/docs/tutorials/frontend/from_pytorch.html#sphx-glr-tutorials-frontend-from-pytorch-py) and is actually from the ResNet evaluation data, we would probably need sample data appropriate to densenet161 and mobilenet_v2 to get a representative cosine distance, but overall the results are within expectation.