[BYOC][TensorRT] Add TensorRT own int8 calibration support to TensorRT BYOC integration #8808
Conversation
@tiandiao123 thanks for adding this support. Please add details to the PR description. @trevor-m it would be great if you could help sharpen this PR :)
cc @FrozenGene
Please add some performance numbers for the models you tested.
@jcf94 please help to review it too as you have implemented similar functionality
Thanks! @tiandiao123 @FrozenGene
It's great to see more people working on TVM's TensorRT integration. The overall implementation is good, but let's make it better before it gets accepted.
Also, a basic unit test is needed in tests/python/contrib/test_tensorrt.py. 😄
```
@@ -125,17 +147,26 @@ class TensorRTRuntime : public JSONRuntimeBase {
  /*! \brief Run inference using built engine. */
  void Run() override {
```
I suggest creating the calibrator and processing the input information here; if there are still remaining calibration batches, just return.
This avoids building an engine, destroying it, and then building it again.
We have to build the engine first to know the input binding order that TensorRT assigns to the inputs. It might not match TVM's input signature directly.
This is the same strategy used by TF-TRT.
> We have to build the engine first to know the input binding order that TensorRT assigns to the inputs. It might not match TVM's input signature directly.
> This is the same strategy used by TF-TRT.
Yeah, I mean we can also put all of the input-information handling here, using something like `unordered_map<string, void*>` and `unordered_map<string, size_t>` to create the calibrator and do the set_batch here. During real inference, that information is no longer needed.
The TRT `get_batch` function has a `const char* names[]` parameter that can be used when fetching the data, so the binding order does not matter in this approach.
At least in TF 2.4.1, they used:

```cpp
// Construct a calibrator for future calibration.
TRTInt8Calibrator(
    const std::unordered_map<string, std::pair<void*, size_t>>& dev_buffers,
    int batch_size, string engine_name);

// Feed calibration data to the calibrator, and return true if the data is
// accepted. Return false if the calibrator has been terminated.
bool setBatch(const std::unordered_map<string, void*>& data,
              const cudaStream_t stream);
```
I think that would be a good example for reference.
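For comparison, here is a minimal sketch of the same name-keyed idea against TensorRT's Python API (illustrative only, not the code in this PR; it assumes pycuda for device buffers and that each calibration batch is a dict keyed by input name):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt


class NamedBatchCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds calibration data keyed by input name, so TensorRT's binding
    order never has to match the framework's input order."""

    def __init__(self, batches):
        super().__init__()
        self.batches = batches  # list of {input_name: np.ndarray}
        self.dev_bufs = {}      # input_name -> reused device allocation
        self.idx = 0

    def get_batch_size(self):
        return next(iter(self.batches[0].values())).shape[0]

    def get_batch(self, names):
        if self.idx >= len(self.batches):
            return None  # no batches left: calibration is finished
        batch = self.batches[self.idx]
        self.idx += 1
        ptrs = []
        for name in names:  # TensorRT asks for each input by name
            arr = np.ascontiguousarray(batch[name])
            if name not in self.dev_bufs:
                # buffers are reused across batches (shapes assumed fixed)
                self.dev_bufs[name] = cuda.mem_alloc(arr.nbytes)
            cuda.memcpy_htod(self.dev_bufs[name], arr)
            ptrs.append(int(self.dev_bufs[name]))
        return ptrs

    def read_calibration_cache(self):
        return None  # always recalibrate in this sketch

    def write_calibration_cache(self, cache):
        pass
```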
Thanks! @tiandiao123 LGTM now.
@FrozenGene @trevor-m Could you update your reviews?
Overall LGTM. Left two nits and I think they can be fixed in a follow-up PR.
```
@@ -153,6 +153,9 @@ class TensorRTBuilder {
  /*! \brief Whether to automatically convert model to 16-bit floating point precision. */
  bool use_fp16_;

  /*! \brief whether to automatically convert model to int8 precision */
```
IIUC, `use_int8_` is mutually exclusive with `use_fp16_`? If so, we should combine them into a single variable like `target_dtype`.
I dismissed this change request to unblock this PR, as @trevor-m is on PTO. We can address the remaining comments in follow-up PRs if needed.
Hi, this commit adds calibration and TensorRT int8 mode to the existing src/runtime/contrib/tensorrt runtime (thanks to @trevor-m for the help). I tested several models, including mobilenet_v2 and resnet34, and all of them show a clear performance gain. Here is my GitHub gist for testing the TRT int8 results: https://gist.github.com/tiandiao123/67adb11ab3d73df8e83a1469707d7db4
First, users need to provide calibration data, then run their program with two environment variables set: TVM_TENSORRT_USE_INT8 controls whether TRT int8 is used, and TENSORRT_NUM_CALI_INT8 sets how many rounds of data calibration to run (10 in the example).
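For illustration, these could be set from Python before building and running the module (a hypothetical setup; the exact command line and script are your own):

```python
import os

# Assumed usage, matching the variable names in the description above.
os.environ["TVM_TENSORRT_USE_INT8"] = "1"    # enable TensorRT int8 mode
os.environ["TENSORRT_NUM_CALI_INT8"] = "10"  # run 10 calibration rounds
```

Per the description, the first TENSORRT_NUM_CALI_INT8 runs feed calibration data, and later runs use the calibrated int8 engine.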
Here are some testing results (measured on a T4 machine with TRT 7):
resnet18 python testing script: https://gist.github.com/tiandiao123/c535b62918677d0a7c45fc005a44f27d
super_resolution testing script: https://gist.github.com/tiandiao123/de52ef96f574645c2dccb3544b291487
Also, in my script I used a single image for calibration, repeated 10 times (just for demonstration). In a real project, users should calibrate with their own data.
Also, I used this test file (https://github.com/tiandiao123/tvm/blob/pr_trt_int8/tests/python/contrib/test_tensorrt_int8_exp.py) to test the TRT int8 precision on several kinds of ResNet. I measured the cosine distance between the outputs of the original Torch models and the corresponding TRT int8 TVM outputs, and all of them are within expectation, with similarities greater than 0.99. The calibration results depend on which model we choose, what sample data we calibrate with, and how many calibration rounds we run; in my experience, the number of calibration rounds also affects the final result. We need to make sure the calibration data comes from the model's actual training or evaluation data, otherwise the cosine distance suffers. @jcf94
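For reference, the similarity check described above can be as simple as the following (a minimal sketch, not the exact code from the gist):

```python
import numpy as np

def cosine_similarity(a, b):
    """Compare the flattened fp32 reference output against the int8
    output; values near 1.0 (distance near 0) mean the int8 engine
    closely tracks the reference."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```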
I also used the Python script above to test densenet161 and mobilenet_v2; both show a cosine distance around 0.02~0.03. Since the image I used comes from the tutorial (https://tvm.apache.org/docs/tutorials/frontend/from_pytorch.html#sphx-glr-tutorials-frontend-from-pytorch-py) and is actually from the ResNet evaluation data, we would probably need sample data appropriate to densenet161 and mobilenet_v2 to get a representative cosine distance, but overall the results are within expectation.