
Conversation

chilo-ms (Contributor) commented on Jan 15, 2022

TensorRT provides a timing cache feature to reduce builder time by retaining layer-profiling information from the builder phase. This PR adds the timing cache feature to ORT-TRT.

Also note that ORT no longer uses the OrtTensorRTProviderOptions struct for the TRT EP when adding new provider options. Instead, it uses the opaque OrtTensorRTProviderOptionsV2 struct internally for setting provider options, which can be converted to and from strings.
Please see #7808 and #10188 for more details and context.
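
For illustration, a minimal Python sketch of enabling the timing cache through the string-based provider options that ORT maps onto the opaque V2 struct; the model path is a placeholder, and the engine-cache flag is included only because the two caches are often used together:

```python
import onnxruntime as ort

model_path = "model.onnx"  # placeholder; any ONNX model works here

# These string options are converted internally onto the opaque
# OrtTensorRTProviderOptionsV2 struct.
trt_options = {
    "trt_timing_cache_enable": "true",   # keep layer-timing info across builds
    "trt_engine_cache_enable": "true",   # optional: also cache built engines
}

session = ort.InferenceSession(
    model_path,
    providers=[("TensorrtExecutionProvider", trt_options)],
)
```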

chilo-ms (Contributor, Author) commented on Jan 15, 2022

Will add unit tests for this feature.

jywu-msft (Member) commented on Jan 19, 2022

> Will add unit tests for this feature.

Yes, we definitely need some test cases here. We also need to test in conjunction with the engine cache enabled/disabled; a sketch of such a combination check follows below.
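
As an illustrative sketch only (not the actual ORT unit tests, which live in the C++ test suite), a combination check over the two cache flags could look like this; the model path is a placeholder:

```python
import itertools
import time

import onnxruntime as ort

model_path = "model.onnx"  # placeholder model

# Try every combination of engine cache and timing cache.
for engine_cache, timing_cache in itertools.product(["true", "false"], repeat=2):
    opts = {
        "trt_engine_cache_enable": engine_cache,
        "trt_timing_cache_enable": timing_cache,
    }
    start = time.perf_counter()
    ort.InferenceSession(
        model_path,
        providers=[("TensorrtExecutionProvider", opts)],
    )
    elapsed = time.perf_counter() - start
    print(f"engine_cache={engine_cache} timing_cache={timing_cache}: {elapsed:.1f}s")
```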

chilo-ms (Contributor, Author) commented on Feb 5, 2022

> Will add unit tests for this feature.

> Yes, we definitely need some test cases here. We also need to test in conjunction with the engine cache enabled/disabled.

Test cases for the timing cache have been added.

stale bot commented on Apr 16, 2022

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

The stale bot added the stale label (issues that have not been addressed in a while; categorized by a bot) on Apr 16, 2022
chilo-ms added a commit that referenced this pull request Mar 10, 2023
### Description

This enables a user to use a TensorRT timing cache, based on #10297, to accelerate build times on a device with the same compute capability. It works across models, as the cache simply stores kernel runtimes for specific configurations. The cache files are usually very small (only a few MB), which makes them easy to ship with an application to accelerate build times on the user's end.
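
A minimal sketch of the cross-model reuse described above, assuming two placeholder models and that the cache file written during the first build is picked up automatically on the second (where the cache file lands can vary by ORT version):

```python
import time

import onnxruntime as ort

# Two different models with a similar layer mix; paths are placeholders.
for path in ("efficientnet.onnx", "yolov4.onnx"):
    start = time.perf_counter()
    ort.InferenceSession(
        path,
        providers=[
            ("TensorrtExecutionProvider", {"trt_timing_cache_enable": "true"})
        ],
    )
    print(f"{path}: built in {time.perf_counter() - start:.1f}s")
    # The second build reuses kernel timings recorded during the first,
    # so it should be noticeably faster even though the model differs.
```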

### Motivation and Context
Especially for workstation use cases, TRT build times can be a roadblock. With a few models from the ONNX Model Zoo, I evaluated the speedup when a timing cache is present:

`./build/onnxruntime_perf_test -e tensorrt -I -t 5 -i "trt_timing_cache_enable|true" <onnx_path>`

| Model | No cache | With cache |
| ------------- | ------------- | ------------- |
| efficientnet-lite4-11 | 34.6 s | 7.7 s |
| yolov4 | 108.62 s | 9.4 s |

To capture this, I had to modify onnxruntime_perf_test. The time is sometimes not captured within "Session creation time cost:", which is why I introduced "First inference time cost:".
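
As a sketch of what that measurement distinguishes (model path and input are placeholders, and float32 zero input is an assumption): the TRT engine build may happen lazily on the first run rather than at session creation, so the two are timed separately:

```python
import time

import numpy as np
import onnxruntime as ort

model_path = "model.onnx"  # placeholder

start = time.perf_counter()
sess = ort.InferenceSession(
    model_path,
    providers=[("TensorrtExecutionProvider", {"trt_timing_cache_enable": "true"})],
)
print(f"Session creation time cost: {time.perf_counter() - start:.2f}s")

# Build a dummy input; dynamic dimensions are resolved to 1.
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
data = np.zeros(shape, dtype=np.float32)  # assumes a float32 input

# The engine build cost often shows up here rather than above.
start = time.perf_counter()
sess.run(None, {inp.name: data})
print(f"First inference time cost: {time.perf_counter() - start:.2f}s")
```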

---------

Co-authored-by: Chi Lo <[email protected]>
gedoensmax (Contributor) commented

I think we can close this due to #14767, right?

The stale bot removed the stale label (issues that have not been addressed in a while; categorized by a bot) on Mar 16, 2023
chilo-ms (Contributor, Author) commented

> I think we can close this due to #14767, right?

Yes, we can.

chilo-ms closed this on Mar 16, 2023