EP documentation updates #6253
@@ -0,0 +1,90 @@
---
title: CUDA
parent: Execution Providers
grand_parent: Reference
nav_order: 1
---

# CUDA Execution Provider

The CUDA Execution Provider enables hardware-accelerated computation on Nvidia CUDA-enabled GPUs.

## Build

For build instructions, please see the [BUILD page](../../how-to/build.md#CUDA).

## Configuration Options

The CUDA Execution Provider supports the following configuration options.

### device_id

The device ID.

Default value: 0
### cuda_mem_limit

The size limit of the device memory arena, in bytes. This limit applies only to the execution provider's arena; total device memory usage may be higher.

> **Review comment (member):** Sorry for adding another comment. Maybe it is worth specifying the "default" values for each of these parameters? For example, reading the doc, I understand what the valid values for

Default value: max value of C++ size_t type (effectively unlimited)
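Since the limit is expressed in raw bytes, spelling out the arithmetic keeps configurations readable. A minimal sketch in plain Python arithmetic; `gib_to_bytes` is an illustrative helper for this document, not part of the onnxruntime API:

```python
# Illustrative helper (not part of onnxruntime): convert a GiB count to
# the byte value expected by cuda_mem_limit.
def gib_to_bytes(gib: int) -> int:
    return gib * 1024 ** 3

# A 2 GiB arena cap, matching the value used in the usage examples
# in this document.
limit = gib_to_bytes(2)
print(limit)  # 2147483648

# On a 64-bit build the default (max size_t) is effectively unlimited.
print(2 ** 64 - 1)
```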
### arena_extend_strategy

The strategy for extending the device memory arena.

Value | Description
-|-
kNextPowerOfTwo (0) | subsequent extensions extend by larger amounts (multiplied by powers of two)
kSameAsRequested (1) | extend by the requested amount

Default value: kNextPowerOfTwo
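The practical difference between the two strategies can be seen with a toy model of a single extension request. This is only an illustrative sketch of the growth behavior, not the actual arena allocator code:

```python
# Toy model (illustrative only) of how much the arena grows for one
# extension request under each strategy.
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def extension_size(requested_bytes: int, strategy: str) -> int:
    if strategy == "kNextPowerOfTwo":
        # Round the request up to a power-of-two-sized chunk, so repeated
        # extensions grow in larger and larger steps.
        return next_power_of_two(requested_bytes)
    if strategy == "kSameAsRequested":
        # Grow by exactly the requested amount.
        return requested_bytes
    raise ValueError(f"unknown strategy: {strategy}")

print(extension_size(3_000_000, "kNextPowerOfTwo"))   # 4194304
print(extension_size(3_000_000, "kSameAsRequested"))  # 3000000
```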
### cudnn_conv_algo_search

The type of search done for cuDNN convolution algorithms.

Value | Description
-|-
EXHAUSTIVE (0) | expensive exhaustive benchmarking using cudnnFindConvolutionForwardAlgorithmEx
HEURISTIC (1) | lightweight heuristic-based search using cudnnGetConvolutionForwardAlgorithm_v7
DEFAULT (2) | default algorithm using CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM

Default value: EXHAUSTIVE
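If EXHAUSTIVE benchmarking makes session creation too slow, a lighter search mode can be selected through the same provider-options dictionary used elsewhere in this document. A hedged sketch that only builds the options structure (no GPU required to construct it):

```python
# Sketch: opting into the lightweight heuristic search. This only builds
# the provider configuration; it would then be passed to
# ort.InferenceSession(model_path, providers=providers).
providers = [
    ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "HEURISTIC"}),
    "CPUExecutionProvider",
]
print(providers[0][1]["cudnn_conv_algo_search"])  # HEURISTIC
```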
### do_copy_in_default_stream

Whether to do copies in the default stream or use separate streams. The recommended setting is true; setting it to false may improve performance but can introduce race conditions.

Default value: true

## Example Usage

### Python
```python
import onnxruntime as ort

model_path = '<path to model>'

providers = [
    ('CUDAExecutionProvider', {
        'device_id': 0,
        'arena_extend_strategy': 'kNextPowerOfTwo',
        'cuda_mem_limit': 2 * 1024 * 1024 * 1024,
        'cudnn_conv_algo_search': 'EXHAUSTIVE',
        'do_copy_in_default_stream': True,
    }),
    'CPUExecutionProvider',
]

session = ort.InferenceSession(model_path, providers=providers)
```
### C/C++

```c++
OrtSessionOptions* session_options = /* ... */;

OrtCUDAProviderOptions options;
options.device_id = 0;
options.arena_extend_strategy = 0;
// Use an unsigned 64-bit literal: 2 * 1024 * 1024 * 1024 with plain int
// literals would overflow a 32-bit int.
options.cuda_mem_limit = 2ULL * 1024 * 1024 * 1024;
options.cudnn_conv_algo_search = OrtCudnnConvAlgoSearch::EXHAUSTIVE;
options.do_copy_in_default_stream = 1;

SessionOptionsAppendExecutionProvider_CUDA(session_options, &options);
```
The NNAPI execution provider page's front matter is renumbered:

@@ -2,7 +2,7 @@
 title: NNAPI
 parent: Execution Providers
 grand_parent: Reference
-nav_order: 5
+nav_order: 6
 ---