@@ -2,7 +2,7 @@
title: ARM Compute Library (ACL)
parent: Execution Providers
grand_parent: Reference
-nav_order: 7
+nav_order: 8
---

# ACL Execution Provider

@@ -2,7 +2,7 @@
title: ARM NN
parent: Execution Providers
grand_parent: Reference
-nav_order: 8
+nav_order: 9
---

## ArmNN Execution Provider

docs/reference/execution-providers/CUDA-ExecutionProvider.md (90 additions, 0 deletions)
@@ -0,0 +1,90 @@
---
title: CUDA
parent: Execution Providers
grand_parent: Reference
nav_order: 1
---

# CUDA Execution Provider

The CUDA Execution Provider enables hardware accelerated computation on Nvidia CUDA-enabled GPUs.

## Build
For build instructions, please see the [BUILD page](../../how-to/build.md#CUDA).
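
A build or Python package that includes the CUDA EP will report it among the available providers. Below is a minimal check using the Python API (a sketch; the GPU wheel name is given only as an example):

```python
import onnxruntime as ort

# The CUDA EP is only present in GPU-enabled builds/packages (e.g. the onnxruntime-gpu wheel).
print(ort.get_available_providers())  # should include 'CUDAExecutionProvider'
print(ort.get_device())               # reports 'GPU' for a CUDA-enabled build
```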

## Configuration Options
The CUDA Execution Provider supports the following configuration options.

### device_id
The device ID.

Default value: 0

### cuda_mem_limit
The size limit of the device memory arena in bytes. This size limit is only for the execution provider's arena. The total device memory usage may be higher.
> Review comment from @hariharans29 (Member), Jan 8, 2021: Sorry for adding another comment. Maybe it is worth specifying the "default" values for each of these parameters? For example, reading the doc, I understand what the valid values for cudnn_conv_algo_search are and what each option means. What is hard to parse is the "default" value for each. #Resolved

Default value: max value of C++ size_t type (effectively unlimited)

### arena_extend_strategy
The strategy for extending the device memory arena.

Value | Description
-|-
kNextPowerOfTwo (0) | subsequent extensions extend by larger amounts (multiplied by powers of two)
kSameAsRequested (1) | extend by the requested amount

Default value: kNextPowerOfTwo

### cudnn_conv_algo_search
The type of search done for cuDNN convolution algorithms.

Value | Description
-|-
EXHAUSTIVE (0) | expensive exhaustive benchmarking using cudnnFindConvolutionForwardAlgorithmEx
HEURISTIC (1) | lightweight heuristic based search using cudnnGetConvolutionForwardAlgorithm_v7
DEFAULT (2) | default algorithm using CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM

Default value: EXHAUSTIVE

### do_copy_in_default_stream
Whether to do copies in the default stream or use separate streams. The recommended setting is true. If false, copies are performed on separate streams, which may improve performance but can introduce race conditions.

Default value: true

## Example Usage

### Python

```python
import onnxruntime as ort

model_path = '<path to model>'

providers = [
    ('CUDAExecutionProvider', {
        'device_id': 0,
        'arena_extend_strategy': 'kNextPowerOfTwo',
        'cuda_mem_limit': 2 * 1024 * 1024 * 1024,  # 2 GiB arena limit
        'cudnn_conv_algo_search': 'EXHAUSTIVE',
        'do_copy_in_default_stream': True,
    }),
    'CPUExecutionProvider',  # fallback for nodes the CUDA EP does not run
]

session = ort.InferenceSession(model_path, providers=providers)
```
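
If the CUDA EP cannot be loaded at runtime (for example, because the CUDA or cuDNN libraries are missing), the session may fall back to the remaining providers in the list. A small sketch to confirm which providers were actually registered (the model path is a placeholder):

```python
import onnxruntime as ort

session = ort.InferenceSession(
    '<path to model>',
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])

# Providers actually registered for this session, in priority order,
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'].
print(session.get_providers())
```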

### C/C++

```c++
OrtSessionOptions* session_options = /* ... */;

OrtCUDAProviderOptions options;
options.device_id = 0;
options.arena_extend_strategy = 0;  // 0 = kNextPowerOfTwo
options.cuda_mem_limit = (size_t)2 * 1024 * 1024 * 1024;  // 2 GiB; the cast avoids signed int overflow
options.cudnn_conv_algo_search = OrtCudnnConvAlgoSearch::EXHAUSTIVE;
options.do_copy_in_default_stream = 1;

SessionOptionsAppendExecutionProvider_CUDA(session_options, &options);
```
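
The snippet above leaves open how `session_options` is obtained and how the append call is resolved. With the plain C API, both go through the `OrtApi` function table; the following is a minimal sketch under the assumption of an ONNX Runtime 1.6-era header, with error handling mostly elided and the helper function name purely illustrative:

```c++
#include <onnxruntime_c_api.h>

void create_cuda_session_options(OrtSessionOptions** out) {
  const OrtApi* api = OrtGetApiBase()->GetApi(ORT_API_VERSION);

  OrtSessionOptions* session_options = nullptr;
  api->CreateSessionOptions(&session_options);  // returns an OrtStatus*; check it in real code

  OrtCUDAProviderOptions options{};
  options.device_id = 0;
  options.arena_extend_strategy = 0;  // 0 = kNextPowerOfTwo
  options.cuda_mem_limit = (size_t)2 * 1024 * 1024 * 1024;  // 2 GiB
  options.cudnn_conv_algo_search = OrtCudnnConvAlgoSearch::EXHAUSTIVE;
  options.do_copy_in_default_stream = 1;

  OrtStatus* status = api->SessionOptionsAppendExecutionProvider_CUDA(session_options, &options);
  if (status != nullptr) {
    // Appending the CUDA EP failed (e.g. a CPU-only build); inspect, then release the status.
    api->ReleaseStatus(status);
  }

  *out = session_options;
}
```

In C (rather than C++), drop the `OrtCudnnConvAlgoSearch::` qualifier and use `EXHAUSTIVE` directly, since the enum is unscoped.
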
@@ -2,7 +2,7 @@
title: Intel DNNL
parent: Execution Providers
grand_parent: Reference
-nav_order: 4
+nav_order: 5
---

# DNNL Execution Provider

@@ -2,7 +2,7 @@
title: Direct ML
parent: Execution Providers
grand_parent: Reference
-nav_order: 3
+nav_order: 4
---

# DirectML Execution Provider

@@ -2,7 +2,7 @@
title: AMD MI GraphX
parent: Execution Providers
grand_parent: Reference
-nav_order: 6
+nav_order: 7
---

# MIGraphX Execution Provider

@@ -2,7 +2,7 @@
title: NNAPI
parent: Execution Providers
grand_parent: Reference
-nav_order: 5
+nav_order: 6
---

docs/reference/execution-providers/Nuphar-ExecutionProvider.md (11 additions, 5 deletions)
@@ -2,7 +2,7 @@
title: Nuphar
parent: Execution Providers
grand_parent: Reference
-nav_order: 9
+nav_order: 10
---

# Nuphar Execution Provider (preview)
@@ -33,7 +33,13 @@ Ort::Session session(env, model_path, sf);

### Python

-You can use the Nuphar execution provider via the python wheel from the ONNX Runtime build. The Nuphar execution provider will be automatically prioritized over the default CPU execution providers, thus no need to separately register the execution provider. Python APIs details are [here](/python/api_summary).
```python
import onnxruntime as ort

model_path = '<path to model>'
providers = ['NupharExecutionProvider', 'CPUExecutionProvider']
session = ort.InferenceSession(model_path, providers=providers)
```

## Performance and Accuracy Testing

@@ -161,12 +167,12 @@ SessionOptions.MakeSessionOptionWithNupharProvider("nuphar_cache_path:/path/to/c

* Using in Python

-Settings string should be passed in before InferenceSession is created, as providers are not currently exposed yet. Here's an example in Python to set cache path and model checksum:
+The settings string can be set as an execution provider-specific option. Here's an example in Python that sets the cache path and model checksum:

```python
nuphar_settings = 'nuphar_cache_path:{}, nuphar_cache_model_checksum:{}'.format(cache_dir, model_checksum)
-onnxruntime.capi._pybind_state.set_nuphar_settings(nuphar_settings)
-sess = onnxruntime.InferenceSession(model_path)
+providers = [('NupharExecutionProvider', {'nuphar_settings': nuphar_settings}), 'CPUExecutionProvider']
+sess = onnxruntime.InferenceSession(model_path, providers=providers)
```

## Known issues
@@ -2,7 +2,7 @@
title: OpenVINO
parent: Execution Providers
grand_parent: Reference
-nav_order: 2
+nav_order: 3
---

# OpenVINO Execution Provider

@@ -2,7 +2,7 @@
title: RKNPU
parent: Execution Providers
grand_parent: Reference
-nav_order: 10
+nav_order: 11
---

# RKNPU Execution Provider (preview)

@@ -2,7 +2,7 @@
title: TensorRT
parent: Execution Providers
grand_parent: Reference
-nav_order: 1
+nav_order: 2
---

# TensorRT Execution Provider

@@ -2,7 +2,7 @@
title: Vitis AI
parent: Execution Providers
grand_parent: Reference
-nav_order: 11
+nav_order: 12
---

# Vitis-AI Execution Provider