Merged (33 commits)
- 3dad979 initial test (Sep 8, 2022)
- 75a725b test (Sep 9, 2022)
- c6a1def [Need fix] use temporary link (PeixuanZuo, Sep 9, 2022)
- 498bb9e [add] add rocm buiild (PeixuanZuo, Sep 9, 2022)
- 33b512e Merge pull request #1 from PeixuanZuo/rocm-ep (ytaous, Sep 9, 2022)
- fef8d86 execution providers (Sep 12, 2022)
- f6391a9 execution providers (Sep 12, 2022)
- 9ff0598 execution providers (Sep 12, 2022)
- 72801d3 execution providers (Sep 12, 2022)
- 7df41be execution providers (Sep 12, 2022)
- f967ab8 execution providers (Sep 12, 2022)
- 3a4b1d5 execution providers (Sep 12, 2022)
- d81de75 execution providers (Sep 12, 2022)
- fc41a7b ort install page (Sep 12, 2022)
- 15b9e35 rocm ep page (Sep 12, 2022)
- 0f45985 rocm ep page (Sep 13, 2022)
- b2dff8d Merge branch 'rocm-ep' of https://github.com/ytaous/onnxruntime into … (PeixuanZuo, Sep 13, 2022)
- 8cc80b9 [Update] build training (PeixuanZuo, Sep 13, 2022)
- 2fe4faf [Update] build training (PeixuanZuo, Sep 13, 2022)
- 4c50332 [Add] add mircobench (PeixuanZuo, Sep 13, 2022)
- d2c0700 [Add] add rocm optimization (PeixuanZuo, Sep 13, 2022)
- bde33d9 [Update] format (PeixuanZuo, Sep 13, 2022)
- cd240e5 [Update] optimization (PeixuanZuo, Sep 13, 2022)
- 54060fb [Update] update performance (PeixuanZuo, Sep 13, 2022)
- 5844ed9 [Update] rm microbench and update rocm profiling (PeixuanZuo, Sep 14, 2022)
- a305bcb Merge pull request #2 from ytaous/peixuanzuo/add_performance (PeixuanZuo, Sep 14, 2022)
- 68795ab [Update] update dockerfile.rocm link (PeixuanZuo, Sep 15, 2022)
- f387993 [Update] rocm execution provider (PeixuanZuo, Sep 15, 2022)
- c5d00ca [Fix] ffix doc (PeixuanZuo, Sep 19, 2022)
- 5b33d8a [Fix] ep selection format (PeixuanZuo, Sep 19, 2022)
- a21b72f [Fix] ep selection format (PeixuanZuo, Sep 19, 2022)
- 0e5518b [Fix] ep selection format (PeixuanZuo, Sep 19, 2022)
- eab192b [Update] variable name (PeixuanZuo, Sep 20, 2022)
21 changes: 21 additions & 0 deletions docs/build/eps.md
@@ -650,6 +650,27 @@ See more information on the MIGraphX Execution Provider [here](../execution-prov

Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/blob/master/dockerfiles#migraphx).

## AMD ROCm

See more information on the ROCm Execution Provider [here](../execution-providers/ROCm-ExecutionProvider.md).

### Prerequisites
{: .no_toc }

* Install [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.2.3/page/How_to_Install_ROCm.html#_How_to_Install)
* The ROCm execution provider for ONNX Runtime is built and tested with ROCm 5.2.3

### Build Instructions
{: .no_toc }

#### Linux

```bash
./build.sh --config <Release|Debug|RelWithDebInfo> --use_rocm --rocm_home <path to ROCm home>
```

Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/tree/main/dockerfiles#rocm).

## NNAPI

Usage of NNAPI on Android platforms is via the NNAPI Execution Provider (EP).
8 changes: 3 additions & 5 deletions docs/build/training.md
@@ -76,18 +76,16 @@ These dependency versions should reflect what is in the [Dockerfiles](https://gi

This produces the .whl file in `./build/Linux/RelWithDebInfo/dist` for ONNX Runtime Training.

## GPU / ROCM
## GPU / ROCm
### Prerequisites
{: .no_toc }

The default AMD GPU build requires ROCM software toolkit installed on the system:
The default AMD GPU build requires the ROCm software toolkit to be installed on the system:

* [ROCM](https://rocmdocs.amd.com/en/latest/)
* [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.2.3/page/How_to_Install_ROCm.html#_How_to_Install) 5.2.3
* [OpenMPI](https://www.open-mpi.org/) 4.0.4
* See [install_openmpi.sh](https://github.com/microsoft/onnxruntime/blob/master/tools/ci_build/github/linux/docker/scripts/install_openmpi.sh)

These dependency versions should reflect what is in the [Dockerfiles](https://github.com/pytorch/ort/tree/main/docker).

### Build instructions
{: .no_toc }

71 changes: 71 additions & 0 deletions docs/execution-providers/ROCm-ExecutionProvider.md
@@ -0,0 +1,71 @@
---
title: ROCm (AMD)
description: Instructions to execute ONNX Runtime with the AMD ROCm execution provider
parent: Execution Providers
nav_order: 11
redirect_from: /docs/reference/execution-providers/ROCm-ExecutionProvider
---

# ROCm Execution Provider
{: .no_toc }

The ROCm Execution Provider enables hardware-accelerated computation on AMD ROCm-enabled GPUs.

## Contents
{: .no_toc }

* TOC placeholder
{:toc}

## Install

Pre-built binaries of ONNX Runtime with the ROCm EP are published for most language bindings. Please reference [Install ORT](../install).

## Requirements


|ONNX Runtime|ROCm|
|---|---|
|main|5.2.3|
|1.12|5.2.3|
|1.12|5.2|


## Build
For build instructions, please see the [BUILD page](../build/eps.md#amd-rocm).

## Usage

### C/C++

```c++
Ort::Env env = Ort::Env{ORT_LOGGING_LEVEL_ERROR, "Default"};
Ort::SessionOptions so;
int device_id = 0;
Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_ROCm(so, device_id));
```

The C API details are [here](../get-started/with-c.md).

### Python
Python API details are [here](https://onnxruntime.ai/docs/api/python/api_summary.html).

## Performance Tuning
For performance tuning, please see guidance on this page: [ONNX Runtime Perf Tuning](../performance/tune-performance.md)
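Beyond the general tuning guidance, ONNX Runtime's built-in profiler can attribute time to individual nodes. The helper below is a minimal sketch for working with a profile that has already been parsed from the JSON file ORT writes (Chrome-tracing-style events with `cat`, `name`, and `dur` fields); the function name and the commented file name are illustrative, not part of the ORT API.

```python
from collections import defaultdict

def summarize_node_durations(events):
    """Sum the 'dur' (microseconds) of Node-category trace events by node name.

    `events` is the parsed event list from a profile JSON file such as the
    one ONNX Runtime writes when profiling is enabled on the session.
    """
    totals = defaultdict(int)
    for event in events:
        if event.get("cat") == "Node":
            totals[event["name"]] += event.get("dur", 0)
    return dict(totals)

# Hypothetical usage with a profile file produced by a profiled session:
#   import json
#   with open("onnxruntime_profile.json") as f:
#       print(summarize_node_durations(json.load(f)))
```

Sorting the resulting totals quickly surfaces which fused or unfused nodes dominate runtime on the ROCm device.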

## Samples

### Python

```python
import onnxruntime as ort

model_path = '<path to model>'

providers = [
'ROCmExecutionProvider',
'CPUExecutionProvider',
]

session = ort.InferenceSession(model_path, providers=providers)
```
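When EP-specific options need to be set, the Python API also accepts `(name, options)` tuples in the `providers` list. The sketch below only builds that list; treating `device_id` as a ROCm EP option here is an assumption made to illustrate the form, and the commented session call shows where the list would be used.

```python
def rocm_providers(device_id=0):
    """Provider list that prefers the ROCm EP, with CPU fallback.

    The (name, options) tuple form passes EP-specific options;
    'device_id' (an illustrative option) selects the target AMD GPU.
    """
    return [
        ("ROCmExecutionProvider", {"device_id": device_id}),
        "CPUExecutionProvider",
    ]

# Hypothetical usage:
#   session = ort.InferenceSession(model_path, providers=rocm_providers(0))
#   session.get_providers()  # reports which EPs were actually loaded
```

Checking `session.get_providers()` after creation confirms whether the ROCm EP was loaded or the session silently fell back to CPU.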
2 changes: 1 addition & 1 deletion docs/execution-providers/SNPE-ExecutionProvider.md
@@ -2,7 +2,7 @@
title: SNPE (Qualcomm)
description: Execute ONNX models with SNPE Execution Provider
parent: Execution Providers
nav_order: 11
nav_order: 12
redirect_from: /docs/reference/execution-providers/SNPE-ExecutionProvider
---

2 changes: 1 addition & 1 deletion docs/execution-providers/TVM-ExecutionProvider.md
@@ -2,7 +2,7 @@
title: TVM (Apache)
description: Instructions to execute ONNX Runtime with the Apache TVM execution provider
parent: Execution Providers
nav_order: 13
nav_order: 14
---

# TVM Execution Provider
2 changes: 1 addition & 1 deletion docs/execution-providers/TensorRT-ExecutionProvider.md
@@ -2,7 +2,7 @@
title: TensorRT (NVIDIA)
description: Instructions to execute ONNX Runtime on NVIDIA GPUs with the TensorRT execution provider
parent: Execution Providers
nav_order: 12
nav_order: 13
redirect_from: /docs/reference/execution-providers/TensorRT-ExecutionProvider
---

2 changes: 1 addition & 1 deletion docs/execution-providers/Vitis-AI-ExecutionProvider.md
@@ -2,7 +2,7 @@
title: Vitis AI
description: Instructions to execute ONNX Runtime on Xilinx devices with the Vitis AI execution provider
parent: Execution Providers
nav_order: 14
nav_order: 15
redirect_from: /docs/reference/execution-providers/Vitis-AI-ExecutionProvider
---

6 changes: 3 additions & 3 deletions docs/execution-providers/index.md
@@ -32,9 +32,9 @@ ONNX Runtime supports many different execution providers today. Some of the EPs
|[Intel DNNL](../execution-providers/oneDNN-ExecutionProvider.md)|[NVIDIA TensorRT](../execution-providers/TensorRT-ExecutionProvider.md)|[ARM Compute Library](../execution-providers/ACL-ExecutionProvider.md) (*preview*)|[Xilinx Vitis-AI](../execution-providers/Vitis-AI-ExecutionProvider.md) (*preview*)|
|[TVM](../execution-providers/TVM-ExecutionProvider.md) (*preview*)|[DirectML](../execution-providers/DirectML-ExecutionProvider.md)|[Android Neural Networks API](../execution-providers/NNAPI-ExecutionProvider.md)||
|[Intel OpenVINO](../execution-providers/OpenVINO-ExecutionProvider.md)|[AMD MIGraphX](../execution-providers/MIGraphX-ExecutionProvider.md) (*preview*)|[ARM-NN](../execution-providers/ArmNN-ExecutionProvider.md) (*preview*)|
||[TVM](../execution-providers/TVM-ExecutionProvider.md) (*preview*)|[CoreML](../execution-providers/CoreML-ExecutionProvider.md) (*preview*)|
||[Intel OpenVINO](../execution-providers/OpenVINO-ExecutionProvider.md)|[TVM](../execution-providers/TVM-ExecutionProvider.md) (*preview*)|
|||[Qualcomm SNPE](../execution-providers/SNPE-ExecutionProvider.md)
||[AMD ROCm](../execution-providers/ROCm-ExecutionProvider.md) (*preview*)|[CoreML](../execution-providers/CoreML-ExecutionProvider.md) (*preview*)|
||[TVM](../execution-providers/TVM-ExecutionProvider.md) (*preview*)|[TVM](../execution-providers/TVM-ExecutionProvider.md) (*preview*)|
||[Intel OpenVINO](../execution-providers/OpenVINO-ExecutionProvider.md)|[Qualcomm SNPE](../execution-providers/SNPE-ExecutionProvider.md)

### Add an Execution Provider

3 changes: 3 additions & 0 deletions docs/install/index.md
@@ -250,3 +250,6 @@ The _location_ needs to be specified for any specific version other than the def
|PyTorch 1.9 (CUDA 11.1)|[**onnxruntime_stable_torch190.cu111**](https://onnxruntimepackages.z14.web.core.windows.net/onnxruntime_stable_torch190.cu111.html)|[onnxruntime_nightly_torch190.cu111](https://onnxruntimepackages.z14.web.core.windows.net/onnxruntime_nightly_torch190.cu111.html)|
|[*Preview*] PyTorch 1.8.1 (ROCm 4.2)|[**onnxruntime_stable_torch181.rocm42**](https://onnxruntimepackages.z14.web.core.windows.net/onnxruntime_stable_torch181.rocm42.html)|[onnxruntime_nightly_torch181.rocm42](https://onnxruntimepackages.z14.web.core.windows.net/onnxruntime_nightly_torch181.rocm42.html)|
|[*Preview*] PyTorch 1.9 (ROCm 4.2)|[**onnxruntime_stable_torch190.rocm42**](https://onnxruntimepackages.z14.web.core.windows.net/onnxruntime_stable_torch190.rocm42.html)|[onnxruntime_nightly_torch190.rocm42](https://onnxruntimepackages.z14.web.core.windows.net/onnxruntime_nightly_torch190.rocm42.html)|
|[*Preview*] PyTorch 1.11 (ROCm 5.1.1)|[**onnxruntime_stable_torch1110.rocm511**](https://download.onnxruntime.ai/onnxruntime_stable_rocm511.html)|[onnxruntime_nightly_torch1110.rocm511](https://download.onnxruntime.ai/onnxruntime_nightly_rocm511.html)|
|[*Preview*] PyTorch 1.11 (ROCm 5.2)||[onnxruntime_nightly_torch1110.rocm52](https://download.onnxruntime.ai/onnxruntime_nightly_rocm511.html)|
|[*Preview*] PyTorch 1.12.1 (ROCm 5.2.3)||[onnxruntime_nightly_torch1121.rocm523](https://download.onnxruntime.ai/onnxruntime_nightly_rocm523.html)|
18 changes: 9 additions & 9 deletions docs/performance/graph-optimizations.md
@@ -53,28 +53,28 @@ These are semantics-preserving graph rewrites which remove redundant nodes and r

### Extended Graph Optimizations

These optimizations include complex node fusions. They are run after graph partitioning and are only applied to the nodes assigned to the CPU or CUDA execution provider. Available extended graph optimizations are as follows:
These optimizations include complex node fusions. They are run after graph partitioning and are only applied to the nodes assigned to the CPU, CUDA, or ROCm execution providers. Available extended graph optimizations are as follows:

| Optimization | Execution Provider | Comment |
|---------------------------------|--------------------|-----------------------------------------------------------------------------|
| GEMM Activation Fusion | CPU | |
| Matmul Add Fusion | CPU | |
| Conv Activation Fusion | CPU | |
| GELU Fusion | CPU or CUDA | |
| Layer Normalization Fusion | CPU or CUDA | |
| BERT Embedding Layer Fusion | CPU or CUDA | Fuse BERT embedding layer, layer normalization and attention mask length |
| Attention Fusion* | CPU or CUDA | |
| Skip Layer Normalization Fusion | CPU or CUDA | Fuse bias of fully connected layer, skip connection and layer normalization |
| Bias GELU Fusion | CPU or CUDA | Fuse bias of fully connected layer and GELU activation |
| GELU Approximation* | CUDA | Disabled by default. Enable with [kOrtSessionOptionsEnableGeluApproximation](https://cs.github.com/microsoft/onnxruntime/blob/175acf08f470db0bb2e4b8eefe55cdeb87c8b132/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h?q=kOrtSessionOptionsEnableGeluApproximation#L52) |
| GELU Fusion | CPU, CUDA, ROCm | |
| Layer Normalization Fusion | CPU, CUDA, ROCm | |
| BERT Embedding Layer Fusion | CPU, CUDA, ROCm | Fuse BERT embedding layer, layer normalization and attention mask length |
| Attention Fusion* | CPU, CUDA, ROCm | |
| Skip Layer Normalization Fusion | CPU, CUDA, ROCm | Fuse bias of fully connected layer, skip connection and layer normalization |
| Bias GELU Fusion | CPU, CUDA, ROCm | Fuse bias of fully connected layer and GELU activation |
| GELU Approximation* | CUDA, ROCm | Disabled by default. Enable with [kOrtSessionOptionsEnableGeluApproximation](https://cs.github.com/microsoft/onnxruntime/blob/175acf08f470db0bb2e4b8eefe55cdeb87c8b132/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h?q=kOrtSessionOptionsEnableGeluApproximation#L52) |


<details>
<summary>
Approximations (click to expand)
</summary>

To optimize performance of [BERT](https://en.wikipedia.org/wiki/BERT_(language_model)), approximation is used in GELU Approximation and Attention Fusion for CUDA execution provider. The impact on accuracy is negligible based on our evaluation: F1 score for a BERT model on SQuAD v1.1 is almost same (87.05 vs 87.03).
To optimize performance of [BERT](https://en.wikipedia.org/wiki/BERT_(language_model)), approximation is used in GELU Approximation and Attention Fusion for the CUDA and ROCm execution providers. The impact on accuracy is negligible based on our evaluation: the F1 score for a BERT model on SQuAD v1.1 is almost the same (87.05 vs. 87.03).

</details>

5 changes: 4 additions & 1 deletion docs/performance/tune-performance.md
@@ -49,7 +49,10 @@ In both cases, you will get a JSON file which contains the detailed performance
* Type chrome://tracing in the address bar
* Load the generated JSON file

To profile CUDA kernels, please add the cupti library to your PATH and use the onnxruntime binary built from source with `--enable_cuda_profiling`. Performance numbers from the device will then be attached to those from the host. For example:
To profile CUDA kernels, add the CUPTI library to your PATH and use an onnxruntime binary built from source with `--enable_cuda_profiling`.
To profile ROCm kernels, add the roctracer library to your PATH and use an onnxruntime binary built from source with `--enable_rocm_profiling`.

Performance numbers from the device will then be attached to those from the host. For example:

```json
{"cat":"Node", "name":"Add_1234", "dur":17, ...}
34 changes: 18 additions & 16 deletions index.html
@@ -245,35 +245,37 @@ <h3 id="selectHardwareAcceleration">Hardware Acceleration</h3>
</div>
<div class="col-md-9 r-content pr-0 pl-md-4" role="listbox" id="listbox-4" aria-labelledby="selectHardwareAcceleration" aria-describedby="decriptionHardwareAcceleration">
<div class="row hardwareAcceleration">
<div class="col-lg-2dot5 col r-option version" role="option" tabindex="0" aria-selected="false" id="DefaultCPU">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="0" aria-selected="false" id="DefaultCPU">
<span>Default&nbsp; <abbr>CPU</abbr></span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="CoreML">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="CoreML">
<span>CoreML </span></div>
<div class="col-lg-2dot5 col r-option version" role="option" tabindex="-1" aria-selected="false" id="CUDA">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="CUDA">
<span><abbr>CUDA</abbr></span></div>
<div class="col-lg-2dot5 col r-option version" role="option" tabindex="-1" aria-selected="false" id="DirectML">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="DirectML">
<span>Direct<abbr>ML</abbr></span></div>
<div class="col-lg-2dot5 col r-option version" role="option" tabindex="-1" aria-selected="false" id="DNNL">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="DNNL">
<span><abbr>oneDNN</abbr></span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="OpenVINO">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="OpenVINO">
<span>OpenVINO</span></div>
<div class="col-lg-2dot5 col r-option version" role="option" tabindex="-1" aria-selected="false" id="TensorRT">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="TensorRT">
<span>Tensor<abbr>RT</abbr></span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="NNAPI">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="NNAPI">
<span>NNAPI </span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="ACL">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="ACL">
<span>ACL (Preview)</span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="ArmNN">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="ArmNN">
<span>ArmNN (Preview)</span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="MIGraphX">
<span>MIGraphX (Preview)</span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="RockchipNPU">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="MIGraphX">
<span>MIGraphX (Preview)</span></div>
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="ROCm">
<span>ROCm (Preview)</span></div>
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="RockchipNPU">
<span>Rockchip NPU (Preview)</span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="SNPE">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="SNPE">
<span>SNPE</span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="TVM">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="TVM">
<span>TVM (Preview)</span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="VitisAI">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="VitisAI">
<span>Vitis AI (Preview)</span></div>
</div>
</div>
9 changes: 9 additions & 0 deletions js/script.js
@@ -1048,6 +1048,15 @@ var validCombos = {
"linux,C++,X86,MIGraphX":
"Follow build instructions from <a href='https://aka.ms/build-ort-migraphx' target='_blank'>here</a>",

"linux,Python,X86,ROCm":
"Follow build instructions from <a href='https://aka.ms/build-ort-rocm' target='_blank'>here</a>",

"linux,C-API,X86,ROCm":
"Follow build instructions from <a href='https://aka.ms/build-ort-rocm' target='_blank'>here</a>",

"linux,C++,X86,ROCm":
"Follow build instructions from <a href='https://aka.ms/build-ort-rocm' target='_blank'>here</a>",

"linux,Python,ARM64,ACL":
"Follow build instructions from <a href='https://aka.ms/build-ort-acl' target='_blank'>here</a>",
