Merged (33 commits)
- 3dad979 initial test (Sep 8, 2022)
- 75a725b test (Sep 9, 2022)
- c6a1def [Need fix] use temporary link (PeixuanZuo, Sep 9, 2022)
- 498bb9e [add] add rocm buiild (PeixuanZuo, Sep 9, 2022)
- 33b512e Merge pull request #1 from PeixuanZuo/rocm-ep (ytaous, Sep 9, 2022)
- fef8d86 execution providers (Sep 12, 2022)
- f6391a9 execution providers (Sep 12, 2022)
- 9ff0598 execution providers (Sep 12, 2022)
- 72801d3 execution providers (Sep 12, 2022)
- 7df41be execution providers (Sep 12, 2022)
- f967ab8 execution providers (Sep 12, 2022)
- 3a4b1d5 execution providers (Sep 12, 2022)
- d81de75 execution providers (Sep 12, 2022)
- fc41a7b ort install page (Sep 12, 2022)
- 15b9e35 rocm ep page (Sep 12, 2022)
- 0f45985 rocm ep page (Sep 13, 2022)
- b2dff8d Merge branch 'rocm-ep' of https://github.com/ytaous/onnxruntime into … (PeixuanZuo, Sep 13, 2022)
- 8cc80b9 [Update] build training (PeixuanZuo, Sep 13, 2022)
- 2fe4faf [Update] build training (PeixuanZuo, Sep 13, 2022)
- 4c50332 [Add] add mircobench (PeixuanZuo, Sep 13, 2022)
- d2c0700 [Add] add rocm optimization (PeixuanZuo, Sep 13, 2022)
- bde33d9 [Update] format (PeixuanZuo, Sep 13, 2022)
- cd240e5 [Update] optimization (PeixuanZuo, Sep 13, 2022)
- 54060fb [Update] update performance (PeixuanZuo, Sep 13, 2022)
- 5844ed9 [Update] rm microbench and update rocm profiling (PeixuanZuo, Sep 14, 2022)
- a305bcb Merge pull request #2 from ytaous/peixuanzuo/add_performance (PeixuanZuo, Sep 14, 2022)
- 68795ab [Update] update dockerfile.rocm link (PeixuanZuo, Sep 15, 2022)
- f387993 [Update] rocm execution provider (PeixuanZuo, Sep 15, 2022)
- c5d00ca [Fix] ffix doc (PeixuanZuo, Sep 19, 2022)
- 5b33d8a [Fix] ep selection format (PeixuanZuo, Sep 19, 2022)
- a21b72f [Fix] ep selection format (PeixuanZuo, Sep 19, 2022)
- 0e5518b [Fix] ep selection format (PeixuanZuo, Sep 19, 2022)
- eab192b [Update] variable name (PeixuanZuo, Sep 20, 2022)
21 changes: 21 additions & 0 deletions docs/build/eps.md
@@ -650,6 +650,27 @@ See more information on the MIGraphX Execution Provider [here](../execution-prov

Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/blob/master/dockerfiles#migraphx).

## AMD ROCm

See more information on the ROCm Execution Provider [here](../execution-providers/ROCm-ExecutionProvider.md).

### Prerequisites
{: .no_toc }

* Install [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.2.3/page/How_to_Install_ROCm.html#_How_to_Install)
* The ROCm execution provider for ONNX Runtime is built and tested with ROCm 5.2.3

### Build Instructions
{: .no_toc }

#### Linux

```bash
./build.sh --config <Release|Debug|RelWithDebInfo> --use_rocm --rocm_home <path to ROCm home>
```

Dockerfile instructions are available [here](https://github.com/microsoft/onnxruntime/tree/main/dockerfiles#rocm).

## NNAPI

Usage of NNAPI on Android platforms is via the NNAPI Execution Provider (EP).
8 changes: 3 additions & 5 deletions docs/build/training.md
@@ -76,18 +76,16 @@ These dependency versions should reflect what is in the [Dockerfiles](https://gi

This produces the .whl file in `./build/Linux/RelWithDebInfo/dist` for ONNX Runtime Training.

## GPU / ROCM
## GPU / ROCm
### Prerequisites
{: .no_toc }

The default AMD GPU build requires ROCM software toolkit installed on the system:
The default AMD GPU build requires the ROCm software toolkit to be installed on the system:

* [ROCM](https://rocmdocs.amd.com/en/latest/)
* [ROCm](https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.2.3/page/How_to_Install_ROCm.html#_How_to_Install) 5.2.3
* [OpenMPI](https://www.open-mpi.org/) 4.0.4
* See [install_openmpi.sh](https://github.com/microsoft/onnxruntime/blob/master/tools/ci_build/github/linux/docker/scripts/install_openmpi.sh)

These dependency versions should reflect what is in the [Dockerfiles](https://github.com/pytorch/ort/tree/main/docker).

### Build instructions
{: .no_toc }

71 changes: 71 additions & 0 deletions docs/execution-providers/ROCm-ExecutionProvider.md
@@ -0,0 +1,71 @@
---
title: ROCm (AMD)
description: Instructions to execute ONNX Runtime with the AMD ROCm execution provider
parent: Execution Providers
nav_order: 11
redirect_from: /docs/reference/execution-providers/ROCm-ExecutionProvider
---

# ROCm Execution Provider
{: .no_toc }

The ROCm Execution Provider enables hardware-accelerated computation on AMD ROCm-enabled GPUs.

## Contents
{: .no_toc }

* TOC placeholder
{:toc}

## Install

Pre-built binaries of ONNX Runtime with the ROCm EP are published for most language bindings. Please reference [Install ORT](../install).

## Requirements


|ONNX Runtime|ROCm|
|---|---|
|main|5.2.3|
|1.12|5.2.3|
|1.12|5.2|


## Build
For build instructions, please see the [BUILD page](../build/eps.md#amd-rocm).

## Usage

### C/C++

```c++
Ort::Env env = Ort::Env{ORT_LOGGING_LEVEL_ERROR, "Default"};
Ort::SessionOptions so;
int device_id = 0;
Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_ROCm(so, device_id));
```

The C API details are [here](../get-started/with-c.md).

### Python
Python API details are [here](https://onnxruntime.ai/docs/api/python/api_summary.html).

## Performance Tuning
For performance tuning, please see guidance on this page: [ONNX Runtime Perf Tuning](../performance/tune-performance.md)
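Beyond the general tuning guidance, ONNX Runtime's built-in profiler can attribute time to individual nodes. The helper below is a minimal sketch for working with a profile that has already been parsed from the JSON file ORT writes (Chrome-tracing-style events with `cat`, `name`, and `dur` fields); the function name and the commented file name are illustrative, not part of the ORT API.

```python
from collections import defaultdict

def summarize_node_durations(events):
    """Sum the 'dur' (microseconds) of Node-category trace events by node name.

    `events` is the parsed event list from a profile JSON file such as the
    one ONNX Runtime writes when profiling is enabled on the session.
    """
    totals = defaultdict(int)
    for event in events:
        if event.get("cat") == "Node":
            totals[event["name"]] += event.get("dur", 0)
    return dict(totals)

# Hypothetical usage with a profile file produced by a profiled session:
#   import json
#   with open("onnxruntime_profile.json") as f:
#       print(summarize_node_durations(json.load(f)))
```

Sorting the resulting totals quickly surfaces which fused or unfused nodes dominate runtime on the ROCm device.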

## Samples

### Python

```python
import onnxruntime as ort

model_path = '<path to model>'

providers = [
'ROCmExecutionProvider',
'CPUExecutionProvider',
]

session = ort.InferenceSession(model_path, providers=providers)
```
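When EP-specific options need to be set, the Python API also accepts `(name, options)` tuples in the `providers` list. The sketch below only builds that list; treating `device_id` as a ROCm EP option here is an assumption made to illustrate the form, and the commented session call shows where the list would be used.

```python
def rocm_providers(device_id=0):
    """Provider list that prefers the ROCm EP, with CPU fallback.

    The (name, options) tuple form passes EP-specific options;
    'device_id' (an illustrative option) selects the target AMD GPU.
    """
    return [
        ("ROCmExecutionProvider", {"device_id": device_id}),
        "CPUExecutionProvider",
    ]

# Hypothetical usage:
#   session = ort.InferenceSession(model_path, providers=rocm_providers(0))
#   session.get_providers()  # reports which EPs were actually loaded
```

Checking `session.get_providers()` after creation confirms whether the ROCm EP was loaded or the session silently fell back to CPU.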
2 changes: 1 addition & 1 deletion docs/execution-providers/SNPE-ExecutionProvider.md
@@ -2,7 +2,7 @@
title: SNPE (Qualcomm)
description: Execute ONNX models with SNPE Execution Provider
parent: Execution Providers
nav_order: 11
nav_order: 12
redirect_from: /docs/reference/execution-providers/SNPE-ExecutionProvider
---

2 changes: 1 addition & 1 deletion docs/execution-providers/TVM-ExecutionProvider.md
@@ -2,7 +2,7 @@
title: TVM (Apache)
description: Instructions to execute ONNX Runtime with the Apache TVM execution provider
parent: Execution Providers
nav_order: 13
nav_order: 14
---

# TVM Execution Provider
2 changes: 1 addition & 1 deletion docs/execution-providers/TensorRT-ExecutionProvider.md
@@ -2,7 +2,7 @@
title: TensorRT (NVIDIA)
description: Instructions to execute ONNX Runtime on NVIDIA GPUs with the TensorRT execution provider
parent: Execution Providers
nav_order: 12
nav_order: 13
redirect_from: /docs/reference/execution-providers/TensorRT-ExecutionProvider
---

2 changes: 1 addition & 1 deletion docs/execution-providers/Vitis-AI-ExecutionProvider.md
@@ -2,7 +2,7 @@
title: Vitis AI
description: Instructions to execute ONNX Runtime on Xilinx devices with the Vitis AI execution provider
parent: Execution Providers
nav_order: 14
nav_order: 15
redirect_from: /docs/reference/execution-providers/Vitis-AI-ExecutionProvider
---

6 changes: 3 additions & 3 deletions docs/execution-providers/index.md
@@ -32,9 +32,9 @@ ONNX Runtime supports many different execution providers today. Some of the EPs
|[Intel DNNL](../execution-providers/oneDNN-ExecutionProvider.md)|[NVIDIA TensorRT](../execution-providers/TensorRT-ExecutionProvider.md)|[ARM Compute Library](../execution-providers/ACL-ExecutionProvider.md) (*preview*)|[Xilinx Vitis-AI](../execution-providers/Vitis-AI-ExecutionProvider.md) (*preview*)|
|[TVM](../execution-providers/TVM-ExecutionProvider.md) (*preview*)|[DirectML](../execution-providers/DirectML-ExecutionProvider.md)|[Android Neural Networks API](../execution-providers/NNAPI-ExecutionProvider.md)||
|[Intel OpenVINO](../execution-providers/OpenVINO-ExecutionProvider.md)|[AMD MIGraphX](../execution-providers/MIGraphX-ExecutionProvider.md) (*preview*)|[ARM-NN](../execution-providers/ArmNN-ExecutionProvider.md) (*preview*)|
||[TVM](../execution-providers/TVM-ExecutionProvider.md) (*preview*)|[CoreML](../execution-providers/CoreML-ExecutionProvider.md) (*preview*)|
||[Intel OpenVINO](../execution-providers/OpenVINO-ExecutionProvider.md)|[TVM](../execution-providers/TVM-ExecutionProvider.md) (*preview*)|
|||[Qualcomm SNPE](../execution-providers/SNPE-ExecutionProvider.md)
||[AMD ROCm](../execution-providers/ROCm-ExecutionProvider.md) (*preview*)|[CoreML](../execution-providers/CoreML-ExecutionProvider.md) (*preview*)|
||[TVM](../execution-providers/TVM-ExecutionProvider.md) (*preview*)|[TVM](../execution-providers/TVM-ExecutionProvider.md) (*preview*)|
||[Intel OpenVINO](../execution-providers/OpenVINO-ExecutionProvider.md)|[Qualcomm SNPE](../execution-providers/SNPE-ExecutionProvider.md)

### Add an Execution Provider

3 changes: 3 additions & 0 deletions docs/install/index.md
@@ -250,3 +250,6 @@ The _location_ needs to be specified for any specific version other than the def
|PyTorch 1.9 (CUDA 11.1)|[**onnxruntime_stable_torch190.cu111**](https://onnxruntimepackages.z14.web.core.windows.net/onnxruntime_stable_torch190.cu111.html)|[onnxruntime_nightly_torch190.cu111](https://onnxruntimepackages.z14.web.core.windows.net/onnxruntime_nightly_torch190.cu111.html)|
|[*Preview*] PyTorch 1.8.1 (ROCm 4.2)|[**onnxruntime_stable_torch181.rocm42**](https://onnxruntimepackages.z14.web.core.windows.net/onnxruntime_stable_torch181.rocm42.html)|[onnxruntime_nightly_torch181.rocm42](https://onnxruntimepackages.z14.web.core.windows.net/onnxruntime_nightly_torch181.rocm42.html)|
|[*Preview*] PyTorch 1.9 (ROCm 4.2)|[**onnxruntime_stable_torch190.rocm42**](https://onnxruntimepackages.z14.web.core.windows.net/onnxruntime_stable_torch190.rocm42.html)|[onnxruntime_nightly_torch190.rocm42](https://onnxruntimepackages.z14.web.core.windows.net/onnxruntime_nightly_torch190.rocm42.html)|
|[*Preview*] PyTorch 1.11 (ROCm 5.1.1)|[**onnxruntime_stable_torch1110.rocm511**](https://download.onnxruntime.ai/onnxruntime_stable_rocm511.html)|[onnxruntime_nightly_torch1110.rocm511](https://download.onnxruntime.ai/onnxruntime_nightly_rocm511.html)|
|[*Preview*] PyTorch 1.11 (ROCm 5.2)||[onnxruntime_nightly_torch1110.rocm52](https://download.onnxruntime.ai/onnxruntime_nightly_rocm511.html)|
|[*Preview*] PyTorch 1.12.1 (ROCm 5.2.3)||[onnxruntime_nightly_torch1121.rocm523](https://download.onnxruntime.ai/onnxruntime_nightly_rocm523.html)|
18 changes: 9 additions & 9 deletions docs/performance/graph-optimizations.md
@@ -53,28 +53,28 @@ These are semantics-preserving graph rewrites which remove redundant nodes and r

### Extended Graph Optimizations

These optimizations include complex node fusions. They are run after graph partitioning and are only applied to the nodes assigned to the CPU or CUDA execution provider. Available extended graph optimizations are as follows:
These optimizations include complex node fusions. They are run after graph partitioning and are only applied to the nodes assigned to the CPU, CUDA, or ROCm execution providers. Available extended graph optimizations are as follows:

| Optimization | Execution Provider | Comment |
|---------------------------------|--------------------|-----------------------------------------------------------------------------|
| GEMM Activation Fusion | CPU | |
| Matmul Add Fusion | CPU | |
| Conv Activation Fusion | CPU | |
| GELU Fusion | CPU or CUDA | |
| Layer Normalization Fusion | CPU or CUDA | |
| BERT Embedding Layer Fusion | CPU or CUDA | Fuse BERT embedding layer, layer normalization and attention mask length |
| Attention Fusion* | CPU or CUDA | |
| Skip Layer Normalization Fusion | CPU or CUDA | Fuse bias of fully connected layer, skip connection and layer normalization |
| Bias GELU Fusion | CPU or CUDA | Fuse bias of fully connected layer and GELU activation |
| GELU Approximation* | CUDA | Disabled by default. Enable with [kOrtSessionOptionsEnableGeluApproximation](https://cs.github.com/microsoft/onnxruntime/blob/175acf08f470db0bb2e4b8eefe55cdeb87c8b132/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h?q=kOrtSessionOptionsEnableGeluApproximation#L52) |
| GELU Fusion | CPU, CUDA, ROCm | |
| Layer Normalization Fusion | CPU, CUDA, ROCm | |
| BERT Embedding Layer Fusion | CPU, CUDA, ROCm | Fuse BERT embedding layer, layer normalization and attention mask length |
| Attention Fusion* | CPU, CUDA, ROCm | |
| Skip Layer Normalization Fusion | CPU, CUDA, ROCm | Fuse bias of fully connected layer, skip connection and layer normalization |
| Bias GELU Fusion | CPU, CUDA, ROCm | Fuse bias of fully connected layer and GELU activation |
| GELU Approximation* | CUDA, ROCm | Disabled by default. Enable with [kOrtSessionOptionsEnableGeluApproximation](https://cs.github.com/microsoft/onnxruntime/blob/175acf08f470db0bb2e4b8eefe55cdeb87c8b132/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h?q=kOrtSessionOptionsEnableGeluApproximation#L52) |


<details>
<summary>
Approximations (click to expand)
</summary>

To optimize performance of [BERT](https://en.wikipedia.org/wiki/BERT_(language_model)), approximation is used in GELU Approximation and Attention Fusion for CUDA execution provider. The impact on accuracy is negligible based on our evaluation: F1 score for a BERT model on SQuAD v1.1 is almost same (87.05 vs 87.03).
To optimize performance of [BERT](https://en.wikipedia.org/wiki/BERT_(language_model)), approximation is used in GELU Approximation and Attention Fusion for the CUDA and ROCm execution providers. The impact on accuracy is negligible based on our evaluation: the F1 score for a BERT model on SQuAD v1.1 is almost the same (87.05 vs. 87.03).

</details>

5 changes: 4 additions & 1 deletion docs/performance/tune-performance.md
@@ -49,7 +49,10 @@ In both cases, you will get a JSON file which contains the detailed performance
* Type chrome://tracing in the address bar
* Load the generated JSON file

To profile CUDA kernels, please add the cupti library to your PATH and use the onnxruntime binary built from source with `--enable_cuda_profiling`. Performance numbers from the device will then be attached to those from the host. For example:
To profile CUDA kernels, add the CUPTI library to your PATH and use an onnxruntime binary built from source with `--enable_cuda_profiling`.
To profile ROCm kernels, add the roctracer library to your PATH and use an onnxruntime binary built from source with `--enable_rocm_profiling`.

Performance numbers from the device will then be attached to those from the host. For example:

```json
{"cat":"Node", "name":"Add_1234", "dur":17, ...}
34 changes: 18 additions & 16 deletions index.html
@@ -245,35 +245,37 @@ <h3 id="selectHardwareAcceleration">Hardware Acceleration</h3>
</div>
<div class="col-md-9 r-content pr-0 pl-md-4" role="listbox" id="listbox-4" aria-labelledby="selectHardwareAcceleration" aria-describedby="decriptionHardwareAcceleration">
<div class="row hardwareAcceleration">
<div class="col-lg-2dot5 col r-option version" role="option" tabindex="0" aria-selected="false" id="DefaultCPU">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="0" aria-selected="false" id="DefaultCPU">
<span>Default&nbsp; <abbr>CPU</abbr></span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="CoreML">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="CoreML">
<span>CoreML </span></div>
<div class="col-lg-2dot5 col r-option version" role="option" tabindex="-1" aria-selected="false" id="CUDA">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="CUDA">
<span><abbr>CUDA</abbr></span></div>
<div class="col-lg-2dot5 col r-option version" role="option" tabindex="-1" aria-selected="false" id="DirectML">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="DirectML">
<span>Direct<abbr>ML</abbr></span></div>
<div class="col-lg-2dot5 col r-option version" role="option" tabindex="-1" aria-selected="false" id="DNNL">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="DNNL">
<span><abbr>oneDNN</abbr></span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="OpenVINO">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="OpenVINO">
<span>OpenVINO</span></div>
<div class="col-lg-2dot5 col r-option version" role="option" tabindex="-1" aria-selected="false" id="TensorRT">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="TensorRT">
<span>Tensor<abbr>RT</abbr></span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="NNAPI">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="NNAPI">
<span>NNAPI </span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="ACL">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="ACL">
<span>ACL (Preview)</span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="ArmNN">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="ArmNN">
<span>ArmNN (Preview)</span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="MIGraphX">
<span>MIGraphX (Preview)</span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="RockchipNPU">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="MIGraphX">
<span>MIGraphX (Preview)</span></div>
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="ROCm">
<span>ROCm (Preview)</span></div>
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="RockchipNPU">
<span>Rockchip NPU (Preview)</span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="SNPE">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="SNPE">
<span>SNPE</span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="TVM">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="TVM">
<span>TVM (Preview)</span></div>
<div class="col-lg-2dot5 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="VitisAI">
<div class="col-lg-3 col-md-3 r-option version" role="option" tabindex="-1" aria-selected="false" id="VitisAI">
<span>Vitis AI (Preview)</span></div>
</div>
</div>
9 changes: 9 additions & 0 deletions js/script.js
@@ -1048,6 +1048,15 @@ var validCombos = {
"linux,C++,X86,MIGraphX":
"Follow build instructions from <a href='https://aka.ms/build-ort-migraphx' target='_blank'>here</a>",

"linux,Python,X86,ROCm":
"Follow build instructions from <a href='https://aka.ms/build-ort-rocm' target='_blank'>here</a>",

"linux,C-API,X86,ROCm":
"Follow build instructions from <a href='https://aka.ms/build-ort-rocm' target='_blank'>here</a>",

"linux,C++,X86,ROCm":
"Follow build instructions from <a href='https://aka.ms/build-ort-rocm' target='_blank'>here</a>",

"linux,Python,ARM64,ACL":
"Follow build instructions from <a href='https://aka.ms/build-ort-acl' target='_blank'>here</a>",
