4 changes: 2 additions & 2 deletions docs/execution-providers/OpenVINO-ExecutionProvider.md
@@ -73,7 +73,7 @@ Use `AUTO:<device 1><device 2>..` as the device name to delegate selection of an
From the application's point of view, this is just another device that handles all accelerators in the full system.

For more information on the Auto-Device plugin of OpenVINO, please refer to the
-[Intel OpenVINO Auto Device Plugin](https://docs.openvino.ai/latest/openvino_docs_IE_DG_supported_plugins_AUTO.html).
+[Intel OpenVINO Auto Device Plugin](https://docs.openvino.ai/latest/openvino_docs_OV_UG_Hetero_execution.html).
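
As a small illustration of the device string described above, below is a minimal Python sketch of creating a session with the OpenVINO EP and an `AUTO:` device. The `device_type` provider option, the `AUTO:GPU,CPU` value, and the model filename are assumptions for illustration; the exact option names and accepted values depend on your ONNX Runtime / OpenVINO EP version.

```python
import onnxruntime as ort

# Assumed provider option: "device_type" with an "AUTO:<device 1>,<device 2>" value
# delegates selection of the actual accelerator to OpenVINO's Auto-Device plugin.
providers = [("OpenVINOExecutionProvider", {"device_type": "AUTO:GPU,CPU"})]
sess = ort.InferenceSession("my_model.onnx", providers=providers)
```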

### Model caching feature for OpenVINO EP

@@ -84,7 +84,7 @@ This feature enables users to save and load the blobs directly. These pre-compil

#### CL Cache capability for iGPU

-Starting from version 2021.4 OpenVINO supports [model caching](https://docs.openvino.ai/latest/openvino_docs_IE_DG_Model_caching_overview.html).
+Starting from version 2021.4 OpenVINO supports [model caching](https://docs.openvino.ai/latest/openvino_docs_OV_UG_Model_caching_overview.html).

This feature enables users to save and load the cl_cache files directly. These cl_cache files can be loaded directly onto the iGPU hardware device target and inference can then be run. This feature is only supported on the iGPU hardware device target.
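
A minimal Python sketch of pointing the OpenVINO EP at a cache directory is shown below. The `cache_dir` and `device_type` option names, their values, and the model filename are assumptions for illustration and may differ between OpenVINO EP releases, so check the EP documentation for your version.

```python
import onnxruntime as ort

# Assumed provider options: "device_type" selects the iGPU target and "cache_dir"
# names a directory where compiled blobs / cl_cache files are written on the first
# run and reloaded on subsequent runs.
providers = [("OpenVINOExecutionProvider", {"device_type": "GPU_FP32", "cache_dir": "./ov_cache"})]
sess = ort.InferenceSession("my_model.onnx", providers=providers)
```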

22 changes: 15 additions & 7 deletions docs/performance/tune-performance.md
@@ -349,17 +349,24 @@ While using the CUDA EP, ORT supports the usage of [CUDA Graphs](https://develop
Currently, there are some constraints on using the CUDA Graphs feature, which are listed below:

1) Models with control-flow ops (i.e., models with `If`, `Loop`, and `Scan` ops) are not supported

2) Usage of CUDA Graphs is limited to models wherein all the model ops (graph nodes) can be partitioned to the CUDA EP

3) The input/output types of models need to be tensors

4) Shapes of inputs/outputs cannot change across inference calls. Dynamic shape models are supported - the only constraint is that the input/output shapes should be the same across all inference calls

5) By design, [CUDA Graphs](https://developer.nvidia.com/blog/cuda-10-features-revealed/) reads from/writes to the same CUDA virtual memory addresses during the graph replay step as it does during the graph capture step. Due to this requirement, using this feature requires IOBinding so as to bind memory that will be used as input(s)/output(s) for the CUDA Graph machinery to read from/write to (please see the samples below)

6) While updating the input(s) for subsequent inference calls, the fresh input(s) need to be copied over to the corresponding CUDA memory location(s) of the bound `OrtValue` input(s) (please see the samples below for how this can be achieved). This is because the "graph replay" will require reading inputs from the same CUDA virtual memory addresses

7) Multi-threaded usage is currently not supported, i.e., `Run()` MAY NOT be invoked on the same `InferenceSession` object from multiple threads while using CUDA Graphs

NOTE: The very first `Run()` performs a variety of tasks under the hood like making CUDA memory allocations, capturing the CUDA graph for the model, and then performing a graph replay to ensure that the graph runs. Due to this, the latency associated with the first `Run()` is bound to be high. The subsequent `Run()`s only perform graph replays of the graph captured and cached in the first `Run()`.

* Python

```python
providers = [("CUDAExecutionProvider", {"enable_cuda_graph": '1'})]
sess_options = ort.SessionOptions()
sess = ort.InferenceSession("my_model.onnx", sess_options = sess_options, providers=providers)
# ... @@ -373,26 +380,27 @@ (lines collapsed in the diff view) ...
y_ortvalue = onnxrt.OrtValue.ortvalue_from_numpy(y, 'cuda', 0)
session = onnxrt.InferenceSession("matmul_2.onnx", providers=providers)
io_binding = session.io_binding()

# Bind the input and output
io_binding.bind_ortvalue_input('X', x_ortvalue)
io_binding.bind_ortvalue_output('Y', y_ortvalue)

# One regular run for the necessary memory allocation and cuda graph capturing
session.run_with_iobinding(io_binding)
expected_y = np.array([[5.0], [11.0], [17.0]], dtype=np.float32)
np.testing.assert_allclose(expected_y, y_ortvalue.numpy(), rtol=1e-05, atol=1e-05)

# After capturing, CUDA graph replay happens from this Run onwards
session.run_with_iobinding(io_binding)
np.testing.assert_allclose(expected_y, y_ortvalue.numpy(), rtol=1e-05, atol=1e-05)

# Update input and then replay CUDA graph with the updated input
x_ortvalue.update_inplace(np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]], dtype=np.float32))
session.run_with_iobinding(io_binding)
```

* C/C++

```c++
const auto& api = Ort::GetApi();

struct CudaMemoryDeleter {
  // ... (remainder of the example collapsed in the diff view)