From bd445ad56b9a3eb42d0bc0284f6d6fc7d02cefea Mon Sep 17 00:00:00 2001
From: Kunal Vaishnavi <kvaishnavi@microsoft.com>
Date: Sat, 19 Apr 2025 02:13:44 +0000
Subject: [PATCH 1/6] Add Gemma-3 tutorial for generating ONNX models

---
 examples/python/gemma-3-vision-tutorial.md | 162 +++++++++++++++++++++
 1 file changed, 162 insertions(+)
 create mode 100644 examples/python/gemma-3-vision-tutorial.md

diff --git a/examples/python/gemma-3-vision-tutorial.md b/examples/python/gemma-3-vision-tutorial.md
new file mode 100644
index 0000000000..49015f8ea9
--- /dev/null
+++ b/examples/python/gemma-3-vision-tutorial.md
@@ -0,0 +1,162 @@
+# Build your Gemma-3 vision ONNX models for ONNX Runtime GenAI
+
+## Steps
+0. [Pre-requisites](#0-pre-requisites)
+1. [Prepare Local Workspace](#1-prepare-local-workspace)
+2. [Build ONNX Components](#2-build-onnx-components)
+3. [Build ORT GenAI Configs](#3-build-genai_configjson-and-processor_configjson)
+4. [Run Gemma-3 vision ONNX models](#4-run-gemma-3-vision-onnx-models)
+
+## 0. Pre-requisites
+
+Please ensure you have the following Python packages installed to create the ONNX models.
+
+- `huggingface_hub[cli]`
+- `numpy`
+  - Please ensure that your `numpy` version is less than 2.0.0 after installing all of the pre-requisite packages. If it is greater than or equal to 2.0.0, please uninstall `numpy` with `pip uninstall -y numpy` and install an older version (e.g. `pip install numpy==1.26.4`).
+- `onnx`
+- `onnxruntime` and `onnxruntime-genai`
+  - ONNX Runtime: Please install the latest nightly version. To ensure the right version is installed, please install ONNX Runtime GenAI first. Then you can uninstall the stable version of ONNX Runtime that gets auto-installed as a dependency.
+  - ONNX Runtime GenAI: Please build from source until the latest changes are published in a stable release package. The build instructions can be found [here](https://onnxruntime.ai/docs/genai/howto/build-from-source.html).
+  
+  - For CPU:
+  ```bash
+  # 1. Build ONNX Runtime GenAI from source for CPU
+  # Instructions: https://onnxruntime.ai/docs/genai/howto/build-from-source.html
+
+  # 2. Install ONNX Runtime GenAI wheel produced by build.py
+  pip install build/wheel/*.whl
+
+  # 3. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
+  pip uninstall -y onnxruntime
+
+  # 4. Install nightly version of ONNX Runtime
+  pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime
+  ```
+
+  - For CUDA:
+  ```bash
+  # 1. Build ONNX Runtime GenAI from source for CUDA
+  # Instructions: https://onnxruntime.ai/docs/genai/howto/build-from-source.html
+
+  # 2. Install ONNX Runtime GenAI wheel produced by build.py
+  pip install build/wheel/*.whl
+
+  # 3. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
+  pip uninstall -y onnxruntime-gpu
+
+  # 4. Install nightly version of ONNX Runtime
+  pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime-gpu
+  ```
+
+  - For DirectML: 
+  ```bash
+  # 1. Build ONNX Runtime GenAI from source for DirectML
+  # Instructions: https://onnxruntime.ai/docs/genai/howto/build-from-source.html
+
+  # 2. Install ONNX Runtime GenAI wheel produced by build.py
+  pip install build/wheel/*.whl
+
+  # 3. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
+  pip uninstall -y onnxruntime-directml
+
+  # 4. Install nightly version of ONNX Runtime
+  pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime-directml
+  ```
+- `onnxscript`
+  - Please install the latest nightly version of onnxscript with `pip install --pre onnxscript`.
+- `pillow`
+- `requests`
+- `torch`
+  - Please install torch by following the [instructions](https://pytorch.org/get-started/locally/). To produce ONNX models that can run on CUDA or DirectML, install torch with CUDA support and make sure the CUDA version you choose in the instructions matches the one installed on your machine.
+  - Please ensure that your `torch` version is greater than or equal to 2.7.0 after installing all of the pre-requisite packages. If it is less than 2.7.0, please uninstall `torch`, `torchaudio`, and `torchvision` with `pip uninstall -y torch torchaudio torchvision` and install a newer version (e.g. `pip install torch==2.7.0`).
+- `torchvision`
+- `transformers`
+
+## 1. Prepare Local Workspace
+
+Gemma-3 vision is a multimodal model composed of several internal models. To run Gemma-3 vision with ONNX Runtime GenAI, each internal model must be exported as a separate ONNX model, which requires modifying some of the original PyTorch modeling files.
+
+### Download the original PyTorch modeling files
+
+First, let's download the original PyTorch modeling files.
+
+```bash
+# Download PyTorch model and files
+$ mkdir -p gemma3-vision-it/pytorch
+$ cd gemma3-vision-it/pytorch
+
+# Replace the {} below with one of the official parameter sizes (`4b`, `12b`, `27b`)
+$ huggingface-cli download google/gemma-3-{}-it --local-dir .
+```
+
+### Download the modified PyTorch modeling files
+
+Now, let's download the modified PyTorch modeling files that have been uploaded to the Gemma-3 vision ONNX repository on Hugging Face.
+
+```bash
+# Download modified files
+$ cd ..
+$ huggingface-cli download onnxruntime/Gemma-3-ONNX --include "onnx/*" --local-dir .
+```
+
+### Replace original PyTorch repo files with modified files
+
+```bash
+# In our `config.json`, we added `_attn_implementation: eager`
+# Replace the {} below with one of the official parameter sizes (`4b`, `12b`, `27b`)
+$ rm pytorch/config.json
+$ mv onnx/{}/config.json pytorch/
+
+# We need a copy of `configuration_gemma3.py` to load any classes modified for exporting to ONNX
+$ mv onnx/configuration_gemma3.py pytorch/
+
+# In our `modeling_gemma3.py`, we modified some classes for exporting to ONNX
+$ mv onnx/modeling_gemma3.py pytorch/
+
+# Move the builder script to the root directory
+$ mv onnx/builder.py .
+
+# Delete the leftover `onnx` directory (it may still hold configs for the other parameter sizes)
+$ rm -rf onnx/
+```
+
+If you have your own fine-tuned version of Gemma-3 vision, you can now replace the `*.safetensors` files in the `pytorch` folder with your `*.safetensors` files.
+
+## 2. Build ONNX Components
+
+Here are some examples of how you can build the components as INT4 ONNX models.
+
+```bash
+# Build INT4 components with FP32 inputs/outputs for CPU
+$ python3 builder.py --input ./pytorch --output ./cpu --precision fp32 --execution_provider cpu
+```
+
+```bash
+# Build INT4 components with FP16 inputs/outputs for CUDA
+$ python3 builder.py --input ./pytorch --output ./cuda --precision fp16 --execution_provider cuda
+```
+
+```bash
+# Build INT4 components with FP16 inputs/outputs for DirectML
+$ python3 builder.py --input ./pytorch --output ./dml --precision fp16 --execution_provider dml
+```
+
+## 3. Build `genai_config.json` and `processor_config.json`
+
+Currently, both JSON files needed to run with ONNX Runtime GenAI are created by hand. Because the fields have been hand-crafted, it is recommended that you copy the already-uploaded JSON files and modify the fields as needed for your fine-tuned Gemma-3 vision model. [Here](https://huggingface.co/onnxruntime/gemma-3-it-onnx/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/genai_config.json) is an example for `genai_config.json` and [here](https://huggingface.co/onnxruntime/gemma-3-it-onnx/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/processor_config.json) is an example for `processor_config.json`.
+
+## 4. Run Gemma-3 vision ONNX models
+
+[Here](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi3v.py) is an example of how you can run your Gemma-3 vision model with ONNX Runtime GenAI.
+
+### CUDA
+```bash
+$ python .\phi3v.py -m .\gemma3-vision-it\cuda -e cuda
+```
+
+### DirectML
+
+```bash
+$ python .\phi3v.py -m .\gemma3-vision-it\dml -e dml
+```

From 025c24554f538db20f940a7086ab9b6e2c7f9b2c Mon Sep 17 00:00:00 2001
From: Kunal Vaishnavi <kvaishnavi@microsoft.com>
Date: Fri, 13 Jun 2025 07:06:51 +0000
Subject: [PATCH 2/6] Update Gemma-3 vision tutorial

---
 examples/python/gemma-3-vision-tutorial.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/examples/python/gemma-3-vision-tutorial.md b/examples/python/gemma-3-vision-tutorial.md
index 49015f8ea9..37311c4042 100644
--- a/examples/python/gemma-3-vision-tutorial.md
+++ b/examples/python/gemma-3-vision-tutorial.md
@@ -148,15 +148,15 @@ Currently, both JSON files needed to run with ONNX Runtime GenAI are created by
 
 ## 4. Run Gemma-3 vision ONNX models
 
-[Here](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi3v.py) is an example of how you can run your Gemma-3 vision model with ONNX Runtime GenAI.
+[Here](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/model-vision.py) is an example of how you can run your Gemma-3 vision model with ONNX Runtime GenAI.
 
 ### CUDA
 ```bash
-$ python .\phi3v.py -m .\gemma3-vision-it\cuda -e cuda
+$ python .\model-vision.py -m .\gemma3-vision-it\cuda -e cuda
 ```
 
 ### DirectML
 
 ```bash
-$ python .\phi3v.py -m .\gemma3-vision-it\dml -e dml
+$ python .\model-vision.py -m .\gemma3-vision-it\dml -e dml
 ```

From 1075486d6832b185271d8e1a7b7c3ef7eb9c79fb Mon Sep 17 00:00:00 2001
From: Kunal Vaishnavi <kvaishnavi@microsoft.com>
Date: Mon, 18 Aug 2025 20:04:35 +0000
Subject: [PATCH 3/6] Update package versions in tutorial

---
 examples/python/gemma-3-vision-tutorial.md | 67 ++++++++++------------
 examples/python/phi-4-multi-modal.md       | 55 ++++++++----------
 2 files changed, 52 insertions(+), 70 deletions(-)

diff --git a/examples/python/gemma-3-vision-tutorial.md b/examples/python/gemma-3-vision-tutorial.md
index 37311c4042..03f5c869a9 100644
--- a/examples/python/gemma-3-vision-tutorial.md
+++ b/examples/python/gemma-3-vision-tutorial.md
@@ -17,52 +17,43 @@ Please ensure you have the following Python packages installed to create the ONN
 - `onnx`
 - `onnxruntime` and `onnxruntime-genai`
   - ONNX Runtime: Please install the latest nightly version. To ensure the right version is installed, please install ONNX Runtime GenAI first. Then you can uninstall the stable version of ONNX Runtime that gets auto-installed as a dependency.
-  - ONNX Runtime GenAI: Please build from source until the latest changes are published in a stable release package. The build instructions can be found [here](https://onnxruntime.ai/docs/genai/howto/build-from-source.html).
-  
-  - For CPU:
-  ```bash
-  # 1. Build ONNX Runtime GenAI from source for CPU
-  # Instructions: https://onnxruntime.ai/docs/genai/howto/build-from-source.html
+  - ONNX Runtime GenAI: Please install the latest stable release package.
 
-  # 2. Install ONNX Runtime GenAI wheel produced by build.py
-  pip install build/wheel/*.whl
+    - For CPU:
+    ```bash
+    # 1. Install ONNX Runtime GenAI wheel
+    pip install onnxruntime-genai
 
-  # 3. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
-  pip uninstall -y onnxruntime
+    # 2. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
+    pip uninstall -y onnxruntime
 
-  # 4. Install nightly version of ONNX Runtime
-  pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime
-  ```
+    # 3. Install nightly version of ONNX Runtime
+    pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime
+    ```
 
-  - For CUDA:
-  ```bash
-  # 1. Build ONNX Runtime GenAI from source for CUDA
-  # Instructions: https://onnxruntime.ai/docs/genai/howto/build-from-source.html
+    - For CUDA:
+    ```bash
+    # 1. Install ONNX Runtime GenAI wheel
+    pip install onnxruntime-genai-cuda
 
-  # 2. Install ONNX Runtime GenAI wheel produced by build.py
-  pip install build/wheel/*.whl
+    # 2. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
+    pip uninstall -y onnxruntime-gpu
 
-  # 3. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
-  pip uninstall -y onnxruntime-gpu
+    # 3. Install nightly version of ONNX Runtime
+    pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime-gpu
+    ```
 
-  # 4. Install nightly version of ONNX Runtime
-  pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime-gpu
-  ```
+    - For DirectML:
+    ```bash
+    # 1. Install ONNX Runtime GenAI wheel
+    pip install onnxruntime-genai-directml
 
-  - For DirectML: 
-  ```bash
-  # 1. Build ONNX Runtime GenAI from source for DirectML
-  # Instructions: https://onnxruntime.ai/docs/genai/howto/build-from-source.html
+    # 2. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
+    pip uninstall -y onnxruntime-directml
 
-  # 2. Install ONNX Runtime GenAI wheel produced by build.py
-  pip install build/wheel/*.whl
-
-  # 3. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
-  pip uninstall -y onnxruntime-directml
-
-  # 4. Install nightly version of ONNX Runtime
-  pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime-directml
-  ```
+    # 3. Install nightly version of ONNX Runtime
+    pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime-directml
+    ```
 - `onnxscript`
   - Please install the latest nightly version of onnxscript with `pip install --pre onnxscript`.
 - `pillow`
@@ -144,7 +135,7 @@ $ python3 builder.py --input ./pytorch --output ./dml --precision fp16 --executi
 
 ## 3. Build `genai_config.json` and `processor_config.json`
 
-Currently, both JSON files needed to run with ONNX Runtime GenAI are created by hand. Because the fields have been hand-crafted, it is recommended that you copy the already-uploaded JSON files and modify the fields as needed for your fine-tuned Gemma-3 vision model. [Here](https://huggingface.co/onnxruntime/gemma-3-it-onnx/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/genai_config.json) is an example for `genai_config.json` and [here](https://huggingface.co/onnxruntime/gemma-3-it-onnx/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/processor_config.json) is an example for `processor_config.json`.
+Currently, both JSON files needed to run with ONNX Runtime GenAI are created by hand. Because the fields have been hand-crafted, it is recommended that you copy the already-uploaded JSON files and modify the fields as needed for your fine-tuned Gemma-3 vision model. [Here](https://huggingface.co/onnxruntime/Gemma-3-ONNX/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/genai_config.json) is an example for `genai_config.json` and [here](https://huggingface.co/onnxruntime/Gemma-3-ONNX/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/processor_config.json) is an example for `processor_config.json`.
 
 ## 4. Run Gemma-3 vision ONNX models
 
diff --git a/examples/python/phi-4-multi-modal.md b/examples/python/phi-4-multi-modal.md
index cdbccaf0bc..184d54d742 100644
--- a/examples/python/phi-4-multi-modal.md
+++ b/examples/python/phi-4-multi-modal.md
@@ -18,50 +18,41 @@ Please ensure you have the following Python packages installed to create the ONN
 - `onnx`
 - `onnxruntime` and `onnxruntime-genai`
     - ONNX Runtime: Please install the latest nightly version. To ensure the right version is installed, please install ONNX Runtime GenAI first. Then you can uninstall the stable version of ONNX Runtime that gets auto-installed as a dependency.
-    - ONNX Runtime GenAI: Please build from source until the latest changes are published in a stable release package. The build instructions can be found [here](https://onnxruntime.ai/docs/genai/howto/build-from-source.html).
-    
+    - ONNX Runtime GenAI: Please install the latest stable release package.
+
     - For CPU:
     ```bash
-    # 1. Build ONNX Runtime GenAI from source for CPU
-    # Instructions: https://onnxruntime.ai/docs/genai/howto/build-from-source.html
-
-    # 2. Install ONNX Runtime GenAI wheel produced by build.py
-    pip install build/wheel/*.whl
+    # 1. Install ONNX Runtime GenAI wheel
+    pip install onnxruntime-genai
 
-    # 3. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
+    # 2. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
     pip uninstall -y onnxruntime
 
-    # 4. Install nightly version of ONNX Runtime
+    # 3. Install nightly version of ONNX Runtime
     pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime
     ```
 
     - For CUDA:
     ```bash
-    # 1. Build ONNX Runtime GenAI from source for CUDA
-    # Instructions: https://onnxruntime.ai/docs/genai/howto/build-from-source.html
+    # 1. Install ONNX Runtime GenAI wheel
+    pip install onnxruntime-genai-cuda
 
-    # 2. Install ONNX Runtime GenAI wheel produced by build.py
-    pip install build/wheel/*.whl
-
-    # 3. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
+    # 2. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
     pip uninstall -y onnxruntime-gpu
 
-    # 4. Install nightly version of ONNX Runtime
+    # 3. Install nightly version of ONNX Runtime
     pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime-gpu
     ```
 
     - For DirectML: 
     ```bash
-    # 1. Build ONNX Runtime GenAI from source for DirectML
-    # Instructions: https://onnxruntime.ai/docs/genai/howto/build-from-source.html
-
-    # 2. Install ONNX Runtime GenAI wheel produced by build.py
-    pip install build/wheel/*.whl
+    # 1. Install ONNX Runtime GenAI wheel
+    pip install onnxruntime-genai-directml
 
-    # 3. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
+    # 2. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
     pip uninstall -y onnxruntime-directml
 
-    # 4. Install nightly version of ONNX Runtime
+    # 3. Install nightly version of ONNX Runtime
     pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime-directml
     ```
 - `onnxscript`
@@ -71,34 +62,34 @@ Please ensure you have the following Python packages installed to create the ONN
 - `scipy`
 - `soundfile`
 - `torch`
-    - Please install the Jan 25, 2025 nightly version. You can install torch by following the [instructions](https://pytorch.org/get-started/locally/). For getting ONNX models that can run on CUDA or DirectML, please install torch with CUDA and ensure the CUDA version you choose in the instructions is the one you have installed.
+    - Please install the latest nightly version. You can install torch by following the [instructions](https://pytorch.org/get-started/locally/). For getting ONNX models that can run on CUDA or DirectML, please install torch with CUDA and ensure the CUDA version you choose in the instructions is the one you have installed.
     - For CPU:
     ```bash
-    pip install torch==2.7.0.dev20250125
+    pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
     ```
     - For CUDA:
     ```bash
-    pip install torch==2.7.0.dev20250125+cu124 --index-url https://download.pytorch.org/whl/nightly/cu124
+    pip install torch --index-url https://download.pytorch.org/whl/nightly/cu124
     ```
 - `torchaudio`
-    - Please install the Jan 25, 2025 nightly version. You can install torchaudio by following the [instructions](https://pytorch.org/get-started/locally/). For getting ONNX models that can run on CUDA or DirectML, please install torchaudio with CUDA and ensure the CUDA version you choose in the instructions is the one you have installed.
+    - Please install the latest nightly version. You can install torchaudio by following the [instructions](https://pytorch.org/get-started/locally/). For getting ONNX models that can run on CUDA or DirectML, please install torchaudio with CUDA and ensure the CUDA version you choose in the instructions is the one you have installed.
     - For CPU:
     ```bash
-    pip install torchaudio==2.6.0.dev20250125+cu124
+    pip install --pre torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
     ```
     - For CUDA:
     ```bash
-    pip install torchaudio==2.6.0.dev20250125+cu124 --index-url https://download.pytorch.org/whl/nightly/cu124
+    pip install torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
     ```
 - `torchvision` 
-    - Please install the Jan 25, 2025 nightly version. You can install torchvision by following the [instructions](https://pytorch.org/get-started/locally/). For getting ONNX models that can run on CUDA or DirectML, please install torchvision with CUDA and ensure the CUDA version you choose in the instructions is the one you have installed.
+    - Please install the latest nightly version. You can install torchvision by following the [instructions](https://pytorch.org/get-started/locally/). For getting ONNX models that can run on CUDA or DirectML, please install torchvision with CUDA and ensure the CUDA version you choose in the instructions is the one you have installed.
     - For CPU:
     ```bash
-    pip install torchvision==0.22.0.dev20250125+cu124
+    pip install --pre torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
     ```
     - For CUDA:
     ```bash
-    pip install torchvision==0.22.0.dev20250125+cu124 --index-url https://download.pytorch.org/whl/nightly/cu124
+    pip install torchvision --index-url https://download.pytorch.org/whl/nightly/cu124
     ```
 - `transformers`
 

From e2967edb2c88fd7bea013037369264c70b23a55e Mon Sep 17 00:00:00 2001
From: Kunal Vaishnavi <kvaishnavi@microsoft.com>
Date: Thu, 25 Sep 2025 17:37:30 +0000
Subject: [PATCH 4/6] Add BF16 CUDA example

---
 examples/python/gemma-3-vision-tutorial.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/examples/python/gemma-3-vision-tutorial.md b/examples/python/gemma-3-vision-tutorial.md
index 03f5c869a9..7c30110b49 100644
--- a/examples/python/gemma-3-vision-tutorial.md
+++ b/examples/python/gemma-3-vision-tutorial.md
@@ -128,6 +128,11 @@ $ python3 builder.py --input ./pytorch --output ./cpu --precision fp32 --executi
 $ python3 builder.py --input ./pytorch --output ./cuda --precision fp16 --execution_provider cuda
 ```
 
+```bash
+# Build INT4 components with BF16 inputs/outputs for CUDA
+$ python3 builder.py --input ./pytorch --output ./cuda --precision bf16 --execution_provider cuda
+```
+
 ```bash
 # Build INT4 components with FP16 inputs/outputs for DirectML
 $ python3 builder.py --input ./pytorch --output ./dml --precision fp16 --execution_provider dml

From e2aa0a6d181906e1def82467077596c2b7ceb01b Mon Sep 17 00:00:00 2001
From: Kunal Vaishnavi <kvaishnavi@microsoft.com>
Date: Thu, 25 Sep 2025 21:41:32 +0000
Subject: [PATCH 5/6] Use nightly ORT GenAI until next stable release

---
 examples/python/gemma-3-vision-tutorial.md | 23 +++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/examples/python/gemma-3-vision-tutorial.md b/examples/python/gemma-3-vision-tutorial.md
index 7c30110b49..3a74623567 100644
--- a/examples/python/gemma-3-vision-tutorial.md
+++ b/examples/python/gemma-3-vision-tutorial.md
@@ -17,12 +17,12 @@ Please ensure you have the following Python packages installed to create the ONN
 - `onnx`
 - `onnxruntime` and `onnxruntime-genai`
   - ONNX Runtime: Please install the latest nightly version. To ensure the right version is installed, please install ONNX Runtime GenAI first. Then you can uninstall the stable version of ONNX Runtime that gets auto-installed as a dependency.
-  - ONNX Runtime GenAI: Please install the latest stable release package.
+  - ONNX Runtime GenAI: Please install the latest nightly version.
 
     - For CPU:
     ```bash
-    # 1. Install ONNX Runtime GenAI wheel
-    pip install onnxruntime-genai
+    # 1. Install nightly version of ONNX Runtime GenAI
+    pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime-genai
 
     # 2. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
     pip uninstall -y onnxruntime
@@ -33,8 +33,8 @@ Please ensure you have the following Python packages installed to create the ONN
 
     - For CUDA:
     ```bash
-    # 1. Install ONNX Runtime GenAI wheel
-    pip install onnxruntime-genai-cuda
+    # 1. Install nightly version of ONNX Runtime GenAI
+    pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime-genai-cuda
 
     # 2. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
     pip uninstall -y onnxruntime-gpu
@@ -45,8 +45,8 @@ Please ensure you have the following Python packages installed to create the ONN
 
     - For DirectML:
     ```bash
-    # 1. Install ONNX Runtime GenAI wheel
-    pip install onnxruntime-genai-directml
+    # 1. Install nightly version of ONNX Runtime GenAI
+    pip install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --pre onnxruntime-genai-directml
 
     # 2. Uninstall stable version of ONNX Runtime that is auto-installed by ONNX Runtime GenAI
     pip uninstall -y onnxruntime-directml
@@ -146,13 +146,18 @@ Currently, both JSON files needed to run with ONNX Runtime GenAI are created by
 
 [Here](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/model-vision.py) is an example of how you can run your Gemma-3 vision model with ONNX Runtime GenAI.
 
+### CPU
+```bash
+$ python model-vision.py -m ./gemma3-vision-it/cpu -e cpu
+```
+
 ### CUDA
 ```bash
-$ python .\model-vision.py -m .\gemma3-vision-it\cuda -e cuda
+$ python model-vision.py -m ./gemma3-vision-it/cuda -e cuda
 ```
 
 ### DirectML
 
 ```bash
-$ python .\model-vision.py -m .\gemma3-vision-it\dml -e dml
+$ python model-vision.py -m ./gemma3-vision-it/dml -e dml
 ```

From 3587cbd69d6ff33420d2045b3442a38dd7526706 Mon Sep 17 00:00:00 2001
From: Kunal Vaishnavi <kvaishnavi@microsoft.com>
Date: Fri, 26 Sep 2025 00:20:04 +0000
Subject: [PATCH 6/6] Fix example paths to JSON files

---
 examples/python/gemma-3-vision-tutorial.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/python/gemma-3-vision-tutorial.md b/examples/python/gemma-3-vision-tutorial.md
index 3a74623567..5b235f9b47 100644
--- a/examples/python/gemma-3-vision-tutorial.md
+++ b/examples/python/gemma-3-vision-tutorial.md
@@ -140,7 +140,7 @@ $ python3 builder.py --input ./pytorch --output ./dml --precision fp16 --executi
 
 ## 3. Build `genai_config.json` and `processor_config.json`
 
-Currently, both JSON files needed to run with ONNX Runtime GenAI are created by hand. Because the fields have been hand-crafted, it is recommended that you copy the already-uploaded JSON files and modify the fields as needed for your fine-tuned Gemma-3 vision model. [Here](https://huggingface.co/onnxruntime/Gemma-3-ONNX/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/genai_config.json) is an example for `genai_config.json` and [here](https://huggingface.co/onnxruntime/Gemma-3-ONNX/blob/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/processor_config.json) is an example for `processor_config.json`.
+Currently, both JSON files needed to run with ONNX Runtime GenAI are written by hand, so we recommend copying the already-uploaded JSON files and modifying the fields as needed for your fine-tuned Gemma-3 vision model. See the example [`genai_config.json`](https://huggingface.co/onnxruntime/Gemma-3-ONNX/blob/main/gemma-3-4b-it/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/genai_config.json) and the example [`processor_config.json`](https://huggingface.co/onnxruntime/Gemma-3-ONNX/blob/main/gemma-3-4b-it/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/processor_config.json).
 
 ## 4. Run Gemma-3 vision ONNX models