diff --git a/documents/DownloadModels.md b/doc/DownloadModels.md
similarity index 100%
rename from documents/DownloadModels.md
rename to doc/DownloadModels.md
diff --git a/doc/RunGPTOSSWebGPU.md b/doc/RunGPTOSSWebGPU.md
new file mode 100644
index 0000000000..e51ba8a6d2
--- /dev/null
+++ b/doc/RunGPTOSSWebGPU.md
@@ -0,0 +1,46 @@
+# Run GPT OSS 20B with Web GPU (preview)
+
+These instructions allow you to experiment with GPT OSS 20B on any GPU.
+
+Note: once the model is officially supported by Foundry Local that will be the easiest way to run it.
+
+## Model
+
+Download the model
+
+```bash
+https://huggingface.co/onnx-community/gpt-oss-20b-ONNX
+```
+
+Copy the files in the onnx folder into the parent folder
+
+Change the name of model_q4f16.onnx into model.onnx (as referenced in genai_config.json)
+
+## Runtime
+
+```bash
+pip install onnxruntime-webgpu
+pip install onnxruntime-genai --no-deps
+```
+
+## Example script
+
+```bash
+curl -O https://raw.githubusercontent.com/microsoft/onnxruntime-genai/refs/heads/main/examples/python/model-qa.py
+```
+
+## Run the model
+
+```bash
+python model-qa.py -m gpt-oss-20b-ONNX
+```
+
+Note that the reasoning tokens are not processed by this script and appear in the output.
+
+```bash
+Prompt (Use quit() to exit): Hello
+
+Output: <|channel|>analysis<|message|>We need to respond to "Hello". Simple greeting. Probably a short hello back and ask how can help.<|end|><|start|>assistant<|channel|>final<|message|>Hello! 👋 How can I help you today?
+
+Prompt (Use quit() to exit): 
+```
diff --git a/documents/Runtime_option.md b/doc/Runtime_option.md
similarity index 100%
rename from documents/Runtime_option.md
rename to doc/Runtime_option.md
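
Note on the sample output in doc/RunGPTOSSWebGPU.md: the new doc states that reasoning tokens are not processed by model-qa.py and appear in the output. A reader who only wants the final answer could post-process the raw text. The following is a minimal sketch, not part of model-qa.py or onnxruntime-genai. It assumes the markers appear as plain text exactly as in the sample output (`<|channel|>final<|message|>`, `<|end|>`, and so on), and the helper name `extract_final_answer` is hypothetical.

```python
import re

# Marker strings copied from the sample output in the doc. Treating them as
# plain-text delimiters is an assumption of this sketch.
FINAL_MARKER = "<|channel|>final<|message|>"
SPECIAL_TOKEN = re.compile(r"<\|[^|]*\|>")


def extract_final_answer(raw_output: str) -> str:
    """Return only the 'final' channel text from raw model output.

    Hypothetical helper: if no final-channel marker is present, fall back to
    the raw text with every <|...|> token removed.
    """
    _, marker, tail = raw_output.rpartition(FINAL_MARKER)
    if not marker:
        # No final channel found: strip the special tokens and return the rest.
        return SPECIAL_TOKEN.sub("", raw_output).strip()
    # Keep the text after the last final marker, up to the next special token.
    return SPECIAL_TOKEN.split(tail, maxsplit=1)[0].strip()


if __name__ == "__main__":
    sample = (
        '<|channel|>analysis<|message|>We need to respond to "Hello".'
        "<|end|><|start|>assistant<|channel|>final<|message|>"
        "Hello! How can I help you today?"
    )
    print(extract_final_answer(sample))  # prints: Hello! How can I help you today?
```

Using the last occurrence of the final marker keeps the helper stable if the analysis text happens to quote the marker itself. Whether the script should hide the analysis channel by default is a separate decision for the PR.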