microsoft · edgchen1 · Apr 7, 2026 · Mar 26, 2026
diff --git a/.agents/skills/ort-build/SKILL.md b/.agents/skills/ort-build/SKILL.md
@@ -0,0 +1,85 @@
+---
+name: ort-build
+description: Build ONNX Runtime from source. Use this skill when asked to build, compile, or generate CMake files for ONNX Runtime.
+---
+
+# Building ONNX Runtime
+
+The build scripts `build.sh` (Linux/macOS) and `build.bat` (Windows) delegate to `tools/ci_build/build.py`.
+
+## Build phases
+
+Three phases, controlled by flags:
+
+- `--update` — generate CMake build files
+- `--build` — compile (add `--parallel` to speed this up)
+- `--test` — run tests
+
+For native builds, if no phase flag is specified (and `--skip_tests` is not passed), **all three run by default**. For cross-compiled builds, the default is `--update` + `--build` only.
+
+### When to use `--update`
+
+You need `--update` when:
+- First build in a new build directory
+- New source files are added (some CMake targets use glob patterns, others use explicit file lists — re-run to pick up new files either way)
+- CMake configuration changes (new flags, updated CMakeLists.txt)
+
+You do **not** need `--update` when only modifying existing `.cc`/`.h` files — just use `--build`. Skipping it saves time.
+
+## Examples
+
+```bash
+# Full build (update + build + test)
+./build.sh --config Release --parallel
+.\build.bat --config Release --parallel     # Windows
+
+# Just regenerate CMake files
+./build.sh --config Release --update
+
+# Just compile (skip CMake regeneration and tests)
+./build.sh --config Release --build --parallel
+
+# Just run tests (after a prior build)
+./build.sh --config Release --test
+
+# Build with CUDA execution provider
+./build.sh --config Release --parallel --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/local/cuda
+
+# Build Python wheel
+./build.sh --config Release --parallel --build_wheel
+
+# Build a specific CMake target (much faster than a full build)
+./build.sh --config Release --build --parallel --target onnxruntime_common
+
+# Load flags from an option file (one flag per line)
+./build.sh "@./custom_options.opt" --build --parallel
+```
+
+## Key flags
+
+| Flag | Description |
+|------|-------------|
+| `--config` | `Debug`, `MinSizeRel`, `Release`, or `RelWithDebInfo` |
+| `--parallel` | Enable parallel compilation (recommended) |
+| `--skip_tests` | Skip running tests after build |
+| `--build_wheel` | Build the Python wheel package |
+| `--use_cuda` | Enable CUDA EP. Requires `--cuda_home`/`--cudnn_home` or `CUDA_HOME`/`CUDNN_HOME` env vars. On Windows, only `--cuda_home`/`CUDA_HOME` is validated. |
+| `--target T` | Build a specific CMake target (requires `--build`; e.g., `onnxruntime_common`, `onnxruntime_test_all`) |
+| `--build_dir` | Build output directory |
+
+## Build output path
+
+Default: `build/<Platform>/<Config>/` where Platform is `Linux`, `MacOS`, or `Windows`.
+
+With Visual Studio multi-config generators, the config name appears twice (e.g., `build/Windows/Release/Release/`).
+
+The build output directory can be customized with `--build_dir`.
+
+## Agent tips
+
+- **Activate a Python virtual environment** before building. See "Python > Virtual environment" in `AGENTS.md`.
+- **Prefer `python tools/ci_build/build.py` directly** over `build.bat`/`build.sh` when redirecting output. The `.bat` wrapper runs in `cmd.exe`, which breaks PowerShell redirection.
+- **Redirect output to a file** (e.g., `> build_log.txt 2>&1`). Build output is large and will overflow terminal buffers.
+- **Run builds in the background** — a full build can take tens of minutes to over an hour. Poll the log for `"Build complete"` or errors.
+- **Use `--parallel`** by default unless the user says otherwise.
+- Ask the user what they want to build (config, execution providers, wheel, etc.) if not clear from their prompt.
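The `--update` guidance in the ort-build skill above can be approximated in a script. The following is a minimal sketch and not part of this change: `ort_phase_flags` is a hypothetical helper name, and it assumes that a configured build directory contains a `CMakeCache.txt` file, which CMake writes when it configures a build tree. It only covers the "first build in a new build directory" case; new source files or changed CMake options still call for an explicit `--update`.

```bash
# Hypothetical helper (not in the repo): pick phase flags for a build.
# Assumption: a configured build directory contains CMakeCache.txt.
ort_phase_flags() {
  build_dir="$1"
  if [ -f "$build_dir/CMakeCache.txt" ]; then
    # Already configured: compile only, skip CMake regeneration.
    echo "--build"
  else
    # Fresh directory: generate CMake files first, then compile.
    echo "--update --build"
  fi
}
```

For example, `./build.sh --config Release $(ort_phase_flags build/Linux/Release) --parallel` would pass `--update --build` for a fresh directory and `--build` once it has been configured.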
diff --git a/.agents/skills/ort-lint/SKILL.md b/.agents/skills/ort-lint/SKILL.md
@@ -0,0 +1,43 @@
+---
+name: ort-lint
+description: Lint and format ONNX Runtime code. Use this skill when asked to lint, format, or check code style for C++ or Python files in ONNX Runtime.
+---
+
+# Linting and Formatting ONNX Runtime Code
+
+ONNX Runtime uses [lintrunner](https://github.com/suo/lintrunner) for both C++ (clang-format) and Python (ruff).
+
+## Setup (one-time)
+
+```bash
+pip install -r requirements-lintrunner.txt
+lintrunner init
+```
+
+## Commands
+
+```bash
+lintrunner -a                                        # auto-fix changed files
+lintrunner -a --all-files                            # auto-fix all files
+lintrunner -a path/to/file.py path/to/other_file.cc  # auto-fix specific files
+lintrunner f --all-files                             # format Python files only
+lintrunner                                           # check without fixing (dry run)
+```
+
+## Style rules
+
+### C++
+- Google C++ Style with modifications (see `docs/Coding_Conventions_and_Standards.md` for full details)
+- Max line length: 120 characters, but **aim for 80** when possible
+- Configured in `.clang-format` and `.clang-tidy`
+
+### Python
+- Google Python Style Guide (extension of PEP 8)
+- Max line length: 120 characters
+- Configured in `pyproject.toml`
+
+## Agent tips
+
+- **Activate a Python virtual environment** before installing dependencies. See "Python > Virtual environment" in `AGENTS.md`.
+- If lintrunner is not yet set up, install and initialize first (see [Setup](#setup-one-time)).
+- Prefer `lintrunner -a` (changed files only) over `--all-files` unless the user asks for a full sweep.
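The setup-then-lint flow from the ort-lint agent tips above could be wrapped as shown here. This is a sketch under stated assumptions: `ort_lint_changed` is a hypothetical name, the function is called from the repository root (where `requirements-lintrunner.txt` lives), and `pip` belongs to the active virtual environment. Note that `command -v` only detects whether `lintrunner` is installed; it cannot tell whether `lintrunner init` has already been run.

```bash
# Hypothetical wrapper (not in the repo): set up lintrunner if missing,
# then auto-fix. With no arguments, only changed files are linted.
ort_lint_changed() {
  if ! command -v lintrunner >/dev/null 2>&1; then
    # One-time setup, mirroring the Setup section of the skill.
    pip install -r requirements-lintrunner.txt || return 1
    lintrunner init || return 1
  fi
  # Any arguments are treated as specific files to lint.
  lintrunner -a "$@"
}
```

Calling `ort_lint_changed` with no arguments matches the "changed files only" preference; passing paths restricts the run to those files.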
diff --git a/.agents/skills/ort-test/SKILL.md b/.agents/skills/ort-test/SKILL.md
@@ -0,0 +1,81 @@
+---
+name: ort-test
+description: Run ONNX Runtime tests. Use this skill when asked to run tests, debug test failures, or find and execute specific test cases in ONNX Runtime.
+---
+
+# Running ONNX Runtime Tests
+
+ONNX Runtime uses **Google Test** for C++. Python tests are written with **unittest** (preferred) and run with **pytest**.
+
+## C++ tests
+
+### Test executables
+
+| Executable | What it tests |
+|---|---|
+| `onnxruntime_test_all` | Core framework, graph, optimizer, session tests |
+| `onnxruntime_provider_test` | Operator/kernel tests (Conv, MatMul, etc.) across execution providers |
+
+Use `--gtest_filter` to select specific tests:
+
+```bash
+./onnxruntime_provider_test --gtest_filter="*Conv3D*"
+```
+
+### Running tests
+
+**Always run from the build output directory** — tests may fail to find dependencies otherwise.
+
+```bash
+# Linux
+cd build/Linux/Release
+./onnxruntime_provider_test --gtest_filter="*TestName*"
+
+# macOS
+cd build/MacOS/Release
+./onnxruntime_provider_test --gtest_filter="*TestName*"
+
+# Windows
+cd build\Windows\Release
+.\onnxruntime_provider_test.exe --gtest_filter="*TestName*"
+```
+
+You can also run all tests via the build script (assumes a prior successful build):
+
+```bash
+./build.sh --config Release --test
+.\build.bat --config Release --test    # Windows
+```
+
+### Locating the build output directory
+
+The default path follows the pattern `build/<Platform>/<Config>/` where Platform is `Linux`, `MacOS`, or `Windows`. With Visual Studio multi-config generators on Windows, the config may appear twice (e.g., `build/Windows/Release/Release/`). The path can also be customized via `--build_dir`.
+
+If you can't find a test binary, search for it:
+
+```powershell
+# Windows
+Get-ChildItem -Path build -Recurse -Filter "onnxruntime_provider_test.exe" | Select-Object -ExpandProperty FullName
+
+# Linux/macOS
+find build -name "onnxruntime_provider_test" -type f
+```
+
+## Python tests
+
+Use `pytest` as the test runner:
+
+```bash
+pytest onnxruntime/test/python/test_specific.py                          # entire file
+pytest onnxruntime/test/python/test_specific.py::TestClass::test_method  # specific test
+pytest -k "test_keyword" onnxruntime/test/python/                        # by keyword
+```
+
+Python test naming convention: `test_<method>_<expected_behavior>_[when_<condition>]`
+
+## Agent tips
+
+- **Activate a Python virtual environment** before running tests. See "Python > Virtual environment" in `AGENTS.md`.
+- **Redirect test output to a file** (e.g., `> test_output.txt 2>&1`) — output can be large.
+- For C++ tests, verify the build directory exists and a prior build completed before running.
+- Use `--gtest_filter` to run a targeted subset when the full suite takes too long.
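The two ort-test rules above (search for the binary if its location is unknown, and always run it from the build output directory) can be combined in one helper. This is a minimal sketch for Linux/macOS only: `ort_run_gtest` is a hypothetical name, and it simply takes the first match that `find` reports, which may not be the intended one if several configs have been built.

```bash
# Hypothetical helper (not in the repo): locate a test binary under a build
# root and run it from its own directory with a --gtest_filter pattern.
ort_run_gtest() {
  build_root="$1"
  binary="$2"
  filter="$3"
  # Take the first regular file with a matching name.
  bin_path=$(find "$build_root" -name "$binary" -type f | head -n 1)
  if [ -z "$bin_path" ]; then
    echo "not found: $binary under $build_root" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$(dirname "$bin_path")" && "./$binary" --gtest_filter="$filter" )
}
```

A call such as `ort_run_gtest build onnxruntime_provider_test '*Conv3D*' > test_output.txt 2>&1` also follows the tip about redirecting output to a file.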
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,112 @@
+# Agent Instructions for ONNX Runtime
+
+## Build, Test, and Lint
+
+See the `/ort-build`, `/ort-test`, and `/ort-lint` skills (in `.agents/skills/`) for detailed instructions.
+
+## Architecture Overview
+
+ONNX Runtime is a cross-platform inference and training engine for ONNX models. The core pipeline is: **Load model → Build graph → Optimize graph → Partition across Execution Providers → Execute**.
+
+### Key layers (`onnxruntime/core/`)
+
+- **`graph/`** — ONNX model/graph IR. `Model` wraps a `Graph` of `Node`s. `GraphViewer` provides read-only traversal.
+- **`optimizer/`** — Graph transformations (fusion, elimination, constant folding, layout transforms). Organized by optimization level (Level1–Level4).
+- **`framework/`** — Execution machinery: `OpKernel`, `Tensor`, `KernelRegistry`, allocators, executors.
+- **`session/`** — `InferenceSession`: `Load()` → `Initialize()` (optimize + assign kernels) → `Run()`.
+- **`providers/`** — Execution Provider (EP) implementations. Each EP implements `IExecutionProvider`. CPU EP is the default fallback. 20+ EPs exist (CUDA, TensorRT, DirectML, CoreML, OpenVINO, WebGPU, QNN, etc.).
+- **`common/`** — Utilities, status/error types, logging, threading.
+- **`platform/`** — OS abstraction (file I/O, threading).
+
+### Contrib ops (`onnxruntime/contrib_ops/`)
+
+Custom operators not in the ONNX standard, organized by EP (`cpu/`, `cuda/`, `js/`, `webgpu/`). Each EP has its own contrib kernel registration file (e.g., `cpu_contrib_kernels.cc`, `cuda_contrib_kernels.cc`, `js_contrib_kernels.cc`, `webgpu_contrib_kernels.cc`).
+
+### Training (`orttraining/`)
+
+Training-specific code (gradient ops, loss functions, optimizers, `TrainingSession`) layered on top of the inference framework.
+
+### Language bindings
+
+`csharp/`, `java/`, `js/`, `objectivec/`, `rust/` — each wraps the C API (`include/onnxruntime/core/session/onnxruntime_c_api.h`).
+
+## C++ Conventions
+
+**Style**: Google C++ Style with modifications. Max line length 120 (aim for 80). See `docs/Coding_Conventions_and_Standards.md` for full details.
+
+### Error handling
+
+Functions that can fail return `onnxruntime::common::Status`. Key macros from `core/common/common.h`:
+
+- `ORT_RETURN_IF_ERROR(expr)` — early-return if `expr` returns non-OK Status
+- `ORT_THROW_IF_ERROR(expr)` — throw if `expr` returns non-OK Status
+- `ORT_RETURN_IF(cond, ...)` / `ORT_RETURN_IF_NOT(cond, ...)` — conditional early-return with message
+- `ORT_ENFORCE(cond, ...)` — assert-like; throws `OnnxRuntimeException` on failure
+- `ORT_MAKE_STATUS(category, code, ...)` — construct a Status object
+
+Exceptions may be disabled in a build, in which case the throwing macros call `abort()` instead.
+
+At the C API boundary, use `API_IMPL_BEGIN` / `API_IMPL_END` to catch exceptions — C++ exceptions must never cross the C API boundary.
+
+### Container types
+
+Use these instead of `std::vector` / `std::unordered_map`:
+
+- `InlinedVector<T>` — small-buffer-optimized vector (64 bytes inline)
+- `InlinedHashSet<T>`, `InlinedHashMap<K,V>` — flat hash containers
+- `NodeHashSet<T>`, `NodeHashMap<K,V>` — when pointer stability is needed
+- `TensorShapeVector` — for shape dimensions
+
+Use `reserve()`, not `resize()`, to preallocate capacity (`resize()` also constructs elements). Do not use `absl::` directly; use the ORT typedefs.
+
+### Other conventions
+
+- `#pragma once` for header guards
+- `ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE` for new classes until copy/move is proven necessary
+- Prefer `gsl::span<const T>` over `const std::vector<T>&` for input parameters
+- Prefer `std::string_view` by value over `const std::string&`
+- `SafeInt<size_t>` (from `core/common/safeint.h`) for memory size arithmetic
+- Don't use `else` after `return`
+- Avoid `long` (ambiguous width) — use `int64_t` for dimensions, `size_t` for counts
+- `using namespace` allowed in limited scope but never at global scope in headers
+- `std::make_unique()` for heap allocations; prefer `std::optional` over `unique_ptr` for optional/delayed construction
+
+## Python
+
+### Virtual environment
+
+Build and test processes may install Python packages. Create and activate an isolated virtual environment first:
+
+```bash
+python -m venv .venv                  # one-time setup
+source .venv/bin/activate             # Linux/macOS
+.\.venv\Scripts\Activate.ps1          # Windows (PowerShell)
+```
+
+If a virtual environment already exists (e.g., `.venv/`), activate it rather than creating a new one.
+
+### Conventions
+
+- Follow [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html) (extension of PEP 8)
+- Max line length: 120 characters
+- Formatter: ruff (configured in `pyproject.toml`)
+- Static type checking: pyright/pylance
+- Test framework: `unittest` (preferred) with `pytest` as runner
+
+## C API Conventions
+
+The main public C API header is `include/onnxruntime/core/session/onnxruntime_c_api.h`. Other public headers are in `include/onnxruntime/core/session/` and `orttraining/orttraining/training_api/include/`.
+
+- Functions that may fail return `OrtStatus*` (`nullptr` on success); release/cleanup functions return `void`
+- Object lifecycle: `OrtCreateXxx` / `OrtReleaseXxx`
+- All strings are UTF-8 encoded
+- Use `int64_t` for dimensions, `size_t` for counts and memory sizes
+- APIs requiring allocation take an `OrtAllocator*` parameter
+- Failed calls must not modify out-parameters
+
+## PR Guidelines
+
+- Keep PRs small (aim for ≤10 files; separate cosmetic changes from functional ones)
+- All changes must have unit tests, unless documentation-only or already adequately covered
+- Build and test locally on at least one platform before submitting
+- PR author is responsible for merging after approval
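The "activate an existing environment rather than creating a new one" rule from the Python > Virtual environment section of `AGENTS.md` above can be sketched as a shell function. `ensure_venv` is a hypothetical name, and the sketch assumes the Linux/macOS layout (`bin/activate`) and the same `python -m venv` command shown in that section; Windows would need the `Scripts\Activate.ps1` path instead.

```bash
# Hypothetical helper (not in the repo): reuse a virtual environment if one
# exists, otherwise create it, then activate it in the current shell.
ensure_venv() {
  venv_dir="${1:-.venv}"
  if [ ! -f "$venv_dir/bin/activate" ]; then
    # One-time setup; skipped when the environment already exists.
    python -m venv "$venv_dir" || return 1
  fi
  # Source the activation script so it affects the calling shell.
  . "$venv_dir/bin/activate"
}
```

The function has to be defined and called in the current shell (for example by sourcing the file that contains it); running it in a child process would not leave the environment activated.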