microsoft · edgchen1 · Apr 7, 2026 · Mar 26, 2026 · Mar 26, 2026 · Mar 26, 2026
diff --git a/.agents/skills/ort-build/SKILL.md b/.agents/skills/ort-build/SKILL.md
@@ -0,0 +1,93 @@
+---
+name: ort-build
+description: Build ONNX Runtime from source. Use this skill when asked to build, compile, or generate CMake files for ONNX Runtime.
+---
+
+# Building ONNX Runtime
+
+The main build scripts are `build.sh` (Linux/macOS) and `build.bat` (Windows). They delegate to `tools/ci_build/build.py`.
+
+## Build phases
+
+There are three phases, controlled by flags:
+
+- `--update` — generate CMake build files
+- `--build` — compile (add `--parallel` to speed this up)
+- `--test` — run tests
+
+For native builds, if none of `--update`, `--build`, or `--test` are specified (and `--skip_tests` is not passed), **all three run by default**.
+
+For cross-compiled builds, the default is `--update` + `--build` only; `--test` must be specified explicitly.
+
+## Platform detection
+
+- On **Windows**, use `.\build.bat`
+- On **Linux/macOS**, use `./build.sh`
+
+Detect the platform and use the correct script automatically.
+
+## Common build commands
+
+```bash
+# Full build (update + build + test)
+./build.sh --config Release --parallel
+# Windows equivalent
+.\build.bat --config Release --parallel
+
+# Just regenerate CMake files
+./build.sh --config Release --update
+
+# Just compile (skip CMake regeneration and tests)
+./build.sh --config Release --build --parallel
+
+# Just run tests (after a prior build)
+./build.sh --config Release --test
+
+# Build with CUDA execution provider
+./build.sh --config Release --parallel --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/local/cuda
+
+# Build Python wheel
+./build.sh --config Release --parallel --build_wheel
+
+# Build only a specific CMake target (much faster than a full build)
+./build.sh --config Release --build --parallel --target onnxruntime_common
+```
+
+## Key flags
+
+| Flag | Description |
+|------|-------------|
+| `--config` | Build configuration: `Debug`, `MinSizeRel`, `Release`, `RelWithDebInfo` |
+| `--parallel` | Enable parallel compilation (recommended) |
+| `--skip_tests` | Skip running tests after build |
+| `--build_wheel` | Build the Python wheel package |
+| `--use_cuda` | Enable CUDA execution provider (requires `--cuda_home` and `--cudnn_home`) |
+| `--use_tensorrt` | Enable TensorRT execution provider |
+| `--use_dml` | Enable DirectML execution provider (Windows) |
+| `--use_openvino` | Enable OpenVINO execution provider |
+| `--target TARGET` | Build a specific CMake target (e.g., `onnxruntime_common`). Can be repeated. |
+| `--targets T1 T2 ...` | Build one or more specific CMake targets in a single flag. |
+| `--build_dir` | Specify build output directory (default: `build/`) |
+
+## Build duration
+
+A full ONNX Runtime build can take a **long time** (tens of minutes to over an hour depending on hardware and configuration). If you have other work to do while the build runs, consider redirecting output to a file and running the build in the background so you can continue with other tasks:
+
+```bash
+# Run build in the background, redirecting output to a log file
+./build.sh --config Release --parallel > build.log 2>&1 &
+
+# Windows equivalent (PowerShell)
+Start-Process -NoNewWindow -FilePath .\build.bat -ArgumentList '--config','Release','--parallel' -RedirectStandardOutput build.log -RedirectStandardError build_err.log
+```
+
+When using the CLI agent's shell tools, prefer running the build with `mode="sync"` and a short `initial_wait` — the command will continue in the background and you'll be notified when it completes, freeing you to do other work in the meantime.
+
+## Workflow
+
+1. Ask the user what they want to build (config, execution providers, wheel, etc.) if not clear from their prompt.
+2. Detect the OS and choose `build.bat` or `build.sh`.
+3. Construct the build command with the appropriate flags.
+4. Run the build. Use `--parallel` by default unless the user says otherwise.
+5. Since the build may take a long time, continue with other tasks while waiting for it to complete.
+6. If the build fails, examine the error output and suggest fixes.
diff --git a/.agents/skills/ort-lint/SKILL.md b/.agents/skills/ort-lint/SKILL.md
@@ -0,0 +1,57 @@
+---
+name: ort-lint
+description: Lint and format ONNX Runtime code. Use this skill when asked to lint, format, or check code style for C++ or Python files in ONNX Runtime.
+---
+
+# Linting and Formatting ONNX Runtime Code
+
+ONNX Runtime uses [lintrunner](https://github.com/suo/lintrunner) to manage linting for both C++ (clang-format) and Python (ruff).
+
+## Setup (one-time)
+
+Install lintrunner and initialize it:
+
+```bash
+pip install -r requirements-lintrunner.txt
+lintrunner init
+```
+
+This downloads and configures the required linting tools (clang-format, ruff, etc.).
+
+## Common commands
+
+```bash
+# Auto-fix lint issues in changed files (staged + unstaged vs HEAD)
+lintrunner -a
+
+# Auto-fix lint issues in ALL files
+lintrunner -a --all-files
+
+# Format Python files only
+lintrunner f --all-files
+
+# Check without fixing (dry run)
+lintrunner
+
+# Lint specific files
+lintrunner -a path/to/file.py path/to/other_file.cc
+```
+
+## Style rules
+
+### C++
+- Google C++ Style with modifications
+- Max line length: 120 characters
+- Configured in `.clang-format` and `.clang-tidy`
+
+### Python
+- Google Python Style Guide (extension of PEP 8)
+- Max line length: 120 characters
+- Formatter/linter: ruff (configured in `pyproject.toml`)
+
+## Workflow
+
+1. If lintrunner is not yet set up, install dependencies and run `lintrunner init`.
+2. Determine scope: changed files only (`lintrunner -a`) or all files (`lintrunner -a --all-files`).
+3. Run the appropriate lint command.
+4. If there are unfixable issues, report them to the user with file locations and descriptions.
diff --git a/.agents/skills/ort-test/SKILL.md b/.agents/skills/ort-test/SKILL.md
@@ -0,0 +1,78 @@
+---
+name: ort-test
+description: Run ONNX Runtime tests. Use this skill when asked to run tests, debug test failures, or find and execute specific test cases in ONNX Runtime.
+---
+
+# Running ONNX Runtime Tests
+
+ONNX Runtime uses **Google Test** for C++ and **unittest** (preferred) / **pytest** for Python.
+
+## C++ tests
+
+### Run all tests via CTest
+
+```bash
+# Linux/macOS
+cd build/Linux/Release && ctest
+
+# Windows
+cd build\Windows\Release && ctest
+```
+
+The platform subdirectory under `build/` matches the OS (e.g., `Linux`, `Windows`, `Darwin`).
+
+### Run a specific C++ test binary
+
+The main test binary is `onnxruntime_test_all`. Use `--gtest_filter` to select tests:
+
+```bash
+# Run a specific test by name pattern
+./build/Linux/Release/onnxruntime_test_all --gtest_filter="*TestName*"
+
+# Windows
+.\build\Windows\Release\onnxruntime_test_all.exe --gtest_filter="*TestName*"
+```
+
+### Run via build script
+
+```bash
+# Run all tests through the build script (skips update and build)
+./build.sh --config Release --test
+.\build.bat --config Release --test
+```
+
+## Python tests
+
+Use `pytest` as the test runner:
+
+```bash
+# Run an entire test file
+pytest onnxruntime/test/python/test_specific.py
+
+# Run a specific test class or method
+pytest onnxruntime/test/python/test_specific.py::TestClass::test_method
+
+# Run with verbose output
+pytest -v onnxruntime/test/python/test_specific.py
+
+# Run tests matching a keyword
+pytest -k "test_keyword" onnxruntime/test/python/
+```
+
+## Test naming convention
+
+Python tests follow this pattern:
+
+```
+test_<method>_<expected_behavior>_[when_<condition>]
+```
+
+Example: `test_method_x_raises_error_when_dims_is_not_a_sequence`
+
+## Workflow
+
+1. Determine what the user wants to test (all tests, specific C++ test, specific Python test, etc.).
+2. Detect the platform to construct the correct paths.
+3. For C++ tests, check that the build directory exists and a prior build has been completed.
+4. Construct and run the appropriate test command.
+5. If tests fail, analyze the output and help debug the failures.
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,119 @@
+# Agent Instructions for ONNX Runtime
+
+## Build, Test, and Lint
+
+See the `/ort-build`, `/ort-test`, and `/ort-lint` skills (in `.agents/skills/`) for detailed build, test, and lint instructions.
+
+## Architecture Overview
+
+ONNX Runtime is a cross-platform inference and training engine for ONNX models. The core pipeline is: **Load model → Build graph → Optimize graph → Partition across Execution Providers → Execute**.
+
+### Key layers (`onnxruntime/core/`)
+
+- **`graph/`** — ONNX model/graph IR. `Model` wraps a `Graph` of `Node`s connected by edges. `GraphViewer` provides read-only traversal.
+- **`optimizer/`** — Graph transformation passes (fusion, elimination, constant folding, layout transformation). Transformers implement `GraphTransformer::ApplyImpl()` and are organized by optimization level (Level1–Level4).
+- **`framework/`** — Execution machinery: `OpKernel` (operator implementations), `Tensor`, `KernelRegistry`, allocators, executors.
+- **`session/`** — `InferenceSession` is the main runtime class. Flow: `Load()` → `Initialize()` (optimize + assign kernels) → `Run()`.
+- **`providers/`** — Execution Provider (EP) implementations. Each EP implements `IExecutionProvider` to declare which ops it can run and how to allocate device memory. CPU EP is the default fallback. 20+ EPs exist (CUDA, TensorRT, DirectML, CoreML, OpenVINO, WebGPU, QNN, etc.).
+- **`common/`** — Utilities, status/error types, logging, threading.
+- **`platform/`** — OS abstraction (file I/O, threading).
+
+### Contrib ops (`onnxruntime/contrib_ops/`)
+
+Custom operators not in the ONNX standard, organized by EP (`cpu/`, `cuda/`, `js/`, `webgpu/`). Registration is in `cpu_contrib_kernels.cc` / `cuda_contrib_kernels.cc`.
+
+### Training (`orttraining/`)
+
+Training-specific code (gradient ops, loss functions, optimizers, `TrainingSession`) layered on top of the inference framework. The training code is not being actively developed.
+
+### Language bindings
+
+`csharp/`, `java/`, `js/`, `objectivec/`, `rust/` — each wraps the C API (`include/onnxruntime/core/session/onnxruntime_c_api.h`).
+
+## C++ Conventions
+
+**Style**: Google C++ Style with modifications. Max line length 120. Configured in `.clang-format` and `.clang-tidy`.
+
+### Error handling
+
+Functions that can fail return `onnxruntime::common::Status`. Use these macros from `core/common/common.h`:
+
+- `ORT_RETURN_IF_ERROR(expr)` — early-return if `expr` returns non-OK Status
+- `ORT_THROW_IF_ERROR(expr)` — throw if `expr` returns non-OK Status
+- `ORT_RETURN_IF(condition, ...)` / `ORT_RETURN_IF_NOT(condition, ...)` — conditional early-return with message
+- `ORT_ENFORCE(condition, ...)` — assert-like; throws `OnnxRuntimeException` on failure
+- `ORT_MAKE_STATUS(category, code, ...)` — construct a Status object
+
+Exceptions may be disabled in a build, in which case, the throwing macros will call `abort()` instead.
+
+In the C API boundary, use `API_IMPL_BEGIN` / `API_IMPL_END` to catch exceptions—C++ exceptions must never cross the C API boundary.
+
+### Container types (minimize allocations)
+
+Required over `std::vector` / `std::unordered_map`:
+
+- `InlinedVector<T>` — small-buffer-optimized vector (64 bytes inline). From `core/common/inlined_containers_fwd.h`.
+- `InlinedHashSet<T>`, `InlinedHashMap<K,V>` — flat hash containers. From `core/common/inlined_containers.h`.
+- `NodeHashSet<T>`, `NodeHashMap<K,V>` — when pointer stability is needed.
+- `TensorShapeVector` — for shape dimensions. From `core/framework/tensor_shape.h`.
+
+Use `reserve()` not `resize()`. Do not use `absl::` directly—use the ORT typedefs.
+
+### Other key conventions
+
+- Use `#pragma once` for header guards.
+- Use `ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE` for new classes until copy/move is proven necessary.
+- Prefer `gsl::span<const T>` over `const std::vector<T>&` for input parameters.
+- Prefer `std::string_view` by value over `const std::string&`.
+- Use `SafeInt<size_t>` (from `core/common/safeint.h`) for memory size arithmetic to prevent overflow.
+- Don't use `else` after `return`.
+- Avoid the `long` type (ambiguous width). Use `int64_t` for dimensions, `size_t` for counts.
+- `using namespace` is allowed in limited scope but never at global scope in headers.
+- Use `std::make_unique()` for heap allocations; prefer `std::optional` over `unique_ptr` for optional/delayed construction.
+
+## Python Environment
+
+The build and test processes may install Python dependencies. To avoid modifying the system or user Python environment, create and activate an isolated virtual environment before building or testing:
+
+```bash
+# Create a virtual environment (one-time setup)
+python -m venv .venv
+
+# Activate it
+# Linux/macOS
+source .venv/bin/activate
+# Windows (PowerShell)
+.\.venv\Scripts\Activate.ps1
+```
+
+If a virtual environment already exists (e.g., `.venv/`), activate it rather than creating a new one.
+
+## Python Conventions
+
+- Follow [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html) (extension of PEP 8).
+- Max line length: 120 characters.
+- Formatter: ruff (configured in `pyproject.toml`).
+- Static type checking: pyright/pylance.
+- Test framework: `unittest` (preferred) with `pytest` as runner.
+
+## C API Conventions
+
+The main public C API header is in `include/onnxruntime/core/session/onnxruntime_c_api.h`:
+
+- Functions that may fail return `OrtStatus*` (`nullptr` on success); release/cleanup functions (e.g., `OrtReleaseXxx`) return `void`.
+- Object lifecycle: `OrtCreateXxx` / `OrtReleaseXxx`.
+- All strings are UTF-8 encoded.
+- Use `int64_t` for dimensions, `size_t` for counts and memory sizes.
+- APIs requiring allocation take an `OrtAllocator*` parameter.
+- Failed calls must not modify out-parameters.
+
+Other public C/C++ API headers are named like `onnxruntime_*.h` and located in one of these directories:
+- `include/onnxruntime/core/session/`
+- `orttraining/orttraining/training_api/include/`
+
+## PR Guidelines
+
+- Keep PRs small (aim for ≤10 files; separate cosmetic changes from functional ones).
+- All changes must have unit tests, unless they are documentation-only or already adequately covered by existing unit tests.
+- Build and test locally on at least one platform before submitting.
+- PR author is responsible for merging after approval.