| 1 | +<!-- AGENTS.md: Guidance for AI agents to navigate, build, test, and contribute to this repository --> |
| 2 | +# AGENTS |
| 3 | + |
| 4 | +This file provides instructions for AI agents to understand the layout of the `mistral.rs` repository, run builds/tests, and follow project conventions. |
| 5 | + |
| 6 | +## Repository Structure |
| 7 | + |
| 8 | +- `/mistralrs/` : Main Rust crate (text & multimodal inference API) |
| 9 | +- `/mistralrs-core/` : Core inference logic and tensor operations (text models) |
| 10 | +- `/mistralrs-vision/` : Vision inference support (image-based inputs & vision-enabled models) |
| 11 | +- `/mistralrs-quant/` : Quantization support (ISQ, GGUF, GPTQ, AWQ, FP8, HQQ, etc.) |
| 12 | +- `/mistralrs-paged-attn/`: PagedAttention implementation |
| 13 | +- `/mistralrs-pyo3/` : Python bindings (PyO3) |
| 14 | +- `/mistralrs-server/` : CLI & OpenAI-compatible HTTP server (subcommands: `run`, `plain`, `vision-plain`, `diffusion`, `speech`) |
| 15 | +- `/mistralrs-server-core/`: Shared server core logic |
| 16 | +- `/mistralrs-web-chat/` : Web chat application (static assets & backend integration) |
| 17 | +- `/mistralrs-bench/` : Benchmarking tools |
| 18 | +- `/docs/` : Markdown documentation for models, features, and guides |
| 19 | +- `/examples/` : Usage examples (Rust, Python, server samples, notebooks) |
| 20 | +- `/chat_templates/` : Chat formatting templates (JSON/Jinja) |
| 21 | +- `/scripts/` : Utility scripts (e.g., AWQ conversion) |
| 22 | + |
| 23 | +## Feature Organization |
| 24 | + |
| 25 | +Mistral.rs supports multiple model types and advanced features via dedicated crates and CLI subcommands: |
| 26 | + |
| 27 | +- **Text Inference** |
| 28 | + - Crate: `mistralrs-core` (low-level ops), `mistralrs` (API wrapper) |
| 29 | + - CLI: `run` / `plain` subcommands in `mistralrs-server` |
| 30 | + - Docs: `docs/SAMPLING.md`, `docs/TOOL_CALLING.md` |
| 31 | +- **Vision Models** |
| 32 | + - Crate: `mistralrs-vision` |
| 33 | + - CLI: `vision-plain` subcommand |
| 34 | + - Docs: `docs/VISION_MODELS.md`, `docs/IMAGEGEN_MODELS.md`, `docs/IMATRIX.md` |
| 35 | +- **Diffusion Models** |
| 36 | + - CLI: `diffusion` subcommand |
| 37 | + - Docs: `docs/FLUX.md` |
| 38 | +- **Speech Models** |
| 39 | + - CLI: `speech` subcommand |
| 40 | + - Docs: `docs/DIA.md` |
| 41 | +- **Quantization & ISQ** |
| 42 | + - Crate: `mistralrs-quant` |
| 43 | + - Docs: `docs/QUANTS.md`, `docs/ISQ.md` |
| 44 | + - Conversion Script: `scripts/convert_awq_marlin.py` |
| 45 | +- **Paged Attention** |
| 46 | + - Crate: `mistralrs-paged-attn` |
| 47 | + - Docs: `docs/PAGED_ATTENTION.md` |
| 48 | +- **Adapters & LoRA/X-LoRA** |
| 49 | + - Docs: `docs/ADAPTER_MODELS.md`, `docs/LORA_XLORA.md` |
| 50 | +- **Mixture of Experts (AnyMoE)** |
| 51 | + - Docs: `docs/ANYMOE.md` |
| 52 | + |
| 53 | +## Building |
| 54 | + |
| 55 | +1. Install Rust via rustup (Rust 2021 edition). |
| 56 | +2. Choose optional features (e.g., `cuda`, `flash-attn`, `cudnn`, `metal`, `mkl`, `accelerate`). |
| 57 | +3. Build the entire workspace: |
| 58 | + ```bash |
| 59 | + cargo build --workspace --release --features "<features>" |
| 60 | + ``` |
| 61 | +4. Or build/install only the server binary: |
| 62 | + ```bash |
| 63 | + cargo build --release --package mistralrs-server --features "<features>" |
| 64 | + cargo install --path mistralrs-server --features "<features>" |
| 65 | + ``` |
| 66 | + |
| 67 | +## Models |
| 68 | + |
| 69 | +When integrating a new model, make sure its `VarBuilder` `.pp` calls match the checkpoint's tensor names. In Candle, a `VarBuilder` maintains an internal path vector that acts like a “current working directory” for model weights. Every call to `pp("sub")` (an alias for `push_prefix`) clones the builder and appends `sub`, so successive calls accumulate a dotted prefix such as `transformer.h.0` while leaving the original builder untouched. When you eventually call `get(...)`, Candle joins that prefix with the tensor name (`prefix + "." + name`) and looks it up in the checkpoint backend. The resulting keys match the dot-separated names emitted by PyTorch’s `state_dict`/`named_parameters`, so PyTorch-trained weights can be loaded without any renaming. This lets you recreate the PyTorch module tree in Rust by “walking” it: for example, `vb.pp("word_embeddings")` grabs `word_embeddings.*`, while a chain like `vb.pp("encoder").pp("layers").pp(i.to_string())` targets keys such as `encoder.layers.0.*`, as shown in community tutorials porting Transformers models to Candle. As one maintainer put it, the prefix system lets you “cd” around the parameter hierarchy, giving a lightweight namespace mechanism that keeps Candle compatible with PyTorch naming conventions while remaining ergonomic to use. |
| 70 | + |
| 71 | +Also look for a `model.safetensors.index.json` file for the model at hand and use it to verify that the tensor names and structure match. |
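
The prefix behaviour described above can be sketched with a small, dependency-free stand-in. This is only an illustration under stated assumptions: `PrefixBuilder`, `requested_keys`, and `missing_keys` are hypothetical names, not Candle or mistral.rs APIs, and the hard-coded key set stands in for the tensor names you would read from the `weight_map` of a `model.safetensors.index.json` file. The layer names (`attn`, `mlp`, `ffn`) are likewise made up for the example.

```rust
use std::collections::HashSet;

/// Toy stand-in for the prefix behaviour described above.
/// NOT Candle's `VarBuilder`: it only tracks a path and builds key strings.
#[derive(Clone, Debug)]
struct PrefixBuilder {
    path: Vec<String>,
}

impl PrefixBuilder {
    fn root() -> Self {
        Self { path: Vec::new() }
    }

    /// Mirrors the described `pp` semantics: clone, then append one segment.
    /// The builder it is called on is left untouched.
    fn pp(&self, segment: impl ToString) -> Self {
        let mut path = self.path.clone();
        path.push(segment.to_string());
        Self { path }
    }

    /// Join the accumulated prefix and the tensor name with dots.
    fn key(&self, name: &str) -> String {
        if self.path.is_empty() {
            name.to_string()
        } else {
            format!("{}.{}", self.path.join("."), name)
        }
    }
}

/// Keys a hypothetical model with `num_layers` layers would request.
fn requested_keys(num_layers: usize) -> Vec<String> {
    let vb = PrefixBuilder::root();
    let mut keys = vec![vb.pp("word_embeddings").key("weight")];
    let layers = vb.pp("encoder").pp("layers");
    for i in 0..num_layers {
        let layer = layers.pp(i);
        keys.push(layer.pp("attn").key("weight"));
        keys.push(layer.pp("mlp").key("weight"));
    }
    keys
}

/// Return every requested key that is absent from the checkpoint's key set.
fn missing_keys(requested: &[String], checkpoint: &HashSet<&str>) -> Vec<String> {
    requested
        .iter()
        .filter(|k| !checkpoint.contains(k.as_str()))
        .cloned()
        .collect()
}

fn main() {
    // Hard-coded stand-in for the names listed in an index file.
    // The last entry deliberately says `ffn` where the model asks for `mlp`.
    let checkpoint: HashSet<&str> = [
        "word_embeddings.weight",
        "encoder.layers.0.attn.weight",
        "encoder.layers.0.mlp.weight",
        "encoder.layers.1.attn.weight",
        "encoder.layers.1.ffn.weight",
    ]
    .into_iter()
    .collect();

    let missing = missing_keys(&requested_keys(2), &checkpoint);
    println!("missing keys: {:?}", missing);
}
```

A check of this shape, run against the real tensor names, is one way to catch a misplaced or misspelled `.pp` segment before attempting a full model load.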
| 72 | + |
| 73 | +## Testing |
| 74 | + |
| 75 | +- Core test suite (requires HF token for some tests): |
| 76 | + ```bash |
| 77 | + export HF_TOKEN=<your_token> # or TESTS_HF_TOKEN for CI parity |
| 78 | + cargo test -p mistralrs-core -p mistralrs-quant -p mistralrs-vision |
| 79 | + ``` |
| 80 | +- Run all tests across the workspace (some crates have no tests and are effectively skipped): |
| 81 | + ```bash |
| 82 | + cargo test --workspace |
| 83 | + ``` |
| 84 | + |
| 85 | +You should *always* run `cargo check` (or its alias `cargo c`) before returning to make sure the code compiles. If the code does not compile, keep making edits until it does. |
| 86 | + |
| 87 | +Avoid returning TODOs. |
| 88 | + |
| 89 | +## Formatting & Linting |
| 90 | + |
| 91 | +- Format all Rust code: |
| 92 | + ```bash |
| 93 | + cargo fmt --all |
| 94 | + make fmt # also formats Python/CUDA/C++ files via ruff, clang-format |
| 95 | + ``` |
| 96 | +- Lint with Clippy: |
| 97 | + ```bash |
| 98 | + cargo clippy --workspace --tests --examples -- -D warnings |
| 99 | + ``` |
| 100 | + |
| 101 | +## Documentation |
| 102 | + |
| 103 | +- Generate Rust docs for all crates: |
| 104 | + ```bash |
| 105 | + cargo doc --workspace |
| 106 | + ``` |
| 107 | +- Preview at `target/doc/` or publish to GitHub Pages as configured. |
| 108 | +- Refer to `/docs/` for in-depth Markdown guides (e.g., `DEVICE_MAPPING.md`, `TOOL_CALLING.md`). |
| 109 | + |
| 110 | +## Examples |
| 111 | + |
| 112 | +- Rust examples: `mistralrs/examples/` |
| 113 | +- Python examples: `examples/python/` |
| 114 | +- Server samples: `examples/server/` |
| 115 | +- Run Python scripts: |
| 116 | + ```bash |
| 117 | + python3 examples/python/<script>.py |
| 118 | + ``` |
| 119 | +- Run server/CLI: |
| 120 | + ```bash |
| 121 | + ./target/release/mistralrs-server -i <mode> -m <model> [options] |
| 122 | + ``` |
| 123 | + |
| 124 | +## CI Parity |
| 125 | + |
| 126 | +The CI pipeline is defined in `.github/workflows/ci.yml` and includes: |
| 127 | + - `cargo check` for all targets |
| 128 | + - `cargo test` on core crates |
| 129 | + - `cargo fmt -- --check` |
| 130 | + - `cargo clippy -- -D warnings` |
| 131 | + - `cargo doc` |
| 132 | + - Typos check (`crate-ci/typos`) |
| 133 | + |
| 134 | +## Contribution Conventions |
| 135 | + |
| 136 | +- Follow Rust 2021 idioms and keep code minimal and focused. |
| 137 | +- Update `/docs/` and examples when adding features or breaking changes. |
| 138 | +- Add tests and examples for new functionality. |
| 139 | +- Commit messages should be clear and follow the Conventional Commits style where possible: |
| 140 | + ``` |
| 141 | + feat(crate): describe new feature |
| 142 | + fix(crate): describe bug fix |
| 143 | + docs: update docs for ... |
| 144 | + ``` |
| 145 | + |
| 146 | +--- |
| 147 | +*This AGENTS.md file is intended solely to improve AI-driven assistance and does not affect runtime behavior.* |