3 changes: 2 additions & 1 deletion .cspell.json
@@ -28,7 +28,8 @@
"*megalinter_file_names_cspell.txt",
"**/.terraform/**",
"**/.terraform.lock.hcl",
"**/shared/ci/tests/Fixtures/**"
"**/shared/ci/tests/Fixtures/**",
"**/TERRAFORM.md"
],
"dictionaryDefinitions": [
{
1 change: 1 addition & 0 deletions .cspell/general-technical.txt
@@ -676,6 +676,7 @@ lakehouses
lalogs
lan
lanczos
lastexitcode
lavfi
ldap
leaderboard
3 changes: 2 additions & 1 deletion .markdownlint-cli2.jsonc
@@ -7,7 +7,8 @@
"**/.venv/**",
"external/**",
"shared/ci/tests/Fixtures/**",
"logs/**"
"logs/**",
"**/TERRAFORM.md"
],
"config": {
"default": true,
16 changes: 8 additions & 8 deletions README.md
@@ -82,14 +82,14 @@ The setup script installs Python 3.11 via [uv](https://docs.astral.sh/uv/), crea

Full documentation is available in the [docs/](docs/README.md) directory.

| Guide | Description |
|---------------------------------------------------|-----------------------------------------------------------------|
| [Getting Started](docs/getting-started/README.md) | Prerequisites, quickstart, and first training job |
| [Deployment](docs/infrastructure/README.md) | Infrastructure provisioning and setup |
| [Training](docs/training/README.md) | RL and IL training workflows, MLflow, and checkpointing |
| [Security](docs/security/README.md) | Threat model, security guide, deployment responsibilities |
| [Recipes](docs/recipes/README.md) | Guides that take you from a standing start to a working result |
| [Contributing](docs/contributing/README.md) | Architecture, style guides, contribution workflow |
| Guide | Description |
|---------------------------------------------------|----------------------------------------------------------------|
| [Getting Started](docs/getting-started/README.md) | Prerequisites, quickstart, and first training job |
| [Deployment](docs/infrastructure/README.md) | Infrastructure provisioning and setup |
| [Training](docs/training/README.md) | RL and IL training workflows, MLflow, and checkpointing |
| [Security](docs/security/README.md) | Threat model, security guide, deployment responsibilities |
| [Recipes](docs/recipes/README.md) | Guides that take you from a standing start to a working result |
| [Contributing](docs/contributing/README.md) | Architecture, style guides, contribution workflow |

## Architecture

44 changes: 43 additions & 1 deletion docs/contributing/infrastructure-style.md
@@ -3,7 +3,7 @@ sidebar_position: 7
title: Infrastructure as Code Style Guide
description: Terraform conventions, shell script standards, and copyright headers for contributions
author: Microsoft Robotics-AI Team
ms.date: 2026-03-18
ms.date: 2026-03-26
ms.topic: reference
---

@@ -343,6 +343,48 @@ kind: ConfigMap
* Place at the top of the file for other file types
* Include blank line between copyright header and code

## Documentation Generation

Terraform module documentation is generated from source with [terraform-docs](https://terraform-docs.io/) v0.21.0. Each module and deployment directory contains a `TERRAFORM.md` file that terraform-docs produces automatically.

### Configuration

The repository-wide configuration lives in `.terraform-docs.yml` at the workspace root. This file controls output format, section ordering, and content templates.

### Generated Files

Generated `TERRAFORM.md` files exist in every Terraform module and deployment directory. These files are excluded from cspell and markdownlint because their content derives from HCL source code.

| Directory | File |
|----------------------------------------------|----------------|
| `infrastructure/terraform/` | `TERRAFORM.md` |
| `infrastructure/terraform/vpn/` | `TERRAFORM.md` |
| `infrastructure/terraform/modules/platform/` | `TERRAFORM.md` |
| `infrastructure/terraform/modules/sil/` | `TERRAFORM.md` |
| `infrastructure/terraform/modules/vpn/` | `TERRAFORM.md` |

### Regenerating Documentation

Run terraform-docs against a specific directory:

```bash
terraform-docs markdown table --output-file TERRAFORM.md infrastructure/terraform/modules/platform/
```

Or regenerate all modules using the PowerShell helper:

```powershell
./scripts/Update-TerraformDocs.ps1
```
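
For environments without PowerShell, the per-directory command can be looped over the directories in the Generated Files table. The following is a minimal sketch, not the repository's helper: it assumes it is run from the repository root, and `regenerate_terraform_docs` and the `DRY_RUN` switch are hypothetical names introduced here for illustration.

```bash
# Sketch: run terraform-docs for each directory listed in the Generated Files table.
# regenerate_terraform_docs and DRY_RUN are hypothetical; they are not part of the repository.
regenerate_terraform_docs() {
  for dir in \
    infrastructure/terraform/ \
    infrastructure/terraform/vpn/ \
    infrastructure/terraform/modules/platform/ \
    infrastructure/terraform/modules/sil/ \
    infrastructure/terraform/modules/vpn/
  do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Preview mode: print the command instead of running it.
      echo "terraform-docs markdown table --output-file TERRAFORM.md ${dir}"
    else
      # Stop at the first directory that fails to generate.
      terraform-docs markdown table --output-file TERRAFORM.md "${dir}" || return 1
    fi
  done
}
```

Calling `DRY_RUN=1 regenerate_terraform_docs` previews the five commands; calling `regenerate_terraform_docs` without the switch runs them.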

### Quality Standards

Variable descriptions serve as the primary documentation source. Write descriptions that:

* Use sentence case without trailing periods
* Explain purpose and expected values rather than restating the variable name
* Include examples for complex types (e.g., `object`, `map`)
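
The trailing-period rule lends itself to a quick local check. The sketch below is a heuristic under stated assumptions: it only catches single-line `description = "..."` assignments in `*.tf` files, and `find_trailing_period_descriptions` is a hypothetical helper name, not something the repository provides.

```bash
# Heuristic sketch: list descriptions in *.tf files that end with a period.
# find_trailing_period_descriptions is a hypothetical helper, not part of the repository.
find_trailing_period_descriptions() {
  # Matches single-line assignments such as: description = "Some text."
  # Multi-line (heredoc) descriptions are not detected.
  grep -rnE --include='*.tf' \
    'description[[:space:]]*=[[:space:]]*".*\."[[:space:]]*$' "${1:-.}"
}
```

Running `find_trailing_period_descriptions infrastructure/terraform/` prints each offending file and line number; a non-zero exit status with no output means nothing matched.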

## Related Documentation

* [Contributing Guide](README.md) - Prerequisites, workflow, commit messages
36 changes: 20 additions & 16 deletions docs/contributing/prerequisites.md
@@ -22,22 +22,23 @@ Tools, Azure access, and build validation requirements for contributing to the P

Install these tools before contributing:

| Tool | Minimum Version | Installation |
|-------------|-----------------|-----------------------------------------------------------------------|
| Terraform | 1.9.8 | <https://developer.hashicorp.com/terraform/install> |
| TFLint | 0.61.0 | <https://github.com/terraform-linters/tflint> |
| Azure CLI | 2.65.0 | <https://learn.microsoft.com/cli/azure/install-azure-cli> |
| kubectl | 1.31 | <https://kubernetes.io/docs/tasks/tools/> |
| Helm | 3.16 | <https://helm.sh/docs/intro/install/> |
| Node.js/npm | 20+ LTS | <https://nodejs.org/> |
| Python | 3.11+ | <https://www.python.org/downloads/> |
| shellcheck | 0.10+ | <https://www.shellcheck.net/> |
| uv | latest | <https://docs.astral.sh/uv/> |
| Go | 1.24+ | <https://go.dev/dl/> |
| golangci-lint | 2.11+ | <https://golangci-lint.run/welcome/install/> |
| Docker | latest | <https://docs.docker.com/get-docker/> (with NVIDIA Container Toolkit) |
| OSMO CLI | latest | <https://developer.nvidia.com/osmo> |
| hve-core | latest | <https://github.com/microsoft/hve-core> |
| Tool | Minimum Version | Installation |
|----------------|-----------------|-----------------------------------------------------------------------|
| Terraform | 1.9.8 | <https://developer.hashicorp.com/terraform/install> |
| TFLint | 0.61.0 | <https://github.com/terraform-linters/tflint> |
| Azure CLI | 2.65.0 | <https://learn.microsoft.com/cli/azure/install-azure-cli> |
| kubectl | 1.31 | <https://kubernetes.io/docs/tasks/tools/> |
| Helm | 3.16 | <https://helm.sh/docs/intro/install/> |
| Node.js/npm | 20+ LTS | <https://nodejs.org/> |
| Python | 3.11+ | <https://www.python.org/downloads/> |
| shellcheck | 0.10+ | <https://www.shellcheck.net/> |
| uv | latest | <https://docs.astral.sh/uv/> |
| Go | 1.24+ | <https://go.dev/dl/> |
| golangci-lint | 2.11+ | <https://golangci-lint.run/welcome/install/> |
| Docker | latest | <https://docs.docker.com/get-docker/> (with NVIDIA Container Toolkit) |
| OSMO CLI | latest | <https://developer.nvidia.com/osmo> |
| terraform-docs | 0.21.0 | <https://github.com/terraform-docs/terraform-docs/releases> |
| hve-core | latest | <https://github.com/microsoft/hve-core> |

## Azure Access Requirements

@@ -145,6 +146,9 @@ nvidia-ctk --version
# OSMO CLI
osmo --version

# terraform-docs
terraform-docs --version # >= 0.21.0

# hve-core (VS Code extension — verify via extensions list)
code --list-extensions | grep -i hve-core
```
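
The `>= 0.21.0` comment above can be turned into an automated check. This is a minimal sketch: `version_ge` is a hypothetical helper, it relies on `sort -V` (version sort, as provided by GNU coreutils), and it assumes the `terraform-docs --version` output contains a version string of the form `0.21.0`.

```bash
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Hypothetical helper; relies on `sort -V` for version-aware ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="0.21.0"
if command -v terraform-docs >/dev/null 2>&1; then
  # Assumption: the version output includes a MAJOR.MINOR.PATCH string.
  installed="$(terraform-docs --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)"
  if version_ge "$installed" "$required"; then
    echo "terraform-docs $installed satisfies >= $required"
  else
    echo "terraform-docs $installed is older than $required" >&2
  fi
fi
```

Version sort matters here because a plain string comparison would rank `0.9.0` above `0.21.0`.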
18 changes: 9 additions & 9 deletions docs/operations/README.md
@@ -1,3 +1,3 @@
---
sidebar_position: 1
title: Operations Hub
@@ -31,16 +31,16 @@

The reference architecture deploys configurable monitoring components through Terraform feature flags.

| Component | Purpose | Feature Flag |
|----------------------------------|------------------------------------|-----------------------------------|
| Log Analytics workspace | Central log aggregation | Always deployed |
| Application Insights | Application performance monitoring | Always deployed |
| Azure Monitor workspace | Prometheus metrics backend | `should_deploy_monitor_workspace` |
| Managed Grafana | Visualization dashboards | `should_deploy_grafana` |
| Container Insights | AKS container telemetry | `should_deploy_dce` |
| Component | Purpose | Feature Flag |
|----------------------------------|------------------------------------|-----------------------------------------------------------|
| Log Analytics workspace | Central log aggregation | Always deployed |
| Application Insights | Application performance monitoring | Always deployed |
| Azure Monitor workspace | Prometheus metrics backend | `should_deploy_monitor_workspace` |
| Managed Grafana | Visualization dashboards | `should_deploy_grafana` |
| Container Insights | AKS container telemetry | `should_deploy_dce` |
| Prometheus data collection rules | Metric scraping configuration | `should_deploy_dce` and `should_deploy_monitor_workspace` |
| Azure Monitor Private Link Scope | Private network monitoring | `should_deploy_ampls` |
| Data collection endpoint | Private ingestion endpoint | `should_deploy_dce` |
| Azure Monitor Private Link Scope | Private network monitoring | `should_deploy_ampls` |
| Data collection endpoint | Private ingestion endpoint | `should_deploy_dce` |

> [!IMPORTANT]
> The default configuration deploys a **private AKS cluster**. Connect through the VPN Gateway before running any `kubectl` or Helm commands. See [VPN Gateway](../infrastructure/vpn.md) for setup instructions.
32 changes: 16 additions & 16 deletions docs/recipes/README.md
@@ -7,30 +7,30 @@ Step-by-step guides that take you from a standing start to a working result. Eac

## 🚀 Pick a Recipe

| Goal | Recipe | Time |
|------|--------|------|
| Train an RL policy | [Your First RL Training Job](training/your-first-rl-training-job.md) | 30 min |
| Train a LeRobot policy | [Your First LeRobot Training Job](training/your-first-lerobot-training-job.md) | 30 min |
| Run the full train → eval → register pipeline | [End-to-End LeRobot Pipeline](training/end-to-end-lerobot-pipeline.md) | 60 min |
| Configure edge recording | [Configuring Edge Data Recording](data-collection/configuring-edge-data-recording.md) | 20 min |
| Prepare a dataset for training | [Preparing Datasets for Training](data-collection/preparing-datasets-for-training.md) | 30 min |
| Goal | Recipe | Time |
|-----------------------------------------------|---------------------------------------------------------------------------------------|--------|
| Train an RL policy | [Your First RL Training Job](training/your-first-rl-training-job.md) | 30 min |
| Train a LeRobot policy | [Your First LeRobot Training Job](training/your-first-lerobot-training-job.md) | 30 min |
| Run the full train → eval → register pipeline | [End-to-End LeRobot Pipeline](training/end-to-end-lerobot-pipeline.md) | 60 min |
| Configure edge recording | [Configuring Edge Data Recording](data-collection/configuring-edge-data-recording.md) | 20 min |
| Prepare a dataset for training | [Preparing Datasets for Training](data-collection/preparing-datasets-for-training.md) | 30 min |

## 📖 Recipe Catalog

### Training

| Recipe | Description | Prerequisites |
|--------|-------------|---------------|
| [Your First RL Training Job](training/your-first-rl-training-job.md) | Submit an Isaac Lab RL training job on OSMO with SKRL | Deployed infrastructure, OSMO running |
| [Your First LeRobot Training Job](training/your-first-lerobot-training-job.md) | Submit a LeRobot behavioral cloning job on OSMO | Deployed infrastructure, HuggingFace dataset |
| [End-to-End LeRobot Pipeline](training/end-to-end-lerobot-pipeline.md) | Orchestrate train → evaluate → register in one command | Completed basic LeRobot recipe |
| Recipe | Description | Prerequisites |
|--------------------------------------------------------------------------------|--------------------------------------------------------|----------------------------------------------|
| [Your First RL Training Job](training/your-first-rl-training-job.md) | Submit an Isaac Lab RL training job on OSMO with SKRL | Deployed infrastructure, OSMO running |
| [Your First LeRobot Training Job](training/your-first-lerobot-training-job.md) | Submit a LeRobot behavioral cloning job on OSMO | Deployed infrastructure, HuggingFace dataset |
| [End-to-End LeRobot Pipeline](training/end-to-end-lerobot-pipeline.md) | Orchestrate train → evaluate → register in one command | Completed basic LeRobot recipe |

### Data Collection

| Recipe | Description | Prerequisites |
|--------|-------------|---------------|
| [Configuring Edge Data Recording](data-collection/configuring-edge-data-recording.md) | Set up ROS 2 edge recording on Jetson with chunking and compression | Jetson device, ROS 2 |
| [Preparing Datasets for Training](data-collection/preparing-datasets-for-training.md) | Download, inspect, and validate datasets for LeRobot training | Python 3.11+, Azure CLI |
| Recipe | Description | Prerequisites |
|---------------------------------------------------------------------------------------|---------------------------------------------------------------------|-------------------------|
| [Configuring Edge Data Recording](data-collection/configuring-edge-data-recording.md) | Set up ROS 2 edge recording on Jetson with chunking and compression | Jetson device, ROS 2 |
| [Preparing Datasets for Training](data-collection/preparing-datasets-for-training.md) | Download, inspect, and validate datasets for LeRobot training | Python 3.11+, Azure CLI |

## 🔗 Related Documentation

6 changes: 3 additions & 3 deletions docs/recipes/data-collection/README.md
@@ -4,10 +4,10 @@ Guides for capturing, processing, and managing robotic training datasets.

## 📖 Recipes

| Recipe | Description | Time |
|--------|-------------|------|
| Recipe | Description | Time |
|-----------------------------------------------------------------------|---------------------------------------------------------------------|--------|
| [Configuring Edge Data Recording](configuring-edge-data-recording.md) | Set up ROS 2 edge recording on Jetson with chunking and compression | 20 min |
| [Preparing Datasets for Training](preparing-datasets-for-training.md) | Download, inspect, and validate datasets for LeRobot training | 30 min |
| [Preparing Datasets for Training](preparing-datasets-for-training.md) | Download, inspect, and validate datasets for LeRobot training | 30 min |

## 🔗 Related
