feat(infrastructure): add private Linux Isaac Sim VM deployment option #348

Merged
katriendg merged 6 commits into microsoft:main from fbeltrao:feature/63-private-isaac-sim-linux
Mar 25, 2026

Conversation

@fbeltrao
Contributor

Description

Add an optional private Linux Isaac Sim VM deployment path for development environments.

This PR adds a Bicep-based VM deployment for the NVIDIA Isaac Sim Developer Workstation marketplace image, a deploy wrapper script that resolves values from the existing Terraform platform deployment, post-provisioning setup for workstation dependencies, and optional NAT egress and Microsoft Defender for Endpoint VM extensions. It also extends the Terraform platform outputs and variables so the deployment script can reuse a dedicated VM subnet and the shared network security group from the platform environment.

This is partial progress toward #63. The current implementation is intentionally limited to Linux VMs on private networking and reuses existing subnet and NSG resources instead of introducing a standalone Terraform VM deployment path.

What Is Included

  • Bicep entrypoint and modules for a private Linux Isaac Sim VM, optional subnet NAT egress, and shared default parameter types.
  • Deployment script support for Terraform-derived defaults, isolated VM resource groups, config preview, and optional Defender for Endpoint enablement.
  • Post-provisioning scripts for developer tooling and unattended ThinLinc installation.
  • Terraform updates that expose and optionally create the VM subnet and network security group needed by the deployment flow.
  • README documentation for prerequisites, deployment commands, parameters, and current limitations.
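
The Terraform-derived defaults listed above could be resolved along the following lines. This is a minimal sketch in Python, not the shell script added in this PR. It assumes the platform values are read with `terraform output -json`; the output names `vm_subnet_id` and `network_security_group_id`, the helper names, and the sample values are all hypothetical.

```python
import json
import subprocess


def read_terraform_outputs(terraform_dir: str) -> dict:
    """Run `terraform output -json` in terraform_dir and return the parsed JSON."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=terraform_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def resolve_defaults(outputs: dict, overrides: dict) -> dict:
    """Merge Terraform output values with explicit overrides; overrides win."""
    # `terraform output -json` wraps each output as
    # {"value": ..., "type": ..., "sensitive": ...}.
    defaults = {name: entry["value"] for name, entry in outputs.items()}
    # Only apply overrides that were actually provided (not None).
    defaults.update({k: v for k, v in overrides.items() if v is not None})
    return defaults


# Hypothetical output names and values, shaped like `terraform output -json`.
sample_outputs = {
    "vm_subnet_id": {"value": "example-subnet-id", "type": "string", "sensitive": False},
    "network_security_group_id": {"value": "example-nsg-id", "type": "string", "sensitive": False},
}
resolved = resolve_defaults(sample_outputs, {"vm_subnet_id": None, "vm_name": "isaac-dev-01"})
print(resolved)
```

Reading structured output instead of parsing variable files by hand keeps the wrapper independent of how the values are written in `.tfvars`. Explicit command-line values still take precedence over the Terraform-derived ones.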

Not Included In This PR

  • Terraform-managed Isaac Sim VM resources. This PR adds Terraform support for the required network outputs only.
  • A standalone validation script such as scripts/validate_isaac_vm.sh.
  • Verification for the remaining issue acceptance criteria, including Checkov scanning, deployment time measurement, and automated Isaac Sim sample-scene launch validation.
  • Public IP or alternate connectivity modes. The current flow supports private networking only.
  • Additional authentication modes such as SSH key-based login.

Those items should be completed in follow-up PRs so this change can stay scoped to the initial private Linux deployment path.

Type of Change

  • 🐛 Bug fix (non-breaking change fixing an issue)
  • ✨ New feature (non-breaking change adding functionality)
  • 💥 Breaking change (fix or feature causing existing functionality to change)
  • 📚 Documentation update
  • 🏗️ Infrastructure change (Terraform/IaC)
  • ♻️ Refactoring (no functional changes)

Component(s) Affected

  • infrastructure/terraform/prerequisites/ - Azure subscription setup
  • infrastructure/terraform/ - Terraform infrastructure
  • infrastructure/setup/ - OSMO control plane / Helm
  • workflows/ - Training and evaluation workflows
  • training/ - Training pipelines and scripts
  • docs/ - Documentation

Testing Performed

  • Terraform plan reviewed (no unexpected changes)
  • Terraform apply tested in dev environment
  • Training scripts tested locally with Isaac Sim
  • OSMO workflow submitted successfully
  • Smoke tests passed (smoke_test_azure.py)

Notes:

  • terraform fmt was run for the Terraform changes.
  • ./deploy-isaac-sim-vm.sh --vm-name <vm-name> was run successfully for the optional deployment flow.

Documentation Impact

  • No documentation changes needed
  • Documentation updated in this PR
  • Documentation issue filed

Issue Link

Partially addresses #63.

Checklist

  • My code follows the project conventions
  • Commit messages follow conventional commit format
  • I have performed a self-review
  • Documentation impact assessed above
  • No new linting warnings introduced

@fbeltrao fbeltrao requested a review from a team as a code owner March 23, 2026 13:17
@codecov-commenter

codecov-commenter commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.58%. Comparing base (9c34f86) to head (e896a08).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #348   +/-   ##
=======================================
  Coverage   43.58%   43.58%           
=======================================
  Files         242      242           
  Lines       14840    14840           
  Branches     1855     1855           
=======================================
  Hits         6468     6468           
  Misses       8082     8082           
  Partials      290      290           
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| pester | 79.87% <ø> (ø) | |
| pytest | 6.89% <ø> (ø) | Carriedforward from 6804630 |
| pytest-dataviewer | 61.98% <ø> (ø) | |
| vitest | 50.72% <ø> (ø) | |

*This pull request uses carry forward flags.

Collaborator

@katriendg katriendg left a comment

PR Review — Isaac Sim VM Deployment

Nice work getting the initial private Linux deployment path scoped and working. This is a solid foundation for #63. Below is a summary of the inline review comments.

Findings Summary

| # | Severity | File | Summary |
|---|---|---|---|
| 1 | 💡 Note | bicep/main.bicep | Bicep acceptable for optional deployment; Terraform variant could be a follow-up |
| 2 | ⚠️ High | bicep/modules/subnet-nat-egress.bicep | Subnet PUT overwrites existing NSG/delegation/route properties; carry forward existing config |
| 3 | ⚠️ Medium | scripts/install-dev-deps.sh | Hardcoded azureuser for Docker group + git config --global runs as root; pass admin username from Bicep |
| 4 | 💡 Suggestion | deploy-isaac-sim-vm.sh | Custom tfvars parser is fragile; terraform console is an alternative |
| 5 | ⚠️ Medium | isaac-sim-vm/README.md | Prerequisites should spell out exact Terraform pipeline steps and commands |
| 6 | ⚠️ Medium | bicep/modules/linux-isaac-vm.bicep | Auto-shutdown timezone hardcoded to W. Europe; parameterize with UTC default |
| 7 | ⚠️ Medium | isaac-sim-vm/README.md | No cleanup/destroy documentation; users need explicit teardown guidance for Bicep-deployed resources |
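
Finding 2 can be illustrated with a small sketch. As the finding notes, a subnet PUT that only sets the NAT gateway overwrites the subnet's other properties unless they are sent again. The Python below shows the carry-forward idea on a plain dictionary shaped like a subnet's `properties` object. It is not the Bicep module from this PR, and the function name and sample values are hypothetical.

```python
import copy


def with_nat_gateway(existing_properties: dict, nat_gateway_id: str) -> dict:
    """Return subnet properties for a PUT that adds a NAT gateway and keeps the rest."""
    # Start from a deep copy of what is already on the subnet so the NSG,
    # delegations, and route table are carried forward unchanged.
    updated = copy.deepcopy(existing_properties)
    updated["natGateway"] = {"id": nat_gateway_id}
    return updated


# Hypothetical existing subnet configuration.
existing = {
    "addressPrefix": "10.0.4.0/24",
    "networkSecurityGroup": {"id": "example-nsg-id"},
    "delegations": [],
    "routeTable": {"id": "example-route-table-id"},
}
desired = with_nat_gateway(existing, "example-nat-gateway-id")
print(sorted(desired))
```

The same pattern applies in Bicep: read the existing subnet first, then pass its current properties through alongside the new NAT gateway reference.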

Sub-Issue Recommendation

Issue #63 has no sub-issues. This PR addresses roughly half the acceptance criteria. Recommend creating a sub-issue to track the remaining items:

  • Passes Checkov security scan
  • Deployment completes in under 15 minutes (measure and document)
  • Isaac Sim launches successfully and loads a sample scene after provisioning
  • Deployment validation script (scripts/validate_isaac_vm.sh)
  • Decision on Terraform variant requirement (or amend #63 criteria to Bicep-only)
  • SSH key authentication support (currently password-only)
  • Parameterize auto-shutdown timezone
  • Add cleanup/destroy documentation section to README
  • Make ThinLinc installation optional
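
The deployment-time criterion in the list above could be measured with a small wrapper like this one. It is a sketch under assumptions: the 15-minute budget comes from the criterion itself, while `timed_run` and the placeholder `fake_deploy` callable are hypothetical and stand in for invoking the deployment script.

```python
import time

BUDGET_SECONDS = 15 * 60  # "under 15 minutes" from the acceptance criteria


def timed_run(deploy, budget_seconds: float = BUDGET_SECONDS) -> dict:
    """Call deploy() and report elapsed wall-clock time against the budget."""
    start = time.monotonic()
    deploy()
    elapsed = time.monotonic() - start
    return {"elapsed_seconds": elapsed, "within_budget": elapsed < budget_seconds}


def fake_deploy() -> None:
    # Placeholder for the real deployment call, for example running
    # ./deploy-isaac-sim-vm.sh through subprocess.run(..., check=True).
    time.sleep(0.01)


report = timed_run(fake_deploy)
print(report)
```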

Comment thread infrastructure/setup/optional/isaac-sim-vm/bicep/main.bicep
Comment thread infrastructure/setup/optional/isaac-sim-vm/bicep/modules/subnet-nat-egress.bicep Outdated
Comment thread infrastructure/setup/optional/deploy-isaac-sim-vm.sh Outdated
Comment thread infrastructure/setup/optional/isaac-sim-vm/README.md Outdated
Comment thread infrastructure/setup/optional/isaac-sim-vm/README.md
…sions

- add main Bicep file for Isaac Linux VM deployment
- create module for Linux Isaac VM with necessary resources
- implement subnet NAT egress module for outbound internet access
- define shared types and default configurations for deployment
- include scripts for installing development dependencies and ThinLinc silently

🔧 - Generated by Copilot
…schedule

- add shutdown schedule parameter to Bicep modules
- update deployment scripts to validate admin user existence
- improve README with detailed deployment steps and cleanup instructions

🛠️ - Generated by Copilot
@fbeltrao fbeltrao force-pushed the feature/63-private-isaac-sim-linux branch from 66aa3da to dab4518 Compare March 24, 2026 17:00
- specify that deleting the resource group will remove all VMs
- add cautionary note for users regarding the command

⚠️ - Generated by Copilot
Collaborator

@katriendg katriendg left a comment

Thanks for the additional changes; this looks good to me. Having only Bicep for now is OK. We can consider adding Terraform and enhancing other areas in subsequent PRs.

@katriendg katriendg merged commit 3748c2d into microsoft:main Mar 25, 2026
27 checks passed
WilliamBerryiii pushed a commit that referenced this pull request Mar 26, 2026
🤖 I have created a release *beep* *boop*
---


## [0.5.0](v0.4.0...v0.5.0) (2026-03-26)


### ✨ Features

* add dataviewer web application for dataset analysis and annotation
([#375](#375))
([c44d7bb](c44d7bb))
* add return type annotations to cli_args functions
([#476](#476))
([35523ee](35523ee))
* add YAML config schema with pydantic validation for ROS 2 recording
([#376](#376))
([1fa5243](1fa5243))
* **agents:** Copilot agents and skills for dataviewer and OSMO training
workflows.
([#444](#444))
([8b72daf](8b72daf))
* **build:** add automated ms.date freshness checking
([#448](#448))
([f92ddbc](f92ddbc))
* **build:** add CLA section, Dependabot security prefix, and OWASP ZAP
DAST scan
([#241](#241))
([083a8af](083a8af))
* **build:** add coverage.py configuration to pyproject.toml
([#428](#428))
([eac7426](eac7426))
* **build:** add Go CI pipeline with golangci-lint and go test
([#351](#351))
([b27e4fb](b27e4fb))
* **build:** add OpenSSF Scorecard workflow and badge
([#431](#431))
([98a62e7](98a62e7))
* **build:** add release artifact signing and SBOM attestation
([#480](#480))
([b226e96](b226e96))
* **build:** add TFLint reusable GitHub Actions workflow
([#229](#229))
([34d5575](34d5575))
* **build:** split Go CI into separate lint and test pipelines
([#354](#354))
([2dec155](2dec155))
* **dataviewer:** add authentication middleware and CSRF protection for
mutation endpoints
([#432](#432))
([77c8a01](77c8a01))
* **docs:** create training documentation hub with guides and migration
([#380](#380))
([0fdccc5](0fdccc5))
* **docs:** port Docusaurus documentation site with full build
validation
([#182](#182))
([29dd640](29dd640))
* fix and deploy dataviewer
([#498](#498))
([c922d49](c922d49))
* **inference:** add AzureML and local LeRobot inference workflows
([#438](#438))
([f7d786a](f7d786a))
* **inference:** add MLflow trajectory plots and multi-source support to
OSMO inference workflow
([#421](#421))
([8637458](8637458))
* **infra:** add blob storage lifecycle policies and folder structure
([#179](#179))
([101a6e8](101a6e8))
* **infrastructure:** add optional observability and compute feature
flags
([#437](#437))
([9eba0da](9eba0da))
* **infrastructure:** add private Linux Isaac Sim VM deployment option
([#348](#348))
([3748c2d](3748c2d))
* **infrastructure:** add terraform-docs auto-generation pipeline
([#358](#358))
([6565caa](6565caa))
* **infrastructure:** harden Isaac Sim VM deployment with encryption and
spot options
([#355](#355))
([6ebc1f2](6ebc1f2))
* **repo:** migrate to domain-driven architecture
([#270](#270))
([a339e70](a339e70))
* **scripts:** add --config-preview and deployment summary to submission
scripts
([#499](#499))
([4069806](4069806))
* **scripts:** add Copilot attribution footer validation to frontmatter
linting
([#378](#378))
([4d595f2](4d595f2))
* **src:** add dataviewer web application with storage adapter layer
([#404](#404))
([8a9fb70](8a9fb70))


### 🐛 Bug Fixes

* **build:** add GHSA to cspell custom dictionary
([#315](#315))
([67db81a](67db81a))
* **build:** correct codecov report_type input for terraform test
uploads
([#324](#324))
([d90d66d](d90d66d))
* **build:** expand CODEOWNERS coverage to critical paths
([#505](#505))
([bafade1](bafade1))
* **build:** pin Docker base image and pip dependencies with Dependabot
coverage
([#497](#497))
([d3d7ea4](d3d7ea4))
* **build:** pin pydantic version and use uv in config schema validation
workflow
([#493](#493))
([28d823f](28d823f))
* **build:** pin uv installer to versioned URL
([#495](#495))
([8d8541b](8d8541b))
* **build:** remediate GHSA vulnerabilities flagged by OSSF Scorecard
([#271](#271))
([49b6e58](49b6e58))
* **build:** remove README frontmatter, add FrontmatterExcludePaths,
enforce Pester 5
([#443](#443))
([641d0f3](641d0f3))
* **build:** resolve CI failures for release 0.5.0 PR
([#174](#174))
([62c9900](62c9900))
* **build:** resolve codecov PR comment suppression
([#523](#523))
([5603bd7](5603bd7))
* **build:** use npm ci for deterministic frontend dependency install
([#491](#491))
([ee8b5d3](ee8b5d3)),
closes
[#490](#490)
* **ci:** add `wait_for_ci` to Codecov configuration
([#183](#183))
([370cf44](370cf44))
* **CI:** Issue 116 clean up dataviewer tests
([#184](#184))
([f466c23](f466c23))
* **ci:** pin pydantic to ==2.12.5 across all references
([#230](#230))
([9d841d5](9d841d5))
* **dataviewer:** add HTTP Range support for blob video streaming
([#165](#165))
([8adde50](8adde50))
* **dataviewer:** remediate CodeQL alerts and align ruff config
([#419](#419))
([eb6fac9](eb6fac9))
* **dataviewer:** remediate path traversal and input validation
vulnerabilities
([#413](#413))
([0a1d2ca](0a1d2ca))
* **docs:** remove trailingSlash: false for GitHub Pages compatibility
([#228](#228))
([a78cb97](a78cb97))
* **gpu:** add GPU Operator validation dependencies to GRID driver
installer
([#441](#441))
([eec42da](eec42da))
* **infrastructure:** add zone-redundant config to VPN gateway public IP
([#352](#352))
([2d734f4](2d734f4))
* **infrastructure:** improve stdout handling for helm commands in GPU…
([#311](#311))
([153f467](153f467))
* **infrastructure:** resolve remaining TFLint violations in SIL module
and example configs
([#298](#298))
([c0ce3e5](c0ce3e5))
* **infrastructure:** resolve TFLint violations in root and automation
modules
([#287](#287))
([b6a4604](b6a4604)),
closes
[#203](#203)
* **infrastructure:** update deprecated bgp vng variable name
([#307](#307))
([f530734](f530734))
* **scripts:** pin uv version in OSMO workflow templates
([#500](#500))
([7edf13a](7edf13a))
* **scripts:** replace lambda with def in lerobot_handler to satisfy R…
([#176](#176))
([baf9e58](baf9e58))
* **scripts:** support OSMO control-plane deploys with in-cluster Redis
([#317](#317))
([d4b70de](d4b70de))
* **scripts:** update compute target name derivation logic
([#319](#319))
([bb20431](bb20431))
* **settings:** update devcontainer name to match project context
([#177](#177))
([745321e](745321e))
* **terraform:** create PostgreSQL Key Vault secret via ARM control
plane
([#304](#304))
([5d73b81](5d73b81))
* **terraform:** gate observability with feature flags
([#303](#303))
([ea5e056](ea5e056))
* **terraform:** switch VPN gateway defaults to AZ SKUs
([#309](#309))
([74989c5](74989c5))
* **training:** correct learning rate mapping and pin LeRobot version
([#439](#439))
([5cf9943](5cf9943))
* **workflows:** enable SARIF upload for dependency-pinning scans
([#502](#502))
([124cad6](124cad6)),
closes
[#501](#501)
* **workflows:** remove redundant top-level permissions from
codeql-analysis
([#489](#489))
([1490fda](1490fda))
* **workflows:** use bash shell for uv.lock regeneration and add SARIF
to dictionary
([#225](#225))
([e6fa6ea](e6fa6ea))


### 📚 Documentation

* add chunking and compression configuration guide for Jetson edge
recording
([#408](#408))
([787a322](787a322))
* add OpenSSF Best Practices badge to README
([#282](#282))
([01ea384](01ea384))
* add threat model cross-reference to SECURITY.md
([#235](#235))
([88a461e](88a461e))
* add vulnerability remediation timeline to SECURITY.md
([#233](#233))
([5ead3ee](5ead3ee))
* **contributing:** remove version-specific planning language from
ownership tip
([#407](#407))
([3191f9b](3191f9b))
* **deploy:** replace deploy/ READMEs with pointer files
([#379](#379))
([b3c3abb](b3c3abb))
* **docs:** add bug report response timeline for OSSF report_responses
criterion
([#485](#485))
([9b26212](9b26212))
* **docs:** add component update process for OpenSSF Silver badge
([#446](#446))
([6adc8a2](6adc8a2))
* **docs:** Add data collection and training recipes
([#343](#343))
([9c34f86](9c34f86))
* **docs:** add deprecation policy for external interfaces
([#445](#445))
([229d5db](229d5db))
* **docs:** add structure for recipes in repo
([#322](#322))
([098757b](098757b))
* **docs:** add YAML frontmatter to SUPPORT.md
([#478](#478))
([d94c15d](d94c15d)),
closes
[#347](#347)
* **docs:** clarify issue assignment requirement before starting work
([#299](#299))
([1534462](1534462))
* **docs:** create inference and training docs hubs
([#402](#402))
([7a20a2e](7a20a2e))
* **docs:** create reference hub and migrate script documentation
([#503](#503))
([03a31c6](03a31c6))
* **docs:** create training and inference documentation hubs
([#403](#403))
([7be003b](7be003b))
* **operations:** create operations hub and troubleshooting guide
([#525](#525))
([31c7aaa](31c7aaa))
* **reference:** add copilot artifacts documentation hub
([#170](#170))
([9a45ca4](9a45ca4))
* simplify root README and update prerequisites
([#440](#440))
([c0c7710](c0c7710))


### ♻️ Code Refactoring

* **build:** align Python dependency workflows with uv
([#447](#447))
([3102e03](3102e03))
* **docs:** rename Docusaurus site to Physical AI Toolchain
([#224](#224))
([cfdf47a](cfdf47a))
* **infrastructure:** rename boolean variables to `should_` prefix and
add missing core variables
([#292](#292))
([4496593](4496593))
* **python:** move runtime deps to workflow pyproject manifests
([#405](#405))
([6c5fbeb](6c5fbeb))


### 📦 Build System

* **build:** add Codecov upload to pytest workflow
([#434](#434))
([0110c17](0110c17))
* **deps-dev:** bump the npm_and_yarn group across 2 directories with 1
update
([#325](#325))
([59cf9e6](59cf9e6))
* **workflows:** enable coverage parameters and fix Pester test
infrastructure
([#435](#435))
([528bbde](528bbde))


### 🔧 Miscellaneous

* add gomod to cspell general-technical wordlist
([#362](#362))
([1f93f47](1f93f47))
* **build:** add codecov.yml for unified coverage reporting
([#430](#430))
([b0faf70](b0faf70))
* **build:** add Go toolchain devcontainer feature and Dependabot gomod
([#337](#337))
([8a36620](8a36620))
* **deps:** bump cryptography from 45.0.7 to 46.0.5 in /src/training
([#506](#506))
([a06434e](a06434e))
* **deps:** bump minimatch in /src/dataviewer/frontend
([#416](#416))
([38a7607](38a7607))
* **deps:** bump pyasn1 from 0.6.2 to 0.6.3 in /training/rl
([#296](#296))
([7b42cf5](7b42cf5))
* **deps:** bump rollup in /src/dataviewer/frontend
([#417](#417))
([6302ce4](6302ce4))
* **deps:** bump the common-dependencies group in /src/common with 3
updates
([#507](#507))
([db05074](db05074))
* **deps:** bump the github-actions group across 1 directory with 6
updates
([#284](#284))
([c40eff6](c40eff6))
* **deps:** bump the github-actions group across 1 directory with 6
updates
([#433](#433))
([2d9dd4f](2d9dd4f))
* **deps:** bump the github-actions group across 1 directory with 6
updates
([#510](#510))
([c334a64](c334a64))
* **deps:** bump the github-actions group with 2 updates
([#163](#163))
([f25713e](f25713e))
* **deps:** bump the inference-dependencies group in /evaluation with 3
updates
([#279](#279))
([1d2d3dc](1d2d3dc))
* **deps:** bump the inference-dependencies group in /src/inference with
5 updates
([#508](#508))
([2852ffb](2852ffb))
* **deps:** bump the lerobot-inference-dependencies group in
/workflows/azureml with 4 updates
([#511](#511))
([b7c5773](b7c5773))
* **deps:** bump the npm_and_yarn group across 2 directories with 1
update
([#223](#223))
([6a261ab](6a261ab))
* **deps:** bump the training-dependencies group
([#429](#429))
([66e43f4](66e43f4))
* **deps:** bump tornado from 6.5.4 to 6.5.5 in the uv group across 1
directory
([#172](#172))
([d6caf29](d6caf29))
* **docs:** correct ms.date tooling and refresh stale documentation
([#349](#349))
([ccaa1e8](ccaa1e8))
* **infrastructure:** add Go module and golangci-lint config for e2e
tests
([#347](#347))
([e0e6bbf](e0e6bbf))
* **infrastructure:** add root .terraform-docs.yml configuration
([#312](#312))
([bb73bbb](bb73bbb))
* migrate references from Azure-Samples to
microsoft/physical-ai-toolchain
([f58f0ef](f58f0ef))
* **workflows:** update Dependabot, CodeQL, CODEOWNERS, and cspell for
dataviewer coverage
([#231](#231))
([6d8c2e8](6d8c2e8))


### 🔒 Security

* **deps:** bump mlflow from 3.5.0 to 3.8.0rc0 in /training/rl
([#297](#297))
([e9929df](e9929df))
* **deps:** bump the github-actions group across 1 directory with 4
updates
([#344](#344))
([6826929](6826929))
* **deps:** bump the inference-dependencies group in /evaluation with 2
updates
([#339](#339))
([6804630](6804630))
* **deps:** bump the npm_and_yarn group across 3 directories with 1
update
([#361](#361))
([6760857](6760857))
* **deps:** bump the training-dependencies group across 1 directory with
54 updates
([#286](#286))
([d9ae04f](d9ae04f))
* **deps:** bump the uv group across 3 directories with 1 update
([#360](#360))
([dfbda06](dfbda06))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: physical-ai-toolchain-release[bot] <267194360+physical-ai-toolchain-release[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Bill Berry <wbery@microsoft.com>
@fbeltrao fbeltrao deleted the feature/63-private-isaac-sim-linux branch April 10, 2026 16:07

3 participants