
feat(infrastructure): add optional ADLS Gen2 data lake storage account#398

Merged
WilliamBerryiii merged 7 commits into
microsoft:mainfrom
jjottar:feat/adls-gen2-storage
Apr 8, 2026

Conversation

Contributor

@jjottar jjottar commented Apr 7, 2026

Pull Request

Description

Add an optional dedicated ADLS Gen2 storage account with hierarchical namespace (HNS) for domain data (datasets, model checkpoints, evaluation reports), separate from the existing AzureML workspace storage. The data lake is gated behind should_create_data_lake_storage (default: false) and follows existing patterns for naming, networking, RBAC, and lifecycle policies.

Closes #385

Type of Change

  • 🐛 Bug fix (non-breaking change fixing an issue)
  • ✨ New feature (non-breaking change adding functionality)
  • 💥 Breaking change (fix or feature causing existing functionality to change)
  • 📚 Documentation update
  • 🏗️ Infrastructure change (Terraform/IaC)
  • ♻️ Refactoring (no functional changes)

Component(s) Affected

  • infrastructure/terraform/prerequisites/ - Azure subscription setup
  • infrastructure/terraform/ - Terraform infrastructure
  • infrastructure/setup/ - OSMO control plane / Helm
  • workflows/ - Training and evaluation workflows
  • training/ - Training pipelines and scripts
  • docs/ - Documentation

Testing Performed

  • Terraform plan reviewed (no unexpected changes)
  • Terraform apply tested in dev environment
  • Training scripts tested locally with Isaac Sim
  • OSMO workflow submitted successfully
  • Smoke tests passed (smoke_test_azure.py)

Terraform Plan (with should_create_data_lake_storage = true)

Plan: 12 to add, 2 to change, 1 to destroy.

| Action | Resource |
| --- | --- |
| create | module.platform.azurerm_storage_account.data_lake[0] |
| create | module.platform.azurerm_storage_container.datasets[0] |
| create | module.platform.azurerm_storage_container.models[0] |
| create | module.platform.azurerm_storage_container.evaluation[0] |
| create | module.platform.azurerm_storage_management_policy.data_lake[0] |
| create | module.platform.azurerm_private_dns_zone.core["storage_dfs"] |
| create | module.platform.azurerm_private_dns_zone_virtual_network_link.core["storage_dfs"] |
| create | module.platform.azurerm_private_endpoint.data_lake_blob[0] |
| create | module.platform.azurerm_private_endpoint.data_lake_dfs[0] |
| create | module.platform.azurerm_role_assignment.user_data_lake_blob[0] |
| create | module.platform.azurerm_role_assignment.ml_data_lake_blob[0] |
| create | module.platform.azurerm_role_assignment.osmo_data_lake_blob[0] |
| update | module.platform.azurerm_key_vault.main (in-place, pre-existing drift) |
| update | module.platform.azurerm_storage_account.main (in-place, pre-existing drift) |
| destroy | module.platform.azurerm_storage_management_policy.main |

The 2 in-place updates and the destroy are expected:

  • Updates: Pre-existing drift on Key Vault and ML storage account — not caused by this feature.
  • Destroy: ML storage lifecycle policy transitions to conditional (count = 0 when data lake is enabled). Existing deployments without the data lake retain their lifecycle rules.
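The destroy in the plan follows from gating the existing policy on the inverse of the flag. A minimal sketch of that pattern is shown here; the `count` expression is the one quoted in this PR, while the rule name, prefix, and day count are placeholders chosen for illustration and are not taken from the repository.

```hcl
# Sketch only. The rule body below is a placeholder, not the repository's policy.
variable "should_create_data_lake_storage" {
  type    = bool
  default = false
}

resource "azurerm_storage_management_policy" "main" {
  # One instance while the data lake is disabled, zero once it is enabled.
  # Flipping the flag to true is what yields the single "destroy" in the plan.
  count              = var.should_create_data_lake_storage ? 0 : 1
  storage_account_id = azurerm_storage_account.main.id # existing ML storage account

  rule {
    name    = "examplePlaceholderRule" # hypothetical name
    enabled = true

    filters {
      prefix_match = ["ml-workspace/example/"] # hypothetical prefix
      blob_types   = ["blockBlob"]
    }

    actions {
      base_blob {
        tier_to_cool_after_days_since_modification_greater_than = 30 # placeholder value
      }
    }
  }
}
```

With the default of `false`, the count stays at 1, which matches the statement that deployments without the data lake keep their lifecycle rules.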

Terraform Apply

All 12 data lake resources created successfully in rg-roboticsch-dev-001 (switzerlandnorth):

| Resource | Name / ID |
| --- | --- |
| Storage Account | stdlroboticschdev001 (HNS enabled) |
| Container: datasets | datasets (private) |
| Container: models | models (private) |
| Container: evaluation | evaluation (private) |
| Lifecycle Policy | 3 rules (raw bags delete, datasets cool, evaluation reports cool→archive) |
| DNS Zone | privatelink.dfs.core.windows.net |
| PE: blob | pe-datalake-blob-roboticsch-dev-001 |
| PE: dfs | pe-datalake-dfs-roboticsch-dev-001 |
| RBAC: current user | Storage Blob Data Contributor on stdl* |
| RBAC: ML identity | Storage Blob Data Contributor on stdl* |
| RBAC: OSMO identity | Storage Blob Data Contributor on stdl* |
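For orientation, the DFS private endpoint and its DNS zone wiring might look roughly like the sketch below. The resource addresses (`data_lake_dfs`, `data_lake`, `core["storage_dfs"]`) come from the plan output in this PR; the input variables, the endpoint and connection names, and the zone group name are assumptions made for illustration.

```hcl
# Sketch only. Variables and names marked "hypothetical" are not from the repository.
resource "azurerm_private_endpoint" "data_lake_dfs" {
  count               = var.should_create_data_lake_storage ? 1 : 0
  name                = "pe-datalake-dfs-example"      # hypothetical name
  location            = var.location                   # hypothetical input
  resource_group_name = var.resource_group_name        # hypothetical input
  subnet_id           = var.private_endpoint_subnet_id # hypothetical input

  private_service_connection {
    name                           = "psc-datalake-dfs-example" # hypothetical name
    private_connection_resource_id = azurerm_storage_account.data_lake[0].id
    subresource_names              = ["dfs"] # the blob endpoint would use ["blob"]
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name                 = "default" # hypothetical name
    private_dns_zone_ids = [azurerm_private_dns_zone.core["storage_dfs"].id]
  }
}
```

An HNS-enabled account is reachable through both the blob and dfs endpoints, which would explain why the plan creates one private endpoint per subresource.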

Terraform Test

Total: 169 passed, 0 failed, 0 errors

Lint & Validation

| Check | Result |
| --- | --- |
| npm run lint:tf | 0 issues |
| npm run lint:tf:validate | All directories passed |
| npm run spell-check | 0 issues |
| npm run lint:md | 0 errors |

What Changed

Platform Module (infrastructure/terraform/modules/platform/)

  • storage.tf — New azurerm_storage_account.data_lake with is_hns_enabled = true, datasets, models, and evaluation containers, data lake lifecycle policy, blob and DFS private endpoints. ML storage lifecycle policy gated with count = var.should_create_data_lake_storage ? 0 : 1 to avoid regression for existing deployments, and legacy fallback lifecycle prefixes corrected to target ml-workspace/... paths when the data lake is disabled.
  • main.tf — Added storage_dfs = "privatelink.dfs.core.windows.net" to base_dns_zones (7 base zones, up from 6).
  • variables.tf — New should_create_data_lake_storage variable (bool, default false).
  • role-assignments.tf — Added Storage Blob Data Contributor on data lake for current user, ML identity, and OSMO identity. All gated on the data lake flag.
  • outputs.tf — New data_lake_storage_account and data_lake_storage_account_access outputs (null when disabled).
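The platform module changes listed above can be pictured with the following sketch. The flag name, the `data_lake` resource name, `is_hns_enabled = true`, the container name, and the null-when-disabled output are from this PR; the remaining inputs, the tier and replication settings, and the shape of the output object are assumptions.

```hcl
# Sketch only. Values marked "hypothetical" or "assumed" are not from the repository.
variable "should_create_data_lake_storage" {
  type        = bool
  default     = false
  description = "Whether to create the dedicated ADLS Gen2 data lake storage account."
}

resource "azurerm_storage_account" "data_lake" {
  count                    = var.should_create_data_lake_storage ? 1 : 0
  name                     = var.data_lake_storage_account_name # hypothetical input
  resource_group_name      = var.resource_group_name            # hypothetical input
  location                 = var.location                       # hypothetical input
  account_tier             = "Standard"                         # assumed
  account_replication_type = "LRS"                              # assumed
  account_kind             = "StorageV2"
  is_hns_enabled           = true # hierarchical namespace, which makes this ADLS Gen2
}

resource "azurerm_storage_container" "datasets" {
  count = var.should_create_data_lake_storage ? 1 : 0
  name  = "datasets"
  # Assumes a recent azurerm 4.x provider; older versions take storage_account_name.
  storage_account_id    = azurerm_storage_account.data_lake[0].id
  container_access_type = "private"
}

output "data_lake_storage_account" {
  # Null when the flag is off, so callers can test for presence.
  value = var.should_create_data_lake_storage ? {
    id   = azurerm_storage_account.data_lake[0].id
    name = azurerm_storage_account.data_lake[0].name
  } : null
}
```

The `models` and `evaluation` containers would follow the same shape as `datasets`.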

Dataviewer Module (infrastructure/terraform/modules/dataviewer/)

  • variables.deps.tf — New optional data_lake_storage_account input (nullable) on the reusable dataviewer Terraform module.
  • role-assignments.tf — Conditional Storage Blob Data Contributor on data lake for dataviewer identity when a caller passes the optional data lake dependency.
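A nullable dependency with a conditional role assignment, as described for the dataviewer module, could be sketched as follows. The variable name and the role come from this PR; the object attributes, the resource name, and the principal ID input are assumptions.

```hcl
# Sketch only. The object shape and the principal input are hypothetical.
variable "data_lake_storage_account" {
  type = object({
    id   = string
    name = string
  })
  default  = null
  nullable = true
}

resource "azurerm_role_assignment" "dataviewer_data_lake_blob" { # hypothetical name
  # Created only when a caller passes the optional data lake dependency.
  count                = var.data_lake_storage_account != null ? 1 : 0
  scope                = var.data_lake_storage_account.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = var.dataviewer_principal_id # hypothetical input
}
```

Testing the object itself against `null`, rather than one of its attributes, keeps `count` known at plan time even when the account ID is only known after apply.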

Root Module (infrastructure/terraform/)

  • variables.tf — New should_create_data_lake_storage root variable.
  • main.tf — Pass should_create_data_lake_storage to platform module.
  • outputs.tf — New data_lake_storage_account root output.
  • terraform.tfvars.example — Added should_create_data_lake_storage with documentation.
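The root module wiring amounts to declaring the flag, forwarding it, and re-exporting the output. A rough sketch, assuming the relative module path implied by the directory names above and omitting every other module argument:

```hcl
# Sketch only. Other platform module arguments are omitted for brevity.

# variables.tf
variable "should_create_data_lake_storage" {
  type        = bool
  default     = false
  description = "Whether to create the dedicated ADLS Gen2 data lake storage account."
}

# main.tf
module "platform" {
  source = "./modules/platform"

  should_create_data_lake_storage = var.should_create_data_lake_storage
}

# outputs.tf
output "data_lake_storage_account" {
  value = module.platform.data_lake_storage_account
}
```

Opting in would then be a single line in a `terraform.tfvars` file: `should_create_data_lake_storage = true`.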

Tests (infrastructure/terraform/modules/platform/tests/)

  • dns-zones.tftest.hcl — Updated zone counts (6→7 base zones).
  • security.tftest.hcl — Added data_lake_security and data_lake_disabled_by_default test runs.
  • conditionals.tftest.hcl — Added data_lake_enabled and data_lake_disabled test runs.
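The shape of such test runs, using the native `terraform test` language, might resemble the sketch below. The run names are the ones listed for conditionals.tftest.hcl; the assertions and error messages are illustrative guesses, and a real test file would also need to supply the module's other required variables.

```hcl
# Sketch only. Assertions are illustrative, not copied from the repository's tests.
run "data_lake_disabled" {
  command = plan

  variables {
    should_create_data_lake_storage = false
  }

  assert {
    condition     = length(azurerm_storage_account.data_lake) == 0
    error_message = "The data lake storage account should not be planned when the flag is false."
  }
}

run "data_lake_enabled" {
  command = plan

  variables {
    should_create_data_lake_storage = true
  }

  assert {
    condition     = azurerm_storage_account.data_lake[0].is_hns_enabled == true
    error_message = "The data lake storage account should have hierarchical namespace enabled."
  }
}
```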

Documentation

  • docs/cloud/blob-storage-structure.md — Rewritten for two-account architecture: ML workspace storage vs data lake storage, new container/folder structure, updated lifecycle policy references.
  • .cspell/general-technical.txt — Added stdl (data lake naming prefix).

Documentation Impact

  • Documentation updated in this PR

Checklist


codecov-commenter commented Apr 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.40%. Comparing base (f0735d8) to head (f577331).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #398   +/-   ##
=======================================
  Coverage   64.40%   64.40%           
=======================================
  Files         251      251           
  Lines       15433    15433           
  Branches     2060     2060           
=======================================
  Hits         9939     9939           
  Misses       5206     5206           
  Partials      288      288           
| Flag | Coverage Δ |
| --- | --- |
| pester | 82.20% <ø> (ø) |
| pytest | 92.40% <ø> (ø) |
| pytest-dataviewer | 63.87% <ø> (ø) |
| pytest-fuzz | 1.59% <ø> (ø) |
| vitest | 50.80% <ø> (ø) |

@jjottar jjottar marked this pull request as ready for review April 7, 2026 08:59
@jjottar jjottar requested a review from a team as a code owner April 7, 2026 08:59
Collaborator

@katriendg katriendg left a comment

Thank you @jjottar for this contribution!

I've left one comment in the review, and two small requests:

  1. We have added Terraform docs generation (still to be documented though, so not something you knew): could you run npm run docs:generate:tf locally to update TERRAFORM.md file(s) before we merge?
  2. We typically document variables in the file infrastructure/terraform/terraform.tfvars.example, could you add this new one there as well?

Comment thread infrastructure/terraform/modules/platform/storage.tf
Collaborator

@katriendg katriendg left a comment

Thank you @jjottar.
Docs generation all looking good.

There is just one thing with the existing policy that now applies to the ADLS HNS account, which has containers for the blobs, I believe this may need a final update before we can merge? Left an inline comment on this.

@jjottar
Contributor Author

jjottar commented Apr 8, 2026

> Thank you @jjottar. Docs generation all looking good.
>
> There is just one thing with the existing policy that now applies to the ADLS HNS account, which has containers for the blobs, I believe this may need a final update before we can merge? Left an inline comment on this.

Hello @katriendg, thanks for following up, somehow the new inline comment doesn't show up for me. Could it be it's not yet posted?

Collaborator

@katriendg katriendg left a comment

So sorry @jjottar! Not sure how that happened. It seems I had two windows open...
Found the right window, this was the comment (should be there now).

I also notice in the meantime we merged another PR which generates the docs, for merging the conflict you can simply rebase and re-generate the docs again. Your version will then contain the incoming as well as your updates.

Comment thread infrastructure/terraform/modules/platform/storage.tf
Juan Jottar added 5 commits April 8, 2026 15:10
- add data lake storage account with HNS behind should_create_data_lake_storage flag
- add datasets and models containers with lifecycle policies
- add storage_dfs private DNS zone and data lake private endpoints
- add Storage Blob Data Contributor role assignments for ML, OSMO, user, and dataviewer identities
- update blob storage architecture docs for two-account layout

🗄️ - Generated by Copilot
- add conditional azurerm_storage_management_policy.main (active when data lake off)
- add should_create_data_lake_storage to terraform.tfvars.example
…tainer name and regenerate docs

- update data lake lifecycle prefix_match to include container name (datasets/raw/, datasets/converted/, datasets/reports/)
- regenerate TERRAFORM.md after rebase on upstream main
- add evaluation container for reports and evaluation outputs
- update lifecycle prefix from datasets/reports/ to evaluation/reports/
- update docs and tests for new container structure
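The prefix changes in these commits matter because, at the storage account level, a lifecycle `prefix_match` starts with the container name. A sketch of a three-rule policy using the prefixes named in the commit messages is given below; the rule names and every day threshold are placeholders, since the actual values are not stated in this PR.

```hcl
# Sketch only. Rule names and day counts are placeholders, not the repository's values.
resource "azurerm_storage_management_policy" "data_lake" {
  count              = var.should_create_data_lake_storage ? 1 : 0
  storage_account_id = azurerm_storage_account.data_lake[0].id

  rule {
    name    = "deleteRawBags" # hypothetical name
    enabled = true
    filters {
      prefix_match = ["datasets/raw/"] # container name, then folder
      blob_types   = ["blockBlob"]
    }
    actions {
      base_blob {
        delete_after_days_since_modification_greater_than = 30 # placeholder
      }
    }
  }

  rule {
    name    = "coolConvertedDatasets" # hypothetical name
    enabled = true
    filters {
      prefix_match = ["datasets/converted/"]
      blob_types   = ["blockBlob"]
    }
    actions {
      base_blob {
        tier_to_cool_after_days_since_modification_greater_than = 30 # placeholder
      }
    }
  }

  rule {
    name    = "archiveEvaluationReports" # hypothetical name
    enabled = true
    filters {
      prefix_match = ["evaluation/reports/"]
      blob_types   = ["blockBlob"]
    }
    actions {
      base_blob {
        tier_to_cool_after_days_since_modification_greater_than    = 30  # placeholder
        tier_to_archive_after_days_since_modification_greater_than = 180 # placeholder
      }
    }
  }
}
```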
@jjottar jjottar force-pushed the feat/adls-gen2-storage branch from 44ad760 to ebf534a Compare April 8, 2026 16:22
Collaborator

@katriendg katriendg left a comment

Looks good, thank you for the work on this.

@katriendg katriendg requested a review from a team April 8, 2026 19:32
@WilliamBerryiii WilliamBerryiii merged commit 3bb9012 into microsoft:main Apr 8, 2026
31 checks passed
@jjottar jjottar deleted the feat/adls-gen2-storage branch April 9, 2026 06:57
WilliamBerryiii pushed a commit that referenced this pull request Apr 9, 2026
🤖 I have created a release *beep* *boop*
---


## [0.7.0](v0.6.1...v0.7.0) (2026-04-09)


### ✨ Features

* **build:** add hve-core release pipeline with dependency SBOM and
signing artifacts
([#420](#420))
([2ff839a](2ff839a))
* **build:** enforce strict warnings across all linters
([#392](#392))
([b75e217](b75e217))
* **evaluation:** add fuzz testing infrastructure and property-based
tests
([#416](#416))
([d97d42c](d97d42c))
* **infrastructure:** add optional ADLS Gen2 data lake storage account
([#398](#398))
([3bb9012](3bb9012))
* **settings:** add HVE Core extension to workspace and devcontainer
recommendations
([#226](#226))
([f0735d8](f0735d8))


### 🐛 Bug Fixes

* **docs:** fix broken links, harden Docusaurus config, and integrate CI
workflow
([#430](#430))
([ea99997](ea99997))
* **scripts:** join shellcheck version output before -match to populate
$Matches
([#432](#432))
([8768e76](8768e76))
* **scripts:** map unmapped ShellCheck severity levels and harden
version parsing
([#434](#434))
([1e95a17](1e95a17))
* **scripts:** resolve ShellCheck SC2034 and enable source-path
resolution
([#443](#443))
([04438ea](04438ea))


### 🔧 Miscellaneous

* **deps-dev:** bump basic-ftp from 5.2.0 to 5.2.1
([#429](#429))
([438660a](438660a))
* **deps:** bump cryptography from 46.0.6 to 46.0.7
([#425](#425))
([2366647](2366647))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: physical-ai-toolchain-release[bot] <267194360+physical-ai-toolchain-release[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

feat(infra): Add dedicated ADLS Gen2 storage account

4 participants