feat(infra): add blob storage lifecycle policies and folder structure #179
- add lifecycle management variables to platform module for raw bags, converted datasets, and reports
- add azurerm_storage_management_policy resource with deletion and tiering rules
- create blob path validation module with naming convention enforcement
- document blob storage structure with retention policies and examples

🗂️ - Generated by Copilot
…-reference-architecture into infra/238-storage-account-folder-structure-and-lifecycle-policies
…o infra/45-storage-account-folder-structure-and-lifecycle-policies
Dependency Review
✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| | main | #179 | +/- |
|---|---|---|---|
| Coverage | 38.37% | 38.50% | +0.12% |
| Files | 73 | 74 | +1 |
| Lines | 8579 | 8597 | +18 |
| Branches | 497 | 501 | +4 |
| Hits | 3292 | 3310 | +18 |
| Misses | 5277 | 5277 | |
| Partials | 10 | 10 | |

This pull request uses carry forward flags.
…d-lifecycle-policies
- update lerobot/common/datasets URL to src/lerobot/common/datasets
- collapse multi-line function calls in blob_path_validator.py
- collapse multi-line assertions in test_blob_path_validator.py

🔗 - Generated by Copilot
…d-lifecycle-policies
Remove blob storage folder reference added by this branch to keep root README unchanged in PR #179.
…d-lifecycle-policies
Hey @rezatnoMsirhC, great work on this PR! The blob storage lifecycle policies and folder structure look solid, and the validation testing you did was thorough. I went ahead and resolved the merge conflict by restoring the root README. Merging shortly. Thanks for the contribution! 🚀
…o infra/45-storage-account-folder-structure-and-lifecycle-policies
…o infra/45-storage-account-folder-structure-and-lifecycle-policies
🧹 - Generated by Copilot
…o infra/45-storage-account-folder-structure-and-lifecycle-policies
katriendg
left a comment
With the recent changes, your current version may have some small inconsistencies.
One point worth adding: make sure the new test is discoverable. pyproject.toml has testpaths = ["tests", "training/tests"] but is missing the new data-pipeline/capture/tests/ folder.
…o infra/45-storage-account-folder-structure-and-lifecycle-policies
- move blob_path_validator.py from training/rl to data-management/tools
- move test file to data-management/tools/tests with sys.path injection
- fix docs container name isaaclab-training-logs → ml-workspace
- remove duplicate and env-specific entries from .cspell/azure-services.txt
- add data-management/tools/tests to root pytest testpaths

♻️ - Generated by Copilot
…o infra/45-storage-account-folder-structure-and-lifecycle-policies
- delete misplaced test_blob_path_validator.py sibling copy
- add data-management/tools to pythonpath in pyproject.toml

♻️ - Generated by Copilot
…ders 📝 - Generated by Copilot
…d-lifecycle-policies
…d-lifecycle-policies
…d-lifecycle-policies
katriendg
left a comment
Looks good, I believe the re-org of the folders is also captured. Thanks!
🤖 I have created a release *beep* *boop*

---

## [0.5.0](v0.4.0...v0.5.0) (2026-03-26)

### ✨ Features

* add dataviewer web application for dataset analysis and annotation ([#375](#375)) ([c44d7bb](c44d7bb))
* add return type annotations to cli_args functions ([#476](#476)) ([35523ee](35523ee))
* add YAML config schema with pydantic validation for ROS 2 recording ([#376](#376)) ([1fa5243](1fa5243))
* **agents:** Copilot agents and skills for dataviewer and OSMO training workflows. ([#444](#444)) ([8b72daf](8b72daf))
* **build:** add automated ms.date freshness checking ([#448](#448)) ([f92ddbc](f92ddbc))
* **build:** add CLA section, Dependabot security prefix, and OWASP ZAP DAST scan ([#241](#241)) ([083a8af](083a8af))
* **build:** add coverage.py configuration to pyproject.toml ([#428](#428)) ([eac7426](eac7426))
* **build:** add Go CI pipeline with golangci-lint and go test ([#351](#351)) ([b27e4fb](b27e4fb))
* **build:** add OpenSSF Scorecard workflow and badge ([#431](#431)) ([98a62e7](98a62e7))
* **build:** add release artifact signing and SBOM attestation ([#480](#480)) ([b226e96](b226e96))
* **build:** add TFLint reusable GitHub Actions workflow ([#229](#229)) ([34d5575](34d5575))
* **build:** split Go CI into separate lint and test pipelines ([#354](#354)) ([2dec155](2dec155))
* **dataviewer:** add authentication middleware and CSRF protection for mutation endpoints ([#432](#432)) ([77c8a01](77c8a01))
* **docs:** create training documentation hub with guides and migration ([#380](#380)) ([0fdccc5](0fdccc5))
* **docs:** port Docusaurus documentation site with full build validation ([#182](#182)) ([29dd640](29dd640))
* fix and deploy dataviewer ([#498](#498)) ([c922d49](c922d49))
* **inference:** add AzureML and local LeRobot inference workflows ([#438](#438)) ([f7d786a](f7d786a))
* **inference:** add MLflow trajectory plots and multi-source support to OSMO inference workflow ([#421](#421)) ([8637458](8637458))
* **infra:** add blob storage lifecycle policies and folder structure ([#179](#179)) ([101a6e8](101a6e8))
* **infrastructure:** add optional observability and compute feature flags ([#437](#437)) ([9eba0da](9eba0da))
* **infrastructure:** add private Linux Isaac Sim VM deployment option ([#348](#348)) ([3748c2d](3748c2d))
* **infrastructure:** add terraform-docs auto-generation pipeline ([#358](#358)) ([6565caa](6565caa))
* **infrastructure:** harden Isaac Sim VM deployment with encryption and spot options ([#355](#355)) ([6ebc1f2](6ebc1f2))
* **repo:** migrate to domain-driven architecture ([#270](#270)) ([a339e70](a339e70))
* **scripts:** add --config-preview and deployment summary to submission scripts ([#499](#499)) ([4069806](4069806))
* **scripts:** add Copilot attribution footer validation to frontmatter linting ([#378](#378)) ([4d595f2](4d595f2))
* **src:** add dataviewer web application with storage adapter layer ([#404](#404)) ([8a9fb70](8a9fb70))

### 🐛 Bug Fixes

* **build:** add GHSA to cspell custom dictionary ([#315](#315)) ([67db81a](67db81a))
* **build:** correct codecov report_type input for terraform test uploads ([#324](#324)) ([d90d66d](d90d66d))
* **build:** expand CODEOWNERS coverage to critical paths ([#505](#505)) ([bafade1](bafade1))
* **build:** pin Docker base image and pip dependencies with Dependabot coverage ([#497](#497)) ([d3d7ea4](d3d7ea4))
* **build:** pin pydantic version and use uv in config schema validation workflow ([#493](#493)) ([28d823f](28d823f))
* **build:** pin uv installer to versioned URL ([#495](#495)) ([8d8541b](8d8541b))
* **build:** remediate GHSA vulnerabilities flagged by OSSF Scorecard ([#271](#271)) ([49b6e58](49b6e58))
* **build:** remove README frontmatter, add FrontmatterExcludePaths, enforce Pester 5 ([#443](#443)) ([641d0f3](641d0f3))
* **build:** resolve CI failures for release 0.5.0 PR ([#174](#174)) ([62c9900](62c9900))
* **build:** resolve codecov PR comment suppression ([#523](#523)) ([5603bd7](5603bd7))
* **build:** use npm ci for deterministic frontend dependency install ([#491](#491)) ([ee8b5d3](ee8b5d3)), closes [#490](#490)
* **ci:** add `wait_for_ci` to Codecov configuration ([#183](#183)) ([370cf44](370cf44))
* **CI:** Issue 116 clean up dataviewer tests ([#184](#184)) ([f466c23](f466c23))
* **ci:** pin pydantic to ==2.12.5 across all references ([#230](#230)) ([9d841d5](9d841d5))
* **dataviewer:** add HTTP Range support for blob video streaming ([#165](#165)) ([8adde50](8adde50))
* **dataviewer:** remediate CodeQL alerts and align ruff config ([#419](#419)) ([eb6fac9](eb6fac9))
* **dataviewer:** remediate path traversal and input validation vulnerabilities ([#413](#413)) ([0a1d2ca](0a1d2ca))
* **docs:** remove trailingSlash: false for GitHub Pages compatibility ([#228](#228)) ([a78cb97](a78cb97))
* **gpu:** add GPU Operator validation dependencies to GRID driver installer ([#441](#441)) ([eec42da](eec42da))
* **infrastructure:** add zone-redundant config to VPN gateway public IP ([#352](#352)) ([2d734f4](2d734f4))
* **infrastructure:** improve stdout handling for helm commands in GPU… ([#311](#311)) ([153f467](153f467))
* **infrastructure:** resolve remaining TFLint violations in SIL module and example configs ([#298](#298)) ([c0ce3e5](c0ce3e5))
* **infrastructure:** resolve TFLint violations in root and automation modules ([#287](#287)) ([b6a4604](b6a4604)), closes [#203](#203)
* **infrastructure:** update deprecated bgp vng variable name ([#307](#307)) ([f530734](f530734))
* **scripts:** pin uv version in OSMO workflow templates ([#500](#500)) ([7edf13a](7edf13a))
* **scripts:** replace lambda with def in lerobot_handler to satisfy R… ([#176](#176)) ([baf9e58](baf9e58))
* **scripts:** support OSMO control-plane deploys with in-cluster Redis ([#317](#317)) ([d4b70de](d4b70de))
* **scripts:** update compute target name derivation logic ([#319](#319)) ([bb20431](bb20431))
* **settings:** update devcontainer name to match project context ([#177](#177)) ([745321e](745321e))
* **terraform:** create PostgreSQL Key Vault secret via ARM control plane ([#304](#304)) ([5d73b81](5d73b81))
* **terraform:** gate observability with feature flags ([#303](#303)) ([ea5e056](ea5e056))
* **terraform:** switch VPN gateway defaults to AZ SKUs ([#309](#309)) ([74989c5](74989c5))
* **training:** correct learning rate mapping and pin LeRobot version ([#439](#439)) ([5cf9943](5cf9943))
* **workflows:** enable SARIF upload for dependency-pinning scans ([#502](#502)) ([124cad6](124cad6)), closes [#501](#501)
* **workflows:** remove redundant top-level permissions from codeql-analysis ([#489](#489)) ([1490fda](1490fda))
* **workflows:** use bash shell for uv.lock regeneration and add SARIF to dictionary ([#225](#225)) ([e6fa6ea](e6fa6ea))

### 📚 Documentation

* add chunking and compression configuration guide for Jetson edge recording ([#408](#408)) ([787a322](787a322))
* add OpenSSF Best Practices badge to README ([#282](#282)) ([01ea384](01ea384))
* add threat model cross-reference to SECURITY.md ([#235](#235)) ([88a461e](88a461e))
* add vulnerability remediation timeline to SECURITY.md ([#233](#233)) ([5ead3ee](5ead3ee))
* **contributing:** remove version-specific planning language from ownership tip ([#407](#407)) ([3191f9b](3191f9b))
* **deploy:** replace deploy/ READMEs with pointer files ([#379](#379)) ([b3c3abb](b3c3abb))
* **docs:** add bug report response timeline for OSSF report_responses criterion ([#485](#485)) ([9b26212](9b26212))
* **docs:** add component update process for OpenSSF Silver badge ([#446](#446)) ([6adc8a2](6adc8a2))
* **docs:** Add data collection and training recipes ([#343](#343)) ([9c34f86](9c34f86))
* **docs:** add deprecation policy for external interfaces ([#445](#445)) ([229d5db](229d5db))
* **docs:** add structure for recipes in repo ([#322](#322)) ([098757b](098757b))
* **docs:** add YAML frontmatter to SUPPORT.md ([#478](#478)) ([d94c15d](d94c15d)), closes [#347](#347)
* **docs:** clarify issue assignment requirement before starting work ([#299](#299)) ([1534462](1534462))
* **docs:** create inference and training docs hubs ([#402](#402)) ([7a20a2e](7a20a2e))
* **docs:** create reference hub and migrate script documentation ([#503](#503)) ([03a31c6](03a31c6))
* **docs:** create training and inference documentation hubs ([#403](#403)) ([7be003b](7be003b))
* **operations:** create operations hub and troubleshooting guide ([#525](#525)) ([31c7aaa](31c7aaa))
* **reference:** add copilot artifacts documentation hub ([#170](#170)) ([9a45ca4](9a45ca4))
* simplify root README and update prerequisites ([#440](#440)) ([c0c7710](c0c7710))

### ♻️ Code Refactoring

* **build:** align Python dependency workflows with uv ([#447](#447)) ([3102e03](3102e03))
* **docs:** rename Docusaurus site to Physical AI Toolchain ([#224](#224)) ([cfdf47a](cfdf47a))
* **infrastructure:** rename boolean variables to `should_` prefix and add missing core variables ([#292](#292)) ([4496593](4496593))
* **python:** move runtime deps to workflow pyproject manifests ([#405](#405)) ([6c5fbeb](6c5fbeb))

### 📦 Build System

* **build:** add Codecov upload to pytest workflow ([#434](#434)) ([0110c17](0110c17))
* **deps-dev:** bump the npm_and_yarn group across 2 directories with 1 update ([#325](#325)) ([59cf9e6](59cf9e6))
* **workflows:** enable coverage parameters and fix Pester test infrastructure ([#435](#435)) ([528bbde](528bbde))

### 🔧 Miscellaneous

* add gomod to cspell general-technical wordlist ([#362](#362)) ([1f93f47](1f93f47))
* **build:** add codecov.yml for unified coverage reporting ([#430](#430)) ([b0faf70](b0faf70))
* **build:** add Go toolchain devcontainer feature and Dependabot gomod ([#337](#337)) ([8a36620](8a36620))
* **deps:** bump cryptography from 45.0.7 to 46.0.5 in /src/training ([#506](#506)) ([a06434e](a06434e))
* **deps:** bump minimatch in /src/dataviewer/frontend ([#416](#416)) ([38a7607](38a7607))
* **deps:** bump pyasn1 from 0.6.2 to 0.6.3 in /training/rl ([#296](#296)) ([7b42cf5](7b42cf5))
* **deps:** bump rollup in /src/dataviewer/frontend ([#417](#417)) ([6302ce4](6302ce4))
* **deps:** bump the common-dependencies group in /src/common with 3 updates ([#507](#507)) ([db05074](db05074))
* **deps:** bump the github-actions group across 1 directory with 6 updates ([#284](#284)) ([c40eff6](c40eff6))
* **deps:** bump the github-actions group across 1 directory with 6 updates ([#433](#433)) ([2d9dd4f](2d9dd4f))
* **deps:** bump the github-actions group across 1 directory with 6 updates ([#510](#510)) ([c334a64](c334a64))
* **deps:** bump the github-actions group with 2 updates ([#163](#163)) ([f25713e](f25713e))
* **deps:** bump the inference-dependencies group in /evaluation with 3 updates ([#279](#279)) ([1d2d3dc](1d2d3dc))
* **deps:** bump the inference-dependencies group in /src/inference with 5 updates ([#508](#508)) ([2852ffb](2852ffb))
* **deps:** bump the lerobot-inference-dependencies group in /workflows/azureml with 4 updates ([#511](#511)) ([b7c5773](b7c5773))
* **deps:** bump the npm_and_yarn group across 2 directories with 1 update ([#223](#223)) ([6a261ab](6a261ab))
* **deps:** bump the training-dependencies group ([#429](#429)) ([66e43f4](66e43f4))
* **deps:** bump tornado from 6.5.4 to 6.5.5 in the uv group across 1 directory ([#172](#172)) ([d6caf29](d6caf29))
* **docs:** correct ms.date tooling and refresh stale documentation ([#349](#349)) ([ccaa1e8](ccaa1e8))
* **infrastructure:** add Go module and golangci-lint config for e2e tests ([#347](#347)) ([e0e6bbf](e0e6bbf))
* **infrastructure:** add root .terraform-docs.yml configuration ([#312](#312)) ([bb73bbb](bb73bbb))
* migrate references from Azure-Samples to microsoft/physical-ai-toolchain ([f58f0ef](f58f0ef))
* **workflows:** update Dependabot, CodeQL, CODEOWNERS, and cspell for dataviewer coverage ([#231](#231)) ([6d8c2e8](6d8c2e8))

### 🔒 Security

* **deps:** bump mlflow from 3.5.0 to 3.8.0rc0 in /training/rl ([#297](#297)) ([e9929df](e9929df))
* **deps:** bump the github-actions group across 1 directory with 4 updates ([#344](#344)) ([6826929](6826929))
* **deps:** bump the inference-dependencies group in /evaluation with 2 updates ([#339](#339)) ([6804630](6804630))
* **deps:** bump the npm_and_yarn group across 3 directories with 1 update ([#361](#361)) ([6760857](6760857))
* **deps:** bump the training-dependencies group across 1 directory with 54 updates ([#286](#286)) ([d9ae04f](d9ae04f))
* **deps:** bump the uv group across 3 directories with 1 update ([#360](#360)) ([dfbda06](dfbda06))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: physical-ai-toolchain-release[bot] <267194360+physical-ai-toolchain-release[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Bill Berry <wbery@microsoft.com>
feat(infra): add blob storage lifecycle policies and folder structure
Description
Implements an Azure Blob Storage folder structure with automated lifecycle management for raw ROS bags, converted LeRobot datasets, validation reports, and model checkpoints. The PR adds a Terraform `azurerm_storage_management_policy` resource with three configurable rules, a Python path validator that enforces naming conventions across all four data types, and comprehensive documentation describing the folder structure, lifecycle policies, and migration guidance for existing paths.

The lifecycle policy was applied to the IAI-DEV environment and all three rules were confirmed via Azure CLI. The Python validator was exercised against test blob paths uploaded to the deployed container, with all 8 naming-convention checks passing alongside the 34-test unit suite.
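To illustrate the kind of check a path validator like this performs, here is a minimal sketch. The folder prefixes, name rules, and the function name `classify_blob_path` are all hypothetical; the actual conventions are defined in `docs/cloud/blob-storage-structure.md` and `blob_path_validator.py` and may differ.

```python
import re
from typing import Optional

# Hypothetical building blocks, for illustration only.
_DATE = r"\d{4}-\d{2}-\d{2}"      # e.g. 2026-03-01
_NAME = r"[a-z0-9][a-z0-9-]*"     # lowercase letters, digits, hyphens

# One assumed pattern per data type named in the description:
# raw ROS bags, converted datasets, validation reports, checkpoints.
PATTERNS = {
    "raw": re.compile(rf"raw/{_NAME}/{_DATE}/{_NAME}\.bag"),
    "converted": re.compile(rf"converted/{_NAME}/v\d+/.+"),
    "reports": re.compile(rf"reports/{_NAME}/{_DATE}/{_NAME}\.json"),
    "checkpoints": re.compile(rf"checkpoints/{_NAME}/{_DATE}/.+"),
}


def classify_blob_path(path: str) -> Optional[str]:
    """Return the data type whose pattern the path fully matches, else None."""
    for data_type, pattern in PATTERNS.items():
        if pattern.fullmatch(path):
            return data_type
    return None


if __name__ == "__main__":
    for candidate in (
        "raw/robot-01/2026-03-01/episode-001.bag",
        "converted/pick-place/v2/data/chunk-000.parquet",
        "Raw/Robot 01/episode.bag",  # rejected: uppercase, space, no date
    ):
        print(candidate, "->", classify_blob_path(candidate))
```

Using `fullmatch` rather than `match` means a trailing suffix such as `.bag.tmp` is rejected instead of being accepted on a prefix match.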
Closes #45
Type of Change
Component(s) Affected
- `deploy/000-prerequisites` - Azure subscription setup
- `deploy/001-iac` - Terraform infrastructure
- `deploy/002-setup` - OSMO control plane / Helm
- `deploy/004-workflow` - Training workflows
- `src/training` - Python training scripts
- `docs/` - Documentation

Testing Performed
- Terraform `plan` reviewed (no unexpected changes)
- Terraform `apply` tested in dev environment
- Smoke tests (`smoke_test_azure.py`)

Additional testing steps:
Unit tests:
    .venv/bin/python -m pytest tests/common/test_blob_path_validator.py -v  # 34/34 passed

Formatting and linting:
Terraform plan and apply (IAI-DEV environment):
Lifecycle policy verification:
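A minimal sketch of how a lifecycle policy document might be checked programmatically, assuming it is available as JSON in the standard Azure lifecycle-management shape (a `rules` list whose entries carry `definition.actions.baseBlob`). The rule names, prefixes, and day thresholds below are hypothetical and are not taken from this PR.

```python
def make_rule(name: str, prefix: str, **days_by_action: int) -> dict:
    """Build one lifecycle rule in the Azure management-policy JSON shape."""
    return {
        "enabled": True,
        "name": name,
        "type": "Lifecycle",
        "definition": {
            "actions": {
                "baseBlob": {
                    action: {"daysAfterModificationGreaterThan": days}
                    for action, days in days_by_action.items()
                }
            },
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
        },
    }


# Hypothetical policy with three rules (values are illustrative only).
SAMPLE_POLICY = {
    "rules": [
        make_rule("raw-bags", "datasets/raw/", tierToCool=30, delete=365),
        make_rule("converted-datasets", "datasets/converted/", tierToCool=90),
        make_rule("reports", "datasets/reports/", delete=180),
    ]
}


def summarize_rules(policy: dict) -> dict:
    """Map each enabled rule name to its {action: day threshold} pairs."""
    summary = {}
    for rule in policy.get("rules", []):
        if not rule.get("enabled", False):
            continue
        base_blob = rule["definition"]["actions"].get("baseBlob", {})
        summary[rule["name"]] = {
            action: spec["daysAfterModificationGreaterThan"]
            for action, spec in base_blob.items()
            if "daysAfterModificationGreaterThan" in spec
        }
    return summary


if __name__ == "__main__":
    for name, actions in summarize_rules(SAMPLE_POLICY).items():
        print(name, actions)
```

Comparing a summary like this against the expected thresholds gives a quick pass/fail signal after `terraform apply`.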
Blob upload and naming convention verification:
Naming convention validation (8/8 checks passed):
Documentation Impact
- `docs/cloud/blob-storage-structure.md` added (296 lines): folder structure specification, naming conventions, lifecycle policy table, access tier comparison, rehydration instructions, and migration guidance for existing `inference_outputs/` and LeRobot paths.
- `README.md` updated with a blob storage folder quick-reference.

Bug Fix Checklist
Complete this section for bug fix PRs. Skip for other contribution types.
Checklist