fix(infrastructure): OSMO workflow execution, PostgreSQL public access, and quickstart corrections#477
Merged
Merged
Conversation
- Refactor tables in README for better readability - Update blob storage structure documentation for consistency - Revise quickstart guide to reflect new steps and configurations - Enhance troubleshooting documentation with detailed resolutions - Improve clarity in operational guides and service configurations Signed-off-by: Marcel Bindseil <marcelbindseil@gmail.com>
Contributor
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Snapshot WarningsEnsure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice. Scanned FilesNone |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #477 +/- ##
=======================================
Coverage 65.18% 65.18%
=======================================
Files 253 253
Lines 15541 15541
Branches 2118 2118
=======================================
Hits 10130 10130
Misses 5122 5122
Partials 289 289
🚀 New features to boost your workflow:
|
katriendg
approved these changes
Apr 15, 2026
Collaborator
katriendg
left a comment
There was a problem hiding this comment.
Thanks, looks like everything is there for an initial fix!
liupeirong
approved these changes
Apr 15, 2026
nguyena2
approved these changes
Apr 15, 2026
Signed-off-by: Marcel Bindseil <marcelbindseil@gmail.com>
18 tasks
28 tasks
WilliamBerryiii
pushed a commit
that referenced
this pull request
May 8, 2026
🤖 I have created a release *beep* *boop* --- ## [0.8.0](v0.7.4...v0.8.0) (2026-05-08) ### ⚠ BREAKING CHANGES * **dataviewer:** bump frontend stack to React 19, Vite 8, Tailwind v4, MSAL 5, ESLint 10 ([#524](#524)) ### ✨ Features * **agents:** add automated validation for high-risk Dependabot bumps ([#574](#574)) ([8c3686a](8c3686a)), closes [#573](#573) * **data:** add camera selector to annotation workspace and fix AV1 frame extraction ([#591](#591)) ([c809d2f](c809d2f)) * **data:** seed dataviewer frontend test foundation and per-section codecov flags ([#594](#594)) ([c06c4e3](c06c4e3)) * **dataviewer:** add OWASP security middleware stack ([#439](#439)) ([239edb9](239edb9)) * **infrastructure:** add conversion pipeline Terraform module ([#542](#542)) ([244531e](244531e)) * **infrastructure:** upgrade OSMO to chart 1.2.1 / image 6.2 with secure auth and skrl 2.0.0 compatibility ([#492](#492)) ([edfd7a5](edfd7a5)) * **pipeline:** add ACSA setup for ROS2 bag sync to Blob ([#451](#451)) ([c271a54](c271a54)) * **workflows:** add advisory Dependabot PR reviewer agentic workflow ([#498](#498)) ([d4bb140](d4bb140)) * **workflows:** trigger AW Dependabot PR reviewer after PR Validation ([#580](#580)) ([7ab3d16](7ab3d16)) ### 🐛 Bug Fixes * **ci:** correct stale version comment for actions/create-github-app-token ([#506](#506)) ([b2e9a54](b2e9a54)) * **ci:** restore data-pipeline and training broken tests by domain folder restructure ([#547](#547)) ([06d8472](06d8472)) * **docs:** update remaining stale 'Coming soon' labels in docs/README.md ([#507](#507)) ([02439d6](02439d6)) * **docs:** update stale coming soon label for Training section ([#472](#472)) ([46db49b](46db49b)) * **evaluation:** scope SIL AzureML validation code path and script reference ([#387](#387)) ([9f138a9](9f138a9)) * **infrastructure:** OSMO workflow execution, PostgreSQL public access, and quickstart corrections ([#477](#477)) ([9ed2da6](9ed2da6)) * **scripts:** exclude CHANGELOG.md from changed-files msdate check ([#644](#644)) ([8133bdc](8133bdc)) * **workflows:** allow dependabot[bot] to activate AW Dependabot PR Review ([#586](#586)) ([39dc022](39dc022)) * **workflows:** correct branches filter on AW Dependabot PR Review workflow_run trigger ([#584](#584)) ([fe06b52](fe06b52)) * **workflows:** normalize validate.yaml placeholder env/compute values ([#510](#510)) ([340ff44](340ff44)) * **workflows:** recompile aw-dependabot-pr-review lock file ([#576](#576)) ([d77c167](d77c167)) * **workflows:** switch AW Dependabot PR Review to pull_request_target ([#589](#589)) ([3f1edd1](3f1edd1)) ### 📚 Documentation * **docs:** Fix deployment guide links ([#614](#614)) ([0070b04](0070b04)) * document dependency-pinning-artifacts directory purpose ([#508](#508)) ([50e0010](50e0010)) ### 📦 Build System * **training:** standardize on Python 3.12 across manifests, containers, and runtime scripts ([#541](#541)) ([7ad014a](7ad014a)) ### 🔧 Operations * **build:** add Copilot cloud agent setup-steps workflow ([#593](#593)) ([c912668](c912668)) ### 🔧 Miscellaneous * **build:** exclude auto-generated CHANGELOG.md from cspell and seed dictionary ([#582](#582)) ([de1dd57](de1dd57)) * **build:** redesign codecov flags and split pytest CI per component ([#520](#520)) ([357e745](357e745)) * **dataviewer:** bump frontend stack to React 19, Vite 8, Tailwind v4, MSAL 5, ESLint 10 ([#524](#524)) ([50f8ad4](50f8ad4)) * **dataviewer:** repoint stale src/dataviewer references to data-management/viewer ([#504](#504)) ([88fa1b4](88fa1b4)), closes [#503](#503) * **deps-dev:** bump basic-ftp from 5.3.0 to 5.3.1 ([#618](#618)) ([ca10f2a](ca10f2a)) * **deps-dev:** bump globals from 15.15.0 to 17.5.0 in /data-management/viewer/frontend ([#527](#527)) ([0e0b2ae](0e0b2ae)) * **deps-dev:** bump ip-address from 10.1.0 to 10.2.0 ([#616](#616)) ([816c9cf](816c9cf)) * **deps-dev:** bump lint-staged from 16.4.0 to 17.0.2 in the root-npm-dependencies group across 1 directory ([#626](#626)) ([0e2f293](0e2f293)) * **deps-dev:** bump pydantic from 2.13.3 to 2.13.4 in the python-dependencies group across 1 directory ([#629](#629)) ([c24f1c1](c24f1c1)) * **deps-dev:** bump the python-dependencies group across 1 directory with 2 updates ([#514](#514)) ([8410f4b](8410f4b)) * **deps:** bump azure-core from 1.39.0 to 1.40.0 in /evaluation in the inference-dependencies group across 1 directory ([#597](#597)) ([6141db4](6141db4)) * **deps:** bump cryptography from 46.0.6 to 46.0.7 in /data-management/viewer ([#424](#424)) ([5fb6d58](5fb6d58)) * **deps:** bump cryptography from 46.0.6 to 46.0.7 in /data-management/viewer/backend ([#423](#423)) ([b516ad5](b516ad5)) * **deps:** bump lucide-react from 0.469.0 to 1.8.0 in /data-management/viewer/frontend ([#528](#528)) ([1bdfc1e](1bdfc1e)) * **deps:** bump nginx from `8aa63af` to `5616878` in /data-management/viewer/frontend ([#511](#511)) ([9e7e20e](9e7e20e)) * **deps:** bump nginx from 1.27-alpine to 1.29-alpine in /data-management/viewer/frontend ([#484](#484)) ([0e5c3dd](0e5c3dd)) * **deps:** bump node from `435f353` to `e49fd70` in /data-management/viewer/frontend ([#560](#560)) ([2884649](2884649)) * **deps:** bump react-is from 18.3.1 to 19.2.5 in /data-management/viewer/frontend ([#530](#530)) ([d51318c](d51318c)) * **deps:** bump tensordict from 0.11.0 to 0.12.1 in /evaluation in the inference-dependencies group across 1 directory ([#456](#456)) ([b24e733](b24e733)) * **deps:** bump the dataviewer-backend-dependencies group across 1 directory with 2 updates ([#531](#531)) ([171a1da](171a1da)) * **deps:** bump the dataviewer-backend-dependencies group across 1 directory with 5 updates ([#516](#516)) ([4f9a577](4f9a577)) * **deps:** bump the dataviewer-backend-dependencies group across 1 directory with 5 updates ([#602](#602)) ([6c27ab5](6c27ab5)) * **deps:** bump the dataviewer-dependencies group across 1 directory with 2 updates ([#529](#529)) ([8646971](8646971)) * **deps:** bump the dataviewer-dependencies group across 1 directory with 3 updates ([#601](#601)) ([d28fb50](d28fb50)) * **deps:** bump the dataviewer-dependencies group across 1 directory with 3 updates ([#632](#632)) ([4ca5f3e](4ca5f3e)) * **deps:** bump the dataviewer-dependencies group across 1 directory with 5 updates ([#515](#515)) ([109ee81](109ee81)) * **deps:** bump the dataviewer-frontend-patch-minor group across 1 directory with 6 updates ([#630](#630)) ([04d5dfd](04d5dfd)) * **deps:** bump the dataviewer-frontend-patch-minor group across 1 directory with 9 updates ([#563](#563)) ([c08f450](c08f450)) * **deps:** bump the docusaurus-dependencies group across 1 directory with 4 updates ([#627](#627)) ([f5825fc](f5825fc)) * **deps:** bump the docusaurus-dependencies group across 1 directory with 6 updates ([#599](#599)) ([b859344](b859344)) * **deps:** bump the github-actions group across 1 directory with 4 updates ([#459](#459)) ([2609c52](2609c52)) * **deps:** bump the github-actions group across 1 directory with 4 updates ([#517](#517)) ([f54bf5d](f54bf5d)) * **deps:** bump the inference-dependencies group across 1 directory with 11 updates ([#562](#562)) ([087f53a](087f53a)) * **deps:** bump the inference-dependencies group across 1 directory with 2 updates ([#628](#628)) ([4a3be47](4a3be47)) * **deps:** bump the pip group across 2 directories with 1 update ([#494](#494)) ([a14b6b0](a14b6b0)) * **docs:** update stale Python 3.11 references to 3.12 ([#575](#575)) ([6f85c95](6f85c95)) * **scripts:** remove redundant SC1091 disables in OSMO deploy scripts ([#509](#509)) ([ae1cb82](ae1cb82)) ### 🔒 Security * **build:** pin dependencies and hash-verify downloads ([#465](#465)) ([0289f49](0289f49)) * **build:** remediate dependency security advisories ([#479](#479)) ([7196d6d](7196d6d)) * **deps-dev:** bump basic-ftp from 5.2.1 to 5.2.2 ([#454](#454)) ([cb158f1](cb158f1)) * **deps-dev:** bump basic-ftp from 5.2.2 to 5.3.0 ([#495](#495)) ([e983b8b](e983b8b)) * **deps-dev:** bump hypothesis from 6.152.3 to 6.152.4 in the python-dependencies group ([#598](#598)) ([83384d2](83384d2)) * **deps-dev:** bump markdownlint-cli2 from 0.22.0 to 0.22.1 in the root-npm-dependencies group ([#559](#559)) ([32bde35](32bde35)) * **deps-dev:** bump picomatch from 2.3.1 to 2.3.2 in /docs/docusaurus ([#455](#455)) ([66f86ca](66f86ca)) * **deps-dev:** bump postcss from 8.5.10 to 8.5.12 in /data-management/viewer/frontend ([#569](#569)) ([a652dba](a652dba)) * **deps-dev:** bump the python-dependencies group with 2 updates ([#457](#457)) ([749d231](749d231)) * **deps-dev:** bump the python-dependencies group with 2 updates ([#485](#485)) ([71b44fd](71b44fd)) * **deps-dev:** bump the python-dependencies group with 3 updates ([#564](#564)) ([9fc52fd](9fc52fd)) * **deps-dev:** bump typescript from 6.0.2 to 6.0.3 in /docs/docusaurus in the docusaurus-dependencies group ([#513](#513)) ([5694dbc](5694dbc)) * **deps:** bump azureml/openmpi4.1.0-ubuntu22.04 from 20260303.v5 to 20260409.v4 in /evaluation/sil/docker ([#480](#480)) ([25d4df8](25d4df8)) * **deps:** bump cryptography from 46.0.6 to 46.0.7 in /evaluation in the uv group across 1 directory ([#538](#538)) ([92c5b2e](92c5b2e)) * **deps:** bump diffusers from 0.35.2 to 0.38.0 in /training/il/lerobot ([#638](#638)) ([6261d19](6261d19)) * **deps:** bump follow-redirects from 1.15.11 to 1.16.0 in /docs/docusaurus ([#469](#469)) ([0458908](0458908)) * **deps:** bump gitpython and mako for lerobot IL training ([#623](#623)) ([9f8022b](9f8022b)) * **deps:** bump node from 24.14.1-slim to 25.9.0-slim in /data-management/viewer/frontend ([#482](#482)) ([1532d09](1532d09)) * **deps:** bump packaging from 26.0 to 26.1 in /evaluation in the inference-dependencies group ([#483](#483)) ([f4afb6c](f4afb6c)) * **deps:** bump pillow from 12.1.1 to 12.2.0 ([#467](#467)) ([39fb663](39fb663)) * **deps:** bump python from 3.11-slim to 3.14-slim in /data-management/viewer/backend ([#481](#481)) ([7af9dfc](7af9dfc)) * **deps:** bump the dataviewer-backend-dependencies group across 1 directory with 15 updates ([#428](#428)) ([e4446a2](e4446a2)) * **deps:** bump the dataviewer-backend-dependencies group in /data-management/viewer/backend with 4 updates ([#487](#487)) ([0f57c5b](0f57c5b)) * **deps:** bump the dataviewer-backend-dependencies group in /data-management/viewer/backend with 8 updates ([#566](#566)) ([d6e7869](d6e7869)) * **deps:** bump the dataviewer-dependencies group across 1 directory with 5 updates ([#464](#464)) ([24c208d](24c208d)) * **deps:** bump the dataviewer-dependencies group in /data-management/viewer with 2 updates ([#486](#486)) ([90149f3](90149f3)) * **deps:** bump the dataviewer-dependencies group in /data-management/viewer with 6 updates ([#565](#565)) ([f0bb36b](f0bb36b)) * **deps:** bump the dataviewer-frontend-patch-minor group across 1 directory with 10 updates ([#613](#613)) ([e481f83](e481f83)) * **deps:** bump the github-actions group across 1 directory with 4 updates ([#534](#534)) ([5478ab6](5478ab6)) * **deps:** bump the github-actions group with 2 updates ([#488](#488)) ([4e6ce98](4e6ce98)) * **deps:** bump the github-actions group with 3 updates ([#567](#567)) ([48c38dc](48c38dc)) * **deps:** bump the github-actions group with 3 updates ([#634](#634)) ([00cfb49](00cfb49)) * **deps:** bump the github-actions group with 6 updates ([#603](#603)) ([73eb79a](73eb79a)) * **deps:** bump the training-dependencies group across 1 directory with 23 updates ([#463](#463)) ([d5a8656](d5a8656)) * **deps:** bump yaml from 2.8.2 to 2.8.3 in /data-management/viewer/frontend ([#453](#453)) ([10449df](10449df)) * pytest harness, dependabot advisories, and OSSF Scorecard remediations ([#501](#501)) ([e8756e8](e8756e8)) * **scripts:** pin and hash-verify all shell script downloads ([#468](#468)) ([0c2bb9c](0c2bb9c)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: physical-ai-toolchain-release[bot] <267194360+physical-ai-toolchain-release[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixed end-to-end OSMO workflow execution by resolving the empty
service_base_urlthat prevented osmo-ctrl sidecars from connecting to the logger WebSocket. Added PostgreSQL public network access support driven byshould_enable_public_network_access, and corrected the quickstart guide with accurate variables, paths, namespaces, and deployment steps validated against a live cluster.Closes #476
Closes #470
Type of Change
Component(s) Affected
infrastructure/terraform/prerequisites/- Azure subscription setupinfrastructure/terraform/- Terraform infrastructureinfrastructure/setup/- OSMO control plane / Helmworkflows/- Training and evaluation workflowstraining/- Training pipelines and scriptsdocs/- DocumentationTesting Performed
planreviewed (no unexpected changes)applytested in dev environmentsmoke_test_azure.py)Validated against live cluster
aks-osmo-dev-001with OSMO v6.0.0 inwestus2. Workflowhello-osmo-7completed successfully after theservice_base_urlfix. PostgreSQL connectivity verified from AKS pods with public access enabled. All Terraform, shellcheck, and markdownlint validations pass.Documentation Impact
Bug Fix Checklist
terraform validate. Quickstart corrections verified by cross-referencing against actual source files.Description
OSMO Service URL and Workflow Execution
detect_ingress_base_url()to scripts/lib/common.sh — returns the AzureML nginx ingress controller in-cluster FQDN (http://azureml-ingress-nginx-controller.azureml.svc.cluster.local) for osmo-ctrl sidecar routingSERVICE_BASE_URLfrom the ingress FQDN instead of the external LB IP, ensuring workflow pods route/api/logger,/api/auth, and/apipaths correctlyvalidate_service_url_reachable()to scripts/lib/common.sh — probes the service URL with a 5-second timeout and prints actionablekubectl port-forwardguidance when unreachablevalidate_service_url_reachableinto both 03-deploy-osmo-control-plane.sh and 04-deploy-osmo-backend.shcluster_service_urlfrom CLI URL in 04-deploy-osmo-backend.sh so--service-url http://localhost:8080works for port-forwarded CLI calls while Helm uses the in-clusteragent_service_urlfor podsPostgreSQL Public Network Access
public_network_access_enabledin infrastructure/terraform/modules/platform/postgresql.tf from hardcodedfalsetovar.should_enable_public_network_access, aligning with other resources (Storage, Redis, Key Vault, ACR)AllowAzureServicesfirewall rule (0.0.0.0/0.0.0.0) when public access is enabled, allowing AKS pods to reach PostgreSQL without private endpointsaks_egressfirewall rule scoped to the NAT Gateway public IP when!pe_enabled && nat_gw, for tighter production configurationsshould_enable_public_network_accessbehavior in both terraform.tfvars and terraform.tfvars.exampleQuickstart Guide Corrections
Rewrote docs/getting-started/quickstart.md with corrections validated against the actual codebase:
project_name,enable_*) with actual inputs (resource_prefix,should_*)-raw resource_group_nameto-json resource_group | jq -r '.name'scripts/submit-osmo-training.shtotraining/rl/scripts/submit-osmo-training.shosmo-control-planetoosmo-workflowsStandard_NV36ads_A10_v5(A10 Spot)--config-previewdry-run commandsresource_prefixnaming constraints warning and Helm cleanup beforeterraform destroyTroubleshooting and Operations Docs
service_base_url) and "UI shows server IP not found" (FQDN vs browser resolution)10.0.5.7to10.0.5.6in docs/infrastructure/cluster-setup-advanced.md and fixed the service name reference[!IMPORTANT]callout explainingservice_base_urlVPN vs non-VPN behaviorTable Formatting
Ran
npm run format:tablesacross 6 documentation files — column-width normalization only, no content changes.Related Issues
Closes #476
Closes #470
Subsumes PR #471 by @katriendg
Notes
AllowAzureServicesPostgreSQL firewall rule is intentionally broad for development scenarios. For production, disable it and rely on the NAT Gateway egress rule or private endpoints.6.0.0image tag and in-cluster agent URL. These are overridden by Helm--set-stringflags at deploy time.Checklist