test(training,common,inference): add Hypothesis property-based tests by WilliamBerryiii · Pull Request #268 · microsoft/physical-ai-toolchain

WilliamBerryiii · 2026-03-15T03:09:58Z

Pull Request

Description

Added Hypothesis property-based testing across all three Python source packages (common, inference, training), covering configuration models, CLI argument handling, plotting utilities, robot types, Azure ML context, metrics extraction, and ANSI stream stripping. A shared test helper reduced duplicated module-loading boilerplate, and two targeted fixes addressed a production exception gap and a Windows test compatibility issue.

Closes #240

Type of Change

🐛 Bug fix (non-breaking change fixing an issue)
✨ New feature (non-breaking change adding functionality)
💥 Breaking change (fix or feature causing existing functionality to change)
📚 Documentation update
🏗️ Infrastructure change (Terraform/IaC)
♻️ Refactoring (no functional changes)

Component(s) Affected

deploy/000-prerequisites - Azure subscription setup
deploy/001-iac - Terraform infrastructure
deploy/002-setup - OSMO control plane / Helm
deploy/004-workflow - Training workflows
src/training - Python training scripts
docs/ - Documentation

Testing Performed

Terraform plan reviewed (no unexpected changes)
Terraform apply tested in dev environment
Training scripts tested locally with Isaac Sim
OSMO workflow submitted successfully
Smoke tests passed (smoke_test_azure.py)

Documentation Impact

No documentation changes needed
Documentation updated in this PR
Documentation issue filed

Bug Fix Checklist

Linked to issue being fixed
Regression test included, OR
Justification for no regression test:

Checklist

My code follows the project conventions
Commit messages follow conventional commit format
I have performed a self-review
Documentation impact assessed above
No new linting warnings introduced

Changes

Property-Based Tests (7 new files)

Seven new test files were added. Six are Hypothesis property-based suites that follow the test_<module>_hypothesis.py naming convention, which keeps them separate from the existing example-based suites; the seventh, tests/common/test_cli_args.py, is example-based.

Added config model property tests in tests/common/test_config_models_hypothesis.py covering roundtrip serialization, invalid name rejection, threshold ordering, duplicate topic detection, and boundary values for TopicConfig, GpioTriggerConfig, PositionTriggerConfig, DiskThresholds, VrTriggerConfig, GapDetectionConfig, and RecordingConfig
Added CLI argument example tests in tests/common/test_cli_args.py covering add_rsl_rl_args and update_rsl_rl_cfg — argument registration, defaults, string parsing, resume flag, logger choices, config overrides, seed randomization, and project name propagation
Added plotting property tests in tests/inference/test_plotting_hypothesis.py for plot_action_deltas, plot_cumulative_positions, plot_error_heatmap, plot_summary_panel, and plot_aggregate_summary using custom composite strategies and headless matplotlib rendering
Added robot types property tests in tests/inference/test_robot_types_hypothesis.py for RobotObservation and JointPositionCommand — valid shape acceptance, invalid shape rejection via flatmap, as_absolute additivity, and timestamp preservation
Added Azure ML context property tests in tests/training/test_context_hypothesis.py using mock-injection for azure.ai.ml, azure.identity, and mlflow — tested _optional_env, upload_file, upload_checkpoint, and upload_files_batch
Added metrics extraction property tests in tests/training/test_metrics_hypothesis.py for _extract_from_value using FakeTensor and NumpyArrayLike test doubles to avoid torch/numpy dependencies, with statistical invariant assertions (min ≤ mean ≤ max)
Added ANSI stream property tests in tests/training/test_stream_hypothesis.py for AnsiStrippingStream — plain text passthrough, ANSI code stripping, carriage return normalization, and encoding delegation
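As a rough illustration of two techniques named in the list above (the min ≤ mean ≤ max invariant checked against a test double, and flatmap for generating inputs that are invalid by construction), here is a minimal sketch. It is not the code from this PR: FakeTensor here is a simplified stand-in, and summarize, check_length, and both test names are hypothetical.

```python
from hypothesis import given, strategies as st


class FakeTensor:
    """Simplified stand-in for a tensor; exposes only tolist()."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        return list(self._values)


def summarize(tensor_like):
    """Hypothetical extractor: reduce a tensor-like object to min/mean/max."""
    values = tensor_like.tolist()
    return {"min": min(values), "mean": sum(values) / len(values), "max": max(values)}


def check_length(expected_len, values):
    """Hypothetical validator: reject sequences of the wrong length."""
    if len(values) != expected_len:
        raise ValueError(f"expected {expected_len} values, got {len(values)}")
    return values


# Bounded integers avoid float rounding that could break a strict ordering check.
bounded_ints = st.integers(min_value=-10**6, max_value=10**6)


@given(st.lists(bounded_ints, min_size=1, max_size=50))
def test_min_mean_max_ordering(values):
    stats = summarize(FakeTensor(values))
    assert stats["min"] <= stats["mean"] <= stats["max"]


# flatmap draws a length n first, then a list that is always longer than n.
wrong_length_pairs = st.integers(min_value=1, max_value=8).flatmap(
    lambda n: st.tuples(
        st.just(n),
        st.lists(bounded_ints, min_size=n + 1, max_size=n + 4),
    )
)


@given(wrong_length_pairs)
def test_wrong_length_is_rejected(pair):
    expected_len, values = pair
    try:
        check_length(expected_len, values)
    except ValueError:
        return
    raise AssertionError("check_length accepted a sequence of the wrong length")
```

Under pytest these functions are collected like any other test; calling a @given-decorated function directly also runs the generated examples.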

Production Fix

Added RuntimeError to the exception tuple caught by _extract_from_value in src/training/utils/metrics.py, so that multi-element torch tensors whose .item() call raises RuntimeError are covered
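The shape of this fix can be sketched without torch. The example below is an assumption-laden simplification, not the project's _extract_from_value: extract_scalar and MultiElementFakeTensor are hypothetical names, and the other exception types in the tuple are guesses, since the PR description only states that RuntimeError was added.

```python
class MultiElementFakeTensor:
    """Test double whose .item() fails unless it holds exactly one value."""

    def __init__(self, values):
        self._values = list(values)

    def item(self):
        if len(self._values) != 1:
            # Mimics the RuntimeError described in the PR for multi-element tensors.
            raise RuntimeError(
                f"a tensor with {len(self._values)} elements cannot be converted to a scalar"
            )
        return self._values[0]


def extract_scalar(value):
    """Hypothetical extractor: return a float, or None when no scalar can be read."""
    try:
        return float(value.item())
    except (AttributeError, TypeError, ValueError, RuntimeError):
        # RuntimeError is the case the PR adds; the other types are assumed.
        return None


single = extract_scalar(MultiElementFakeTensor([3]))
multi = extract_scalar(MultiElementFakeTensor([1, 2]))
```

Without RuntimeError in the tuple, the second call would propagate the exception instead of returning None.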

Test Infrastructure

Extracted a shared module-loading helper and fixed a cross-platform compatibility issue.

Added tests/training/conftest.py with load_training_module helper using importlib.util to load training source modules without importing the full dependency tree
- Adopted by test_env.py and test_metrics.py, replacing duplicated boilerplate
Replaced chmod(0o444) with monkeypatch.setattr("os.access", ...) in tests/common/test_config_models.py for Windows compatibility where POSIX permission bits do not reliably restrict write access
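Both infrastructure changes can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the PR's code: load_module_from_path and is_writable are hypothetical names (the real helper is load_training_module), and unittest.mock.patch stands in for pytest's monkeypatch.setattr so the sketch runs without pytest.

```python
import importlib.util
import os
import tempfile
from pathlib import Path
from unittest import mock


def load_module_from_path(name, path):
    """Load one source file as a module without importing its parent package."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name} from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def is_writable(directory):
    """Hypothetical validator that consults os.access rather than attempting a write."""
    return os.access(directory, os.W_OK)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "standalone.py"
    source.write_text("ANSWER = 42\n")
    loaded = load_module_from_path("standalone", source)

    # Simulate a read-only directory on any OS by patching os.access,
    # instead of relying on chmod(0o444).
    with mock.patch("os.access", return_value=False):
        writable_when_patched = is_writable(tmp)
    writable_unpatched = is_writable(tmp)
```

Patching os.access makes the "not writable" branch deterministic on every platform, which is the property the chmod-based test could not guarantee on Windows.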

Dependencies

Added dev dependencies in pyproject.toml: hypothesis>=6.100.0; numpy>=1.26.0,<3.0.0; matplotlib>=3.10.8
Configured Hypothesis globally: max_examples = 50, deadline = 500 ms
Added "flatmap" to .cspell/general-technical.txt and .hypothesis/ to .gitignore
Updated uv.lock with transitive dependencies: contourpy, cycler, fonttools, pillow, pyparsing, sortedcontainers
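The PR description does not say where the global Hypothesis settings live. One common way to express the same values is a settings profile registered in a conftest.py; the sketch below assumes that approach, and the profile name "project-default" is hypothetical.

```python
from datetime import timedelta

from hypothesis import settings

# Register and activate a profile with the values quoted in the PR description.
# A numeric deadline is interpreted as milliseconds.
settings.register_profile("project-default", max_examples=50, deadline=500)
settings.load_profile("project-default")

profile = settings.get_profile("project-default")
deadline_matches = profile.deadline == timedelta(milliseconds=500)
```

A lower max_examples than Hypothesis's default trades some search depth for faster test runs, and an explicit deadline gives slower tests, such as those that render plots, more room per example.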

Related Issues

Closes #240

Notes

All 147 tests pass (0 failures, 3 PydanticJsonSchemaWarnings). Ruff reports no lint warnings on new or modified files.

github-actions · 2026-03-15T03:10:13Z

Dependency Review

The following issues were found:

✅ 0 vulnerable package(s)
✅ 0 package(s) with incompatible licenses
✅ 0 package(s) with invalid SPDX license definitions
⚠️ 3 package(s) with unknown licenses.

See the Details below.

License Issues

uv.lock

Package	Version	License	Issue Type
hypothesis	6.151.9	Null	Unknown License
kiwisolver	1.5.0	Null	Unknown License
numpy	2.4.3	Null	Unknown License

OpenSSF Scorecard

Package	Version	Score	Details
pip/contourpy	1.3.3	Unknown	
pip/cycler	0.12.1	Unknown	
pip/fonttools	4.62.1	🟢 5.8	See check breakdown below
pip/hypothesis	6.151.9	Unknown	
pip/kiwisolver	1.5.0	Unknown	
pip/matplotlib	3.10.8	Unknown	
pip/numpy	2.4.3	Unknown	
pip/pillow	12.1.1	Unknown	
pip/pyparsing	3.3.2	Unknown	
pip/sortedcontainers	2.4.0	Unknown	

Details for pip/fonttools 4.62.1

Check	Score	Reason
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Maintained	🟢 10	30 commit(s) and 18 issue activity found in the last 90 days -- score normalized to 10
Code-Review	🟢 4	Found 7/16 approved changesets -- score normalized to 4
Security-Policy	🟢 10	security policy file detected
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
License	🟢 10	license file detected
Binary-Artifacts	🟢 10	no binaries found in the repo
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
Fuzzing	⚠️ 0	project is not fuzzed
Signed-Releases	⚠️ -1	no releases found
Branch-Protection	⚠️ -1	internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md
Packaging	🟢 10	packaging workflow detected
SAST	⚠️ 0	SAST tool is not run on all commits -- score normalized to 0

Scanned Files

uv.lock

nguyena2

Credit-style rating:

PR Credit Score: 832/850
Equivalent grade: Excellent (approve)
Why this score: very strong coverage improvement, minimal production risk, green CI, and clean scope; slight deduction only for increased test/dependency surface area that may add future maintenance/runtime overhead.

- add hypothesis>=6.100.0 dev dependency and configuration
- add property tests for config models, metrics, and robot types
- add .hypothesis/ to .gitignore

🧪 - Generated by Copilot

- add hypothesis and matplotlib dev dependencies
- add property-based tests for stream, plotting, metrics, context, and config models
- add example-based tests for cli_args argument parsing
- fix RuntimeError handling in metrics exception handler
- fix test_output_dir_must_be_writable for Windows compatibility

🧪 - Generated by Copilot

- add cspell disable for CSI terminal byte characters in stream test
- change 'normalised' to 'normalized' for en-US dictionary
- apply ruff format to test_cli_args, test_plotting_hypothesis, test-lerobot-inference

🔧 - Generated by Copilot

- correct conftest.py _SRC base path from src/ to repo root
- update inference test imports to evaluation.metrics and evaluation.sil
- fix ruff I001 import sorting violations in capture tests

🔧 - Generated by Copilot

codecov-commenter · 2026-03-17T04:37:13Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 9.79%. Comparing base (083a8af) to head (512d55d).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files

@@          Coverage Diff          @@
##            main    #268   +/-   ##
=====================================
  Coverage   9.79%   9.79%           
=====================================
  Files         29      29           
  Lines       3881    3881           
  Branches     497     497           
=====================================
  Hits         380     380           
  Misses      3491    3491           
  Partials      10      10

Flag	Coverage Δ		*Carryforward flag
pester	`79.87% <ø> (ø)`
pytest	`6.89% <ø> (ø)`		Carriedforward from b309580

*This pull request uses carry forward flags.

WilliamBerryiii requested a review from a team as a code owner March 15, 2026 03:09

nguyena2 approved these changes Mar 16, 2026

rezatnoMsirhC approved these changes Mar 16, 2026

WilliamBerryiii added 3 commits March 16, 2026 20:06

test: add Hypothesis property-based tests for Python components

31f3e37

- add hypothesis>=6.100.0 dev dependency and configuration
- add property tests for config models, metrics, and robot types
- add .hypothesis/ to .gitignore

🧪 - Generated by Copilot

WilliamBerryiii force-pushed the feature/hypothesis-property-tests branch from 9c27f1e to b309580 on March 17, 2026 03:09

fix: update import paths for domain-driven architecture migration

512d55d

- correct conftest.py _SRC base path from src/ to repo root
- update inference test imports to evaluation.metrics and evaluation.sil
- fix ruff I001 import sorting violations in capture tests

🔧 - Generated by Copilot

WilliamBerryiii merged commit ec47615 into main Mar 17, 2026
23 checks passed

WilliamBerryiii deleted the feature/hypothesis-property-tests branch March 17, 2026 03:39

github-actions Bot mentioned this pull request May 1, 2026

chore(deps): bump the dataviewer-dependencies group across 1 directory with 3 updates #601

Merged

WilliamBerryiii merged 4 commits into main from feature/hypothesis-property-tests

4 participants