
test(training,common,inference): add Hypothesis property-based tests #268

Merged
WilliamBerryiii merged 4 commits into main from feature/hypothesis-property-tests on Mar 17, 2026

@WilliamBerryiii
Member

Pull Request

Description

Added Hypothesis property-based testing across all three Python source packages (common, inference, training), covering configuration models, CLI argument handling, plotting utilities, robot types, Azure ML context, metrics extraction, and ANSI stream stripping. A shared test helper reduced duplicated module-loading boilerplate, and two targeted fixes addressed a production exception gap and a Windows test compatibility issue.

Closes #240

Type of Change

  • 🐛 Bug fix (non-breaking change fixing an issue)
  • ✨ New feature (non-breaking change adding functionality)
  • 💥 Breaking change (fix or feature causing existing functionality to change)
  • 📚 Documentation update
  • 🏗️ Infrastructure change (Terraform/IaC)
  • ♻️ Refactoring (no functional changes)

Component(s) Affected

  • deploy/000-prerequisites - Azure subscription setup
  • deploy/001-iac - Terraform infrastructure
  • deploy/002-setup - OSMO control plane / Helm
  • deploy/004-workflow - Training workflows
  • src/training - Python training scripts
  • docs/ - Documentation

Testing Performed

  • Terraform plan reviewed (no unexpected changes)
  • Terraform apply tested in dev environment
  • Training scripts tested locally with Isaac Sim
  • OSMO workflow submitted successfully
  • Smoke tests passed (smoke_test_azure.py)

Documentation Impact

  • No documentation changes needed
  • Documentation updated in this PR
  • Documentation issue filed

Bug Fix Checklist

  • Linked to issue being fixed
  • Regression test included, OR
  • Justification for no regression test:

Checklist


Changes

Property-Based Tests (7 new files)

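Seven new test files were added. Six are Hypothesis files following the test_<module>_hypothesis.py naming convention, which keeps property-based tests separate from the existing example-based suites; the seventh, test_cli_args.py, is an example-based suite for CLI argument handling.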

  • Added config model property tests in tests/common/test_config_models_hypothesis.py covering roundtrip serialization, invalid name rejection, threshold ordering, duplicate topic detection, and boundary values for TopicConfig, GpioTriggerConfig, PositionTriggerConfig, DiskThresholds, VrTriggerConfig, GapDetectionConfig, and RecordingConfig
  • Added CLI argument example tests in tests/common/test_cli_args.py covering add_rsl_rl_args and update_rsl_rl_cfg — argument registration, defaults, string parsing, resume flag, logger choices, config overrides, seed randomization, and project name propagation
  • Added plotting property tests in tests/inference/test_plotting_hypothesis.py for plot_action_deltas, plot_cumulative_positions, plot_error_heatmap, plot_summary_panel, and plot_aggregate_summary using custom composite strategies and headless matplotlib rendering
  • Added robot types property tests in tests/inference/test_robot_types_hypothesis.py for RobotObservation and JointPositionCommand — valid shape acceptance, invalid shape rejection via flatmap, as_absolute additivity, and timestamp preservation
  • Added Azure ML context property tests in tests/training/test_context_hypothesis.py using mock-injection for azure.ai.ml, azure.identity, and mlflow — tested _optional_env, upload_file, upload_checkpoint, and upload_files_batch
  • Added metrics extraction property tests in tests/training/test_metrics_hypothesis.py for _extract_from_value using FakeTensor and NumpyArrayLike test doubles to avoid torch/numpy dependencies, with statistical invariant assertions (min ≤ mean ≤ max)
  • Added ANSI stream property tests in tests/training/test_stream_hypothesis.py for AnsiStrippingStream — plain text passthrough, ANSI code stripping, carriage return normalization, and encoding delegation
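
As an illustration of the "invalid shape rejection via flatmap" pattern mentioned in the list above, here is a minimal sketch. It does not use the PR's RobotObservation or JointPositionCommand types; `validate_joint_positions` and `EXPECTED_JOINTS` are hypothetical stand-ins, and the settings simply mirror the global values listed under Dependencies.

```python
from hypothesis import given, settings, strategies as st

EXPECTED_JOINTS = 6  # hypothetical joint count, not taken from the PR

finite_floats = st.floats(allow_nan=False, allow_infinity=False)


def validate_joint_positions(values):
    """Hypothetical validator: accept exactly EXPECTED_JOINTS values."""
    if len(values) != EXPECTED_JOINTS:
        raise ValueError(f"expected {EXPECTED_JOINTS} values, got {len(values)}")
    return list(values)


# flatmap: draw a wrong length first, then build a list of exactly that length.
wrong_length_lists = (
    st.integers(min_value=0, max_value=12)
    .filter(lambda n: n != EXPECTED_JOINTS)
    .flatmap(lambda n: st.lists(finite_floats, min_size=n, max_size=n))
)


@settings(max_examples=50, deadline=500)
@given(st.lists(finite_floats, min_size=EXPECTED_JOINTS, max_size=EXPECTED_JOINTS))
def test_accepts_valid_length(values):
    # Property: every correctly sized input is accepted unchanged.
    assert validate_joint_positions(values) == values


@settings(max_examples=50, deadline=500)
@given(wrong_length_lists)
def test_rejects_wrong_length(values):
    # Property: every incorrectly sized input raises ValueError.
    try:
        validate_joint_positions(values)
    except ValueError:
        return
    raise AssertionError("wrong-length input was accepted")
```

Drawing the length first and deriving the list from it keeps the generated data dependent on the drawn value, which is the situation flatmap is meant for.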

Production Fix

  • Added RuntimeError to the exception tuple in _extract_from_value in src/training/utils/metrics.py so that multi-element torch tensors, whose .item() call raises RuntimeError, are handled instead of propagating the exception
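
The failure mode can be sketched without torch. The code below is not the contents of metrics.py: `FakeTensor` here is a simplified test double written for this example, and `extract_stats` is a hypothetical function that only shows why RuntimeError needs to be in the exception tuple before a fallback path can run.

```python
class FakeTensor:
    """Simplified double: .item() fails on anything but a single element."""

    def __init__(self, values):
        self._values = list(values)

    def item(self):
        if len(self._values) != 1:
            # Stand-in for the error the PR describes for multi-element tensors.
            raise RuntimeError("cannot convert multi-element tensor to scalar")
        return self._values[0]

    def tolist(self):
        return list(self._values)


def extract_stats(value):
    """Hypothetical extractor: return mean/min/max, or None if unreadable."""
    try:
        scalar = float(value.item())
        return {"mean": scalar, "min": scalar, "max": scalar}
    except (AttributeError, TypeError, ValueError, RuntimeError):
        # Without RuntimeError in this tuple, the multi-element case would
        # escape here instead of reaching the fallback below.
        pass
    try:
        items = [float(v) for v in value.tolist()]
    except (AttributeError, TypeError, ValueError):
        return None
    if not items:
        return None
    return {"mean": sum(items) / len(items), "min": min(items), "max": max(items)}


stats = extract_stats(FakeTensor([1.0, 3.0]))
# The statistical invariant asserted by the property tests: min <= mean <= max.
assert stats["min"] <= stats["mean"] <= stats["max"]
```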

Test Infrastructure

Extracted a shared module-loading helper and fixed a cross-platform compatibility issue.

  • Added tests/training/conftest.py with load_training_module helper using importlib.util to load training source modules without importing the full dependency tree
    • Adopted by test_env.py and test_metrics.py, replacing duplicated boilerplate
  • Replaced chmod(0o444) with monkeypatch.setattr("os.access", ...) in tests/common/test_config_models.py for Windows compatibility where POSIX permission bits do not reliably restrict write access
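
Both infrastructure changes can be sketched with the standard library alone. This is an assumption-laden illustration, not the PR's code: `load_module_from_path` is a hypothetical analogue of load_training_module, `is_writable_dir` is a hypothetical stand-in for the validator under test, and `unittest.mock.patch` is swapped in for pytest's monkeypatch so the snippet runs without pytest.

```python
import importlib.util
import os
import tempfile
from pathlib import Path
from unittest import mock


def load_module_from_path(name: str, path: Path):
    """Load one source file as a module without importing its parent package."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def is_writable_dir(path: Path) -> bool:
    """Hypothetical validator that relies on os.access, looked up at call time."""
    return os.access(path, os.W_OK)


with tempfile.TemporaryDirectory() as tmp:
    # Part 1: load a standalone file directly from its path.
    source = Path(tmp) / "standalone.py"
    source.write_text("ANSWER = 42\n")
    loaded = load_module_from_path("standalone", source)

    # Part 2: simulate a read-only directory on any platform by patching
    # os.access rather than changing permission bits with chmod.
    with mock.patch("os.access", return_value=False):
        writable_when_patched = is_writable_dir(Path(tmp))
    writable_unpatched = is_writable_dir(Path(tmp))
```

Patching the access check makes the test independent of how the host file system interprets permission bits, which is the portability concern the bullet above describes.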

Dependencies

  • Added dev dependencies in pyproject.toml: hypothesis>=6.100.0, numpy>=1.26.0,<3.0.0, matplotlib>=3.10.8
  • Configured Hypothesis globally: max_examples = 50, deadline = 500 ms
  • Added "flatmap" to .cspell/general-technical.txt and .hypothesis/ to .gitignore
  • Updated uv.lock with transitive dependencies: contourpy, cycler, fonttools, pillow, pyparsing, sortedcontainers

Related Issues

Closes #240

Notes

All 147 tests pass (0 failures, 3 PydanticJsonSchemaWarnings). Ruff reports no lint warnings on new or modified files.

@WilliamBerryiii WilliamBerryiii requested a review from a team as a code owner March 15, 2026 03:09
@github-actions
Contributor

github-actions Bot commented Mar 15, 2026

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 3 package(s) with unknown licenses.
See the Details below.

License Issues

uv.lock

| Package | Version | License | Issue Type |
| --- | --- | --- | --- |
| hypothesis | 6.151.9 | Null | Unknown License |
| kiwisolver | 1.5.0 | Null | Unknown License |
| numpy | 2.4.3 | Null | Unknown License |

OpenSSF Scorecard

| Package | Version | Score | Details |
| --- | --- | --- | --- |
| pip/contourpy | 1.3.3 | Unknown | Unknown |
| pip/cycler | 0.12.1 | Unknown | Unknown |
| pip/fonttools | 4.62.1 | 🟢 5.8 | See check breakdown below |
| pip/hypothesis | 6.151.9 | Unknown | Unknown |
| pip/kiwisolver | 1.5.0 | Unknown | Unknown |
| pip/matplotlib | 3.10.8 | Unknown | Unknown |
| pip/numpy | 2.4.3 | Unknown | Unknown |
| pip/pillow | 12.1.1 | Unknown | Unknown |
| pip/pyparsing | 3.3.2 | Unknown | Unknown |
| pip/sortedcontainers | 2.4.0 | Unknown | Unknown |

Details for pip/fonttools 4.62.1:

| Check | Score | Reason |
| --- | --- | --- |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Maintained | 🟢 10 | 30 commit(s) and 18 issue activity found in the last 90 days -- score normalized to 10 |
| Code-Review | 🟢 4 | Found 7/16 approved changesets -- score normalized to 4 |
| Security-Policy | 🟢 10 | security policy file detected |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Signed-Releases | ⚠️ -1 | no releases found |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md |
| Packaging | 🟢 10 | packaging workflow detected |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |

Scanned Files

  • uv.lock

Contributor

@nguyena2 nguyena2 left a comment


Credit-style rating:

PR Credit Score: 832/850
Equivalent grade: Excellent (approve)
Why this score: very strong coverage improvement, minimal production risk, green CI, and clean scope; slight deduction only for increased test/dependency surface area that may add future maintenance/runtime overhead.

- add hypothesis>=6.100.0 dev dependency and configuration
- add property tests for config models, metrics, and robot types
- add .hypothesis/ to .gitignore

🧪 - Generated by Copilot
- add hypothesis and matplotlib dev dependencies
- add property-based tests for stream, plotting, metrics, context, and config models
- add example-based tests for cli_args argument parsing
- fix RuntimeError handling in metrics exception handler
- fix test_output_dir_must_be_writable for Windows compatibility

🧪 - Generated by Copilot
- add cspell disable for CSI terminal byte characters in stream test
- change 'normalised' to 'normalized' for en-US dictionary
- apply ruff format to test_cli_args, test_plotting_hypothesis, test-lerobot-inference

🔧 - Generated by Copilot
@WilliamBerryiii WilliamBerryiii force-pushed the feature/hypothesis-property-tests branch from 9c27f1e to b309580 Compare March 17, 2026 03:09
- correct conftest.py _SRC base path from src/ to repo root
- update inference test imports to evaluation.metrics and evaluation.sil
- fix ruff I001 import sorting violations in capture tests

🔧 - Generated by Copilot
@WilliamBerryiii WilliamBerryiii merged commit ec47615 into main Mar 17, 2026
23 checks passed
@WilliamBerryiii WilliamBerryiii deleted the feature/hypothesis-property-tests branch March 17, 2026 03:39
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 9.79%. Comparing base (083a8af) to head (512d55d).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@          Coverage Diff          @@
##            main    #268   +/-   ##
=====================================
  Coverage   9.79%   9.79%           
=====================================
  Files         29      29           
  Lines       3881    3881           
  Branches     497     497           
=====================================
  Hits         380     380           
  Misses      3491    3491           
  Partials      10      10           
| Flag | Coverage Δ | *Carryforward flag |
| --- | --- | --- |
| pester | 79.87% <ø> (ø) | |
| pytest | 6.89% <ø> (ø) | Carriedforward from b309580 |

*This pull request uses carry forward flags.


Development

Successfully merging this pull request may close these issues.

feat(testing): add Hypothesis property-based tests for Python components

4 participants