test: enforce 80% pytest coverage gate across all 6 Python domains (#441) by WilliamBerryiii · Pull Request #590 · microsoft/physical-ai-toolchain

WilliamBerryiii · 2026-04-29T00:01:37Z

Summary

Brings the entire repository to ≥80% pytest coverage across all six Python domains and enforces that floor in CI. Closes #441.

Originally scoped to the dataviewer backend, the work expanded after audit to cover every Python domain in the repo. Each domain now has empirically verified coverage well above the 80% gate, and every main pytest workflow now fails the build when coverage drops below 80%.

Per-Domain Coverage (verified locally)

| Domain | Coverage | Tests | CI workflow |
| --- | --- | --- | --- |
| `data-pipeline` | 100.00% | 71 passed | `.github/workflows/pytest-data-pipeline.yml` |
| `data-management/tools` | 100.00% | 34 passed | `.github/workflows/pytest-dm-tools.yml` |
| `evaluation` (`sil` + `metrics`) | 99.74% | 217 passed | `.github/workflows/evaluation-pytests.yml` |
| `fleet-deployment/inference` | 99.65% | 29 passed | `.github/workflows/pytest-inference.yml` |
| `training` | 96.38% | 449 passed | `.github/workflows/pytest-training.yml` |
| `data-management/viewer` (backend) | 94.62% | (existing) | `.github/workflows/dataviewer-backend-pytests.yml` |

CI Gates Enforced

All six main pytest workflows now pass `--cov-fail-under=80`:

- `.github/workflows/pytest-training.yml`
- `.github/workflows/pytest-inference.yml`
- `.github/workflows/pytest-dm-tools.yml`
- `.github/workflows/pytest-data-pipeline.yml`
- `.github/workflows/dataviewer-backend-pytests.yml`
- `.github/workflows/evaluation-pytests.yml` (gated via `evaluation/pyproject.toml` `addopts`: `--cov=sil --cov=metrics --cov-fail-under=80`)
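For anyone auditing these gates later, here is a minimal sketch of how the threshold could be read back out of an `addopts` string like the one quoted for `evaluation/pyproject.toml`. The helper names `fail_under_from_addopts` and `gate_passes` are hypothetical and not part of this repository; the only thing assumed is pytest-cov's `--cov-fail-under` option, which fails the run when total coverage is below the given value.

```python
import shlex
from typing import Optional


def fail_under_from_addopts(addopts: str) -> Optional[float]:
    """Return the --cov-fail-under threshold in an addopts string, or None.

    Hypothetical audit helper; accepts both `--cov-fail-under=80` and
    `--cov-fail-under 80`. If the option repeats, the last one wins.
    """
    tokens = shlex.split(addopts)
    threshold = None
    for index, token in enumerate(tokens):
        if token.startswith("--cov-fail-under="):
            threshold = float(token.split("=", 1)[1])
        elif token == "--cov-fail-under" and index + 1 < len(tokens):
            threshold = float(tokens[index + 1])
    return threshold


def gate_passes(total_coverage: float, addopts: str) -> bool:
    """Pass only when a threshold is configured and coverage meets it."""
    threshold = fail_under_from_addopts(addopts)
    return threshold is not None and total_coverage >= threshold
```

Treating a missing threshold as a failure is a deliberate choice in this sketch: it flags a workflow that collects coverage without enforcing a floor.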

Notable Changes

- New / expanded test suites across `training/tests/` (SKRL, RSL-RL, Azure smoke), `fleet-deployment/inference/tests/` (ACT inference node, plotting + robot-types Hypothesis suites), `evaluation/` (`sil` + `metrics`), `data-pipeline/`, `data-management/tools/`, and the dataviewer backend.
- Hypothesis stability: `deadline=None` applied where flakes were observed.
- Inference imports repaired via `importlib.util` to support test discovery.
- Python/uv versions bumped in workflows where required to satisfy domain interpreters.
- `data-pipeline/uv.lock` added for reproducible builds (matches existing `evaluation/uv.lock` convention).
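The `importlib.util` repair itself is not shown in this description. One common pattern it may resemble is loading a module directly from its file path, which helps when the containing directory is not importable as a package (a hyphenated name such as `fleet-deployment`, for example). The function name `load_module_from_path` below is an assumption made for illustration, not the code in this PR.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_module_from_path(name: str, path: Path) -> ModuleType:
    """Import a module from an explicit file path (illustrative helper)."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so imports of `name` during exec resolve.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module
```

A test module could call this once at import time and then reference the returned module object like any other import.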

Commit Composition (59 commits)

| Type | Count |
| --- | --- |
| `test` | 48 |
| `ci` | 4 |
| `fix` | 3 |
| `chore` | 2 |
| `feat` | 1 |
| `ops` | 1 |

Files Changed

108 files changed, 15,464 insertions(+), 1,497 deletions(-).

Distribution by top-level directory:

| Area | Files |
| --- | --- |
| `data-management` | 52 |
| `.github` | 16 |
| `training` | 15 |
| `docs` | 7 |
| `data-pipeline` | 4 |
| `evaluation` | 4 |
| `fleet-deployment` | 2 |
| Root / config | 8 |

Validation

Each domain was run locally with --cov-fail-under=80; all passed. CI workflows on this branch will re-validate on push.

Deferred / Follow-Up

- `.github/workflows/fuzz-regression-tests.yml` collects coverage but does not yet enforce `--cov-fail-under=80` (separate concern; tracked for follow-up).
- `logs/` cleanup of stale local coverage artifacts.

- share make_asgi_request helper in tests/conftest.py (IV-003) - replace bare-expression assertion in test_csrf (IV-004) - drive provider selection via public require_auth (IV-002) - assert AnnotationService construction via behavior round-trip (IV-001) - replace datetime.utcnow with timezone-aware now (IV-005) 🧪 - Generated by Copilot
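For context on the `datetime.utcnow` replacement: `utcnow()` returns a naive datetime and is deprecated as of Python 3.12. A minimal sketch of the timezone-aware form follows; the helper name `utc_now` is hypothetical, and `timezone.utc` is used here so the snippet also runs on interpreters older than 3.11, where the `datetime.UTC` alias is unavailable.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Timezone-aware replacement for the deprecated datetime.utcnow()."""
    return datetime.now(timezone.utc)


stamp = utc_now()
# An aware datetime carries tzinfo, so isoformat() includes the UTC offset
# and comparisons with other aware datetimes are unambiguous.
print(stamp.isoformat())
```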

- replace datetime.utcnow() with datetime.now(UTC) in tests/api/test_annotations.py - skip HDF5 video test when neither ffmpeg nor cv2 available - tighten hypothesis strategy in test_valid_nested_ids_accepted 🧪 - Generated by Copilot
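The skip condition for the HDF5 video test could be probed roughly as below. This is a sketch under assumptions: the function name `video_backend_available` is hypothetical, and the actual test may detect the backends differently.

```python
import importlib.util
import shutil


def video_backend_available() -> bool:
    """True when either the ffmpeg binary or the cv2 module can be found."""
    has_ffmpeg = shutil.which("ffmpeg") is not None
    # find_spec locates the module without importing it.
    has_cv2 = importlib.util.find_spec("cv2") is not None
    return has_ffmpeg or has_cv2


# In a pytest module this would typically feed a skip marker, e.g.
# @pytest.mark.skipif(not video_backend_available(), reason="no video backend")
```

Probing with `find_spec` avoids the cost and side effects of importing `cv2` just to decide whether to skip.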

…dataset - add 43 tests covering scan, sync, video resolve, hdf5 helpers, close - raise blob_dataset.py coverage from baseline to 89% 🧪 - Generated by Copilot

- add 13 tests covering _extract_features, cluster, _simple_clustering, dataclasses - raise clustering.py coverage from 20.45% to 98.48% 🧪 - Generated by Copilot

- cover _load_info defaults, _find_episode_location, list_episodes_with_meta - cover load_episode qpos/qvel aliases and zero-fill paths - raise lerobot_loader.py coverage from 22.77% to 90.43% 🧪 - Generated by Copilot

- cover discover, list_episodes, load_episode, get_trajectory paths via FakeLoader injection - cover get_frame_image ffmpeg/cv2 fallback and get_video_path/get_cameras error paths - raise lerobot_handler.py coverage from 25.00% to 96.76% 🧪 - Generated by Copilot

- cover _get_model cache hit and ImportError branches - cover get_cached, clear_cache, detect_frame box parsing including unknown class - cover detect_episode skips, exception handling, class summary, cache write, and singleton accessor - raise detection_service.py coverage from 48.55% to 100.00% 🧪 - Generated by Copilot

- cover detect short-input early return and constant velocity branch - cover velocity spikes, unexpected stop severity tiers and start/end exclusion - cover oscillation, force spike, gripper failure and joint limit detectors - cover _group_consecutive empty and split paths and _zscore_to_severity tiers - raise anomaly_detection.py coverage from 54.55% to 99.57% 🧪 - Generated by Copilot

- Cover validation, blob sync helpers, blob discovery, video streaming, eviction, list_datasets pruning, and capability flags - Cover prefetch scheduling guards, episode/trajectory cache paths, dataset path traversal, and invalidate_episode_cache - Cover get_video_file_path branches (no handler, lerobot, hdf5 upload, missing cache) and singleton get_dataset_service - Raises src/api/services/dataset_service/service.py coverage from 37.03% to 83.08% 🧪 - Generated by Copilot

- Adds tests/test_joint_config_router.py covering per-dataset and global joint configuration GET/PUT endpoints plus module helpers - Lifts coverage of src/api/routers/joint_config.py from 42.42% to 95.96% - Remaining uncovered lines (122, 146) are defensive double-checks unreachable behind SAFE_DATASET_ID_PATTERN 🧪 - Generated by Copilot

- Adds tests/test_ai_analysis_router.py covering trajectory analysis, anomaly detection, episode clustering, and annotation suggestion endpoints - Lifts coverage of src/api/routes/ai_analysis.py from 45.62% to 95.00% 🧪 - Generated by Copilot

- Adds tests/test_export_router.py covering sync export, SSE streaming, and preview endpoints - Lifts coverage of src/api/routers/export.py from 47.74% to 91.46% 🧪 - Generated by Copilot

- Adds tests/test_datasets_router.py covering capabilities, episodes, trajectory, frames, cameras, video file/blob streaming with HEAD and Range support, cache stats, and warm-cache endpoints - Lifts coverage of src/api/routers/datasets.py from 48.72% to 100.00% 🧪 - Generated by Copilot

- add evaluation-pytests reusable workflow job alongside other pytest jobs - add evaluation-pytests to release-please needs for parity with pr-validation.yml 🔧 - Generated by Copilot

…anches - Fix function-local import monkeypatches via string-form setattr - Rebuild ffmpeg success-path FakeProc to write output file in wait() - Remove orphaned assertion tail in get_loader test - Add cv2 success-path test via sys.modules injection with FakeWriter - Coverage 81.23% -> 93.49% 🧪 - Generated by Copilot
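The `sys.modules` injection technique mentioned above can be sketched as follows. Everything here is a stand-in: `write_video`, `run_with_fake_cv2`, and this `FakeWriter` are illustrative and are not the code or fixtures in this PR. Because the import happens inside the function, the fake only needs to be in `sys.modules` at call time.

```python
import sys
import types
from unittest import mock


def write_video(path: str) -> str:
    """Stand-in for code under test that imports cv2 inside the function."""
    import cv2  # function-local import, resolved when the function runs

    writer = cv2.VideoWriter(path, 0, 30.0, (64, 48))
    writer.release()
    return path


class FakeWriter:
    """Records that release() was called instead of encoding video."""

    released = False

    def __init__(self, path: str, fourcc: int, fps: float, size: tuple) -> None:
        self.path = path

    def release(self) -> None:
        FakeWriter.released = True


def run_with_fake_cv2(path: str) -> str:
    fake_cv2 = types.ModuleType("cv2")
    fake_cv2.VideoWriter = FakeWriter
    # patch.dict restores sys.modules on exit, so the fake does not leak.
    with mock.patch.dict(sys.modules, {"cv2": fake_cv2}):
        return write_video(path)
```

This works whether or not the real `cv2` is installed, which is the point of the technique for a success-path test.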

- Add StorageError wrap test for _download_file exception path - Add string-tasks branch and parse-error wrap tests for get_dataset_info - Add parquet discovery success path with ValueError and non-parquet skip - Add StorageError re-raise tests for get_dataset_info, list_episodes, get_episode_data - Add unexpected-exception wrap test for get_episode_data - Coverage: huggingface.py 87.74% -> 92.90% 🧪 - Generated by Copilot

…aged identity - Add ImportError test for missing azure-storage-blob SDK - Add managed identity branch test for _get_client - Add invalid JSON, HttpResponseError, and unexpected exception wraps for get_annotation - Add HttpResponseError and unexpected exception wraps for save_annotation - Add malformed filename skip and error wraps for list_annotated_episodes - Add HttpResponseError and unexpected exception wraps for delete_annotation - Add close test releasing client and idempotent no-op when never created - Coverage: azure.py 69.11% -> 93.50% 🧪 - Generated by Copilot

- Add edge-case tests for crop, resize, brightness, contrast, saturation, hue, gamma, and color filter validation paths - Add tests for apply_color_adjustment, apply_transform dispatch, apply_transforms_batch, apply_camera_transforms, and get_output_dimensions - Lift image_transform.py coverage from 73.47% to 97.62% 🧪 - Generated by Copilot

- Add tests for HDF5Loader covering load_all_frames, load_single_frame, file discovery, caching, and error paths - Fix stale regex assertion in test_find_episode_file_missing_raises to match current 'No HDF5 file found' error message - Brings src/api/services/hdf5_loader.py coverage to 95.85% 🧪 - Generated by Copilot

…zers - Add tests/test_validation_branches.py exercising dependency factory closures directly - Cover _sanitize_nested_value recursion across list, tuple, set, dict - Cover SanitizedModel CRLF stripping in nested fields - Cover validate_safe_string empty rejection and pattern compilation paths - Cover range_header_param branches: none, prefix mismatch, open-ended, bounded, invalid - Lift src/api/validation.py coverage from 85.53% to 100% 🧪 - Generated by Copilot
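To make the five `range_header_param` branches concrete (none, prefix mismatch, open-ended, bounded, invalid), here is a hypothetical parser with one return or raise per branch. It is not the implementation in `src/api/validation.py`; the name `parse_range`, the return shape, and the use of `ValueError` are all assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

_RANGE_RE = re.compile(r"^bytes=(\d+)-(\d*)$")


def parse_range(header: Optional[str]) -> Optional[Tuple[int, Optional[int]]]:
    """Parse a single-range HTTP Range header into (start, end)."""
    if header is None:
        return None  # branch: no header supplied
    if not header.startswith("bytes="):
        raise ValueError("unsupported range unit")  # branch: prefix mismatch
    match = _RANGE_RE.match(header)
    if match is None:
        raise ValueError("malformed range")  # branch: invalid
    start = int(match.group(1))
    # branch: open-ended ("bytes=100-") versus bounded ("bytes=0-499")
    end = int(match.group(2)) if match.group(2) else None
    if end is not None and end < start:
        raise ValueError("range end precedes start")  # branch: invalid
    return start, end
```

Suffix ranges such as `bytes=-500` are rejected as malformed in this sketch; a real endpoint may choose to support them.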

- Add tests/api/test_analysis.py with TestClient cases for /trajectory-quality and /anomaly-detection - Push src/api/routers/analysis.py from 75.00% to 100% 🧪 - Generated by Copilot

- Add tests/test_config_branches.py exercising error paths and env-driven defaults - Hold src/api/config.py at 100% combined coverage 🧪 - Generated by Copilot

- Add Pydantic model tests for DetectionRequest.validate_frames - Cover None, valid, and negative-frame paths - Push src/api/models/detection.py from 88.89% to 100% 🧪 - Generated by Copilot

- Add tests for makedirs OSError, atomic-write cleanup, malformed filename skip - Cover listdir/remove failure paths and missing-directory early return - Add cleanup-skip-unlink and empty-directory branch coverage - Push src/api/storage/local.py from 78.35% to 98.97% 🧪 - Generated by Copilot

- add SecurityHeadersMiddleware tests for /docs, /redoc, /openapi.json bypass - add ContentSizeLimitMiddleware tests for malformed Content-Length and streaming overflow - raise src/api/middleware.py from 86.84% to 98.68% 🧪 - Generated by Copilot

- add tests for run-detection 404, success, and 500 unexpected-error paths - add tests for GET cached detections and DELETE clear-cache endpoints - raise src/api/routers/detection.py to 97.83% (44 stmts, 1 missing) 🧪 - Generated by Copilot

- add tests for GET/PUT/DELETE annotations 404 and success paths - add tests for auto-analysis and summary endpoints - raise src/api/routers/annotations.py from 26.15% to 100% 🧪 - Generated by Copilot

- convert async tests to sync wrappers using asyncio.run for compatibility without pytest-asyncio - exercise CRUD, auto-analysis flag detection (jitter/hesitation/correction), and summary aggregation - raise src/api/services/annotation_service.py from 14.29% to 88.10% 🧪 - Generated by Copilot

- Convert async provider tests to sync wrappers using asyncio.run - Cover ApiKeyProvider, EasyAuthProvider, JwtProvider, and require_auth/require_role paths - Validate provider factory dispatch and reset_auth_provider singleton behavior 🧪 - Generated by Copilot

🎨 - Generated by Copilot

katriendg · 2026-04-29T10:02:00Z

@WilliamBerryiii noting here that we may want to rebase and update some of the tests for dataviewer, with the extent of changes being added by @akzaidi in PR #591 (merge that before this one?). Believe it makes more sense. Great addition here in any case!

@given

- extract four @given methods from TestValidateDatasetIdProperties class - convert to standalone module-level test functions - resolves CodeQL py/iteration-of-non-iterable false positive - Generated by Copilot

@given

- extract four @given methods from TestValidateDatasetIdProperties class - convert to standalone module-level test functions - resolves CodeQL py/iteration-of-non-iterable false positive ♻️ - Generated by Copilot

- remove unreachable except clause in storage/test_base.py - replace bare assertions and dead branches in api/test_labels.py - drop redundant assignments and unused locals in dataset/hdf5/middleware tests - tighten assertion in training lerobot checkpoint test 🎨 - Generated by Copilot

WilliamBerryiii · 2026-04-29T21:45:45Z

> @WilliamBerryiii noting here that we may want to rebase and update some of the tests for dataviewer, with the extent of changes being added by @akzaidi in PR #591 (merge that before this one?). Believe it makes more sense. Great addition here in any case!

Yes.

🎨 - Generated by Copilot

rezatnoMsirhC

Thank you for the thorough coverage work here. The CI gate additions are clean and the new test suite is impressively comprehensive. Left a few comments below.

…ackend-coverage # Conflicts: # data-management/viewer/backend/tests/test_lerobot_handler.py # data-management/viewer/backend/tests/test_lerobot_loader.py

- Convert manual asyncio.run wrappers to native async tests - Rely on pytest-asyncio asyncio_mode=auto for execution - Improves readability and prepares for additional async coverage 🤖 - Generated by Copilot

- Canonical launch tests live alongside training package - Removes duplicate to avoid divergence 🤖 - Generated by Copilot

…e to 90 - Add rationale comments for deadline=None on slow hypothesis tests - Raise dataviewer backend --cov-fail-under from 80 to 90 🤖 - Generated by Copilot

- test_detection: tolerate broken ultralytics/torch installs (AttributeError) - test_lerobot_handler: add get_tasks() to FakeLoader (post-merge API) - test_hdf5_handler: probe avc1 codec at runtime, skip if unavailable on Windows 🤖 - Generated by Copilot

- Resolved conflicts by keeping refactored test files (async wrappers, get_tasks fix, codec probe) over remote's style-only edits to the pre-refactor versions - Retained deletion of duplicate training/tests/test_rl_launch.py - Auto-merged labels.py, test_labels.py, test_config_branches.py, test_lerobot_loader.py, test_property_based.py 🤖 - Generated by Copilot

- Convert 49 sync wrapper + asyncio.run(_run()) sites to native async def test_* - Aligns with pytest-asyncio auto mode used elsewhere in suite - Preserves import asyncio (used by test_creates_task_when_loop_running) 🤖 - Generated by Copilot

WilliamBerryiii · 2026-04-30T23:33:51Z

Follow-up: converted the orchestrator tests to the native async def pattern for consistency with the rest of the suite (commit b3e9f36).

49 sync wrapper + asyncio.run(_run()) sites converted to native async def test_* (pytest-asyncio auto mode).
import asyncio retained — still used by TestSchedulePrefetch.test_creates_task_when_loop_running.
Full backend pytest: 1048 passed, 111 skipped, coverage 94.53% (≥ 90% gate).
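
A minimal sketch of the before/after shape of that conversion, assuming pytest-asyncio with `asyncio_mode=auto` as stated above. The coroutine `fetch_value` and both test names are hypothetical stand-ins, not the orchestrator tests themselves.

```python
import asyncio


async def fetch_value() -> int:
    """Hypothetical coroutine under test."""
    await asyncio.sleep(0)
    return 42


# Before: a sync test builds an inner coroutine and drives it itself.
def test_fetch_value_sync_wrapper() -> None:
    async def _run() -> None:
        assert await fetch_value() == 42

    asyncio.run(_run())


# After: a native coroutine test. In auto mode the plugin supplies the
# event loop, so neither the wrapper nor an explicit marker is needed.
async def test_fetch_value_native() -> None:
    assert await fetch_value() == 42
```

The native form removes one level of nesting per test, which is most of the readability gain.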

🤖 - Generated by Copilot

- Capture _prefetch() coroutine and explicitly close it in the RuntimeError branch of _schedule_prefetch to eliminate "coroutine was never awaited" warning when no event loop is running - Apply ruff format to backend test files to satisfy CI lint check 🤖 - Generated by Copilot
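
The warning fix described above follows a general pattern: a coroutine object that is created but never scheduled should be closed. A sketch under assumptions, with `_prefetch` and `schedule_prefetch` written here as simplified stand-ins for the service methods:

```python
import asyncio
from typing import Optional


async def _prefetch() -> str:
    """Stand-in for the real prefetch coroutine."""
    await asyncio.sleep(0)
    return "warmed"


def schedule_prefetch() -> Optional[asyncio.Task]:
    """Schedule the prefetch if a loop is running, otherwise drop it cleanly."""
    coro = _prefetch()
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No running loop: close the coroutine so Python does not emit
        # "coroutine '_prefetch' was never awaited" when it is collected.
        coro.close()
        return None
    return loop.create_task(coro)
```

Called from sync code it returns `None`; called inside a running loop it returns a task that can be awaited.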

katriendg

Nice one, a valuable addition to our code coverage across the repository.

- Align local pytest config with CI workflow already enforcing 80% - Remove stale TODO; capture/ now ships config, models, scripts modules - Resolve PR #590 review nit (thread r3180127646) 🤖 - Generated by Copilot

WilliamBerryiii added 30 commits April 23, 2026 22:54

test(extension): expand dataviewer backend coverage for storage/blob_…

5e17148

…dataset - add 43 tests covering scan, sync, video resolve, hdf5 helpers, close - raise blob_dataset.py coverage from baseline to 89% 🧪 - Generated by Copilot

test(extension): add unit tests for EpisodeClusterer service

793695e

- add 13 tests covering _extract_features, cluster, _simple_clustering, dataclasses - raise clustering.py coverage from 20.45% to 98.48% 🧪 - Generated by Copilot

test(extension): add unit tests for LeRobot dataset loader

99dfa7a

- cover _load_info defaults, _find_episode_location, list_episodes_with_meta - cover load_episode qpos/qvel aliases and zero-fill paths - raise lerobot_loader.py coverage from 22.77% to 90.43% 🧪 - Generated by Copilot

test(extension): add unit tests for export router

b64ab7a

- Adds tests/test_export_router.py covering sync export, SSE streaming, and preview endpoints - Lifts coverage of src/api/routers/export.py from 47.74% to 91.46% 🧪 - Generated by Copilot

ops(workflows): wire evaluation-pytests into main.yml

ada997b

- add evaluation-pytests reusable workflow job alongside other pytest jobs - add evaluation-pytests to release-please needs for parity with pr-validation.yml 🔧 - Generated by Copilot

test(extension): cover analysis router stub endpoints

d613cbd

- Add tests/api/test_analysis.py with TestClient cases for /trajectory-quality and /anomaly-detection - Push src/api/routers/analysis.py from 75.00% to 100% 🧪 - Generated by Copilot

test(extension): cover api config validation branches

29e26f1

- Add tests/test_config_branches.py exercising error paths and env-driven defaults - Hold src/api/config.py at 100% combined coverage 🧪 - Generated by Copilot

test(extension): cover detection model validator branches

7840880

- Add Pydantic model tests for DetectionRequest.validate_frames - Cover None, valid, and negative-frame paths - Push src/api/models/detection.py from 88.89% to 100% 🧪 - Generated by Copilot

test(extension): cover annotation router endpoints

87c1bc8

- add tests for GET/PUT/DELETE annotations 404 and success paths - add tests for auto-analysis and summary endpoints - raise src/api/routers/annotations.py from 26.15% to 100% 🧪 - Generated by Copilot

WilliamBerryiii added 2 commits April 28, 2026 20:55

style(training): apply lint and formatting cleanup to training tests

03315e0

🎨 - Generated by Copilot

style(training): apply lint and formatting cleanup to training tests

dd1844e

🎨 - Generated by Copilot

WilliamBerryiii added 4 commits April 29, 2026 10:50

refactor(viewer): move hypothesis property tests to module level

87f54ec

- extract four @given methods from TestValidateDatasetIdProperties class - convert to standalone module-level test functions - resolves CodeQL py/iteration-of-non-iterable false positive - Generated by Copilot

WilliamBerryiii added 2 commits April 29, 2026 15:39

style(viewer-backend): apply ruff format to test_property_based.py

6fbb16a

🎨 - Generated by Copilot

style(viewer-backend): apply ruff format to test_property_based.py

f52f4b6

🎨 - Generated by Copilot

katriendg mentioned this pull request Apr 30, 2026

feat(data): add camera selector to annotation workspace and fix AV1 frame extraction #591

Merged

25 tasks

rezatnoMsirhC requested changes Apr 30, 2026

Comment thread data-management/viewer/backend/tests/test_dataset_service.py

Comment thread training/tests/test_rl_launch.py Outdated

Comment thread fleet-deployment/inference/tests/test_plotting_hypothesis.py

WilliamBerryiii added 6 commits April 30, 2026 11:27

Merge remote-tracking branch 'origin/main' into feat/441-dataviewer-b…

5ede5f4

…ackend-coverage # Conflicts: # data-management/viewer/backend/tests/test_lerobot_handler.py # data-management/viewer/backend/tests/test_lerobot_loader.py

test(data): refactor async wrappers in viewer backend tests

d51da4d

- Convert manual asyncio.run wrappers to native async tests - Rely on pytest-asyncio asyncio_mode=auto for execution - Improves readability and prepares for additional async coverage 🤖 - Generated by Copilot

test(training): remove duplicate test_rl_launch.py

0d56ef6

- Canonical launch tests live alongside training package - Removes duplicate to avoid divergence 🤖 - Generated by Copilot

test: justify hypothesis deadline overrides; raise dataviewer cov gat…

a21f945

…e to 90 - Add rationale comments for deadline=None on slow hypothesis tests - Raise dataviewer backend --cov-fail-under from 80 to 90 🤖 - Generated by Copilot

github-advanced-security AI found potential problems Apr 30, 2026

Comment thread data-management/viewer/backend/tests/test_detection.py Dismissed

WilliamBerryiii requested a review from rezatnoMsirhC May 1, 2026 21:32

Merge branch 'main' into feat/441-dataviewer-backend-coverage

8c7d4fb

katriendg approved these changes May 4, 2026

Comment thread data-pipeline/pyproject.toml Outdated

rezatnoMsirhC approved these changes May 4, 2026

rezatnoMsirhC and others added 2 commits May 4, 2026 08:21

Merge branch 'main' into feat/441-dataviewer-backend-coverage

b8611c9

fix(pipeline): enable --cov-fail-under=80 in pyproject addopts

377ddf0

- Align local pytest config with CI workflow already enforcing 80% - Remove stale TODO; capture/ now ships config, models, scripts modules - Resolve PR #590 review nit (thread r3180127646) 🤖 - Generated by Copilot

WilliamBerryiii merged commit 10ab980 into main May 5, 2026
48 checks passed

WilliamBerryiii deleted the feat/441-dataviewer-backend-coverage branch May 5, 2026 03:53

WilliamBerryiii merged 85 commits into main from feat/441-dataviewer-backend-coverage

5 participants
