fix(collector): _write_health OSError no longer escapes per-package isolation (#32)#35

Merged
cmeans-claude-dev[bot] merged 1 commit into main from fix/health-write-oserror-isolation on Apr 27, 2026

@cmeans-claude-dev cmeans-claude-dev Bot commented Apr 27, 2026

Summary

Closes #32. collect() called _write_health(...) with no try/except, and __main__.main() had no handler around collector_fn(config). A failure inside the health-write step (disk full, output dir not writable, atomic-replace cross-device, etc.) propagated as an unhandled OSError traceback through the process, bypassing the structured per-package failure summary that operators rely on. This defeated the per-package isolation contract one level up: _collect_one already wraps (CollectorError, OSError) so one bad package can't kill the run, but the final health-write step had no equivalent guard.

Approach

| File | Change |
| --- | --- |
| `src/pypi_winnow_downloads/collector.py` | Wrap the `_write_health(...)` call in `collect()` with `try: … except OSError as e: logger.error(…)`. Surface the failure via a new `CollectorResult.health_write_error: str \| None` field. |
| `src/pypi_winnow_downloads/__main__.py` | Combine per-package failures and a health-write failure into one structured exit message: `winnow-collect: 2 package(s) failed: foo, bar; health file write failed: [Errno 28] No space left on device`. Both modes produce a non-zero exit; either alone produces a single-clause message. |
| `tests/test_collector.py` | New: `test_collect_health_write_oserror_recorded_not_raised` (monkeypatches `os.replace` to raise ENOSPC only on `_health.json`; badge writes still work). Extended: `test_collect_result_reports_no_failures_on_full_success` asserts `health_write_error is None` on the happy path. |
| `tests/test_main.py` | New: `test_main_exits_nonzero_when_health_file_write_fails` and `test_main_combines_package_and_health_failure_messages`. |
| `CHANGELOG.md` | `### Fixed` entry referencing #32. |

Test plan

  • uv run pytest --cov --cov-report=xml: 59 passed (3 new tests, 1 extended)
  • uv run ruff check src/ tests/ — clean
  • uv run ruff format --check src/ tests/ — clean
  • uv run mypy src/pypi_winnow_downloads/ — clean
  • CollectorResult field addition is backward-compatible: existing test fixtures construct without health_write_error, default None makes them work unchanged
  • Reviewer-relevant: the test that monkeypatches os.replace is targeted (only raises for the health-file destination, lets badge writes use the real call) — this verifies the OSError comes from the right code path, not from a side effect
  • CI green on all jobs (lint, typecheck, test 3.11/3.12/3.13, deploy-smoke, codecov)
  • Squash-merge title/body preserved
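
The targeted patch described in the test plan can be illustrated roughly as follows. This sketch uses `unittest.mock.patch` rather than pytest's `monkeypatch` fixture so that it runs standalone, and `atomic_write` is a hypothetical stand-in for the project's badge and health writers; the real test patches `os.replace` as seen from the collector module.

```python
import errno
import os
import tempfile
from pathlib import Path
from unittest import mock

_real_replace = os.replace  # keep a handle on the real function before patching


def replace_failing_only_for_health(src, dst, **kwargs):
    """Raise ENOSPC for the health file; defer to the real os.replace otherwise."""
    if str(dst).endswith("_health.json"):
        raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
    return _real_replace(src, dst, **kwargs)


def atomic_write(path: Path, text: str) -> None:
    """Hypothetical writer: write a temp file, then os.replace it into place."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(text)
    os.replace(tmp, path)


with tempfile.TemporaryDirectory() as d, mock.patch(
    "os.replace", side_effect=replace_failing_only_for_health
):
    out = Path(d)
    atomic_write(out / "foo.json", "{}")  # badge-style write uses the real call
    badge_written = (out / "foo.json").exists()
    try:
        atomic_write(out / "_health.json", "{}")
        health_error = None
    except OSError as e:
        health_error = e  # only the health-file destination fails
```

A blanket patch that raised on every `os.replace` call would also break the badge writes, so it could not tell the health-write path apart from the per-package path.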

Why a new field rather than logging-only

Two failure modes to surface:

  1. Per-package failures (existing) — operator sees which packages didn't get fresh badges
  2. Health-write failure (new) — operator sees that the diagnostic surface itself is broken

If we just logged the OSError without surfacing it structurally, a run with no per-package failures + a broken health-write would exit 0 silently — operators would think the collector succeeded when in fact _health.json is stale or absent. The new field keeps the exit code honest.

Alternative (rejected): a top-level except OSError in __main__.main() around collector_fn(config). Loses per-package outcome detail (CollectorResult never returned), and requires operators to read raw OSError strings without context about which step failed. The current approach gives finer signal at the same complexity cost.

Source of finding

Independent code-review pass against main 4e4148f, 2026-04-26. Filed as #32, picked up immediately.

Closes #32

…solation (closes #32)

Surfaced by independent code-review pass against main `4e4148f`
(2026-04-26): `collect()` called `_write_health(...)` with no
try/except, and `__main__.main()` had no handler around
`collector_fn(config)`. A failure inside the health-write step
(disk full, output dir not writable, cross-device atomic-replace,
etc.) propagated as an unhandled OSError traceback all the way
out of the process, bypassing the structured per-package failure
summary that operators rely on. This defeated the per-package
isolation contract one level up: `_collect_one` already wraps
`(CollectorError, OSError)` so one bad package can't kill the run,
but the final health-write step had no equivalent guard.

The fix:

- Wrap the `_write_health(...)` call in `collect()` with
  `try: ... except OSError as e: logger.error(...)`. Surface the
  failure structurally via a new
  `CollectorResult.health_write_error: str | None` field
  (default `None`, additive change — existing call sites that
  construct `CollectorResult` keep working without modification).
- Update `__main__.main()` to combine per-package failures and
  health-write failures into one structured exit message:
  `winnow-collect: 2 package(s) failed: foo, bar; health file
  write failed: [Errno 28] No space left on device`. Both modes
  produce non-zero exit; either alone produces a single-clause
  message.

Tests added:

- `test_collect_health_write_oserror_recorded_not_raised` —
  monkeypatches `collector_module.os.replace` to raise ENOSPC
  ONLY on the `_health.json` target (badge writes still work
  via the real call). Asserts the per-package outcome lands
  intact AND `result.health_write_error` is set, AND no
  exception escapes `collect()`.
- `test_main_exits_nonzero_when_health_file_write_fails` —
  calls `__main__.main()` with a fake `collector_fn` that
  returns `health_write_error="[Errno 28] No space left on
  device"`; asserts SystemExit message contains the structured
  text.
- `test_main_combines_package_and_health_failure_messages` —
  same shape with both per-package failures AND
  health_write_error set; asserts both clauses appear in the
  exit message.
- Existing `test_collect_result_reports_no_failures_on_full_success`
  extended to assert `health_write_error is None` on the
  happy path.

Verified locally: 59/59 pytest pass, ruff/format/mypy clean.
@cmeans-claude-dev cmeans-claude-dev Bot added the Ready for QA Dev work complete — QA can begin review label Apr 27, 2026
@github-actions github-actions Bot added Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA Ready for QA Dev work complete — QA can begin review and removed Ready for QA Dev work complete — QA can begin review Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA labels Apr 27, 2026
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

@cmeans cmeans left a comment

LGTM

@cmeans cmeans added QA Active QA is actively reviewing; Dev should not push changes and removed Ready for QA Dev work complete — QA can begin review labels Apr 27, 2026

@cmeans cmeans left a comment

QA round 1 — clean, no follow-ups

Closes #32 cleanly via the issue's resolution A. The fix addresses the actual bug class (a _write_health OSError escaping per-package isolation), and the regression test exercises that bug itself, not just the fix mechanic.

Issue #32 scope vs PR delivery:

| Issue #32 ask | Delivered |
| --- | --- |
| Wrap `_write_health(...)` in `collect()` with try/except OSError; log; continue | `collector.py` now does exactly that, and also surfaces the failure structurally via a new `CollectorResult.health_write_error: str \| None = None` field (additive, default `None`, backward-compatible). |
| Regression test that injects a write failure | `test_collect_health_write_oserror_recorded_not_raised` monkeypatches `collector_module.os.replace` to raise ENOSPC only when `dst` ends in `_health.json`, and lets badge writes use the real call. Asserts: `result.outcomes` still populated; `result.failures == ()`; `result.health_write_error` contains "No space left on device". This is the right level of targeting: it tests the bug class, not a blanket `os.replace` failure that would also break badge writes. |
| Honest exit code | `__main__.main()` rewritten to accumulate a `problems` list, joining package failures and health-write failures with a `;` separator. All four permutations covered: success silent, only-package, only-health, both-combined. |

Per-package isolation contract verified intact:

_collect_one already wraps (CollectorError, OSError) around the body that calls badge.write_badge (which uses the same os.replace pattern as _write_health). Repo-wide grep for unwrapped os.replace / write_text / mkdir calls in collect()'s call graph: only _write_health was the gap; this PR closes it, no other surface remains.

Test sufficiency — covers bug, not just mechanics:

| Test | Bug-class coverage |
| --- | --- |
| `test_collect_health_write_oserror_recorded_not_raised` | Forces OSError at the exact code path (`os.replace` on `_health.json`); verifies non-propagation + structured surfacing |
| `test_collect_result_reports_no_failures_on_full_success` | Backward-compat: `health_write_error is None` on happy path |
| `test_main_exits_nonzero_when_health_file_write_fails` | `main()` exits non-zero with structured message when only health fails |
| `test_main_combines_package_and_health_failure_messages` | Multi-clause exit message permutation: both failure modes together produce the `;`-joined message |

Hygiene:

  • New except OSError as e is not silent — logger.error("collector: failed to write _health.json: %s", e) + structured surfacing. No feedback_silent_exceptions violation.
  • Backward compat: existing CollectorResult(...) construction sites keep working (default None on the new field).
  • Test plan: all 7 verifiable items now ticked (item 8, squash-merge title preservation, is post-merge).

Local verification on head 3f2fc85:

| Check | Result |
| --- | --- |
| `uv run pytest -q` | 59 passed, 0 deselected, 0.26s |
| `uv run pytest tests/test_collector.py::test_collect_health_write_oserror_recorded_not_raised` (just the bug-class test) | passed |
| `uv run pytest tests/test_main.py::test_main_combines_package_and_health_failure_messages` | passed |
| `uv run ruff check`, `ruff format --check`, `mypy src` | all clean |
| Coverage of new lines in `collector.py:209-227` and `__main__.py:39-47` | 100% (the lone uncovered prod lines remain the same pre-existing 1% gap discussed earlier: `__init__.py` PackageNotFoundError fallback, `__main__.py` entry-point shim, two `config.py` error paths, one collector line; none introduced by this PR) |
| CI on PR head | all SUCCESS (test 3.11/3.12/3.13, lint, typecheck, deploy-smoke, qa-approved, on-push) |

No findings. Transitioning label to Ready for QA Signoff.

@cmeans

cmeans commented Apr 27, 2026

Applying Ready for QA Signoff — see review above. Closes #32 cleanly: regression test targets the exact bug location (os.replace on _health.json) and verifies non-propagation + structured surfacing without breaking badge writes. Per-package isolation contract intact, no other unwrapped IO surfaces remain in collect()'s call graph. 59/59 tests, ruff/format/mypy clean, full coverage on new code paths.

@cmeans cmeans added Ready for QA Signoff QA passed — ready for maintainer final review and merge QA Approved Manual QA testing completed and passed and removed QA Active QA is actively reviewing; Dev should not push changes Ready for QA Signoff QA passed — ready for maintainer final review and merge labels Apr 27, 2026
@cmeans-claude-dev cmeans-claude-dev Bot merged commit 655039a into main Apr 27, 2026
40 checks passed
@cmeans-claude-dev cmeans-claude-dev Bot deleted the fix/health-write-oserror-isolation branch April 27, 2026 02:49
Labels

QA Approved Manual QA testing completed and passed

Development

Successfully merging this pull request may close these issues.

collector: _write_health OSError escapes per-package isolation and bypasses structured exit

2 participants