ci: add deploy-smoke job for Dockerfile + Compose + Caddyfile examples (#7)#29
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
cmeans
left a comment
There was a problem hiding this comment.
QA round 1 — clean (no follow-ups)
PR closes #7 in full and matches the issue's proposal exactly: 4 smoke steps (Docker build + entrypoint --help + Compose config + Caddyfile validate), with systemd-analyze correctly excluded per the issue's own caveat about the unit's referenced binary not existing on a fresh runner.
CI run on PR head df2ce49 — deploy-smoke job all-success:
| Step | CI step output |
|---|---|
docker build -f deploy/docker/Dockerfile -t pypi-winnow-downloads:smoke . |
multi-stage build through python:3.14-slim (PR #24's bump now exercised) — DONE in 7s, image tagged |
docker run --rm --entrypoint /opt/venv/bin/winnow-collect pypi-winnow-downloads:smoke --help |
usage: winnow-collect [-h] --config CONFIG [-v] printed; exit 0 — confirms venv links resolve and import pypi_winnow_downloads works inside the runtime stage (the exact bug class from PR #6 round 1) |
BADGE_HOST=badges.example.com docker compose -f deploy/docker/compose.yml.example config |
merged service tree emitted (collector, caddy, port bindings, volumes) — ${BADGE_HOST:?…} substitution resolves correctly |
docker run … caddy:2 caddy validate --config /etc/caddy/Caddyfile.example --adapter caddyfile |
Valid configuration — Caddyfile parses against the production image |
Static + scope checks:
| Check | Result |
|---|---|
| Issue #7 scope | matches — all required steps + 2 of 3 optional extras; systemd-analyze skip is documented in the PR with reasoning that mirrors the issue's own optional-extras note |
| Diff scope | exactly .github/workflows/ci.yml (+41) + CHANGELOG.md (+1) — matches PR claim |
| CHANGELOG entry placement | top of existing ### Added in ## [Unreleased]; references Closes #7 — clean Keep-a-Changelog ordering, no new orphans |
| Entrypoint override rationale | sound — Dockerfile's ENTRYPOINT hardcodes --config /etc/pypi-winnow-downloads/config.yaml which the smoke image doesn't ship; overriding to /opt/venv/bin/winnow-collect lets --help short-circuit before config loading |
Local uv run pytest -q + ruff check + ruff format --check + mypy src |
56 passed, 0 deselected; clean across the rest |
| Other CI on PR head | all SUCCESS (test 3.11/3.12/3.13, lint, typecheck, on-push, qa-approved) |
Test-plan checkboxes 5 + 6 ticked (verifiable now). Items 1–4 already ticked by Dev with local pre-commit verification, which I cross-checked against the CI run output and reproduced the same outputs.
The follow-up note in the PR (adding deploy-smoke to the main-protection ruleset's required-status-checks list) is a maintainer ruleset choice, not a code change — correctly out of scope for this PR.
No findings. Transitioning label to Ready for QA Signoff.
|
Applying |
… examples (closes #7) Adds a `deploy-smoke` job to `.github/workflows/ci.yml` that runs on every push to main and every PR. Four steps: 1. `docker build -f deploy/docker/Dockerfile -t pypi-winnow-downloads:smoke .` 2. `docker run --rm --entrypoint /opt/venv/bin/winnow-collect pypi-winnow-downloads:smoke --help` — exit 0 confirms the venv links resolve and `import pypi_winnow_downloads` works inside the runtime stage. This is the bug class that PR #6 round 1 shipped (editable install pointing at a non-existent /src), which all the Python-only checks missed and only QA caught manually. Now CI catches it. 3. `BADGE_HOST=badges.example.com docker compose -f deploy/docker/compose.yml.example config` — validates the example compose syntax + the BADGE_HOST substitution. 4. `docker run --rm -v $WORKSPACE/deploy/caddy:/etc/caddy:ro caddy:2 caddy validate --config /etc/caddy/Caddyfile.example --adapter caddyfile` — validates the Caddyfile against the exact image users will run. Skips `systemd-analyze verify` on the timer per the issue's caveat — the referenced .service unit declares `ExecStart=/usr/local/bin/winnow-collect`, which doesn't exist on a fresh CI runner, so the verify would warn-or-fail on a condition that's intended (it's a deploy example, the binary gets installed by the operator following deploy/README.md). The .service syntax is trivial enough that the cost of CI noise exceeds the value of the check. Override of the Dockerfile's ENTRYPOINT in step 2 is necessary because the default entrypoint hardcodes `--config /etc/pypi-winnow-downloads/config.yaml`, which the smoke container doesn't ship — would error before --help can short- circuit. Overriding to the binary directly lets us pass --help cleanly. Verified locally before commit: build succeeds, --help exits 0 with the expected usage output, compose config emits the merged service tree, caddy validate emits "Valid configuration".
df2ce49 to
9805544
Compare
cmeans
left a comment
There was a problem hiding this comment.
QA round 2 — QA Failed
Reversing my round-1 sign-off after a closer look. Local + CI checks are clean on head 9805544; the failure is at the level of "does this PR actually deliver its stated protection."
Substantive finding (blocker): deploy-smoke ships non-required, which defeats the PR's stated purpose
The PR's motivation, stated in both the description and the inline workflow comment, is to catch the bug class from PR #6 round 1 — a Dockerfile breakage that lint / mypy / pytest all greenlit. The defense only works if a deploy-smoke failure blocks merge. Today it doesn't.
Verified directly against the live ruleset:
$ gh api repos/cmeans/pypi-winnow-downloads/rulesets/15546042 \
| jq -r '.rules[] | select(.type == "required_status_checks") | .parameters.required_status_checks[].context'
qa-approved
lint
typecheck
test (3.11)
test (3.12)
test (3.13)
deploy-smoke is not on that list. With this PR landed as written, a future PR that breaks deploy/docker/Dockerfile (the exact PR #6 scenario) would see deploy-smoke fail, all required checks stay green, and the PR remains mergeable. Same pre-PR posture, same bug class still ships.
The PR body acknowledges this and punts:
If the maintainer wants
deploy-smoketo be a required check, add it to themain-protectionruleset's required status checks list. Out of scope for this PR.
That's the homework-miss the new lens flags. The protective intent is the entire scope of the PR. "Out of scope" without a stated rationale (soak window, expected flakiness, etc.) and without a concrete promotion trigger is punting the half of the work that actually changes outcomes.
Two acceptable resolutions
Pick one:
A. Promote in this PR. Update the ruleset to add deploy-smoke to required status checks atomically with the workflow addition. One API call:
gh api -X PUT repos/cmeans/pypi-winnow-downloads/rulesets/15546042 \
-f 'rules[].parameters.required_status_checks[].context=deploy-smoke' …
(Or via the UI — Settings → Rules → main-protection → required status checks → add deploy-smoke.) Both valid; the second isn't a "code change" but the reason it's not doesn't make the work less in-scope, just less reviewable in the diff.
B. Soak-then-promote with an explicit trigger. If the concern is flake risk during initial rollout (Docker Hub rate limits, registry availability, etc.), state that explicitly in the PR body and file a tracking issue with a concrete promotion trigger — e.g., "promote deploy-smoke to required after 5 consecutive green runs on main" or "by 2026-05-10". This is the pattern used for vdsm.yml on cmeans/mcp-synology (intention 0737e130-02f7-4b3c-bc98-eed195bac6d7). With that ticket filed and referenced from the PR, the soak rationale stands and signoff is appropriate.
What's not acceptable as-is: a "Follow-up (post-merge, optional)" line with no rationale and no trigger — that's the work being punted, not deferred.
What is correct on this PR
| Aspect | State |
|---|---|
| Issue #7 step coverage | matches the issue's proposal exactly (4 of 4 listed steps, plus the documented systemd-analyze skip with reasoning that mirrors the issue's own caveat) |
Local + CI verification on 9805544 |
56/56 pytest, ruff/format/mypy clean; deploy-smoke job all-success in CI |
| Diff scope | .github/workflows/ci.yml (+41) + CHANGELOG.md (+1) — matches PR claim |
| CHANGELOG entry | top of ### Added in ## [Unreleased], references Closes #7 — correct Keep-a-Changelog placement |
| Workflow internals | sound — entrypoint override rationale matches the Dockerfile's ENTRYPOINT, BADGE_HOST substitution proven in compose config output, Caddyfile validate emits Valid configuration against the production caddy:2 image |
This is a one-finding QA Failed, not a teardown. The job itself is well-built; what's missing is making it load-bearing.
Transitioning label to QA Failed.
|
Applying |
|
QA round 2 finding addressed via Option B (soak-then-promote), per your second resolution path. No code change — body-only edit:
The job itself is unchanged (head still |
cmeans
left a comment
There was a problem hiding this comment.
QA round 3 — clean, signing off
Round 2 fail addressed via resolution B (soak-then-promote with tracking issue). Code unchanged from round 2 (9805544); only the PR body picked up the soak section and a new test-plan checkbox referencing #31.
Round-2 finding resolution check:
| Round-2 ask | Resolution on this round |
|---|---|
| State soak rationale explicitly | PR body's "Soak window before promoting…" section names Docker Hub's 100/6h anonymous-pull limit, identifies the two registry pulls per run (python:3.14-slim for build + runtime stages, caddy:2 for validate), and notes Dependabot Mondays compound the rate. Defensible — not vague handwaving. |
| Concrete promotion trigger | Issue #31: 5 consecutive green deploy-smoke runs on main OR 2026-05-10, whichever first. Time-bounded, with a "fix root cause before promoting, don't extend indefinitely" guard. |
| Tracking issue with acceptance criteria | Issue #31 exists, lists three acceptance checkboxes including a verification gate ("a subsequent test PR that intentionally breaks deploy/docker/Dockerfile is BLOCKED at merge time on the failed deploy-smoke check") — that's the load-bearing test that the promotion actually worked. |
| Promotion mechanism documented | Issue #31 names ruleset ID 15546042 (matches what I verified via gh api in round 2) and gives both gh api and UI paths, honest about the gh api PUT form needing full ruleset JSON. |
No code changes from round 2:
- Head SHA
980554476…identical to what I verified locally in round 2 (56/56 pytest, ruff/format/mypy clean) and whatdeploy-smokeitself ran green against in CI - CI on this head all SUCCESS (test 3.11/3.12/3.13, lint, typecheck, deploy-smoke, on-push, qa-approved)
The "punted, optional, post-merge" framing from round 2 is now "deferred with stated trigger" — which is the distinction the new lens calls for. Resolution B properly executed.
Note for tracking: issue #31's acceptance gate ("test PR that breaks Dockerfile is BLOCKED") needs to be exercised when promotion happens. Not a blocker for this PR, but worth keeping in view — without that verification, the promotion is itself unverified.
No findings. Transitioning label to Ready for QA Signoff.
|
Applying |
Three mechanical edits: - pyproject.toml: version "0.1.0" -> "0.1.1" - CHANGELOG.md: insert `## [0.1.1] - 2026-04-26` directly under the (still empty) `## [Unreleased]` header so all 12 PRs' worth of bullets that have been accumulating since v0.1.0 ship are now categorized under the 0.1.1 release. Updated the link refs at the bottom: [Unreleased] now compares from v0.1.1, and a new [0.1.1] entry compares v0.1.0...v0.1.1. - uv.lock: refreshed by `uv lock` so the locked pypi-winnow-downloads version (0.1.1) matches pyproject.toml. What ships in v0.1.1 (highlights — full changelog under ## [0.1.1]): Library fixes (operator-visible): - collector: _write_health OSError no longer escapes per-package isolation. Disk-full / perm errors now produce structured `winnow-collect: ...; health file write failed: [Errno 28] No space left on device` exit instead of a raw traceback. Closes #32. - collector: stale_threshold_days is now actually consulted — the "warn if previous run is older than N days" feature documented in config.example.yaml since v0.1.0 finally fires. Log-only per the documented v1 contract; degrades silently on first-run / unreadable / malformed / future- timestamped previous _health.json. Closes #33. Documentation: - README acknowledgments / license / BigQuery dataset link refresh (PR #15) - README shields.io URL canonicalization (PR #27, closes #16) - deploy/README.md Tailscale Funnel as alternative HTTPS exposure (PR #22) - deploy/README.md "Pick an approach" table updated to reflect the new Caddy logging shape (in PR #30) CI / project infrastructure (no PyPI consumer impact, but hardens future releases): - Community health files: CONTRIBUTING / CoC / SECURITY / issue templates (PR #20) - .github/dependabot.yml across pip + github-actions + docker ecosystems (PR #21) - Dependabot PR hygiene cascade from cmeans/mcp-synology: PULL_REQUEST_TEMPLATE.md + auto-CHANGELOG workflow (App- token authenticated so required CI re-fires on the bot's HEAD SHA) + dependabot.yml prefix fix (PR #25). Validated end-to-end via the first two real Dependabot bumps PR #23 (codecov-action 5->6) and PR #24 (python 3.13-slim -> 3.14-slim). - deploy-smoke CI job that builds the Dockerfile, smokes the entrypoint, validates compose+Caddyfile against caddy:2 (PR #29, closes #7). Promoted to required status check on the main-protection ruleset 2026-04-26 22:43 (issue #31 closed via operator action). - deploy/caddy/Caddyfile.example gains global error logger + per-site access logger with built-in lumberjack rotation, documents the validate-as-root gotcha (PR #30). Live CT 112 deployment fixed in the same change. - 100% coverage on src/ via real tests (no `# pragma: no cover`), with `fail_under = 100` gate in pyproject.toml so future regressions trip CI (PR #38, closes #37). Verified locally: 71/71 pytest pass, ruff/format/mypy clean, coverage gate green at 100.00%. After this merges: 1. Tag the squash-merge commit as v0.1.1 and push the tag — publish.yml fires and uploads to PyPI via the existing trusted-publisher OIDC flow. 2. Update the live CT 112 deployment to install pypi-winnow-downloads==0.1.1 from PyPI (currently runs a wheel built from main, but pinning to the released version keeps deploy reproducible). 3. Close any post-release follow-ups Chris wants tracked. Co-authored-by: cmeans-claude-dev[bot] <272174644+cmeans-claude-dev[bot]@users.noreply.github.com>
Summary
Adds a
deploy-smokejob to.github/workflows/ci.ymlthat exercises the fourdeploy/example artifacts CI can't reach today. The headline gain: catches the bug class that took down PR #6 round 1 (Dockerfile runtime stage shipped a venv installed in editable mode pointing at a non-existent/src, soimport pypi_winnow_downloadsfailed at startup; lint/mypy/pytest all green; only QA caught it manually).docker build -f deploy/docker/Dockerfile -t pypi-winnow-downloads:smoke .docker run --rm --entrypoint /opt/venv/bin/winnow-collect pypi-winnow-downloads:smoke --helpimport pypi_winnow_downloadsworks inside the runtime stage, console-script entry point is present and invokable. Exit 0 = greenBADGE_HOST=badges.example.com docker compose -f deploy/docker/compose.yml.example config${BADGE_HOST:?…}substitution worksdocker run --rm -v deploy/caddy:/etc/caddy:ro caddy:2 caddy validate --config /etc/caddy/Caddyfile.example --adapter caddyfileOut of scope
systemd-analyze verify deploy/systemd/pypi-winnow-downloads-collector.timer— listed as optional in the issue. The timer'sRequires=chains into the .service unit, which declaresExecStart=/usr/local/bin/winnow-collect, a binary that doesn't exist on a fresh CI runner. The verify would warn (or fail) on a condition that's intended (operators install the binary perdeploy/README.md). Cost of CI noise exceeds the value for trivial syntax.Why override the entrypoint in step 2
The Dockerfile's
ENTRYPOINThardcodes--config /etc/pypi-winnow-downloads/config.yaml. The smoke container doesn't ship a config file (and shouldn't — it's a smoke image), so without the overridewinnow-collectwould error on missing config before--helpcould short-circuit. Overriding to the binary directly lets us pass--helpcleanly and assert exit 0.Soak window before promoting to required status check
This PR ships
deploy-smokenon-required. The maintainer-side ruleset promotion to required status check is intentionally deferred per QA round 2 on this PR — the protective intent only fires oncedeploy-smokeblocks merge, but rolling it straight to required without observing live behavior risks Docker Hub rate-limit flakes (the job pullspython:3.14-slimandcaddy:2on every run; weekly Dependabot Mondays compound the pull rate).Tracking issue: #31 captures the soak rationale + a concrete promotion trigger (5 consecutive green runs on
mainOR 2026-05-10, whichever first) + the one-line ruleset-update mechanism. The promotion is an operator action on themain-protectionruleset, not a code change — so it lives in #31 rather than this PR's diff. Without #31 this would be a punt; with it, the soak is a deferral with a stated trigger.Test plan
Locally verified all four steps before commit:
docker build -f deploy/docker/Dockerfile -t pypi-winnow-downloads:smoke .— succeedsdocker run --rm --entrypoint /opt/venv/bin/winnow-collect pypi-winnow-downloads:smoke --help— exits 0 with expected usage outputBADGE_HOST=badges.example.com docker compose -f deploy/docker/compose.yml.example config— emits merged service tree (collector + caddy + volumes)docker run --rm -v $PWD/deploy/caddy:/etc/caddy:ro caddy:2 caddy validate --config /etc/caddy/Caddyfile.example --adapter caddyfile— emits"Valid configuration"deploy-smokejob appearing alongsidelint/typecheck/test (3.11/3.12/3.13)and passing## [Unreleased]→### Addedreferences this PR's closes Add Docker image smoke test to CI #7Closes #7