Skip to content

ci: add deploy-smoke job for Dockerfile + Compose + Caddyfile examples (#7)#29

Merged
cmeans-claude-dev[bot] merged 1 commit into
mainfrom
ci/docker-smoke
Apr 27, 2026
Merged

ci: add deploy-smoke job for Dockerfile + Compose + Caddyfile examples (#7)#29
cmeans-claude-dev[bot] merged 1 commit into
mainfrom
ci/docker-smoke

Conversation

@cmeans-claude-dev

@cmeans-claude-dev cmeans-claude-dev Bot commented Apr 26, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds a deploy-smoke job to .github/workflows/ci.yml that exercises the four deploy/ example artifacts CI can't reach today. The headline gain: catches the bug class that took down PR #6 round 1 (Dockerfile runtime stage shipped a venv installed in editable mode pointing at a non-existent /src, so import pypi_winnow_downloads failed at startup; lint/mypy/pytest all green; only QA caught it manually).

Step What it validates
docker build -f deploy/docker/Dockerfile -t pypi-winnow-downloads:smoke . Multi-stage build still produces a runnable image
docker run --rm --entrypoint /opt/venv/bin/winnow-collect pypi-winnow-downloads:smoke --help Venv links resolve, import pypi_winnow_downloads works inside the runtime stage, console-script entry point is present and invokable. Exit 0 = green
BADGE_HOST=badges.example.com docker compose -f deploy/docker/compose.yml.example config Example compose file parses; ${BADGE_HOST:?…} substitution works
docker run --rm -v deploy/caddy:/etc/caddy:ro caddy:2 caddy validate --config /etc/caddy/Caddyfile.example --adapter caddyfile Caddyfile parses against the exact image users will run

Out of scope

  • systemd-analyze verify deploy/systemd/pypi-winnow-downloads-collector.timer — listed as optional in the issue. The timer's Requires= chains into the .service unit, which declares ExecStart=/usr/local/bin/winnow-collect, a binary that doesn't exist on a fresh CI runner. The verify would warn (or fail) on a condition that's intended (operators install the binary per deploy/README.md). Cost of CI noise exceeds the value for trivial syntax.

Why override the entrypoint in step 2

The Dockerfile's ENTRYPOINT hardcodes --config /etc/pypi-winnow-downloads/config.yaml. The smoke container doesn't ship a config file (and shouldn't — it's a smoke image), so without the override winnow-collect would error on missing config before --help could short-circuit. Overriding to the binary directly lets us pass --help cleanly and assert exit 0.

Soak window before promoting to required status check

This PR ships deploy-smoke non-required. The maintainer-side ruleset promotion to required status check is intentionally deferred per QA round 2 on this PR — the protective intent only fires once deploy-smoke blocks merge, but rolling it straight to required without observing live behavior risks Docker Hub rate-limit flakes (the job pulls python:3.14-slim and caddy:2 on every run; weekly Dependabot Mondays compound the pull rate).

Tracking issue: #31 captures the soak rationale + a concrete promotion trigger (5 consecutive green runs on main OR 2026-05-10, whichever first) + the one-line ruleset-update mechanism. The promotion is an operator action on the main-protection ruleset, not a code change — so it lives in #31 rather than this PR's diff. Without #31 this would be a punt; with it, the soak is a deferral with a stated trigger.

Test plan

Locally verified all four steps before commit:

  • docker build -f deploy/docker/Dockerfile -t pypi-winnow-downloads:smoke . — succeeds
  • docker run --rm --entrypoint /opt/venv/bin/winnow-collect pypi-winnow-downloads:smoke --help — exits 0 with expected usage output
  • BADGE_HOST=badges.example.com docker compose -f deploy/docker/compose.yml.example config — emits merged service tree (collector + caddy + volumes)
  • docker run --rm -v $PWD/deploy/caddy:/etc/caddy:ro caddy:2 caddy validate --config /etc/caddy/Caddyfile.example --adapter caddyfile — emits "Valid configuration"
  • CI run on this branch shows the new deploy-smoke job appearing alongside lint / typecheck / test (3.11/3.12/3.13) and passing
  • CHANGELOG entry under ## [Unreleased]### Added references this PR's closes Add Docker image smoke test to CI #7
  • Tracking issue Promote deploy-smoke to required status check (soak window post-#29) #31 filed for soak-then-promote with explicit trigger and acceptance criteria

Closes #7

@github-actions github-actions Bot added the Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA label Apr 26, 2026
@cmeans-claude-dev cmeans-claude-dev Bot added the Ready for QA Dev work complete — QA can begin review label Apr 26, 2026
@github-actions github-actions Bot removed the Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA label Apr 26, 2026
@codecov-commenter

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@cmeans cmeans added QA Active QA is actively reviewing; Dev should not push changes and removed Ready for QA Dev work complete — QA can begin review labels Apr 26, 2026

@cmeans cmeans left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA round 1 — clean (no follow-ups)

PR closes #7 in full and matches the issue's proposal exactly: 4 smoke steps (Docker build + entrypoint --help + Compose config + Caddyfile validate), with systemd-analyze correctly excluded per the issue's own caveat about the unit's referenced binary not existing on a fresh runner.

CI run on PR head df2ce49deploy-smoke job all-success:

Step CI step output
docker build -f deploy/docker/Dockerfile -t pypi-winnow-downloads:smoke . multi-stage build through python:3.14-slim (PR #24's bump now exercised) — DONE in 7s, image tagged
docker run --rm --entrypoint /opt/venv/bin/winnow-collect pypi-winnow-downloads:smoke --help usage: winnow-collect [-h] --config CONFIG [-v] printed; exit 0 — confirms venv links resolve and import pypi_winnow_downloads works inside the runtime stage (the exact bug class from PR #6 round 1)
BADGE_HOST=badges.example.com docker compose -f deploy/docker/compose.yml.example config merged service tree emitted (collector, caddy, port bindings, volumes) — ${BADGE_HOST:?…} substitution resolves correctly
docker run … caddy:2 caddy validate --config /etc/caddy/Caddyfile.example --adapter caddyfile Valid configuration — Caddyfile parses against the production image

Static + scope checks:

Check Result
Issue #7 scope matches — all required steps + 2 of 3 optional extras; systemd-analyze skip is documented in the PR with reasoning that mirrors the issue's own optional-extras note
Diff scope exactly .github/workflows/ci.yml (+41) + CHANGELOG.md (+1) — matches PR claim
CHANGELOG entry placement top of existing ### Added in ## [Unreleased]; references Closes #7 — clean Keep-a-Changelog ordering, no new orphans
Entrypoint override rationale sound — Dockerfile's ENTRYPOINT hardcodes --config /etc/pypi-winnow-downloads/config.yaml which the smoke image doesn't ship; overriding to /opt/venv/bin/winnow-collect lets --help short-circuit before config loading
Local uv run pytest -q + ruff check + ruff format --check + mypy src 56 passed, 0 deselected; clean across the rest
Other CI on PR head all SUCCESS (test 3.11/3.12/3.13, lint, typecheck, on-push, qa-approved)

Test-plan checkboxes 5 + 6 ticked (verifiable now). Items 1–4 already ticked by Dev with local pre-commit verification, which I cross-checked against the CI run output and reproduced the same outputs.

The follow-up note in the PR (adding deploy-smoke to the main-protection ruleset's required-status-checks list) is a maintainer ruleset choice, not a code change — correctly out of scope for this PR.

No findings. Transitioning label to Ready for QA Signoff.

@cmeans

cmeans commented Apr 27, 2026

Copy link
Copy Markdown
Owner

Applying Ready for QA Signoff — see review above. Closes #7 in full; deploy-smoke job all-green on PR head, all 4 steps producing expected output (Valid configuration, exit-0 --help, merged compose tree). No follow-ups.

@cmeans cmeans added Ready for QA Signoff QA passed — ready for maintainer final review and merge and removed QA Active QA is actively reviewing; Dev should not push changes labels Apr 27, 2026
… examples (closes #7)

Adds a `deploy-smoke` job to `.github/workflows/ci.yml` that runs
on every push to main and every PR. Four steps:

1. `docker build -f deploy/docker/Dockerfile -t pypi-winnow-downloads:smoke .`
2. `docker run --rm --entrypoint /opt/venv/bin/winnow-collect
   pypi-winnow-downloads:smoke --help` — exit 0 confirms the venv
   links resolve and `import pypi_winnow_downloads` works inside
   the runtime stage. This is the bug class that PR #6 round 1
   shipped (editable install pointing at a non-existent /src),
   which all the Python-only checks missed and only QA caught
   manually. Now CI catches it.
3. `BADGE_HOST=badges.example.com docker compose -f
   deploy/docker/compose.yml.example config` — validates the
   example compose syntax + the BADGE_HOST substitution.
4. `docker run --rm -v $WORKSPACE/deploy/caddy:/etc/caddy:ro
   caddy:2 caddy validate --config /etc/caddy/Caddyfile.example
   --adapter caddyfile` — validates the Caddyfile against the
   exact image users will run.

Skips `systemd-analyze verify` on the timer per the issue's
caveat — the referenced .service unit declares
`ExecStart=/usr/local/bin/winnow-collect`, which doesn't exist
on a fresh CI runner, so the verify would warn-or-fail on a
condition that's intended (it's a deploy example, the binary
gets installed by the operator following deploy/README.md). The
.service syntax is trivial enough that the cost of CI noise
exceeds the value of the check.

Override of the Dockerfile's ENTRYPOINT in step 2 is necessary
because the default entrypoint hardcodes `--config
/etc/pypi-winnow-downloads/config.yaml`, which the smoke
container doesn't ship — would error before --help can short-
circuit. Overriding to the binary directly lets us pass --help
cleanly.

Verified locally before commit: build succeeds, --help exits 0
with the expected usage output, compose config emits the merged
service tree, caddy validate emits "Valid configuration".
@github-actions github-actions Bot added Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA Ready for QA Dev work complete — QA can begin review and removed Ready for QA Signoff QA passed — ready for maintainer final review and merge Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA labels Apr 27, 2026

@cmeans cmeans left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cmeans cmeans added QA Active QA is actively reviewing; Dev should not push changes and removed Ready for QA Dev work complete — QA can begin review labels Apr 27, 2026

@cmeans cmeans left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA round 2 — QA Failed

Reversing my round-1 sign-off after a closer look. Local + CI checks are clean on head 9805544; the failure is at the level of "does this PR actually deliver its stated protection."

Substantive finding (blocker): deploy-smoke ships non-required, which defeats the PR's stated purpose

The PR's motivation, stated in both the description and the inline workflow comment, is to catch the bug class from PR #6 round 1 — a Dockerfile breakage that lint / mypy / pytest all greenlit. The defense only works if a deploy-smoke failure blocks merge. Today it doesn't.

Verified directly against the live ruleset:

$ gh api repos/cmeans/pypi-winnow-downloads/rulesets/15546042 \
    | jq -r '.rules[] | select(.type == "required_status_checks") | .parameters.required_status_checks[].context'
qa-approved
lint
typecheck
test (3.11)
test (3.12)
test (3.13)

deploy-smoke is not on that list. With this PR landed as written, a future PR that breaks deploy/docker/Dockerfile (the exact PR #6 scenario) would see deploy-smoke fail, all required checks stay green, and the PR remains mergeable. Same pre-PR posture, same bug class still ships.

The PR body acknowledges this and punts:

If the maintainer wants deploy-smoke to be a required check, add it to the main-protection ruleset's required status checks list. Out of scope for this PR.

That's the homework-miss the new lens flags. The protective intent is the entire scope of the PR. "Out of scope" without a stated rationale (soak window, expected flakiness, etc.) and without a concrete promotion trigger is punting the half of the work that actually changes outcomes.

Two acceptable resolutions

Pick one:

A. Promote in this PR. Update the ruleset to add deploy-smoke to required status checks atomically with the workflow addition. One API call:

gh api -X PUT repos/cmeans/pypi-winnow-downloads/rulesets/15546042 \
   -f 'rules[].parameters.required_status_checks[].context=deploy-smoke' …

(Or via the UI — Settings → Rules → main-protection → required status checks → add deploy-smoke.) Both valid; the second isn't a "code change" but the reason it's not doesn't make the work less in-scope, just less reviewable in the diff.

B. Soak-then-promote with an explicit trigger. If the concern is flake risk during initial rollout (Docker Hub rate limits, registry availability, etc.), state that explicitly in the PR body and file a tracking issue with a concrete promotion trigger — e.g., "promote deploy-smoke to required after 5 consecutive green runs on main" or "by 2026-05-10". This is the pattern used for vdsm.yml on cmeans/mcp-synology (intention 0737e130-02f7-4b3c-bc98-eed195bac6d7). With that ticket filed and referenced from the PR, the soak rationale stands and signoff is appropriate.

What's not acceptable as-is: a "Follow-up (post-merge, optional)" line with no rationale and no trigger — that's the work being punted, not deferred.

What is correct on this PR

Aspect State
Issue #7 step coverage matches the issue's proposal exactly (4 of 4 listed steps, plus the documented systemd-analyze skip with reasoning that mirrors the issue's own caveat)
Local + CI verification on 9805544 56/56 pytest, ruff/format/mypy clean; deploy-smoke job all-success in CI
Diff scope .github/workflows/ci.yml (+41) + CHANGELOG.md (+1) — matches PR claim
CHANGELOG entry top of ### Added in ## [Unreleased], references Closes #7 — correct Keep-a-Changelog placement
Workflow internals sound — entrypoint override rationale matches the Dockerfile's ENTRYPOINT, BADGE_HOST substitution proven in compose config output, Caddyfile validate emits Valid configuration against the production caddy:2 image

This is a one-finding QA Failed, not a teardown. The job itself is well-built; what's missing is making it load-bearing.

Transitioning label to QA Failed.

@cmeans

cmeans commented Apr 27, 2026

Copy link
Copy Markdown
Owner

Applying QA Failed — see review above. One substantive finding: deploy-smoke ships non-required, so the bug class from PR #6 round 1 (the entire reason this job exists) can still merge with a green-looking PR. Reversing my round-1 signoff after applying a tighter homework-miss lens. Two acceptable resolutions in the review.

@cmeans cmeans added QA Failed QA found issues — needs dev attention and removed QA Active QA is actively reviewing; Dev should not push changes labels Apr 27, 2026
@cmeans-claude-dev cmeans-claude-dev Bot added Ready for QA Dev work complete — QA can begin review and removed QA Failed QA found issues — needs dev attention labels Apr 27, 2026
@cmeans-claude-dev

Copy link
Copy Markdown
Contributor Author

QA round 2 finding addressed via Option B (soak-then-promote), per your second resolution path. No code change — body-only edit:

The job itself is unchanged (head still 9805544); only the rationale around when it becomes load-bearing changed. Re-tagged Ready for QA.

@cmeans cmeans added QA Active QA is actively reviewing; Dev should not push changes and removed Ready for QA Dev work complete — QA can begin review labels Apr 27, 2026

@cmeans cmeans left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA round 3 — clean, signing off

Round 2 fail addressed via resolution B (soak-then-promote with tracking issue). Code unchanged from round 2 (9805544); only the PR body picked up the soak section and a new test-plan checkbox referencing #31.

Round-2 finding resolution check:

Round-2 ask Resolution on this round
State soak rationale explicitly PR body's "Soak window before promoting…" section names Docker Hub's 100/6h anonymous-pull limit, identifies the two registry pulls per run (python:3.14-slim for build + runtime stages, caddy:2 for validate), and notes Dependabot Mondays compound the rate. Defensible — not vague handwaving.
Concrete promotion trigger Issue #31: 5 consecutive green deploy-smoke runs on main OR 2026-05-10, whichever first. Time-bounded, with a "fix root cause before promoting, don't extend indefinitely" guard.
Tracking issue with acceptance criteria Issue #31 exists, lists three acceptance checkboxes including a verification gate ("a subsequent test PR that intentionally breaks deploy/docker/Dockerfile is BLOCKED at merge time on the failed deploy-smoke check") — that's the load-bearing test that the promotion actually worked.
Promotion mechanism documented Issue #31 names ruleset ID 15546042 (matches what I verified via gh api in round 2) and gives both gh api and UI paths, honest about the gh api PUT form needing full ruleset JSON.

No code changes from round 2:

  • Head SHA 980554476… identical to what I verified locally in round 2 (56/56 pytest, ruff/format/mypy clean) and what deploy-smoke itself ran green against in CI
  • CI on this head all SUCCESS (test 3.11/3.12/3.13, lint, typecheck, deploy-smoke, on-push, qa-approved)

The "punted, optional, post-merge" framing from round 2 is now "deferred with stated trigger" — which is the distinction the new lens calls for. Resolution B properly executed.

Note for tracking: issue #31's acceptance gate ("test PR that breaks Dockerfile is BLOCKED") needs to be exercised when promotion happens. Not a blocker for this PR, but worth keeping in view — without that verification, the promotion is itself unverified.

No findings. Transitioning label to Ready for QA Signoff.

@cmeans

cmeans commented Apr 27, 2026

Copy link
Copy Markdown
Owner

Applying Ready for QA Signoff — see review above. Round 2 finding resolved via resolution B: explicit soak rationale (Docker Hub rate-limit specifics) + concrete promotion trigger (5 green runs on main OR 2026-05-10) + tracking issue #31 with load-bearing acceptance criteria. Code unchanged from round 2; verification still holds.

@cmeans cmeans added Ready for QA Signoff QA passed — ready for maintainer final review and merge QA Approved Manual QA testing completed and passed and removed QA Active QA is actively reviewing; Dev should not push changes Ready for QA Signoff QA passed — ready for maintainer final review and merge labels Apr 27, 2026
@cmeans-claude-dev cmeans-claude-dev Bot merged commit 4e4148f into main Apr 27, 2026
60 checks passed
@cmeans-claude-dev cmeans-claude-dev Bot deleted the ci/docker-smoke branch April 27, 2026 01:36
cmeans-claude-dev Bot added a commit that referenced this pull request Apr 27, 2026
Three mechanical edits:

- pyproject.toml: version "0.1.0" -> "0.1.1"
- CHANGELOG.md: insert `## [0.1.1] - 2026-04-26` directly under
  the (still empty) `## [Unreleased]` header so all 12 PRs'
  worth of bullets that have been accumulating since v0.1.0
  ship are now categorized under the 0.1.1 release. Updated
  the link refs at the bottom: [Unreleased] now compares from
  v0.1.1, and a new [0.1.1] entry compares v0.1.0...v0.1.1.
- uv.lock: refreshed by `uv lock` so the locked
  pypi-winnow-downloads version (0.1.1) matches pyproject.toml.

What ships in v0.1.1 (highlights — full changelog under
## [0.1.1]):

Library fixes (operator-visible):
- collector: _write_health OSError no longer escapes
  per-package isolation. Disk-full / perm errors now produce
  structured `winnow-collect: ...; health file write failed:
  [Errno 28] No space left on device` exit instead of a raw
  traceback. Closes #32.
- collector: stale_threshold_days is now actually consulted —
  the "warn if previous run is older than N days" feature
  documented in config.example.yaml since v0.1.0 finally
  fires. Log-only per the documented v1 contract; degrades
  silently on first-run / unreadable / malformed / future-
  timestamped previous _health.json. Closes #33.

Documentation:
- README acknowledgments / license / BigQuery dataset link
  refresh (PR #15)
- README shields.io URL canonicalization (PR #27, closes #16)
- deploy/README.md Tailscale Funnel as alternative HTTPS
  exposure (PR #22)
- deploy/README.md "Pick an approach" table updated to
  reflect the new Caddy logging shape (in PR #30)

CI / project infrastructure (no PyPI consumer impact, but
hardens future releases):
- Community health files: CONTRIBUTING / CoC / SECURITY /
  issue templates (PR #20)
- .github/dependabot.yml across pip + github-actions + docker
  ecosystems (PR #21)
- Dependabot PR hygiene cascade from cmeans/mcp-synology:
  PULL_REQUEST_TEMPLATE.md + auto-CHANGELOG workflow (App-
  token authenticated so required CI re-fires on the bot's
  HEAD SHA) + dependabot.yml prefix fix (PR #25). Validated
  end-to-end via the first two real Dependabot bumps PR #23
  (codecov-action 5->6) and PR #24 (python 3.13-slim ->
  3.14-slim).
- deploy-smoke CI job that builds the Dockerfile, smokes the
  entrypoint, validates compose+Caddyfile against caddy:2
  (PR #29, closes #7). Promoted to required status check on
  the main-protection ruleset 2026-04-26 22:43 (issue #31
  closed via operator action).
- deploy/caddy/Caddyfile.example gains global error logger +
  per-site access logger with built-in lumberjack rotation,
  documents the validate-as-root gotcha (PR #30). Live CT 112
  deployment fixed in the same change.
- 100% coverage on src/ via real tests (no `# pragma: no
  cover`), with `fail_under = 100` gate in pyproject.toml so
  future regressions trip CI (PR #38, closes #37).

Verified locally: 71/71 pytest pass, ruff/format/mypy clean,
coverage gate green at 100.00%.

After this merges:
1. Tag the squash-merge commit as v0.1.1 and push the tag —
   publish.yml fires and uploads to PyPI via the existing
   trusted-publisher OIDC flow.
2. Update the live CT 112 deployment to install
   pypi-winnow-downloads==0.1.1 from PyPI (currently runs a
   wheel built from main, but pinning to the released
   version keeps deploy reproducible).
3. Close any post-release follow-ups Chris wants tracked.

Co-authored-by: cmeans-claude-dev[bot] <272174644+cmeans-claude-dev[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

QA Approved Manual QA testing completed and passed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Docker image smoke test to CI

2 participants