From 2058aa6fdd971b5d9f4fbcf3b22525b04716d01d Mon Sep 17 00:00:00 2001 From: "cmeans-claude-dev[bot]" <272174644+cmeans-claude-dev[bot]@users.noreply.github.com> Date: Wed, 29 Apr 2026 18:30:49 -0500 Subject: [PATCH 1/7] feat(collector): add OS allowlist + badge specs (no behavior change) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds _SYSTEM_NAMES, _SYSTEM_ALLOWLIST, and _OS_BADGE_SPECS constants parallel to the per-installer constants. No behavior change yet — the constants are forward-declared for the v3 OS distribution feature (filenames, labels, allowlist keys). Subsequent commits wire up the multi-dim pypinfo query, per-system aggregation, badge emission, and _health.json shape. Spec: docs/superpowers/specs/2026-04-29-os-distribution-badge-design.md Co-Authored-By: Claude Opus 4.7 (1M context) --- .../2026-04-29-os-distribution-badges.md | 1012 +++++++++++++++++ ...2026-04-29-os-distribution-badge-design.md | 155 +++ src/pypi_winnow_downloads/collector.py | 21 + 3 files changed, 1188 insertions(+) create mode 100644 docs/superpowers/plans/2026-04-29-os-distribution-badges.md create mode 100644 docs/superpowers/specs/2026-04-29-os-distribution-badge-design.md diff --git a/docs/superpowers/plans/2026-04-29-os-distribution-badges.md b/docs/superpowers/plans/2026-04-29-os-distribution-badges.md new file mode 100644 index 0000000..e761f5b --- /dev/null +++ b/docs/superpowers/plans/2026-04-29-os-distribution-badges.md @@ -0,0 +1,1012 @@ +# OS Distribution Badges Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. 
+ +**Goal:** Add per-OS download breakdown badges (linux / macos / windows) parallel to the per-installer breakdown shipped in v0.2.0, by extending the pypinfo group-by from `ci installer` to `ci installer system` (a single multi-dimensional query). + +**Architecture:** Extend `run_pypinfo()` argv from `["ci", "installer"]` to `["ci", "installer", "system"]`. Restructure its return type from `dict[str, int]` to a TypedDict carrying `by_installer` and `by_system` aggregates. Add `_OS_BADGE_SPECS` and three new badge filenames following v2's pattern. Extend `PackageOutcome` and `_write_health()` to carry the per-system counts. README dogfood block grows a parallel "By OS" paragraph. + +**Tech Stack:** Python 3.11+, `pypinfo`, BigQuery (group-by), pytest, ruff, mypy. + +**Spec:** `docs/superpowers/specs/2026-04-29-os-distribution-badge-design.md` + +--- + +## File structure + +| Path | Action | Responsibility | +| --- | --- | --- | +| `src/pypi_winnow_downloads/collector.py` | Modify | Add system constants + badge specs; change `run_pypinfo()` argv and return shape; add per-system aggregation; extend badge emission loop; update `PackageOutcome`; extend `_write_health()`. | +| `tests/test_collector.py` | Modify | New tests for: argv addition, return-shape change, per-system aggregation, system allowlist filter, edge case (allowlisted installer + non-allowlisted system), v0.2.0 hero stability invariant, OS badge file emission, `_health.json` `counts_by_system` field, existing fields preserved. | +| `README.md` | Modify | Add "By OS" paragraph to each dogfood block; add "By OS breakdown" paragraph; add 3 rows to "Use this service for your own package" table. | +| `CHANGELOG.md` | Modify | One bullet under `## [Unreleased]` → `### Added`. | +| `docs/superpowers/specs/2026-04-29-os-distribution-badge-design.md` | Add (untracked; Task 1 stages it from the working tree) | Design spec. | +| `docs/superpowers/plans/2026-04-29-os-distribution-badges.md` | Add (untracked; Task 1 stages it from the working tree) | This plan. 
| + +--- + +## Task 1: Branch + add constants + +**Files:** +- Modify: `src/pypi_winnow_downloads/collector.py` — add system constants + OS badge specs. + +- [ ] **Step 1: Create the feature branch** + +```bash +git checkout main +git status -sb +git checkout -b feat/os-distribution-badges +``` + +Expected: clean working tree on main matches origin/main, then on `feat/os-distribution-badges`. The untracked spec + plan files follow the branch. + +- [ ] **Step 2: Add the new constants block** + +Open `src/pypi_winnow_downloads/collector.py`. Find the `_INSTALLER_BADGE_SPECS` block (~line 34) and the `_INSTALLER_NAMES` / `_INSTALLER_ALLOWLIST` block (~line 56–61). Add a parallel `_SYSTEM_*` and `_OS_BADGE_SPECS` block immediately after `_INSTALLER_ALLOWLIST`: + +```python +# System (OS) allowlist for the per-OS breakdown. The same `details.ci != True` +# filter applies as the hero. The keys are pypinfo's raw `system_name` values +# (matches BigQuery's `details.system.name` column emission); the badge +# filename slug and label use the user-friendly `macos` for `Darwin`. Long-tail +# values (BSD variants, null/empty system_name) are excluded — they neither +# contribute to per-OS aggregates nor surface as a badge. +_SYSTEM_NAMES: tuple[str, ...] = ("Linux", "Darwin", "Windows") +_SYSTEM_ALLOWLIST: frozenset[str] = frozenset(_SYSTEM_NAMES) + +# Per-OS badge specs: (filename_template, label_template, counts_key). +# Order matches the README dogfood layout. `counts_key` is the raw pypinfo +# emission (matches `_SYSTEM_ALLOWLIST`); the slug/label use the user-friendly +# form. Hero count is unaffected — see spec for the v0.2.0 hero-stability +# invariant. +_OS_BADGE_SPECS: tuple[tuple[str, str, str], ...] 
= ( + ("os-linux-{days}d-non-ci.json", "linux ({days}d)", "Linux"), + ("os-macos-{days}d-non-ci.json", "macos ({days}d)", "Darwin"), + ("os-windows-{days}d-non-ci.json", "windows ({days}d)", "Windows"), +) +``` + +- [ ] **Step 3: Verify the file still parses + lints clean** + +```bash +uv run python -c "from pypi_winnow_downloads import collector; print(collector._SYSTEM_NAMES, collector._OS_BADGE_SPECS)" +uv run ruff check src/ tests/ +uv run ruff format --check src/ tests/ +uv run mypy src/pypi_winnow_downloads/ +uv run pytest -x +``` + +Expected: imports succeed, ruff/format/mypy clean, all 79 existing tests still pass (no behavior change yet). + +- [ ] **Step 4: Stage spec + plan files (untracked from prior brainstorming)** + +```bash +git status -sb +git add src/pypi_winnow_downloads/collector.py docs/superpowers/specs/2026-04-29-os-distribution-badge-design.md docs/superpowers/plans/2026-04-29-os-distribution-badges.md +``` + +- [ ] **Step 5: Commit** + +```bash +git commit -m "$(cat <<'EOF' +feat(collector): add OS allowlist + badge specs (no behavior change) + +Adds _SYSTEM_NAMES, _SYSTEM_ALLOWLIST, and _OS_BADGE_SPECS constants +parallel to the per-installer constants. No behavior change yet — the +constants are forward-declared for the v3 OS distribution feature +(filenames, labels, allowlist keys). Subsequent commits wire up the +multi-dim pypinfo query, per-system aggregation, badge emission, and +_health.json shape. + +Spec: docs/superpowers/specs/2026-04-29-os-distribution-badge-design.md + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 2: TDD — `run_pypinfo()` multi-dim grouping + +**Files:** +- Modify: `tests/test_collector.py` — add new test cases. +- Modify: `src/pypi_winnow_downloads/collector.py` — change argv, return shape, aggregation. + +- [ ] **Step 1: Write the failing test for argv extension** + +Open `tests/test_collector.py`. Find `test_run_pypinfo_invokes_pypinfo_with_expected_argv` (~line 29). 
Add a new test immediately after it: + +```python +def test_run_pypinfo_argv_groups_by_ci_installer_system(tmp_path: Path) -> None: + """v3 OS distribution: pypinfo group-by extended from `ci installer` to + `ci installer system` so a single BigQuery call returns both dimensions.""" + captured: list[list[str]] = [] + + def fake_runner(argv: list[str], env: dict[str, str]) -> subprocess.CompletedProcess[str]: + captured.append(list(argv)) + return _ok_result(argv) + + creds = tmp_path / "creds.json" + creds.write_text("{}") + run_pypinfo("mypkg", 30, credential_file=creds, runner=fake_runner) + + assert captured, "fake_runner was never called" + argv = captured[0] + # The three positional dimension args must appear in this order at the end of argv. + assert argv[-3:] == ["ci", "installer", "system"], argv +``` + +- [ ] **Step 2: Write the failing tests for return shape + per-system aggregation** + +Append after the new test: + +```python +def _ok_rows(rows: list[dict]) -> str: + return json.dumps({"rows": rows}) + + +def test_run_pypinfo_returns_by_installer_and_by_system(tmp_path: Path) -> None: + """Return shape is a structured dict with two aggregates.""" + stdout = _ok_rows([ + {"ci": "False", "download_count": 100, "installer_name": "pip", "system_name": "Linux"}, + {"ci": "False", "download_count": 30, "installer_name": "pip", "system_name": "Darwin"}, + {"ci": "False", "download_count": 20, "installer_name": "uv", "system_name": "Linux"}, + {"ci": "False", "download_count": 5, "installer_name": "uv", "system_name": "Windows"}, + ]) + + def fake_runner(argv, env): + return subprocess.CompletedProcess(argv, 0, stdout=stdout, stderr="") + + creds = tmp_path / "creds.json" + creds.write_text("{}") + result = run_pypinfo("mypkg", 30, credential_file=creds, runner=fake_runner) + + assert result == { + "by_installer": {"pip": 130, "pipenv": 0, "pipx": 0, "uv": 25, "poetry": 0, "pdm": 0}, + "by_system": {"Linux": 120, "Darwin": 30, "Windows": 5}, + } + + +def 
test_run_pypinfo_filters_out_non_allowlisted_systems(tmp_path: Path) -> None: + """Long-tail OSes (BSD, null, etc.) drop out of by_system but still + contribute to by_installer when the installer is allowlisted — the + v0.2.0 hero-stability invariant.""" + stdout = _ok_rows([ + {"ci": "False", "download_count": 100, "installer_name": "pip", "system_name": "Linux"}, + {"ci": "False", "download_count": 7, "installer_name": "pip", "system_name": "FreeBSD"}, + {"ci": "False", "download_count": 11, "installer_name": "pip", "system_name": ""}, + {"ci": "False", "download_count": 13, "installer_name": "pip", "system_name": "OpenBSD"}, + ]) + + def fake_runner(argv, env): + return subprocess.CompletedProcess(argv, 0, stdout=stdout, stderr="") + + creds = tmp_path / "creds.json" + creds.write_text("{}") + result = run_pypinfo("mypkg", 30, credential_file=creds, runner=fake_runner) + + # Hero stability: by_installer["pip"] = 100 + 7 + 11 + 13 = 131 (all 4 rows count). + assert result["by_installer"]["pip"] == 131 + # by_system: only the Linux row counts; non-allowlisted/empty system_name rows drop out. + assert result["by_system"] == {"Linux": 100, "Darwin": 0, "Windows": 0} + + +def test_run_pypinfo_excludes_ci_true_from_both_dimensions(tmp_path: Path) -> None: + """CI traffic is filtered before either dimension's aggregation.""" + stdout = _ok_rows([ + {"ci": "True", "download_count": 9999, "installer_name": "pip", "system_name": "Linux"}, + {"ci": "None", "download_count": 50, "installer_name": "pip", "system_name": "Linux"}, + {"ci": "False", "download_count": 10, "installer_name": "pip", "system_name": "Linux"}, + ]) + + def fake_runner(argv, env): + return subprocess.CompletedProcess(argv, 0, stdout=stdout, stderr="") + + creds = tmp_path / "creds.json" + creds.write_text("{}") + result = run_pypinfo("mypkg", 30, credential_file=creds, runner=fake_runner) + + # CI=True row dropped; CI=None and CI=False rows count (matches v1 behavior). 
+ assert result["by_installer"]["pip"] == 60 + assert result["by_system"]["Linux"] == 60 + + +def test_run_pypinfo_handles_missing_system_name_field(tmp_path: Path) -> None: + """A row missing the system_name key entirely (older pypinfo schema or + user-agent parsing failure) must not crash; it just doesn't contribute + to by_system.""" + stdout = _ok_rows([ + {"ci": "False", "download_count": 42, "installer_name": "pip", "system_name": "Linux"}, + {"ci": "False", "download_count": 8, "installer_name": "pip"}, # no system_name + ]) + + def fake_runner(argv, env): + return subprocess.CompletedProcess(argv, 0, stdout=stdout, stderr="") + + creds = tmp_path / "creds.json" + creds.write_text("{}") + result = run_pypinfo("mypkg", 30, credential_file=creds, runner=fake_runner) + + assert result["by_installer"]["pip"] == 50 + assert result["by_system"] == {"Linux": 42, "Darwin": 0, "Windows": 0} +``` + +- [ ] **Step 3: Run the new tests — must fail** + +```bash +uv run pytest tests/test_collector.py::test_run_pypinfo_argv_groups_by_ci_installer_system tests/test_collector.py::test_run_pypinfo_returns_by_installer_and_by_system tests/test_collector.py::test_run_pypinfo_filters_out_non_allowlisted_systems tests/test_collector.py::test_run_pypinfo_excludes_ci_true_from_both_dimensions tests/test_collector.py::test_run_pypinfo_handles_missing_system_name_field -v +``` + +Expected: all 5 fail (argv assertion fails because "system" isn't in argv yet; return-shape assertions fail because `result` is still `dict[str, int]`). + +- [ ] **Step 4: Update `run_pypinfo()` argv** + +Open `src/pypi_winnow_downloads/collector.py`. Find the argv block in `run_pypinfo()` (~line 141–150). 
Change: + +```python + argv = [ + _resolve_pypinfo_path(), + "--json", + "--days", + str(window_days), + "--all", + package, + "ci", + "installer", + ] +``` + +to: + +```python + argv = [ + _resolve_pypinfo_path(), + "--json", + "--days", + str(window_days), + "--all", + package, + "ci", + "installer", + "system", + ] +``` + +- [ ] **Step 5: Update `run_pypinfo()` return type and aggregation** + +Add a `RunPypinfoResult` TypedDict near the top of the module (next to the `Runner` and `Clock` aliases, ~line 21): + +```python +from typing import TypedDict + + +class RunPypinfoResult(TypedDict): + by_installer: dict[str, int] + by_system: dict[str, int] +``` + +Change `run_pypinfo()`'s return annotation from `dict[str, int]` to `RunPypinfoResult`. + +Replace the aggregation block (the `counts: dict[str, int] = {name: 0 for name in _INSTALLER_NAMES}` block + the row loop) with: + +```python + # Initialize both dicts to zero for every allowlisted key so the returned + # shape is stable regardless of which (installer, system) pairs had rows + # in this window. Order follows _INSTALLER_NAMES / _SYSTEM_NAMES so callers + # can rely on iteration order for badge filenames and tests can assert on + # equality with specific dict literals. 
+ by_installer: dict[str, int] = {name: 0 for name in _INSTALLER_NAMES} + by_system: dict[str, int] = {name: 0 for name in _SYSTEM_NAMES} + for row in rows: + if not isinstance(row, dict): + raise CollectorError( + f"pypinfo row for {package!r} has unexpected shape (not a dict): {row!r}" + ) + if row.get("ci") == "True": + continue + if "installer_name" not in row: + raise CollectorError( + f"pypinfo row for {package!r} missing 'installer_name' field: {row!r}" + ) + installer = row["installer_name"] + count = row.get("download_count", 0) + if not isinstance(count, int): + raise CollectorError( + f"pypinfo row for {package!r} has non-integer download_count: {count!r}" + ) + + # Per-installer aggregation: hero-stability invariant — count + # the row regardless of system_name, as long as the installer is + # allowlisted. v0.2.0's hero-count contract depends on this. + if installer in _INSTALLER_ALLOWLIST: + by_installer[installer] += count + + # Per-system aggregation: independent allowlist check. Rows with + # missing/empty/non-allowlisted system_name drop out of by_system + # but may still contribute to by_installer above. + system = row.get("system_name", "") + if system in _SYSTEM_ALLOWLIST: + by_system[system] += count + + return {"by_installer": by_installer, "by_system": by_system} +``` + +- [ ] **Step 6: Run tests — they should pass; existing tests should still pass** + +```bash +uv run pytest tests/test_collector.py -v +``` + +Expected: 5 new tests PASS, all prior tests STILL PASS (~79 existing → 84 total). If any existing test fails, it's because the existing test asserted on the old `dict[str, int]` return shape. Update those existing tests to use `result["by_installer"]` instead of `result` directly. 
Specifically, audit the existing tests: + +```bash +grep -n 'run_pypinfo("mypkg"' tests/test_collector.py | head +``` + +For each such test, if the assertion looks like `assert result == {"pip": ...}`, change it to `assert result["by_installer"] == {"pip": ...}` (and add `"by_system": ...` if the test had system_name fields in its rows; otherwise leave by_system implicit). + +- [ ] **Step 7: Commit** + +```bash +git add src/pypi_winnow_downloads/collector.py tests/test_collector.py +git commit -m "$(cat <<'EOF' +feat(collector): pypinfo group-by ci installer system + dual-dim aggregation + +Changes run_pypinfo() to query BigQuery on a 3-dimensional GROUP BY +(`ci installer system`) so a single call yields both per-installer and +per-system breakdowns. Return type changes from dict[str, int] to a +TypedDict carrying both aggregates. + +The v0.2.0 hero-stability invariant is preserved: hero count +(sum(by_installer.values())) is unchanged because the per-installer +aggregation does not consider system_name. The per-system aggregation +applies an independent allowlist filter (Linux/Darwin/Windows); rows +with missing or non-allowlisted system_name drop out of by_system but +still count toward by_installer when the installer is allowlisted. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 3: TDD — per-OS badge file emission + +**Files:** +- Modify: `tests/test_collector.py` — add badge-emission tests. +- Modify: `src/pypi_winnow_downloads/collector.py` — extend badge emission loop, update `PackageOutcome`. + +- [ ] **Step 1: Read the existing badge-emission test pattern for context** + +```bash +grep -n 'def test_collect_one_writes\|installer-pip-' tests/test_collector.py | head +``` + +Identify how the existing test asserts on the per-installer file emission. The new test should mirror that pattern. 
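For orientation before writing the tests in Step 2, the payload shape they assert on can be sketched roughly as follows. This is an illustration only, not the project's `badge.build_payload()`: the function name `sketch_payload`, the plain `str(count)` message, and the `schemaVersion` field (which the shields.io endpoint schema requires) are assumptions. The label format and the "blue if count >= 10 else lightgrey" rule are taken from this plan.

```python
import json


def sketch_payload(count: int, label: str) -> dict[str, object]:
    """Hypothetical stand-in for badge.build_payload(); not the real helper."""
    return {
        "schemaVersion": 1,  # assumption: required by the shields.io endpoint schema
        "label": label,
        "message": str(count),  # assumption: counts are rendered as plain integers
        "color": "blue" if count >= 10 else "lightgrey",
    }


# Label template copied from _OS_BADGE_SPECS (Task 1); days=30 matches the tests.
windows = sketch_payload(5, "windows ({days}d)".format(days=30))
print(json.dumps(windows))
```

If the real helper formats large counts differently (for example with a suffix), the `message` assertions in Step 2 would need to follow that format instead.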
+ +- [ ] **Step 2: Write failing tests for the 3 new OS badge files** + +Append to `tests/test_collector.py` after the existing per-installer-emission tests: + +```python +def test_collect_one_writes_three_per_os_badge_files(tmp_path: Path) -> None: + """v3 OS distribution: collector emits os-linux-30d-non-ci.json, + os-macos-30d-non-ci.json, os-windows-30d-non-ci.json with the + correct shields.io shape.""" + output_dir = tmp_path / "out" + creds = tmp_path / "creds.json" + creds.write_text("{}") + + stdout = _ok_rows([ + {"ci": "False", "download_count": 100, "installer_name": "pip", "system_name": "Linux"}, + {"ci": "False", "download_count": 30, "installer_name": "pip", "system_name": "Darwin"}, + {"ci": "False", "download_count": 5, "installer_name": "uv", "system_name": "Windows"}, + ]) + + def fake_runner(argv, env): + return subprocess.CompletedProcess(argv, 0, stdout=stdout, stderr="") + + config = _make_config(packages=[("mypkg", 30)], output_dir=output_dir, credential_file=creds) + _collect_one(config.packages[0], config, runner=fake_runner) + + pkg_dir = output_dir / "mypkg" + linux = json.loads((pkg_dir / "os-linux-30d-non-ci.json").read_text()) + macos = json.loads((pkg_dir / "os-macos-30d-non-ci.json").read_text()) + windows = json.loads((pkg_dir / "os-windows-30d-non-ci.json").read_text()) + + assert linux["label"] == "linux (30d)" + assert linux["message"] == "100" + assert linux["color"] == "blue" + + assert macos["label"] == "macos (30d)" + assert macos["message"] == "30" + assert macos["color"] == "blue" + + assert windows["label"] == "windows (30d)" + assert windows["message"] == "5" + # 5 < 10 → lightgrey per the existing color logic. + assert windows["color"] == "lightgrey" + + +def test_collect_one_v0_2_0_files_unchanged_alongside_os_files(tmp_path: Path) -> None: + """The v3 OS feature must not change v0.2.0's filename, schema, or value + for any given pypinfo response. 
Asserts existence + shape of all + pre-v3 files plus the 3 new OS files = 11 total per package per window.""" + output_dir = tmp_path / "out" + creds = tmp_path / "creds.json" + creds.write_text("{}") + + stdout = _ok_rows([ + {"ci": "False", "download_count": 100, "installer_name": "pip", "system_name": "Linux"}, + ]) + + def fake_runner(argv, env): + return subprocess.CompletedProcess(argv, 0, stdout=stdout, stderr="") + + config = _make_config(packages=[("mypkg", 30)], output_dir=output_dir, credential_file=creds) + _collect_one(config.packages[0], config, runner=fake_runner) + + pkg_dir = output_dir / "mypkg" + expected = { + "downloads-30d-non-ci.json", + "installer-pip-30d-non-ci.json", + "installer-pipenv-30d-non-ci.json", + "installer-pipx-30d-non-ci.json", + "installer-uv-30d-non-ci.json", + "installer-poetry-30d-non-ci.json", + "installer-pdm-30d-non-ci.json", + "installer-pip-family-30d-non-ci.json", + "os-linux-30d-non-ci.json", + "os-macos-30d-non-ci.json", + "os-windows-30d-non-ci.json", + } + actual = {p.name for p in pkg_dir.iterdir()} + assert expected == actual, f"missing: {expected - actual}, extra: {actual - expected}" + + # Hero schema unchanged. + hero = json.loads((pkg_dir / "downloads-30d-non-ci.json").read_text()) + assert hero["message"] == "100" + assert hero["label"] == "pip*/uv/poetry/pdm (30d)" +``` + +- [ ] **Step 3: Run new tests — must fail** + +```bash +uv run pytest tests/test_collector.py::test_collect_one_writes_three_per_os_badge_files tests/test_collector.py::test_collect_one_v0_2_0_files_unchanged_alongside_os_files -v +``` + +Expected: both fail because `os-linux-30d-non-ci.json` (etc.) don't exist; the badge emission loop hasn't been extended yet. + +- [ ] **Step 4: Update `PackageOutcome` dataclass** + +Find `PackageOutcome` in `src/pypi_winnow_downloads/collector.py`. Add a new field `counts_by_system: dict[str, int] | None = None` next to the existing `counts` field. 
+ +- [ ] **Step 5: Update `_collect_one()` to consume the new return shape and emit OS badges** + +Find the body of `_collect_one()` that calls `run_pypinfo()` and assigns `per_installer`. Replace: + +```python + per_installer = run_pypinfo(...) + hero_total = sum(per_installer.values()) + counts: dict[str, int] = { + **per_installer, + "pip-family": (per_installer["pip"] + per_installer["pipenv"] + per_installer["pipx"]), + } +``` + +with: + +```python + result = run_pypinfo(...) + per_installer = result["by_installer"] + per_system = result["by_system"] + hero_total = sum(per_installer.values()) + counts: dict[str, int] = { + **per_installer, + "pip-family": (per_installer["pip"] + per_installer["pipenv"] + per_installer["pipx"]), + } +``` + +After the existing per-installer badge emission loop (the `for fname_tpl, label_tpl, key in _INSTALLER_BADGE_SPECS:` block), add a parallel OS loop: + +```python + # Per-OS badges (linux + macos + windows). The counts_key matches + # pypinfo's raw system_name emission; the slug/label use macos for + # Darwin (user-friendly form). Hero count is unaffected. 
+ for fname_tpl, label_tpl, key in _OS_BADGE_SPECS: + os_path = ( + config.service.output_dir / pkg.name / fname_tpl.format(days=pkg.window_days) + ) + badge.write_badge( + path=os_path, + payload=badge.build_payload( + count=per_system[key], + label=label_tpl.format(days=pkg.window_days), + ), + ) +``` + +Update the `logger.info` call's badge count: + +```python + logger.info( + "collector: wrote %d badges for %s (hero count=%d, path=%s)", + 1 + len(_INSTALLER_BADGE_SPECS) + len(_OS_BADGE_SPECS), + pkg.name, + hero_total, + hero_path.parent, + ) +``` + +Update the `PackageOutcome` constructor at the bottom of the success path to include `counts_by_system=per_system`: + +```python + return PackageOutcome( + package=pkg.name, + window_days=pkg.window_days, + count=hero_total, + counts=counts, + counts_by_system=per_system, + ) +``` + +- [ ] **Step 6: Run tests — should pass** + +```bash +uv run pytest tests/test_collector.py -v +``` + +Expected: all tests pass (the 2 new tests + the existing tests). If a pre-existing test asserts on the old `per_installer` dict directly (e.g., `result == {"pip": ...}`), it'll need the same `result["by_installer"]` update applied in Task 2 Step 6 (most should already be done by then; this is a safety net). + +- [ ] **Step 7: Commit** + +```bash +git add src/pypi_winnow_downloads/collector.py tests/test_collector.py +git commit -m "$(cat <<'EOF' +feat(collector): emit per-OS badges (linux/macos/windows) + +Three new shields.io endpoint JSON files per package per window: +os-linux-Nd-non-ci.json, os-macos-Nd-non-ci.json, +os-windows-Nd-non-ci.json. Color logic and label format mirror the +per-installer badges (blue if count >= 10 else lightgrey; +parameterized by window_days). + +PackageOutcome gains a counts_by_system field; v0.2.0's existing +fields are preserved verbatim. Total badge files per package per +window increases from 8 to 11. 
+ +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 4: TDD — `_health.json` shape + +**Files:** +- Modify: `tests/test_collector.py` — add `_health.json` shape tests. +- Modify: `src/pypi_winnow_downloads/collector.py` — extend `_write_health()`. + +- [ ] **Step 1: Write failing tests for `counts_by_system` in `_health.json`** + +Append: + +```python +def test_health_json_includes_counts_by_system(tmp_path: Path) -> None: + """v3: per-package successful entries gain counts_by_system alongside + the existing counts field.""" + output_dir = tmp_path / "out" + creds = tmp_path / "creds.json" + creds.write_text("{}") + + stdout = _ok_rows([ + {"ci": "False", "download_count": 100, "installer_name": "pip", "system_name": "Linux"}, + {"ci": "False", "download_count": 30, "installer_name": "pip", "system_name": "Darwin"}, + ]) + + def fake_runner(argv, env): + return subprocess.CompletedProcess(argv, 0, stdout=stdout, stderr="") + + config = _make_config(packages=[("mypkg", 30)], output_dir=output_dir, credential_file=creds) + main(config_override=config, runner=fake_runner) # or whatever the existing pattern uses + + health = json.loads((output_dir / "_health.json").read_text()) + pkg_entry = health["packages"]["mypkg"] + assert pkg_entry["counts_by_system"] == {"Linux": 100, "Darwin": 30, "Windows": 0} + + +def test_health_json_preserves_v0_2_0_fields(tmp_path: Path) -> None: + """v3 must not change existing _health.json fields for any given + pypinfo response. 
Asserts count, counts, window_days are all present + and have the expected v0.2.0 shape.""" + output_dir = tmp_path / "out" + creds = tmp_path / "creds.json" + creds.write_text("{}") + + stdout = _ok_rows([ + {"ci": "False", "download_count": 100, "installer_name": "pip", "system_name": "Linux"}, + ]) + + def fake_runner(argv, env): + return subprocess.CompletedProcess(argv, 0, stdout=stdout, stderr="") + + config = _make_config(packages=[("mypkg", 30)], output_dir=output_dir, credential_file=creds) + main(config_override=config, runner=fake_runner) + + health = json.loads((output_dir / "_health.json").read_text()) + pkg_entry = health["packages"]["mypkg"] + assert pkg_entry["count"] == 100 + assert pkg_entry["window_days"] == 30 + # Existing counts dict unchanged in v3. + assert pkg_entry["counts"]["pip"] == 100 + assert "pip-family" in pkg_entry["counts"] +``` + +**Note:** the exact entry-point invocation in these tests depends on the existing pattern in `test_collector.py`. If the existing tests use `_collect_one()` directly + `_write_health()` directly rather than `main()`, mirror that. Inspect the existing `_health.json` test (search `grep -n 'health.json\|_write_health' tests/test_collector.py | head`) and use the same harness. + +- [ ] **Step 2: Run failing tests** + +```bash +uv run pytest tests/test_collector.py::test_health_json_includes_counts_by_system tests/test_collector.py::test_health_json_preserves_v0_2_0_fields -v +``` + +Expected: both fail because `_write_health()` doesn't include `counts_by_system` yet. + +- [ ] **Step 3: Update `_write_health()`** + +Find `_write_health()` in `src/pypi_winnow_downloads/collector.py`. 
In the per-package success branch, change: + +```python + if o.ok: + entry: dict[str, Any] = {"count": o.count, "window_days": o.window_days} + if o.counts is not None: + entry["counts"] = o.counts + packages_section[o.package] = entry +``` + +to: + +```python + if o.ok: + entry: dict[str, Any] = {"count": o.count, "window_days": o.window_days} + if o.counts is not None: + entry["counts"] = o.counts + if o.counts_by_system is not None: + entry["counts_by_system"] = o.counts_by_system + packages_section[o.package] = entry +``` + +- [ ] **Step 4: Run tests — should pass** + +```bash +uv run pytest tests/test_collector.py -v +``` + +Expected: all 80+ tests pass. + +- [ ] **Step 5: Commit** + +```bash +git add src/pypi_winnow_downloads/collector.py tests/test_collector.py +git commit -m "$(cat <<'EOF' +feat(collector): _health.json gains counts_by_system per package + +Per-package successful entries in _health.json now include +counts_by_system alongside the existing counts (per-installer) field. +v0.2.0 fields (count, counts, window_days) preserved verbatim — no +change to existing monitoring or scripting that reads them. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 5: README updates + +**Files:** +- Modify: `README.md` — dogfood block per package, breakdown paragraph, table. + +- [ ] **Step 1: Identify the dogfood block structure** + +```bash +grep -n '## By installer\|^**By installer\|## What these badges\|## Use this service' README.md | head +``` + +Note the line numbers of: +- The "By installer" paragraph (one per dogfood package) +- The "Per-installer breakdown" paragraph in "What these badges actually count" +- The "Use this service for your own package" table + +- [ ] **Step 2: Add "By OS" paragraph to each dogfood package's block** + +For each dogfood package (currently `pypi-winnow-downloads`, `mcp-clipboard`, `mcp-synology`, etc. 
— verify with `grep -n '^\*\*By installer' README.md`), insert a new paragraph immediately after the "By installer" paragraph: + +```markdown +**By OS (30d, non-CI):** [![linux](https://img.shields.io/endpoint?url=https://pypi-badges.intfar.com//os-linux-30d-non-ci.json)](https://pypi-badges.intfar.com//os-linux-30d-non-ci.json) [![macos](https://img.shields.io/endpoint?url=https://pypi-badges.intfar.com//os-macos-30d-non-ci.json)](https://pypi-badges.intfar.com//os-macos-30d-non-ci.json) [![windows](https://img.shields.io/endpoint?url=https://pypi-badges.intfar.com//os-windows-30d-non-ci.json)](https://pypi-badges.intfar.com//os-windows-30d-non-ci.json) +``` + +Replace `` with the actual package name in each block. + +- [ ] **Step 3: Add "By OS breakdown" paragraph to "What these badges actually count" section** + +Find the "Per-installer breakdown" paragraph and insert after it: + +```markdown +**By OS breakdown.** Each per-OS badge applies the same `details.ci != True` filter as the hero — they answer "non-CI downloads on that OS." `Darwin` is pypinfo's emission for what users call macOS; the badge filename and label use `macos`. The per-OS sum can be less than the hero count: rows whose user-agent didn't expose a system_name (or exposed one outside Linux/Darwin/Windows) drop out of the per-OS aggregation but still count toward the hero — same pattern as the per-installer-sum ≤ hero gap. +``` + +- [ ] **Step 4: Add 3 rows to "Use this service for your own package" table** + +Find the existing table (search `grep -n 'installer-pip-' README.md | head`). Below the existing per-installer rows, add: + +```markdown +| `os-linux-30d-non-ci.json` | linux (30d) | Per-OS, Linux | +| `os-macos-30d-non-ci.json` | macos (30d) | Per-OS, macOS (Darwin) | +| `os-windows-30d-non-ci.json` | windows (30d) | Per-OS, Windows | +``` + +Match the existing table's column widths and alignment. 
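To reduce copy-paste mistakes when filling in the "By OS" paragraph from Step 2 for several dogfood packages, the line can be generated rather than hand-edited. The sketch below is a hypothetical helper, not part of the repository: the name `by_os_paragraph` is invented, and it assumes the package name is the path segment between the host and the badge filename (consistent with the collector writing to `output_dir / pkg.name / filename`).

```python
BASE_URL = "https://pypi-badges.intfar.com"
SHIELDS = "https://img.shields.io/endpoint?url="
OS_SLUGS = ("linux", "macos", "windows")  # slugs from _OS_BADGE_SPECS filenames


def by_os_paragraph(package: str, days: int = 30) -> str:
    """Build the README 'By OS' badge line for one package (hypothetical helper)."""
    badges = []
    for slug in OS_SLUGS:
        # Assumption: the package name is the URL path segment before the filename.
        json_url = f"{BASE_URL}/{package}/os-{slug}-{days}d-non-ci.json"
        badges.append(f"[![{slug}]({SHIELDS}{json_url})]({json_url})")
    return f"**By OS ({days}d, non-CI):** " + " ".join(badges)


print(by_os_paragraph("mcp-clipboard"))
```

Comparing the generated line against the hand-written one in `git diff README.md` is a quick way to catch a badge that points at the wrong package.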
+ +- [ ] **Step 5: Verify the README still renders cleanly** + +```bash +uv run pytest -k readme # if there's a README live-render test +git diff README.md | head -100 +``` + +Inspect the diff visually — every new badge URL must point to the correct package, every new paragraph must sit in the expected location. + +- [ ] **Step 6: Commit** + +```bash +git add README.md +git commit -m "$(cat <<'EOF' +docs(README): add per-OS dogfood badges + breakdown paragraph + table rows + +Each dogfood package's block gains a 'By OS (30d, non-CI):' paragraph +parallel to the existing 'By installer' paragraph (3 badges: +linux/macos/windows). 'What these badges actually count' gains a +'By OS breakdown' paragraph documenting the per-OS-sum ≤ hero gap. +'Use this service for your own package' table grows 3 rows. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 6: CHANGELOG entry + +**Files:** +- Modify: `CHANGELOG.md` — `## [Unreleased]` → `### Added`. + +- [ ] **Step 1: Read the current Unreleased section** + +```bash +head -25 CHANGELOG.md +``` + +The section should already contain a `### Added` block from PR #54 (the uv-lock-refresh entry) and a `### Changed` block from PRs #52/#53. The new bullet goes at the end of the `### Added` block (chronological within section). + +- [ ] **Step 2: Add the bullet** + +Use `Edit` to add a new bullet at the end of the `### Added` block. The bullet: + +```markdown +- **Per-OS badge files (v3 OS distribution feature).** The collector now emits three additional shields.io endpoint badge JSON files per package per window: `os-linux-d-non-ci.json`, `os-macos-d-non-ci.json`, `os-windows-d-non-ci.json`. The badge label format mirrors v2's parameterized `(Nd)` style — e.g., `linux (30d)`, `macos (30d)`, `windows (30d)`. Color logic (`blue` if count ≥ 10 else `lightgrey`) is unchanged. 
Pypinfo group-by extends from `ci installer` to `ci installer system` so a single BigQuery call returns both per-installer and per-system breakdowns; BigQuery cost is unchanged (same source table, marginal column). `run_pypinfo()`'s return type changes from `dict[str, int]` to a TypedDict carrying `by_installer` and `by_system` aggregates. `_health.json` per-package successful entries gain a `counts_by_system` field. `PackageOutcome` gains a `counts_by_system` attribute. Filename slug and badge label use `macos` (user-friendly); the internal allowlist key is `Darwin` to match pypinfo's raw emission. No `pyproject.toml` range changes. The v0.2.0 hero-stability invariant is preserved: hero count remains `sum(by_installer.values())` regardless of system_name; per-system aggregation applies an independent allowlist filter so rows with missing or non-allowlisted system_name drop out of the per-OS aggregates but still count toward the hero. Backwards-compat: `downloads-d-non-ci.json` and the seven `installer-*` files unchanged in filename, schema, and value for any given pypinfo response. README dogfood blocks gain a "By OS" paragraph parallel to the existing "By installer" paragraph; "What these badges actually count" gains a "By OS breakdown" paragraph; "Use this service for your own package" table grows three rows. Spec: `docs/superpowers/specs/2026-04-29-os-distribution-badge-design.md`. +``` + +- [ ] **Step 3: Verify the diff is exactly the new bullet** + +```bash +git diff CHANGELOG.md +``` + +Expected: one new bullet appended to the existing `### Added` block. No other changes. + +- [ ] **Step 4: Commit** + +```bash +git add CHANGELOG.md +git commit -m "$(cat <<'EOF' +docs(CHANGELOG): record v3 OS distribution feature in Unreleased + +Adds the per-OS-badges entry under ## [Unreleased] / ### Added, +matching the project's per-PR CHANGELOG rule and the v0.2.0 v2-feature +entry's house style. 
+ +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 7: Final verification + push + +- [ ] **Step 1: Full test suite at 100% coverage** + +```bash +uv run pytest --cov --cov-report=term-missing +``` + +Expected: all tests pass, coverage at 100% for `src/pypi_winnow_downloads/`. If a new branch isn't covered, add a test or remove the dead branch. + +- [ ] **Step 2: Lint, format, type-check** + +```bash +uv run ruff check src/ tests/ +uv run ruff format --check src/ tests/ +uv run mypy src/pypi_winnow_downloads/ +``` + +Expected: all clean. Fix any findings. + +- [ ] **Step 3: Verify branch state** + +```bash +git log --oneline main..HEAD +git status -sb +``` + +Expected: 6 commits on `feat/os-distribution-badges` (constants, run_pypinfo multi-dim, OS badge emission, _health.json, README, CHANGELOG), clean working tree. + +- [ ] **Step 4: Push branch via bot token** + +```bash +GH_TOKEN_NEW="$(/home/cmeans/github.com/cmeans/claude-dev/github-app/get-token.sh 2>/dev/null)" +git push "https://x-access-token:${GH_TOKEN_NEW}@github.com/cmeans/pypi-winnow-downloads" -u feat/os-distribution-badges 2>&1 | tail -5 +``` + +Expected: `* [new branch] feat/os-distribution-badges -> feat/os-distribution-badges`. + +--- + +## Task 8: Open PR + Ready for QA + +- [ ] **Step 1: Open the PR** + +```bash +GH_TOKEN_NEW="$(/home/cmeans/github.com/cmeans/claude-dev/github-app/get-token.sh 2>/dev/null)" +GH_TOKEN="$GH_TOKEN_NEW" gh pr create \ + --base main \ + --head feat/os-distribution-badges \ + --title "feat: per-OS download breakdown badges (v3)" \ + --body "$(cat <<'EOF' +## Summary + +Adds per-OS download breakdown badges (linux / macos / windows) parallel to the per-installer breakdown shipped in v0.2.0. Three new shields.io endpoint JSON files per package per window, plus README dogfood block extension and `_health.json` shape extension. + +## Why + +The installer-mix v2 feature surfaces *which packaging tool* users run when they install a package. 
The OS distribution breakdown answers a different operator question: *what platforms is this used on?* For deciding what OS matrix to test against, what platform-specific bugs to prioritize, or whether to ship a wheel for a specific OS, the OS breakdown is more decision-useful than the installer breakdown. + +## How + +- One pypinfo invocation per package per window (unchanged), now with `ci installer system` group-by (extended from `ci installer`). Cartesian rows ~6 → ~18 after allowlist filtering. BigQuery cost unchanged (same source table, marginal column). +- `run_pypinfo()` return type changes from `dict[str, int]` to a TypedDict carrying both `by_installer` and `by_system`. +- The v0.2.0 hero-stability invariant is preserved: hero = `sum(by_installer.values())` regardless of system_name. Per-system aggregation applies an independent allowlist filter (Linux/Darwin/Windows). +- `PackageOutcome` and `_health.json` gain a `counts_by_system` field. Existing fields preserved verbatim. +- README dogfood block grows a "By OS" paragraph; "What these badges actually count" gains a "By OS breakdown" paragraph; "Use this service for your own package" table grows 3 rows. + +## What's in the diff + +- `src/pypi_winnow_downloads/collector.py` — new constants, multi-dim pypinfo argv, restructured return shape, per-system aggregation, OS badge emission loop, `PackageOutcome` field, `_write_health()` extension. +- `tests/test_collector.py` — new tests for argv extension, return shape, per-system aggregation, system allowlist filter, edge cases (allowlisted installer + non-allowlisted system, missing system_name, CI filter), badge file emission, `_health.json` shape, v0.2.0 backwards-compat invariants. +- `README.md` — dogfood block extensions, breakdown paragraph, table rows. +- `CHANGELOG.md` — `## [Unreleased]` → `### Added` bullet. +- `docs/superpowers/specs/2026-04-29-os-distribution-badge-design.md` — design spec. 
+- `docs/superpowers/plans/2026-04-29-os-distribution-badges.md` — implementation plan. + +## Cost + +Zero net BigQuery cost (same source table, marginal additional column scanned). One additional badge-file-write per package per OS per run (3 file writes per package per window). + +## Test plan + +- [ ] Full pytest at 100% coverage on `src/`. +- [ ] `ruff check`, `ruff format --check`, `mypy` all clean. +- [ ] CI green (lint, typecheck, test, deploy-smoke). +- [ ] After merge: collector run on CT 112 emits 11 files per package per window (verify via `update-collector.sh status` or direct ls). +- [ ] Live README renders correctly with the 3 new badges showing real values. + +## Release framing + +Target release: **v0.3.0** — minor bump per SemVer. Additive feature; no breaking changes to v0.2.0 contracts. + +🤖 Generated with [Claude Code](https://claude.com/claude-code) +EOF +)" +``` + +Expected: a URL printed, e.g. `https://github.com/cmeans/pypi-winnow-downloads/pull/`. Capture the PR number. + +- [ ] **Step 2: Apply Ready for QA label** + +```bash +GH_TOKEN_NEW="$(/home/cmeans/github.com/cmeans/claude-dev/github-app/get-token.sh 2>/dev/null)" +GH_TOKEN="$GH_TOKEN_NEW" gh pr edit --add-label "Ready for QA" +``` + +- [ ] **Step 3: Wait for QA verdict** + +The maintainer reviews per the project's standard QA flow. Controller relays findings to a fix-up subagent if `QA Failed`; otherwise proceeds to Task 9 on `QA Approved`. + +--- + +## Task 9: Squash-merge after QA Approved + post-merge verification + +- [ ] **Step 1: Verify mergeable state** + +```bash +GH_TOKEN_NEW="$(/home/cmeans/github.com/cmeans/claude-dev/github-app/get-token.sh 2>/dev/null)" +GH_TOKEN="$GH_TOKEN_NEW" gh pr view --json mergeable,mergeStateStatus --jq '{mergeable, mergeStateStatus}' +``` + +Expected: `mergeable: MERGEABLE`, `mergeStateStatus: CLEAN`. 
+ +- [ ] **Step 2: Squash-merge** + +```bash +GH_TOKEN_NEW="$(/home/cmeans/github.com/cmeans/claude-dev/github-app/get-token.sh 2>/dev/null)" +GH_TOKEN="$GH_TOKEN_NEW" gh pr merge --squash \ + --subject "feat: per-OS download breakdown badges (v3) (#)" \ + --body "Adds per-OS badge files (linux/macos/windows) parallel to the per-installer breakdown shipped in v0.2.0. Pypinfo group-by extended from ci installer to ci installer system; run_pypinfo() return type restructured to a TypedDict carrying both by_installer and by_system; PackageOutcome and _health.json gain counts_by_system; README dogfood block grows a By OS paragraph. v0.2.0 hero-stability invariant preserved. + +Closes #." +``` + +- [ ] **Step 3: Sync local main** + +```bash +git checkout main +git pull --ff-only +git log --oneline -3 +``` + +Expected: most recent commit is the squashed merge. + +- [ ] **Step 4: Delete local feature branch** + +```bash +git branch -D feat/os-distribution-badges +``` + +- [ ] **Step 5: Wait for next collector run on CT 112 (or trigger manually)** + +The collector runs daily on CT 112 via systemd timer. Either wait for the next scheduled run, or trigger immediately: + +```bash +ssh holodeck pct exec 112 -- systemctl start pypi-winnow-downloads-collector.service +``` + +(Use `.deploy/scripts/update-collector.sh update main` to also pick up the new code; otherwise the deployed wheel is still 0.2.0 and won't emit the new files until v0.3.0 is published or a tarball install pulls main.) + +- [ ] **Step 6: Verify the new files exist** + +```bash +ssh holodeck pct exec 112 -- ls /var/lib/pypi-winnow-downloads/output// | sort +``` + +Expected: 11 files per package — the original 8 plus 3 new `os-*.json`. `_health.json` at the output root has `counts_by_system` per successful package. + +- [ ] **Step 7: Verify live README renders** + +Open the live README in a browser. 
Each dogfood block should show 3 new badges (linux/macos/windows) with real values pulled from `pypi-badges.intfar.com`. + +The v0.3.0 release commit + tag is a separate PR (not part of this feature plan). When ready, follow the existing release-PR pattern (bump version in `pyproject.toml`, stamp the CHANGELOG `## [Unreleased]` → `## [0.3.0] - YYYY-MM-DD`). + +--- + +## Self-review notes (post-write) + +- **Spec coverage:** Every spec section maps to a task — constants block (Task 1), pypinfo argv + return shape + aggregation (Task 2), per-system filter + edge case (Task 2), badge file emission (Task 3), `PackageOutcome` (Task 3), `_health.json` shape (Task 4), README dogfood + breakdown + table (Task 5), CHANGELOG (Task 6), test coverage including the v0.2.0 hero-stability invariant (Tasks 2–4), v0.3.0 release framing (Task 9 step 5+). +- **Placeholder scan:** No "TBD" or "implement later." `` placeholders in Tasks 8–9 are explicit "fill in after Step 1 prints it" markers, not unresolved scope. The tests' `_make_config` and `main(config_override=...)` calls in Task 4 are notes-to-self for the implementer to mirror the existing test harness — checked via `grep`. +- **Type / name consistency:** `_SYSTEM_NAMES`, `_SYSTEM_ALLOWLIST`, `_OS_BADGE_SPECS`, `RunPypinfoResult`, `by_installer`, `by_system`, `counts_by_system` — used consistently across tasks. Filename slug (`os-linux-Nd-non-ci.json` etc.) consistent across constants block (Task 1), badge emission (Task 3), README (Task 5), CHANGELOG (Task 6). +- **TDD discipline:** Tasks 2, 3, 4 each follow write-failing-tests → run-fail → implement → run-pass → commit. Tasks 1, 5, 6, 7 are non-TDD-natural (constants/docs/verification). 
diff --git a/docs/superpowers/specs/2026-04-29-os-distribution-badge-design.md b/docs/superpowers/specs/2026-04-29-os-distribution-badge-design.md new file mode 100644 index 0000000..ede005a --- /dev/null +++ b/docs/superpowers/specs/2026-04-29-os-distribution-badge-design.md @@ -0,0 +1,155 @@ +# OS distribution badges (v3 feature) + +**Status:** Draft, brainstorming-approved 2026-04-29 +**Goal:** Add per-OS download breakdown badges (Linux / macOS / Windows) parallel to the per-installer breakdown shipped in v0.2.0. +**Spec author:** Claude +**Target release:** v0.3.0 (minor bump — additive feature, no breaking changes) + +## Why + +The installer-mix v2 feature surfaces *which packaging tool* users run when they install a package. The OS distribution breakdown answers a different operator question: *what platforms is this used on?* For a maintainer deciding what OS matrix to test against, what platform-specific bugs to prioritize, or whether to ship a wheel for a specific OS, the OS breakdown is more decision-useful than the installer breakdown. + +Same shape as installer-mix: one cron run per day, three new shields.io endpoint JSON files per package per window, dogfood layout extended to surface them on the README. + +## Architecture + +Mirrors the v2 installer-mix feature one axis over. Code reuse is high; the new dimension shares the same allowlist-filter + per-key file-emission + parameterized-label patterns. + +| Aspect | v2 installer-mix | v3 OS distribution | +| --- | --- | --- | +| Pypinfo group-by axis | `installer` | `system` | +| Allowlist key (matches pypinfo emission) | `pip`, `pipenv`, `pipx`, `uv`, `poetry`, `pdm` | `Linux`, `Darwin`, `Windows` | +| Filename slug | `installer-pip-Nd-non-ci.json` × 7 | `os-linux-Nd-non-ci.json` × 3 | +| Public label | `pip (30d)` etc. 
| `linux (30d)` / `macos (30d)` / `windows (30d)` | +| Family aggregate | pip-family (pip + pipenv + pipx) | none | +| Hero impact | none | none (hero filter unchanged; see "Hero count semantics") | + +The collector remains a one-shot daily run; nothing about scheduling, output layout, or HTTPS exposure changes. + +## Data path + +### Pypinfo invocation + +`run_pypinfo()` adds one positional arg to its argv: `["ci", "installer"]` becomes `["ci", "installer", "system"]`. Pypinfo passes this to BigQuery as a multi-dimensional GROUP BY. The cartesian row count goes from ~6 (installer-only) to ~18 (installer × system after allowlist filtering). BigQuery's pricing is by bytes scanned, not row count, and the additional column (`details.system.name`) is on the same source table, so the marginal cost is negligible. + +### Return shape + +`run_pypinfo()` return type changes from `dict[str, int]` (installer→count) to a structured dict: + +```python +{ + "by_installer": {"pip": int, "pipenv": int, "pipx": int, "uv": int, "poetry": int, "pdm": int}, + "by_system": {"Linux": int, "Darwin": int, "Windows": int}, +} +``` + +Hero count is `sum(by_installer.values())`, unchanged. The `pip-family` derived value (`pip + pipenv + pipx`) is still computed downstream in the badge-emission step, not in `run_pypinfo()`. + +### Row aggregation + +For each pypinfo row: +1. If `installer ∈ _INSTALLER_ALLOWLIST`, add the row's `download_count` to `by_installer[installer]`. +2. If `system ∈ _SYSTEM_ALLOWLIST`, add the row's `download_count` to `by_system[system]`. + +The two checks are independent: a row can contribute to one, the other, both, or neither. Rows where `ci` is `True` are dropped before either check runs (existing behavior, unchanged). + +### Hero count semantics + +The v0.2.0 release promised "v1 hero badge JSON shape and filename are stable through 1.0." Adding a system-name filter to the hero would shift counts (rows with allowlisted installer + null/non-allowlisted system would drop out). 
To honor the v0.2.0 contract: + +- **Hero count formula unchanged.** Hero = sum across rows where `installer ∈ _INSTALLER_ALLOWLIST`, regardless of `system`. +- **Per-OS badges sum to ≤ hero.** The gap = rows with allowlisted installer + non-allowlisted/null system. Documented analogously to the per-installer-sum ≤ hero gap. +- **Per-installer badges sum to ≤ hero.** Same v0.2.0 behavior, unchanged. + +### Data availability and "backfill" + +There is no special backfill action: pypinfo's BigQuery query returns whatever rolling window we request (`--days 30` = last 30 days), regardless of when the collector code shipped. BigQuery's `bigquery-public-data.pypi.file_downloads` table has had `details.system.name` populated for years. + +On the first post-merge collector run for any package, the per-OS badges reflect 30 days of history. For `pypi-winnow-downloads` itself (~5 days of data) the badges will look thin. For mature dogfood packages (`mcp-clipboard`, `mcp-synology`) the badges populate immediately with a full 30-day window. + +## Badge files + +Three new shields.io endpoint JSON files per package per window, alongside the unchanged 8 files from v0.2.0: + +- `os-linux-{N}d-non-ci.json` — label `linux (Nd)` +- `os-macos-{N}d-non-ci.json` — label `macos (Nd)` (filename and label use the user-friendly form; the allowlist key is `Darwin` to match pypinfo's emission) +- `os-windows-{N}d-non-ci.json` — label `windows (Nd)` + +Color logic reuses the existing `format_count` and color helpers: `blue` if count ≥ 10 else `lightgrey`. No new helpers needed. + +Total badge files per package per window goes from 8 to 11. Existing 8 unchanged in filename, schema, or value for any given pypinfo response. 
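The three per-OS files described above can be sketched as a single pass over the badge specs. This is an illustration under stated assumptions, not the collector's emission code: `OS_BADGE_SPECS` mirrors the `_OS_BADGE_SPECS` constant, `build_os_badges` is a hypothetical name, `str(count)` stands in for the existing `format_count` helper, and `cacheSeconds` is left out because its value is not given here.

```python
import json

# Mirror of the (filename_template, label_template, counts_key) triples.
# counts_key is pypinfo's raw system name; slug and label use "macos".
OS_BADGE_SPECS = (
    ("os-linux-{days}d-non-ci.json", "linux ({days}d)", "Linux"),
    ("os-macos-{days}d-non-ci.json", "macos ({days}d)", "Darwin"),
    ("os-windows-{days}d-non-ci.json", "windows ({days}d)", "Windows"),
)


def build_os_badges(by_system: dict[str, int], days: int) -> dict[str, dict]:
    """Map each per-OS filename to a shields.io endpoint payload."""
    badges: dict[str, dict] = {}
    for filename_tpl, label_tpl, key in OS_BADGE_SPECS:
        count = by_system.get(key, 0)
        badges[filename_tpl.format(days=days)] = {
            "schemaVersion": 1,
            "label": label_tpl.format(days=days),
            # Stand-in: the real collector formats counts with format_count.
            "message": str(count),
            # Same threshold as the existing badges: blue at 10 or more.
            "color": "blue" if count >= 10 else "lightgrey",
        }
    return badges


if __name__ == "__main__":
    demo = build_os_badges({"Linux": 120, "Darwin": 30, "Windows": 5}, days=30)
    print(json.dumps(demo, indent=2))
```

Keying the lookup on the raw `Darwin` value while formatting the filename and label from the template keeps the pypinfo-to-`macos` rename in one place.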
+ +## `_health.json` shape + +Per-package successful entries gain one new field: + +```json +{ + ..., + "count": , + "counts": {"pip": ..., "uv": ..., ...}, + "counts_by_system": {"Linux": ..., "Darwin": ..., "Windows": ...} +} +``` + +`count` and `counts` are preserved verbatim for backwards compat with any monitoring or scripting that reads them. No joint `(installer × system)` matrix field — YAGNI. + +Top-level `_health.json` fields (`finished`, `started`, etc.) unchanged. + +## Configuration + +No new config knobs. Always-on, same as installer-mix v2. The collector emits all 11 files per package per window; the maintainer's README picks which to display via shields.io endpoint URLs. Other self-hosters get the same files automatically. + +## README impact + +### Dogfood block + +Each package's dogfood block gets a new paragraph under the existing "By installer" paragraph: + +```markdown +**By installer (30d, non-CI):** [pip] [pipenv] [pipx] [uv] [poetry] [pdm] + +**By OS (30d, non-CI):** [linux] [macos] [windows] +``` + +Parallel structure to the v0.2.0 layout. Three new badges per package; no other layout changes. + +### "What these badges actually count" section + +Gains one closing paragraph after the existing "Per-installer breakdown" paragraph: + +> **By OS breakdown.** Each per-OS badge applies the same `details.ci != True` filter as the hero — they answer "non-CI downloads on that OS." `Darwin` is the pypinfo emission for what users call macOS; the badge filename and label use `macos`. The per-OS sum can be less than the hero count: rows whose user-agent didn't expose a system_name (or exposed one outside Linux/Darwin/Windows) drop out of the per-OS aggregation but still count toward the hero. + +### "Use this service for your own package" table + +Gains 3 new rows for the new endpoint URLs (linux/macos/windows), parallel to the existing per-installer rows. + +## Out of scope (explicit) + +- No "OS family" aggregate (no analog to pip-family). 
+- No tightening of v1 hero count. +- No new config knobs (always-on, like installer-mix v2). +- No multi-package OS aggregation badge. +- No deprecation of the installer feature. +- No joint-matrix field in `_health.json`. + +## Acceptance criteria + +- Collector emits 3 new badge files per package per window with shields.io endpoint shape (`label`, `message`, `color`, `schemaVersion`, `cacheSeconds` matching existing helpers). +- Existing 8 files (hero + 7 installer-mix) unchanged in filename, schema, and value for any given pypinfo response. +- `_health.json` per-package successful entries gain `counts_by_system`; existing fields preserved verbatim. +- README dogfood block grows a "By OS" paragraph; "What these badges actually count" gains a per-OS breakdown paragraph; "Use this service for your own package" table gains 3 new rows. +- Tests cover: row aggregation including the (allowlisted-installer + non-allowlisted-system) edge case that's intentionally dropped from per-OS but kept in hero; badge file emission for all 3 new files; `_health.json` shape; README live-render check (parallel to existing dogfood live-render checks). +- `## [Unreleased]` → `### Added` bullet describing the new files and the data-semantics gap. + +## Release framing + +v0.3.0 — minor bump per SemVer. Additive: 3 new badge files, 1 new `_health.json` field. No breaking changes to v0.2.0 contracts. + +## Implementation file list + +- Modify: `src/pypi_winnow_downloads/collector.py` — add `_SYSTEM_NAMES`, `_SYSTEM_ALLOWLIST`, `_OS_BADGE_SPECS`; change pypinfo argv from `["ci", "installer"]` to `["ci", "installer", "system"]`; restructure `run_pypinfo()` return shape; extend the row-aggregation loop with the per-system increment; extend the badge-emission loop with the 3 new files; extend `_write_health()` to include `counts_by_system`. 
+- Modify: `tests/test_collector.py` — extend existing tests for the new return shape, the new dimension, the v0.2.0 hero-stability invariant, and the (allowlisted-installer + non-allowlisted-system) edge case. +- Modify: `README.md` — add "By OS" paragraph to dogfood block; add "By OS breakdown" paragraph; add 3 rows to the "Use this service for your own package" table. +- Modify: `CHANGELOG.md` — `## [Unreleased]` → `### Added` bullet. +- Bump: `pyproject.toml` `version` from `0.2.0` to `0.3.0` at release time (separate release PR, not part of the feature PR). diff --git a/src/pypi_winnow_downloads/collector.py b/src/pypi_winnow_downloads/collector.py index 426a164..f96c5d9 100644 --- a/src/pypi_winnow_downloads/collector.py +++ b/src/pypi_winnow_downloads/collector.py @@ -59,6 +59,27 @@ # assert against. Allowlist keeps the same membership; tuple gives us order. _INSTALLER_NAMES: tuple[str, ...] = ("pip", "pipenv", "pipx", "uv", "poetry", "pdm") _INSTALLER_ALLOWLIST: frozenset[str] = frozenset(_INSTALLER_NAMES) + +# System (OS) allowlist for the per-OS breakdown. The same `details.ci != True` +# filter applies as the hero. The keys are pypinfo's raw `system_name` values +# (matches BigQuery's `details.system.name` column emission); the badge +# filename slug and label use the user-friendly `macos` for `Darwin`. Long-tail +# values (BSD variants, null/empty system_name) are excluded — they neither +# contribute to per-OS aggregates nor surface as a badge. +_SYSTEM_NAMES: tuple[str, ...] = ("Linux", "Darwin", "Windows") +_SYSTEM_ALLOWLIST: frozenset[str] = frozenset(_SYSTEM_NAMES) + +# Per-OS badge specs: (filename_template, label_template, counts_key). +# Order matches the README dogfood layout. `counts_key` is the raw pypinfo +# emission (matches `_SYSTEM_ALLOWLIST`); the slug/label use the user-friendly +# form. Hero count is unaffected — see spec for the v0.2.0 hero-stability +# invariant. +_OS_BADGE_SPECS: tuple[tuple[str, str, str], ...] 
= ( + ("os-linux-{days}d-non-ci.json", "linux ({days}d)", "Linux"), + ("os-macos-{days}d-non-ci.json", "macos ({days}d)", "Darwin"), + ("os-windows-{days}d-non-ci.json", "windows ({days}d)", "Windows"), +) + # pypinfo's own --timeout default is 120s. Pad to 180s so the BigQuery # call has its own budget plus startup/teardown overhead before our outer # subprocess.run() abort kicks in. A subprocess hang here would otherwise From 30cad277d77e6a762fa42ce1ee727600923a0d5e Mon Sep 17 00:00:00 2001 From: "cmeans-claude-dev[bot]" <272174644+cmeans-claude-dev[bot]@users.noreply.github.com> Date: Wed, 29 Apr 2026 18:37:02 -0500 Subject: [PATCH 2/7] feat(collector): pypinfo group-by ci installer system + dual-dim aggregation Changes run_pypinfo() to query BigQuery on a 3-dimensional GROUP BY (`ci installer system`) so a single call yields both per-installer and per-system breakdowns. Return type changes from dict[str, int] to a TypedDict carrying both aggregates. The v0.2.0 hero-stability invariant is preserved: hero count (sum(by_installer.values())) is unchanged because the per-installer aggregation does not consider system_name. The per-system aggregation applies an independent allowlist filter (Linux/Darwin/Windows); rows with missing or non-allowlisted system_name drop out of by_system but still count toward by_installer when the installer is allowlisted. Co-Authored-By: Claude Opus 4.7 (1M context) --- src/pypi_winnow_downloads/collector.py | 54 +++++---- tests/test_collector.py | 162 +++++++++++++++++++++++-- 2 files changed, 185 insertions(+), 31 deletions(-) diff --git a/src/pypi_winnow_downloads/collector.py b/src/pypi_winnow_downloads/collector.py index f96c5d9..e33a11b 100644 --- a/src/pypi_winnow_downloads/collector.py +++ b/src/pypi_winnow_downloads/collector.py @@ -10,7 +10,7 @@ from dataclasses import dataclass from datetime import UTC, datetime from pathlib import Path -from typing import Any +from typing import Any, TypedDict from . 
import badge from .config import Config, PackageConfig @@ -22,6 +22,12 @@ Runner = Callable[[Sequence[str], dict[str, str]], subprocess.CompletedProcess[str]] Clock = Callable[[], datetime] + +class RunPypinfoResult(TypedDict): + by_installer: dict[str, int] + by_system: dict[str, int] + + _BADGE_LABEL_TEMPLATE = "pip*/uv/poetry/pdm ({days}d)" _BADGE_FILENAME_TEMPLATE = "downloads-{days}d-non-ci.json" _HEALTH_FILENAME = "_health.json" @@ -154,7 +160,7 @@ def run_pypinfo( *, credential_file: Path, runner: Runner = _default_runner, -) -> dict[str, int]: +) -> RunPypinfoResult: # Note: do NOT pass `-a/--auth ` on argv. pypinfo (cli.py:130-133) # short-circuits to a credential-setter path when --auth is present and # never runs the query. Use GOOGLE_APPLICATION_CREDENTIALS instead, which @@ -168,6 +174,7 @@ def run_pypinfo( package, "ci", "installer", + "system", ] # XDG_DATA_HOME isolation: pypinfo's get_credentials() (db.py:23-26 via @@ -204,41 +211,45 @@ def run_pypinfo( if not isinstance(rows, list): raise CollectorError(f"pypinfo output missing 'rows' list for {package!r}") - # Initialize counts to 0 for every allowlisted installer so the returned - # dict shape is stable regardless of which installers had rows in this - # window. Order follows _INSTALLER_NAMES so callers can rely on iteration - # order for badge filenames and tests can assert on equality with a - # specific dict literal. - counts: dict[str, int] = {name: 0 for name in _INSTALLER_NAMES} + # Initialize both dicts to zero for every allowlisted key so the returned + # shape is stable regardless of which (installer, system) pairs had rows + # in this window. Order follows _INSTALLER_NAMES / _SYSTEM_NAMES so callers + # can rely on iteration order for badge filenames and tests can assert on + # equality with specific dict literals. 
+ by_installer: dict[str, int] = {name: 0 for name in _INSTALLER_NAMES} + by_system: dict[str, int] = {name: 0 for name in _SYSTEM_NAMES} for row in rows: if not isinstance(row, dict): raise CollectorError( f"pypinfo row for {package!r} has unexpected shape (not a dict): {row!r}" ) - # pypinfo emits ci as the *string* "True" / "False" / "None" — BigQuery - # cell values are passed through str() in pypinfo's parse_query_result. - # If a future pypinfo version emits a native bool/None instead, this - # comparison would silently flip and start counting CI traffic as - # non-CI; the non-dict-row guard above catches schema breaks loudly. if row.get("ci") == "True": continue - # installer_name is required when we pivot by `installer`; missing - # the column means pypinfo's schema changed under us and we should - # fail loudly rather than silently undercount. if "installer_name" not in row: raise CollectorError( f"pypinfo row for {package!r} missing 'installer_name' field: {row!r}" ) installer = row["installer_name"] - if installer not in _INSTALLER_ALLOWLIST: - continue count = row.get("download_count", 0) if not isinstance(count, int): raise CollectorError( f"pypinfo row for {package!r} has non-integer download_count: {count!r}" ) - counts[installer] += count - return counts + + # Per-installer aggregation: hero-stability invariant — count + # the row regardless of system_name, as long as the installer is + # allowlisted. v0.2.0's hero-count contract depends on this. + if installer in _INSTALLER_ALLOWLIST: + by_installer[installer] += count + + # Per-system aggregation: independent allowlist check. Rows with + # missing/empty/non-allowlisted system_name drop out of by_system + # but may still contribute to by_installer above. 
+ system = row.get("system_name", "") + if system in _SYSTEM_ALLOWLIST: + by_system[system] += count + + return {"by_installer": by_installer, "by_system": by_system} def collect( @@ -286,12 +297,13 @@ def _collect_one( runner: Runner, ) -> PackageOutcome: try: - per_installer = run_pypinfo( + pypinfo_result = run_pypinfo( pkg.name, pkg.window_days, credential_file=config.service.credential_file, runner=runner, ) + per_installer = pypinfo_result["by_installer"] # Compute the v1 hero total + the pip-family aggregate. Build a single # dict so the per-installer badge writer below can do a uniform lookup. hero_total = sum(per_installer.values()) diff --git a/tests/test_collector.py b/tests/test_collector.py index 9e08d7d..56dc59e 100644 --- a/tests/test_collector.py +++ b/tests/test_collector.py @@ -64,6 +64,129 @@ def fake_runner(argv: list[str], env: dict[str, str]) -> subprocess.CompletedPro assert "--auth" not in argv +def test_run_pypinfo_argv_groups_by_ci_installer_system(tmp_path: Path) -> None: + """v3 OS distribution: pypinfo group-by extended from `ci installer` to + `ci installer system` so a single BigQuery call returns both dimensions.""" + captured: list[list[str]] = [] + + def fake_runner(argv: list[str], env: dict[str, str]) -> subprocess.CompletedProcess[str]: + captured.append(list(argv)) + return _ok_result(argv) + + creds = tmp_path / "creds.json" + creds.write_text("{}") + run_pypinfo("mypkg", 30, credential_file=creds, runner=fake_runner) + + assert captured, "fake_runner was never called" + argv = captured[0] + # The three positional dimension args must appear in this order at the end of argv. 
+ assert argv[-3:] == ["ci", "installer", "system"], argv + + +def _ok_rows(rows: list[dict]) -> str: + """Helper: shape an `_ok_result` JSON payload from a list of pypinfo row dicts.""" + return json.dumps({"rows": rows}) + + +def test_run_pypinfo_returns_by_installer_and_by_system(tmp_path: Path) -> None: + """Return shape is a structured dict with two aggregates.""" + stdout = _ok_rows( + [ + {"ci": "False", "download_count": 100, "installer_name": "pip", "system_name": "Linux"}, + {"ci": "False", "download_count": 30, "installer_name": "pip", "system_name": "Darwin"}, + {"ci": "False", "download_count": 20, "installer_name": "uv", "system_name": "Linux"}, + {"ci": "False", "download_count": 5, "installer_name": "uv", "system_name": "Windows"}, + ] + ) + + def fake_runner(argv, env): + return subprocess.CompletedProcess(argv, 0, stdout=stdout, stderr="") + + creds = tmp_path / "creds.json" + creds.write_text("{}") + result = run_pypinfo("mypkg", 30, credential_file=creds, runner=fake_runner) + + assert result == { + "by_installer": {"pip": 130, "pipenv": 0, "pipx": 0, "uv": 25, "poetry": 0, "pdm": 0}, + "by_system": {"Linux": 120, "Darwin": 30, "Windows": 5}, + } + + +def test_run_pypinfo_filters_out_non_allowlisted_systems(tmp_path: Path) -> None: + """Long-tail OSes (BSD, null, etc.) 
drop out of by_system but still + contribute to by_installer when the installer is allowlisted — the + v0.2.0 hero-stability invariant.""" + stdout = _ok_rows( + [ + {"ci": "False", "download_count": 100, "installer_name": "pip", "system_name": "Linux"}, + {"ci": "False", "download_count": 7, "installer_name": "pip", "system_name": "FreeBSD"}, + {"ci": "False", "download_count": 11, "installer_name": "pip", "system_name": ""}, + { + "ci": "False", + "download_count": 13, + "installer_name": "pip", + "system_name": "OpenBSD", + }, + ] + ) + + def fake_runner(argv, env): + return subprocess.CompletedProcess(argv, 0, stdout=stdout, stderr="") + + creds = tmp_path / "creds.json" + creds.write_text("{}") + result = run_pypinfo("mypkg", 30, credential_file=creds, runner=fake_runner) + + # Hero stability: by_installer["pip"] = 100 + 7 + 11 + 13 = 131 (all 4 rows count). + assert result["by_installer"]["pip"] == 131 + # by_system: only the Linux row counts; non-allowlisted/empty system_name rows drop out. + assert result["by_system"] == {"Linux": 100, "Darwin": 0, "Windows": 0} + + +def test_run_pypinfo_excludes_ci_true_from_both_dimensions(tmp_path: Path) -> None: + """CI traffic is filtered before either dimension's aggregation.""" + stdout = _ok_rows( + [ + {"ci": "True", "download_count": 9999, "installer_name": "pip", "system_name": "Linux"}, + {"ci": "None", "download_count": 50, "installer_name": "pip", "system_name": "Linux"}, + {"ci": "False", "download_count": 10, "installer_name": "pip", "system_name": "Linux"}, + ] + ) + + def fake_runner(argv, env): + return subprocess.CompletedProcess(argv, 0, stdout=stdout, stderr="") + + creds = tmp_path / "creds.json" + creds.write_text("{}") + result = run_pypinfo("mypkg", 30, credential_file=creds, runner=fake_runner) + + # CI=True row dropped; CI=None and CI=False rows count (matches v1 behavior). 
+ assert result["by_installer"]["pip"] == 60 + assert result["by_system"]["Linux"] == 60 + + +def test_run_pypinfo_handles_missing_system_name_field(tmp_path: Path) -> None: + """A row missing the system_name key entirely (older pypinfo schema or + user-agent parsing failure) must not crash; it just doesn't contribute + to by_system.""" + stdout = _ok_rows( + [ + {"ci": "False", "download_count": 42, "installer_name": "pip", "system_name": "Linux"}, + {"ci": "False", "download_count": 8, "installer_name": "pip"}, # no system_name + ] + ) + + def fake_runner(argv, env): + return subprocess.CompletedProcess(argv, 0, stdout=stdout, stderr="") + + creds = tmp_path / "creds.json" + creds.write_text("{}") + result = run_pypinfo("mypkg", 30, credential_file=creds, runner=fake_runner) + + assert result["by_installer"]["pip"] == 50 + assert result["by_system"] == {"Linux": 42, "Darwin": 0, "Windows": 0} + + def test_resolve_pypinfo_path_neighbors_sys_executable() -> None: """The resolver returns the pypinfo console script that lives in the same directory as the running Python interpreter — i.e., the same @@ -139,7 +262,9 @@ def test_run_pypinfo_real_subprocess_passes_env_to_child( result = run_pypinfo("realpkg", 30, credential_file=creds) - assert sum(result.values()) == 11, "default runner did not actually execute the subprocess" + assert sum(result["by_installer"].values()) == 11, ( + "default runner did not actually execute the subprocess" + ) observed_env, observed_argv = obs_file.read_text().splitlines() assert observed_env == str(creds), "GOOGLE_APPLICATION_CREDENTIALS did not reach child" argv_parts = observed_argv.split(",") @@ -216,7 +341,7 @@ def test_run_pypinfo_isolates_state_so_env_var_wins_over_persisted_creds( result = run_pypinfo("pkg", 30, credential_file=expected_creds) - assert sum(result.values()) == 1 + assert sum(result["by_installer"].values()) == 1 assert obs_creds.read_text() == str(expected_creds), ( "pypinfo's persisted db.json took priority over 
GOOGLE_APPLICATION_CREDENTIALS — " "XDG_DATA_HOME isolation in run_pypinfo is missing or broken" @@ -243,7 +368,7 @@ def fake_runner(argv: list[str], env: dict[str, str]) -> subprocess.CompletedPro result = run_pypinfo("mypkg", 30, credential_file=creds, runner=fake_runner) - assert sum(result.values()) == 95 + assert sum(result["by_installer"].values()) == 95 def test_run_pypinfo_filters_out_non_allowlisted_installers(tmp_path: Path) -> None: @@ -285,7 +410,7 @@ def fake_runner(argv: list[str], env: dict[str, str]) -> subprocess.CompletedPro result = run_pypinfo("mypkg", 30, credential_file=creds, runner=fake_runner) - assert sum(result.values()) == 80, ( + assert sum(result["by_installer"].values()) == 80, ( "expected 50 (pip) + 30 (uv) only; CI rows + mirrors + scrapers excluded" ) @@ -313,7 +438,14 @@ def fake_runner(argv: list[str], env: dict[str, str]) -> subprocess.CompletedPro result = run_pypinfo("mypkg", 30, credential_file=creds, runner=fake_runner) - assert result == {"pip": 50, "pipenv": 1, "pipx": 2, "uv": 60, "poetry": 11, "pdm": 3} + assert result["by_installer"] == { + "pip": 50, + "pipenv": 1, + "pipx": 2, + "uv": 60, + "poetry": 11, + "pdm": 3, + } def test_run_pypinfo_zeroes_installers_with_no_rows(tmp_path: Path) -> None: @@ -331,7 +463,14 @@ def fake_runner(argv: list[str], env: dict[str, str]) -> subprocess.CompletedPro result = run_pypinfo("solopkg", 30, credential_file=creds, runner=fake_runner) - assert result == {"pip": 100, "pipenv": 0, "pipx": 0, "uv": 0, "poetry": 0, "pdm": 0} + assert result["by_installer"] == { + "pip": 100, + "pipenv": 0, + "pipx": 0, + "uv": 0, + "poetry": 0, + "pdm": 0, + } def test_run_pypinfo_allowlist_covers_packaging_tool_family(tmp_path: Path) -> None: @@ -362,8 +501,11 @@ def fake_runner(argv: list[str], env: dict[str, str]) -> subprocess.CompletedPro result = run_pypinfo("mypkg", 30, credential_file=creds, runner=fake_runner) - assert sum(result.values()) == 63 # 1+2+4+8+16+32 - assert all(result[name] > 0 
for name in ("pip", "pipenv", "pipx", "uv", "poetry", "pdm")) + assert sum(result["by_installer"].values()) == 63 # 1+2+4+8+16+32 + assert all( + result["by_installer"][name] > 0 + for name in ("pip", "pipenv", "pipx", "uv", "poetry", "pdm") + ) def test_run_pypinfo_allowlist_is_case_sensitive(tmp_path: Path) -> None: @@ -392,7 +534,7 @@ def fake_runner(argv: list[str], env: dict[str, str]) -> subprocess.CompletedPro result = run_pypinfo("mypkg", 30, credential_file=creds, runner=fake_runner) - assert sum(result.values()) == 100, "case-mismatched variants must be excluded" + assert sum(result["by_installer"].values()) == 100, "case-mismatched variants must be excluded" def test_run_pypinfo_raises_on_missing_installer_name_field(tmp_path: Path) -> None: @@ -428,7 +570,7 @@ def fake_runner(argv: list[str], env: dict[str, str]) -> subprocess.CompletedPro result = run_pypinfo("newpkg", 30, credential_file=creds, runner=fake_runner) - assert sum(result.values()) == 0 + assert sum(result["by_installer"].values()) == 0 def test_run_pypinfo_raises_on_nonzero_exit(tmp_path: Path) -> None: From 04c5fe0d517a7ec979150b6ef77e43658464052d Mon Sep 17 00:00:00 2001 From: "cmeans-claude-dev[bot]" <272174644+cmeans-claude-dev[bot]@users.noreply.github.com> Date: Wed, 29 Apr 2026 18:44:03 -0500 Subject: [PATCH 3/7] feat(collector): emit per-OS badges (linux/macos/windows) Three new shields.io endpoint JSON files per package per window: os-linux-Nd-non-ci.json, os-macos-Nd-non-ci.json, os-windows-Nd-non-ci.json. Color logic and label format mirror the per-installer badges (blue if count >= 10 else lightgrey; parameterized by window_days). PackageOutcome gains a counts_by_system field; v0.2.0's existing fields are preserved verbatim. Total badge files per package per window increases from 8 to 11. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- src/pypi_winnow_downloads/collector.py | 18 ++++- tests/test_collector.py | 105 ++++++++++++++++++++++++- 2 files changed, 121 insertions(+), 2 deletions(-) diff --git a/src/pypi_winnow_downloads/collector.py b/src/pypi_winnow_downloads/collector.py index e33a11b..7d1da81 100644 --- a/src/pypi_winnow_downloads/collector.py +++ b/src/pypi_winnow_downloads/collector.py @@ -120,6 +120,7 @@ class PackageOutcome: window_days: int count: int | None counts: dict[str, int] | None = None + counts_by_system: dict[str, int] | None = None error: str | None = None @property @@ -304,6 +305,7 @@ def _collect_one( runner=runner, ) per_installer = pypinfo_result["by_installer"] + per_system = pypinfo_result["by_system"] # Compute the v1 hero total + the pip-family aggregate. Build a single # dict so the per-installer badge writer below can do a uniform lookup. hero_total = sum(per_installer.values()) @@ -339,6 +341,19 @@ def _collect_one( label=label_tpl.format(days=pkg.window_days), ), ) + + # Per-OS badges (linux + macos + windows). The counts_key matches + # pypinfo's raw system_name emission; the slug/label use macos for + # Darwin (user-friendly form). Hero count is unaffected. 
+ for fname_tpl, label_tpl, key in _OS_BADGE_SPECS: + os_path = config.service.output_dir / pkg.name / fname_tpl.format(days=pkg.window_days) + badge.write_badge( + path=os_path, + payload=badge.build_payload( + count=per_system[key], + label=label_tpl.format(days=pkg.window_days), + ), + ) except (CollectorError, OSError) as e: # Per-package isolation: a single package's BigQuery failure or disk # write failure must not abort the whole run, and must not skip the @@ -351,7 +366,7 @@ def _collect_one( logger.info( "collector: wrote %d badges for %s (hero count=%d, path=%s)", - 1 + len(_INSTALLER_BADGE_SPECS), + 1 + len(_INSTALLER_BADGE_SPECS) + len(_OS_BADGE_SPECS), pkg.name, hero_total, hero_path.parent, @@ -361,6 +376,7 @@ def _collect_one( window_days=pkg.window_days, count=hero_total, counts=counts, + counts_by_system=per_system, ) diff --git a/tests/test_collector.py b/tests/test_collector.py index 56dc59e..6b22520 100644 --- a/tests/test_collector.py +++ b/tests/test_collector.py @@ -1032,7 +1032,7 @@ def _seed_previous_health(output_dir: Path, finished: datetime) -> None: (output_dir / "_health.json").write_text(json.dumps(payload)) -def test_collect_writes_eight_files_per_successful_package(tmp_path: Path) -> None: +def test_collect_writes_eleven_files_per_successful_package(tmp_path: Path) -> None: creds = tmp_path / "key.json" creds.write_text("{}") output_dir = tmp_path / "out" @@ -1072,6 +1072,9 @@ def fake_runner(argv: Sequence[str], env: dict[str, str]) -> subprocess.Complete "installer-poetry-30d-non-ci.json", "installer-pdm-30d-non-ci.json", "installer-pip-family-30d-non-ci.json", + "os-linux-30d-non-ci.json", + "os-macos-30d-non-ci.json", + "os-windows-30d-non-ci.json", } assert {p.name for p in pkg_dir.iterdir()} == expected @@ -1312,3 +1315,103 @@ def test_collect_staleness_silent_on_previous_health_missing_finished_key( debug_records = [r for r in caplog.records if "previous _health.json unparseable" in r.message] assert len(debug_records) == 1 
assert debug_records[0].levelname == "DEBUG" + + +def test_collect_one_writes_three_per_os_badge_files(tmp_path: Path) -> None: + """v3 OS distribution: collector emits os-linux-30d-non-ci.json, + os-macos-30d-non-ci.json, os-windows-30d-non-ci.json with the + correct shields.io shape.""" + creds = tmp_path / "creds.json" + creds.write_text("{}") + output_dir = tmp_path / "out" + + rows = [ + {"ci": "False", "download_count": 100, "installer_name": "pip", "system_name": "Linux"}, + {"ci": "False", "download_count": 30, "installer_name": "pip", "system_name": "Darwin"}, + {"ci": "False", "download_count": 5, "installer_name": "uv", "system_name": "Windows"}, + ] + + def fake_runner(argv: Sequence[str], env: dict[str, str]) -> subprocess.CompletedProcess[str]: + return subprocess.CompletedProcess( + args=list(argv), returncode=0, stdout=json.dumps({"rows": rows}), stderr="" + ) + + config = Config( + service=ServiceConfig( + credential_file=creds, + output_dir=output_dir, + stale_threshold_days=3, + ), + packages=(PackageConfig(name="mypkg", window_days=30),), + ) + + collect(config, runner=fake_runner) + + pkg_dir = output_dir / "mypkg" + linux = json.loads((pkg_dir / "os-linux-30d-non-ci.json").read_text()) + macos = json.loads((pkg_dir / "os-macos-30d-non-ci.json").read_text()) + windows = json.loads((pkg_dir / "os-windows-30d-non-ci.json").read_text()) + + assert linux["label"] == "linux (30d)" + assert linux["message"] == "100" + assert linux["color"] == "blue" + + assert macos["label"] == "macos (30d)" + assert macos["message"] == "30" + assert macos["color"] == "blue" + + assert windows["label"] == "windows (30d)" + assert windows["message"] == "5" + # 5 < 10 → lightgrey per the existing color logic. + assert windows["color"] == "lightgrey" + + +def test_collect_one_v0_2_0_files_unchanged_alongside_os_files(tmp_path: Path) -> None: + """The v3 OS feature must not change v0.2.0's filename, schema, or value + for any given pypinfo response. 
Asserts existence + shape of all + pre-v3 files plus the 3 new OS files = 11 total per package per window.""" + creds = tmp_path / "creds.json" + creds.write_text("{}") + output_dir = tmp_path / "out" + + rows = [ + {"ci": "False", "download_count": 100, "installer_name": "pip", "system_name": "Linux"}, + ] + + def fake_runner(argv: Sequence[str], env: dict[str, str]) -> subprocess.CompletedProcess[str]: + return subprocess.CompletedProcess( + args=list(argv), returncode=0, stdout=json.dumps({"rows": rows}), stderr="" + ) + + config = Config( + service=ServiceConfig( + credential_file=creds, + output_dir=output_dir, + stale_threshold_days=3, + ), + packages=(PackageConfig(name="mypkg", window_days=30),), + ) + + collect(config, runner=fake_runner) + + pkg_dir = output_dir / "mypkg" + expected = { + "downloads-30d-non-ci.json", + "installer-pip-30d-non-ci.json", + "installer-pipenv-30d-non-ci.json", + "installer-pipx-30d-non-ci.json", + "installer-uv-30d-non-ci.json", + "installer-poetry-30d-non-ci.json", + "installer-pdm-30d-non-ci.json", + "installer-pip-family-30d-non-ci.json", + "os-linux-30d-non-ci.json", + "os-macos-30d-non-ci.json", + "os-windows-30d-non-ci.json", + } + actual = {p.name for p in pkg_dir.iterdir()} + assert expected == actual, f"missing: {expected - actual}, extra: {actual - expected}" + + # Hero schema unchanged. + hero = json.loads((pkg_dir / "downloads-30d-non-ci.json").read_text()) + assert hero["message"] == "100" + assert hero["label"] == "pip*/uv/poetry/pdm (30d)" From a8f004f24762604196801640c3f39f13e1e9085c Mon Sep 17 00:00:00 2001 From: "cmeans-claude-dev[bot]" <272174644+cmeans-claude-dev[bot]@users.noreply.github.com> Date: Wed, 29 Apr 2026 18:47:58 -0500 Subject: [PATCH 4/7] test(collector): add system_name to _fake_runner_for non-CI row Code-quality reviewer flagged that _fake_runner_for emits rows without a system_name field, making by_system always all-zero in integration tests using this helper. 
Adding system_name="Linux" to the non-CI row gives the per-OS aggregates non-zero values in those tests, exercising realistic badge-color logic instead of only the lightgrey path. Verified: 86/86 tests still pass, ruff/format/mypy clean. Co-Authored-By: Claude Opus 4.7 (1M context) --- tests/test_collector.py | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/tests/test_collector.py b/tests/test_collector.py index 6b22520..475c59e 100644 --- a/tests/test_collector.py +++ b/tests/test_collector.py @@ -715,9 +715,18 @@ def runner(argv: list[str], env: dict[str, str]) -> subprocess.CompletedProcess[ non_ci_count = counts_by_package.get(pkg, 0) stdout = json.dumps( { + # system_name on the non-CI row keeps the per-OS aggregates + # non-zero in integration tests, so OS-badge writes have realistic + # values (otherwise by_system would be all zeros and exercise only + # the lightgrey color path). "rows": [ {"ci": "True", "download_count": 10_000, "installer_name": "pip"}, - {"ci": "False", "download_count": non_ci_count, "installer_name": "pip"}, + { + "ci": "False", + "download_count": non_ci_count, + "installer_name": "pip", + "system_name": "Linux", + }, ], "query": {}, } From de8ecc30b207132fb1475a64c0fb90f274903a26 Mon Sep 17 00:00:00 2001 From: "cmeans-claude-dev[bot]" <272174644+cmeans-claude-dev[bot]@users.noreply.github.com> Date: Wed, 29 Apr 2026 18:50:22 -0500 Subject: [PATCH 5/7] feat(collector): _health.json gains counts_by_system per package MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per-package successful entries in _health.json now include counts_by_system alongside the existing counts (per-installer) field. v0.2.0 fields (count, counts, window_days) preserved verbatim — no change to existing monitoring or scripting that reads them. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- src/pypi_winnow_downloads/collector.py | 2 + tests/test_collector.py | 69 ++++++++++++++++++++++++++ 2 files changed, 71 insertions(+) diff --git a/src/pypi_winnow_downloads/collector.py b/src/pypi_winnow_downloads/collector.py index 7d1da81..e6e8605 100644 --- a/src/pypi_winnow_downloads/collector.py +++ b/src/pypi_winnow_downloads/collector.py @@ -440,6 +440,8 @@ def _write_health( entry: dict[str, Any] = {"count": o.count, "window_days": o.window_days} if o.counts is not None: entry["counts"] = o.counts + if o.counts_by_system is not None: + entry["counts_by_system"] = o.counts_by_system packages_section[o.package] = entry else: packages_section[o.package] = {"error": o.error, "window_days": o.window_days} diff --git a/tests/test_collector.py b/tests/test_collector.py index 475c59e..1ffd0dc 100644 --- a/tests/test_collector.py +++ b/tests/test_collector.py @@ -797,6 +797,7 @@ def test_collect_writes_health_file_with_per_package_counts_and_timestamps( "pdm": 0, "pip-family": 142, }, + "counts_by_system": {"Linux": 142, "Darwin": 0, "Windows": 0}, "window_days": 30, } } @@ -1424,3 +1425,71 @@ def fake_runner(argv: Sequence[str], env: dict[str, str]) -> subprocess.Complete hero = json.loads((pkg_dir / "downloads-30d-non-ci.json").read_text()) assert hero["message"] == "100" assert hero["label"] == "pip*/uv/poetry/pdm (30d)" + + +def test_health_json_includes_counts_by_system(tmp_path: Path) -> None: + """v3: per-package successful entries gain counts_by_system alongside + the existing counts field.""" + output_dir = tmp_path / "out" + creds = tmp_path / "creds.json" + creds.write_text("{}") + + stdout = _ok_rows( + [ + {"ci": "False", "download_count": 100, "installer_name": "pip", "system_name": "Linux"}, + {"ci": "False", "download_count": 30, "installer_name": "pip", "system_name": "Darwin"}, + ] + ) + + def fake_runner(argv, env): + return subprocess.CompletedProcess(argv, 0, stdout=stdout, stderr="") + + config 
= Config( + service=ServiceConfig( + credential_file=creds, + output_dir=output_dir, + stale_threshold_days=3, + ), + packages=(PackageConfig(name="mypkg", window_days=30),), + ) + collect(config, runner=fake_runner) + + health = json.loads((output_dir / "_health.json").read_text()) + pkg_entry = health["packages"]["mypkg"] + assert pkg_entry["counts_by_system"] == {"Linux": 100, "Darwin": 30, "Windows": 0} + + +def test_health_json_preserves_v0_2_0_fields(tmp_path: Path) -> None: + """v3 must not change existing _health.json fields for any given + pypinfo response. Asserts count, counts, window_days are all present + and have the expected v0.2.0 shape.""" + output_dir = tmp_path / "out" + creds = tmp_path / "creds.json" + creds.write_text("{}") + + stdout = _ok_rows( + [ + {"ci": "False", "download_count": 100, "installer_name": "pip", "system_name": "Linux"}, + ] + ) + + def fake_runner(argv, env): + return subprocess.CompletedProcess(argv, 0, stdout=stdout, stderr="") + + config = Config( + service=ServiceConfig( + credential_file=creds, + output_dir=output_dir, + stale_threshold_days=3, + ), + packages=(PackageConfig(name="mypkg", window_days=30),), + ) + collect(config, runner=fake_runner) + + health = json.loads((output_dir / "_health.json").read_text()) + pkg_entry = health["packages"]["mypkg"] + assert pkg_entry["count"] == 100 + assert pkg_entry["window_days"] == 30 + # Existing counts dict unchanged in v3. + assert pkg_entry["counts"]["pip"] == 100 + assert "pip-family" in pkg_entry["counts"] From 2e60cd57ee7bb039117f3abdcff08bea7092c527 Mon Sep 17 00:00:00 2001 From: "cmeans-claude-dev[bot]" <272174644+cmeans-claude-dev[bot]@users.noreply.github.com> Date: Wed, 29 Apr 2026 18:54:48 -0500 Subject: [PATCH 6/7] docs(README): add per-OS dogfood badges + breakdown paragraph + table rows Each dogfood package's block gains a 'By OS (30d, non-CI):' paragraph parallel to the existing 'By installer' paragraph (3 badges: linux/macos/windows). 
'What these badges actually count' gains a 'By OS breakdown' paragraph documenting the per-OS-sum <= hero gap. 'Use this service for your own package' table grows 3 rows. Co-Authored-By: Claude Opus 4.7 (1M context) --- README.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/README.md b/README.md index 3763360..5b78e51 100644 --- a/README.md +++ b/README.md @@ -28,6 +28,12 @@ any existing alternative for small or young Python packages. [![poetry downloads](https://img.shields.io/endpoint?url=https%3A%2F%2Fpypi-badges.intfar.com%2Fpypi-winnow-downloads%2Finstaller-poetry-30d-non-ci.json)](https://pypi.org/project/pypi-winnow-downloads/) [![pdm downloads](https://img.shields.io/endpoint?url=https%3A%2F%2Fpypi-badges.intfar.com%2Fpypi-winnow-downloads%2Finstaller-pdm-30d-non-ci.json)](https://pypi.org/project/pypi-winnow-downloads/) +**By OS** (30d, non-CI): + +[![linux downloads](https://img.shields.io/endpoint?url=https%3A%2F%2Fpypi-badges.intfar.com%2Fpypi-winnow-downloads%2Fos-linux-30d-non-ci.json)](https://pypi.org/project/pypi-winnow-downloads/) +[![macos downloads](https://img.shields.io/endpoint?url=https%3A%2F%2Fpypi-badges.intfar.com%2Fpypi-winnow-downloads%2Fos-macos-30d-non-ci.json)](https://pypi.org/project/pypi-winnow-downloads/) +[![windows downloads](https://img.shields.io/endpoint?url=https%3A%2F%2Fpypi-badges.intfar.com%2Fpypi-winnow-downloads%2Fos-windows-30d-non-ci.json)](https://pypi.org/project/pypi-winnow-downloads/) + ## What these badges actually count The hero badge — labelled `pip*/uv/poetry/pdm (Nd)` (N=30 in the reference @@ -67,6 +73,8 @@ Useful for spotting installer-mix shifts (e.g., uv overtaking pip on a young package). See [Use this service for your own package](#use-this-service-for-your-own-package) below for the per-installer URL pattern. +**By OS breakdown.** Each per-OS badge applies the same `details.ci != True` filter as the hero — they answer "non-CI downloads on that OS." 
`Darwin` is pypinfo's emission for what users call macOS; the badge filename and label use `macos`. The per-OS sum can be less than the hero count: rows whose user-agent didn't expose a system_name (or exposed one outside Linux/Darwin/Windows) drop out of the per-OS aggregation but still count toward the hero — same pattern as the per-installer-sum ≤ hero gap. + ## Install ```bash @@ -121,6 +129,9 @@ JSON files per configured package per window, all under | `installer-poetry-30d-non-ci.json` | `poetry (30d)` | `poetry` only | | `installer-pdm-30d-non-ci.json` | `pdm (30d)` | `pdm` only | | `installer-pip-family-30d-non-ci.json` | `pip* (30d)` | `pip + pipenv + pipx` aggregate | +| `os-linux-30d-non-ci.json` | `linux (30d)` | Per-OS, Linux | +| `os-macos-30d-non-ci.json` | `macos (30d)` | Per-OS, macOS (Darwin) | +| `os-windows-30d-non-ci.json` | `windows (30d)` | Per-OS, Windows | All files exclude CI traffic (BigQuery's `details.ci != True`). Each is a [shields.io endpoint badge](https://shields.io/badges/endpoint-badge) JSON. From a21088ed4bd0a26373e2e58b736636e2a3414fec Mon Sep 17 00:00:00 2001 From: "cmeans-claude-dev[bot]" <272174644+cmeans-claude-dev[bot]@users.noreply.github.com> Date: Wed, 29 Apr 2026 18:57:12 -0500 Subject: [PATCH 7/7] docs(CHANGELOG): record v3 OS distribution feature in Unreleased Adds the per-OS-badges entry under ## [Unreleased] / ### Added, matching the project's per-PR CHANGELOG rule and the v0.2.0 v2-feature entry's house style. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 9666b2f..1cecee8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -11,6 +11,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - **`.github/workflows/uv-lock-refresh.yml`** new scheduled workflow runs `uv lock --upgrade` every Thursday 12:00 UTC as a backstop for transitive dependency freshness — picks up minor/patch bumps that Dependabot's advisory- and cascade-driven flow hasn't yet surfaced. Skip-gate defers the run if a `dependencies` + `python`-labeled PR is already open (Dependabot mid-cycle or prior cron PR pending QA), so PRs don't overlap. Test gate (`uv sync --frozen --extra dev && uv run pytest`) blocks PR creation if the new lockfile breaks the suite. PR is opened via the existing `cmeans-claude-dev[bot]` App token (same path as `dependabot-changelog.yml`) so downstream CI checks (lint, typecheck, test, deploy-smoke) fire on the bot's push and don't leave the merge gate stuck. Most weeks: no PR (Dependabot already covered transitives via cascade). Spec: `docs/superpowers/specs/2026-04-29-uv-lock-refresh-cron-design.md`. +- **Per-OS badge files (v3 OS distribution feature).** The collector now emits three additional shields.io endpoint badge JSON files per package per window: `os-linux-Nd-non-ci.json`, `os-macos-Nd-non-ci.json`, `os-windows-Nd-non-ci.json`. The badge label format mirrors v2's parameterized `(Nd)` style — e.g., `linux (30d)`, `macos (30d)`, `windows (30d)`. Color logic (`blue` if count ≥ 10 else `lightgrey`) is unchanged. Pypinfo group-by extends from `ci installer` to `ci installer system` so a single BigQuery call returns both per-installer and per-system breakdowns; BigQuery cost is unchanged (same source table, marginal column). `run_pypinfo()`'s return type changes from `dict[str, int]` to a TypedDict carrying `by_installer` and `by_system` aggregates.
`_health.json` per-package successful entries gain a `counts_by_system` field. `PackageOutcome` gains a `counts_by_system` attribute. Filename slug and badge label use `macos` (user-friendly); the internal allowlist key is `Darwin` to match pypinfo's raw emission. No `pyproject.toml` range changes. The v0.2.0 hero-stability invariant is preserved: hero count remains `sum(by_installer.values())` regardless of system_name; per-system aggregation applies an independent allowlist filter so rows with missing or non-allowlisted system_name drop out of the per-OS aggregates but still count toward the hero. Backwards-compat: `downloads-Nd-non-ci.json` and the seven `installer-*` files unchanged in filename, schema, and value for any given pypinfo response. README dogfood blocks gain a "By OS" paragraph parallel to the existing "By installer" paragraph; "What these badges actually count" gains a "By OS breakdown" paragraph; "Use this service for your own package" table grows three rows. Spec: `docs/superpowers/specs/2026-04-29-os-distribution-badge-design.md`. + ### Changed - **`.gitignore`** ignores a private operator-tooling directory `.deploy/` at the repo root so maintainer-specific deploy scripts and design docs stay out of public history. The directory holds tooling like `update-collector.sh` (drives the CT 112 deployment via `WINNOW_REMOTE_RUN`) plus matching design / plan documents — parameterized in principle but maintainer-shaped in practice (SSH-to-Holodeck, `pct exec`, journald awareness). Other self-hosters can use plain `uv pip install --upgrade pypi-winnow-downloads`; this tooling does not need a public contract or maintenance burden. Rule is unanchored (`.deploy/`) to match the convention of the rest of the file (`.venv/`, `dist/`, `__pycache__/` etc. are all unanchored). Internal-only; no user-facing behavior change.