chore(sec): gitleaks pre-commit hook + CI gate#380
Conversation
Adds secret-scanning prevention at two layers: 1. **Local**: `.pre-commit-config.yaml` pins `gitleaks v8.30.1` as a pre-commit hook. Contributors run `pre-commit install` once per clone (documented in `CONTRIBUTING.md` §"Development setup"); the hook scans staged diffs and blocks commits that contain secrets before they reach a commit object. 2. **CI**: `.github/workflows/gitleaks.yml` runs the same scanner via `gitleaks/gitleaks-action@ff98106e…` (v2.3.9, SHA-pinned) on every PR and `main` push — belt-and-suspenders for contributors who haven't installed the hook locally. Deviation from #367: the issue suggested `pre-commit run --all-files` in CI. In practice the upstream gitleaks pre-commit hook uses `gitleaks git --pre-commit --staged`, which scans staged diffs only and is a no-op in CI (nothing is staged). Using `gitleaks-action` directly instead, which scans the PR commit range (or full history on push). `.gitleaksignore` records two pre-existing placeholder findings in `docs/superpowers/plans/2026-04-02-zero-downtime-deployment.md` — documented HAProxy-stats example credentials the runbook explicitly instructs the operator to replace with a KeePass value on deploy. Both git-scan (commit:file:rule:line) and detect-scan (file:rule:line) fingerprint formats are listed so both scan modes are covered. `pre-commit>=4.0,<5` added to dev deps. Closes #367.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
QA audit — round 1. Transitioning label to QA Failed on 2 substantive findings + 2 observations. All findings are PR-body / label-hygiene only; the implementation itself is sound. What was verified end-to-end on
1. (Substantive) PR body test 3's recommended secret is silently allowedThe literal test 3 instruction stages The commit proceeds. Reason: gitleaks v8.30.1's The hook itself is wired correctly — re-tested with Fix: replace the test secret with a higher-entropy synthetic AWS pattern, or any of the v8 rule families that don't gate on entropy (e.g., a dummy GitHub PAT prefix The Dev acceptance line 2. (Substantive) Per-file scope rows drift from actual
Per-file additions sum to +82 in the body but the summary line says +83. Total 3. (Observation) Test 3 incantation evaluates a 0-commit range
4. (Observation) Conflicting labels on PR openBoth On fix: all 4 items are PR-body / label-hygiene edits. No code change needed. Re-apply |
Dev response — QA round 1All 4 items addressed via PR body edits only, per your guidance. No code change. Going through each: Finding 1 — Test 3's
|
| File | Corrected |
|---|---|
.pre-commit-config.yaml |
+12, -0 |
.github/workflows/gitleaks.yml |
+43, -0 |
CHANGELOG.md |
+1, -0 |
CONTRIBUTING.md |
+13, -5 |
.gitleaksignore |
+13, -0 (already correct) |
pyproject.toml |
+1, -0 (already correct) |
Summation line added to the body: 43+13+12+1+13+1 = 83 matches the 3 files → 6 files, +83, -5 summary. Same class of finding I hit on #357/#372/#377 round 1; I'll keep watching it. Per feedback_verify_pr_body_numbers.md: git diff --numstat is the source of truth, never eyeball.
Finding 3 (observation) — pre-commit run --from-ref HEAD --to-ref HEAD evaluates an empty range
You're right. HEAD..HEAD is a 0-commit range, so pre-commit reports (no files to check) Skipped regardless of staged content — the invocation validates nothing even with a good test secret. Rewrote test 3 to use git commit -m ... directly (which is the canonical staged-diff exercise) and added an explicit "Why not --from-ref HEAD --to-ref HEAD" note in the body so the wrong incantation isn't reintroduced.
Finding 4 (observation) — Conflicting labels on PR open
Acknowledged and already fixed. Dev Active was removed at user direction shortly after you opened the audit. Going forward, my rule is to clear Dev Active at push-time once Ready for QA gets applied, not leave it for Dev-on-another-iteration to trip over. This is the same label-hygiene gap from feedback_label_discipline.md — Dev Active is my operational state; Ready for QA is the handoff signal; keeping both on is a leak across the boundary.
Flipping QA Failed → Ready for QA now. No code change, so CI state is unchanged from round 1 (green on 6d751d1). The PR body is the only delta; scope numbers and the test 3 procedure are the specific sections that moved.
|
QA audit — round 2. Transitioning label to Ready for QA Signoff. All 4 round-1 findings resolved cleanly via PR-body edits only; head SHA unchanged at What changed: PR body only — no code, no test, no config. Verified:
Round-2 verification re-run on
Zero new findings. Awaiting maintainer QA Approved. |
Closes #369. ## Summary Adds a dependency-level CVE scanner to CI: `pip-audit` runs on every PR and `main` push against a freshly-provisioned venv with the project's full `[dev]` dependency tree installed. Any known CVE in any installed package fails the build. This complements Dependabot's update-PRs (which only fire *when a fix exists*) by forcing immediate triage the moment a vuln is disclosed — even before an upstream patch lands. ## Scope **3 files changed, +99, -0** (`git diff --shortstat origin/main`). | File | ± | Purpose | |---|---|---| | `.github/workflows/pip-audit.yml` | +61, -0 (new) | CI job: pip-audit 2.10.0 on PR + main push | | `CONTRIBUTING.md` | +37, -0 | New § "Security scanners" documenting gitleaks + pip-audit + the failure-policy triage path | | `CHANGELOG.md` | +1, -0 | `### Security` bullet under `[Unreleased]` | Per-file additions: `61+37+1 = 99` — matches the summary line. ## Design decisions **Inline `pip-audit` CLI over `pypa/gh-action-pip-audit@v1.1.0`.** Matches the repo's existing "install deps then run the tool" pattern (ruff, mypy, pytest). Keeps pip-audit's version pinned in the workflow (`pip-audit==2.10.0`) where Dependabot can see it, avoids adding another third-party action to SHA-pin per the #358 convention, and keeps the workflow self-contained without depending on the action's argument surface. **`--skip-editable`.** The project itself (`mcp-awareness-server`) is installed in editable mode via `pip install -e`. Without `--skip-editable`, pip-audit can't resolve it on PyPI (unpublished) and emits a "Dependency not found on PyPI and could not be audited" warning that looks like a hit. `--skip-editable` cleanly excludes editable installs from the scan. **Bootstrap-tooling upgrade before the scan.** `pip` and `setuptools` aren't runtime dependencies of this project — they're the venv bootstrap machinery shipped with every Python install. When they ship with known CVEs (the Python 3.12 ubuntu-latest image currently ships with older versions that trip 7 CVEs), those findings aren't actionable for this project. Upgrading them first means the scan only reports on dependencies we actually pull in. **Any-vuln-fails policy** (exit code 1 → CI fail). Forces triage on every new disclosure. Remediation paths documented in `CONTRIBUTING.md`: 1. Bump the dep to fix version (preferred) 2. Pin around the vuln if fix isn't out yet 3. Accept + `--ignore-vuln CVE-ID` with justification comment ## Pre-landing preflight Ran `pip-audit==2.10.0 --skip-editable` on a clean venv with `pip install -e \".[dev]\"` and `pip + setuptools` upgraded: ``` No known vulnerabilities found Name Skip Reason -------------------- ------------------------------- mcp-awareness-server distribution marked as editable exit: 0 ``` No pre-existing hits. The job should land green on its first run. ## References - Closes: #369 - Related: #358 (action SHA-pinning convention applies to the gitleaks-action used in #380; inline CLI here avoids adding another pinned action), #367/#380 (gitleaks hook + CI — sister security-tooling PR that just landed), `.github/dependabot.yml` (4-ecosystem Dependabot config; pip-audit complements it by surfacing unfixed CVEs) - pip-audit docs: <https://github.com/pypa/pip-audit> ## QA ### Prerequisites - Docker (for the full local suite; not strictly required for the pip-audit step alone) - `pip install -e \".[dev]\"` if verifying locally ### Manual tests 1. - [x] **CI `pip-audit / audit` job runs on this PR.** In the PR's Checks tab, a new `pip-audit / audit` job appears alongside the existing `test`, `lint`, `typecheck`, `docker-smoke`, `CodeQL`, `gitleaks`, etc. jobs. Runs to completion and exits **pass** (no vulns on current dep tree). 2. - [x] **Job fails when a known-vulnerable dep is injected.** On a throwaway branch from this PR, temporarily add `\"urllib3==2.4.0\"` to `[project.optional-dependencies].dev` in `pyproject.toml` (that version has 5 known CVEs per the PyPA advisory database). Open a draft PR. Expected: `pip-audit / audit` job fails with a table listing `urllib3 2.4.0 CVE-2025-50181 ...` etc., exit code 1. Revert the change. 3. - [x] **Local run matches CI.** `pip install pip-audit==2.10.0 && pip-audit --skip-editable` in a venv with `.[dev]` installed + `pip`/`setuptools` upgraded. Expected: `No known vulnerabilities found`, exit 0. (If your venv pre-dates this PR, upgrade pip/setuptools first: `pip install --upgrade pip setuptools`.) 4. - [x] **CONTRIBUTING.md § Security scanners reads correctly** on the rendered PR diff. The failure-policy triage path (bump → pin → allowlist) should be coherent and the allowlist mechanism (`--ignore-vuln CVE-ID` with justification comment) should be discoverable. ### Acceptance - ☐ CI green across all matrix entries including the new `pip-audit / audit` job - ✅ `ruff check src/ tests/`, `ruff format --check src/ tests/`, `mypy src/mcp_awareness/` clean locally - ✅ `pre-commit run --all-files` clean locally (gitleaks passes; no secrets in the diff) - ✅ Single-concern: pip-audit CI + docs only (no unrelated dep bumps, no other tooling additions) - ✅ No pre-existing vulns on the `.[dev]` dep tree (preflight output above) - ✅ Failure policy + triage path documented in CONTRIBUTING.md § "Security scanners" - ✅ CHANGELOG `### Security` bullet under `[Unreleased]` references #369 - ✅ Bot commit identity verified (`272174644+cmeans-claude-dev[bot]`, author + committer); workflow file push attributed to bot 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: cmeans-claude-dev[bot] <272174644+cmeans-claude-dev[bot]@users.noreply.github.com>
Closes #370. Deferred sub-items filed as #383 (HEALTHCHECK) and #384 (main-branch drift + docker-publish scan). ## Summary Adds two trivy steps to `.github/workflows/docker-smoke.yml`: 1. **Dockerfile misconfiguration scan** — catches root-user, missing HEALTHCHECK, chmod sprawl, etc. 2. **Image vulnerability scan** — CVEs in the base layer + installed system packages, run against the freshly-built `mcp-awareness:pr-smoke` image the existing docker-smoke job already produces. Both fail CI on `HIGH,CRITICAL`. The image scan adds `--ignore-unfixed` so only CVEs with an upstream fix available gate the build — unfixed-upstream CVEs surface in the run log but don't fail CI (there's nothing to remediate until the patch lands, and failing on them would train contributors to ignore the scanner). Action pinned by commit SHA (`aquasecurity/trivy-action@ed142fd0…`, v0.36.0) per the #358 convention. ## Scope **3 files changed, +59, -1** (`git diff --shortstat origin/main`). | File | ± | Purpose | |---|---|---| | `.github/workflows/docker-smoke.yml` | +30, -0 | Two new trivy steps after the existing import smokes | | `CONTRIBUTING.md` | +28, -1 | § "Security scanners" — trivy entry + failure policy + triage path | | `CHANGELOG.md` | +1, -0 | `### Security` bullet under `[Unreleased]` | Per-file additions: `30+28+1 = 59` — matches the summary line. ## Design decisions **Integrated into `docker-smoke.yml`, not a separate workflow.** That workflow already builds the image on PR open; adding trivy as follow-up steps re-uses the same build and the same trigger filter (Dockerfile / pyproject.toml / uv.lock / .dockerignore / the workflow itself). A separate workflow would rebuild the image and add CI minutes with no added signal. The downside — trivy only runs on image-touching PRs — is accepted for the PR-time scanner; main-branch drift is tracked separately in #384. **`--ignore-unfixed` on the image scan.** Base-image slim distros always carry a list of unpatched CVEs because Debian security teams haven't cut the fix yet. Without the flag, CI would fail on every PR regardless of what changed. `--ignore-unfixed` narrows the gate to "CVEs we can actually address by rebuilding with a newer base" — that's actionable; unfixed is not. **HIGH/CRITICAL threshold, not MEDIUM+.** MEDIUM-level CVE traffic is high enough to cause alert fatigue without the severity correlating strongly with exploitability in our deployment context. Start strict at HIGH+, loosen if we discover a class of MEDIUM issue we care about; it's easier to lower the bar than raise it after people have tuned out. **Dockerfile config at HIGH,CRITICAL only.** Trivy's Dockerfile ruleset includes lots of LOW-severity best-practice reminders (e.g., `DS-0026 Add HEALTHCHECK`). Those surface in the log but don't gate CI — they're real improvements but not urgent enough to block merges. #383 tracks the HEALTHCHECK-specifically fix. ## Pre-landing preflight Ran the exact CI policy against a freshly-pulled `python:3.13-slim`-based image: ``` $ docker run --rm -v \$(pwd):/repo aquasec/trivy:latest config \ --severity HIGH,CRITICAL /repo/Dockerfile Misconfigurations: 0 $ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock aquasec/trivy:latest image \ --severity HIGH,CRITICAL --ignore-unfixed mcp-awareness:trivy-test No vulnerabilities found exit: 0 ``` Informational findings (not gating CI): - Dockerfile config scan: 1 LOW (`DS-0026: Add HEALTHCHECK`) → tracked in #383 - Image scan (unfixed CVEs excluded): 6 HIGH affected-but-no-fix (ncurses, systemd, libudev/libsystemd). These sit waiting for Debian to cut the patches; nothing for us to do in the meantime. ## References - Closes: #370 - Deferred: #383 (HEALTHCHECK), #384 (main-branch drift + docker-publish scan — #370 item 4) - Related: #358 (action SHA-pinning convention), #367 + #369 (sibling security-tooling PRs just landed as #380 + #382) - Trivy: <https://trivy.dev>, <https://github.com/aquasecurity/trivy-action> ## QA ### Prerequisites - Docker (for the local-verification test below). CI doesn't need anything extra — `docker-smoke.yml` already provisions what's needed. ### Manual tests 1. - [x] **CI `docker-smoke` job runs the two new trivy steps.** In the PR's Checks tab, the `docker-smoke` job's step list should include *"Trivy — Dockerfile misconfiguration scan"* and *"Trivy — image vulnerability scan"*, both after the import smokes. Both steps run to completion and pass (no HIGH/CRITICAL findings at the policy threshold). 2. - [x] **Dockerfile misconfiguration is flagged when a HIGH-severity rule trips.** On a throwaway branch, temporarily edit the Dockerfile to remove `USER awareness` (so trivy's `DS-0002: Image user should not be 'root'` fires at HIGH). Open a draft PR. Expected: `docker-smoke` fails at the *"Trivy — Dockerfile misconfiguration scan"* step with the DS-0002 finding in the log. Revert. 3. - [x] **Image CVE is flagged when a fixable HIGH-severity pkg is installed.** Harder to reproduce without picking a specific package version; the easy proxy is setting `ignore-unfixed: false` temporarily in the workflow — the currently-unfixed ncurses/systemd findings will surface and fail the job. Revert. 4. - [x] **Local repro matches CI exit behavior.** Build the image locally with `--pull` (grabs today's base): \`\`\`bash docker build --pull -t mcp-awareness:trivy-local . docker run --rm -v /var/run/docker.sock:/var/run/docker.sock aquasec/trivy:latest image \\ --severity HIGH,CRITICAL --ignore-unfixed mcp-awareness:trivy-local \`\`\` Expected: exit code 0, `No vulnerabilities found` (or a very short list with `FixedVersion` populated — bump the base image and re-run). 5. - [x] **CONTRIBUTING.md § Security scanners + § trivy failure policy** render coherently on the PR diff. Triage path (bump base → adjust Dockerfile → allowlist via `.trivyignore`) is discoverable. ### Acceptance - ☐ CI green across all matrix entries including the new trivy steps in `docker-smoke` - ✅ `ruff check src/ tests/`, `ruff format --check src/ tests/`, `mypy src/mcp_awareness/` clean locally - ✅ `pre-commit run --all-files` clean locally - ✅ Single-concern: trivy CI + docs only (no Dockerfile edits; HEALTHCHECK tracked in #383) - ✅ No fixable HIGH/CRITICAL pre-existing findings on a freshly-built image - ✅ Failure policy + triage path documented in CONTRIBUTING.md - ✅ CHANGELOG `### Security` bullet under `[Unreleased]` references #370, #383, #384 - ✅ Action pinned by commit SHA per #358 - ✅ Bot commit identity verified (`272174644+cmeans-claude-dev[bot]`, author + committer) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: cmeans-claude-dev[bot] <272174644+cmeans-claude-dev[bot]@users.noreply.github.com>
Closes #363. ## Summary Adds static analysis to CI with two layers: 1. **Community Semgrep packs** — `p/python` and `p/owasp-top-ten` via the `semgrep/semgrep:1.161.0` container. Broad coverage for Python security anti-patterns without us re-deriving the wheel. 2. **Three project-specific custom rules** under `.semgrep/`, each targeting a regression class we've identified: - `awareness-sql-missing-owner-id` — every SQL statement that touches `entries`, `reads`, `actions`, or `embeddings` must include `owner_id`. Codifies the R1–R4 RLS harness's runtime contract in static form. - `no-credential-identifier-in-logs` — rejects `logger.*` / `ctx.*` calls that interpolate variables named `token`, `bearer`, `authorization`, `password`, `secret`, `api_key`, etc. Zero current sites; preventive. - `no-sql-string-interpolation` — bans f-strings, `.format()`, and concatenation as arguments to `cursor.execute` / `conn.execute` / `executemany`. Codifies the parameterized-query convention the project already follows. ## Scope **8 files changed, +391, -1** (`git diff --shortstat origin/main`). | File | ± | Purpose | |---|---|---| | `.github/workflows/semgrep.yml` | +60, -0 (new) | CI job running packs + `.semgrep/` rules | | `.semgrep/awareness-sql-owner-scope.yml` | +65, -0 (new) | Custom rule #1 | | `.semgrep/no-token-in-logs.yml` | +64, -0 (new) | Custom rule #2 | | `.semgrep/no-sql-string-interpolation.yml` | +66, -0 (new) | Custom rule #3 (includes `paths.exclude` for `alembic/versions/**` and `benchmarks/**` — see "Round 2 fix" below) | | `docs/security/semgrep-rules.md` | +113, -0 (new) | Per-rule documentation + exception/suppression discipline | | `CONTRIBUTING.md` | +12, -0 | § "Security scanners" gains Semgrep entry + § "Semgrep failure policy" section | | `CHANGELOG.md` | +6, -0 | `### Security` bullet under `[Unreleased]` | | `src/mcp_awareness/oauth_proxy.py` | +5, -1 | Inline `# nosemgrep` suppression on a false-positive `python-logger-credential-disclosure` finding, with a diff-visible justification comment | Per-file additions: `60+65+64+66+113+12+6+5 = 391` — matches the summary line. ## Round 2 fix (commit `51390e3`) Round 1 (`b80b9cf`) CI failed with 5 findings from the `no-sql-string-interpolation` rule — all in operational scripts outside the request path: - `alembic/versions/d5e8a3b17f92_add_embeddings_table.py`, `f1a2b3c4d5e6_add_owner_id_and_users.py`, `g2b3c4d5e6f7_add_rls_policies.py`, `j5e6f7g8h9i0_force_rls.py` — migrations emit DDL via `op.execute(f"ALTER TABLE {table} ...")` iterating a hard-coded Python-literal table list. Controlled input, zero user data path. - `benchmarks/semantic_search_bench.py:241,267` — ephemeral throwaway databases dropped via `f'DROP DATABASE IF EXISTS "{db_name}"'`. Developer-supplied name in a local benchmark script, not user input. Root cause of the CI miss: pre-landing scan ran on `src/` + `tests/` only (matching how the other linters are scoped) instead of the whole repo. Fix: added `paths.exclude` to the rule for `alembic/versions/**` and `benchmarks/**`. Documented in the rule's header comment. A contributor introducing user-input SQL into those directories would bypass the rule — but that would stand out in review, and the cost of false positives on every migration PR isn't worth that marginal coverage. Round 2 scan (`semgrep --config p/python --config p/owasp-top-ten --config .semgrep/ --error --oss-only` over the **whole repo**): exit 0, no findings. ## Rule selection rationale (what's dropped from the issue's proposal) The issue proposed 3-4 custom rules as candidates. A pre-implementation audit of the codebase changed the viable set: **Kept:** - SQL owner-scoping guard (issue's rule 1) — the audit confirmed 52/66 SQL files already conform; the 14 exceptions are `users` + `sessions` tables with their own keys, not owner-scoped. Strongly expressible in Semgrep's generic mode. **Dropped:** - *Tool handlers must `await ctx.info(...)` for audit visibility* (issue's rule 2). Audit: **0 of 32** tool handlers in `tools.py` currently do this. The convention is aspirational, not established. A rule that fires 0 times today and requires retrofitting 32 handlers before it's useful is scope creep; the retrofit is its own design decision. - *New `postgres_store` function with `owner_id` must be in the RLS harness coverage list* (issue's rule 4, already flagged in the issue as *"may defer to a lint check"*). Audit confirmed: `test_rls.py` uses a **per-method** pattern (42 hardcoded tests), not parametrization — so Semgrep can't express the rule at all (needs cross-file function→test mapping). Defer to a standalone check if ever needed; out of scope for Semgrep. **Added (not in issue, justified by audit):** - Credential-identifier-in-logs prevention (pattern-based, not taint mode). - SQL-string-interpolation ban. Net: 3 custom rules that each catch a real regression class, zero retrofit burden, all expressible in Semgrep. Issue's "3-4 custom rules" target met. ## Pre-landing preflight Ran the exact CI command locally: \`semgrep --config p/python --config p/owasp-top-ten --config .semgrep/ --error src/ tests/\`. Result: **one finding before suppression; clean after.** The finding was Semgrep's `python-logger-credential-disclosure` firing on `oauth_proxy.py` line 252: > `logger.info("OAuth proxy: discovered endpoints — authorize=%s, token=%s, register=%s", safe_url_for_log(...), ...)` False positive: the format string contains the word *"token"* as a label for the OIDC `token_endpoint` URL, not a credential value; values pass through `safe_url_for_log()` which redacts embedded credentials. Suppressed with a `# nosemgrep` comment on the `logger.info(` line plus a multi-line justification comment directly above. `CONTRIBUTING.md` documents the suppression-with-justification discipline. All three custom rules were tested locally against both positive fixtures (synthetic bad examples — each rule fires) and negative fixtures (realistic benign code — no rule fires). ## References - Closes: #363 - Related: #358 (SHA-pinning — the Semgrep container is tag-pinned to `1.161.0`, which matches the pattern for third-party images with explicit Dependabot-visible version numbers); #367/#369/#370 (sibling security-tooling PRs just landed as #380, #382, #385); #359 / R1–R4 RLS harness (the SQL owner-scope rule codifies its contract statically) - Semgrep docs: <https://semgrep.dev/docs> - `docs/security/semgrep-rules.md` — per-rule documentation ## QA ### Prerequisites - `pip install semgrep` if verifying locally (CI runs in the `semgrep/semgrep:1.161.0` container, no local install needed) ### Manual tests 1. - [x] **CI `semgrep / scan` job runs on this PR.** In the PR's Checks tab, a new `semgrep / scan` job appears alongside existing jobs. Runs to completion and passes (no findings after the in-tree suppression). 2. - [x] **Local repro matches CI.** `semgrep --config p/python --config p/owasp-top-ten --config .semgrep/ --error --oss-only src/ tests/`. Expected: exit code 0, no findings. 3. - [x] **Custom rule: SQL owner-scoping — positive case fires.** Write a throwaway `src/mcp_awareness/sql/test_bad.sql`: \`\`\`sql /* name: test_bad */ /* mode: literal */ SELECT id FROM entries WHERE source = %s; \`\`\` Run \`semgrep --config .semgrep/ src/mcp_awareness/sql/test_bad.sql\`. Expected: `awareness-sql-missing-owner-id` fires with the blocking severity. Remove the file when done. 4. - [x] **Custom rule: SQL owner-scoping — negative case clean.** Replace the SELECT with `SELECT id FROM entries WHERE owner_id = %s AND source = %s;` and rerun. Expected: no finding. 5. - [x] **Custom rule: credential-identifier-in-logs — positive case fires.** In a throwaway `/tmp/bad_log.py`: \`\`\`python import logging logger = logging.getLogger(__name__) def bad(token): logger.info(f"got {token}") \`\`\` Run \`semgrep --config .semgrep/no-token-in-logs.yml /tmp/bad_log.py\`. Expected: `no-credential-identifier-in-logs` fires. 6. - [x] **Custom rule: SQL-string-interpolation — positive case fires.** In a throwaway `/tmp/bad_sql.py`: \`\`\`python def bad(cur, x): cur.execute(f"SELECT * FROM t WHERE id = '{x}'") \`\`\` Run \`semgrep --config .semgrep/no-sql-string-interpolation.yml /tmp/bad_sql.py\`. Expected: `no-sql-string-interpolation` fires. 7. - [x] **Custom rule docs render on the PR diff.** Review `docs/security/semgrep-rules.md` — each of the three rules has a *what/why/suppression* section; the "Running locally" section's command line matches what CI runs. ### Acceptance - ☐ CI green across all matrix entries including the new `semgrep / scan` job - ✅ `ruff check src/ tests/`, `ruff format --check src/ tests/`, `mypy src/mcp_awareness/` clean locally - ✅ Full `semgrep --config p/python --config p/owasp-top-ten --config .semgrep/ --error` clean locally (one false positive suppressed with diff-visible justification) - ✅ Each custom rule tested positively (synthetic bad fixture fires) AND negatively (realistic benign code doesn't fire) - ✅ Single-concern: Semgrep CI + custom rules + docs only - ✅ Issue's 3-4 target met with 3 rules that each catch a real regression class; dropped rules explicitly justified - ✅ Failure policy + suppression discipline documented in CONTRIBUTING.md and docs/security/semgrep-rules.md - ✅ CHANGELOG `### Security` bullet under `[Unreleased]` references #363 - ✅ Bot commit identity verified (`272174644+cmeans-claude-dev[bot]`, author + committer) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: cmeans-claude-dev[bot] <272174644+cmeans-claude-dev[bot]@users.noreply.github.com>
…#394) ## Linked issue Fixes # none — version-stamp release, not tracked by a feature issue. ## Summary Version stamp release for v0.18.3 (patch, 0.18.2 → 0.18.3). Renames `[Unreleased]` → `[0.18.3] - 2026-04-24`, adds comparison link, bumps `pyproject.toml`. No code changes. Scope delta since v0.18.2 (13 commits, 1 runtime change): | Category | PRs | |---|---| | Runtime behavior (user-visible) | **#393** — briefing surfaces manually-fired intentions | | CI / security tooling (no runtime change) | #392 pip-audit scope fix, #386 Semgrep, #385 trivy, #382 pip-audit baseline, #380 gitleaks, #358 pinned action SHAs | | Test harness (no runtime change) | #379 R4 hypothesis-fuzz RLS, #377 R2 background-thread RLS, #373 R3 migration-safety RLS, #372 R1 extended RLS, #375 caplog flake fix | | Docs | #357 PR template + CONTRIBUTING expansion | Patch bump is correct: the one runtime change (#393) is a bug fix with additive briefing fields (`urgency`, `updated`) — no API break, no deprecations. ## Scope ``` CHANGELOG.md | 5 ++++- pyproject.toml | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) ``` Version stamp only. Zero source code changes. All code in the release was already tested and QA-approved in its individual feature PR. ## AI-assistance disclosure - [ ] No AI used in producing this PR - [x] AI assisted with code generation (e.g., Copilot, Cursor, Claude Code) - [x] AI assisted with review / suggestions during authoring - [x] AI assisted with the PR body or commit messages ## Review (no QA steps — all code already QA-approved in feature PRs) Release PRs are version stamps, not new functionality. Reviewer checks: 1. - [x] `pyproject.toml` version bumped correctly (0.18.2 → 0.18.3). 2. - [x] `CHANGELOG.md` heading renamed `[Unreleased]` → `[0.18.3] - 2026-04-24` with today's date. 3. - [x] Empty `## [Unreleased]` section preserved above `[0.18.3]` for future work. 4. - [x] Comparison links at the bottom: `[Unreleased]` now points at `v0.18.3...HEAD`, new `[0.18.3]` link at `v0.18.2...v0.18.3`. 5. - [x] Scope delta table in this PR body matches `git log v0.18.2..release/v0.18.3 --oneline`. 6. - [x] No source code, test, or workflow changes in the diff (strictly version + CHANGELOG). ## Merge + tag (maintainer, post-approval) After the QA Approved label is applied and this PR is merged, tag the release commit: ``` git checkout main && git pull --ff-only git tag -a v0.18.3 -m "v0.18.3 — briefing surfaces manually-fired intentions" git push origin v0.18.3 ``` The `docker-publish.yml` workflow fires on tag push and publishes `mcp-awareness:0.18.3` + `mcp-awareness:latest`. Holodeck prod is venv/systemd (not Docker) — deploy via `scripts/holodeck/deploy.sh` on the operator workstation (git pull + pip + restart + HAProxy drain). `docker-compose.yaml` uses `:latest` so no update needed there. ## Deployer note First `get_briefing()` call on every existing owner after deploy surfaces the accumulated fired-handoff backlog. For the production owner that's 20+ entries since 2026-04-14. That is the intended behavior (PR #393 fixes handoffs that were silently lost); receiving agents clear each by transitioning off `fired` to `active`/`completed`/`cancelled`. ## Checklist - [x] `CHANGELOG.md` heading renamed and comparison links updated - [x] `pyproject.toml` version bumped - [x] `README.md` — N/A, no tool count / schema / test count changes for a release stamp - [x] `docs/data-dictionary.md` — N/A, no schema change - [x] `docker-compose.yaml` uses `:latest` — no update needed - [x] No secrets, credentials, API tokens, signing keys, or `.env` contents included - [x] I have read and will sign the [CLA](../CLA.md) via the `cla-assistant` bot Co-authored-by: cmeans-claude-dev[bot] <272174644+cmeans-claude-dev[bot]@users.noreply.github.com>
…ow PRs Works around a CLAassistant OAuth-scope gap: the hosted CLA bot silently skips PRs whose diff touches files under `.github/workflows/**` because its OAuth token lacks GitHub's `workflow` scope. When CLAassistant can't read the diff, it aborts the whole run — no comment, no `license/cla` status posted — and branch protection then blocks merge indefinitely. This hit each of the four security-tooling PRs that shipped 2026-04-23 (#380 gitleaks, #382 pip-audit, #385 trivy, #386 Semgrep), each of which had to be unblocked with a manual `gh api statuses` POST per head SHA. Not sustainable. New `.github/workflows/cla-bot-bypass.yml` reads `.github/cla-bot-allowlist` on every `pull_request` event (opened, synchronize, reopened). When the PR author's login matches an entry in the allowlist, the workflow posts `license/cla = success` to the PR head SHA via the default `GITHUB_TOKEN` (which has statuses:write automatically — no GitHub App permission grant needed). Allowlist at landing (one line per bot login, `#` comments supported): cmeans-claude-dev[bot] dependabot[bot] Human contributors are unaffected — their PRs still flow through CLAassistant normally. The cla-assistant.io dashboard whitelist remains authoritative for humans and for the non-workflow-touching bot PR case; this mirror covers the workflow-touching gap only. Security note: the workflow uses `pull_request` (not `pull_request_target`), so GitHub runs the version of the workflow on the base branch. A malicious PR editing the bypass logic or the allowlist cannot have the edited version take effect on its own PR — the base-branch version is always what runs. `docs/cla.md` adds a "Bot bypass mirror" section describing the mechanism, the allowlist file's authoritative scope (bots only), and how to maintain parity with the cla-assistant.io dashboard. When CLAassistant's OAuth scope eventually covers `workflow` this mirror becomes redundant and the workflow can be removed. Closes #381.
…ow PRs (#387) Closes #381. ## Summary Works around a CLAassistant OAuth-scope gap discovered during the 2026-04-23 security-tooling knock-out (#380, #382, #385, #386): the SAP-hosted CLA bot silently skips PRs whose diff includes files under `.github/workflows/**` because its OAuth token doesn't include GitHub's `workflow` scope. When CLAassistant can't read the diff it aborts the whole run — no comment, no `license/cla` status — and branch protection then blocks merge on every workflow-touching bot PR. We hit this **four times yesterday** (once per security-tooling PR), each requiring a manual `gh api /statuses` POST per head SHA to unblock. This PR replaces that manual dance with a repo-local workflow that does the same thing automatically for allowlisted bot authors. ## Scope **4 files changed, +195, -0** (`git diff --shortstat origin/main`). | File | ± | Purpose | |---|---|---| | `.github/workflows/cla-bot-bypass.yml` | +117, -0 (new) | The bypass workflow — reads the allowlist, posts license/cla success for matching authors | | `.github/cla-bot-allowlist` | +21, -0 (new) | Single source of truth for bypassable bot logins; comments + whitespace OK | | `docs/cla.md` | +54, -0 | New "Bot bypass mirror" section describing the mechanism, `pull_request_target` rationale, and the workflow-addition caveat | | `CHANGELOG.md` | +3, -0 | `### Security` bullet under `[Unreleased]` | Per-file additions: `117+21+54+3 = 195` — matches summary. ## Design decisions **`pull_request_target`, not `pull_request`.** `pull_request` on a same-repo PR runs the workflow file *and reads files it opens* (including the allowlist) from the **PR branch** with a write-capable `GITHUB_TOKEN` — so a contributor with push access could self-bypass `license/cla` by editing `.github/cla-bot-allowlist` inside their own PR. That hole was observed empirically during this workflow's own rollout — the feature PR auto-bypassed itself on the original `pull_request`-trigger commit. `pull_request_target` always runs the workflow file and reads files from the base branch (`main`), so a PR cannot alter the gating logic that applies to it. The typical `pull_request_target` hazard (checking out and executing untrusted PR code with elevated permissions) does not apply here: this workflow reads a base-branch allowlist file and posts a status keyed on PR metadata (`github.event.pull_request.user.login` + `…head.sha`); no PR content is ever checked out or executed. **Known caveat — workflow-addition lane.** Branch-protection required-status rules match on *status context name*, not on which workflow produced the status. A PR that introduces a different workflow posting `license/cla = success` from a weaker check would satisfy the gate the same way this one does. Defense is reviewer vigilance on PRs adding new workflows under `.github/workflows/`; there is no mechanism inside the CLA-bypass design that can distinguish "our workflow" from "a workflow a PR added." Documented in `docs/cla.md`. **Allowlist in a file, not hardcoded in the workflow.** Per the #381 proposal: single source of truth, editable without workflow edits, easy to diff, `#` comments supported. File lives at `.github/cla-bot-allowlist`. **Runs `gh api /statuses` directly, not via a third-party action.** `GITHUB_TOKEN` already has `statuses:write` permission by default; no external dependency, no SHA-pin to maintain, no additional supply-chain surface. **Status description identifies the bypass explicitly.** We set a description like `"Bypassed by .github/cla-bot-allowlist (bot author: cmeans-claude-dev[bot])"` with a `target_url` pointing at the allowlist file. A PR reviewer clicking the `license/cla` check lands on the allowlist, not the CLAassistant dashboard — the bypass is visible and auditable rather than silently indistinguishable from a real signoff. **Allowlist case-insensitive match.** GitHub logins are compared case-insensitively by GitHub itself; the workflow lowercases both sides before matching so a typo'd `Cmeans-Claude-Dev[bot]` entry still matches. This tightens "allowlist didn't fire" false negatives without opening an injection surface. ## This PR will still need a manual bypass — one last time The bypass workflow uses `pull_request_target`, which runs the workflow file + allowlist from `main`. `main` doesn't have the workflow yet (this PR introduces it), so for *this* PR itself, CLAassistant skips (diff touches `.github/workflows/**`) and the new workflow doesn't fire either. `license/cla` goes unposted until a maintainer runs one manual bypass against the current head SHA: ```bash HEAD_SHA=$(gh pr view 387 --repo cmeans/mcp-awareness --json headRefOid -q .headRefOid) gh api repos/cmeans/mcp-awareness/statuses/$HEAD_SHA \ -X POST -f state=success -f context=license/cla \ -f description="Last manual bypass: see PR #387 (cla-bot-bypass workflow introduction)" \ -f target_url=https://cla-assistant.io/cmeans/mcp-awareness ``` From the next workflow-touching bot PR onward, the workflow handles it automatically. The CHANGELOG entry notes this as a one-off. ## References - Closes: #381 - Root-cause PRs that exposed the gap: #378, #380, #382, #385, #386 - Upstream issue candidate: consider filing with cla-assistant.io to request `workflow` scope in their OAuth app config. Out of scope for this PR; this mirror is the repo-local mitigation. - Remaining CLA follow-ups from yesterday's triage: none — #381 was the only one filed. ## QA ### Prerequisites - None beyond GitHub Actions being enabled on the repo (it is). ### Manual tests 1. - [ ] **Workflow lands on main and is visible in the Actions tab.** After merge, `cla-bot-bypass` appears in the workflow list alongside `gitleaks`, `pip-audit`, `semgrep`, `docker-smoke`, etc. 2. - [ ] **First post-merge workflow-touching bot PR auto-bypasses.** Pick any future PR from `cmeans-claude-dev[bot]` that includes a file under `.github/workflows/**` (for example, the HEALTHCHECK work in #383 could easily satisfy this if it ever adds one). The `cla-bot-bypass` workflow runs, matches the bot author against the allowlist, posts `license/cla = success` with description `"Bypassed by .github/cla-bot-allowlist (bot author: cmeans-claude-dev[bot])"`. Clicking the check lands on the allowlist file in main. 3. - [ ] **Human-contributor PR is NOT bypassed by this workflow.** On any future PR from a non-bot human author, the workflow completes with a `::notice::` log message but does not post `license/cla`. The real CLAassistant bot handles the status for humans as before. 4. - [ ] **Allowlist lookup respects comments and blank lines.** Add a comment line and a blank line to `.github/cla-bot-allowlist` in a follow-up PR; the workflow still finds the same entries and posts success. (Can be verified visually in the workflow log — the matching loop iterates line by line after stripping comments.) 5. - [ ] **Allowlist edit on a PR does NOT affect the PR's own cla-bot-bypass run.** Open a test PR that adds a bogus author to the allowlist AND has that bogus author as PR author. Expected: the workflow on that PR runs from main's version of the allowlist (which doesn't have the bogus author), so no bypass occurs. This confirms the `pull_request_target` (vs `pull_request`) security boundary holds. ### Acceptance - ☐ CI green across all matrix entries on main post-merge - ☐ Manual `gh api /statuses` unblock against this PR's current head SHA (`gh pr view 387 --json headRefOid -q .headRefOid`) is the last one needed on any workflow-touching bot PR - ✅ `ruff check`, `ruff format --check`, `mypy`, full `semgrep` clean locally (this PR is config/docs only, so these are unchanged) - ✅ `pre-commit run` clean locally - ✅ Single-concern: CLA bypass workflow + allowlist + docs only - ✅ CHANGELOG `### Security` bullet under `[Unreleased]` references #381 - ✅ Security boundary documented (`pull_request_target`, not `pull_request`; workflow + allowlist always run from base branch; workflow-addition caveat noted in `docs/cla.md`) - ✅ Bot commit identity verified (`272174644+cmeans-claude-dev[bot]`, author + committer) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: cmeans-claude-dev[bot] <272174644+cmeans-claude-dev[bot]@users.noreply.github.com>
Closes #367.
Summary
Adds gitleaks secret-scanning at two layers:
.pre-commit-config.yamlpinsgitleaks v8.30.1as a pre-commit hook. Contributors install once per clone withpre-commit install. The hook scans staged diffs and blocks commits containing secrets before they ever reach a commit object..github/workflows/gitleaks.ymlrunsgitleaks/gitleaks-action@ff98106e…(v2.3.9, SHA-pinned per the security(ci): pin third-party GitHub Actions to full commit SHAs #358 action-pinning convention) on every PR andmainpush — for contributors who didn't install the hook locally, or for paths the hook can't cover.Scope
6 files changed, +83, -5 (
git diff --shortstat origin/main)..pre-commit-config.yaml.gitleaksignore.github/workflows/gitleaks.ymlCHANGELOG.md### Securitybullet under[Unreleased]CONTRIBUTING.mdpre-commit installstep in Development setup; update "Do not commit secrets" to reflect automationpyproject.tomlpre-commit>=4.0,<5in[project.optional-dependencies].devPer-file additions sum to +83 (
43+13+12+1+13+1), matching the summary line.Deviation from #367
Issue #367 proposed
pre-commit run --all-filesin CI as the belt-and-suspenders gate. In practice, the upstream gitleaks pre-commit hook runsgitleaks git --pre-commit --staged— which scans staged diffs only. CI has nothing staged, sopre-commit run --all-filesfor the gitleaks hook is a no-op. Usinggitleaks-actiondirectly instead, which scans the PR's commit range (or full history on push).Pre-commit framework is still installed, configured, and documented — future hooks (ruff, mypy, etc.) can be added in follow-up PRs if desired without re-introducing the scaffolding.
Triage of pre-existing findings
Two findings surfaced during the pre-PR full-history scan, both in
docs/superpowers/plans/2026-04-02-zero-downtime-deployment.md(commitd7d821b):curl-auth-useronadmin:haproxy-statsTreatment: scoped allowlist via
.gitleaksignore. Line 183 of the same file explicitly instructs the operator: "Change the stats password (admin:haproxy-stats) to something from KeePass." Production HAProxy on holodeck is not running the published placeholder (user-confirmed 2026-04-23). Thedocs/superpowers/plans/tree is planned to migrate to anawareness-infrarepo, at which point the allowlist entries can be removed.Both fingerprint formats (commit-scoped from
gitleaks gitand path-scoped fromgitleaks detect --no-git) are listed in.gitleaksignoreso either scan mode suppresses cleanly.Why
gitleaks-actionand not an inlinegitleaksinvocationfetch-depth: 0and commit-range selection natively for PR vs push triggers.gitleaksignoreat repo root without additional configff98106e…) per the established convention (security(ci): pin third-party GitHub Actions to full commit SHAs #358)GITLEAKS_LICENSEis not required — the repo is under a personal account, not an org.References
workflowspermission grant made during this PR)QA
Prerequisites
pip install -e ".[dev]"— picks up the newpre-commit>=4.0,<5dev dep.gitleaksdirectly without installing the binary). The pre-commit framework pulls the gitleaks binary into its own isolated env automatically — no extra install needed for the hook.Manual tests
pip install -e ".[dev]" && pre-commit install. Expected:pre-commit installed at .git/hooks/pre-commit. (Ifcore.hooksPathis set in the local repo,pre-commit installrefuses — unset it first:git config --unset-all core.hooksPath. Fresh clones don't have this issue.)pre-commit run --all-filesin a clean working tree. Expected:Detect hardcoded secrets................Passed. (Note: because the gitleaks hook scans staged diffs,--all-filesis effectively a cold-start smoke test. The real scan happens on staged diffs.)```bash
printf 'AKIAQYLPMN5HEXAMPLEX\n' > ./scratch-secret.txt
git add scratch-secret.txt
git commit -m "test: should be rejected by gitleaks"
```
Expected: hook reports
Detect hardcoded secrets...Failed, exits non-zero, HEAD is unchanged, no commit is created. Clean up: `git restore --staged scratch-secret.txt && rm scratch-secret.txt`.Note on the test secret (why this value specifically): gitleaks v8 applies an entropy gate to the
aws-access-tokenrule — the common documentation-styleAKIAIOSFODNN7EXAMPLEandAKIA1234567890ABCDEFare too dictionary-like to clear it and will silently pass the hook.AKIAQYLPMN5HEXAMPLEX(entropy 3.58) clears the gate while remaining obviously synthetic. Please do not swap it back to a lower-entropy literal without re-verifying the hook still fires. The gitleaks CLI equivalent for checking a candidate string isecho 'CANDIDATE' | gitleaks detect --no-git --pipe --verbose(or point the Docker image at a file as this PR's preflight did).Why not
pre-commit run --from-ref HEAD --to-ref HEAD: that invocation evaluates a 0-commit range (HEAD..HEAD) and the hook reports(no files to check) Skippedregardless of what's staged — the range is empty by construction. The staged-diff check is whatpre-commit run(no args) does by default when invoked against a staged index, which is why running it viagit commit(as above) is the canonical exercise.pre-commit run --all-files. Expected: no findings ondocs/superpowers/plans/2026-04-02-zero-downtime-deployment.mddespite theadmin:haproxy-statspattern appearing four times in the file.gitleaks / scanjob appears alongside the existingtest,lint,typecheck,docker-smoke,CodeQLjobs. It runs to completion and passes green.SKIP=gitleaks git commit -m "test skip"— the commit proceeds without running the gitleaks hook. (Clean up:git reset --soft HEAD~1to undo the test commit if you made one.) This is the documented escape hatch;CONTRIBUTING.mdnotes it should not be used routinely.Acceptance
gitleaks / scanjobruff check src/ tests/,ruff format --check src/ tests/,mypy src/mcp_awareness/clean locallypre-commit run --all-filesclean locally (confirms hook works and.gitleaksignoresuppresses placeholders)### Securitybullet under[Unreleased]272174644+cmeans-claude-dev[bot], author + committer); push attributed to bot after grantingworkflowspermission to the app installation🤖 Generated with Claude Code