Add CHANGELOG by cmeans · Pull Request #3 · cmeans/mcp-awareness

cmeans · 2026-03-21T03:10:02Z

Summary

Add CHANGELOG.md following Keep a Changelog format
Backfilled from v0.1.0 (initial implementation) through current unreleased changes
Retroactive v0.1.0 tag already pushed at 2aa5040

🤖 Generated with Claude Code

Follows Keep a Changelog format. Retroactive v0.1.0 tag created at 2aa5040 to anchor the initial implementation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ount - hint now passes source/tags/entry_type/since/until filters to semantic_search instead of searching all entries (finding #3) - Test count updated to 315 in CHANGELOG and README (finding #1) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…#3) Replace f-string LIMIT value interpolation with bind parameters (%s) in get_reads, get_actions, and get_activity queries. While the previous int() cast prevented injection, parameterized queries are the standard defense-in-depth approach and eliminate a fragile pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: parameterized LIMIT in activity queries (MEDIUM #3)

1. Remove pull_policy: always — contradicts build: . (SUBSTANTIVE #1) 2. Default Ollama volume to ~/awareness-ollama-oauth to avoid concurrent access with production (SUBSTANTIVE #2) 3. Parameterize AWARENESS_PUBLIC_URL with env var (OBSERVATION #3) 4. Replace real WorkOS domain with placeholder in template (NIT #4) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- #1: Specify two-step re-init (synthetic initialize + replay original) using stored registry metadata (capabilities, client_info, protocol_version) - #2: Add session_redirects table for old→new session_id mapping with 5-min grace period. Middleware transparently rewrites request headers. - #3: Clarify sliding-window TTL — touch() extends expires_at - #4: Document HAProxy stick table gap after rotation (self-healing) - #5: Note deliberate LOGGED deviation from issue with rationale - #6: Specify lookup() filters expired sessions (no re-init for zombies) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…GELOG Round-2 fixes against PR #343 QA. Two substantive findings + one observation: - (#3, applied) docker-compose group now restricted to minor+patch update types (matching the pip policy). Ollama/Postgres/Cloudflared minor and patch bumps still batch into one PR per week; major version bumps like pgvector pg17->pg18 (a data-migration event) land as their own individual PRs where they can get the review attention they need. - (#1, resolved as a side effect) CHANGELOG rewritten to describe the actual policy: pip and docker-compose are grouped minor+patch by design; github-actions groups all update types (action major bumps are mechanical). Round-1 CHANGELOG claimed "minor+patch per ecosystem" which was inaccurate for two of the four. Also reframes the CHANGELOG opening from "add" to "replaced the existing 2-ecosystem config with a 4-ecosystem weekly rotation" since .github/dependabot.yml already existed on main with pip + github-actions. PR title + body + step-8 expected-diff stats get the same treatment in parallel (not in this commit). Refs #343. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…334) (#351) Closes [#334](#334). Also closes CodeQL alerts #1, #2, #3 (three flags of `actions/missing-workflow-permissions` in `ci.yml`). ## Summary Two workflow-hardening fixes bundled because they're the same theme (least-privilege + contributor-controlled-input discipline) and both surfaced from the same security review: ### 1. `pr-labels.yml` — cascade the `#333` env-routing pattern (closes #334) Three job steps in `pr-labels.yml` (`on-push`, `on-unlabel`, `on-label`) previously inlined contributor-visible fields as `${{ ... }}` expressions inside `run:` bodies: ```yaml # Before run: | PR=${{ github.event.pull_request.number }} REPO=${{ github.repository }} HEAD_SHA=${{ github.event.pull_request.head.sha }} # ... shell script references PR / REPO / HEAD_SHA ``` Now they route through step-level `env:` and are referenced as shell variables: ```yaml # After env: PR: ${{ github.event.pull_request.number }} REPO: ${{ github.repository }} HEAD_SHA: ${{ github.event.pull_request.head.sha }} run: | # ... shell script references "$PR" / "$REPO" / "$HEAD_SHA" ``` **Not a currently-exploitable bug.** The `on: pull_request:` trigger means fork PRs get a read-only `GITHUB_TOKEN` — the `gh pr edit --add-label` / `--remove-label` calls would be rejected from a fork PR regardless of what `PR`/`REPO`/`HEAD_SHA` contained. And all three values are typed (numeric PR, repo name validated by GHA, hex SHA) — none come from user-authored text like titles or bodies. **Why change it anyway**, per #334's rationale: - **Trigger-drift risk.** If `pr-labels.yml` ever switches to `pull_request_target` (to allow label automation on fork PRs), the same injection class that #333 closed on `pr-labels-ci.yml` reappears — and now the hardening would already be in place. - **Parameterization-drift risk.** A future maintainer adding a contributor-authored string field (label name, PR title fragment, branch name) to a `run:` block won't be prompted to route via `env:` first because the file already establishes the inline `${{ ... }}` style as "fine here." - **Cascade consistency.** `pr-labels-ci.yml` uses env-routing since #333; having the sibling workflow use a different style is a readability cost for anyone auditing the repo. ### 2. `ci.yml` — add workflow-level `permissions: contents: read` (closes CodeQL #1/#2/#3) `ci.yml` had no `permissions:` block at workflow or job level, so all three jobs (`lint`, `typecheck`, `test`) inherited whatever repo-level default `GITHUB_TOKEN` scope is configured. CodeQL flagged this three times (one per job). Fix: declare `permissions: contents: read` at the workflow level. Every job inherits read-only content access, which is sufficient for lint / typecheck / pytest / codecov. No job actually needs write access to anything. ## Audit sweep results While touching workflow files, checked all six for missing `permissions:`: | Workflow | Had `permissions:`? | This PR's action | |----------|---------------------|------------------| | `ci.yml` | No (CodeQL flagged 3x) | Added `contents: read` at workflow level | | `docker-publish.yml` | Yes, line 23 | No change | | `docker-smoke.yml` | Yes, line 40 (from #350) | No change | | `pr-labels-ci.yml` | Yes, line 35 (from #333) | No change | | `pr-labels.yml` | Yes, line 26 | No change to permissions block; env-routing changes only | | `qa-gate.yml` | Yes, line 24 | No change | `ci.yml` was the last gap. Sweep is complete. ## Scope - `.github/workflows/ci.yml` — `+7 lines` (permissions block with inline rationale comment) - `.github/workflows/pr-labels.yml` — `+10, -11` (three `run:` bodies lose two shell-assignment lines each; three `env:` blocks gain two-three entries each; explanatory comment added in the `on-unlabel` case) - `CHANGELOG.md` — `+4 lines` (new `### Security` subsection under `[Unreleased]`) No source, no tests, no migrations. ## References - Closes [#334](#334) - Closes CodeQL alerts #1, #2, #3 (`actions/missing-workflow-permissions` on `ci.yml:27/41/53`) - Cascade source: PR [#333](#333) (same pattern for `pr-labels-ci.yml`, which closed #332) - Related CodeQL alerts not addressed by this PR: #5/#6/#7/#8 (OAuth clear-text logging in `oauth.py` and `oauth_proxy.py`) — separate audit PR, coming next. #4 (socket bind in tests) — dismiss via UI. ## QA ### Prerequisites None. Pure workflow-YAML changes. ### Automated checks - `lint`, `typecheck`, `test (3.10/3.11/3.12)` — none touch YAML, should remain green. - `CodeQL (actions)` — will re-scan `ci.yml` and `pr-labels.yml` on this PR. Expected outcome: alerts #1/#2/#3 flip to "fixed" on merge; no new alerts introduced. - `docker-smoke` — not triggered (no changes under `Dockerfile` / `pyproject.toml` / `uv.lock` / `.dockerignore`). ### Manual tests 1. - [x] **Both workflow files parse.** ``` python3 -c "import yaml; [yaml.safe_load(open(f)) for f in ['.github/workflows/ci.yml', '.github/workflows/pr-labels.yml']]; print('OK')" ``` Expected: `OK`. 2. - [x] **`ci.yml` now has `permissions: contents: read`.** ``` grep -A1 '^permissions:' .github/workflows/ci.yml ``` Expected: `permissions:` header followed by ` contents: read`. 3. - [x] **No contributor-controlled inputs in `pr-labels.yml` `run:` bodies.** ``` awk '/^[[:space:]]+run: \|/,/^[[:space:]]+- name:|^[[:space:]]{2,6}[a-z-]+:$/' .github/workflows/pr-labels.yml | grep -nE '\$\{\{ *github\.(event|repository|head_ref)' || echo "(none — good)" ``` Expected: `(none — good)`. All `github.event.*` / `github.repository` references are now in `env:` blocks (and in job-level `if:` conditionals, which is safe context). 4. - [x] **All six workflows now have `permissions:`.** ``` for f in .github/workflows/*.yml; do if ! grep -q '^permissions:\|^ permissions:\|^ permissions:' "$f"; then echo "$f: MISSING permissions" fi done echo "(if no 'MISSING' lines above, sweep is complete)" ``` Expected: no `MISSING` lines. 5. - [x] **Label automation still functions on this PR.** When I push, `pr-labels.yml`'s `on-push` should reset labels to `Awaiting CI` and strip any stale QA labels. When `Dev Active` is removed, `on-unlabel` should promote to `Ready for QA` after CI passes. Empirically validated if the label transitions on this PR itself behave identically to recent merged PRs (self-test). 6. - [x] **`permissions: contents: read` doesn't break anything.** Lint / typecheck / pytest / codecov upload only need read access to `GITHUB_TOKEN` — none of them push labels, create comments, or mutate repo state. If any of the existing CI checks start failing on this PR with "resource not accessible" errors, that's a signal the permissions block is too tight (unlikely, but the empirical test is: does this PR's CI go green?). 7. - [x] **Diff review.** ``` git diff --stat origin/main ``` Expected: `.github/workflows/ci.yml` (+7), `.github/workflows/pr-labels.yml` (+10, -11), `CHANGELOG.md` (+4). Nothing else. ### Acceptance - ✅ `#334` — symmetric env-routing cascade landed in `pr-labels.yml` - ✅ CodeQL `#1`, `#2`, `#3` — `ci.yml` now has explicit `permissions:` - ☐ CodeQL re-scan confirmation — post-merge, the three alerts flip from Open → Fixed automatically on the next `Analyze (actions)` run against `main` Post-merge, also worth a look at CodeQL's /security/code-scanning dashboard to confirm Open count drops from 8 → 5 (just the four OAuth-logging + the one test-file socket-bind remaining). Co-authored-by: cmeans-claude-dev[bot] <3223881+cmeans-claude-dev[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ng flags (#5/#6/#7/#8) (#352) Closes CodeQL alerts **#5, #6, #7, #8, #9** (`py/clear-text-logging-sensitive-data` in `oauth.py` + `oauth_proxy.py`). ## Round 3 — IP-header-chain redesign The round-1 `_format_ip_header_chain` helper (see description below in §"The fix") didn't break CodeQL's dataflow — the analyzer tracked `header_chain` through the helper call and CodeQL #9 duplicated #8's flag on the new form. Round-3 commit `e31561c` **removes the helper** and instead splits the log by config path: the default case logs a module-level string literal (no taint flow), the override case logs only the count + env-var pointer (names never reach the sink). Both #8 and #9 close through code change — no UI dismissal. This is the code-improvement path per the user preference for a "reasonable wording change that would make CodeQL happy without sacrificing log usefulness" rather than inline suppression. ## The problem CodeQL flagged four log statements in the OAuth paths: | Alert | File | Line | What it logs | |-------|------|------|--------------| | #5 | `oauth.py` | 81 | `discovery_url` + exception when OIDC discovery fails | | #6 | `oauth.py` | 85 | `self.issuer` when falling back to default JWKS path | | #7 | `oauth_proxy.py` | 225 | `discovery_url` + exception in the proxy's discovery path | | #8 | `oauth_proxy.py` | 312 | IP-header-name chain config | CodeQL's `py/clear-text-logging-sensitive-data` uses dataflow from "sensitive sources" (values tracked as credential-adjacent) to log sinks. In our case, all four sites log values that are **not actually secrets today** — `self.issuer` is a config URL, header names aren't credentials — but CodeQL taints conservatively because the values flow from constructor parameters in auth-adjacent contexts. ## The fix — defensive URL redaction, not suppression Two new helpers: ### `safe_url_for_log(url)` (in `helpers.py`) Strips credential-carrying URL components before logging: - **`userinfo`** — RFC 3986 lets URLs embed `user:password@host`. An OAuth issuer URL misconfigured that way would leak credentials through every log line that mentioned it. - **`query string`** — OAuth authorization-code flow passes `code`, `state`, and PKCE verifiers here. Anything on the query string is a candidate to be secret. - **`fragment`** — OAuth implicit flow (deprecated but still occasionally in the wild) puts access tokens in the fragment. Returns `scheme://host[:port]/path` — preserves the operator-meaningful portion so an operator reading a log can still identify the endpoint, while removing anything that could leak a credential. Defensive rather than currently-exploitable: **our issuer URLs today are plain `https://provider/...` strings with no userinfo/query/fragment**. The helper ensures a future misconfiguration, provider change, or redirect can't change that. Unparseable or empty inputs return `"<redacted url>"` — a log helper should never itself become a failure source. **Applied at every URL-logging site in the OAuth paths**, not just the four CodeQL flagged — consistency: - `oauth.py:_discover_oidc_config` — 4 log statements (discovered JWKS URI, discovered userinfo endpoint, discovery-request failure, discovery-fallback warning) - `oauth_proxy.py:discover_oidc_endpoints` — 2 log statements (discovery failure, discovered-endpoints summary) ### IP-header-chain log split by config path (in `oauth_proxy.py`) The round-1 approach wrapped the IP-header-chain log in a `_format_ip_header_chain` helper to try to break CodeQL's taint-flow through function indirection. That didn't work — CodeQL #9 duplicated #8's flag on the helper-call form. Round 3 takes a different approach: **split the log statement by whether defaults are in use, and remove the taint-flow path entirely**: - **Default branch** (`ip_headers is None`): log a module-level literal `_DEFAULT_IP_HEADER_DISPLAY = "CF-Connecting-IP → X-Real-IP → ASGI client"`. Source-literal string; no contributor-controlled value flows into the sink. - **Override branch** (env-configured `ip_headers`): log only the *count* of custom entries and a pointer to `AWARENESS_OAUTH_PROXY_IP_HEADERS` so operators can inspect the env var directly. The names themselves never reach a log sink. Trade-off: operators running with a non-default header chain don't see the names in logs — they see the count + env-var name. Small loss of log convenience for the minority-override case; full operator clarity for the default case (which most deployments run). Closes CodeQL #8 and #9 cleanly without UI dismissal. ## Testing 11 new unit tests in `tests/test_helpers.py::TestSafeUrlForLog`: - `test_plain_https_preserved` — unaltered URLs round-trip - `test_strips_userinfo` — `user:password@host` → `host` - `test_strips_query_string` — `?code=secret&state=xyz` → `` (empty) - `test_strips_fragment` — `#access_token=secret` → `` (empty) - `test_strips_all_credential_carrying_parts_at_once` — composite test - `test_preserves_non_default_port` — `:8443` survives - `test_empty_string_returns_placeholder` — `""` → `"<redacted url>"` - `test_unparseable_string_returns_placeholder` — `"not a url"` → `"<redacted url>"` - `test_no_netloc_returns_placeholder` — `"/path/only"` → `"<redacted url>"` - `test_path_preserved_exactly` — deep paths with dots/dashes unchanged **All existing 120 OAuth tests (`tests/test_oauth.py` + `tests/test_oauth_proxy.py`) continue to pass** against the refactored code — no behavior regression, only log-output change. ## Scope - `src/mcp_awareness/helpers.py` — `+42` (new `safe_url_for_log` helper + docstring) - `src/mcp_awareness/oauth.py` — `+10, -4` (import + 4 log-site changes) - `src/mcp_awareness/oauth_proxy.py` — `+34, -10` (import + `_DEFAULT_IP_HEADER_DISPLAY` module constant + 3 log-site changes, including the split-by-config-path IP-header-chain log added in round 3) - `tests/test_helpers.py` — `+69` (import + 11 new test cases in `TestSafeUrlForLog` class) - `CHANGELOG.md` — `+3` (new `### Security` entry under `[Unreleased]`) No migrations. No source API surface changed. No log-message semantics changed beyond the URL-component redaction. ## References - Closes CodeQL alerts #5, #6, #7 (direct — URL now redacted before flowing to log sink) - #8 and #9 also closed — the round-3 split-by-config-path log statement removes taint-flow entirely; no UI dismissal required - Session context: #334 hardening cascade and ci.yml permissions closed alerts #1/#2/#3 via PR #351. This PR addresses Group B of yesterday's CodeQL audit. ## QA ### Prerequisites None. Pure code changes, covered by new unit tests and existing integration tests. ### Automated checks - `lint` / `typecheck` — clean locally - `test (3.10/3.11/3.12)` — `tests/test_helpers.py::TestSafeUrlForLog` adds 11 cases; existing OAuth tests continue to pass - `CodeQL (python)` — expected outcome: #5, #6, #7, #8, #9 all flip Open → Fixed on merge. - `codecov/patch` — new helper is covered by the 11 new test cases; should be 100% on touched lines - `docker-smoke` — **not triggered** (no `Dockerfile` / `pyproject.toml` / `uv.lock` / `.dockerignore` change) ### Manual tests 1. - [x] **All tests pass locally.** ``` python -m pytest tests/test_helpers.py tests/test_oauth.py tests/test_oauth_proxy.py -q ``` Expected: all pass, including the 11 new `TestSafeUrlForLog` cases. 2. - [x] **`safe_url_for_log` correctly strips each sensitive component.** ``` python -c " from mcp_awareness.helpers import safe_url_for_log as f assert f('https://user:secret@host/path') == 'https://host/path', 'userinfo' assert f('https://host/path?code=secret') == 'https://host/path', 'query' assert f('https://host/path#token=secret') == 'https://host/path', 'fragment' assert f('https://u:p@host:8443/oauth?c=1#t=2') == 'https://host:8443/oauth', 'composite' assert f('') == '<redacted url>', 'empty' assert f('not a url') == '<redacted url>', 'unparseable' print('all manual assertions pass') " ``` Expected: `all manual assertions pass`. 3. - [x] **Log-output semantics unchanged for normal URLs.** For any sensible OIDC issuer URL (`https://api.workos.com/user_management/client_X`), the log now reads identically to before — same host, same path. Verify by eyeballing an OAuth auth flow locally and confirming `Discovered JWKS URI: <url>` / `OAuth proxy: discovered endpoints — authorize=<url>, …` log lines render as before when URLs don't contain credentials. 4. - [x] **No contributor-controlled inputs are now logged.** Grep for any remaining `%s` + URL arg that isn't routed through the helper in the OAuth paths: ``` grep -nE 'logger\.(info|warning|error).*%s.*(discovery_url|self\.issuer|userinfo_endpoint|authorization_endpoint|token_endpoint|registration_endpoint|header_chain|jwks)' src/mcp_awareness/oauth.py src/mcp_awareness/oauth_proxy.py | grep -v safe_url_for_log | grep -v _format_ip_header_chain || echo "(none — good)" ``` Expected: `(none — good)`. Every URL-logging site has been routed through a redactor. 5. - [x] **CHANGELOG entry documents the defensive rationale.** ``` grep -nE '(credential|redact)' CHANGELOG.md | head -3 ``` Expected: the new `### Security` entry mentions credential-carrying components and redaction. 6. - [x] **Diff review.** ``` git diff --stat origin/main ``` Expected: 5 files changed. `CHANGELOG.md` (+3), `src/mcp_awareness/helpers.py` (+42), `src/mcp_awareness/oauth.py` (+10, -4), `src/mcp_awareness/oauth_proxy.py` (+34, -10), `tests/test_helpers.py` (+69). Nothing else. 7. - [ ] **(Post-merge) Confirm CodeQL alerts flip.** After merge, `Analyze (python)` re-runs on `main`. At https://github.com/cmeans/mcp-awareness/security/code-scanning: - Alerts #5, #6, #7 flip Open → Fixed (each was `clear-text-logging-sensitive-data` in an `oauth.py`/`oauth_proxy.py` file — the URL arg is now routed through `safe_url_for_log`, which CodeQL should see as a sanitizer). - Alerts #8 and #9 both flip Open → Fixed — the round-3 split-by-config-path log statement removes the taint-flow path entirely. ### Acceptance - ✅ CodeQL #5 — `oauth.py:82` discovery-request-failed log: URL now `safe_url_for_log(discovery_url)`. - ✅ CodeQL #6 — `oauth.py:91` default-JWKS-path log: URL now `safe_url_for_log(self.issuer)`. - ✅ CodeQL #7 — `oauth_proxy.py:229` discovery-failed log: URL now `safe_url_for_log(discovery_url)`. - ✅ CodeQL #8 — `oauth_proxy.py:330-345` header-chain log: split into default (logs literal `_DEFAULT_IP_HEADER_DISPLAY` constant — no taint flow) and override (logs count + env-var pointer — header names never reach the log sink). - ✅ CodeQL #9 — same fix. The round-1 `_format_ip_header_chain` helper triggered a duplicate flag; removing the helper and splitting by config path resolves both #8 and #9. --------- Co-authored-by: cmeans-claude-dev[bot] <3223881+cmeans-claude-dev[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes #363. ## Summary Adds static analysis to CI with two layers: 1. **Community Semgrep packs** — `p/python` and `p/owasp-top-ten` via the `semgrep/semgrep:1.161.0` container. Broad coverage for Python security anti-patterns without us re-deriving the wheel. 2. **Three project-specific custom rules** under `.semgrep/`, each targeting a regression class we've identified: - `awareness-sql-missing-owner-id` — every SQL statement that touches `entries`, `reads`, `actions`, or `embeddings` must include `owner_id`. Codifies the R1–R4 RLS harness's runtime contract in static form. - `no-credential-identifier-in-logs` — rejects `logger.*` / `ctx.*` calls that interpolate variables named `token`, `bearer`, `authorization`, `password`, `secret`, `api_key`, etc. Zero current sites; preventive. - `no-sql-string-interpolation` — bans f-strings, `.format()`, and concatenation as arguments to `cursor.execute` / `conn.execute` / `executemany`. Codifies the parameterized-query convention the project already follows. ## Scope **8 files changed, +391, -1** (`git diff --shortstat origin/main`). | File | ± | Purpose | |---|---|---| | `.github/workflows/semgrep.yml` | +60, -0 (new) | CI job running packs + `.semgrep/` rules | | `.semgrep/awareness-sql-owner-scope.yml` | +65, -0 (new) | Custom rule #1 | | `.semgrep/no-token-in-logs.yml` | +64, -0 (new) | Custom rule #2 | | `.semgrep/no-sql-string-interpolation.yml` | +66, -0 (new) | Custom rule #3 (includes `paths.exclude` for `alembic/versions/**` and `benchmarks/**` — see "Round 2 fix" below) | | `docs/security/semgrep-rules.md` | +113, -0 (new) | Per-rule documentation + exception/suppression discipline | | `CONTRIBUTING.md` | +12, -0 | § "Security scanners" gains Semgrep entry + § "Semgrep failure policy" section | | `CHANGELOG.md` | +6, -0 | `### Security` bullet under `[Unreleased]` | | `src/mcp_awareness/oauth_proxy.py` | +5, -1 | Inline `# nosemgrep` suppression on a false-positive `python-logger-credential-disclosure` finding, with a diff-visible justification comment | Per-file additions: `60+65+64+66+113+12+6+5 = 391` — matches the summary line. ## Round 2 fix (commit `51390e3`) Round 1 (`b80b9cf`) CI failed with 5 findings from the `no-sql-string-interpolation` rule — all in operational scripts outside the request path: - `alembic/versions/d5e8a3b17f92_add_embeddings_table.py`, `f1a2b3c4d5e6_add_owner_id_and_users.py`, `g2b3c4d5e6f7_add_rls_policies.py`, `j5e6f7g8h9i0_force_rls.py` — migrations emit DDL via `op.execute(f"ALTER TABLE {table} ...")` iterating a hard-coded Python-literal table list. Controlled input, zero user data path. - `benchmarks/semantic_search_bench.py:241,267` — ephemeral throwaway databases dropped via `f'DROP DATABASE IF EXISTS "{db_name}"'`. Developer-supplied name in a local benchmark script, not user input. Root cause of the CI miss: pre-landing scan ran on `src/` + `tests/` only (matching how the other linters are scoped) instead of the whole repo. Fix: added `paths.exclude` to the rule for `alembic/versions/**` and `benchmarks/**`. Documented in the rule's header comment. A contributor introducing user-input SQL into those directories would bypass the rule — but that would stand out in review, and the cost of false positives on every migration PR isn't worth that marginal coverage. Round 2 scan (`semgrep --config p/python --config p/owasp-top-ten --config .semgrep/ --error --oss-only` over the **whole repo**): exit 0, no findings. ## Rule selection rationale (what's dropped from the issue's proposal) The issue proposed 3-4 custom rules as candidates. A pre-implementation audit of the codebase changed the viable set: **Kept:** - SQL owner-scoping guard (issue's rule 1) — the audit confirmed 52/66 SQL files already conform; the 14 exceptions are `users` + `sessions` tables with their own keys, not owner-scoped. Strongly expressible in Semgrep's generic mode. **Dropped:** - *Tool handlers must `await ctx.info(...)` for audit visibility* (issue's rule 2). Audit: **0 of 32** tool handlers in `tools.py` currently do this. The convention is aspirational, not established. A rule that fires 0 times today and requires retrofitting 32 handlers before it's useful is scope creep; the retrofit is its own design decision. - *New `postgres_store` function with `owner_id` must be in the RLS harness coverage list* (issue's rule 4, already flagged in the issue as *"may defer to a lint check"*). Audit confirmed: `test_rls.py` uses a **per-method** pattern (42 hardcoded tests), not parametrization — so Semgrep can't express the rule at all (needs cross-file function→test mapping). Defer to a standalone check if ever needed; out of scope for Semgrep. **Added (not in issue, justified by audit):** - Credential-identifier-in-logs prevention (pattern-based, not taint mode). - SQL-string-interpolation ban. Net: 3 custom rules that each catch a real regression class, zero retrofit burden, all expressible in Semgrep. Issue's "3-4 custom rules" target met. ## Pre-landing preflight Ran the exact CI command locally: \`semgrep --config p/python --config p/owasp-top-ten --config .semgrep/ --error src/ tests/\`. Result: **one finding before suppression; clean after.** The finding was Semgrep's `python-logger-credential-disclosure` firing on `oauth_proxy.py` line 252: > `logger.info("OAuth proxy: discovered endpoints — authorize=%s, token=%s, register=%s", safe_url_for_log(...), ...)` False positive: the format string contains the word *"token"* as a label for the OIDC `token_endpoint` URL, not a credential value; values pass through `safe_url_for_log()` which redacts embedded credentials. Suppressed with a `# nosemgrep` comment on the `logger.info(` line plus a multi-line justification comment directly above. `CONTRIBUTING.md` documents the suppression-with-justification discipline. All three custom rules were tested locally against both positive fixtures (synthetic bad examples — each rule fires) and negative fixtures (realistic benign code — no rule fires). ## References - Closes: #363 - Related: #358 (SHA-pinning — the Semgrep container is tag-pinned to `1.161.0`, which matches the pattern for third-party images with explicit Dependabot-visible version numbers); #367/#369/#370 (sibling security-tooling PRs just landed as #380, #382, #385); #359 / R1–R4 RLS harness (the SQL owner-scope rule codifies its contract statically) - Semgrep docs: <https://semgrep.dev/docs> - `docs/security/semgrep-rules.md` — per-rule documentation ## QA ### Prerequisites - `pip install semgrep` if verifying locally (CI runs in the `semgrep/semgrep:1.161.0` container, no local install needed) ### Manual tests 1. - [x] **CI `semgrep / scan` job runs on this PR.** In the PR's Checks tab, a new `semgrep / scan` job appears alongside existing jobs. Runs to completion and passes (no findings after the in-tree suppression). 2. - [x] **Local repro matches CI.** `semgrep --config p/python --config p/owasp-top-ten --config .semgrep/ --error --oss-only src/ tests/`. Expected: exit code 0, no findings. 3. - [x] **Custom rule: SQL owner-scoping — positive case fires.** Write a throwaway `src/mcp_awareness/sql/test_bad.sql`: \`\`\`sql /* name: test_bad */ /* mode: literal */ SELECT id FROM entries WHERE source = %s; \`\`\` Run \`semgrep --config .semgrep/ src/mcp_awareness/sql/test_bad.sql\`. Expected: `awareness-sql-missing-owner-id` fires with the blocking severity. Remove the file when done. 4. - [x] **Custom rule: SQL owner-scoping — negative case clean.** Replace the SELECT with `SELECT id FROM entries WHERE owner_id = %s AND source = %s;` and rerun. Expected: no finding. 5. - [x] **Custom rule: credential-identifier-in-logs — positive case fires.** In a throwaway `/tmp/bad_log.py`: \`\`\`python import logging logger = logging.getLogger(__name__) def bad(token): logger.info(f"got {token}") \`\`\` Run \`semgrep --config .semgrep/no-token-in-logs.yml /tmp/bad_log.py\`. Expected: `no-credential-identifier-in-logs` fires. 6. - [x] **Custom rule: SQL-string-interpolation — positive case fires.** In a throwaway `/tmp/bad_sql.py`: \`\`\`python def bad(cur, x): cur.execute(f"SELECT * FROM t WHERE id = '{x}'") \`\`\` Run \`semgrep --config .semgrep/no-sql-string-interpolation.yml /tmp/bad_sql.py\`. Expected: `no-sql-string-interpolation` fires. 7. - [x] **Custom rule docs render on the PR diff.** Review `docs/security/semgrep-rules.md` — each of the three rules has a *what/why/suppression* section; the "Running locally" section's command line matches what CI runs. ### Acceptance - ☐ CI green across all matrix entries including the new `semgrep / scan` job - ✅ `ruff check src/ tests/`, `ruff format --check src/ tests/`, `mypy src/mcp_awareness/` clean locally - ✅ Full `semgrep --config p/python --config p/owasp-top-ten --config .semgrep/ --error` clean locally (one false positive suppressed with diff-visible justification) - ✅ Each custom rule tested positively (synthetic bad fixture fires) AND negatively (realistic benign code doesn't fire) - ✅ Single-concern: Semgrep CI + custom rules + docs only - ✅ Issue's 3-4 target met with 3 rules that each catch a real regression class; dropped rules explicitly justified - ✅ Failure policy + suppression discipline documented in CONTRIBUTING.md and docs/security/semgrep-rules.md - ✅ CHANGELOG `### Security` bullet under `[Unreleased]` references #363 - ✅ Bot commit identity verified (`272174644+cmeans-claude-dev[bot]`, author + committer) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: cmeans-claude-dev[bot] <272174644+cmeans-claude-dev[bot]@users.noreply.github.com>

…r it Addresses QA findings #2 and #3 on PR #392: - Finding #2 (Keep-a-Changelog structure): the cherry-picked content renamed `### Security` to `### Changed` under `## [Unreleased]`, but `### Changed` already existed below `### Added` for the CONTRIBUTING.md section expansions. That left two `### Changed` headings under a single version — Keep-a-Changelog expects each category once per release. - Finding #3 (entries demoted): the rename pushed nine prior entries (Semgrep, trivy, pip-audit, gitleaks, four RLS harness items, third- party action pinning) out of `### Security` even though each of them closed a security-tracking issue. Those entries belong under Security. Fix: revert the heading to `### Security`. The new pip-audit-scope entry stays under that same heading — scoping a CVE scanner correctly IS a security-adjacent fix, so it belongs with the cluster. `## [Unreleased]` now has Fixed / Security / Added / Changed, each exactly once.

Add CHANGELOG with backfilled history from v0.1.0 through unreleased

2244d8c

Follows Keep a Changelog format. Retroactive v0.1.0 tag created at 2aa5040 to anchor the initial implementation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cmeans merged commit d67e931 into main Mar 21, 2026
5 checks passed

cmeans deleted the add-changelog branch March 21, 2026 22:33

This was referenced Mar 23, 2026

Add read + action tracking #33

Merged

Add INTENTION entry type with time-based triggers #35

Merged

Add semantic search via pgvector + Ollama #40

Merged

Fix JSON content, connection resilience, RAG Phase 2 #43

Merged

cmeans mentioned this pull request Mar 31, 2026

fix: parameterized LIMIT in activity queries (MEDIUM #3) #119

Merged

cmeans-claude-dev Bot added a commit that referenced this pull request Mar 31, 2026

Merge pull request #119 from cmeans/fix/parameterized-limit

8e0b020

fix: parameterized LIMIT in activity queries (MEDIUM #3)

This was referenced Apr 1, 2026

fix: reduce pattern matching false positives #143

Open

infra: add holodeck provisioning and deploy scripts #159

Merged

cmeans mentioned this pull request Apr 7, 2026

docs: add MCP session persistence design spec (#161) #168

Merged

This was referenced Apr 8, 2026

feat: Postgres-backed MCP session persistence (#161) #169

Merged

infra: add QA Docker Compose stack with Postgres + Ollama #229

Merged

cmeans mentioned this pull request Apr 10, 2026

docs: hybrid retrieval + multilingual design plan #241

Merged

9 tasks

cmeans-claude-dev Bot mentioned this pull request Apr 20, 2026

chore(workflows): symmetric env-routing hardening for pr-labels.yml (defense-in-depth) #334

Closed

cmeans mentioned this pull request Apr 21, 2026

docs: add docs/backup.md self-hoster backup + restore guide (#310) #337

Merged

cmeans-claude-dev Bot mentioned this pull request Apr 21, 2026

chore(docker-compose): parameterize port mapping as ${AWARENESS_PORT:-8420}:8420 in docker-compose.yaml #344

Closed

This was referenced Apr 21, 2026

chore: expand .github/dependabot.yml to 4 ecosystems with grouped weekly updates #343

Merged

ci: cascade env-routing to pr-labels.yml + add permissions to ci.yml (#334) #351

Merged

cmeans-claude-dev Bot mentioned this pull request Apr 21, 2026

fix(oauth): redact URLs in log output — close CodeQL clear-text-logging flags (#5/#6/#7/#8) #352

Merged

This was referenced Apr 22, 2026

test(rls): R1 — extended cross-tenant leak coverage (30 methods) #372

Merged

test(rls): R2 — background-thread + connection-pool edge-case coverage #377

Merged

test(rls): R4 — random owner/tag inputs for cross-tenant leak tests (hypothesis) #379

Merged

cmeans-claude-dev Bot mentioned this pull request Apr 23, 2026

chore(sec): Semgrep CI with project-specific custom rules #386

Merged

cmeans mentioned this pull request Apr 24, 2026

chore(sec): scope pip-audit to project-effective deps, not the whole venv #392

Merged

14 tasks

This was referenced Apr 27, 2026

chore(deps,ci): port dependabot-pr-hygiene playbook to mcp-awareness #400

Merged

chore(deps): bump the github-actions group across 1 directory with 2 updates #397

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add CHANGELOG#3

Add CHANGELOG#3
cmeans merged 1 commit into
mainfrom
add-changelog

cmeans commented Mar 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

cmeans commented Mar 21, 2026

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant