Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
122 changes: 122 additions & 0 deletions .github/workflows/lean-proof.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
name: lean-proof

# Type-checks the Lean 4 / Mathlib formalization of DBSP at
# tools/lean4/Lean4/DbspChainRule.lean. Per the math-proofs
# honest assessment (docs/research/2026-05-03-math-proofs-
# honest-assessment.md) the Lean proof is A-grade today
# hand-run (chain_rule_proposition_3_2 + Dop_LTI_commute) but
# was MISSING CI gating — peer reviewers expect "runs in CI"
# as the line for an A-grade artifact. This workflow closes
# that gap.
#
# Path-filtered to only run on changes that affect the Lean
# proof (the Lean source, the lake-manifest, the toolchain
# version pin, or this workflow itself). Runs out-of-band
# from gate.yml because Mathlib's lake cache is multi-GB and
# Lean toolchain setup is heavier than the existing
# build-and-test surface.
#
# Composes with:
# - tools/lean4/Lean4/DbspChainRule.lean (the artifact)
# - docs/research/verification-registry.md (the registry
# rows for chain_rule_proposition_3_2 + Dop_LTI_commute)
# - docs/research/2026-05-03-math-proofs-honest-assessment.md
# (this workflow closes the A-with-CI upgrade for A1+A2)
# - .claude/skills/verification-drift-auditor/SKILL.md
# (every CI fire is implicit drift-check input)
#
# Safe-pattern compliance (per FACTORY-HYGIENE row #43):
# - SHA-pinned actions/checkout + actions/cache
# - Explicit minimum permissions: contents: read
# - No user-authored context; purely repo-file reads
# - Concurrency group + pinned runs-on
# - Path-filter narrows the trigger surface

on:
pull_request:
paths:
- "tools/lean4/**"
- ".github/workflows/lean-proof.yml"
push:
branches: [main]
paths:
- "tools/lean4/**"
- ".github/workflows/lean-proof.yml"
workflow_dispatch: {}

permissions:
contents: read

Comment thread
AceHack marked this conversation as resolved.
# Workflow-level env: exposes GITHUB_TOKEN to every step so mise's
# aqua: backend (used by tools/setup/install.sh during toolchain
# install) can authenticate its GitHub API calls. Without a token,
# mise hits the unauthenticated rate limit (60 requests per hour
# per IP, shared across all GitHub Actions runners) and fails to
# fetch release tags with a 403. With the token, the limit is
# 5000/hr per token. Same pattern as gate.yml (see workflow-level
# env: block there); inherits the workflow's permissions:
# contents: read — no write escalation.
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

concurrency:
group: lean-proof-${{ github.ref }}
cancel-in-progress: false

jobs:
type-check:
name: type-check Lean DbspChainRule
runs-on: ubuntu-24.04
# Mathlib's lake cache get + the actual type-check is fast
# (~2-5 min) on warm cache; cold cache (cache miss on first
# run after lean-toolchain bump) can take 15-25 min.
timeout-minutes: 30

steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

# Mathlib's lake build cache. Keyed on lean-toolchain (which
# binds the Mathlib version via lakefile.toml + lake-manifest)
# so the cache busts cleanly when either changes. Without this
# cache, every PR fetches + builds Mathlib from scratch which
# is hours of CI time.
- name: Cache lake (Mathlib oleans)
uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
with:
path: tools/lean4/.lake
key: lake-${{ runner.os }}-${{ hashFiles('tools/lean4/lean-toolchain', 'tools/lean4/lakefile.toml', 'tools/lean4/lake-manifest.json') }}
# Restore-keys allow falling back to a partial cache on
# cache-miss; lake exe cache get fills the gap. Faster than
# cold-build of Mathlib.
restore-keys: |
lake-${{ runner.os }}-

# elan toolchain. Same key shape as gate.yml's elan cache so
# the two workflows share cache space efficiently.
- name: Cache elan (Lean 4 toolchain)
uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
with:
path: ~/.elan
key: elan-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('tools/setup/common/elan.sh') }}

- name: Install toolchain via three-way-parity script (GOVERNANCE §24)
# Single source of truth for toolchain state. Installs elan
# which provides lake + lean. Same pattern as gate.yml.
run: ./tools/setup/install.sh
Comment thread
AceHack marked this conversation as resolved.

- name: Fetch Mathlib pre-built oleans
# Mathlib's CLI downloads pre-compiled .olean files from S3;
# this skips the multi-hour Mathlib build. After cache get,
# lake env lean type-checks the proof in ~30s.
working-directory: tools/lean4
run: lake exe cache get

- name: Type-check DbspChainRule.lean
# Silence on success = pass. Any error / warning fails the
# build per Lean's exit code. Per the verification-registry
# rows for chain_rule_proposition_3_2 + Dop_LTI_commute:
# this is the gate that converts both rows from A-grade-
# hand-run to A-grade-with-CI.
working-directory: tools/lean4
run: lake env lean Lean4/DbspChainRule.lean
51 changes: 29 additions & 22 deletions docs/research/2026-05-03-math-proofs-honest-assessment.md
Original file line number Diff line number Diff line change
Expand Up @@ -154,14 +154,21 @@ have A-grade and C-grade artifacts side-by-side.
Pages or a release artifact; track kill-rate as a metric.
- **Effort:** ~1 day.

### B4 — Semgrep pattern rules (12 rules)
### B4 — Semgrep pattern rules (14+ rules) → **A-grade (verified post-assessment)**

- **Artifact:** `.semgrep.yml`
- **Status:** runs externally per `TECH-RADAR.md`; **not in
CI**. Catches pattern-level anti-patterns that would
otherwise need code review to spot.
- **Path to A:** add CI job.
- **Effort:** ~0.5 day.
- **Status correction (2026-05-03 verify-then-claim sweep):**
Semgrep is ALREADY in CI via `.github/workflows/gate.yml`
job `lint (semgrep)` (line ~422 in gate.yml). The original
B4 grading was incorrect — verified by grep against
`.github/workflows/gate.yml`. The job runs every rule via
`semgrep --config .semgrep.yml --error --metrics=off`
with hard-fail on any match.
- **Effective grade: A.** No work owed; the assessment's
outstanding-work matrix originally listed "Semgrep CI"
as a P1 item but that item is already complete. This
illustrates the verify-then-claim discipline (check actual
CI workflow before grading).

---

Expand Down Expand Up @@ -276,22 +283,22 @@ to publishable form:

---

## Outstanding-work matrix (post-this-assessment)

| Work item | Grade upgrade | Effort | Priority |
|---|---|---|---|
| Lean lake-build CI job | A1, A2 → A-with-CI | 1 day | P0 |
| Stryker CI + kill-rate publish | B3 → A | 1 day | P0 |
| Semgrep CI | B4 → A | 0.5 day | P1 |
| 4 deferred TLA+ specs into CI | B1 → A | 2 days | P1 |
| Alloy CI hook | B2 → A | 0.5 day | P1 |
| `.cfg` for 4 C1 specs | C1 → B | 2 days | P2 |
| TLA+ spec for `Recursive.fs` LFP | C2 → B | 2-3 days | P2 |
| TLA+ spec for WDC protocol | C2 → B | 3-5 days | P1 |
| 15 FsCheck properties (C3) | C3 → B | 3 days | P2 |
| `chain_rule_poly` (3-group) | C4 → A | research | P3 |
| Registry rows for A4 specs | external-fidelity claim | 1 day | P0 |
| Peer-review email draft | publishability | 2 hours | P0 |
## Outstanding-work matrix (post-this-assessment, with status updates)

| Work item | Grade upgrade | Effort | Priority | Status |
|---|---|---|---|---|
| Lean lake-build CI job | A1, A2 → A-with-CI | 1 day | P0 | **In flight (this PR ships `.github/workflows/lean-proof.yml`)** |
| Stryker CI + kill-rate publish | B3 → A | 1 day | P0 | open |
| Semgrep CI | B4 → A | 0.5 day | P1 | **Already done (verify-then-claim correction; see B4 section)** |
| 4 deferred TLA+ specs into CI | B1 → A | 2 days | P1 | open |
| Alloy CI hook | B2 → A | 0.5 day | P1 | open |
| `.cfg` for 4 C1 specs | C1 → B | 2 days | P2 | open |
| TLA+ spec for `Recursive.fs` LFP | C2 → B | 2-3 days | P2 | open |
| TLA+ spec for WDC protocol | C2 → B | 3-5 days | P1 | open |
| 15 FsCheck properties (C3) | C3 → B | 3 days | P2 | open |
| `chain_rule_poly` (3-group) | C4 → A | research | P3 | open |
| Registry rows for A4 specs | external-fidelity claim | 1 day | P0 | **Done (PR #1393, 2026-05-03)** |
| Peer-review email draft | publishability | 2 hours | P0 | **Done (PR #1387, 2026-05-03)** |

---

Expand Down
Loading