Integrate code-review into pr-review pipeline (firewall architecture)#35105
Integrate code-review into pr-review pipeline (firewall architecture)#35105PureWeen wants to merge 4 commits into
Conversation
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 35105Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 35105" |
Wire the code-review skill into the pr-review orchestrator with a clear
separation between domain knowledge and PR-specific conclusions:
Pre-Flight Part B: Code-review runs as independent sub-agent
- Independence-first: receives only PR number, no Part A context
- Outputs to pre-flight/code-review.md + content.md summary
- SKIPPED fallback if sub-agent fails/times out
- Findings go to Report ONLY (firewalled from Try-Fix)
Try-Fix: Domain knowledge, not conclusions
- Models load review-rules.md for MAUI patterns (domain knowledge)
- Models do NOT receive code-review verdict/findings (no anchoring)
- Each model independently applies domain rules to their fix approach
Report: Advisory convergence analysis
- Priority-ordered decision table (first match wins)
- Code-review is advisory signal, NOT a hard gate
- Convergence analysis: did try-fix models independently flag
same issues as code-review? Corroborated = high confidence
- SKIPPED verdict handled gracefully
Architecture rationale: https://gist.github.com/PureWeen/fbd842390b471c2230018f47a57eb97a
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
f69ed7d to
1a99858
Compare
🔍 Skill Validation Results✅ Static Checks PassedSkills checked: 15 | Agents checked: 3 Full validator output❌ LLM Evaluation Failed0/1 skill(s) passed validation
❌ try-fix: Eval scenario 'Regression: agent must not claim success without running the test command' prompt mentions target name 'try-fix' (skill or agent) — remove the target name from the prompt to avoid biasing baseline runs. Eval scenario 'Regression: agent uses prescribed restore script, not raw git commands' prompt mentions target name 'try-fix' (skill or agent) — remove the target name from the prompt to avoid biasing baseline runs. Eval scenario 'Edge case: exhausted iterations produces documented Fail, not silence or Pass' prompt mentions target name 'try-fix' (skill or agent) — remove the target name from the prompt to avoid biasing baseline runs. |
try-fix/SKILL.md had no reference to the MAUI domain knowledge in review-rules.md. Now Step 3 instructs models to load and apply relevant rules when analyzing code and designing fixes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔍 Multi-Model Code Review — PR #35105PR: Integrate code-review into pr-review pipeline (firewall architecture) CI Status
Re-Review (after commit
|
| # | Finding | Status | Notes |
|---|---|---|---|
| 1 | Advisory/hard-gate contradiction | ✅ FIXED | Row 5 adds APPROVE path for minor findings. Advisory language and table are now consistent. |
| 2 | Parallel execution race | SKILL.md has correct sequencing, but pr-preflight.md template still says mode="sync" (see New Finding A) |
|
| 3 | Firewall open at try-fix callee | ✅ FIXED | Constraint added directly to try-fix hints input documentation |
| 4 | Convergence analysis under-specified | ✅ FIXED | Corroboration defined, cross-row population required |
| 5 | Reference inconsistency | ✅ FIXED | Both files now mention review-rules.md and .instructions.md |
| 6 | Template overlap | ✅ FIXED | Split into Convergence Analysis + Code Review Impact sections |
| 7 | No review-rules.md fallback | ➖ Not addressed (minor) | Add: "If file unavailable, proceed with general MAUI knowledge" |
| 8 | Row 2 gap | ✅ FIXED | "Even when Rows 1-2 fire" instruction added |
Score: 6/8 fully fixed, 1 partially fixed, 1 deferred minor
New Findings
🟡 MODERATE
A. mode="sync" template contradicts mode="background" instruction (1/3 reviewers — clear contradiction once identified)
📁 pr-preflight.md:55 vs pr-review/SKILL.md:81
SKILL.md says: "launch Part B as mode: "background""
But the code template in pr-preflight.md shows: mode="sync"
An agent copying the template will block on code-review before starting try-fix — negating the parallelism fix entirely.
Fix: Change mode="sync" → mode="background" in the pr-preflight.md code template.
🟢 MINOR
B. "APPROVE with notes" not in output template (1/3 reviewers)
Row 5 specifies ✅ APPROVE with notes but the output template header only lists {APPROVE/REQUEST CHANGES}. Add APPROVE with notes as a valid variant so agents produce consistent output.
C. Rows 4/5 "significant vs minor" doesn't map to code-review severity labels (1/3 reviewers)
Code-review uses ❌ Error /
Recommendation
- Must fix: Change
mode="sync"→mode="background"inpr-preflight.mdStep 7 template (Finding A) - Nice to have: Add "APPROVE with notes" to output template (Finding B)
- Nice to have: Map significant/minor to code-review severity labels (Finding C)
- Nice to have: Add review-rules.md fallback sentence (Finding 7)
📜 Review History
v1: 4 moderate + 4 minor findings.
v2 (after a627409): 6/8 fixed, 1 partial, 1 new moderate (mode sync/background contradiction).
1. Advisory/hard-gate: Add Row 5 allowing APPROVE with notes for minor uncorroborated code-review findings (style, naming). Significant findings (safety, correctness, leaks) still block. Add corroboration definition and cross-row population requirement. 2. Parallel sequencing: Replace vague 'can run in parallel' with explicit constraint: Part A → (Part B background ∥ Phase 2 sync) → await Part B → Report. Individual try-fix attempts remain sequential. 3. Firewall guard at callee: Add constraint to try-fix hints input documentation — hints must not contain code-review verdict/findings when invoked from pr-review. 4. Reference consistency: preflight firewall note now mentions both review-rules.md and .instructions.md (matching SKILL.md). CI: skill-validation 'LLM evaluation failed' is pre-existing (same pattern on main). All individual jobs passed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- pr-preflight.md: mode='sync' → mode='background' to match SKILL.md parallelism instruction (Part B runs in background while Phase 2 runs) - pr-report.md: Add 'APPROVE with notes' to output template header - pr-report.md: Ground Rows 4/5 in code-review severity labels (❌ Error = significant,⚠️ style/naming + 💡 = minor) - try-fix/SKILL.md: Add fallback for missing review-rules.md Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Note
Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!
Description
Integrates the
code-reviewskill into thepr-revieworchestrator with a firewall architecture that separates domain knowledge from PR-specific conclusions.Architecture: Full analysis gist
Key principle: Try-fix models get domain knowledge (review rules) but NOT code-review's PR-specific conclusions (verdict, error findings). This preserves multi-model independence.
Changes
pr-preflight.mdpr-report.mdpr-review/SKILL.mdcode-review/SKILL.mdRelation to other PRs