Integrate code-review into pr-review pipeline (firewall architecture) by PureWeen · Pull Request #35105 · dotnet/maui

PureWeen · 2026-04-23T14:17:46Z

Note

Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!

Description

Integrates the code-review skill into the pr-review orchestrator with a firewall architecture that separates domain knowledge from PR-specific conclusions.

Architecture: Full analysis gist

Key principle: Try-fix models get domain knowledge (review rules) but NOT code-review's PR-specific conclusions (verdict, error findings). This preserves multi-model independence.

Changes

File	What
`pr-preflight.md`	Add Part B: code-review sub-agent with independence-first, SKIPPED fallback, firewall
`pr-report.md`	Priority-ordered decision table (advisory, not hard gate), convergence analysis
`pr-review/SKILL.md`	Wire code-review into flow, add review-rules.md loading for try-fix, firewall enforcement
`code-review/SKILL.md`	Update description

Relation to other PRs

PR Integrate code-review skill into pr-review Pre-Flight and Try-Fix #34994: Similar goal but passes conclusions as hints (anchoring) + hard gate. This PR takes the firewall approach.
PR History-trained agentic files + expert reviewer (proposal) #35062: Upgrades the review engine. This PR wires the pipeline plumbing that History-trained agentic files + expert reviewer (proposal) #35062 will plug into.

github-actions · 2026-04-23T14:17:57Z

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 35105

Or

Run remotely in PowerShell:

iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 35105"

Wire the code-review skill into the pr-review orchestrator with a clear separation between domain knowledge and PR-specific conclusions: Pre-Flight Part B: Code-review runs as independent sub-agent - Independence-first: receives only PR number, no Part A context - Outputs to pre-flight/code-review.md + content.md summary - SKIPPED fallback if sub-agent fails/times out - Findings go to Report ONLY (firewalled from Try-Fix) Try-Fix: Domain knowledge, not conclusions - Models load review-rules.md for MAUI patterns (domain knowledge) - Models do NOT receive code-review verdict/findings (no anchoring) - Each model independently applies domain rules to their fix approach Report: Advisory convergence analysis - Priority-ordered decision table (first match wins) - Code-review is advisory signal, NOT a hard gate - Convergence analysis: did try-fix models independently flag same issues as code-review? Corroborated = high confidence - SKIPPED verdict handled gracefully Architecture rationale: https://gist.github.com/PureWeen/fbd842390b471c2230018f47a57eb97a Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-04-23T14:21:24Z

🔍 Skill Validation Results

✅ Static Checks Passed

Skills checked: 15 | Agents checked: 3

Full validator output

Found 2 skill(s)
[pr-review] 📊 pr-review: 3,186 BPE tokens [chars/4: 3,090] (standard ~), 22 sections, 7 code blocks
[pr-review]    ⚠  Skill is 3,186 BPE tokens (chars/4 estimate: 3,090) — approaching "comprehensive" range where gains diminish.
[try-fix] 📊 try-fix: 4,003 BPE tokens [chars/4: 4,195] (standard ~), 37 sections, 12 code blocks
[try-fix]    ⚠  Skill is 4,003 BPE tokens (chars/4 estimate: 4,195) — approaching "comprehensive" range where gains diminish.
✅ All checks passed (2 skill(s))
Found 3 agent(s)
Validated 3 agent(s)

✅ All checks passed (3 agent(s))

❌ LLM Evaluation Failed

0/1 skill(s) passed validation

Skill	Scenario	Baseline	Skilled	Verdict

❌ try-fix: Eval scenario 'Regression: agent must not claim success without running the test command' prompt mentions target name 'try-fix' (skill or agent) — remove the target name from the prompt to avoid biasing baseline runs. Eval scenario 'Regression: agent uses prescribed restore script, not raw git commands' prompt mentions target name 'try-fix' (skill or agent) — remove the target name from the prompt to avoid biasing baseline runs. Eval scenario 'Edge case: exhausted iterations produces documented Fail, not silence or Pass' prompt mentions target name 'try-fix' (skill or agent) — remove the target name from the prompt to avoid biasing baseline runs.

🔍 Full results and investigation steps

try-fix/SKILL.md had no reference to the MAUI domain knowledge in review-rules.md. Now Step 3 instructs models to load and apply relevant rules when analyzing code and designing fixes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen · 2026-04-23T15:26:40Z

🔍 Multi-Model Code Review — PR #35105

PR: Integrate code-review into pr-review pipeline (firewall architecture)
Author: @PureWeen | Base: main ← feature/code-review-integration
Files changed: 5 (markdown instruction/skill files)
Review method: 3 independent reviewers + consensus analysis

CI Status

Check	Status
`license/cla`	✅ Pass
`Static validation`	✅ Pass
`evaluate (try-fix)`	✅ Pass
`skill-validation` (LLM evaluation)	⚠️ Pre-existing (same `skipped` pattern on main)
`maui-pr`	⏭️ Skipping

Re-Review (after commit `a627409`)

Previous Findings Status

#	Finding	Status	Notes
1	Advisory/hard-gate contradiction	✅ FIXED	Row 5 adds APPROVE path for minor findings. Advisory language and table are now consistent.
2	Parallel execution race	⚠️ PARTIALLY FIXED	SKILL.md has correct sequencing, but `pr-preflight.md` template still says `mode="sync"` (see New Finding A)
3	Firewall open at try-fix callee	✅ FIXED	Constraint added directly to try-fix hints input documentation
4	Convergence analysis under-specified	✅ FIXED	Corroboration defined, cross-row population required
5	Reference inconsistency	✅ FIXED	Both files now mention `review-rules.md` and `.instructions.md`
6	Template overlap	✅ FIXED	Split into Convergence Analysis + Code Review Impact sections
7	No review-rules.md fallback	➖ Not addressed (minor)	Add: "If file unavailable, proceed with general MAUI knowledge"
8	Row 2 gap	✅ FIXED	"Even when Rows 1-2 fire" instruction added

Score: 6/8 fully fixed, 1 partially fixed, 1 deferred minor

New Findings

🟡 MODERATE

A. mode="sync" template contradicts mode="background" instruction (1/3 reviewers — clear contradiction once identified)

📁 pr-preflight.md:55 vs pr-review/SKILL.md:81

SKILL.md says: "launch Part B as mode: "background""
But the code template in pr-preflight.md shows: mode="sync"

An agent copying the template will block on code-review before starting try-fix — negating the parallelism fix entirely.

Fix: Change mode="sync" → mode="background" in the pr-preflight.md code template.

🟢 MINOR

B. "APPROVE with notes" not in output template (1/3 reviewers)

Row 5 specifies ✅ APPROVE with notes but the output template header only lists {APPROVE/REQUEST CHANGES}. Add APPROVE with notes as a valid variant so agents produce consistent output.

C. Rows 4/5 "significant vs minor" doesn't map to code-review severity labels (1/3 reviewers)

Code-review uses ❌ Error / ⚠️ Warning / 💡 Suggestion. Rows 4/5 use "significant/minor" with examples. Consider grounding: "significant = any ❌ Error; minor = ⚠️ Warnings unrelated to safety/correctness + 💡 Suggestions".

Recommendation

⚠️ Request Changes — one fix needed:

Must fix: Change mode="sync" → mode="background" in pr-preflight.md Step 7 template (Finding A)
Nice to have: Add "APPROVE with notes" to output template (Finding B)
Nice to have: Map significant/minor to code-review severity labels (Finding C)
Nice to have: Add review-rules.md fallback sentence (Finding 7)

📜 Review History

v1: 4 moderate + 4 minor findings. ⚠️ REQUEST CHANGES.
v2 (after a627409): 6/8 fixed, 1 partial, 1 new moderate (mode sync/background contradiction). ⚠️ REQUEST CHANGES — one fix remaining.

1. Advisory/hard-gate: Add Row 5 allowing APPROVE with notes for minor uncorroborated code-review findings (style, naming). Significant findings (safety, correctness, leaks) still block. Add corroboration definition and cross-row population requirement. 2. Parallel sequencing: Replace vague 'can run in parallel' with explicit constraint: Part A → (Part B background ∥ Phase 2 sync) → await Part B → Report. Individual try-fix attempts remain sequential. 3. Firewall guard at callee: Add constraint to try-fix hints input documentation — hints must not contain code-review verdict/findings when invoked from pr-review. 4. Reference consistency: preflight firewall note now mentions both review-rules.md and .instructions.md (matching SKILL.md). CI: skill-validation 'LLM evaluation failed' is pre-existing (same pattern on main). All individual jobs passed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- pr-preflight.md: mode='sync' → mode='background' to match SKILL.md parallelism instruction (Part B runs in background while Phase 2 runs) - pr-report.md: Add 'APPROVE with notes' to output template header - pr-report.md: Ground Rows 4/5 in code-review severity labels (❌ Error = significant, ⚠️ style/naming + 💡 = minor) - try-fix/SKILL.md: Add fallback for missing review-rules.md Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

PureWeen force-pushed the feature/code-review-integration branch from f69ed7d to 1a99858 Compare April 23, 2026 14:20

github-actions Bot and others added 2 commits April 23, 2026 10:31

This was referenced Apr 24, 2026

[repo-status] Daily Repo Status - April 24, 2026 🌟 #35120

Closed

[repo-status] 🌟 Daily Repo Status - April 25, 2026 #35135

Closed

PureWeen closed this Apr 28, 2026

PureWeen mentioned this pull request Apr 28, 2026

History-trained agentic files + expert reviewer #35198

Merged

github-actions Bot locked and limited conversation to collaborators May 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integrate code-review into pr-review pipeline (firewall architecture)#35105

Integrate code-review into pr-review pipeline (firewall architecture)#35105
PureWeen wants to merge 4 commits into
mainfrom
feature/code-review-integration

PureWeen commented Apr 23, 2026

Uh oh!

github-actions Bot commented Apr 23, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 23, 2026 •

edited

Loading

Uh oh!

PureWeen commented Apr 23, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

PureWeen commented Apr 23, 2026

Description

Changes

Relation to other PRs

Uh oh!

github-actions Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔍 Skill Validation Results

✅ Static Checks Passed

❌ LLM Evaluation Failed

Uh oh!

PureWeen commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔍 Multi-Model Code Review — PR #35105

CI Status

Re-Review (after commit a627409)

Previous Findings Status

New Findings

🟡 MODERATE

🟢 MINOR

Recommendation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented Apr 23, 2026 •

edited

Loading

github-actions Bot commented Apr 23, 2026 •

edited

Loading

PureWeen commented Apr 23, 2026 •

edited

Loading

Re-Review (after commit `a627409`)