feat: add expert code review workflow with 3-model adversarial consensus #35111
Adds /review slash command that dispatches 3 parallel sub-agents (Opus, Sonnet, Codex) for independent code review, then synthesizes findings through adversarial consensus before posting.

- Inline review comments on diff lines + COMMENT review summary
- COMMENT-only reviews (never REQUEST_CHANGES) to avoid stale blocks
- Gated to admin/maintainer/write roles
- Token-optimized: orchestrator delegates file reading to sub-agents, caps follow-ups at 2 models and 3 disputed findings

Ported from dotnet/maui-labs PR #118, verified working on PureWeen/PolyPilot and dotnet/maui-labs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 35111
Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 35111"
🔍 Skill Validation Results
✅ Static Checks Passed
Skills checked: 15 | Agents checked: 4
⏭️ LLM Evaluation: Skipped
No changed skills with eval tests found.
Pull request overview
Adds a new gh-aw “Expert Code Review” workflow that can be triggered on-demand via a /review slash command, intended to run a multi-model review orchestration and post PR review comments/summaries.
Changes:
- Introduces `/review` slash-command workflow with shared orchestration instructions and safe-output configuration.
- Adds an `expert-reviewer` agent instruction file used by the orchestrated reviewers.
- Commits the compiled workflow lock and updates `.github/aw/actions-lock.json`.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| .github/workflows/shared/review-shared.md | Shared frontmatter (tools/permissions/safe-outputs) plus orchestration steps for multi-model review + consensus + posting. |
| .github/workflows/review.agent.md | Defines the /review slash command trigger, engine, imports shared orchestration. |
| .github/workflows/review.agent.lock.yml | Generated compiled workflow for the new agentic workflow. |
| .github/aw/actions-lock.json | Updates pinned action entries used by gh-aw compilation/security pinning. |
| .github/agents/expert-reviewer.agent.md | Defines the “expert-reviewer” review rubric/instructions for sub-agents. |
```
task(agent_type: "general-purpose", model: "claude-opus-4.6", mode: "background",
     description: "Reviewer 1: deep reasoning review",
     prompt: "<full diff + PR description + instruction to follow .github/agents/expert-reviewer.agent.md>")

task(agent_type: "general-purpose", model: "claude-sonnet-4.6", mode: "background",
     description: "Reviewer 2: pattern matching review",
     prompt: "<same diff + same PR description + same instruction>")

task(agent_type: "general-purpose", model: "gpt-5.3-codex", mode: "background",
     description: "Reviewer 3: alternative perspective review",
     prompt: "<same diff + same PR description + same instruction>")
```
The example task(...) invocations use a YAML-like key: value argument syntax (and omit fields like name=) that doesn’t match the task(...) calling pattern used elsewhere in this repo (e.g., .github/pr-review/pr-preflight.md uses task(name=..., agent_type="general-purpose", mode="sync", prompt=...)). As written, this is likely to cause the orchestrator to fail to launch/track sub-agents reliably. Update the examples to the repo’s established task(...) call format and ensure the fields you rely on (name/description/agent_type/mode/model/prompt) are provided in the supported syntax.
Suggested change:
```diff
- task(agent_type: "general-purpose", model: "claude-opus-4.6", mode: "background",
-      description: "Reviewer 1: deep reasoning review",
-      prompt: "<full diff + PR description + instruction to follow .github/agents/expert-reviewer.agent.md>")
- task(agent_type: "general-purpose", model: "claude-sonnet-4.6", mode: "background",
-      description: "Reviewer 2: pattern matching review",
-      prompt: "<same diff + same PR description + same instruction>")
- task(agent_type: "general-purpose", model: "gpt-5.3-codex", mode: "background",
-      description: "Reviewer 3: alternative perspective review",
-      prompt: "<same diff + same PR description + same instruction>")
+ task(
+     name="reviewer-1",
+     description="Reviewer 1: deep reasoning review",
+     agent_type="general-purpose",
+     mode="background",
+     model="claude-opus-4.6",
+     prompt="<full diff + PR description + instruction to follow .github/agents/expert-reviewer.agent.md>"
+ )
+ task(
+     name="reviewer-2",
+     description="Reviewer 2: pattern matching review",
+     agent_type="general-purpose",
+     mode="background",
+     model="claude-sonnet-4.6",
+     prompt="<same diff + same PR description + same instruction>"
+ )
+ task(
+     name="reviewer-3",
+     description="Reviewer 3: alternative perspective review",
+     agent_type="general-purpose",
+     mode="background",
+     model="gpt-5.3-codex",
+     prompt="<same diff + same PR description + same instruction>"
+ )
```
```yaml
  add-comment:
    max: 5
    hide-older-comments: true
    target: "*"
```
safe-outputs does not configure noop, so the compiled workflow falls back to the default behavior (currently report-as-issue: true, as seen in the generated lock file). For an on-demand PR review workflow this can create noisy tracking issues on tool/missing-output paths. Consider explicitly adding noop: { report-as-issue: false } here (similar to copilot-evaluate-tests.md) to avoid unintended issue creation.
Suggested change:
```diff
-     target: "*"
+     target: "*"
+   noop:
+     report-as-issue: false
```
```diff
   "actions/github-script@v9": {
     "repo": "actions/github-script",
     "version": "v9",
     "sha": "3a2844b7e9c422d3c10d287c895573f7108da1b3"
   },
-  "github/gh-aw-actions/setup@v0.62.2": {
+  "github/gh-aw-actions/setup@v0.69.3": {
     "repo": "github/gh-aw-actions/setup",
-    "version": "v0.62.2",
-    "sha": "20045bbd5ad2632b9809856c389708eab1bd16ef"
+    "version": "v0.69.3",
+    "sha": "006ffd856b868b71df342dbe0ba082a963249b31"
   },
```
This update removes the github/gh-aw-actions/setup@v0.62.x entries, but other committed gh-aw lock workflows in this repo still reference github/gh-aw-actions/setup@... # v0.62.2 (e.g., copilot-evaluate-tests.lock.yml). If any validation/build step expects actions in lock files to exist in actions-lock.json, this inconsistency will break CI. Either keep the older setup entries in actions-lock.json or recompile/update the existing lock workflows to the new setup version so everything is consistent.
Multimodal Code Review
PR #35111: Add expert code review workflow with 3-model adversarial consensus
Summary
This PR adds a
No screenshots needed: this is a workflow/infrastructure PR, not a UI change.
Code Review Findings
Positives:
Issues worth addressing (agree with Copilot reviewer):
Observations (non-blocking):
Verdict
Well-designed agentic workflow with strong security guards and a novel adversarial consensus approach. The
- Fix task() syntax to use repo's established keyword-arg format (name=, agent_type=, etc.)
- Add noop: report-as-issue: false to avoid noisy tracking issues
- Restore v0.62.1/v0.62.2 entries in actions-lock.json for existing lock file compatibility
- Remove expert-reviewer.agent.md (handled by other PRs)
- Update references to use existing code-review skill instead
- Recompile lock file

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds workflow_dispatch with pr_number input so the review workflow can be triggered from any branch against an arbitrary PR. This enables:

- Iterating on the prompt in a PR branch without merging to main first
- Testing against arbitrary PRs via Actions UI

Uses the same Checkout-GhAwPr.ps1 pattern as copilot-evaluate-tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ranches When workflow_dispatch is triggered with use_pr_skills=true, the step runs Checkout-GhAwPr.ps1 as normal (security checks + PR checkout + .github/ restore from main), then overlays the PR branch's skill and instruction files back. This lets maintainers iterate on review criteria in a PR and test via workflow_dispatch without merging to main first. The slash_command path is unaffected — it always uses main's skills. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
workflow_dispatch is already gated to write-access collaborators, so there's no need for an extra opt-in flag. Just always overlay the PR branch's skill/instruction files after Checkout-GhAwPr.ps1 restores from main. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…flow Review workflow only reads PR data via MCP tools — no builds or NuGet access needed. Removing 'dotnet' from network.allowed reduces the attack surface to just defaults. Also recompiled with restored evaluate-tests lock to avoid unrelated changes to that workflow's lock file. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add bots: copilot-swe-agent[bot] so /review works on Copilot-authored PRs
- Matches evaluate-tests workflow pattern

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Upgrade from v0.69.3 to v0.71.0 (latest release). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Restrict /review to PR contexts only (pull_request + pull_request_comment) to avoid wasted runs when typed on issues
- Trim Step 1 'Gather Context' to remove MCP tool name hand-holding that gh-aw already provides via toolset configuration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
If the git checkout of skill/instruction files from the PR branch fails, exit 1 instead of silently falling back to main's versions. This prevents confusing results where you think your PR changes are being used but they're not. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Remove 'pull_request' from slash_command events: it compiled to a spurious trigger firing on every PR open/edit/reopen (2/3 consensus)
2. Add XPIA guard on orchestrator prompt: sub-agents had it but the orchestrator that processes untrusted PR content did not (2/3 consensus)
3. Add concurrency group with inputs.pr_number: workflow_dispatch runs fell through to github.run_id causing duplicate reviews (1/3, verified)
4. Fix stale 'expert-reviewer agent' reference in Step 2 description
5. Add sub-agent failure handling: graceful degradation when <2 complete

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. cancel-in-progress: false - prevents killing 60-min reviews on accidental double-trigger (2/3 consensus)
2. Large diff guard - PRs with 50+ files split into batches per reviewer to avoid context window overflow (3/3 consensus)
3. Time budget check before consensus follow-ups - skip if >60 min elapsed to avoid timeout with no posted review (2/3 consensus)
4. Prominent COMMENT constraint - top-level warning makes it harder for XPIA to trick agent into REQUEST_CHANGES (1/3, verified)
5. Zero-findings handling - explicit add-comment fallback when all reviewers find no issues (1/3, verified)
6. CI status reworded - no checks toolset available, so clarify the agent should assess test coverage from the diff (1/3, verified)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
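The large diff guard in item 2 can be pictured with a small Python sketch. It is only an illustration: the orchestrator in this PR is prompt-driven, and the function name `plan_batches`, the batch size of 25, and the exact threshold comparison are assumptions rather than values taken from the workflow (only the 50-file threshold appears above).

```python
from typing import List


def plan_batches(changed_files: List[str], threshold: int = 50,
                 batch_size: int = 25) -> List[List[str]]:
    """Return one batch for small PRs, or fixed-size batches for large ones.

    threshold=50 mirrors the "50+ files" guard described in the commit;
    batch_size is a hypothetical value chosen for illustration.
    """
    if len(changed_files) < threshold:
        # Small PR: every reviewer sees the whole diff in one pass.
        return [list(changed_files)]
    # Large PR: slice the file list into consecutive chunks, preserving order.
    return [changed_files[i:i + batch_size]
            for i in range(0, len(changed_files), batch_size)]
```

With these assumed defaults, a 60-file PR would be reviewed in three batches while a 10-file PR stays in a single batch.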
Evaluated against PolyPilot gh-aw guide + 3-model consensus.

3/3 consensus:
1. Add status-comment: true - users get progress feedback for 90-min workflow

2/3 consensus:
2. Remove duplicate permissions block from shared file (single source of truth)

1/3 verified improvements:
3. Add parentheses to if: expression for maintenance clarity
4. Use git rev-parse HEAD instead of gh pr view API call (simpler, no network)
5. Define consensus matching criteria: same root cause + same file
6. Cap at 3 most severe disputed findings (not arbitrary selection)
7. Batch-split findings: downgrade severity + annotate low confidence
8. Step 2: reference batch mode for large diffs (resolves contradiction)
9. Pre-flight check: verify SKILL.md exists before dispatching sub-agents

Discarded false positives:
- permissions: write needed (GPT) - safe-outputs handles writes
- Move steps to agent.md (Sonnet) - shared imports is standard gh-aw
- MCP fallback for forks (Opus) - forks rejected by Checkout-GhAwPr.ps1
- reaction: emoji (GPT) - compiler auto-adds eyes reaction

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
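The matching criteria in items 5 and 6 (same root cause + same file, at most 3 of the most severe disputed findings) might look roughly like the following Python sketch. This is not the workflow's implementation, which lives in natural-language orchestrator instructions. The `Finding` type, the normalized `root_cause` label, the MODERATE and MAJOR severity names, and the "two or more reviewers means agreed" threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed ordering; only CRITICAL and MINOR are named in this PR.
SEVERITY_RANK = {"MINOR": 0, "MODERATE": 1, "MAJOR": 2, "CRITICAL": 3}


@dataclass(frozen=True)
class Finding:
    reviewer: str    # e.g. "opus", "sonnet", "codex"
    file: str
    root_cause: str  # hypothetical normalized label for the underlying issue
    severity: str


def split_by_consensus(findings, max_disputed=3):
    """Group findings by (file, root cause), then separate agreed from disputed."""
    groups = defaultdict(list)
    for f in findings:
        groups[(f.file, f.root_cause)].append(f)

    agreed, disputed = [], []
    for group in groups.values():
        # Assumption: a group raised by at least two distinct reviewers is agreed.
        if len({f.reviewer for f in group}) >= 2:
            agreed.append(group)
        else:
            disputed.append(group)

    # Keep only the most severe disputed groups for follow-up.
    disputed.sort(key=lambda g: max(SEVERITY_RANK[f.severity] for f in g),
                  reverse=True)
    return agreed, disputed[:max_disputed]
```

Sorting before truncating is what makes the cap select the most severe disputed findings rather than an arbitrary three.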
- Bump submit-pull-request-review max from 1 to 2 for retry headroom
- Add start-time step so agent can check elapsed time budget
- Broaden pre-flight error message for both trigger paths

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
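As a rough illustration of how a recorded start time supports the elapsed-time check, here is a minimal Python sketch. The function name `should_run_followups` and the way the start time is passed in are hypothetical. The 60-minute default reflects the budget mentioned in an earlier commit, and the real check is performed by the agent following its instructions, not by code like this.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_run_followups(start_time: datetime,
                         now: Optional[datetime] = None,
                         budget: timedelta = timedelta(minutes=60)) -> bool:
    """Return True while the elapsed time is still within the budget."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - start_time) <= budget
```

Passing `now` explicitly keeps the check deterministic when exercising it outside a live run.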
- Add explicit 2-reviewer fallback consensus rules (3/3 agreement)
- Clarify MINOR stays MINOR in batch-split severity downgrade

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
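The batch-split downgrade with its MINOR floor could be sketched in Python as below. The one-level step, the MODERATE and MAJOR level names, and the returned dictionary shape are assumptions; the commits only state that batch-split findings are downgraded, annotated as low confidence, and that MINOR stays MINOR.

```python
# Assumed ordering from least to most severe; only CRITICAL and MINOR
# are named in this PR.
SEVERITY_ORDER = ["MINOR", "MODERATE", "MAJOR", "CRITICAL"]


def downgrade_for_batch_split(severity: str) -> dict:
    """Lower severity by one assumed level, never going below MINOR."""
    index = SEVERITY_ORDER.index(severity)
    lowered = SEVERITY_ORDER[max(index - 1, 0)]
    # Findings from a split batch are also annotated as low confidence.
    return {"severity": lowered, "confidence": "low"}
```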
Code Review: PR #35111
Follow-up review verifying previous findings and checking latest commits
Independent Assessment
What this changes: Adds a
Files: 4 changed
Previous Review Status
New Findings
Keeps v0.62.1 and v0.62.2 alongside v0.71.0 so copilot-evaluate-tests.lock.yml remains consistent with the shared action cache. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Code Review: PR #35111 (Re-review)
Previous Finding Status
New Issues Found
None. The
Verdict: LGTM
Confidence: high
… rule Round 6 adversarial review (Opus/Sonnet/Codex): 1 fix from 7 findings. 'Median severity' was underspecified for non-adjacent levels (e.g., CRITICAL+MINOR). Now uses 'lower of the two' with explicit examples. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
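The "lower of the two" rule is easy to state in code. The Python sketch below assumes a four-level scale in which MODERATE and MAJOR are invented placeholders (the commit only names CRITICAL and MINOR), and `resolve_severity` is a hypothetical helper name.

```python
# Assumed ordering from least to most severe.
SEVERITY_ORDER = ["MINOR", "MODERATE", "MAJOR", "CRITICAL"]


def resolve_severity(first: str, second: str) -> str:
    """Pick the lower of two severities, which is well defined even for
    non-adjacent levels where a 'median' would be ambiguous."""
    return min(first, second, key=SEVERITY_ORDER.index)
```

For the CRITICAL+MINOR case called out in the commit, this yields MINOR regardless of argument order.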
```
task(
    name="reviewer-1",
    description="Reviewer 1: deep reasoning review",
    agent_type="general-purpose",
```
Why not code-review or even a custom tailored agent?
That way you get more deterministic instructions (instead of relying on the orchestrator telling them what to do).
IMO safer if there are must-have instructions (like do not build?)
There was a problem hiding this comment.
My assumption here is that the other PRs we have working in this space will take over here.
I don't want to get too lost on this one perfecting the agent here, this is more to get the workflow in with the good enough agent
Once we get this in we can iterate via your PR and then also try your approach here to see if we get better results.
I've had good results with this approach in other repositories so far, so, don't want this PR to get too stuck yet on the agent part.
There was a problem hiding this comment.
Ok, makes sense.
IMO the methodology here is well prepared to just invoke more agents (e.g. built-in code-review and separately a custom reviewer), will just need some numerical adjustments for the voting process 👍
```yaml
  create-pull-request-review-comment:
    max: 30
  submit-pull-request-review:
    max: 2
```
Per the instructions to the orchestrator below, this should only be one. Where is the second one coming from?
There was a problem hiding this comment.
Good catch — changed to max: 1. The orchestrator only describes one submit_pull_request_review call, and safe-outputs retries don't consume the agent's max count, so max: 2 provided no retry benefit. Fixed in ced9ae3.
Per T-Gro's review: orchestrator instructions only describe one submit_pull_request_review call. Safe-outputs retries don't consume the agent's max count, so max:2 provided no retry benefit — it only permitted buggy double-submissions. Confirmed by 4/4 model consensus. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…issions (#35161)

<!-- Please let the below note in for people that find this PR -->
> [!NOTE]
> Are you waiting for the changes in this PR to be merged?
> It would be very helpful if you could [test the resulting artifacts](https://github.com/dotnet/maui/wiki/Testing-PR-Builds) from this PR and let us know in a comment if this change resolves your issue. Thank you!

## Description

Recompiles `review.agent.lock.yml` with gh-aw v0.68.3 to fix 403 errors on `/review` slash command activation.

## Problem

The lock file compiled with v0.71.0 (merged in #35111) was missing `pull-requests: write` on the activation job. When the workflow tried to add a 👀 reaction to a `/review` comment on a PR, it failed with:

```
POST /repos/dotnet/maui/issues/comments/{id}/reactions - 403 Resource not accessible by integration
```

GitHub requires `pull-requests: write` to add reactions to issue comments associated with PRs, even though the endpoint path is `/issues/comments/`.

## Root Cause

Upstream compiler bug in gh-aw v0.69.3+: the activation job permissions were scoped too tightly, stripping `pull-requests: write` for `slash_command` events on PR comments. Filed as [github/gh-aw#28767](github/gh-aw#28767).

## Fix

Recompiled with gh-aw v0.68.3 (current default/recommended version), which correctly grants:

```yaml
permissions:
  actions: read
  contents: read
  discussions: write
  issues: write
  pull-requests: write  # ← this was missing with v0.71.0
```

## Testing

- ✅ Tested on PureWeen/PolyPilot: v0.68.3 `/review` trigger succeeds, activation passes, agent runs
- ❌ Confirmed v0.71.0 and v0.71.1 both fail with the same 403 error

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Note
Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!
Summary
Adds a `/review` slash command that triggers a 3-model adversarial code review on any PR.
How It Works
`/review` on a PR
Files
- `.github/workflows/review.agent.md`: `/review` slash command trigger + workflow_dispatch for testing
- `.github/workflows/shared/review-shared.md`
- `.github/workflows/review.agent.lock.yml`
- `.github/aw/actions-lock.json`
Design Decisions
- `/review` only: no auto-review-on-open to avoid cost on every PR in a large repo
- `allowed-events: [COMMENT]` prevents stale blocking reviews that cannot be dismissed (gh-aw#27655)
- `create_pull_request_review_comment` for diff-line annotations, `submit_pull_request_review` for summary, `add_comment` as fallback
- `roles: [admin, maintainer, write]`
- `.github/skills/code-review/SKILL.md`: existing MAUI code review skill with 345 lines of maintainer-sourced review rules
Trial Run
Validated end-to-end via `gh aw trial`:
Provenance
Ported from dotnet/maui-labs PR #118, iteratively tested and refined across: