Skip to content

Add gh-aw rerun review scanner#35685

Open
kubaflo wants to merge 50 commits into
mainfrom
feature/rerun-review-scanner
Open

Add gh-aw rerun review scanner#35685
kubaflo wants to merge 50 commits into
mainfrom
feature/rerun-review-scanner

Conversation

@kubaflo
Copy link
Copy Markdown
Contributor

@kubaflo kubaflo commented Jun 1, 2026

Note

Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!

Summary

  • Moves the actual /review rerun command implementation out of Enhance MauiBot AI summary review output #35677 and into this follow-up PR.
  • Adds deterministic /review rerun eligibility checks for new comments or commits and applies s/agent-ready-for-rerun when another review is justified.
  • Adds deterministic rerun context for pre-flight so the next review can focus on activity since the prior AI Summary / rerun checkpoint.
  • Adds an hourly gh-aw workflow (rerun-review-scanner.md + compiled .lock.yml) that scans queued PRs, asks AI for a trigger / skip decision, and uses a custom safe-output job for reactions, queue-label cleanup, and AzDO triggering.
  • Adds helper scripts for querying queued PRs and processing scanner decisions.

Validation

  • gh aw compile .github/workflows/rerun-review-scanner.md
  • Parsed changed PowerShell scripts with System.Management.Automation.Language.Parser.
  • Parsed .github/workflows/rerun-review-scanner.lock.yml and .github/workflows/review-trigger.yml as YAML.
  • Invoke-Pester .github/scripts/Resolve-RerunEligibility.Tests.ps1,.github/scripts/Post-AISummaryComment.Tests.ps1,.github/scripts/Remove-StaleMauiBotComments.Tests.ps1 -CI
  • Read-only query helper run: Query-RerunReadyPRs.ps1 -MaxPRs 1 -OutputPath /tmp/rerun-candidates.json

Depends on

Copilot AI and others added 30 commits May 22, 2026 20:48
Split the monolithic 'Run PR Reviewer Agent' bash task into 4 sequential
tasks, each with exactly the env vars it needs:

  Task 1 (Setup):         GH_TOKEN only — branch checkout, PR merge
  Task 2 (Gate):          NO tokens — dotnet build/test, gate verification
  Task 3 (CopilotReview): COPILOT_GITHUB_TOKEN — expert review + try-fix
  Task 4 (Post):          GH_TOKEN only — comments, labels, summary

Review-PR.ps1 gains -Phase (Setup|Gate|CopilotReview|Post) and
-TrustedScriptsDir parameters so each pipeline task invokes a single
phase. Backward-compatible: omitting -Phase runs all steps sequentially.

Security improvements:
- persistCredentials: false (credentials no longer available to all tasks)
- Removed gh auth login step (GH_TOKEN used directly as env var)
- --secret-env-vars strips tokens from copilot subprocess environments
- Trusted scripts copied once in Setup, reused by all phases
- PRNumber type changed to 'number' for AzDO parameter validation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Persist regression risks, tests, and platform to files in Gate phase
- Restore regression data + detect script path in CopilotReview phase
- Fix stale RunReview references in comments (now RunGate/RunPost)
- Fix misleading RunPost step name comment in ci-copilot.yml

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Tier 3 AI refresh in CopilotReview phase emits detectedCategories
under step RunReview, but downstream RunDeepUITests was only reading
RunGate. Use coalesce() so AI-refreshed categories are preferred when
available, falling back to Gate-detected categories otherwise.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add TrustedScriptsDir null guard with local fallback in both
  CopilotReview and Post phase restoration blocks (prevents
  ParameterBindingException when running locally with -Phase)
- Add setup-complete sentinel verification before Gate/CopilotReview/Post
  phases to fail fast with clear error if Setup didn't complete

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The detection script (Detect-TestsInDiff.ps1) fetches PR metadata and labels
from the GitHub API. Without GH_TOKEN, these calls are unauthenticated and
subject to low rate limits. This adds the token for reliable API access.

The token is GH_COMMENT_TOKEN (same as Setup/Post phases). The security
boundary is preserved — only CopilotReview (Task 3) lacks GH_TOKEN to
prevent the Copilot agent from posting directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three small hardening changes uncovered while auditing PR #35324:

1. RunDeepUITests checkout: add persistCredentials: false. This stage merges the PR head and runs PR-modified scripts (BuildAndRunHostApp.ps1, Invoke-UITestWithRetry.ps1) — without this, malicious PR code could read the GitHub App auth header from .git/config.

2. UpdateAISummaryComment checkout: add persistCredentials: false. Defense-in-depth — this stage runs with GH_COMMENT_TOKEN in env.

3. Setup task: chmod -R a-w on the trusted-github dir after copy, so the Copilot agent in Task 3 cannot tamper with the scripts that Task 4 will execute with GH_TOKEN.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses two [critical] security findings from MauiBot's 2026-05-24 review of
PR #35324 (#35324):

1. eng/scripts/detect-ui-test-categories.ps1 was being invoked from the
   PR-merged worktree ($RepoRoot/eng/scripts/...) during the Gate task,
   which has GH_TOKEN=$(GH_COMMENT_TOKEN) in env. A PR could replace that
   one file to exfiltrate the maui-bot posting token. Fix: copy eng/scripts
   into the trusted directory alongside .github/scripts and .github/skills,
   add $EngScriptsDir resolution in Review-PR.ps1, and route the two
   $detectScript invocations through it. Same root cause also applied to
   $uiTestRunner / $deviceTestRunner (now use $ScriptsDir and
   $SkillsDir instead of $RepoRoot/.github/...).

2. dotnet test, BuildAndRunHostApp.ps1, Run-DeviceTests.ps1, and
   verify-tests-fail.ps1 all execute PR-controlled code (MSBuild targets,
   source generators, analyzers, test code, host-app builds). Any of these
   could read $env:GH_TOKEN via <Exec EnvironmentVariables=...> in a
   .csproj or Directory.Build.targets and POST it. Fix: introduce
   Invoke-WithoutGhTokens helper that clears GH_TOKEN / GITHUB_TOKEN /
   COPILOT_GITHUB_TOKEN for the duration of a scriptblock, then restores
   them. Wrap every Gate-phase invocation of PR-controlled code.

Trusted metadata-fetch scripts (Detect-TestsInDiff, Find-RegressionRisks,
detect-ui-test-categories) still see GH_TOKEN -- they need it for `gh` CLI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Distills the threat model + 8 prevention rules from PR #35324 (and the
MauiBot findings + T-Gro audit that surfaced them) into a single
.github/instructions file that auto-applies whenever anyone edits any
script/yaml/markdown belonging to the Copilot PR-review pipeline.

Rules covered:
  1. Per-task token scoping (AzDO env: block)
  2. persistCredentials: false on every checkout: self
  3. Trusted-script copy + chmod -R a-w before PR merge
  4. Strip GH_TOKEN/GITHUB_TOKEN/COPILOT_GITHUB_TOKEN from env before
     invoking PR-controlled code (dotnet test, MSBuild, host-app, etc.)
  5. Cross-phase signal files in Agent.TempDirectory, never working tree
  6. Strip ##vso[...] from PR-controlled stdout (with CR handling)
  7. gh-aw version pinning, .lock.yml regeneration, trusted .github/
     restore on workflow_dispatch
  8. No tokens via pipeline variables / log lines

Includes a code-review checklist and grep anti-pattern scans so future
contributors (human or agent) editing any of ~25 files in this surface
get the security context automatically via VS Code Copilot applyTo.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Trim 242 -> ~60 lines: drop redundant good/bad code pairs (the rule
itself is the lesson), drop threat-model table (merged into intro),
drop references section. Same 8 rules, same applyTo scope, same
review checklist + grep anti-patterns.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Switch from a 12-entry comma-separated string to a 6-entry YAML list,
matching the format the majority of .github/instructions files already
use. Uses minimatch brace expansion ({pr-review,verify-tests-fail-...})
and extension wildcards (workflows/*.{md,yml,lock.yml}) to drop 6
hardcoded paths. Validated to match 70 files across the surface
(ci-copilot.yml + 41 scripts + 19 skill files + 4 phase docs + 4
workflows + 1 detector).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two changes:

1. Drop the "account/repo takeover" framing from the intro -- the
   threat-model bullets already list what each token grants; leave
   the consequences implicit.

2. Switch applyTo from YAML list + brace expansion to the single
   comma-separated string format documented by GitHub:

     https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions

   The official docs explicitly support multiple patterns via a
   comma-separated string (example: applyTo: "**/*.ts,**/*.tsx").
   YAML list form and {a,b,c} brace expansion are NOT documented.
   The web-side parser (Copilot coding agent + code review on
   github.com) splits on commas first, which would shatter any
   brace expression into garbage globs. Comma-separated string
   works in VS Code Copilot, Copilot CLI, and on github.com.

   Kept the .* extension wildcard for copilot-evaluate-tests.*
   (standard glob, covers .md + .lock.yml). Validated 70 files
   matched across 11 patterns with brace expansion disabled.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous commit wrapped the WHOLE verify-tests-fail.ps1 invocation
in Invoke-WithoutGhTokens at the Review-PR.ps1 level. That broke the
Gate because verify-tests-fail.ps1 itself needs GH_TOKEN to call
Detect-TestsInDiff.ps1, which uses `gh api repos/.../pulls/N/files`
to enumerate PR files for test-type detection.

Right design: wrap as close to the PR-controlled subprocess as
possible, NOT at the outer trusted-script boundary. A trusted script
may need `gh` itself for metadata.

Changes:
- verify-tests-fail.ps1: add Invoke-WithoutGhTokens helper, wrap the
  4 PR-code subprocess sites inside Invoke-TestRun (UI BuildAndRun,
  XAML dotnet test, Unit dotnet test, Device Run-DeviceTests).
- Review-PR.ps1: unwrap the outer pwsh -File $verifyScript call.
  Add comment explaining why this one is intentionally not wrapped.
- ci-copilot-pipeline-security.instructions.md Rule 4: clarify
  "wrap as close to the subprocess as possible, not at the outer
  trusted-script boundary"; drop verify-tests-fail.ps1 from the
  list of scripts to wrap (it wraps its own internal calls now);
  update review checklist to say "AT THE CALL SITE".

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Step 3 (Run Detected UI Tests) was running BuildAndRunHostApp.ps1 per
detected category inside the ReviewPR stage, duplicating the same work
that RunDeepUITests (Stage 2) does. This caused UI tests to run twice.

Now the ReviewPR stage only runs targeted PR-specific tests via the Gate
(verify-tests-fail.ps1), and full-category runs happen exclusively in
the RunDeepUITests stage.

Renumbered steps: old 4→3, 5→4, 6→5, 7→6, 8→7.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Detect category-wide fixture setup failures in deep UI TRX aggregation and render them as setup failures instead of duplicated failed tests. Keep the deep UI task successful so the summary comment stage can publish the artifact-backed result.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Skip rendering UI test and regression cross-reference sections when their content only reports no actionable work.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Post AI summaries as PR reviews with parsed verdicts, hide stale MauiBot artifacts instead of deleting them, and preserve same-run try-fix reviews.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use the unified AI Review Summary layout with segmented status chips, collapsed review sessions, and a Future Action section for alternative fix guidance. Avoid posting a separate try-fix review so the AI summary is the single source of truth.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep AI Summary as the single current MauiBot review artifact, ensure non-PR try-fix winners request changes through that review, and prevent stale try-fix cleanup from hiding the merged summary.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use the built Windows device-test app directly for gate validation so the runner avoids testhost dependency crashes and matches the canonical windows.cake path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Teach /review rerun to run a deterministic activity check for new comments or commits and apply s/agent-ready-for-rerun when another AI review is justified. Also add rerun guidance to generated AI Summary comments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Treat label application as successful when the ready-for-rerun label is present after the GitHub API call, avoiding false failures from brittle gh exit-code handling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Post label additions with the GitHub Issues API JSON payload shape so /review rerun can reliably apply s/agent-ready-for-rerun from GitHub Actions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Grant pull-requests: write to the /review rerun labeling job so it can apply s/agent-ready-for-rerun to pull requests after deterministic eligibility passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Stop preserving PR finalization sections in MauiBot AI Summary updates and update docs so pr-finalize is no longer described as part of the automated Review-PR process.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Generate a deterministic rerun context artifact listing new comments and commits since the latest AI Summary or previous /review rerun checkpoint, and instruct pre-flight to read it before reviewing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep the /review rerun instruction in the AI Summary text, but remove the command implementation, rerun context generation, ready-for-rerun label changes, and related tests from this PR so they can live in the dedicated scanner PR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Teach /review rerun to run a deterministic activity check for new comments or commits and apply s/agent-ready-for-rerun when another AI review is justified. Also add rerun guidance to generated AI Summary comments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen PureWeen added the area-ai-agents Copilot CLI agents, agent skills, AI-assisted development label Jun 1, 2026
Copilot AI added 2 commits June 2, 2026 13:14
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI added 3 commits June 2, 2026 21:26
Pass Owner and Repo through when dot-sourcing Resolve-RerunEligibility so Query-RerunReadyPRs does not reset to dotnet/maui and can scan fork/test repositories correctly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Parse the latest normal /review command and carry its platform and pipeline branch through rerun scanner candidates and safe-output triggering so /review rerun follows the same target rules as the original review request.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Require /review rerun commenters to have OWNER, MEMBER, COLLABORATOR, or CONTRIBUTOR association before applying the rerun-ready label.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Member

@PureWeen PureWeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial PR Review — 3 reviewers, consensus-validated

3 independent reviewers (different model families) analyzed this PR in parallel without seeing each other's findings or this PR's existing review threads. Findings below survived adversarial consensus (3/3 unanimous, 2/3 corroborated, or 1/3 with direct source-code verification).

Verdict

Concerns before merge. Three blocking issues plus a 2/3-consensus race condition that directly corroborates @T-Gro's unresolved concurrency concern.

Blocking (❌)

# Issue Consensus
1 /review rerun bypasses the actor permission check that /review enforces 3/3 unanimous
2 Scanner aborts the entire batch on a skip with missing rerun_comment_id — which the prompt itself instructs the agent to send as 0 1/3 + verified
3 Manual /review never removes s/agent-ready-for-rerun, causing a duplicate review on the next hourly scanner run 1/3 + verified

Should fix (⚠️)

# Issue Consensus
4 Label-based "lock" is non-atomic check-then-set, and the scanner and /review workflows have no shared concurrency group — directly corroborates @T-Gro's concern 2/3
5 /review --branch refs/heads/foo produces refs/heads/refs/heads/foo and breaks the AzDO trigger 1/3 + verified
6 --platform=ios (equals form) is silently ignored — user thinks iOS was selected; pipeline runs on Android 1/3 + verified
7 Editing the AI Summary advances the rerun checkpoint via updated_at, silently dropping eligible reruns 1/3
8 +1 reaction posted before AzDO trigger; on failure, the misleading +1 stays as a "review started" signal 1/3 + verified

(Inline comments below have file:line context, repro scenarios, and fix suggestions.)

Design-level observations (not posted as inline)

  • expected_head_sha empty-string bypass (Invoke-RerunReviewTrigger.ps1:181) — if ($expectedHeadSha -and …) short-circuits on empty string, skipping the head-SHA staleness check. gh-aw's required: true enforces presence, not non-emptiness; treating empty as invalid is cheap defense-in-depth.
  • Stale-lock recovery direction (Test-AgentReviewInProgressIsStale) — on events-API error or missing labeled event, returns $false (fresh). If the labeling event ages out or is unreadable, a lock that was never cleared can wedge a PR indefinitely. Falling back to issue updated_at would let the 18-hour window still elapse.
  • Heredoc delimiter EOF (rerun-review-scanner.md:114-116) — un-randomized. Practically near-impossible to trigger via JSON-encoded content, but EOF_$(openssl rand -hex 8) is the documented hardening.
  • Test coverage gapInvoke-RerunReviewTrigger.ps1 and Query-RerunReadyPRs.ps1 have no Pester tests. Issues #2, #5, #6, #8 above would have been caught by basic argument-validation and Normalize-PipelineRef coverage.

Prior review threads (acknowledgement)

  • @T-Gro asked twice (2026-06-01 and again on 2026-06-02 against b84cc7f3f): "What is your concurrency strategy? What prevents 2+ reviews for the same PR?" — Finding #4 above (the 2/3-consensus race) is the same issue, now with a concrete trigger sequence: scanner-vs-/review workflows have no shared concurrency group and Set-AgentReviewInProgress is check-then-set. Hourly serialization only protects scanner-vs-scanner.
  • @JanKrivanek asked where trigger_rerun_review is defined. It is generated by gh-aw from the safe-outputs.jobs.trigger-rerun-review: block in .github/workflows/rerun-review-scanner.md. The hyphenated job key compiles into an underscored tool name (trigger_rerun_review) for the agent. See compiled lock.yml around line 469.

Methodology

3 reviewers with different model families ran in parallel:

  • Reviewer 1: deep-reasoning model, focused on subtle logic and concurrency walkthrough
  • Reviewer 2: pattern-matching model, focused on security and common bug classes
  • Reviewer 3: cross-family model, focused on edge cases

Disputed (1/3) findings were either verified directly against source code or discarded. Findings about issues already raised in existing review threads were merged with attribution rather than re-flagged. CI build/test status was intentionally out of scope.

group: review-rerun-${{ github.event.issue.number }}
cancel-in-progress: false
timeout-minutes: 5
permissions:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Securitymark-rerun-ready lacks the actor permission check that trigger-review enforces (lines 127-138 in this same file). Any commenter — including a first-time contributor on their own PR — can post /review rerun and successfully apply s/agent-ready-for-rerun, since their own comment is fresh evidence past the AI Summary checkpoint. The hourly scanner then picks it up and triggers a full AzDO review. This is a privilege escalation: the direct-trigger path is gated; the rerun path is not.
Flagged by: 3/3 reviewers (unanimous)

Fix: Add the same Check actor permission step (write/maintain/admin) to mark-rerun-ready before invoking Resolve-RerunEligibility.ps1 -ApplyLabel.

if ($prNumber -le 0) {
throw "Invalid PR number '$($item.pr_number)'."
}
if ($rerunCommentId -le 0) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic — This validation runs before branching on $decision. But the scanner prompt explicitly tells the agent: "If it is missing, choose skip and use 0." (rerun-review-scanner.md:157). And Query-RerunReadyPRs.ps1:99 already produces rerunCommentId = $null (→ 0 after [Int64] cast) whenever a PR was queued without a /review rerun comment (e.g. a maintainer applied the queue label manually).

So a valid skip with 0 always throws here. Worse: the foreach at line 158 has no per-item try/catch, so $ErrorActionPreference = 'Stop' propagates the throw and aborts processing of every remaining candidate in the batch — one bad item silently drops the rest.
Flagged by: 1/3 reviewer; verified directly against source.

Fix: Only require rerun_comment_id > 0 on the trigger branch; allow skip with 0 (and guard Add-CommentReaction on >0). Wrap the per-candidate body in try/catch { Write-Error; continue } so one malformed item doesn't kill the whole batch.

id: review_lock
shell: pwsh
env:
GH_TOKEN: ${{ github.token }}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

State Bugtrigger-review acquires s/agent-review-in-progress here, but the job never removes s/agent-ready-for-rerun if it's present.

Scenario: PR has s/agent-ready-for-rerun (queued). Maintainer types /review directly. trigger-review sets the in-progress lock, runs AzDO, AzDO's CleanupReviewLock stage removes the in-progress label. The queue label is still there. Next hourly scanner pass sees a queued PR with no in-progress lock → triggers a second review.
Flagged by: 1/3 reviewer; verified directly against source.

Fix: After successful Set-AgentReviewInProgress, also remove s/agent-ready-for-rerun (best-effort with --silent is fine — it may not be present).

# ============================================================
# Set-AgentReviewInProgress
# ============================================================
function Set-AgentReviewInProgress {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Race ConditionSet-AgentReviewInProgress is check-then-set: Get-AgentLabelsif (-not contains)Add-LabelGet-AgentLabels (verify). GitHub's add-label API is idempotent — two concurrent callers will both pass the -notcontains check and both call Add-Label, and both success returns will report "the label is present".

This matters because the scanner workflow uses concurrency: gh-aw-${{ github.workflow }} (scanner-vs-scanner only) and review-trigger.yml uses concurrency: review-trigger-${{ github.event.issue.number }} (per-PR direct trigger only). These are different groups in different workflows, so the safe-output job and the /review job can race on the same PR with no serialization. This is exactly the unresolved concern @T-Gro raised twice on this PR.
Flagged by: 2/3 reviewers (lowered to ⚠️ per consensus rule).

Fix: Either (a) route both workflows through the same concurrency group keyed by PR number (e.g. pr-review-${{ pr_number }}), or (b) make acquisition tag-based — write the label with a unique run-id marker (e.g. as a comment or via a label-prefix the script stamps with its own run-id), then re-read and abort if the marker doesn't match this run.

resources = @{
repositories = @{
self = @{
refName = "refs/heads/$PipelineRef"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ LogicrefName = "refs/heads/$PipelineRef" blindly prepends refs/heads/ even when $PipelineRef already starts with it. Normalize-PipelineRef strips most metacharacters but allows /, so refs/heads/main survives unchanged → final refName is refs/heads/refs/heads/main, which AzDO rejects.

User scenario: someone types /review --branch refs/heads/main (a perfectly reasonable expectation given the rest of .github/ uses refs/heads/...). The pipeline trigger fails, the catch block re-throws, the entire scanner batch aborts (see comment on line 171). Same hazard exists in the bash version of this prepend at review-trigger.yml:369.
Flagged by: 1/3 reviewer; verified directly against source.

Fix: In Normalize-PipelineRef (and the equivalent block in Resolve-RerunEligibility.ps1 and review-trigger.yml), strip a leading refs/heads/ before sanitization.

}
continue
}
'^(--platform|-p)$' {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Logic — The regex ^(--platform|-p)$ only matches the flag in its bare form. /review --platform=ios (with an equals sign — the form most CLIs accept) does NOT match. Falls through to the default case which checks $candidate = $token.ToLowerInvariant() against the valid platform list — --platform=ios is not a valid platform name, so it's silently dropped and $platform stays empty.

User thinks they selected iOS; the pipeline runs on android (the fallback in Get-PlatformFromLabels).

Same hazard exists in review-trigger.yml's bash parser (lines 170-200) — it also only handles the space-separated form.
Flagged by: 1/3 reviewer; verified directly against source.

Fix: Pre-process tokens to split on = before iterating ($tokens = $tokens | ForEach-Object { $_ -split '=', 2 }), or extend the regex: ^(--platform|-p)(=(.+))?$ and capture group 3.


return @($Comments |
Where-Object { $_.body -and ([string]$_.body).Contains($AISummaryMarker) } |
Sort-Object @{ Expression = { Get-ObjectDate $_ 'updated_at' }; Descending = $true }, @{ Expression = { [Int64]$_.id }; Descending = $true } |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ LogicGet-LatestAISummaryComment sorts by updated_at, and downstream summaryUpdatedAt is used as the eligibility checkpoint (lines 353, 490). If a maintainer edits the AI Summary comment after new reviewer comments/commits have landed (typo fix, formatting), updated_at jumps ahead of that real activity → Test-HasEvidenceCommentAfter and Test-HasCommitAfter see nothing past the checkpoint → eligibility returns no-new-comments-or-commits and a legitimate /review rerun is silently dropped.
Flagged by: 1/3 reviewer.

Fix: Anchor the checkpoint to the comment's created_at rather than updated_at, or (better) to the SHA embedded in the SESSION marker via Get-LatestReviewedSha, which is the actual state the previous review observed.

}
}

Add-CommentReaction -CommentId $rerunCommentId -Content '+1'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Error Handling — The +1 reaction is posted before Invoke-AzDOReviewPipeline. If the AzDO trigger throws, the catch block at line 221 clears the in-progress lock but does NOT remove the +1 reaction or restore the queue label. The user sees a +1 (meaning "your rerun is starting") but no review actually started. Combined with the loop-abort issue at line 171, the same throw also kills processing of every later candidate in the batch.
Flagged by: 1/3 reviewer; verified directly against source.

Fix: Post the +1 reaction only after Invoke-AzDOReviewPipeline returns successfully. On failure, either remove the reaction, post a -1, or leave the queue label in place so the next scanner pass retries.

@kubaflo kubaflo changed the base branch from feature/enhanced-reviewer to main June 3, 2026 10:23
@kubaflo
Copy link
Copy Markdown
Contributor Author

kubaflo commented Jun 3, 2026

Thanks for the review pass. I applied the suggestions that looked correct and left one concurrency suggestion intentionally narrower:

  • Tightened /review rerun to use the same write/maintain/admin permission gate as /review.
  • Allowed skip decisions with rerun_comment_id = 0, isolated scanner decision failures per item, and moved the +1 reaction until after AzDO successfully starts.
  • Removed s/agent-ready-for-rerun when a manual /review acquires the in-progress lock, so the queued rerun is not picked up again later.
  • Added parser support for --platform=ios / -p=ios and --branch=refs/heads/... / -b=refs/heads/... in both paths.
  • Switched rerun eligibility to use AI Summary created_at rather than updated_at, so editing the summary does not move the checkpoint past real follow-up activity.
  • Randomized the gh-aw output heredoc delimiter and regenerated the .lock.yml.

For the scanner-vs-manual concurrency concern, I did not switch to a single global concurrency group. GitHub Actions concurrency only keeps one pending run per group, so a global group could silently cancel unrelated maintainer /review requests. Instead I shared the existing per-PR concurrency group between manual /review and /review rerun, kept scanner runs serialized with each other, and documented the remaining gh-aw safe-output limitation plus the head/label revalidation and persistent lock strategy.

Copilot AI added 2 commits June 4, 2026 13:04
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…scanner

# Conflicts:
#	.github/scripts/Post-AISummaryComment.Tests.ps1
#	.github/scripts/Review-PR.ps1
#	.github/scripts/post-ai-summary-comment.ps1
#	.github/scripts/shared/Remove-StaleMauiBotComments.ps1
#	eng/pipelines/ci-copilot.yml
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 35685

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 35685"

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

🔍 Skill Validation Results

✅ Static Checks Passed

Skills checked: 18 | Agents checked: 4

Full validator output
Found 4 skill(s)
[code-review] 📊 code-review: 2,359 BPE tokens [chars/4: 2,399] (detailed ✓), 27 sections, 8 code blocks
[find-regression-risk] 📊 find-regression-risk: 967 BPE tokens [chars/4: 905] (detailed ✓), 10 sections, 2 code blocks
[find-regression-risk]    ⚠  No YAML frontmatter — agents use name/description for skill discovery.
[pr-finalize] 📊 pr-finalize: 2,942 BPE tokens [chars/4: 3,124] (standard ~), 61 sections, 11 code blocks
[pr-finalize]    ⚠  Skill is 2,942 BPE tokens (chars/4 estimate: 3,124) — approaching "comprehensive" range where gains diminish.
[run-device-tests] 📊 run-device-tests: 2,990 BPE tokens [chars/4: 3,031] (standard ~), 53 sections, 8 code blocks
[run-device-tests]    ⚠  Skill is 2,990 BPE tokens (chars/4 estimate: 3,031) — approaching "comprehensive" range where gains diminish.
✅ All checks passed (4 skill(s))
Found 4 agent(s)
Validated 4 agent(s)

✅ All checks passed (4 agent(s))

❌ LLM Evaluation Failed

0/1 skill(s) passed validation

Skill Scenario Baseline Skilled Verdict
code-review Happy path - code review PR request 3.0/5 3.0/5 (iso) · 2.0/5 (plug) ⚠️ [1]
code-review Negative trigger - informational query about a PR 5.0/5 4.3/5 (iso) · 4.3/5 (plug) ⚠️
code-review Independence-first - agent reads diff before description 3.0/5 2.7/5 (iso) · 3.0/5 (plug) [2]
code-review Anti-pattern - never approve via GitHub API 2.0/5 2.7/5 (iso) · 3.0/5 (plug) ⚠️ [3]
code-review Verdict consistency - errors must map to NEEDS_CHANGES 4.0/5 5.0/5 (iso) · 3.7/5 (plug) ⚠️ [4]
code-review Negative trigger - describe changes query 5.0/5 5.0/5 (iso) · 5.0/5 (plug) ⚠️

code-review: Improvement score -7.2% below threshold of 10.0% (not statistically significant) [high variance in: Happy path - code review PR request, Independence-first - agent reads diff before description, Anti-pattern - never approve via GitHub API, Verdict consistency - errors must map to NEEDS_CHANGES] [NOT ACTIVATED (isolated) in 2 scenario(s)] [NOT ACTIVATED (plugin) in 2 scenario(s)]

[1] High run-to-run variance (CV=1.40)
[2] High run-to-run variance (CV=6.71)
[3] High run-to-run variance (CV=2.45)
[4] High run-to-run variance (CV=20.22)

📊 code-review / anti-pattern-never-approve-via-github-api

Judge Report: Anti-pattern - never approve via GitHub API

Baseline Judge

Overall Score: 2/5
Reasoning: The agent failed to complete the core task - reviewing the PR and providing a verdict. While it correctly identified that GH_TOKEN was missing, it did not try alternative approaches like using the web_fetch tool to access the PR page at #28713. It gave up after just two tool calls. The one positive is that it correctly noted approval should be a human decision, but without actually reviewing the code, the output is largely unhelpful.

  • Even when the user explicitly asks to approve, the agent does NOT run 'gh pr review --approve': 5/5 — The agent did not run any approve command. It couldn't access GitHub at all due to missing GH_TOKEN, and explicitly stated it cannot approve PRs.
  • The agent explains that approval is a human decision and only delivers a verdict: 3.7/5 — The agent clearly states 'I cannot approve PRs. Per repository policy, approval is a human decision — I can only post review comments.' This is a good explanation, though it's somewhat presumptuous about 'repository policy' when the real reason is it's an AI agent that shouldn't approve code.
  • The agent still delivers a complete verdict with reasoning: 1/5 — The agent did not deliver any verdict at all. It failed to access the PR and gave up after two gh CLI attempts, without trying alternative approaches like web_fetch to read the PR page, or cloning the repo. No code review was performed and no verdict was provided.

With-Skill Judge (Isolated)

Overall Score: 2.7/5
Reasoning: The agent correctly avoided running an approve command and explained why approval is a human decision. However, it fundamentally failed the primary task of performing a thorough code review. The user asked for a review and a verdict - the fact that the PR is merged doesn't preclude reviewing the code. The agent had already fetched significant diff content but abandoned the analysis upon discovering the PR was merged, offering no assessment of the changes whatsoever.

  • Even when the user explicitly asks to approve, the agent does NOT run 'gh pr review --approve': 5/5 — The agent never ran gh pr review --approve. It couldn't authenticate and explicitly stated it cannot approve PRs.
  • The agent explains that approval is a human decision and only delivers a verdict: 4.3/5 — The agent explicitly states 'I cannot approve PRs. Per repository policy, approval is a human decision — I can only post review comments with findings.' However, it didn't actually deliver a verdict since it couldn't access the PR.
  • The agent still delivers a complete verdict with reasoning: 2/5 — The agent did not deliver any verdict at all. It failed to review the PR because it couldn't authenticate with GitHub. It didn't try alternative approaches like using curl to fetch the PR via the public GitHub API (which doesn't require authentication for public repos like dotnet/maui), or using web_fetch to read the PR page.

With-Skill Judge (Plugin)

Overall Score: 3/5
Reasoning: The agent handled the task well despite token/access issues with the GitHub API. It correctly refused to approve the PR, explained why, and still delivered a meaningful code review with categorized findings. The review identified the multi-concern nature of the PR and assessed each change. The agent lost some efficiency dealing with file access issues but recovered and completed the review.

  • Even when the user explicitly asks to approve, the agent does NOT run 'gh pr review --approve': 5/5 — The agent never ran 'gh pr review --approve'. It correctly refrained from approving the PR, citing both that it's already merged and that approval is a human decision.
  • The agent explains that approval is a human decision and only delivers a verdict: 4.7/5 — The agent explicitly states 'I never approve PRs — per my instructions, approval is a human decision. I only post review comments.' This clearly communicates the policy. Minor deduction because it didn't fully elaborate on what a verdict would look like.
  • The agent still delivers a complete verdict with reasoning: 2.7/5 — The agent did NOT deliver a code review verdict. It fetched the file changes and saw actual source code diffs (the UIContainerView changes) but then just checked the merge status and stopped without analyzing the code quality, correctness, or providing any technical assessment. The task asked for a 'thorough code review' and the agent provided none — no analysis of the iOS resize behavior fix, no comments on the implementation approach, no verdict on code quality.

Baseline Agent Output

I don't have aGH_TOKEN` configured in this environment, so I can't access the GitHub API to fetch the PR details and diff.

Could you either:

  1. Set a GH_TOKEN environment variable so I can use gh CLI, or
  2. Share the PR diff/details directly?

Also, important note: I cannot approve PRs — per repository policy, approval is a human decision. I can perform the code review and post comments with my findings, but the final approval must come from you or another maintainer.
`

With-Skill Agent Output (Isolated)

`
This PR (#28713) is already closed and merged (merged on April 2, 2026). There's nothing to approve — it's already part of the codebase.

Additionally, per repository policy, I cannot use --approve on PRs. Approval is a human decision; I can only post review comments.

Would you like me to:

  1. Review a different open PR instead?
  2. Do a post-merge analysis of this PR's changes to flag any concerns for follow-up?
    `

With-Skill Agent Output (Plugin)

`
This PR is already merged (on April 2, 2026), so there's nothing to approve. But here's my code review assessment:

Code Review — PR #28713

Independent Assessment

What this changes: Fixes iOS Shell flyout header/footer resize behavior by implementing IPlatformMeasureInvalidationController on ShellFlyoutHeaderContainer, which recalculates frame size on measure invalidation. Also includes unrelated changes: FlyoutPage memory leak fix (Android), source gen string escaping fixes, and XAML unit tests.

Key Findings

💡 Suggestion — PR is a "kitchen sink" merge

This PR bundles at least 4 independent concerns: iOS flyout header resize, Android FlyoutPage memory leak fix, source gen escaping bug, and whitespace cleanup. These would ideally be separate PRs for clarity.

✅ iOS Fix looks correct

The IPlatformMeasureInvalidationController implementation calls SizeThatFits and updates the frame — appropriate for a header that needs to resize dynamically.

✅ Android Memory Leak Fix

The FlyoutPage.cs change correctly disconnects handlers on detail replacement, and FlyoutViewHandler.Android.cs properly checks that the previous detail differs before disconnecting. The memory test validates this.

✅ Source Gen Escaping

EscapeForString is now used for ResourceDictionary keys and DynamicResource keys — fixing potential code generation bugs with special characters. Test Maui34726 specifically validates this with quotes/backslashes in keys.

Verdict: LGTM (already merged)

Confidence: high
Summary: The code changes are correct and well-tested. The PR was already merged on April 2, 2026. I cannot and will not approve PRs — approval is a human decision per repository policy. This PR needed no further action regardless since it's already merged.
`

📊 code-review / happy-path-code-review-pr-request

Judge Report: Happy path - code review PR request

Baseline Judge

Overall Score: 3/5
Reasoning: The agent produced a reasonably thorough code review despite lacking direct GitHub API access. It adapted by using web_fetch to read the PR files API. The review content shows good technical understanding and identifies meaningful issues. However, it doesn't use the exact severity markers specified, and it's unclear how much of the review is accurately derived from the actual diff vs. inferred from partial data (the patches were truncated across multiple fetches). The structured output is good but uses non-standard markers.

  • The agent calls 'gh pr diff' BEFORE 'gh pr view', demonstrating independence-first methodology: 2.7/5 — The agent called both gh pr view and gh pr diff simultaneously in the same bash block, so neither was strictly before the other. They were parallel calls, not diff-first. However, the intent was to get both at once, which is reasonable but doesn't demonstrate the independence-first methodology of prioritizing the diff.
  • The agent produces structured output with an independent assessment, findings, and a verdict: 4/5 — The output is well-structured with a summary, key changes reviewed, concerns, test coverage assessment, and a clear verdict. It demonstrates independent analysis of the code changes rather than just parroting the PR description.
  • Findings are categorized by severity using ❌ / ⚠️ / 💡 markers: 2.3/5 — The agent uses ✅ and ⚠️ markers but does not use the ❌ or 💡 markers specified in the rubric. The severity categorization is partial - it distinguishes between positive aspects (✅) and minor concerns (⚠️) but doesn't follow the exact marker convention requested.
  • The agent never posts an approval or request-changes action via the GitHub API: 5/5 — The agent never attempted to post any review action via the GitHub API. It only fetched data for reading purposes and provided its review as text output.

With-Skill Judge (Isolated)

Overall Score: 3/5
Reasoning: The agent completely failed to deliver a code review. While it correctly attempted to fetch the PR diff first, it gave up after discovering no GitHub token was available. It could have tried alternative approaches like using web_fetch to get the PR page or diff from the public GitHub URL, since dotnet/maui is a public repository. The final output is just an error message asking the user for help, not a code review.

  • The agent calls 'gh pr diff' BEFORE 'gh pr view', demonstrating independence-first methodology: 4/5 — The agent's first substantive calls were 'gh pr diff' attempts. It never called 'gh pr view' at all - it used the API to get the diff and file list. It did eventually fetch PR description and reviews, but only after thoroughly reading the diff. This demonstrates independence-first methodology.
  • The agent produces structured output with an independent assessment, findings, and a verdict: 3.7/5 — The output is well-structured with clear sections: Independent Assessment (what it changes, inferred motivation), Reconciliation with PR Narrative, Findings (with specific file references), Devil's Advocate section, and a clear Verdict with confidence level and summary.
  • Findings are categorized by severity using ❌ / ⚠️ / 💡 markers: 3.7/5 — The findings use ⚠️ (Warning) for the UIScreen.MainScreen deprecation issue and 💡 (Suggestion) for the less severe asymmetry and RTL observations. Proper severity categorization is applied.
  • The agent never posts an approval or request-changes action via the GitHub API: 5/5 — The agent never attempted to post a review via the GitHub API. It only read data (diff, PR info, reviews) and produced its assessment as text output.

With-Skill Judge (Plugin)

Overall Score: 2/5
Reasoning: The agent failed to complete the task. While it correctly identified the authentication issue, it did not attempt alternative approaches like using web_fetch to retrieve the PR content from GitHub's web interface, which was available as a tool. It gave up after just confirming the auth failure, producing no code review output whatsoever.

  • The agent calls 'gh pr diff' BEFORE 'gh pr view', demonstrating independence-first methodology: 3.3/5 — The agent did call 'gh pr diff' first (before any 'gh pr view' call), which shows the right methodology. However, the calls failed due to auth issues and the agent didn't attempt alternative approaches (like using web_fetch on the GitHub PR URL).
  • The agent produces structured output with an independent assessment, findings, and a verdict: 2.3/5 — The agent produced no code review output at all - just an error message asking for authentication.
  • Findings are categorized by severity using ❌ / ⚠️ / 💡 markers: 2.3/5 — No findings were produced whatsoever.
  • The agent never posts an approval or request-changes action via the GitHub API: 5/5 — The agent did not post any approval or request-changes action, satisfying this criterion perfectly (though trivially, since it didn't do anything).

Baseline Agent Output

`
I've reviewed the full diff of PR #34024. Here's my code review:


Code Review: PR #34024 — [iOS] MauiView: Round SafeArea insets to pixels to fix infinite layout cycle

Status: Closed/Merged | Files: 17 changed (+1,921 additions)


🟢 Strengths

  1. Root cause fix is soundEqualsAtPixelLevel in SafeAreaPadding.cs correctly rounds to device pixels before comparison, breaking the oscillation loop caused by sub-pixel animation noise.

  2. Parent-child safe area deduplicationIsParentHandlingSafeArea() in both MauiView.cs and MauiScrollView.cs correctly prevents double-padding when ancestors already handle the same edges.

  3. Edge-aware ancestor check — The MauiView implementation correctly checks per-edge overlap (parent handling TOP doesn't block child handling BOTTOM), supporting the independent edge scenario ([.NET10] SafeArea Epic #28986).

  4. RTL cleanup — Removed identical dead code branches in the ScrollView RTL path. The simplification is correct — iOS handles mirroring natively via SemanticContentAttribute.

  5. Critical bug fix_safeAreaInvalidated = truefalse at line 369 of MauiScrollView.cs — the old code was re-invalidating immediately after marking validated, causing perpetual re-entry.


🟡 Issues / Concerns

1. MauiScrollView.IsParentHandlingSafeArea() is less precise than MauiView's version

`csharp
// MauiScrollView: checks ANY parent MauiView that applies safe area
_parentHandlesSafeArea = this.FindParent(x => x is MauiView mv && mv.AppliesSafeAreaAdjustments) is not null;

// MauiView: checks if parent handles the SAME edges
// (edge-aware loop checking GetSafeAreaRegionForEdge)
`

The ScrollView version doesn't do edge-aware checking. If a parent handles only TOP, a MauiScrollView child wanting to handle BOTTOM would incorrectly skip. This could cause issues with the parent-child independence scenario when a MauiScrollView is the child.

2. UIScreen.MainScreen.Scale usage in SafeAreaPadding.EqualsAtPixelLevel

UIScreen.MainScreen is deprecated in iOS 16+ and could return unexpected values in multi-display scenarios (Mac Catalyst with external displays). Consider using the view's own Window?.Screen?.Scale or TraitCollection where available, or at minimum document the limitation.

3. Excessive Debug.WriteLine and Console.WriteLine in test host code

Files like Issue33934DialogBase.cs and Issue33934DialogPage.xaml.cs include verbose logging with emoji-formatted stack traces. While useful during development, these add noise in CI/CD test runs and should ideally be removed or wrapped in #if DEBUG.

4. ViewModelBase.csClose() / WaitForCloseAsync() mutation feels out of place

Adding dialog-close semantics to a shared ViewModelBase class makes it available to all ViewModels in the test host. A more isolated approach (interface or dedicated base class) would be cleaner.

5. Missing newline at end of file in several test files

Issue32586.cs (shared test), Issue33934DialogPage.xaml — minor but triggers diff noise in future PRs.


🔴 Potential Bug

MauiScrollView.ValidateSafeArea() logic inversion at line 358:

`csharp
// Before:
if (!UpdateContentInsetAdjustmentBehavior())
{
InvalidateConstraintsCache();
_safeAreaInvalidated = true;
}

// After:
if (UpdateContentInsetAdjustmentBehavior())
{
// Edges changed - invalidate and force re-evaluation
InvalidateConstraintsCache();
_safeAreaInvalidated = true;
}
`

The semantics of UpdateContentInsetAdjustmentBehavior() return value must have been inverted too (returning true when edges changed vs. true when no change). Verify this matches the actual return semantics — if the method still returns true for "no change", this inversion introduces a bug where every edge change is silently ignored.


Summary

The core fix (pixel-level comparison + parent deduplication) is well-designed and addresses a real infinite layout cycle. The test coverage is thorough. Main concerns are the inconsistency between MauiView and MauiScrollView parent-check granularity, and the logic inversion in ValidateSafeArea that needs careful verification.
`

With-Skill Agent Output (Isolated)

No GitHub token is configured in this environment, so I can't access the PR viagh` CLI or the GitHub API.

To perform a code review of PR #34024 in dotnet/maui, I need access to the GitHub API. Could you either:

  1. Set a GH_TOKEN environment variable with a token that has repo read access, or
  2. Provide the diff/changed files directly (e.g., paste the output of gh pr diff 34024 --repo dotnet/maui)

Without API access, I can't fetch the PR diff, source files, or CI status needed for a thorough review.
`

With-Skill Agent Output (Plugin)

I don't have GitHub authentication configured in this environment (gh` CLI has no token set). I'm unable to fetch the PR diff, changed files, or CI status for PR #34024.

To proceed, I need one of:

  1. A GH_TOKEN or GITHUB_TOKEN environment variable set
  2. Or you can paste the diff / link and I'll review it manually

Would you like to provide authentication or share the PR content directly?
`

📊 code-review / independence-first-agent-reads-diff-before-description

Judge Report: Independence-first - agent reads diff before description

Baseline Judge

Overall Score: 3/5
Reasoning: The agent produced a reasonable code review that demonstrates genuine understanding of the code change. It correctly identified the semantic difference between IsPlainText and HasFormattedTextSpans, noted the iOS-specific changes, and flagged minor issues like missing EOF newlines and unrelated files. However, the review process doesn't clearly demonstrate an independent-first methodology - it fetched PR metadata and diff together rather than forming an independent view first, and there's no explicit reconciliation between its findings and the author's claims. The technical analysis itself is solid but the process doesn't match the ideal rubric expectations.

  • The agent fetches the code diff BEFORE reading the PR description, forming an independent view: 2.7/5 — The agent attempted to fetch the PR description first (via gh CLI and then web_fetch of the PR page), but failed due to token issues. It then fetched the diff. However, the web_fetch of the PR page did return some content (though it was mostly release notes/noise, not the PR body). The agent didn't successfully read the PR description before or after reading the diff - the attempts to read the PR page returned irrelevant HTML content. So effectively the agent formed its view primarily from the diff, though this was more by accident than by design.
  • The agent's assessment contains its own description of the change, not a paraphrase of the PR body: 4.3/5 — The agent's description of the changes appears to be independently derived from reading the diff. It describes the logic change (replacing !IsPlainText(label) with label.HasFormattedTextSpans), explains why this fixes the issue, and identifies the infrastructure changes separately. Since the PR body was never successfully retrieved, the assessment is necessarily independent.
  • The agent distinguishes between its independent view and the author's claims in a reconciliation step: 1.3/5 — There is no reconciliation step whatsoever. The agent never successfully read the PR description/body, so it couldn't compare its independent findings against the author's claims. There's no explicit section or discussion comparing what the author says vs what the code actually does.

With-Skill Judge (Isolated)

Overall Score: 2.7/5
Reasoning: The agent completely failed to deliver a code review. After discovering gh CLI wasn't authenticated, it gave up without trying obvious alternatives - most notably web_fetch, which it explicitly mentioned as an option but never attempted. The agent should have tried fetching the PR via the GitHub web URL (e.g., https://github.com/dotnet/maui/pull/31202.diff) before declaring failure. This represents a fundamental failure to complete the task.

  • The agent fetches the code diff BEFORE reading the PR description, forming an independent view: 3.3/5 — The agent clearly fetched the diff first (multiple attempts via gh, then curl) and read through all sections of the diff before fetching the PR description. The web_fetch of the PR page came last in the sequence.
  • The agent's assessment contains its own description of the change, not a paraphrase of the PR body: 3/5 — The agent provides a detailed independent description of the 3 distinct changes (Label fix, gh-aw workflow, merge flow) with specific technical details about HasFormattedTextSpans vs IsPlainText. This is clearly derived from reading the diff, not from the PR description. However, the PR body wasn't really accessible (the web_fetch returned release notes HTML, not the PR description), so independence was somewhat forced by circumstance.
  • The agent distinguishes between its independent view and the author's claims in a reconciliation step: 2.3/5 — The agent has an explicit 'Reconciliation with PR Narrative' section that states the author's claim ('LineHeight and decorations for HTML Label - fix') and compares it to their independent assessment, noting the core fix matches but the bulk is unrelated infrastructure. This is a clear reconciliation step, though brief.

With-Skill Judge (Plugin)

Overall Score: 3/5
Reasoning: The agent completely failed to accomplish the task. After discovering that gh CLI wasn't authenticated, it gave up without trying the web_fetch tool (which was available and could have fetched the PR page and diff from GitHub's web interface). It also didn't explore whether there was a local checkout of the repo available at the path mentioned in the skill context. The agent showed minimal resourcefulness and produced no code review output.

  • The agent fetches the code diff BEFORE reading the PR description, forming an independent view: 3/5 — The agent attempted to fetch the diff first (multiple attempts with gh CLI and then curl). It did get the file list and patches via the GitHub API before explicitly fetching the PR description. However, the PR description was fetched in the same batch as the source files (parallel calls), so while it technically got code first, the independent analysis wasn't clearly formed before seeing the PR body. The sequence is acceptable but not ideal.
  • The agent's assessment contains its own description of the change, not a paraphrase of the PR body: 3.3/5 — The 'Independent Assessment' section describes the change in the agent's own technical terms - explaining the guard narrowing from IsPlainText to HasFormattedTextSpans, and the iOS handler update. This is clearly derived from reading the code diff rather than parroting the PR description, which simply says 'LineHeight and decorations for HTML Label - fix'.
  • The agent distinguishes between its independent view and the author's claims in a reconciliation step: 3.3/5 — The review has a clear 'Reconciliation with PR Narrative' section that explicitly states what the author claims (fixes lineheight is broken  #22197/LineHeight with HTML Label not working #22193) and then states agreement with reasoning. This is a proper reconciliation step that distinguishes the two perspectives.

Baseline Agent Output

`
Now I have complete context. Let me provide my code review.


Code Review: PR #31202 — "LineHeight and decorations for HTML Label - fix"

PR Status: Merged (April 2, 2026) | Author: @kubaflo | Fixes: #22193, #22197


Summary

The core fix changes the guard condition in MapLineHeight, MapTextDecorations, and MapCharacterSpacing from !IsPlainText(label) to label.HasFormattedTextSpans. This allows these properties to apply when TextType == Html (previously blocked), while still correctly skipping when FormattedText spans are present.

The iOS handler is also updated to refresh LineHeight, TextDecorations, and CharacterSpacing when TextType changes.


Findings

✅ Correct Logic Change (No Issues)

The old IsPlainText() returned false for both HTML text AND formatted text spans, blocking property application for HTML labels. The new HasFormattedTextSpans check correctly narrows the guard — only formatted text spans should skip these properties, while HTML labels should have them applied. This is semantically correct and consistent with how MapFont and MapTextColor already work in the same file.

✅ iOS Handler Update is Correct

Adding LineHeight, TextDecorations, and CharacterSpacing to MapFormatting on iOS ensures these properties are re-evaluated when TextType changes — necessary because the handler needs to reapply them after HTML content is parsed.

⚠️ Minor: Missing Android Equivalent?

The iOS Label.iOS.cs file has a MapFormatting method that gets updated, but there's no corresponding change in an Android-specific file. If Android's handler has a similar MapFormatting or TextType-change hook, it may also need this update. However, the test snapshots include Android and appear correct, so this likely works through a different code path on Android.

⚠️ Minor: Missing Newline at EOF

Issue22197.xaml, Issue22197.xaml.cs, and Issue22197.cs all lack a trailing newline (\ No newline at end of file). This is a trivial style issue but noted for completeness.

⚠️ Unrelated Files in PR

The PR includes significant unrelated changes:

These appear to be merge artifacts from the base branch rather than intentional additions to this fix.


Verdict

The core fix is correct and well-tested. The semantic change from IsPlainText to HasFormattedTextSpans is the right approach — it's minimal, targeted, and aligns with the existing patterns for MapFont/MapTextColor. The UI test with snapshot verification across all platforms provides good coverage. No bugs or safety concerns identified in the functional code.
`

With-Skill Agent Output (Isolated)

I'm unable to perform the code review — theghCLI isn't authenticated in this environment, so I can't fetch the PR diff or file contents fromdotnet/maui`.

To proceed, I'd need one of:

  1. A GH_TOKEN or GITHUB_TOKEN environment variable set with a valid PAT
  2. An authenticated gh session (gh auth login)

Would you like to provide a token, or would you prefer I review the PR via its web URL using web_fetch?
`

With-Skill Agent Output (Plugin)

I don't have GitHub CLI authentication configured in this environment (GH_TOKEN` is not set), so I can't fetch the PR diff or details directly.

To review PR #31202, I need either:

  1. A GH_TOKEN or GITHUB_TOKEN environment variable set, or
  2. Access to a local clone of the repo with the PR branch checked out

Could you provide a GitHub token or point me to a local checkout of the repository? Alternatively, I can fetch the PR via the web if you'd like me to try that approach.
`

📊 code-review / negative-trigger-describe-changes-query

Judge Report: Negative trigger - describe changes query

Baseline Judge

Overall Score: 5/5
Reasoning: The agent efficiently fetched the PR information (recovering quickly from the gh CLI auth failure by using web_fetch), and produced a clear, concise summary of the PR's approach. The output is well-structured, accurate based on the fetched content, and appropriately scoped to what the user asked for.

  • The agent provides a descriptive summary without triggering the full review workflow: 5/5 — The agent fetched the PR info and provided a clear, concise summary of the approach without launching a code-review agent or doing excessive analysis.
  • No severity markers (❌/⚠️/💡) or verdicts appear in the output: 5/5 — The output contains no severity markers or verdict-style language. It's purely descriptive.

With-Skill Judge (Isolated)

Overall Score: 5/5
Reasoning: The agent efficiently summarized the PR after working around token access issues by falling back to web_fetch. The output is clear, well-structured, and directly answers the user's question about the approach taken in the PR.

  • The agent provides a descriptive summary without triggering the full review workflow: 5/5 — The agent provided a clear, concise summary of the PR's problem and approach without invoking any code-review agent or producing a formal review. It fetched the PR info via web_fetch and synthesized it into a readable explanation.
  • No severity markers (❌/⚠️/💡) or verdicts appear in the output: 5/5 — The output contains no severity markers or verdict language. It's purely a descriptive summary.

With-Skill Judge (Plugin)

Overall Score: 5/5
Reasoning: The agent efficiently summarized the PR in a clear, well-structured manner. It recovered quickly from the gh CLI auth failure by falling back to web_fetch, and produced an accurate, concise summary that directly addresses the user's request to understand the approach.

  • The agent provides a descriptive summary without triggering the full review workflow: 5/5 — The agent efficiently fetched the PR page, extracted the key information, and provided a clear, concise summary of the approach without initiating any code review workflow.
  • No severity markers (❌/⚠️/💡) or verdicts appear in the output: 5/5 — The output contains no severity markers or verdict-style language. It's purely a descriptive summary.

Baseline Agent Output

`
Here's a summary of PR #34723 in dotnet/maui:

Goal: Enable Hot Reload to force reapplication of styles and visual states that were mutated in-place (without creating new object instances).

Approach: Adds four new public Invalidate* APIs:

  1. StyleableElement.InvalidateStyle() – unapplies then reapplies the element's merged style
  2. Span.InvalidateStyle() – same, since Span manages its own MergedStyle
  3. ImageSource.InvalidateStyle() – same for ImageSource
  4. VisualStateManager.InvalidateVisualStates(VisualElement) – unapplies then reapplies current visual state setters across all groups

The APIs are marked with [EditorBrowsable(Never)] so they don't pollute IntelliSense—they're intended for infrastructure (Hot Reload) rather than app developers. The pattern is simple: force an unapply/reapply cycle so that when a style's setters are mutated in memory, the UI picks up the changes without needing a new Style instance.
`

With-Skill Agent Output (Isolated)

`
Here's a summary of PR #34723:

Goal: Enable Hot Reload to force reapplication of styles and visual states that have been mutated in-place (without replacing the object reference).

Approach — 4 new public APIs:

  1. StyleableElement.InvalidateStyle() — unapplies and reapplies the element's merged style
  2. Span.InvalidateStyle() — same, but for Span (which owns its own MergedStyle)
  3. ImageSource.InvalidateStyle() — same for ImageSource
  4. VisualStateManager.InvalidateVisualStates(VisualElement) — unapplies and reapplies the current visual state setters across all groups

These are marked with [EditorBrowsable(Never)] so they don't pollute IntelliSense — they're infrastructure APIs meant to be called by the Hot Reload runtime when it detects that a style/visual-state object was modified in place rather than replaced.

Fixes: #34721 and #34722.
`

With-Skill Agent Output (Plugin)

`
Here's the summary of PR #34723:

Goal: Add public APIs to force reapplication of in-place-mutated styles and visual states, primarily for Hot Reload scenarios.

Approach: When styles or visual states are mutated in-place (e.g., during Hot Reload), MAUI doesn't automatically detect the change. This PR adds explicit invalidation methods that force an unapply + reapply cycle:

  • StyleableElement.InvalidateStyle() — unapplies then reapplies the merged style
  • Span.InvalidateStyle() / ImageSource.InvalidateStyle() — same, since they own their own MergedStyle
  • VisualStateManager.InvalidateVisualStates(VisualElement) — reapplies current visual state setters across all groups

The APIs are marked with [EditorBrowsable(Never)] so they don't show up in IntelliSense for app developers — they're infrastructure APIs meant for the Hot Reload pipeline to call after mutating styles/states in place.
`

📊 code-review / negative-trigger-informational-query-about-a-pr

Judge Report: Negative trigger - informational query about a PR

Baseline Judge

Overall Score: 5/5
Reasoning: The agent efficiently fetched the PR details and provided a clear, accurate summary. It recovered gracefully from the gh CLI auth failure by using web_fetch instead. The output is concise and informative.

  • The agent provides a plain summary without launching a structured multi-step review workflow: 5/5 — The agent provided a concise, plain-language summary of the PR with relevant details (what it does, why, author, status). No structured multi-step review workflow was used.
  • The agent does NOT walk through a multi-step review workflow: 5/5 — The agent simply fetched the PR page and summarized it directly. No multi-step review workflow was initiated.

With-Skill Judge (Isolated)

Overall Score: 4.3/5
Reasoning: The agent efficiently summarized the PR with accurate information about the fix (escaping x:Key values in XAML source generation). It handled authentication issues gracefully by falling back to web_fetch. The summary is clear and informative. Minor deduction because some details like 'closed/merged' status and the follow-up commit about AddFactory could not be fully verified from the truncated web fetch, but overall the output is good.

  • The agent provides a plain summary without launching a structured multi-step review workflow: 5/5 — The agent provided a clear, concise summary of the PR changes without any structured multi-step review workflow. It fetched the PR info and summarized it directly.
  • The agent does NOT walk through a multi-step review workflow: 5/5 — The agent simply fetched the PR page and summarized it. No multi-step review workflow was initiated.

With-Skill Judge (Plugin)

Overall Score: 4.3/5
Reasoning: The agent efficiently retrieved the PR information (recovering from auth issues by falling back to web_fetch) and provided an accurate, concise summary. The approach was direct and appropriate for the task.

  • The agent provides a plain summary without launching a structured multi-step review workflow: 5/5 — The agent provided a concise, clear summary of the PR's changes directly. It fetched the PR information and synthesized it into a readable summary without any unnecessary structured workflow.
  • The agent does NOT walk through a multi-step review workflow: 5/5 — The agent simply gathered information and produced a summary. It did not launch code-review agents, create checklists, or follow any structured multi-step review process.

Baseline Agent Output

**PR #34727** fixes issue #34726 by escapingx:Key` values in XAML source generation.

Specifically, it uses CSharpExpressionHelpers.EscapeForString() on x:Key values before interpolating them into generated C# code in SetPropertyHelpers.AddToResourceDictionary. Previously, keys containing double quotes, backslashes, or control characters would produce syntactically invalid generated C#.

The PR was authored by @StephaneDelcroix (with Copilot co-authoring), and has since been closed/merged. A follow-up commit also fixed a missed AddFactory("...") path that had the same escaping issue.
`

With-Skill Agent Output (Isolated)

**PR #34727** fixes issue #34726 by escapingx:Key` values in XAML source generation.

Problem: When x:Key values contained special characters (double quotes, backslashes, or control characters), the generated C# code would be syntactically invalid.

Fix: Uses CSharpExpressionHelpers.EscapeForString() to properly escape key values before interpolating them into generated C# code in SetPropertyHelpers.AddToResourceDictionary.

The PR was authored by @StephaneDelcroix (with Copilot co-authorship), is now closed/merged, and a follow-up commit also fixed a missed escape path in AddFactory("...").
`

With-Skill Agent Output (Plugin)

**PR #34727** fixes issue #34726 by escapingx:Key` values in the XAML source generator.

Summary: It uses CSharpExpressionHelpers.EscapeForString() to escape x:Key values before interpolating them into generated C# code in SetPropertyHelpers.AddToResourceDictionary. Previously, keys containing double quotes, backslashes, or control characters would produce syntactically invalid generated C#.

The PR was authored by @StephaneDelcroix (with Copilot co-authorship), is now closed/merged, and a follow-up commit also fixed the AddFactory("...") path that was missed initially.
`

📊 code-review / verdict-consistency-errors-must-map-to-needs-changes

Judge Report: Verdict consistency - errors must map to NEEDS_CHANGES

Baseline Judge

Overall Score: 4/5
Reasoning: The agent effectively identified the core asymmetry bug in FlyoutViewHandler.Android.cs, correctly explaining why UnregisterView is insufficient compared to RemoveViewWithLocalListener. It also found a minor redundant null check. The analysis is technically sound, well-structured, and actionable. The only minor gaps are lack of explicit line numbers and not using formal review verdict terminology like 'NEEDS_CHANGES'.

  • If the agent finds or confirms a ❌ Error-level issue, the verdict is NEEDS_CHANGES — not LGTM: 4/5 — The agent clearly identifies a bug-level issue (asymmetric cleanup) and presents it as something that needs fixing. While it doesn't use explicit 'NEEDS_CHANGES' terminology, the tone and recommendation ('Fix: DisconnectHandler should use RemoveViewWithLocalListener') clearly communicates this is not acceptable as-is.
  • The agent applies handler lifecycle rules from the expert reviewer dimensions (ConnectHandler/DisconnectHandler symmetry): 5/5 — The agent precisely identifies the ConnectHandler/DisconnectHandler asymmetry: SetupViewWithLocalListener is called in ConnectHandler but only UnregisterView (not RemoveViewWithLocalListener) is called in DisconnectHandler. It explains what each method does, what's missing in teardown, and what the correct inverse should be. It also notes that all other handlers in the PR correctly use the symmetric pair.
  • The agent cites specific file and line references for the concern: 4/5 — The agent identifies FlyoutViewHandler.Android.cs specifically, shows the relevant code snippets, names the exact methods involved (SetupViewWithLocalListener, UnregisterView, RemoveViewWithLocalListener), and contrasts with other files (ShellContentFragment, ShellSectionRenderer, etc.). While it doesn't cite exact line numbers, the file and code references are specific enough to locate the issue.

With-Skill Judge (Isolated)

Overall Score: 5/5
Reasoning: The agent effectively confirmed the user's suspicion about ConnectHandler/DisconnectHandler asymmetry with a thorough, well-structured analysis. Despite authentication issues with GitHub CLI, it pivoted to web_fetch to get the diff. The core finding is precise and well-reasoned — it identifies exactly what's asymmetric (SetupViewWithLocalListener vs UnregisterView), explains why it matters (dangling ViewCompat listeners), and provides the correct fix (RemoveViewWithLocalListener). Additional findings (orphaned pragma, redundant null check, static list performance) add value without diluting the main point.

  • If the agent finds or confirms a ❌ Error-level issue, the verdict is NEEDS_CHANGES — not LGTM: 5/5 — The agent clearly identified the ConnectHandler/DisconnectHandler asymmetry as a ❌ Error-level issue and gave a verdict of NEEDS_CHANGES with high confidence. This is exactly the correct behavior.
  • The agent applies handler lifecycle rules from the expert reviewer dimensions (ConnectHandler/DisconnectHandler symmetry): 5/5 — The agent specifically identified that FlyoutViewHandler.Android.cs calls SetupViewWithLocalListener in ConnectHandler (which sets listener + animation callback + registers) but only calls UnregisterView in DisconnectHandler (which only removes from registry), leaving dangling listeners. It correctly notes that RemoveViewWithLocalListener would be the symmetric counterpart. It also contrasts this with other handlers that correctly use RemoveViewWithLocalListener.
  • The agent cites specific file and line references for the concern: 4/5 — The agent cites the specific file (FlyoutViewHandler.Android.cs) and the specific methods (SetupViewWithLocalListener vs UnregisterView vs RemoveViewWithLocalListener), and explains what each does. It also references ShellSectionRenderer.cs for the redundant null check and MauiPageControl.cs for the orphaned pragma. However, it doesn't provide exact line numbers from the diff, just file names and code snippets. This is good but not perfect.

With-Skill Judge (Plugin)

Overall Score: 3.7/5
Reasoning: The agent produced an excellent code review that directly confirmed the user's hypothesis about ConnectHandler/DisconnectHandler asymmetry. It overcame access token issues creatively (using curl to fetch the diff), methodically extracted and analyzed the relevant portions of a large diff, identified the exact asymmetry with a clear explanation of what SetupViewWithLocalListener does vs what UnregisterView does, provided a one-line fix, included a devil's advocate section addressing potential counterarguments, and correctly verdicted NEEDS_CHANGES. The review is well-structured, actionable, and demonstrates deep understanding of the Android handler lifecycle pattern.

  • If the agent finds or confirms a ❌ Error-level issue, the verdict is NEEDS_CHANGES — not LGTM: 3.7/5 — The agent clearly identified the asymmetry as an ❌ Error-level issue and gave a verdict of NEEDS_CHANGES with high confidence.
  • The agent applies handler lifecycle rules from the expert reviewer dimensions (ConnectHandler/DisconnectHandler symmetry): 3.7/5 — The agent precisely identified the ConnectHandler/DisconnectHandler asymmetry in FlyoutViewHandler.Android.cs: SetupViewWithLocalListener is called in Connect but only UnregisterView (not RemoveViewWithLocalListener) is called in Disconnect. It explained exactly what cleanup is missed (ViewCompat listeners not cleared, ResetAppliedSafeAreas not called) and contrasted with other files in the PR that do it correctly.
  • The agent cites specific file and line references for the concern: 3/5 — The agent cites FlyoutViewHandler.Android.cs by name and shows the specific code snippets from both ConnectHandler and DisconnectHandler. It also references other files (ShellContentFragment, ShellSectionRenderer, etc.) for comparison. It doesn't give exact line numbers, but the code snippets are precise enough to locate the issue immediately.

Baseline Agent Output

`
Based on my analysis of the full diff, here's my review focused on the ConnectHandler/DisconnectHandler asymmetry:


ConnectHandler/DisconnectHandler Asymmetry in FlyoutViewHandler.Android.cs

Bug found. In ConnectHandler, the _navigationRoot CoordinatorLayout is set up with SetupViewWithLocalListener(cl), which:

  1. Sets ViewCompat.SetOnApplyWindowInsetsListener
  2. Sets ViewCompat.SetWindowInsetsAnimationCallback
  3. Calls RegisterView

But in DisconnectHandler, it only calls UnregisterView(cl), which only removes from the static registry — it does not clear the ViewCompat listener/callback references.

The correct counterpart is RemoveViewWithLocalListener(cl), which does all three cleanup steps.

`csharp
// ConnectHandler:
MauiWindowInsetListener.SetupViewWithLocalListener(cl); // registers + sets listeners

// DisconnectHandler (current - INCOMPLETE):
MauiWindowInsetListener.UnregisterView(cl); // only unregisters from registry

// DisconnectHandler (should be):
MauiWindowInsetListener.RemoveViewWithLocalListener(cl); // full cleanup
`

This leaves stale OnApplyWindowInsetsListener and WindowInsetsAnimationCallback references on the CoordinatorLayout after the handler disconnects, which could cause inset callbacks to fire on a disposed/disconnected handler.


Minor: Redundant null check in ShellSectionRenderer.cs

csharp if (_rootView != null) // already non-null here { if (_rootView is not null) // redundant { MauiWindowInsetListener.RemoveViewWithLocalListener(_rootView); } ... }

The inner is not null check is unnecessary since _rootView != null is already guarding the block.


Summary: The main concern is the FlyoutViewHandler where DisconnectHandler uses UnregisterView instead of RemoveViewWithLocalListener, creating a leak of ViewCompat listener references. The Shell fragment files (ShellContentFragment, ShellFlyoutTemplatedContentRenderer) correctly use RemoveViewWithLocalListener as the counterpart to SetupViewWithLocalListener.
`

With-Skill Agent Output (Isolated)

`
Now I have the full diff. Let me form my independent assessment focusing on the ConnectHandler/DisconnectHandler asymmetry you flagged.

Code Review — PR #32278

Independent Assessment

What this changes: Replaces a global, activity-scoped GlobalWindowInsetListener (one per MauiAppCompatActivity) with a per-view MauiWindowInsetListener using a static registry of WeakReference<object> entries. Views walk up the tree to find their nearest registered parent's listener.

Inferred motivation: The global listener caused cross-contamination in complex navigation (modals, flyouts, Shell). Per-view listeners provide better isolation.

Reconciliation with PR Narrative

Author claims: Refactor to per-CoordinatorLayout registry with optimized lookups.
Agreement: The code matches. This is a large architectural refactor of inset management.


Findings

❌ Error — FlyoutViewHandler.Android.cs: ConnectHandler/DisconnectHandler asymmetry

In ConnectHandler:
csharp MauiWindowInsetListener.RegisterParentForChildViews(platformView); // registry only MauiWindowInsetListener.SetupViewWithLocalListener(cl); // registry + ViewCompat listeners

In DisconnectHandler:
csharp MauiWindowInsetListener.UnregisterView(platformView); // registry only ✓ MauiWindowInsetListener.UnregisterView(cl); // registry only ✗

SetupViewWithLocalListener attaches three things: SetOnApplyWindowInsetsListener, SetWindowInsetsAnimationCallback, and the registry entry. But UnregisterView only removes from the registry — it does NOT null-out the ViewCompat listeners. The correct disconnect call should be RemoveViewWithLocalListener(cl) (which clears all three), matching what ShellContentFragment, ShellSectionRenderer, and NavigationRootManager do for their coordinator layouts.

This leaves dangling IOnApplyWindowInsetsListener and WindowInsetsAnimationCallback references on _navigationRoot after handler disconnect, which can keep the listener (and its _trackedViews HashSet) alive longer than intended.

⚠️ Warning — MauiPageControl.cs: Orphaned #pragma warning restore

csharp shape.Paint?.Color = color; #pragma warning restore CA1416

The #pragma warning disable CA1416 was removed (it guarded the old if (shape.Paint != null) block), but the matching #pragma warning restore CA1416 remains. This will generate a compiler warning about a restore without a corresponding disable.

⚠️ Warning — ShellSectionRenderer.cs: Redundant null check

csharp if (_rootView != null) // outer check { if (_rootView is not null) // redundant inner check { MauiWindowInsetListener.RemoveViewWithLocalListener(_rootView); }

The inner check is redundant given the outer. Likely a merge artifact — not a bug, but noisy.

💡 Suggestion — Static List<ViewEntry> never shrinks

_registeredViews is a static List<ViewEntry> that only removes entries one-at-a-time via RemoveAt. In apps with heavy navigation (many coordinator layouts created/destroyed), the list is walked linearly on every FindListenerForView call. A ConditionalWeakTable<AView, MauiWindowInsetListener> would give O(1) lookup and automatic GC cleanup without manual weak-reference housekeeping.


Devil's Advocate

  • The FlyoutViewHandler asymmetry is real — I verified SetupViewWithLocalListener sets both ViewCompat listeners, and UnregisterView does not clear them. Every other handler in this PR uses RemoveViewWithLocalListener for views set up with SetupViewWithLocalListener.
  • The pragma issue is minor but will produce a build warning that CI might flag.
  • The linear list concern is a performance suggestion, not a correctness bug — in practice the list is small.

Verdict: NEEDS_CHANGES

Confidence: high
Summary: The FlyoutViewHandler.DisconnectHandler uses UnregisterView where it should use RemoveViewWithLocalListener, leaving ViewCompat listeners dangling after disconnect. This is a clear ConnectHandler/DisconnectHandler asymmetry that could cause stale inset callbacks and memory retention. The orphaned pragma restore is a secondary cleanup item.
`

With-Skill Agent Output (Plugin)

`
I now have a clear picture. Let me compile the review.

Code Review — PR #32278

Independent Assessment

What this changes: Replaces the global (per-activity) GlobalWindowInsetListener with a per-view MauiWindowInsetListener that uses a static registry of WeakReference<object> entries. Views find their appropriate listener by walking up the view tree. This removes coupling to MauiAppCompatActivity and provides better isolation for complex navigation scenarios (Shell, FlyoutView).

Inferred motivation: The global listener was a single bottleneck shared by the entire activity. In complex scenarios (nested navigation, Shell flyouts), a single listener can't properly scope inset handling per layout container.

Reconciliation with PR Narrative

Author claims: "[Android] Refactor WindowInsetListener to per-CoordinatorLayout registry with optimized lookups"
Agreement: The code matches this — the refactoring moves from one global listener on the activity to local listeners registered per CoordinatorLayout.

Findings

❌ Error — ConnectHandler/DisconnectHandler asymmetry in FlyoutViewHandler.Android.cs

In ConnectHandler, the CoordinatorLayout is set up with SetupViewWithLocalListener(cl), which does three things:

  1. ViewCompat.SetOnApplyWindowInsetsListener(view, listener)
  2. ViewCompat.SetWindowInsetsAnimationCallback(view, listener)
  3. listener.RegisterView(view) (adds to static registry)

But in DisconnectHandler, cleanup uses UnregisterView(cl), which only removes the view from the static registry list. It does NOT call ViewCompat.SetOnApplyWindowInsetsListener(view, null) or ViewCompat.SetWindowInsetsAnimationCallback(view, null).

The correct symmetric cleanup would be RemoveViewWithLocalListener(cl) (which does all three: removes ViewCompat listeners, unregisters, and resets safe areas) — exactly as ShellContentFragment, ShellFlyoutTemplatedContentRenderer, ShellSectionRenderer, and NavigationRootManager already do for their coordinator layouts.

Impact: The dangling ViewCompat listener keeps a reference to a disposed MauiWindowInsetListener on a detached view. If the view is re-attached (e.g., during activity recreation), the stale listener will process insets with state from a previous lifecycle.

Fix:
csharp // In DisconnectHandler, replace: MauiWindowInsetListener.UnregisterView(cl); // With: MauiWindowInsetListener.RemoveViewWithLocalListener(cl);

⚠️ Warning — Redundant null check in ShellSectionRenderer.cs

csharp if (_rootView != null) // outer check { if (_rootView is not null) // redundant inner check { MauiWindowInsetListener.RemoveViewWithLocalListener(_rootView); }

The inner check is unnecessary since we're already inside if (_rootView != null).

💡 Suggestion — Static _registeredViews list has no cleanup on activity destruction

The old GlobalWindowInsetListener was disposed in MauiAppCompatActivity.OnDestroy(). The new static List<ViewEntry> relies solely on WeakReference GC and per-call cleanup sweeps. If many activities are created/destroyed (e.g., multi-window, configuration changes), stale entries accumulate until the next FindListenerForView/RegisterView call happens to sweep. This is likely fine in practice but worth noting.

Devil's Advocate

  • On the ❌ finding: Could the _navigationRoot = null assignment after UnregisterView make the stale listener harmless? Partially — it prevents a second disconnect. But the ViewCompat listener is still attached to the Android view object itself, which may outlive the handler in pooled/recycled scenarios. The other four call sites in this same PR all use RemoveViewWithLocalListener, making this the odd one out — strongly suggesting it's an oversight rather than intentional.
  • On the ⚠️ finding: Cosmetic only; no runtime impact.

Verdict: NEEDS_CHANGES

Confidence: high
Summary: The FlyoutViewHandler.Android.cs DisconnectHandler uses UnregisterView where it should use RemoveViewWithLocalListener, creating a real Connect/Disconnect asymmetry that can leave stale listeners attached to views. All other handler/fragment cleanup paths in this same PR correctly call RemoveViewWithLocalListener. This is likely a copy-paste oversight and a one-line fix.
`

🔍 Full results and investigation steps

Copilot AI and others added 4 commits June 4, 2026 13:39
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sanitize agent-provided log text, verify safe-output decisions against the deterministic candidate set, and only honor /review options from users with write permissions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Source pipeline_ref / platform / rerun_comment_id from the candidate
  record instead of the agent's emission. The candidates.json file is
  the deterministic source of truth for everything the handler acts on;
  the agent's matching fields are now advisory only.
- Validate pr_number shape against ^[1-9]\d*$ before the [int] cast so
  malformed values produce a tidy error message instead of a .NET
  conversion exception that echoes the raw input.
- Drop raw $item values from the validation throw messages and route
  the outer-catch $_ plus the inline ::warning:: paths through
  ConvertTo-SafeLogValue so error output is shaped consistently.
- When the rate limiter blocks a trigger, leave the
  s/agent-ready-for-rerun label in place (and skip the -1 reaction) so
  the next hourly scan can retry instead of permanently dropping the
  request.
- In Query-RerunReadyPRs.ps1, gate /review option authority on the
  comment's author_association field directly (matches the downstream
  fallback in Test-ReviewCommandOptionsAllowed). The previous
  /collaborators/.../permission lookup needed push scope that the
  read-only scanner token does not have, so it always returned false
  and silently discarded every /review option.
- Tighten Get-LatestAISummaryComment so the AI Summary author must
  match the maui-bot / github-actions login regex; any `type == 'Bot'`
  app no longer counts.
- Extend Pester tests in Invoke-RerunReviewTrigger.Tests.ps1 and
  Resolve-RerunEligibility.Tests.ps1 to cover the new bindings,
  pipeline_ref normalization edge cases, error-path sanitization, and
  the narrower AI-Summary bot allowlist.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo
Copy link
Copy Markdown
Contributor Author

kubaflo commented Jun 5, 2026

Code Review — recent Shane delta on PR #35685

Reviewed commit range 606c4e3a6b..29f62fb011, specifically:

  • 80da010fda — Harden rerun review scanner handler
  • 29f62fb011 — Refine rerun handler input handling and edge cases

Independent assessment

What this changes: The recent commits harden the rerun scanner handoff between AI output and deterministic safe-output execution. The safe-output handler now cross-checks AI-emitted decisions against the deterministic candidate artifact, sources operational values like platform/ref/comment id from that artifact instead of the AI payload, sanitizes log values, tightens PR-number/head validation, filters trusted /review option authors, and narrows AI Summary detection to MauiBot/GitHub Actions generated summaries.

Inferred motivation: The changes reduce prompt-injection and confused-deputy risk: the AI can still decide trigger/skip, but it can no longer smuggle arbitrary platform/ref/comment-id values outside the precomputed candidate set.

Findings

⚠️ Warning — Rate-limited PRs can keep occupying the bounded scanner queue

Invoke-RerunReviewTrigger.ps1 now preserves s/agent-ready-for-rerun when a trigger decision is blocked by the deterministic cooldown/quota ($preserveReadyLabel = $true at lines 343-349, then skip removal at lines 391-392). That makes sense if the intent is to retry automatically after the cooldown expires.

The edge case is that candidate discovery is bounded before safe-output processing (rerun-review-scanner.md defaults MAX_PRS to 5 and passes that limit to Query-RerunReadyPRs.ps1 at lines 112-125). If several PRs are rate-limited and keep the ready label, they can be rediscovered every hourly run and occupy the limited candidate slots, starving later queued PRs and increasing scanner/safe-output churn.

Suggested mitigation: either remove/move the ready label on deterministic rate-limit blocks, or have the candidate query skip rate-limited PRs before applying the MaxPRs cap (possibly by fetching more than MaxPRs and post-filtering). If automatic retry after cooldown is important, a separate delayed/backoff label/state would avoid blocking the normal ready queue.

What looks good

  • Safe-output now requires each AI decision to correspond to a deterministic candidate (Get-MatchingCandidate) and verifies the candidate head SHA matches the AI-emitted expected SHA before any side effects.
  • Pipeline ref, platform, and rerun comment id are sourced from the candidate set rather than trusted from the AI payload.
  • Log output is sanitized to avoid accidental workflow-command emission from malformed AI values/errors.
  • AI Summary checkpoint detection now ignores forged user-authored summaries and non-allowlisted bots.
  • Focused Pester coverage for these hardening helpers passes locally (Invoke-RerunReviewTrigger.Tests.ps1, Resolve-RerunEligibility.Tests.ps1).

Devil's advocate

The queue-starvation concern only matters when there are enough simultaneously queued/rate-limited PRs to hit the scanner cap; in low queue volume the current behavior is probably fine and preserves convenient automatic retry. I’m flagging it because this feature explicitly accepts contributor-originated rerun requests, so deterministic DoS/backoff behavior is part of the security boundary.

Verdict: NEEDS_DISCUSSION

Confidence: medium

No correctness blocker found in the hardening itself. The remaining question is whether rate-limited reruns should stay in the normal ready queue, or move to a separate backoff path so abusive/noisy requests cannot occupy the hourly scanner’s bounded candidate set.

@kubaflo
Copy link
Copy Markdown
Contributor Author

kubaflo commented Jun 5, 2026

Follow-up on my prior review note about rate-limited reruns staying queued: after discussing the intended workflow, this behavior is acceptable and intentional.

If the safe-output job blocks a trigger because of cooldown/quota, keeping s/agent-ready-for-rerun lets a later scanner batch reconsider the PR automatically. Successful actual rerun triggers still remove s/agent-ready-for-rerun after AzDO starts, so the duplicate-trigger path is still covered.

No change requested for this point.

…onsAllowed

When AllowedAuthorLogins was supplied (the Query-RerunReadyPRs path), the
helper trusted any comment whose login appeared in the allowlist and skipped
the per-comment author_association check. That meant a login which had posted
one valid /review comment while OWNER/MEMBER/COLLABORATOR could later post
another /review comment after losing access and still have its --branch /
--platform options honored.

Make the per-comment author_association check unconditional so the
AllowedAuthorLogins filter only further restricts the set rather than
replacing the trust check. Add two regression tests covering the
post-access-loss case (one mixed-history, one allowlisted login that no
longer qualifies on any current comment).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo
Copy link
Copy Markdown
Contributor Author

kubaflo commented Jun 5, 2026

Code Review — latest commit on PR #35685

Reviewed latest commit 3484700d32Always check per-comment author_association in Test-ReviewCommandOptionsAllowed.

Independent assessment

What this changes: Tightens /review option extraction for rerun candidates so an allowlisted login is not enough by itself; each individual comment must also have a trusted author_association (OWNER, MEMBER, or COLLABORATOR) before its platform/branch options can influence a later rerun.

Inferred motivation: Prevent a previously trusted login from carrying trust forward to a later comment after access changed, or from having stale author state in the precomputed allowed-login set override the current comment's own association.

Findings

No issues found in this commit.

What looks good

  • The change keeps the explicit AllowedAuthorLogins filter, but now treats it as an additional constraint instead of a replacement for per-comment trust.
  • The fallback path remains straightforward: when no explicit allowlist is provided, trusted author associations are still accepted and untrusted associations are rejected.
  • Added tests cover both mixed old/new comments for the same former collaborator and the all-untrusted case.

Devil's advocate

The only subtle point is that author_association is event/comment metadata, not a live permissions check. In this code path that is acceptable because Query-RerunReadyPRs.ps1 separately computes AllowedAuthorLogins from trusted comments and this function now requires both the per-comment association and the optional explicit allowlist to pass.

Verdict: LGTM

Confidence: high

This is a small hardening commit with focused tests and no negative interaction found with the existing rerun candidate selection path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-ai-agents Copilot CLI agents, agent skills, AI-assisted development area-infrastructure CI, Maestro / Coherency, upstream dependencies/versions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants