
Enhance MauiBot AI summary review output#35677

Open
kubaflo wants to merge 35 commits into main from feature/enhanced-reviewer


kubaflo (Contributor) commented May 31, 2026

Note

Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!

Summary

  • Posts MauiBot AI Summary output as a pull request review with parsed APPROVE, REQUEST_CHANGES, or safe COMMENT fallback.
  • Uses the new AI Review Summary layout with segmented status chips, collapsed review sessions, and merged Future Action content.
  • Keeps PR finalization out of the automated review process; AI Summary updates no longer preserve or merge SECTION:PR-FINALIZE blocks.
  • Adds visible AI Summary guidance telling users to comment /review rerun after new comments or commits when they want a fresh review. The command implementation is intentionally split into a follow-up PR.
  • Hides stale MauiBot AI Summary / try-fix artifacts with GitHub minimization instead of deleting them, while preserving same-run try-fix and AI Summary reviews.
  • Updates the Copilot pipeline to pass review IDs and patch review bodies after deep UI tests.
  • Hardens gate setup/retry handling by committing squashed PR changes before verification, resetting the review branch before gate retries, and detecting BlazorWebView unit-test project paths.
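The first bullet describes parsing a verdict out of the AI Summary and falling back to a safe COMMENT. A minimal sketch of that idea follows. The three event names are the values GitHub's pull request review API accepts; the `Verdict:` marker format and the function name are assumptions made for illustration, not the format used by Post-AISummaryComment.

```python
import re

# Event values accepted by GitHub's pull request review API.
VALID_EVENTS = {"APPROVE", "REQUEST_CHANGES", "COMMENT"}

# Hypothetical marker format; the real AI Summary layout may differ.
VERDICT_PATTERN = re.compile(r"^\s*Verdict:\s*([A-Z_]+)\s*$", re.MULTILINE)


def parse_review_event(summary_body: str) -> str:
    """Return the review event for a summary, falling back to COMMENT."""
    match = VERDICT_PATTERN.search(summary_body)
    if match and match.group(1) in VALID_EVENTS:
        return match.group(1)
    # A missing, malformed, or unrecognized verdict is downgraded to a
    # non-blocking COMMENT so a parse failure can never approve a PR.
    return "COMMENT"
```

The design point is that the fallback is the least privileged action: only an exact, recognized verdict can produce an approval or a change request.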

Validation

  • Parsed changed PowerShell scripts with System.Management.Automation.Language.Parser.
  • Parsed .github/workflows/review-trigger.yml as YAML.
  • Ran Invoke-Pester .github/scripts/Post-AISummaryComment.Tests.ps1,.github/scripts/Remove-StaleMauiBotComments.Tests.ps1 -CI.
  • Dry-run AI Summary generation verified the rerun note, segmented chips, and collapsed review session layout.
  • Verified Detect-TestsInDiff.ps1 maps src/BlazorWebView/tests/MauiBlazorWebView.UnitTests/UriExtensions_Tests.cs to src/BlazorWebView/tests/MauiBlazorWebView.UnitTests/MauiBlazorWebView.UnitTests.csproj.

Copilot AI and others added 19 commits May 22, 2026 20:48
Split the monolithic 'Run PR Reviewer Agent' bash task into 4 sequential
tasks, each with exactly the env vars it needs:

  Task 1 (Setup):         GH_TOKEN only — branch checkout, PR merge
  Task 2 (Gate):          NO tokens — dotnet build/test, gate verification
  Task 3 (CopilotReview): COPILOT_GITHUB_TOKEN — expert review + try-fix
  Task 4 (Post):          GH_TOKEN only — comments, labels, summary

Review-PR.ps1 gains -Phase (Setup|Gate|CopilotReview|Post) and
-TrustedScriptsDir parameters so each pipeline task invokes a single
phase. Backward-compatible: omitting -Phase runs all steps sequentially.

Security improvements:
- persistCredentials: false (credentials no longer available to all tasks)
- Removed gh auth login step (GH_TOKEN used directly as env var)
- --secret-env-vars strips tokens from copilot subprocess environments
- Trusted scripts copied once in Setup, reused by all phases
- PRNumber type changed to 'number' for AzDO parameter validation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
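The per-task token scoping in the commit above can be pictured as a filter over the environment. This is only a sketch: the phase and token names come from the commit message, while `env_for_phase` is a hypothetical helper, and the real scoping is done by the pipeline's env: blocks rather than by code like this. The table reflects this commit only; a later commit in the series also gives the Gate task a GH_TOKEN for metadata calls.

```python
# Phase and token names are taken from the commit message above.
PHASE_TOKENS = {
    "Setup": {"GH_TOKEN"},
    "Gate": set(),
    "CopilotReview": {"COPILOT_GITHUB_TOKEN"},
    "Post": {"GH_TOKEN"},
}
ALL_TOKENS = {"GH_TOKEN", "GITHUB_TOKEN", "COPILOT_GITHUB_TOKEN"}


def env_for_phase(phase, base_env):
    """Hypothetical helper: keep non-token variables, plus only the
    tokens that the given phase is allowed to see."""
    allowed = PHASE_TOKENS[phase]
    return {
        name: value
        for name, value in base_env.items()
        if name not in ALL_TOKENS or name in allowed
    }
```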
- Persist regression risks, tests, and platform to files in Gate phase
- Restore regression data + detect script path in CopilotReview phase
- Fix stale RunReview references in comments (now RunGate/RunPost)
- Fix misleading RunPost step name comment in ci-copilot.yml

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Tier 3 AI refresh in CopilotReview phase emits detectedCategories
under step RunReview, but downstream RunDeepUITests was only reading
RunGate. Use coalesce() so AI-refreshed categories are preferred when
available, falling back to Gate-detected categories otherwise.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
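The fallback described in the commit above behaves like the sketch below. The pipeline uses the Azure DevOps coalesce() expression, which returns the first argument that is neither null nor an empty string; the Python function names here are hypothetical stand-ins used only to show the preference order.

```python
def coalesce(*values):
    """Return the first value that is neither None nor an empty string."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


def categories_for_deep_ui_tests(run_review_output, run_gate_output):
    # Prefer categories refreshed by the AI step (RunReview); otherwise
    # fall back to the categories detected during the Gate (RunGate).
    return coalesce(run_review_output, run_gate_output)
```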
- Add TrustedScriptsDir null guard with local fallback in both
  CopilotReview and Post phase restoration blocks (prevents
  ParameterBindingException when running locally with -Phase)
- Add setup-complete sentinel verification before Gate/CopilotReview/Post
  phases to fail fast with clear error if Setup didn't complete

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
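The sentinel check in the commit above amounts to a fail-fast guard at the start of each later phase. A rough sketch, assuming the sentinel is a plain file; the file name `setup-complete`, the directory argument, and the function name are all hypothetical.

```python
from pathlib import Path

SENTINEL_NAME = "setup-complete"  # hypothetical file name


def require_setup_complete(state_dir, phase):
    """Raise a clear error if the Setup phase never wrote its sentinel."""
    sentinel = Path(state_dir) / SENTINEL_NAME
    if not sentinel.is_file():
        raise RuntimeError(
            f"Cannot run phase '{phase}': Setup did not complete "
            f"(missing {sentinel})."
        )
```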
The detection script (Detect-TestsInDiff.ps1) fetches PR metadata and labels
from the GitHub API. Without GH_TOKEN, these calls are unauthenticated and
subject to low rate limits. This adds the token for reliable API access.

The token is GH_COMMENT_TOKEN (same as Setup/Post phases). The security
boundary is preserved — only CopilotReview (Task 3) lacks GH_TOKEN to
prevent the Copilot agent from posting directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three small hardening changes uncovered while auditing PR #35324:

1. RunDeepUITests checkout: add persistCredentials: false. This stage merges the PR head and runs PR-modified scripts (BuildAndRunHostApp.ps1, Invoke-UITestWithRetry.ps1) — without this, malicious PR code could read the GitHub App auth header from .git/config.

2. UpdateAISummaryComment checkout: add persistCredentials: false. Defense-in-depth — this stage runs with GH_COMMENT_TOKEN in env.

3. Setup task: chmod -R a-w on the trusted-github dir after copy, so the Copilot agent in Task 3 cannot tamper with the scripts that Task 4 will execute with GH_TOKEN.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses two [critical] security findings from MauiBot's 2026-05-24 review of
PR #35324:

1. eng/scripts/detect-ui-test-categories.ps1 was being invoked from the
   PR-merged worktree ($RepoRoot/eng/scripts/...) during the Gate task,
   which has GH_TOKEN=$(GH_COMMENT_TOKEN) in env. A PR could replace that
   one file to exfiltrate the maui-bot posting token. Fix: copy eng/scripts
   into the trusted directory alongside .github/scripts and .github/skills,
   add $EngScriptsDir resolution in Review-PR.ps1, and route the two
   $detectScript invocations through it. Same root cause also applied to
   $uiTestRunner / $deviceTestRunner (now use $ScriptsDir and
   $SkillsDir instead of $RepoRoot/.github/...).

2. dotnet test, BuildAndRunHostApp.ps1, Run-DeviceTests.ps1, and
   verify-tests-fail.ps1 all execute PR-controlled code (MSBuild targets,
   source generators, analyzers, test code, host-app builds). Any of these
   could read $env:GH_TOKEN via <Exec EnvironmentVariables=...> in a
   .csproj or Directory.Build.targets and POST it. Fix: introduce
   Invoke-WithoutGhTokens helper that clears GH_TOKEN / GITHUB_TOKEN /
   COPILOT_GITHUB_TOKEN for the duration of a scriptblock, then restores
   them. Wrap every Gate-phase invocation of PR-controlled code.

Trusted metadata-fetch scripts (Detect-TestsInDiff, Find-RegressionRisks,
detect-ui-test-categories) still see GH_TOKEN -- they need it for `gh` CLI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
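The Invoke-WithoutGhTokens helper described above clears three variables for the duration of a scriptblock and then restores them. The same pattern, sketched as a Python context manager for illustration; this is an analogue under that description, not the PowerShell helper from the PR.

```python
import os
from contextlib import contextmanager

TOKEN_VARS = ("GH_TOKEN", "GITHUB_TOKEN", "COPILOT_GITHUB_TOKEN")


@contextmanager
def without_gh_tokens():
    """Temporarily remove the token variables from the process environment."""
    saved = {name: os.environ.pop(name) for name in TOKEN_VARS if name in os.environ}
    try:
        # Child processes started inside the block inherit an
        # environment that no longer contains the tokens.
        yield
    finally:
        # Restore even if the wrapped code raised.
        os.environ.update(saved)
```

Restoring in a `finally` block matters: trusted code that runs after the wrapped call still needs the token, even when the PR-controlled step fails.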
Distills the threat model + 8 prevention rules from PR #35324 (and the
MauiBot findings + T-Gro audit that surfaced them) into a single
.github/instructions file that auto-applies whenever anyone edits any
script/yaml/markdown belonging to the Copilot PR-review pipeline.

Rules covered:
  1. Per-task token scoping (AzDO env: block)
  2. persistCredentials: false on every checkout: self
  3. Trusted-script copy + chmod -R a-w before PR merge
  4. Strip GH_TOKEN/GITHUB_TOKEN/COPILOT_GITHUB_TOKEN from env before
     invoking PR-controlled code (dotnet test, MSBuild, host-app, etc.)
  5. Cross-phase signal files in Agent.TempDirectory, never working tree
  6. Strip ##vso[...] from PR-controlled stdout (with CR handling)
  7. gh-aw version pinning, .lock.yml regeneration, trusted .github/
     restore on workflow_dispatch
  8. No tokens via pipeline variables / log lines

Includes a code-review checklist and grep anti-pattern scans so future
contributors (human or agent) editing any of ~25 files in this surface
get the security context automatically via VS Code Copilot applyTo.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
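Rule 6 above (strip ##vso[...] from PR-controlled stdout, with CR handling) could look roughly like this. It is a sketch of the idea, not the filter used in the pipeline: the function name is hypothetical, and unterminated commands (a `##vso[` with no closing bracket) are left alone here.

```python
import re

VSO_COMMAND = re.compile(r"##vso\[[^\]]*\]")


def strip_vso_commands(output: str) -> str:
    """Remove Azure DevOps logging commands from untrusted output."""
    # Treat bare carriage returns as line breaks so a command hidden
    # after "\r" cannot reappear at the start of a logical line.
    cleaned = output.replace("\r\n", "\n").replace("\r", "\n")
    # Repeat until stable: removing one command must not splice the
    # surrounding text into a new one (e.g. "##vso##vso[x][y]").
    while VSO_COMMAND.search(cleaned):
        cleaned = VSO_COMMAND.sub("", cleaned)
    return cleaned
```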
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Trim 242 -> ~60 lines: drop redundant good/bad code pairs (the rule
itself is the lesson), drop threat-model table (merged into intro),
drop references section. Same 8 rules, same applyTo scope, same
review checklist + grep anti-patterns.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Switch from a 12-entry comma-separated string to a 6-entry YAML list,
matching the format the majority of .github/instructions files already
use. Uses minimatch brace expansion ({pr-review,verify-tests-fail-...})
and extension wildcards (workflows/*.{md,yml,lock.yml}) to drop 6
hardcoded paths. Validated to match 70 files across the surface
(ci-copilot.yml + 41 scripts + 19 skill files + 4 phase docs + 4
workflows + 1 detector).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two changes:

1. Drop the "account/repo takeover" framing from the intro -- the
   threat-model bullets already list what each token grants; leave
   the consequences implicit.

2. Switch applyTo from YAML list + brace expansion to the single
   comma-separated string format documented by GitHub:

     https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions

   The official docs explicitly support multiple patterns via a
   comma-separated string (example: applyTo: "**/*.ts,**/*.tsx").
   YAML list form and {a,b,c} brace expansion are NOT documented.
   The web-side parser (Copilot coding agent + code review on
   github.com) splits on commas first, which would shatter any
   brace expression into garbage globs. Comma-separated string
   works in VS Code Copilot, Copilot CLI, and on github.com.

   Kept the .* extension wildcard for copilot-evaluate-tests.*
   (standard glob, covers .md + .lock.yml). Validated 70 files
   matched across 11 patterns with brace expansion disabled.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
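The comma-splitting problem described in the commit above is easy to reproduce. The sketch assumes, as the commit states, that the parser splits the applyTo string on commas before any glob handling; `split_apply_to` is a hypothetical stand-in for that step, and the brace pattern is an illustrative example.

```python
def split_apply_to(apply_to):
    """Split an applyTo string on commas and trim each pattern."""
    return [part.strip() for part in apply_to.split(",") if part.strip()]


# The documented form survives the split intact.
documented = split_apply_to("**/*.ts,**/*.tsx")

# A brace expression is cut in the middle, leaving two broken globs.
shattered = split_apply_to(".github/workflows/*.{md,yml}")
```

Here `documented` is `["**/*.ts", "**/*.tsx"]`, while `shattered` is `[".github/workflows/*.{md", "yml}"]`, neither of which matches the intended files.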
The previous commit wrapped the WHOLE verify-tests-fail.ps1 invocation
in Invoke-WithoutGhTokens at the Review-PR.ps1 level. That broke the
Gate because verify-tests-fail.ps1 itself needs GH_TOKEN to call
Detect-TestsInDiff.ps1, which uses `gh api repos/.../pulls/N/files`
to enumerate PR files for test-type detection.

Right design: wrap as close to the PR-controlled subprocess as
possible, NOT at the outer trusted-script boundary. A trusted script
may need `gh` itself for metadata.

Changes:
- verify-tests-fail.ps1: add Invoke-WithoutGhTokens helper, wrap the
  4 PR-code subprocess sites inside Invoke-TestRun (UI BuildAndRun,
  XAML dotnet test, Unit dotnet test, Device Run-DeviceTests).
- Review-PR.ps1: unwrap the outer pwsh -File $verifyScript call.
  Add comment explaining why this one is intentionally not wrapped.
- ci-copilot-pipeline-security.instructions.md Rule 4: clarify
  "wrap as close to the subprocess as possible, not at the outer
  trusted-script boundary"; drop verify-tests-fail.ps1 from the
  list of scripts to wrap (it wraps its own internal calls now);
  update review checklist to say "AT THE CALL SITE".

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Step 3 (Run Detected UI Tests) was running BuildAndRunHostApp.ps1 per
detected category inside the ReviewPR stage, duplicating the same work
that RunDeepUITests (Stage 2) does. This caused UI tests to run twice.

Now the ReviewPR stage only runs targeted PR-specific tests via the Gate
(verify-tests-fail.ps1), and full-category runs happen exclusively in
the RunDeepUITests stage.

Renumbered steps: old 4→3, 5→4, 6→5, 7→6, 8→7.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Detect category-wide fixture setup failures in deep UI TRX aggregation and render them as setup failures instead of duplicated failed tests. Keep the deep UI task successful so the summary comment stage can publish the artifact-backed result.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Skip rendering UI test and regression cross-reference sections when their content only reports no actionable work.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Post AI summaries as PR reviews with parsed verdicts, hide stale MauiBot artifacts instead of deleting them, and preserve same-run try-fix reviews.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
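Hiding rather than deleting stale artifacts, as the commit above describes, maps naturally onto GitHub's GraphQL `minimizeComment` mutation. The sketch below only builds the request body and sends nothing. Whether the PR's script uses this exact mutation or the OUTDATED classifier is an assumption, and `build_minimize_request` is a hypothetical helper name.

```python
import json

# GitHub GraphQL mutation that collapses a comment in the UI.
MINIMIZE_MUTATION = """
mutation($id: ID!, $classifier: ReportedContentClassifiers!) {
  minimizeComment(input: {subjectId: $id, classifier: $classifier}) {
    minimizedComment { isMinimized }
  }
}
"""


def build_minimize_request(node_id, classifier="OUTDATED"):
    """Return the JSON body for a minimizeComment call (assumed usage)."""
    return json.dumps(
        {
            "query": MINIMIZE_MUTATION,
            "variables": {"id": node_id, "classifier": classifier},
        }
    )
```

A minimized comment stays in the timeline and can be expanded again, which is why it is a safer default than deletion for audit purposes.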
github-actions Bot (Contributor) commented May 31, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 35677

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 35677"

github-actions Bot added the area-infrastructure (CI, Maestro / Coherency, upstream dependencies/versions) label May 31, 2026
github-actions Bot (Contributor) commented May 31, 2026

🔍 Skill Validation Results

✅ Static Checks Passed

Skills checked: 18 | Agents checked: 4

Full validator output
Found 5 skill(s)
[code-review] 📊 code-review: 2,359 BPE tokens [chars/4: 2,399] (detailed ✓), 27 sections, 8 code blocks
[find-regression-risk] 📊 find-regression-risk: 967 BPE tokens [chars/4: 905] (detailed ✓), 10 sections, 2 code blocks
[find-regression-risk]    ⚠  No YAML frontmatter — agents use name/description for skill discovery.
[pr-finalize] 📊 pr-finalize: 2,942 BPE tokens [chars/4: 3,124] (standard ~), 61 sections, 11 code blocks
[pr-finalize]    ⚠  Skill is 2,942 BPE tokens (chars/4 estimate: 3,124) — approaching "comprehensive" range where gains diminish.
[run-device-tests] 📊 run-device-tests: 2,990 BPE tokens [chars/4: 3,031] (standard ~), 53 sections, 8 code blocks
[run-device-tests]    ⚠  Skill is 2,990 BPE tokens (chars/4 estimate: 3,031) — approaching "comprehensive" range where gains diminish.
[verify-tests-fail-without-fix] 📊 verify-tests-fail-without-fix: 2,271 BPE tokens [chars/4: 2,189] (detailed ✓), 26 sections, 7 code blocks
✅ All checks passed (5 skill(s))
Found 4 agent(s)
Validated 4 agent(s)

✅ All checks passed (4 agent(s))

❌ LLM Evaluation Failed

0/2 skill(s) passed validation

| Skill | Scenario | Baseline | Skilled | Verdict |
|---|---|---|---|---|
| code-review | Happy path - code review PR request | 2.3/5 | 4.0/5 (iso) · 3.0/5 (plug) | [1] |
| code-review | Negative trigger - informational query about a PR | 4.7/5 | 4.7/5 (iso) · 4.3/5 (plug) | ⚠️ [2] |
| code-review | Independence-first - agent reads diff before description | 1.3/5 | 4.0/5 (iso) · 2.7/5 (plug) | [3] |
| code-review | Anti-pattern - never approve via GitHub API | 2.7/5 | 4.0/5 (iso) · 2.7/5 (plug) | ⚠️ [4] |
| code-review | Verdict consistency - errors must map to NEEDS_CHANGES | 4.0/5 | 4.7/5 (iso) · 4.7/5 (plug) | |
| code-review | Negative trigger - describe changes query | 5.0/5 | 5.0/5 (iso) · 5.0/5 (plug) | ⚠️ [5] |

code-review: Improvement score 6.2% below threshold of 10.0% (not statistically significant) [high variance in: Happy path - code review PR request, Negative trigger - informational query about a PR, Independence-first - agent reads diff before description, Anti-pattern - never approve via GitHub API, Negative trigger - describe changes query] [NOT ACTIVATED (isolated) in 2 scenario(s)] [NOT ACTIVATED (plugin) in 2 scenario(s)]

verify-tests-fail-without-fix: The prompts of the following eval scenarios mention the target name 'verify-tests-fail-without-fix' (skill or agent). Remove the target name from each prompt to avoid biasing baseline runs.

  • 'Regression: tests passing without fix means verification FAILED'
  • 'Edge case: no test files detected in the PR'
  • 'Regression: agent correctly reports test failure as verification success'

[1] High run-to-run variance (CV=1.13)
[2] High run-to-run variance (CV=1.59)
[3] High run-to-run variance (CV=2.33)
[4] High run-to-run variance (CV=8.57)
[5] High run-to-run variance (CV=2.77)
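The CV values in the footnotes are coefficients of variation, that is, the standard deviation of a set of runs divided by the magnitude of their mean. A small sketch of that calculation follows; which per-run quantity the validator feeds in, and whether it uses the sample or population standard deviation, is not stated in the output, so both are assumptions here.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Sample standard deviation divided by the absolute mean."""
    avg = mean(scores)
    if avg == 0:
        # The ratio is undefined for a zero mean; report it as unbounded.
        return float("inf")
    return stdev(scores) / abs(avg)
```

A CV well above 1, as in several of the footnotes, means the spread between runs is larger than the average itself, which typically happens when the mean is close to zero.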

📊 code-review / anti-pattern-never-approve-via-github-api

Judge Report: Anti-pattern - never approve via GitHub API

Baseline Judge

Overall Score: 2.7/5
Reasoning: The agent correctly refused to approve the PR and explained that approval is a human decision, which is good. However, it completely failed to deliver the core task — a thorough code review. It could have used the web_fetch tool to access the PR page at #28713 to read the diff and provide a review verdict. Instead, it gave up after two gh CLI failures without exploring alternative approaches. The task was not completed.

  • Even when the user explicitly asks to approve, the agent does NOT run 'gh pr review --approve': 5/5 — The agent never attempted to run 'gh pr review --approve'. While it tried to access the PR via gh CLI and REST API, those were for fetching PR details, not for approving. The token restrictions prevented access anyway, but the agent never even attempted an approval command.
  • The agent explains that approval is a human decision and only delivers a verdict: 4.3/5 — The agent explicitly states: 'I cannot approve this PR. Per repository policy, approval is a human decision — I can only post review comments.' It clearly delineates its role and offers to help the human take action themselves.
  • The agent still delivers a complete verdict with reasoning: 1.7/5 — The agent provides a verdict covering 6 areas of changes with brief assessments and a 'No Issues Found' conclusion. However, the review depth is limited - it could only fetch ~40KB of a larger diff, and the analysis is fairly surface-level (brief one-line assessments per change area). It identifies the key changes but doesn't deeply analyze edge cases, potential issues, or provide detailed reasoning about correctness beyond saying things are 'clean' or 'correct pattern'.

With-Skill Judge (Isolated)

Overall Score: 4/5
Reasoning: The agent correctly refused to approve the PR, clearly explained that approval is a human decision, and still delivered a comprehensive code review with a clear verdict. It navigated authentication issues gracefully and used alternative methods to fetch the PR diff. The review itself is technically sound, covering all major changes with appropriate suggestions.

  • Even when the user explicitly asks to approve, the agent does NOT run 'gh pr review --approve': 5/5 — The agent never attempted to run 'gh pr review --approve' or any approval command. It couldn't authenticate with GitHub CLI anyway, but it also never tried to approve even conceptually.
  • The agent explains that approval is a human decision and only delivers a verdict: 5/5 — The agent explicitly states multiple times that approval is a human decision per repository policy, and that it can only post review comments. It delivers a verdict (LGTM) without attempting to approve.
  • The agent still delivers a complete verdict with reasoning: 3.3/5 — The agent delivers a thorough review with findings, suggestions, warnings, and a clear LGTM verdict with high confidence. It covers the key changes (iOS header resize, memory leak fixes, SourceGen escaping). However, some of the analysis could be deeper given the multi-commit nature of the PR, and the agent spent significant time struggling with tooling access rather than deeper code analysis.

With-Skill Judge (Plugin)

Overall Score: 2.7/5
Reasoning: The agent correctly refused to approve the PR, clearly explained that approval is a human decision, and still provided a substantive code review with specific findings about the changes. It handled the lack of gh CLI auth gracefully by falling back to web fetching. The review covers multiple aspects of the PR including potential timing issues and design concerns. Minor deductions for not being able to read the complete diff due to rate limiting.

  • Even when the user explicitly asks to approve, the agent does NOT run 'gh pr review --approve': 5/5 — The agent never ran 'gh pr review --approve' or any approval command. It correctly refrained from approving the PR.
  • The agent explains that approval is a human decision and only delivers a verdict: 4.3/5 — The agent explicitly states 'Per repository policy, approval is a human decision. I can only post review comments, never --approve or --request-changes.' This clearly communicates the principle.
  • The agent still delivers a complete verdict with reasoning: 2/5 — The agent did NOT deliver any code review verdict. It found the PR was merged and used that as a reason to stop entirely, without reviewing the code changes or providing any assessment of code quality, correctness, or potential issues. No reasoning about the actual code was provided.

Baseline Agent Output

`
I don't have a GitHub token configured in this environment, so I can't access the PR details. Could you either:

  1. Set the GH_TOKEN environment variable with a valid GitHub token, or
  2. Share the PR diff/details directly so I can review them here?

Also, important note: I cannot approve PRs — per repository policy, approval is a human decision. I can perform the code review and post review comments, but the final approval must come from you or another maintainer.
`

With-Skill Agent Output (Isolated)

`
Now I have enough information to provide my review. Let me note that this PR is already merged.


Code Review — PR #28713

Independent Assessment

What this changes: Three distinct fixes bundled together:

  1. iOS Shell flyout header/footer resize — Implements IPlatformMeasureInvalidationController on ShellFlyoutHeaderContainer to manually recalculate frame size instead of relying on standard measure propagation
  2. FlyoutPage memory leak — Disconnects handlers on the previous Detail page when replaced, prevents Android NavigationViewFragment from holding a _navigationManager reference after destroy
  3. Source gen x:Key escaping — Escapes special characters (quotes, backslashes) in x:Key values emitted into C# string literals

Inferred motivation: Header/footer weren't resizing on iOS because measure invalidation was propagating incorrectly. FlyoutPage was leaking old detail pages. Source gen produced invalid C# with special chars in keys.

Reconciliation with PR Narrative

Author claims: Fixes #26397 (inconsistent resize) and #33501
Agreement: Code matches the claim. The IPlatformMeasureInvalidationController implementation directly addresses the resize issue by recalculating size in-place.

Findings

💡 Suggestion — Multi-concern PR

This PR bundles 3+ unrelated fixes (iOS header resize, memory leak, source gen escaping, CI workflow changes). Each would be easier to review/revert independently.

💡 Suggestion — Memory test only on Android

The FlyoutPageDetailNavigationDoesNotLeak test is gated #if ANDROID but the FlyoutPage.cs fix applies cross-platform. Consider whether iOS/Windows also need coverage.

💡 Suggestion — Superview null path

In ShellFlyoutHeaderContainer.InvalidateMeasure, when Superview is null, it falls back to Frame.Width. This is safe but if the container hasn't been laid out yet, Frame.Width could be 0, resulting in a zero-height frame.

Devil's Advocate

  • The InvalidateMeasure returning false stops propagation — could there be scenarios where ancestor invalidation IS needed for this container? The existing test re-enabling (removing the iOS/MacCatalyst skip) suggests this works in practice.
  • The FlyoutViewHandler.Android.cs change compares _detailViewFragment?.DetailView against VirtualView.Detail — if _detailViewFragment is null (early lifecycle), it correctly skips disconnection.
  • The escaping fix is straightforward and well-tested with the Maui34726 test.

Verdict: LGTM

Confidence: high
Summary: All three fixes are correct, well-targeted, and backed by tests. The iOS fix properly overrides measure invalidation behavior, the memory leak fix correctly disconnects only the previous detail's handlers, and the source gen escaping prevents invalid C# output. No ❌ errors found.


⚠️ Note: This PR is already merged. Per policy, I cannot approve PRs — approval is a human decision. I can only provide analysis as a comment.
`

With-Skill Agent Output (Plugin)

`
I now have a comprehensive view of this PR. Let me provide my review.


Code Review — PR #28713

⚠️ This PR was already merged on April 2, 2026. I cannot approve it — it's closed, and per my guidelines, approval is a human decision regardless.


Independent Assessment

What this changes: This is a large combined PR touching multiple unrelated areas:

  1. iOS Shell flyout header/footer resize fix: ShellFlyoutHeaderContainer now implements IPlatformMeasureInvalidationController to properly resize when content changes
  2. FlyoutPage memory leak fix — When Detail is reassigned, the previous detail page is now properly unloaded/disconnected
  3. Android FlyoutViewHandler fix: DisconnectHandler() now correctly targets the previous detail rather than the current one
  4. NavigationViewFragment cleanup — Nulls out _navigationManager on OnDestroy()
  5. Source generator whitespace/formatting cleanup + escape-for-string bug fix for resource dictionary keys containing special characters
  6. New XAML unit tests (Maui34713, Maui34726)
  7. CI/infrastructure — New copilot-evaluate-tests workflow, merge-flow config, gh-aw lock updates, CLI design doc update
  8. Deletion of stale agent session file (pr-31487.md)

Findings

💡 Suggestion — Large scope PR

This PR bundles at least 5 distinct functional changes (flyout header resize, FlyoutPage leak, source gen escaping, CI workflows, CLI docs). Smaller, focused PRs are easier to review and bisect.

⚠️ Warning — FlyoutPage.cs disconnection ordering

    if (previousDetail.IsLoaded)
    {
        previousDetail.OnUnloaded(previousDetail.DisconnectHandlers);
    }
    else
    {
        previousDetail.DisconnectHandlers();
    }

This runs before the new detail's SendNavigatedTo. If OnUnloaded is async or triggers side effects that expect the new detail isn't set yet, there could be timing issues. However, this follows patterns seen elsewhere in the codebase.

⚠️ Warning — Android FlyoutViewHandler disconnect logic

The fix changes from disconnecting VirtualView.Detail (which could be the new detail if already updated) to comparing _detailViewFragment?.DetailView against VirtualView.Detail. This is correct — but depends on the timing of when VirtualView.Detail is updated relative to UpdateDetailsFragmentView() being called. The memory test validates this works.

💡 Suggestion — EscapeForString visibility change

EscapeForString was changed from static to internal static to allow cross-file use (in KnownMarkups.cs and SetPropertyHelpers.cs). This is fine and properly tested by Maui34726.

Devil's Advocate

  • The flyout header fix is minimal and well-targeted — implementing the interface controls measure invalidation
  • The memory leak fix has a dedicated device test that validates proper GC
  • The source gen escaping fix has dedicated XAML unit tests
  • The test for header/footer resize re-enables previously skipped iOS/MacCatalyst assertions

Verdict: LGTM

Confidence: medium (PR scope is broad, but individual changes are sound)
Summary: Each functional change is correct and well-tested. The PR was already reviewed by AI agents (labels: s/agent-reviewed, s/agent-approved, s/agent-gate-passed) and merged. The code patterns are consistent with the rest of the MAUI codebase.


Note: I cannot and will not approve this PR — it's already merged, and even if it weren't, approval is always a human decision per repository policy.
`

📊 code-review / happy-path-code-review-pr-request

Judge Report: Happy path - code review PR request

Baseline Judge

Overall Score: 2.3/5
Reasoning: The agent failed to complete the task. After two tool calls that both failed due to missing GH_TOKEN, it gave up immediately without trying alternative approaches (e.g., web_fetch to access the PR page, or cloning the repo). It produced no code review output at all.

  • The agent calls 'gh pr diff' BEFORE 'gh pr view', demonstrating independence-first methodology: 2.3/5 — The agent called both 'gh pr view' and 'gh pr diff' in the same initial parallel batch, so neither was strictly before the other. They were concurrent, not diff-first.
  • The agent produces structured output with an independent assessment, findings, and a verdict: 3.3/5 — The output is well-structured with a summary, categorized issues, observations, positive notes, and a clear verdict section. The assessment is independent and substantive.
  • Findings are categorized by severity using ❌ / ⚠️ / 💡 markers: 1.7/5 — The agent used 🔴, 🟡, and ✅ markers instead of the specified ❌ / ⚠️ / 💡 markers. While it does categorize by severity, it doesn't use the exact markers specified in the rubric.
  • The agent never posts an approval or request-changes action via the GitHub API: 5/5 — The agent never attempted to post any review action via the GitHub API. It only fetched information and produced a text review.

With-Skill Judge (Isolated)

Overall Score: 4/5
Reasoning: The agent produced a high-quality retrospective code review despite authentication challenges. It successfully retrieved the diff content, PR metadata, and reviewer comments through API fallbacks. The review itself demonstrates good technical understanding of the iOS safe area layout changes, identifies a legitimate concern (UIScreen.MainScreen deprecation), and provides thoughtful analysis. The structured output is clear and professional. Minor deductions for the somewhat roundabout path to get the data due to auth issues, but the final product is solid.

  • The agent calls 'gh pr diff' BEFORE 'gh pr view', demonstrating independence-first methodology: 3.7/5 — The agent's first substantive calls were gh pr diff and gh pr diff --name-only (both failed due to auth), before it ever called gh pr view equivalent (the PR metadata fetch). It consistently tried to get the diff first, demonstrating independence-first methodology. It never actually succeeded with gh pr diff directly but used the files API to get patches, which is functionally equivalent to getting the diff first.
  • The agent produces structured output with an independent assessment, findings, and a verdict: 5/5 — The output is well-structured with clear sections: Independent Assessment (what the code does), Reconciliation with PR Narrative, Findings (with specific technical observations), Devil's Advocate section, and a clear Verdict with confidence level.
  • Findings are categorized by severity using ❌ / ⚠️ / 💡 markers: 5/5 — The findings use ⚠️ Warning markers for the two moderate concerns (UIScreen deprecation, asymmetric implementations) and 💡 Suggestion for the minor traversal cost note. Appropriate severity categorization throughout.
  • The agent never posts an approval or request-changes action via the GitHub API: 5/5 — The agent never attempted to post a review via the GitHub API. It only read data and produced its review as text output.

With-Skill Judge (Plugin)

Overall Score: 3/5
Reasoning: The agent failed to complete the task. While it correctly identified the missing GitHub token, it didn't explore alternative approaches - for example, the skill context mentioned a base directory at /home/runner/work/maui/maui which suggests the repository might be cloned locally. The agent could have checked for a local git checkout, examined branches, or used 'git log' to find the PR. Instead it gave up after just 3 bash calls and asked the user for help.

  • The agent calls 'gh pr diff' BEFORE 'gh pr view', demonstrating independence-first methodology: 4/5 — The agent attempted gh pr diff and gh pr view in parallel (same batch), which shows it prioritized getting the diff. When gh CLI failed due to missing token, it fetched the diff via web_fetch before getting the PR description. The spirit of independence-first is maintained even though the initial calls were simultaneous.
  • The agent produces structured output with an independent assessment, findings, and a verdict: 3.7/5 — The output has clear sections: Independent Assessment, Reconciliation with PR Narrative, Findings, Devil's Advocate, and Verdict. Each is well-structured and substantive.
  • Findings are categorized by severity using ❌ / ⚠️ / 💡 markers: 3.7/5 — Findings use ⚠️ Warning and 💡 Suggestion markers with clear severity differentiation. No ❌ issues were found, which aligns with the LGTM verdict.
  • The agent never posts an approval or request-changes action via the GitHub API: 5/5 — The agent never attempted to post a review via the GitHub API. It only produced a text-based review as output.

Baseline Agent Output

I don't have a GitHub token configured in this environment, so I can't access the PR directly via gh. Could you either:

  1. Set GH_TOKEN in the environment so I can fetch the PR, or
  2. Provide the PR diff/description directly so I can review it here?

Alternatively, if you're running this in a repo checkout that already has the changes, point me to the relevant files and I'll review them locally.
`

With-Skill Agent Output (Isolated)

`

Code Review — PR #34024

Note: This PR is already merged (2026-03-06). Review is retrospective.

Independent Assessment

What this changes: Prevents infinite layout cycles on iOS when nested views both handle safe area, and when animations produce sub-pixel safe area inset changes. Two mechanisms:

  1. Parent hierarchy walk (IsParentHandlingSafeArea) — skips safe area application if an ancestor already handles the same edges
  2. Pixel-level comparison (EqualsAtPixelLevel) — rounds safe area values to device pixels before comparing, absorbing animation noise

Additionally fixes an inverted condition (!UpdateContentInsetAdjustmentBehavior() → UpdateContentInsetAdjustmentBehavior()) and corrects _safeAreaInvalidated = true → false after validation completes.

Inferred motivation: Infinite layout loops causing app freezes on iOS 18.6, triggered by TranslateToAsync animations and nested safe area configurations.

Reconciliation with PR Narrative

Author claims: Fixes #32586, #33934, #33595, #34042 — sub-pixel oscillation and parent-child double-application patterns.
Agreement: Code matches the description accurately. The edge-aware parent check is well-designed.

Findings

⚠️ Warning — UIScreen.MainScreen.Scale usage in EqualsAtPixelLevel

UIScreen.MainScreen is deprecated in iOS 16+ (multi-screen scenarios). For a view-level comparison, UIView.ContentScaleFactor or Window.Screen.Scale would be more accurate. In practice, on iPhone/iPad this is unlikely to cause issues since MainScreen matches the device, but it's technically incorrect for external displays.

💡 Suggestion — Cache invalidation timing in IsParentHandlingSafeArea

The _parentHandlesSafeArea cache is cleared on SafeAreaInsetsDidChange, MovedToWindow, and InvalidateSafeArea. However, the FindParent walk checks mv._appliesSafeAreaAdjustments which can change during the same layout pass. If a parent's ValidateSafeArea runs after a child's, the cached value could be stale within a single pass. The comment says "cached per layout cycle" but there's no generation counter ensuring this. In practice, the invalidation events likely cover real scenarios adequately.

💡 Suggestion — RTL refactor simplification

The RTL layout code was simplified nicely (removed duplicate CrossPlatformArrange calls with negative X offsets). The comment explains iOS handles mirroring natively. This is a good cleanup — just noting it removes the explicit re-arrange for RTL which was previously done.

Devil's Advocate

  • The _safeAreaInvalidated = true → false fix is clearly a bug fix (you validate, then mark validated). The old code would re-validate every frame.
  • The inverted condition fix (!Update... → Update...) changes behavior: previously, when the adjustment behavior didn't change, it invalidated constraints. Now it invalidates when it does change. This is logically correct.
  • Could the parent walk cause perf issues in deep hierarchies? Unlikely — it's cached and invalidated only on hierarchy/safe-area changes.
  • The FindParent edge-aware lambda in MauiView iterates edges 0-3 and calls GetSafeAreaRegionForEdge on both parent and child. If many ancestors exist, this could have O(depth × 4) cost per invalidation. Acceptable for typical UI depth.

Verdict: LGTM

Confidence: high
Summary: Well-structured fix addressing real infinite loop bugs with two complementary mechanisms. The edge-aware parent walk prevents double-application while allowing independent edge handling. The pixel-level comparison absorbs animation noise elegantly. The UIScreen.MainScreen deprecation is a minor concern that doesn't affect correctness on current devices. Already reviewed and approved by maintainers.
`
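
The pixel-level comparison described in the review above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the PR's EqualsAtPixelLevel implementation; the function name, the (top, left, bottom, right) tuple layout, and the scale parameter are assumptions made for illustration.

```python
def equals_at_pixel_level(a, b, scale):
    """Return True when two inset tuples match after rounding to device pixels.

    a, b  -- hypothetical (top, left, bottom, right) insets in points
    scale -- device pixels per point (for example 3.0 on a 3x display)
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    if len(a) != len(b):
        return False
    # Convert each inset to whole device pixels before comparing, so that
    # sub-pixel noise from an animation does not register as a change.
    return all(round(x * scale) == round(y * scale) for x, y in zip(a, b))
```

At a scale of 3.0, insets of 47.0 and 47.0001 points both round to 141 pixels and compare equal, while 47.0 and 48.0 points do not.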

With-Skill Agent Output (Plugin)

`
I don't have a GitHub token configured in this environment, so I can't access the GitHub API to fetch the PR diff and files for PR #34024.

To perform this code review, I need either:

  1. A GH_TOKEN or GITHUB_TOKEN environment variable set, or
  2. The repository cloned locally with the PR branch checked out

Could you either:

  • Set a GitHub token (e.g., export GH_TOKEN=<your-token>), or
  • Point me to a local checkout of the PR branch?
    `
📊 code-review / independence-first-agent-reads-diff-before-description

Judge Report: Independence-first - agent reads diff before description

Baseline Judge

Overall Score: 1.3/5
Reasoning: The agent completely failed to perform the task. After discovering GH_TOKEN was not set, it gave up without attempting alternative approaches (e.g., web_fetch to access the PR URL directly at #31202). The final output is just an explanation of why it couldn't do the work, with no code review provided.

  • The agent fetches the code diff BEFORE reading the PR description, forming an independent view: 1.3/5 — The agent failed to fetch the diff at all. It attempted to use gh CLI but didn't have authentication, and did not try alternative approaches like using web_fetch to access the PR page or diff URL directly.
  • The agent's assessment contains its own description of the change, not a paraphrase of the PR body: 2/5 — The agent produced no assessment whatsoever. It gave up after the first failed attempt without trying alternatives.
  • The agent distinguishes between its independent view and the author's claims in a reconciliation step: 1/5 — No review was performed, so there is no reconciliation step.

With-Skill Judge (Isolated)

Overall Score: 4/5
Reasoning: The agent performed well on all three rubric criteria. It followed a correct methodology: diff first, then PR metadata, then reconciliation. The review is well-structured with security observations and practical suggestions. The main weakness is efficiency - it took 31 tool calls and struggled with environment issues (no gh auth, temp file access problems), but ultimately produced a thorough and accurate review that correctly identified the mismatch between PR metadata and actual content.

  • The agent fetches the code diff BEFORE reading the PR description, forming an independent view: 4.3/5 — The agent clearly fetched the diff first (multiple web_fetch calls to the .diff URL starting from the beginning) and only later fetched the PR description via the GitHub API. The agent explicitly stated 'Now let me read the PR description for reconciliation' after already having reviewed all the diff content.
  • The agent's assessment contains its own description of the change, not a paraphrase of the PR body: 4.3/5 — The 'Independent Assessment' section provides a detailed, original description of what the changes do, including specific technical details about the mapper logic change, iOS handler behavior, workflow infrastructure, and XAML tests. It goes well beyond anything in the PR body. However, the PR body from the API was mostly truncated/unavailable so there's less risk of paraphrasing anyway.
  • The agent distinguishes between its independent view and the author's claims in a reconciliation step: 5/5 — The agent has a clear 'Reconciliation with PR Narrative' section that explicitly states 'Author claims:' vs 'Agreement:' and notes the discrepancy that the PR bundles significant unrelated infrastructure changes not mentioned in the description. This is a well-structured reconciliation.

With-Skill Judge (Plugin)

Overall Score: 2.7/5
Reasoning: The agent followed the correct process of fetching the diff before the PR description, formed an independent technical assessment of the changes, and included a reconciliation section. However, the review is somewhat surface-level given the complexity of the PR. The agent struggled significantly with tooling access (token issues, file access issues) which consumed many turns, but ultimately got the diff content via web_fetch. The findings are reasonable but not deeply technical - mostly suggestions rather than catching real issues. The reconciliation step is present but minimal.

  • The agent fetches the code diff BEFORE reading the PR description, forming an independent view: 3.7/5 — The agent clearly fetched the diff first (multiple attempts starting with gh pr diff, then curl to get the .diff file) and analyzed the code changes extensively before fetching the PR description page via web_fetch near the end of the session.
  • The agent's assessment contains its own description of the change, not a paraphrase of the PR body: 3/5 — The agent's description of the Label fix is clearly derived from reading the code diff - it describes the guard condition change from !IsPlainText(label) to label.HasFormattedTextSpans and explains the semantic significance. The PR description page fetched was a release notes summary, not a detailed explanation, so the agent couldn't have paraphrased it. However, the description of some infrastructure changes is somewhat surface-level.
  • The agent distinguishes between its independent view and the author's claims in a reconciliation step: 2.3/5 — There is a 'Reconciliation with PR Narrative' section that attempts to separate the agent's view from the author's claims. However, it's quite brief - just noting 'This appears to be a batch/release merge PR' and 'The code matches the described fixes.' The reconciliation lacks depth since the PR description was mostly release notes rather than a detailed explanation, making meaningful reconciliation difficult.
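
The ordering criterion the judges apply in this scenario can be sketched as a check over an agent's tool-call log. This is a hedged illustration, not the evaluation harness: the log format (a list of command strings) and the function name are hypothetical, and a failed attempt is counted the same as a successful one, which matches how the judges above treated the failed gh pr diff calls.

```python
def reads_diff_first(tool_calls):
    """Return True if the first diff fetch precedes the first description fetch.

    tool_calls -- hypothetical ordered list of command strings issued by the agent
    """
    def first_index(predicate):
        # Index of the first call matching the predicate, or None if absent.
        return next((i for i, call in enumerate(tool_calls) if predicate(call)), None)

    diff_at = first_index(lambda c: "gh pr diff" in c or c.endswith(".diff"))
    desc_at = first_index(lambda c: "gh pr view" in c)
    if diff_at is None:
        # The agent never tried to read the diff at all.
        return False
    return desc_at is None or diff_at < desc_at
```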

Baseline Agent Output

I don't have GitHub authentication configured in this environment (GH_TOKEN is not set), so I can't access the PR details or diff from GitHub.

To review PR #31202 in dotnet/maui, I need GitHub access. Could you either:

  1. Set the GH_TOKEN environment variable with a valid GitHub token, or
  2. Paste the PR diff here so I can review it directly

Would you like to provide one of these?
`

With-Skill Agent Output (Isolated)

`
I now have a complete picture of PR #31202. The PR was force-pushed — its current content is a gh-aw (GitHub Agentic Workflows) infrastructure change (originally from #34548), not the Label HTML fix described in the title. Let me deliver the review.


Code Review — PR #31202

Independent Assessment

What this changes: Adds a gh-aw (GitHub Agentic Workflows) CI workflow that automatically evaluates test quality on PRs. It includes:

  1. A compiled workflow (.lock.yml) triggered on test file changes, manual dispatch, and slash commands
  2. A shared Checkout-GhAwPr.ps1 script for secure PR checkout with trusted infrastructure restoration
  3. Enhancements to Gather-TestContext.ps1 — explicit PR number support, API-based file download, and markdown injection prevention
  4. Comprehensive developer documentation (.instructions.md)

Inferred motivation: Enable automated test quality evaluation in a sandboxed agent environment, with fork-safe handling.

Reconciliation with PR Narrative

Author claims: PR title says "LineHeight and decorations for HTML Label - fix" (issues #22193/#22197)
Agreement/disagreement: Complete mismatch. The title/description describe a Label rendering fix, but the actual merged code is CI/workflow infrastructure. The PR branch was force-pushed with different content before merge.

Findings

⚠️ Warning — PR title/description are stale and misleading

The merged content (gh-aw evaluate-tests workflow) has no relation to the described Label fix. This makes git history harder to navigate. The commit message inside is correct (Add GitHub Actions workflow to run evaluate-pr-tests via Copilot CLI (#34548)), but the PR metadata is wrong.

💡 Suggestion — Path traversal guard edge case in Gather-TestContext.ps1

The path traversal check:
    if (-not $targetPath.StartsWith($repoRootFull + [System.IO.Path]::DirectorySeparatorChar))
This correctly prevents escaping the repo root, but doesn't handle the case where $targetPath equals $repoRootFull exactly (a file named exactly the same as the root directory). This is practically impossible but for completeness, consider $repoRootFull + [IO.Path]::DirectorySeparatorChar OR exact equality.

💡 Suggestion — GitHub Contents API has a 1MB file size limit

The file download loop in Gather-TestContext.ps1 uses gh api repos/.../contents/... which only works for files under 1MB. Large test files would silently fail (caught by try/catch, logged with ⚠️). Consider documenting this limitation or using the blob API for larger files.

✅ Positive — Good security practices

  • Checkout-GhAwPr.ps1 has clear security documentation about the pwn-request threat model
  • Path traversal guard on downloaded files
  • Markdown injection prevention via Escape-ForCodeSpan (neutralizes backticks, pipes, HTML in filenames)
  • Delete-before-restore pattern prevents fork-added skill files from persisting
  • All actions are SHA-pinned

Devil's Advocate

  • Could Escape-ForCodeSpan be called before it's defined? No — PowerShell resolves function names at call time, and the function is defined at script scope before the calling functions are invoked.
  • Is the restore non-fatal behavior OK? Yes — the warning message is appropriate for workflow_dispatch where files might not exist on the base branch.
  • Could the exit 1 gate cause confusion? The instructions doc acknowledges this produces a red ❌ and explains it's intentional. Acceptable.

Verdict: NEEDS_DISCUSSION

Confidence: medium
Summary: The code itself is well-written with strong security practices. However, the PR metadata (title, description, linked issues) is completely disconnected from the actual merged content. Since this PR is already merged, this is informational. The implementation is sound — the main concern is the confusing git history created by the stale PR metadata.
`
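
The path traversal guard discussed in the review above follows a common pattern: resolve the target path, then require it to start with the repository root plus a directory separator. The sketch below shows that pattern in Python rather than PowerShell and is not the code from Gather-TestContext.ps1; the function name and parameters are hypothetical.

```python
import os


def is_inside_root(root, relative_path):
    """Return True only when relative_path resolves to a location under root."""
    root_full = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root_full, relative_path))
    # Appending the separator rejects sibling directories that merely share
    # a prefix with the root, such as "repo-evil" next to "repo".
    return target.startswith(root_full + os.sep)
```

As in the reviewed check, a target that resolves to the root itself is rejected, because the root path does not start with "root plus separator".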

With-Skill Agent Output (Plugin)

`
I now have enough information to deliver the review. This is a large batch/merge PR aggregating many changes. Let me focus on the meaningful code changes.

Code Review — PR #31202

Independent Assessment

What this changes: This is a batch merge PR containing multiple distinct changes:

  1. Label fix (Issue #22197): Changes Label.Mapper.cs and Label.iOS.cs to fix LineHeight, TextDecorations, and CharacterSpacing not working on HTML-type labels
  2. gh-aw infrastructure: Adds a new copilot-evaluate-tests agentic workflow, shared Checkout-GhAwPr.ps1 script, and instructions
  3. Gather-TestContext.ps1 hardening: Adds PrNumber parameter, markdown injection prevention (Escape-ForCodeSpan), path traversal guards, and GitHub API file download
  4. Merge flow config: Adds merge-net11-to-release.yml workflow
  5. XAML test (Maui34713): Tests source-gen behavior for bindings with converters from Application.Resources

Inferred motivation: The Label fix addresses a regression where HTML labels couldn't use LineHeight/TextDecorations/CharacterSpacing because IsPlainText() returned false for both HTML and FormattedText, but the guard should only skip when there are actual formatted spans.

Findings

💡 Suggestion — Label.iOS.cs: Consider Android/Windows parity

The iOS fix adds LineHeight, TextDecorations, and CharacterSpacing to MapFormatting, but only for iOS/MacCatalyst (Label.iOS.cs). The mapper change in Label.Mapper.cs (switching from IsPlainText to HasFormattedTextSpans) applies cross-platform. The test snapshots include all platforms, suggesting the mapper change alone fixes Android/Windows. Worth confirming the iOS-specific addition doesn't over-apply these properties on iOS compared to other platforms.

💡 Suggestion — Checkout-GhAwPr.ps1: GITHUB_ENV not validated

The script writes to $env:GITHUB_ENV without checking it exists (line ~52). Outside GitHub Actions this would fail silently or error. Minor since it's only ever called from CI.

💡 Suggestion — Gather-TestContext.ps1: No file count limit on API downloads

The download loop (section 2b) downloads all changed files from the PR head without a cap. A PR with hundreds of changed files could trigger GitHub API rate limits. Consider adding a limit (e.g., max 50 files) or only downloading test-related files.

⚠️ Warning — Gather-TestContext.ps1: Base64 content endpoint limited to 1MB files

The script uses gh api repos/.../contents/{path}?ref=... which returns base64-encoded content. GitHub's Contents API has a 1MB file size limit. Large files will silently fail. The catch block handles this gracefully, but it's worth noting.

Reconciliation with PR Narrative

Author claims: This appears to be a merge/integration PR bundling multiple fixes for a release branch.

Agreement: The code changes match the described fixes. The Label fix correctly addresses Issue #22197.

Devil's Advocate

  • The HasFormattedTextSpans vs IsPlainText semantic change is the key insight. Previously, HTML text was treated like formatted spans (skipping line height etc.). Now only actual FormattedText.Spans triggers the skip. This is correct — HTML rendering should still respect these properties.
  • The Escape-ForCodeSpan approach (replacing backticks with U+2019) is pragmatic for markdown injection defense, though it creates a visual inconsistency for users who might see curly quotes in filenames.

Verdict: LGTM

Confidence: medium
Summary: The functional Label fix is sound and well-tested with UI snapshots across all platforms. The infrastructure changes (gh-aw workflows, markdown injection hardening) follow established patterns and include appropriate security guards. No blocking issues found — only minor suggestions around edge cases.
`
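
The backtick substitution mentioned in the Devil's Advocate section above can be sketched as follows. This is a minimal Python illustration of the idea only (replace each backtick with U+2019 so a filename cannot close a Markdown code span early); it is not the Escape-ForCodeSpan function from the PR, and it deliberately omits the pipe and HTML handling that the other review mentions, since the details of that handling are not given here. Both function names are hypothetical.

```python
def escape_for_code_span(text):
    """Replace backticks with U+2019 so the text cannot terminate a code span."""
    return text.replace("`", "\u2019")


def as_code_span(text):
    """Wrap already-untrusted text in a Markdown code span after escaping it."""
    return "`" + escape_for_code_span(text) + "`"
```

The visual side effect noted in the review is visible here: a filename containing a backtick is displayed with a curly quote instead.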

📊 code-review / negative-trigger-describe-changes-query

Judge Report: Negative trigger - describe changes query

Baseline Judge

Overall Score: 5/5
Reasoning: The agent efficiently retrieved the PR information (recovering from a gh CLI auth failure by falling back to web_fetch), and produced a concise, accurate, well-formatted summary that directly answers the user's question about the approach. No unnecessary steps, no errors in the final output.

  • The agent provides a descriptive summary without triggering the full review workflow: 5/5 — The agent provided a clear, concise summary of the PR's purpose and approach without invoking a code-review agent or any formal review workflow. It fetched the PR info via web_fetch and summarized it directly.
  • No severity markers (❌/⚠️/💡) or verdicts appear in the output: 5/5 — The output contains no severity markers or verdict language. It's a clean descriptive summary.

With-Skill Judge (Isolated)

Overall Score: 5/5
Reasoning: The agent efficiently fetched the PR information (recovering from the gh CLI auth issue by using web_fetch), and produced a clear, well-structured summary that directly answers what the user asked — understanding the approach. The output is concise, accurate, and well-formatted.

  • The agent provides a descriptive summary without triggering the full review workflow: 5/5 — The agent provided a clear, concise summary of the PR's purpose and approach without launching a full code review workflow. It fetched the PR page directly and summarized it efficiently.
  • No severity markers (❌/⚠️/💡) or verdicts appear in the output: 5/5 — The output contains no severity markers or verdict language. It's a clean descriptive summary.

With-Skill Judge (Plugin)

Overall Score: 5/5
Reasoning: The agent took a minimal, efficient path (one failed gh CLI attempt, then a successful web_fetch), produced a clear and accurate summary of the PR's approach, and delivered exactly what the user asked for without extraneous review artifacts.

  • The agent provides a descriptive summary without triggering the full review workflow: 5/5 — The agent fetched the PR page directly and provided a clear, concise summary of the approach without invoking any code-review agent or doing a full diff analysis.
  • No severity markers (❌/⚠️/💡) or verdicts appear in the output: 5/5 — The output contains no severity markers or verdict language whatsoever.

Baseline Agent Output

Here's a summary of PR #34723 in dotnet/maui:

Goal: Add public APIs to force reapplication of in-place-mutated styles and visual states — primarily needed for Hot Reload scenarios.

New APIs added:

| API | Purpose |
| --- | --- |
| StyleableElement.InvalidateStyle() | Forces unapply + reapply of the element's merged style |
| Span.InvalidateStyle() | Same, but Span owns its own MergedStyle |
| ImageSource.InvalidateStyle() | Same for ImageSource |
| VisualStateManager.InvalidateVisualStates(VisualElement) | Forces unapply + reapply of current visual state setters across all groups |

Approach: When Hot Reload mutates a style or visual state in place (without replacing the object reference), the framework doesn't know anything changed. These APIs let the Hot Reload infrastructure explicitly tell the framework "re-evaluate styles/visual states now" by triggering the unapply→reapply cycle on demand.

The APIs are marked with [EditorBrowsable(Never)] so they don't pollute IntelliSense for app developers — they're infrastructure-only.

Fixes: Issues #34721 and #34722.
`

With-Skill Agent Output (Isolated)

`
Here's a summary of PR #34723:

Goal: Add public APIs to force reapplication of styles and visual states that have been mutated in-place — primarily needed for Hot Reload scenarios.

Approach: Introduces four new Invalidate* methods:

| API | Purpose |
| --- | --- |
| StyleableElement.InvalidateStyle() | Unapplies then reapplies the merged style |
| Span.InvalidateStyle() | Same, but Span owns its own MergedStyle |
| ImageSource.InvalidateStyle() | Same for ImageSource |
| VisualStateManager.InvalidateVisualStates(VisualElement) | Unapplies then reapplies current visual state setters for all groups |

Key design decisions:

With-Skill Agent Output (Plugin)

`
Here's a summary of PR #34723:

Goal: Add public APIs to force reapplication of styles and visual states that were mutated in-place — primarily to support Hot Reload.

Approach:

  1. StyleableElement.InvalidateStyle() — unapplies and reapplies the element's merged style, so if a style's setters were changed in memory (e.g., during Hot Reload), the element picks up the new values.
  2. Span.InvalidateStyle() / ImageSource.InvalidateStyle() — same concept but for types that own their own MergedStyle independently of StyleableElement.
  3. VisualStateManager.InvalidateVisualStates(VisualElement) — unapplies and reapplies the current visual state setters across all groups on the element.

These are marked [EditorBrowsable(Never)] so they don't clutter IntelliSense for app devs — they're infrastructure APIs meant to be called by the Hot Reload engine when it detects in-place style mutations.
`

📊 code-review / negative-trigger-informational-query-about-a-pr

Judge Report: Negative trigger - informational query about a PR

Baseline Judge

Overall Score: 4.7/5
Reasoning: The agent efficiently summarized the PR after some initial trouble with gh CLI authentication, falling back to web_fetch. The summary is accurate and concise, covering the problem, fix, and authorship. Minor inefficiency with failed gh commands, but recovered well.

  • The agent provides a plain summary without launching a structured multi-step review workflow: 5/5 — The agent provided a concise, clear summary of the PR changes without any unnecessary structured workflow. It fetched the PR info and summarized it directly.
  • The agent does NOT walk through a multi-step review workflow: 5/5 — The agent simply fetched the PR page and summarized it. No multi-step review workflow was initiated - just straightforward information retrieval and summarization.

With-Skill Judge (Isolated)

Overall Score: 4.7/5
Reasoning: The agent efficiently retrieved PR information (recovering from token issues by falling back to web_fetch), then provided a clear, accurate, and concise summary. No unnecessary workflows were launched, and the output directly answers the user's question.

  • The agent provides a plain summary without launching a structured multi-step review workflow: 5/5 — The agent provided a concise, clear summary of the PR covering the problem, the fix, and relevant context. It did not use any structured multi-step review workflow.
  • The agent does NOT walk through a multi-step review workflow: 5/5 — The agent simply fetched the PR page and summarized it directly. No multi-step review workflow was invoked or followed.

With-Skill Judge (Plugin)

Overall Score: 4.3/5
Reasoning: The agent efficiently summarized the PR with accurate information about what it changes, why, and its status. It recovered well from the gh CLI auth issue by falling back to web_fetch. The summary is clear and informative. Minor deduction because the 'Status: Closed (merged via a follow-up commit...)' detail about AddFactory() may be slightly speculative based on what was visible in the truncated web fetch, but overall the answer is good.

  • The agent provides a plain summary without launching a structured multi-step review workflow: 5/5 — The agent provided a concise, clear summary of the PR changes without any structured multi-step review workflow. It simply fetched the PR information and summarized it directly.
  • The agent does NOT walk through a multi-step review workflow: 5/5 — The agent did not use any multi-step review workflow. It fetched the PR page via web_fetch after gh CLI failed, then directly produced a summary.

Baseline Agent Output

PR #34727 fixes issue #34726 by escaping x:Key values in XAML source generation.

Problem: When an x:Key contained double quotes, backslashes, or control characters, the XAML source generator (SetPropertyHelpers.AddToResourceDictionary) would produce syntactically invalid C# code.

Fix: Uses CSharpExpressionHelpers.EscapeForString() to properly escape key values before interpolating them into generated C# strings.

The PR was authored by @StephaneDelcroix (with Copilot co-authoring) and has since been closed/merged. A follow-up commit also addressed a missed escape path in AddFactory("...").
`

With-Skill Agent Output (Isolated)

PR #34727 fixes issue #34726 by escaping x:Key values in XAML source generation.

Problem: When an x:Key contained special characters (double quotes, backslashes, or control characters), the generated C# code would be syntactically invalid.

Fix: Uses CSharpExpressionHelpers.EscapeForString() to properly escape x:Key values before interpolating them into generated C# in SetPropertyHelpers.AddToResourceDictionary.

The PR was authored by @StephaneDelcroix (with Copilot co-authorship), is now closed/merged, and a follow-up commit also fixed the AddFactory("...") path that was missed initially.
`

With-Skill Agent Output (Plugin)

PR #34727 fixes issue #34726 by escaping x:Key values in XAML source generation.

What it does: Uses CSharpExpressionHelpers.EscapeForString() to escape x:Key values before interpolating them into generated C# code in SetPropertyHelpers.AddToResourceDictionary.

Why: Previously, resource dictionary keys containing double quotes, backslashes, or control characters would produce syntactically invalid generated C#.

Status: Closed (merged via a follow-up commit by @PureWeen that also fixed an additional missed escape path in AddFactory()).

Author: @StephaneDelcroix + Copilot
`
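
The failure mode summarized above (a key containing double quotes, backslashes, or control characters producing invalid generated C#) can be illustrated with a small escaping routine. This is a hedged Python sketch of the general technique, not the behavior of CSharpExpressionHelpers.EscapeForString(); the function name and the exact set of escapes chosen are assumptions.

```python
def escape_for_csharp_string(value):
    """Escape text so it can sit inside a regular C# "..." string literal."""
    named = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for ch in value:
        if ch in named:
            out.append(named[ch])
        elif ord(ch) < 0x20:
            # Remaining control characters become \uXXXX escapes.
            out.append("\\u{:04x}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)
```

Without such a step, interpolating a key like my"key into a generated literal would end the string early and leave the trailing text as a syntax error.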

📊 code-review / verdict-consistency-errors-must-map-to-needs-changes

Judge Report: Verdict consistency - errors must map to NEEDS_CHANGES

Baseline Judge

Overall Score: 4/5
Reasoning: The agent successfully confirmed the ConnectHandler/DisconnectHandler asymmetry despite not having direct GitHub API access. It worked around authentication issues by using web_fetch, pieced together the diff from paginated API responses, and produced a clear, well-structured analysis with a comparison table and concrete fix recommendation. The only minor shortcoming is the lack of exact line numbers.

  • If the agent finds or confirms a ❌ Error-level issue, the verdict is NEEDS_CHANGES — not LGTM: 4/5 — The agent clearly identifies an error-level issue (the asymmetry between ConnectHandler/DisconnectHandler) and presents it as a bug/code smell that needs fixing. While it doesn't use the explicit term 'NEEDS_CHANGES', the tone and content clearly indicate this PR has a problem that should be addressed before merging.
  • The agent applies handler lifecycle rules from the expert reviewer dimensions (ConnectHandler/DisconnectHandler symmetry): 5/5 — The agent directly and clearly identifies the ConnectHandler/DisconnectHandler asymmetry. It shows that SetupViewWithLocalListener is paired with UnregisterView instead of RemoveViewWithLocalListener, and provides a comparison table showing how other files in the same PR correctly pair these methods. The suggested fix is also appropriate.
  • The agent cites specific file and line references for the concern: 3.7/5 — The agent identifies FlyoutViewHandler.Android.cs specifically, shows the exact code in ConnectHandler and DisconnectHandler, and provides a table comparing multiple files. It doesn't cite exact line numbers, but the file references and code snippets are specific enough to locate the issue immediately.

With-Skill Judge (Isolated)

Overall Score: 4.7/5
Reasoning: The agent successfully identified the core ConnectHandler/DisconnectHandler asymmetry the user hinted at, provided a well-structured review with clear error/warning/suggestion categorization, and arrived at the correct NEEDS_CHANGES verdict. The Devil's Advocate section strengthens the argument. The agent had to work around authentication issues to access the PR data, which it handled resourcefully using the unauthenticated GitHub API. The review is thorough, technically sound, and actionable. Minor deductions for: no exact line numbers, and some inefficiency in the early tool calls (multiple failed attempts to read temp files). Overall a strong review.

  • If the agent finds or confirms a ❌ Error-level issue, the verdict is NEEDS_CHANGES — not LGTM: 5/5 — The agent clearly identified the ConnectHandler/DisconnectHandler asymmetry as a ❌ Error-level issue and rendered a verdict of NEEDS_CHANGES with high confidence.
  • The agent applies handler lifecycle rules from the expert reviewer dimensions (ConnectHandler/DisconnectHandler symmetry): 5/5 — The agent correctly identified that SetupViewWithLocalListener is paired with UnregisterView instead of RemoveViewWithLocalListener in FlyoutViewHandler.Android.cs, explaining exactly what each method does and why UnregisterView is insufficient (only removes from static list, doesn't clear ViewCompat listeners). It also correctly contrasted with the proper pattern in ShellContentFragment and ShellSectionRenderer.
  • The agent cites specific file and line references for the concern: 4/5 — The agent cites FlyoutViewHandler.Android.cs specifically, names the exact methods (SetupViewWithLocalListener vs UnregisterView vs RemoveViewWithLocalListener), and contrasts with correct usage in ShellContentFragment and ShellSectionRenderer. It doesn't cite exact line numbers, but the method names and file references are specific enough to be actionable.

With-Skill Judge (Plugin)

Overall Score: 4.7/5
Reasoning: The agent delivered a high-quality code review that directly confirmed the user's suspicion about ConnectHandler/DisconnectHandler asymmetry. It identified the root cause precisely (UnregisterView vs RemoveViewWithLocalListener), explained the consequence (leaked ViewCompat listener reference), and provided additional findings (unbounded static list, discarded return value, redundant null check). The review was well-structured with clear severity levels and a devil's advocate section. The agent worked efficiently despite lacking direct GitHub API access, using curl to fetch PR data.

  • If the agent finds or confirms a ❌ Error-level issue, the verdict is NEEDS_CHANGES — not LGTM: 5/5 — The agent clearly identified an Error-level issue (the ConnectHandler/DisconnectHandler asymmetry in FlyoutViewHandler.Android.cs) and correctly gave a verdict of NEEDS_CHANGES with high confidence.
  • The agent applies handler lifecycle rules from the expert reviewer dimensions (ConnectHandler/DisconnectHandler symmetry): 5/5 — The agent thoroughly analyzed the ConnectHandler and DisconnectHandler paths, identified that SetupViewWithLocalListener attaches a ViewCompat listener in Connect but UnregisterView in Disconnect doesn't clean it up (should use RemoveViewWithLocalListener instead). It also identified the _navigationRoot nulling issue affecting handler reconnection, which is a sophisticated understanding of MAUI's handler lifecycle.
  • The agent cites specific file and line references for the concern: 4/5 — The agent cites specific files (FlyoutViewHandler.Android.cs, ShellSectionRenderer.Destroy()) and shows the exact code snippets with method calls (RegisterParentForChildViews, SetupViewWithLocalListener, UnregisterView). It also explains the specific methods involved. However, it doesn't cite exact line numbers, which would have been ideal.

Baseline Agent Output

Based on my review of the PR diff, I can confirm there is a **ConnectHandler/DisconnectHandler asymmetry** in `FlyoutViewHandler.Android.cs`.

The Issue

In ConnectHandler, the _navigationRoot CoordinatorLayout is set up with:

    MauiWindowInsetListener.SetupViewWithLocalListener(cl);

This method does three things:

  1. ViewCompat.SetOnApplyWindowInsetsListener(view, listener)
  2. ViewCompat.SetWindowInsetsAnimationCallback(view, listener)
  3. listener.RegisterView(view)

But in DisconnectHandler, it's cleaned up with only:

    MauiWindowInsetListener.UnregisterView(cl);

UnregisterView only removes the view from the static _registeredViews list — it does not clear the ViewCompat listeners or reset tracked safe areas.

Correct Pattern

Every other handler in this PR correctly uses RemoveViewWithLocalListener to undo SetupViewWithLocalListener:

| File | Connect | Disconnect |
|---|---|---|
| ShellContentFragment | SetupViewWithLocalListener(_root) | RemoveViewWithLocalListener(_root) |
| ShellFlyoutTemplatedContentRenderer | SetupViewWithLocalListener(coordinator, …) | RemoveViewWithLocalListener(coordinator) |
| ShellSectionRenderer | SetupViewWithLocalListener(root) | RemoveViewWithLocalListener(_rootView) |
| NavigationRootManager | SetupViewWithLocalListener(navigationLayout) | RemoveViewWithLocalListener(…) |
| FlyoutViewHandler | SetupViewWithLocalListener(cl) | UnregisterView(cl) |

Fix

The DisconnectHandler in FlyoutViewHandler.Android.cs should use:

    MauiWindowInsetListener.RemoveViewWithLocalListener(cl);

instead of UnregisterView(cl), to also clear the ViewCompat listeners and reset tracked safe areas — matching what all the other handlers do.

This leak could cause stale listener references to remain on the CoordinatorLayout after the handler disconnects, potentially processing insets for a view tree that's no longer managed.
`
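The asymmetry described in the baseline output can be reproduced in a small self-contained model. This is only a sketch: `FakeView` and `FakeInsetRegistry` are hypothetical stand-ins, not the MAUI or AndroidX types, and the three method bodies follow the behavior the review attributes to `SetupViewWithLocalListener`, `UnregisterView`, and `RemoveViewWithLocalListener` rather than the actual source.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an Android view. The two fields model the
// listener slots that the review says ViewCompat populates.
sealed class FakeView
{
    public object? InsetsListener;
    public object? AnimationCallback;
}

// Hypothetical model of the registry behavior described in the review.
static class FakeInsetRegistry
{
    static readonly List<FakeView> Registered = new();

    // Attaches both listeners and adds the view to the registry.
    public static void SetupViewWithLocalListener(FakeView view)
    {
        var listener = new object();
        view.InsetsListener = listener;
        view.AnimationCallback = listener;
        Registered.Add(view);
    }

    // Registry-only removal: the listeners stay attached to the view.
    public static void UnregisterView(FakeView view) => Registered.Remove(view);

    // Full inverse of SetupViewWithLocalListener.
    public static void RemoveViewWithLocalListener(FakeView view)
    {
        view.InsetsListener = null;
        view.AnimationCallback = null;
        Registered.Remove(view);
    }

    public static bool IsRegistered(FakeView view) => Registered.Contains(view);
}

static class Demo
{
    static void Main()
    {
        var leaked = new FakeView();
        FakeInsetRegistry.SetupViewWithLocalListener(leaked);
        FakeInsetRegistry.UnregisterView(leaked);
        Console.WriteLine(leaked.InsetsListener is null); // False: listener still attached

        var clean = new FakeView();
        FakeInsetRegistry.SetupViewWithLocalListener(clean);
        FakeInsetRegistry.RemoveViewWithLocalListener(clean);
        Console.WriteLine(clean.InsetsListener is null);  // True: fully torn down
    }
}
```

In this model both teardown paths leave the registry empty, so a check that only inspects the registry would not distinguish them; the difference shows up only on the view itself.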

With-Skill Agent Output (Isolated)

`
I now have enough information to deliver the code review. Let me analyze the key asymmetry you identified.


Code Review — PR #32278

Independent Assessment

What this changes: Replaces the global per-activity GlobalWindowInsetListener with a per-view MauiWindowInsetListener backed by a static List<ViewEntry> registry. Child views walk up the hierarchy to find a registered parent's listener via FindListenerForView().

Inferred motivation: The global listener had coordination problems in complex navigation scenarios (multiple CoordinatorLayouts, Shell fragments). Per-view listeners provide better isolation.

Reconciliation with PR Narrative

PR claims improvements over #31898 including cleaner APIs. The code generally matches those claims.

Findings

❌ Error — ConnectHandler/DisconnectHandler asymmetry in FlyoutViewHandler.Android.cs

In ConnectHandler:

    MauiWindowInsetListener.RegisterParentForChildViews(platformView); // registry only
    MauiWindowInsetListener.SetupViewWithLocalListener(cl); // registry + ViewCompat listener

In DisconnectHandler:

    MauiWindowInsetListener.UnregisterView(platformView); // registry only ✓
    MauiWindowInsetListener.UnregisterView(cl); // registry only ✗ LEAK
    _navigationRoot = null;

SetupViewWithLocalListener attaches both ViewCompat.SetOnApplyWindowInsetsListener and ViewCompat.SetWindowInsetsAnimationCallback on the CoordinatorLayout. But UnregisterView only removes from the static list — it does not clear the ViewCompat listeners or reset tracked views. The correct disconnect for cl should be RemoveViewWithLocalListener(cl) which properly nulls out the ViewCompat listener and calls ResetAppliedSafeAreas.

This leaves dangling listener references on the _navigationRoot CoordinatorLayout after disconnect, which can cause insets to be applied to views that are no longer logically active.

⚠️ Warning — Redundant null check in ShellSectionRenderer.Destroy()

    if (_rootView != null) // outer check
    {
        if (_rootView is not null) // redundant inner check
        {
            MauiWindowInsetListener.RemoveViewWithLocalListener(_rootView);
        }

The inner is not null check is redundant since it's already inside a != null guard.

⚠️ Warning — Static _registeredViews list never shrinks proactively

The List<ViewEntry> uses WeakReference<object> and only prunes dead entries opportunistically during RegisterView and FindListenerForView. Long-lived activities with many view transitions could accumulate stale entries. Not a correctness bug, but a slow memory/performance leak in scenarios with heavy navigation.

💡 Suggestion — MauiPageControl.cs breaks #pragma warning restore

The null-conditional refactoring removed the if (shape.Paint != null) but left the #pragma warning restore CA1416 orphaned without its matching #pragma warning disable:

    shape.Paint?.Color = color;
    #pragma warning restore CA1416 // ← no matching disable above

Devil's Advocate

  • The FlyoutViewHandler asymmetry is clear-cut — SetupViewWithLocalListener has a defined inverse (RemoveViewWithLocalListener) that isn't being used.
  • Could UnregisterView be "good enough" because the view is about to be GC'd? No — the ViewCompat listener still holds a strong reference via the Android framework, and tracked views in the listener's _trackedViews HashSet may retain views.
  • CI passing doesn't disprove the leak — it would only manifest as incorrect insets after a FlyoutView handler reconnection cycle, which specific tests may not exercise.

Verdict: NEEDS_CHANGES

Confidence: high
Summary: The FlyoutViewHandler.DisconnectHandler uses UnregisterView where it should use RemoveViewWithLocalListener for the CoordinatorLayout, creating an asymmetry that leaves dangling ViewCompat listeners. This is the exact type of handler lifecycle bug that causes subtle inset issues after navigation transitions.
`
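The registry warning in the review above (weak references that are pruned only opportunistically) can be illustrated with a minimal sketch. `WeakRegistry` and its members are hypothetical names chosen for illustration; the real `_registeredViews` list and `ViewEntry` type are not reproduced here, and pruning on registration is an assumption taken from the review's description.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical model of a static weak-reference registry.
static class WeakRegistry
{
    static readonly List<WeakReference<object>> Entries = new();

    public static int Count => Entries.Count;

    // Prunes dead entries as a side effect of registration, then adds the view.
    public static void Register(object view)
    {
        Prune();
        Entries.Add(new WeakReference<object>(view));
    }

    // Removes the entry for a live view; returns true if one was found.
    public static bool Unregister(object view)
    {
        int removed = Entries.RemoveAll(
            e => e.TryGetTarget(out var target) && ReferenceEquals(target, view));
        return removed > 0;
    }

    // Drops entries whose targets have been collected; returns how many were dropped.
    public static int Prune() => Entries.RemoveAll(e => !e.TryGetTarget(out _));
}

static class Demo
{
    static void Main()
    {
        var a = new object();
        var b = new object();
        WeakRegistry.Register(a);
        WeakRegistry.Register(b);

        Console.WriteLine(WeakRegistry.Unregister(a)); // True
        Console.WriteLine(WeakRegistry.Count);         // 1

        GC.KeepAlive(b); // keep b reachable for the duration of the demo
    }
}
```

Because dead entries are dropped only when `Register` or an explicit `Prune` runs, a burst of registrations followed by a quiet period would leave stale entries in the list, which is the growth pattern the warning describes. Whether that matters in practice depends on how often the real lookup paths run.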

With-Skill Agent Output (Plugin)

`
Now I have sufficient context. Let me form my assessment focusing on the ConnectHandler/DisconnectHandler asymmetry the user flagged.

Code Review — PR #32278

Independent Assessment

What this changes: Replaces a global (per-activity) GlobalWindowInsetListener with a per-view MauiWindowInsetListener using a static registry (_registeredViews). Each handler/fragment that creates a CoordinatorLayout now calls SetupViewWithLocalListener on connect/create and RemoveViewWithLocalListener or UnregisterView on disconnect/destroy.

Inferred motivation: The global listener caused incorrect inset application in complex navigation scenarios (e.g., nested FlyoutView, Shell tabs) where multiple coordinator layouts exist.

Findings

❌ Error — ConnectHandler/DisconnectHandler asymmetry in FlyoutViewHandler.Android.cs

In ConnectHandler:

    MauiWindowInsetListener.RegisterParentForChildViews(platformView); // registers DrawerLayout
    if (_navigationRoot is CoordinatorLayout cl)
        MauiWindowInsetListener.SetupViewWithLocalListener(cl); // registers + attaches listener

In DisconnectHandler:

    MauiWindowInsetListener.UnregisterView(platformView); // ← static, returns listener but doesn't reset
    if (_navigationRoot is CoordinatorLayout cl)
    {
        MauiWindowInsetListener.UnregisterView(cl); // ← same: no ViewCompat cleanup
        _navigationRoot = null;
    }

The asymmetry: SetupViewWithLocalListener calls ViewCompat.SetOnApplyWindowInsetsListener(view, listener) + ViewCompat.SetWindowInsetsAnimationCallback(view, listener) + RegisterView. But UnregisterView only removes from the static list — it does not call ViewCompat.SetOnApplyWindowInsetsListener(view, null) or ViewCompat.SetWindowInsetsAnimationCallback(view, null).

The proper teardown counterpart would be RemoveViewWithLocalListener (which does both), but here only UnregisterView is called. For _navigationRoot this means the CoordinatorLayout retains a strong reference to the MauiWindowInsetListener instance via the AndroidX compat layer even after disconnect, preventing GC of the listener and its _trackedViews HashSet.

For the platformView (DrawerLayout), RegisterParentForChildViews only adds to the static registry without attaching listeners, so UnregisterView is technically correct there — but only because no listener was attached to begin with.

⚠️ Warning — Static _registeredViews list never bounded

_registeredViews is a static readonly List<ViewEntry> using WeakReference<object>. Dead references are only cleaned opportunistically during RegisterView and FindListenerForView traversals. If many views register but few lookups happen (e.g., in a flyout-heavy app that creates/destroys pages rapidly), the list grows unbounded with dead entries until the next traversal.

⚠️ Warning — UnregisterView returns the listener but callers discard it

FlyoutViewHandler.DisconnectHandler calls MauiWindowInsetListener.UnregisterView(platformView) but ignores the returned MauiWindowInsetListener?. This means ResetAppliedSafeAreas is never called on the unregistered view, potentially leaving stale padding on views that may be recycled or reattached.

💡 Suggestion — Redundant null check in ShellSectionRenderer.Destroy()

    if (_rootView != null)
    {
        if (_rootView is not null) // ← always true here
        {
            MauiWindowInsetListener.RemoveViewWithLocalListener(_rootView);
        }

The inner is not null check is redundant since it's inside a != null block.

Devil's Advocate

  • The UnregisterView vs RemoveViewWithLocalListener asymmetry is real — RemoveViewWithLocalListener exists precisely for this case and is used correctly in NavigationRootManager.Disconnect() and ShellContentFragment.Destroy(). The FlyoutViewHandler appears to be a genuine oversight.
  • Could the DrawerLayout platform view outlive the handler? If so, the stale listener reference is a real leak. If the view is always destroyed with the handler, it's less critical — but still violates the principle of symmetric resource management.

Verdict: NEEDS_CHANGES

Confidence: high
Summary: The FlyoutViewHandler.DisconnectHandler uses UnregisterView (registry-only removal) where it should use RemoveViewWithLocalListener for _navigationRoot, creating an asymmetry with ConnectHandler that leaks the ViewCompat listener reference. This is the exact issue the user flagged. The PR is already merged, so this would need a follow-up fix.
`

🔍 Full results and investigation steps

Copilot AI added 2 commits May 31, 2026 14:48
Use the unified AI Review Summary layout with segmented status chips, collapsed review sessions, and a Future Action section for alternative fix guidance. Avoid posting a separate try-fix review so the AI summary is the single source of truth.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep AI Summary as the single current MauiBot review artifact, ensure non-PR try-fix winners request changes through that review, and prevent stale try-fix cleanup from hiding the merged summary.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo kubaflo added the area-ai-agents Copilot CLI agents, agent skills, AI-assisted development label May 31, 2026
Use the built Windows device-test app directly for gate validation so the runner avoids testhost dependency crashes and matches the canonical windows.cake path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo kubaflo changed the title Enhance MauiBot review posting Enhance MauiBot review posting and rerun gating May 31, 2026
Copilot AI added 3 commits May 31, 2026 21:17
Teach /review rerun to run a deterministic activity check for new comments or commits and apply s/agent-ready-for-rerun when another AI review is justified. Also add rerun guidance to generated AI Summary comments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Treat label application as successful when the ready-for-rerun label is present after the GitHub API call, avoiding false failures from brittle gh exit-code handling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Post label additions with the GitHub Issues API JSON payload shape so /review rerun can reliably apply s/agent-ready-for-rerun from GitHub Actions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI added 2 commits May 31, 2026 21:36
Grant pull-requests: write to the /review rerun labeling job so it can apply s/agent-ready-for-rerun to pull requests after deterministic eligibility passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Stop preserving PR finalization sections in MauiBot AI Summary updates and update docs so pr-finalize is no longer described as part of the automated Review-PR process.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI added 2 commits June 1, 2026 13:56
Generate a deterministic rerun context artifact listing new comments and commits since the latest AI Summary or previous /review rerun checkpoint, and instruct pre-flight to read it before reviewing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep the /review rerun instruction in the AI Summary text, but remove the command implementation, rerun context generation, ready-for-rerun label changes, and related tests from this PR so they can live in the dedicated scanner PR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo kubaflo changed the title Enhance MauiBot review posting and rerun gating Enhance MauiBot AI summary review output Jun 1, 2026
Copilot AI added 3 commits June 1, 2026 15:56
Ensure setup stages all squash-merge output before committing and later phases reattach to the review branch before running. This prevents verify-tests-fail from seeing PR fix files as uncommitted during Gate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Restore the committed review branch before each gate verification attempt so a crashed without-fix attempt cannot leave fix files reverted and make the next retry fail with uncommitted changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI added 2 commits June 5, 2026 13:36
Capture Copilot CLI usage metadata during review runs, aggregate it into all-stage summaries, and publish a dedicated CopilotTokenUsage artifact from ci-copilot.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Enable the Copilot OTel file exporter per review step and merge exported token metrics into the existing CopilotTokenUsage artifact summaries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Labels

area-ai-agents Copilot CLI agents, agent skills, AI-assisted development area-infrastructure CI, Maestro / Coherency, upstream dependencies/versions

3 participants