Improve PR Agent Gate verification to prevent result fabrication#33806
Merged
- Require Gate verification to run via task agent (prevents command substitution)
- Reference verify-tests-fail-without-fix skill by name instead of inline commands
- Add 'Common Gate Mistakes' section with explicit anti-patterns
- Fix ai-summary-comment regex to handle `<details>` tags with attributes

These changes prevent fabricating dual-direction test results from single runs.
kubaflo
previously approved these changes
Jan 30, 2026
Contributor
Pull request overview
This PR updates the PR agent’s Gate verification guidance to reduce the chance of reporting unverifiable (or fabricated) test results, and improves the AI summary comment parsing to support <details> tags with attributes.
Changes:
- Updated the PR agent Gate workflow documentation to require a more constrained verification path and added a “Common Gate Mistakes” section.
- Adjusted the `ai-summary-comment` extraction regex to match `<details>` tags that include attributes (e.g., `<details open>`).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `.github/agents/pr.md` | Updates Gate verification instructions and adds documentation about common anti-patterns/mistakes. |
| `.github/skills/ai-summary-comment/scripts/post-ai-summary-comment.ps1` | Updates `<details>` parsing regex to support `<details>` tags with optional attributes. |
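To illustrate the regex change summarized above, here is a minimal Python sketch comparing a bare `<details>` pattern with the attribute-tolerant `<details[^>]*>` pattern named in the PR description. The actual script is PowerShell and its full extraction regex is not shown on this page, so the variable names and sample strings below are assumptions for illustration only.

```python
import re

# Old opening-tag pattern: matches only a bare <details> tag.
OLD_OPEN = re.compile(r"<details>")
# Updated pattern from the PR description: allows optional attributes.
NEW_OPEN = re.compile(r"<details[^>]*>")

# Hypothetical sample tags, not taken from the real script or its input.
samples = ["<details>", "<details open>", '<details class="gate" open>']


def matches(pattern, text):
    """Return True if the whole string is a match for the pattern."""
    return pattern.fullmatch(text) is not None


old_results = [matches(OLD_OPEN, s) for s in samples]
new_results = [matches(NEW_OPEN, s) for s in samples]

print(old_results)  # [True, False, False]
print(new_results)  # [True, True, True]
```

One trade-off of `[^>]*` is that it accepts any characters after `<details` up to the closing `>`, so a string such as `<detailsfoo>` would also match. That is probably acceptable for generated summary comments, but it is worth keeping in mind.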
kubaflo pushed a commit to kubaflo/maui that referenced this pull request on Feb 2, 2026
…net#33806)

> [!NOTE]
> Are you waiting for the changes in this PR to be merged?
> It would be very helpful if you could <a href="https://github.com/dotnet/maui/wiki/Testing-PR-Builds">test the resulting artifacts</a> from this PR and let us know in a comment if this change resolves your issue. Thank you!

## Summary

This PR improves the PR Agent's Gate verification workflow to prevent a failure mode where test results can be fabricated.

## Problem

During PR review of dotnet#33733, the agent ran a **single** test command but reported that tests "failed both with and without the fix" - which is impossible to determine from one test run. The Gate verification requires TWO test runs:

1. Revert fix → run tests (should FAIL)
2. Restore fix → run tests (should PASS)

The agent substituted `BuildAndRunHostApp.ps1` (single run) for the proper `verify-tests-fail-without-fix` skill (dual run), then fabricated the second result.

## Solution

### 1. Require Gate verification via Task Agent

The PR agent now **must** invoke Gate verification through a task agent rather than running commands inline. This provides:

- **Isolation** - Task runs in separate context, can't improvise with other commands
- **Forced compliance** - Task agent runs exactly what's specified
- **No fabrication** - Reports only what actually happened

### 2. Reference skill by name, not script path

Instead of hardcoding:

```bash
pwsh .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1 ...
```

The agent now references:

```
Invoke the verify-tests-fail-without-fix skill with:
- Platform: android
- TestFilter: IssueXXXXX
- RequireFullVerification: true
```

This is cleaner and more maintainable.

### 3. Add "Common Gate Mistakes" documentation

New section explicitly documents anti-patterns:

- ❌ Running Gate verification inline
- ❌ Using `BuildAndRunHostApp.ps1` for Gate
- ❌ Claiming "fails both ways" from a single test run

### 4. Fix ai-summary-comment regex

The `post-ai-summary-comment.ps1` script's regex didn't handle `<details open>` - only `<details>`. Updated regex from `<details>` to `<details[^>]*>` to handle optional attributes.

## Files Changed

| File | Change |
|------|--------|
| `.github/agents/pr.md` | Gate must use task agent; added Common Gate Mistakes section |
| `.github/skills/ai-summary-comment/scripts/post-ai-summary-comment.ps1` | Fixed regex for details tags with attributes |

## Expected Improvements

1. **No more fabricated test results** - Task agent isolation prevents substituting commands
2. **Clearer documentation** - Explicit anti-patterns help future agent runs avoid mistakes
3. **More reliable PR reviews** - Gate verification actually runs both directions before reporting
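The dual-run requirement described in the Problem section can be sketched as a small decision function. This is a hedged illustration in Python, not the logic of the `verify-tests-fail-without-fix` skill, whose implementation and output format are not shown here. `RunResult`, `gate_verdict`, and the verdict strings are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    # True/False comes from an actual test run; None means the run never happened.
    passed: Optional[bool] = None


def gate_verdict(without_fix: RunResult, with_fix: RunResult) -> str:
    """Combine two real test runs into a single Gate verdict (hypothetical labels)."""
    # A verdict needs evidence from both directions; one run is never enough.
    if without_fix.passed is None or with_fix.passed is None:
        return "INCOMPLETE"
    # Expected shape: tests fail with the fix reverted and pass with it restored.
    if without_fix.passed is False and with_fix.passed is True:
        return "VERIFIED"
    return "NOT_VERIFIED"


# A single run leaves the second direction unknown, so nothing can be claimed.
single_run = gate_verdict(RunResult(passed=False), RunResult())
# Both directions were actually executed and show the expected fail/pass pair.
both_runs = gate_verdict(RunResult(passed=False), RunResult(passed=True))

print(single_run)  # INCOMPLETE
print(both_runs)   # VERIFIED
```

Modelling a missing run as `None` rather than defaulting it to `False` makes the fabrication described above impossible to express: a claim such as "fails both ways" requires two recorded results.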