
Improve PR Agent Gate verification to prevent result fabrication #33806

Merged: PureWeen merged 2 commits into main from agent-gate-verification-improvements on Jan 30, 2026

@PureWeen
Member

Note

Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!

Summary

This PR improves the PR Agent's Gate verification workflow to prevent a failure mode where test results can be fabricated.

Problem

During PR review of #33733, the agent ran a single test command but reported that tests "failed both with and without the fix", a conclusion that cannot be drawn from one test run. Gate verification requires two separate test runs:

  1. Revert fix → run tests (should FAIL)
  2. Restore fix → run tests (should PASS)

The agent substituted BuildAndRunHostApp.ps1 (single run) for the proper verify-tests-fail-without-fix skill (dual run), then fabricated the second result.
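The two-run requirement can be sketched as a small decision function. This is a hypothetical illustration, not the skill's actual logic: `GateRun` and `gate_verdict` are invented names and the verdict strings are assumptions. It shows why a verdict such as "fails both ways" is only reachable once both runs have actually been recorded.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record of one filtered test run (not the skill's real data model).
@dataclass(frozen=True)
class GateRun:
    fix_applied: bool  # True if the fix was in place during this run
    passed: bool       # True if the filtered tests passed


def gate_verdict(without_fix: Optional[GateRun], with_fix: Optional[GateRun]) -> str:
    """Hypothetical Gate verdict that refuses to conclude anything from one run."""
    if without_fix is None or with_fix is None:
        # One run says nothing about the other direction.
        return "INCOMPLETE: both runs are required"
    if without_fix.fix_applied or not with_fix.fix_applied:
        return "INVALID: runs are mislabeled"
    if without_fix.passed:
        return "FAILED: tests pass without the fix, so they do not catch the bug"
    if with_fix.passed:
        return "PASSED: tests fail without the fix and pass with it"
    return "FAILED: tests fail both with and without the fix"


if __name__ == "__main__":
    # Only one run was recorded, so no two-sided verdict is possible.
    print(gate_verdict(GateRun(fix_applied=False, passed=False), None))
```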

Solution

1. Require Gate verification via Task Agent

The PR agent now must invoke Gate verification through a task agent rather than running commands inline. This provides:

  • Isolation: the task runs in a separate context, so it cannot improvise with other commands
  • Forced compliance: the task agent runs exactly what is specified
  • No fabrication: the task reports only what actually happened
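One way to picture the isolation property is a runner that accepts a frozen command specification, executes exactly that command in a child process, and returns the real exit code and output. This is a minimal sketch under the assumption that a task agent can be modeled this way; `TaskSpec`, `TaskReport`, and `run_task` are hypothetical names, and this does not describe how the Copilot CLI task agent is implemented.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaskSpec:
    # Hypothetical spec: the exact command the task is allowed to run.
    command: Tuple[str, ...]


@dataclass(frozen=True)
class TaskReport:
    command: Tuple[str, ...]
    exit_code: int
    output: str


def run_task(spec: TaskSpec) -> TaskReport:
    """Run only spec.command in a child process and report what really happened."""
    completed = subprocess.run(list(spec.command), capture_output=True, text=True)
    # The report is built from the process result, never from the caller's expectations.
    return TaskReport(
        command=spec.command,
        exit_code=completed.returncode,
        output=completed.stdout + completed.stderr,
    )


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere Python is installed.
    report = run_task(TaskSpec((sys.executable, "-c", "print('gate run')")))
    print(report.exit_code, report.output.strip())
```

Because the spec is frozen and the report is derived from `subprocess.run`, the runner has no code path for swapping in a different script or inventing a result for a run that never happened.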

2. Reference skill by name, not script path

Instead of hardcoding:

```bash
pwsh .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1 ...
```

The agent now references:

```
Invoke the verify-tests-fail-without-fix skill with:
- Platform: android
- TestFilter: IssueXXXXX
- RequireFullVerification: true
```

This is cleaner and more maintainable, because the agent instructions no longer hardcode the script path.
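The name-to-path indirection can be sketched with a hypothetical registry that is the only place the script path appears. `SKILL_SCRIPTS` and `build_skill_command` are illustrative names, and mapping each parameter to a PowerShell-style `-Name value` argument (with `true` treated as a switch) is an assumption; the real script's parameters are not shown in this PR.

```python
from typing import Dict, List, Union

# Hypothetical registry: the only place that knows where each skill's script lives.
SKILL_SCRIPTS: Dict[str, str] = {
    "verify-tests-fail-without-fix":
        ".github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1",
}


def build_skill_command(skill: str, params: Dict[str, Union[str, bool]]) -> List[str]:
    """Resolve a skill name to a pwsh command line (assumed argument style)."""
    if skill not in SKILL_SCRIPTS:
        raise KeyError(f"unknown skill: {skill}")
    command = ["pwsh", SKILL_SCRIPTS[skill]]
    for name, value in params.items():
        if value is True:
            command.append(f"-{name}")          # assumed switch parameter
        elif value is False:
            continue                            # assumed: omit a disabled switch
        else:
            command += [f"-{name}", str(value)]
    return command


if __name__ == "__main__":
    print(build_skill_command(
        "verify-tests-fail-without-fix",
        {"Platform": "android", "TestFilter": "IssueXXXXX", "RequireFullVerification": True},
    ))
```

If the script moves, only the registry entry changes; instructions that refer to the skill by name stay valid.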

3. Add "Common Gate Mistakes" documentation

New section explicitly documents anti-patterns:

  • ❌ Running Gate verification inline
  • ❌ Using BuildAndRunHostApp.ps1 for Gate
  • ❌ Claiming "fails both ways" from a single test run
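The three anti-patterns could also be checked mechanically before a Gate result is accepted. The sketch below is hypothetical: `GateReport` and `gate_mistakes` are invented names, and it assumes the agent's report records whether Gate ran through a task agent, which scripts it used, and how many test runs completed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GateReport:
    # Hypothetical fields describing how a Gate result was produced.
    ran_via_task_agent: bool
    scripts_used: List[str]
    runs_completed: int
    claims_fails_both_ways: bool


def gate_mistakes(report: GateReport) -> List[str]:
    """Return the 'Common Gate Mistakes' this report exhibits (empty if none)."""
    mistakes = []
    if not report.ran_via_task_agent:
        mistakes.append("Gate verification was run inline")
    if any(path.endswith("BuildAndRunHostApp.ps1") for path in report.scripts_used):
        mistakes.append("BuildAndRunHostApp.ps1 was used for Gate")
    if report.claims_fails_both_ways and report.runs_completed < 2:
        mistakes.append("'fails both ways' claimed from fewer than two runs")
    return mistakes


if __name__ == "__main__":
    suspect = GateReport(
        ran_via_task_agent=True,
        scripts_used=["BuildAndRunHostApp.ps1"],
        runs_completed=1,
        claims_fails_both_ways=True,
    )
    for mistake in gate_mistakes(suspect):
        print("-", mistake)
```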

4. Fix ai-summary-comment regex

The post-ai-summary-comment.ps1 script's regex matched only a bare <details> tag, so it missed tags with attributes such as <details open>. The regex was updated from <details> to <details[^>]*> to accept optional attributes.
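The difference between the two patterns can be checked with a quick sketch. It uses Python's `re` module rather than PowerShell, on the assumption that these patterns behave the same in both engines apart from case sensitivity (PowerShell's `-match` operator is case-insensitive by default); they use only literals, a negated character class, and `*`. The sample strings are made up.

```python
import re

OLD_PATTERN = re.compile(r"<details>")        # matches only the bare tag
NEW_PATTERN = re.compile(r"<details[^>]*>")   # allows attributes before '>'

samples = [
    "<details>",
    "<details open>",
    '<details open class="ai-summary">',
]

for tag in samples:
    old = OLD_PATTERN.fullmatch(tag) is not None
    new = NEW_PATTERN.fullmatch(tag) is not None
    print(f"{tag:36} old={old} new={new}")
```

Only the bare `<details>` matches the old pattern, while the new one matches all three samples. One trade-off: `[^>]*` also accepts text directly after the tag name, such as `<detailsx>`; a stricter variant like `<details(\s[^>]*)?>` would require whitespace before any attributes.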

Files Changed

| File | Change |
|------|--------|
| `.github/agents/pr.md` | Gate must use task agent; added Common Gate Mistakes section |
| `.github/skills/ai-summary-comment/scripts/post-ai-summary-comment.ps1` | Fixed regex for `<details>` tags with attributes |

Expected Improvements

  1. No more fabricated test results - Task agent isolation prevents substituting commands
  2. Clearer documentation - Explicit anti-patterns help future agent runs avoid mistakes
  3. More reliable PR reviews - Gate verification actually runs both directions before reporting

- Require Gate verification to run via task agent (prevents command substitution)
- Reference verify-tests-fail-without-fix skill by name instead of inline commands
- Add 'Common Gate Mistakes' section with explicit anti-patterns
- Fix ai-summary-comment regex to handle <details> tags with attributes

These changes prevent fabricating dual-direction test results from single runs.
Copilot AI review requested due to automatic review settings January 30, 2026 22:38
@PureWeen PureWeen added the area-ai-agents Copilot CLI agents, agent skills, AI-assisted development label Jan 30, 2026
kubaflo
kubaflo previously approved these changes Jan 30, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the PR agent’s Gate verification guidance to reduce the chance of reporting unverifiable (or fabricated) test results, and improves the AI summary comment parsing to support <details> tags with attributes.

Changes:

  • Updated the PR agent Gate workflow documentation to require a more constrained verification path and added a “Common Gate Mistakes” section.
  • Adjusted the ai-summary-comment extraction regex to match <details> tags that include attributes (e.g., <details open>).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|------|-------------|
| `.github/agents/pr.md` | Updates Gate verification instructions and adds documentation about common anti-patterns/mistakes. |
| `.github/skills/ai-summary-comment/scripts/post-ai-summary-comment.ps1` | Updates `<details>` parsing regex to support `<details>` tags with optional attributes. |

@PureWeen PureWeen merged commit 20a4635 into main Jan 30, 2026
3 of 4 checks passed
@PureWeen PureWeen deleted the agent-gate-verification-improvements branch January 30, 2026 22:56
kubaflo pushed a commit to kubaflo/maui that referenced this pull request Feb 2, 2026
…net#33806)

@kubaflo kubaflo added the copilot label Feb 6, 2026
This was referenced Feb 12, 2026
This was referenced Feb 16, 2026

Labels

area-ai-agents Copilot CLI agents, agent skills, AI-assisted development copilot
