Merged
30 commits
- 5fef66d [CI] Extend gate to all test types and decouple from PR review (#34705) (kubaflo, Apr 15, 2026)
- eb0b82f Improve evaluate-pr-tests workflow: slash_command + workflow_dispatch… (PureWeen, Apr 15, 2026)
- 3c06583 Add automated milestone management with Versions.props detection (#34… (PureWeen, Apr 16, 2026)
- f4fb086 Add preview/RC milestone support with release branch detection (#34999) (PureWeen, Apr 16, 2026)
- b5f9079 Add daily PR review queue workflow with actionability detection (#34818) (kubaflo, Apr 16, 2026)
- ff8dfd8 added ui codes (TamilarasanSF4853, Feb 11, 2026)
- d0fef01 modified codes (TamilarasanSF4853, Feb 12, 2026)
- 9e90b6d removed changes (TamilarasanSF4853, Feb 12, 2026)
- 9dece64 added buttons (TamilarasanSF4853, Feb 12, 2026)
- 67e0aa5 added test case (TamilarasanSF4853, Feb 12, 2026)
- c2512f1 added order (TamilarasanSF4853, Feb 13, 2026)
- 43be5cf added new tests (TamilarasanSF4853, Feb 16, 2026)
- 409064e added test (TamilarasanSF4853, Feb 17, 2026)
- b02e70d added test (TamilarasanSF4853, Feb 17, 2026)
- d2436e0 updated test cases (TamilarasanSF4853, Feb 17, 2026)
- 1550693 added test cases (TamilarasanSF4853, Feb 17, 2026)
- 2ee727e added extenstion (TamilarasanSF4853, Feb 24, 2026)
- 308d653 updated test cases (TamilarasanSF4853, Feb 25, 2026)
- be1c5ad added sleep (TamilarasanSF4853, Feb 25, 2026)
- 275470f reduced delay value (TamilarasanSF4853, Feb 25, 2026)
- 9fd539e added new test (TamilarasanSF4853, Feb 25, 2026)
- 3b2883a added changes (TamilarasanSF4853, Feb 25, 2026)
- 9fb4774 modified test for iOS 26 (TamilarasanSF4853, Feb 27, 2026)
- 2272fac added changes (TamilarasanSF4853, Feb 27, 2026)
- 4b3b764 added for different androids (TamilarasanSF4853, Feb 27, 2026)
- 002ae06 added changes (TamilarasanSF4853, Apr 6, 2026)
- 5512c98 removed sleep (TamilarasanSF4853, Apr 7, 2026)
- e8fe562 updated test cases (TamilarasanSF4853, Apr 7, 2026)
- 5468384 added issue link (TamilarasanSF4853, Apr 7, 2026)
- 47424bb added issue link (TamilarasanSF4853, Apr 8, 2026)
17 changes: 14 additions & 3 deletions .github/copilot-instructions.md
@@ -245,7 +245,18 @@ Skills are modular capabilities that can be invoked directly or used by agents.

#### User-Facing Skills

1. **issue-triage** (`.github/skills/issue-triage/SKILL.md`)
1. **pr-review** (`.github/skills/pr-review/SKILL.md`)
- **Purpose**: End-to-end PR review orchestrator — 3 phases: pr-preflight, try-fix, pr-report. Gate runs separately before this skill via Review-PR.ps1.
- **Trigger phrases**: "review PR #XXXXX", "work on PR #XXXXX", "fix issue #XXXXX", "continue PR #XXXXX"
- **Capabilities**: Multi-model fix exploration, alternative comparison, PR review recommendation
- **Do NOT use for**: Just running tests manually → Use `sandbox-agent`
- **Phase instructions** (in `.github/pr-review/`):
- `pr-preflight.md` — Context gathering from issue/PR
- `pr-report.md` — Final recommendation
- **Phase skill**: `try-fix` — Multi-model fix exploration
- **Note**: Gate (test verification) runs as a script step in `Review-PR.ps1` before this skill is invoked. Gate result is passed in the prompt.

2. **issue-triage** (`.github/skills/issue-triage/SKILL.md`)
- **Purpose**: Query and triage open issues that need milestones, labels, or investigation
- **Trigger phrases**: "find issues to triage", "show me old Android issues", "what issues need attention"
- **Scripts**: `init-triage-session.ps1`, `query-issues.ps1`, `record-triage.ps1`
@@ -286,8 +297,8 @@ Skills are modular capabilities that can be invoked directly or used by agents.
- **Trigger phrases**: "write XAML tests for #XXXXX", "test XamlC behavior", "reproduce XAML parsing bug"
- **Output**: Test files for Controls.Xaml.UnitTests

8. **verify-tests-fail-without-fix** (`.github/skills/verify-tests-fail-without-fix/SKILL.md`)
- **Purpose**: Verifies UI tests catch the bug before fix and pass with fix
9. **verify-tests-fail-without-fix** (`.github/skills/verify-tests-fail-without-fix/SKILL.md`)
- **Purpose**: Verifies tests catch the bug before fix and pass with fix. Auto-detects test type (UI, device, unit, XAML) and dispatches to the appropriate runner.
- **Two modes**: Verify failure only (test creation) or full verification (test + fix)
- **Used by**: After creating tests, before considering PR complete

82 changes: 73 additions & 9 deletions .github/instructions/gh-aw-workflows.instructions.md
@@ -6,6 +6,34 @@ applyTo:

# gh-aw (GitHub Agentic Workflows) Guidelines

## 🚨 Before You Build: Prefer Built-in gh-aw Features

**CRITICAL RULE:** Before implementing any trigger, output, scheduling, or interaction mechanism in a gh-aw workflow, check whether gh-aw has a built-in feature that does it. gh-aw extends GitHub Actions with many convenience features — manually reimplementing them is always worse (more code, more bugs, missing platform integration like emoji reactions, sanitized inputs, and noise reduction).

### Step 1: Check the anti-patterns table below
### Step 2: If not listed, check the [triggers reference](https://github.github.com/gh-aw/reference/triggers/), [frontmatter reference](https://github.github.com/gh-aw/reference/frontmatter/), and [safe-outputs reference](https://github.github.com/gh-aw/reference/safe-outputs/)
### Step 3: If a built-in exists, use it. If not, proceed with manual implementation.

### Anti-Patterns: Manual Reimplementations to Avoid

| If you're about to implement... | Use this built-in instead | Docs |
|---------------------------------|--------------------------|------|
| `issue_comment` + `startsWith(comment.body, '/cmd')` | `slash_command:` trigger | [Command Triggers](https://github.github.com/gh-aw/reference/command-triggers/) |
| Manual emoji reaction on triggering comment | `reaction:` field under `on:` | [Frontmatter](https://github.github.com/gh-aw/reference/frontmatter/) |
| Posting "workflow started/completed" status comments | `status-comment: true` under `on:` | [Frontmatter](https://github.github.com/gh-aw/reference/frontmatter/) |
| Fixed cron schedule (`0 9 * * 1`) for non-critical timing | `schedule: weekly on monday around 9:00` (fuzzy) | [Triggers](https://github.github.com/gh-aw/reference/triggers/) |
| Manual `if:` to skip bot-authored PRs | `skip-bots:` under `on:` | [Triggers](https://github.github.com/gh-aw/reference/triggers/) |
| Manual `if:` to skip by author role | `skip-roles:` under `on:` | [Triggers](https://github.github.com/gh-aw/reference/triggers/) |
| Manual label check + removal for one-shot commands | `label_command:` trigger | [Triggers](https://github.github.com/gh-aw/reference/triggers/) |
| Editing old comments to collapse them | `hide-older-comments: true` on `add-comment:` | [Safe Outputs](https://github.github.com/gh-aw/reference/safe-outputs/) |
| Creating no-op report issues | `noop: report-as-issue: false` | [Safe Outputs / Monitoring](https://github.github.com/gh-aw/patterns/monitoring/) |
| Auto-closing older issues from same workflow | `close-older-issues: true` on `create-issue:` | [Safe Outputs](https://github.github.com/gh-aw/reference/safe-outputs/) |
| Disabling workflow after a date | `stop-after:` under `on:` | [Triggers](https://github.github.com/gh-aw/reference/triggers/) |
| Manual approval gating | `manual-approval:` under `on:` | [Triggers](https://github.github.com/gh-aw/reference/triggers/) |
| Search-based skip logic in `steps:` | `skip-if-match:` / `skip-if-no-match:` under `on:` | [Triggers](https://github.github.com/gh-aw/reference/triggers/) |

**Note:** gh-aw is actively developed. If a capability feels like something a framework would provide natively, check the reference docs — it probably exists even if it's not in this table yet.

## Architecture

gh-aw workflows are authored as `.md` files with YAML frontmatter, compiled to `.lock.yml` via `gh aw compile`. The lock file is auto-generated — **never edit it manually**.
@@ -29,6 +57,8 @@ agent job:
| Platform steps | ✅ Yes | ✅ Yes | ✅ Yes | Platform-controlled |
| Agent container | ❌ Scrubbed | ❌ Scrubbed | ❌ Scrubbed | ✅ But sandboxed |

**⚠️ Agent container credential nuance:** `GITHUB_TOKEN` and `gh` CLI credentials are scrubbed inside the agent container. However, `COPILOT_TOKEN` (used for LLM inference) is present in the environment via `--env-all`. Any subprocess (e.g., `dotnet build`, `npm install`) inherits this variable. The AWF network firewall, `redact_secrets.cjs` (post-agent log scrubbing), and the threat detection agent limit the blast radius. See [Security Boundaries](#security-boundaries) below.
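The inheritance point is easy to see in a few lines of shell. In this sketch `fake_build_hook` is a hypothetical stand-in for an MSBuild target or npm `postinstall` script, and the token value is a placeholder; the child process reads the variable without needing any special access.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a build hook (MSBuild target, npm postinstall, ...).
# It runs in a child process and only reports whether the variable is visible.
fake_build_hook() {
  bash -c 'if [ -n "${COPILOT_TOKEN:-}" ]; then echo "hook sees COPILOT_TOKEN"; else echo "hook sees nothing"; fi'
}

# Placeholder value; inside the agent container the real token arrives via --env-all.
export COPILOT_TOKEN="placeholder-not-a-real-token"

# Any child process, such as a build tool and the hooks it launches, inherits it.
fake_build_hook
```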

### Step Ordering (Critical)

User `steps:` **always run before** platform-generated steps. You cannot insert user steps after platform steps.
@@ -48,6 +78,41 @@ By default, `gh aw compile` automatically injects a fork guard into the activati

To **allow fork PRs**, add `forks: ["*"]` to the `pull_request` trigger in the `.md` frontmatter. The compiler removes the auto-injected guard from the compiled `if:` conditions. This is safe when the workflow uses the `Checkout-GhAwPr.ps1` pattern (checkout + trusted-infra restore) and the agent is sandboxed.

## Security Boundaries

### Key Principles (from [GitHub Security Lab](https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/))

1. **Never execute untrusted PR code with elevated credentials.** The classic "pwn-request" attack is `pull_request_target` + checkout PR + run build scripts with `GITHUB_TOKEN`. The attack surface includes build scripts (`make`, `build.ps1`), package manager hooks (`npm postinstall`, MSBuild targets), and test runners.

2. **Treating PR contents as passive data is safe.** Reading, analyzing, or diffing PR code is fine — the danger is *executing* it. Our gh-aw workflows read code for evaluation; they never build or run it.

3. **`pull_request_target` grants write permissions and secrets access.** This is by design — the workflow YAML comes from the base branch (trusted). But any step that checks out and runs fork code in this context creates a vulnerability.

4. **`pull_request` from forks has no secrets access.** GitHub withholds secrets because the workflow YAML comes from the fork (untrusted). This is the safe default for CI builds on fork PRs.

5. **The `workflow_run` pattern separates privilege from code execution.** Build in an unprivileged `pull_request` job → pass artifacts → process in a privileged `workflow_run` job. This is architecturally what gh-aw does: agent runs read-only, `safe_outputs` job has write permissions.

### gh-aw Defense Layers

| Layer | What it does | What it doesn't do |
|-------|-------------|-------------------|
| **AWF network firewall** | Restricts outbound to allowlisted domains | Doesn't prevent reading env vars inside the container |
| **`redact_secrets.cjs`** | Scrubs known secret values from logs/artifacts post-agent | Doesn't catch encoded/obfuscated values |
| **Threat detection agent** | Reviews agent outputs before safe-outputs publishes them | Can miss novel exfiltration techniques |
| **Safe-outputs permission separation** | Write operations happen in separate job, not the agent | Agent can still request writes via safe-output tools |
| **`max: 1` on `add-comment`** | Limits agent to one comment | That one comment could contain sensitive data (mitigated by redaction) |
| **XPIA prompt** | Instructs LLM to resist prompt injection from untrusted content | LLM compliance is probabilistic, not guaranteed |
| **`pre_activation` role check** | Gates on write-access collaborators | Does not apply if `roles: all` is set |
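Value-based redaction, and its blind spot for encoded values, can be illustrated with a small shell sketch. This is not how `redact_secrets.cjs` is implemented; `redact_line` and the placeholder secret are assumptions made only to show why a literal match misses a base64-encoded copy of the same value.

```bash
#!/usr/bin/env bash
# Illustrative value-based redaction: replace each literal occurrence of a
# known secret in one line of output. Not the actual redact_secrets.cjs logic.
redact_line() {
  local secret=$1 line=$2
  # Quoting "$secret" makes the match literal rather than a glob pattern.
  printf '%s\n' "${line//"$secret"/[REDACTED]}"
}

secret="s3cr3t-placeholder"   # placeholder secret value

# A plain leak is caught: prints "token=[REDACTED]".
redact_line "$secret" "token=$secret"

# The same value base64-encoded no longer matches the literal text,
# so it passes through unchanged.
encoded=$(printf '%s' "$secret" | base64)
redact_line "$secret" "token=$encoded"
```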

### Rules for gh-aw Workflow Authors

- ✅ **DO** treat PR contents as passive data (read, analyze, diff)
- ✅ **DO** run data-gathering scripts in `steps:` (pre-agent, trusted context) not inside the agent
- ✅ **DO** use `Checkout-GhAwPr.ps1` for `workflow_dispatch` to restore trusted `.github/` from base
- ❌ **DO NOT** run `dotnet build`, `npm install`, or any build command on untrusted PR code inside the agent — build tool hooks (MSBuild targets, postinstall scripts) can read `COPILOT_TOKEN` from the environment
- ❌ **DO NOT** execute workspace scripts (`.ps1`, `.sh`, `.py`) after checking out a fork PR in `steps:` — those run with `GITHUB_TOKEN`
- ❌ **DO NOT** set `roles: all` on workflows that process PR content — this allows any user to trigger the workflow

## Fork PR Handling

### The "pwn-request" Threat Model
@@ -65,12 +130,13 @@ Reference: https://securitylab.github.com/resources/github-actions-preventing-pw
| `workflow_dispatch` | ❌ Skipped | ✅ Works — user steps handle checkout and restore is final |
| `issue_comment` (same-repo) | ✅ Yes | ✅ Works — files already on PR branch |
| `issue_comment` (fork) | ✅ Yes | ⚠️ Works — `checkout_pr_branch.cjs` re-checks out fork branch after user steps, potentially overwriting restored infra. Acceptable because agent is sandboxed (no credentials, max 1 comment via safe-outputs). Pre-flight check catches missing `SKILL.md` if fork isn't rebased. |
| `slash_command` | ✅ Yes (compiles to `issue_comment` internally) | Same behavior as `issue_comment` above, but with platform-managed command matching, emoji reactions, and sanitized input. Prefer `slash_command:` over manual `issue_comment` + `startsWith()`. |

### The `issue_comment` + Fork Problem

For `/slash-command` triggers on fork PRs, `checkout_pr_branch.cjs` runs AFTER all user steps and re-checks out the fork branch. This overwrites any files restored by user steps (e.g., `.github/skills/`). A fork could include a crafted `SKILL.md` that alters the agent's evaluation behavior.

**Accepted residual risk:** The agent runs in a sandboxed container with all credentials scrubbed. The worst outcome is a manipulated evaluation comment (`safe-outputs: add-comment: max: 1`). The agent has no ability to push code, access secrets, or exfiltrate data. The pre-flight check in the agent prompt catches the case where `SKILL.md` is missing entirely (fork not rebased on `main`).
**Accepted residual risk:** The agent runs in a sandboxed container with `GITHUB_TOKEN` and `gh` CLI credentials scrubbed. `COPILOT_TOKEN` (for LLM inference) remains in the environment but the AWF network firewall restricts outbound connections to an allowlist of domains, `redact_secrets.cjs` scrubs known secret values from logs/outputs post-agent, and the threat detection agent reviews outputs before they are published. The worst practical outcome is a manipulated evaluation comment (`safe-outputs: add-comment: max: 1`). The pre-flight check in the agent prompt catches the case where `SKILL.md` is missing entirely (fork not rebased on `main`).

**Upstream issue:** [github/gh-aw#18481](https://github.com/github/gh-aw/issues/18481) — "Using gh-aw in forks of repositories"

@@ -88,17 +154,15 @@ steps:
```

The script:
1. Captures the base branch SHA before checkout
2. Checks out the PR branch via `gh pr checkout`
3. Deletes `.github/skills/` and `.github/instructions/` (prevents fork-added files)
4. Restores them from the base branch SHA (best-effort, non-fatal)
1. Verifies the PR author has write access and rejects fork PRs
2. Captures the base branch SHA before checkout
3. Checks out the PR branch via `gh pr checkout`
4. Restores `.github/skills/`, `.github/instructions/`, and `.github/copilot-instructions.md` from the base branch SHA (fatal on failure)
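For readers who want to see the ordering of those four steps concretely, here is a hypothetical shell rendition. The real script is PowerShell and is not shown in this diff; `TRUSTED_PATHS`, the function names, the `gh` calls, and capturing the base SHA with `git rev-parse HEAD` (which assumes the job starts on the base branch) are all illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical shell rendition of the four Checkout-GhAwPr.ps1 steps.
# The real script is PowerShell; names and calls here are illustrative.

TRUSTED_PATHS=(.github/skills .github/instructions .github/copilot-instructions.md)

# Step 1 helpers: require write-level access and refuse cross-repository (fork) PRs.
permission_allows_run() {
  case "$1" in
    admin|maintain|write) return 0 ;;
    *) return 1 ;;
  esac
}

reject_if_fork() {
  # $1: "true" or "false", e.g. from `gh pr view <n> --json isCrossRepository`.
  if [ "$1" = "true" ]; then
    echo "fork PRs are rejected" >&2
    return 1
  fi
  return 0
}

checkout_pr_with_trusted_infra() {
  local pr=$1 repo=$2 author perm is_fork base_sha

  # Step 1: author must have write access, and the PR must not come from a fork.
  author=$(gh pr view "$pr" --repo "$repo" --json author --jq .author.login) || return 1
  perm=$(gh api "repos/$repo/collaborators/$author/permission" --jq .permission) || return 1
  permission_allows_run "$perm" || { echo "author lacks write access" >&2; return 1; }
  is_fork=$(gh pr view "$pr" --repo "$repo" --json isCrossRepository --jq .isCrossRepository) || return 1
  reject_if_fork "$is_fork" || return 1

  # Step 2: capture the base SHA first (assumes the job starts on the base branch).
  base_sha=$(git rev-parse HEAD) || return 1

  # Step 3: switch the workspace to the PR branch.
  gh pr checkout "$pr" --repo "$repo" || return 1

  # Step 4: put trusted files back from the base commit; failure is fatal.
  git restore --source="$base_sha" --staged --worktree -- "${TRUSTED_PATHS[@]}" \
    || { echo "failed to restore trusted infra" >&2; return 1; }
}
```

It would be called as `checkout_pr_with_trusted_infra <pr-number> <owner/repo>`. Using `git restore --staged --worktree` rather than a plain checkout also drops files the PR added under the restored paths, since those files do not exist in the base commit.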

**Behavior by trigger:**
- **`workflow_dispatch`**: Platform checkout is skipped, so the restore IS the final workspace state (trusted files from base branch)
- **`pull_request`** (same-repo): User step restores trusted infra. `checkout_pr_branch.cjs` runs after and re-checks out PR branch — for same-repo PRs, skill files typically match main unless the PR modified them.
- **`pull_request`** (fork with `forks: ["*"]`): Same as above, but fork's skill files may differ. Same residual risk as `issue_comment` fork case — agent is sandboxed, pre-flight catches missing `SKILL.md`.
- **`issue_comment`** (same-repo): Platform re-checks out PR branch — files already match, effectively a no-op
- **`issue_comment`** (fork): Platform re-checks out fork branch after us, overwriting restored files. Agent is sandboxed; pre-flight in the prompt catches missing `SKILL.md`
- **`slash_command`** (same-repo): Platform's `checkout_pr_branch.cjs` handles checkout. Skill files typically match main unless the PR modified them.
- **`slash_command`** (fork): Platform re-checks out fork branch after user steps, overwriting restored files. Agent is sandboxed; pre-flight in the prompt catches missing `SKILL.md`

### Anti-Patterns

71 changes: 41 additions & 30 deletions .github/pr-review/pr-gate.md
@@ -1,8 +1,9 @@
# PR Gate Test Verification
# PR Gate - Test Before and After Fix

> **⛔ This phase MUST pass before continuing to Try-Fix. If it fails, stop and inform user.**

> 🚨 Gate verification MUST run via task agent — never inline.
> In CI (Review-PR.ps1), the gate runs `verify-tests-fail.ps1` directly as a script step.
> For manual usage, you can invoke it yourself or via a task agent.

---

@@ -26,41 +27,32 @@ Choose a platform that is BOTH affected by the bug AND available on the current

## Steps

1. **Check if tests exist:**
1. **Detect tests in PR** using the shared detection script:
```bash
gh pr view XXXXX --json files --jq '.files[].path' | grep -E "TestCases\.(HostApp|Shared\.Tests)"
pwsh .github/scripts/shared/Detect-TestsInDiff.ps1 -PRNumber XXXXX
```
If NO tests exist → inform user, suggest `write-tests-agent`. Gate is ⚠️ SKIPPED.
This auto-detects all test types: UI tests, device tests, unit tests, XAML tests.
If NO tests detected → inform user, suggest `write-tests-agent`. Gate is ⚠️ SKIPPED.

2. **Select platform** — must be affected by bug AND available on host (see Platform Selection above).
2. **Select platform** — must be affected by bug AND available on host (see table above).

3. **Run verification via task agent** (MUST use task agent — never inline):
3. **Run verification** via `verify-tests-fail.ps1`:
```bash
pwsh .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1 \
-Platform {platform} -RequireFullVerification
```
In CI, `Review-PR.ps1` calls this script directly. For manual usage, you can also invoke
it via a task agent for isolation:
```
Invoke the `task` agent with this prompt:

"Invoke the verify-tests-fail-without-fix skill for this PR:
- Platform: {platform}
- TestFilter: 'IssueXXXXX'
- RequireFullVerification: true

Report back: Did tests FAIL without fix? Did tests PASS with fix? Final status?"
```

**Why task agent?** Running inline allows substituting commands and fabricating results. Task agent runs in isolation.
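`Detect-TestsInDiff.ps1` itself is not part of this diff. As a rough idea of what path-based detection can look like, the shell sketch below maps changed file paths to test types. The folder patterns are assumptions: the UI and XAML ones follow the project names used in this document (`TestCases.HostApp`, `TestCases.Shared.Tests`, `Controls.Xaml.UnitTests`), the device and unit ones are guesses, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Rough sketch of classifying changed files by test type (not the real script).
classify_test_path() {
  case "$1" in
    */TestCases.HostApp/*|*/TestCases.Shared.Tests/*) echo ui ;;
    *Xaml.UnitTests/*) echo xaml ;;   # must come before the generic UnitTests pattern
    *DeviceTests/*) echo device ;;
    *UnitTests/*) echo unit ;;
    *) ;;                             # not a test file: print nothing
  esac
}

# Reads one path per line on stdin and prints each detected test type once.
detect_test_types() {
  local path
  while IFS= read -r path; do
    classify_test_path "$path"
  done | sort -u
}
```

For example, `gh pr view XXXXX --json files --jq '.files[].path' | detect_test_types` would print each detected type once; empty output corresponds to the ⚠️ SKIPPED case.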

---

## Expected Result

```
╔═══════════════════════════════════════════════════════════╗
║ VERIFICATION PASSED ✅ ║
╠═══════════════════════════════════════════════════════════╣
║ - FAIL without fix (as expected) ║
║ - PASS with fix (as expected) ║
╚═══════════════════════════════════════════════════════════╝
```

---

## If Gate Fails
@@ -72,25 +64,44 @@ Choose a platform that is BOTH affected by the bug AND available on the current

## Output File

> 🚨 **CRITICAL OUTPUT RULES:**
> - Write gate results ONLY to `gate/content.md` — NEVER copy gate results into other phases (pre-flight, try-fix, report)
> - Use the EXACT template below — no extra explanations, no "Reason:" paragraphs, no "Notes:" sections
> - Keep it SHORT — the template is the complete output

```bash
mkdir -p CustomAgentLogsTmp/PRState/{PRNumber}/PRAgent/gate
```

Write `content.md`:
Write `content.md` using this **exact** template (fill in values, don't add anything else):

```markdown
### Gate Result: {✅ PASSED / ❌ FAILED / ⚠️ SKIPPED}

**Platform:** {platform}
**Mode:** Full Verification

- Tests FAIL without fix: {✅/❌}
- Tests PASS with fix: {✅/❌}
| # | Type | Test Name | Filter |
|---|------|-----------|--------|
| 1 | {type} | {name} | `{filter}` |

| Step | Expected | Actual | Result |
|------|----------|--------|--------|
| Without fix | FAIL | {FAIL/PASS} | {✅/❌} |
| With fix | PASS | {FAIL/PASS} | {✅/❌} |
```

If gate is SKIPPED (no tests found), write only:

```markdown
### Gate Result: ⚠️ SKIPPED

No tests detected in PR. Suggest adding tests via `write-tests-agent`.
```
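A minimal sketch of filling in the full-verification template from a shell step, assuming the caller already knows the platform, one test's type, name and filter, and the two observed outcomes (`FAIL` or `PASS`). `write_gate_result` is a hypothetical helper, not part of the repository; it encodes the gate's pass rule: tests FAIL without the fix and PASS with it.

```bash
#!/usr/bin/env bash
# Hypothetical helper that writes gate/content.md using the template above.
write_gate_result() {
  local pr=$1 platform=$2 type=$3 name=$4 filter=$5 without=$6 with=$7
  local dir="CustomAgentLogsTmp/PRState/$pr/PRAgent/gate"
  local without_mark="❌" with_mark="❌" result="❌ FAILED"

  # Each step is ✅ only when the observed outcome matches the expected one.
  [ "$without" = "FAIL" ] && without_mark="✅"
  [ "$with" = "PASS" ] && with_mark="✅"
  # The gate passes only when both steps behaved as expected.
  if [ "$without_mark" = "✅" ] && [ "$with_mark" = "✅" ]; then
    result="✅ PASSED"
  fi

  mkdir -p "$dir"
  cat > "$dir/content.md" <<EOF
### Gate Result: $result

**Platform:** $platform
**Mode:** Full Verification

| # | Type | Test Name | Filter |
|---|------|-----------|--------|
| 1 | $type | $name | \`$filter\` |

| Step | Expected | Actual | Result |
|------|----------|--------|--------|
| Without fix | FAIL | $without | $without_mark |
| With fix | PASS | $with | $with_mark |
EOF
}
```

Keeping the template inside a single heredoc means the file never gains extra explanation or "Notes:" sections, which matches the output rules for this phase.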

---

## Common Mistakes

- ❌ Running inline — MUST use task agent
- ❌ Using `BuildAndRunHostApp.ps1` — that runs ONE direction; the skill does TWO
- ❌ Claiming results from a single test run — script does TWO runs automatically
- ❌ Adding verbose explanations to gate/content.md — use the exact template above
- ❌ Copying gate results into try-fix/content.md or report/content.md — gate results belong ONLY in gate/content.md
- ❌ Skipping gate because tests are device tests, not UI tests — the skill supports all test types
5 changes: 4 additions & 1 deletion .github/pr-review/pr-report.md
@@ -4,11 +4,14 @@

> 🚨 **DO NOT post any comments.** This phase only produces output files.

> 🚨 **DO NOT duplicate content from other phases.** Reference gate/try-fix results by status only (e.g., "Gate: ✅ PASSED") — do NOT copy their full output into report/content.md.

---

## Prerequisites

- Phases 1-3 (Pre-Flight, Gate, Try-Fix) must be complete before starting
- Phases 1-2 (Pre-Flight, Try-Fix) must be complete before starting
- Gate result is available from the prompt (ran separately before this skill)

---
