diff --git a/.github/instructions/gh-aw-workflows.instructions.md b/.github/instructions/gh-aw-workflows.instructions.md
index c8cd70ecc3be..2230e8c86492 100644
--- a/.github/instructions/gh-aw-workflows.instructions.md
+++ b/.github/instructions/gh-aw-workflows.instructions.md
@@ -6,6 +6,34 @@ applyTo:
 
 # gh-aw (GitHub Agentic Workflows) Guidelines
 
+## 🚨 Before You Build: Prefer Built-in gh-aw Features
+
+**CRITICAL RULE:** Before implementing any trigger, output, scheduling, or interaction mechanism in a gh-aw workflow, check whether gh-aw has a built-in feature that does it. gh-aw extends GitHub Actions with many convenience features — manually reimplementing them is always worse (more code, more bugs, missing platform integration like emoji reactions, sanitized inputs, and noise reduction).
+
+### Step 1: Check the anti-patterns table below
+### Step 2: If not listed, check the [triggers reference](https://github.github.com/gh-aw/reference/triggers/), [frontmatter reference](https://github.github.com/gh-aw/reference/frontmatter/), and [safe-outputs reference](https://github.github.com/gh-aw/reference/safe-outputs/)
+### Step 3: If a built-in exists, use it. If not, proceed with manual implementation.
+
+### Anti-Patterns: Manual Reimplementations to Avoid
+
+| If you're about to implement... | Use this built-in instead | Docs |
+|---------------------------------|--------------------------|------|
+| `issue_comment` + `startsWith(comment.body, '/cmd')` | `slash_command:` trigger | [Command Triggers](https://github.github.com/gh-aw/reference/command-triggers/) |
+| Manual emoji reaction on triggering comment | `reaction:` field under `on:` | [Frontmatter](https://github.github.com/gh-aw/reference/frontmatter/) |
+| Posting "workflow started/completed" status comments | `status-comment: true` under `on:` | [Frontmatter](https://github.github.com/gh-aw/reference/frontmatter/) |
+| Fixed cron schedule (`0 9 * * 1`) for non-critical timing | `schedule: weekly on monday around 9:00` (fuzzy) | [Triggers](https://github.github.com/gh-aw/reference/triggers/) |
+| Manual `if:` to skip bot-authored PRs | `skip-bots:` under `on:` | [Triggers](https://github.github.com/gh-aw/reference/triggers/) |
+| Manual `if:` to skip by author role | `skip-roles:` under `on:` | [Triggers](https://github.github.com/gh-aw/reference/triggers/) |
+| Manual label check + removal for one-shot commands | `label_command:` trigger | [Triggers](https://github.github.com/gh-aw/reference/triggers/) |
+| Editing old comments to collapse them | `hide-older-comments: true` on `add-comment:` | [Safe Outputs](https://github.github.com/gh-aw/reference/safe-outputs/) |
+| Creating no-op report issues | `noop: report-as-issue: false` | [Safe Outputs / Monitoring](https://github.github.com/gh-aw/patterns/monitoring/) |
+| Auto-closing older issues from same workflow | `close-older-issues: true` on `create-issue:` | [Safe Outputs](https://github.github.com/gh-aw/reference/safe-outputs/) |
+| Disabling workflow after a date | `stop-after:` under `on:` | [Triggers](https://github.github.com/gh-aw/reference/triggers/) |
+| Manual approval gating | `manual-approval:` under `on:` | [Triggers](https://github.github.com/gh-aw/reference/triggers/) |
+| Search-based skip logic in `steps:` | `skip-if-match:` / `skip-if-no-match:` under `on:` | [Triggers](https://github.github.com/gh-aw/reference/triggers/) |
+
+**Note:** gh-aw is actively developed. If a capability feels like something a framework would provide natively, check the reference docs — it probably exists even if it's not in this table yet.
+
 ## Architecture
 
 gh-aw workflows are authored as `.md` files with YAML frontmatter, compiled to `.lock.yml` via `gh aw compile`. The lock file is auto-generated — **never edit it manually**.
@@ -29,6 +57,8 @@ agent job:
 | Platform steps | ✅ Yes | ✅ Yes | ✅ Yes | Platform-controlled |
 | Agent container | ❌ Scrubbed | ❌ Scrubbed | ❌ Scrubbed | ✅ But sandboxed |
 
+**⚠️ Agent container credential nuance:** `GITHUB_TOKEN` and `gh` CLI credentials are scrubbed inside the agent container. However, `COPILOT_TOKEN` (used for LLM inference) is present in the environment via `--env-all`. Any subprocess (e.g., `dotnet build`, `npm install`) inherits this variable. The AWF network firewall, `redact_secrets.cjs` (post-agent log scrubbing), and the threat detection agent limit the blast radius. See [Security Boundaries](#security-boundaries) below.
+
 ### Step Ordering (Critical)
 
 User `steps:` **always run before** platform-generated steps. You cannot insert user steps after platform steps.
@@ -48,6 +78,41 @@ By default, `gh aw compile` automatically injects a fork guard into the activati
 
 To **allow fork PRs**, add `forks: ["*"]` to the `pull_request` trigger in the `.md` frontmatter. The compiler removes the auto-injected guard from the compiled `if:` conditions. This is safe when the workflow uses the `Checkout-GhAwPr.ps1` pattern (checkout + trusted-infra restore) and the agent is sandboxed.
 
+## Security Boundaries
+
+### Key Principles (from [GitHub Security Lab](https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/))
+
+1. **Never execute untrusted PR code with elevated credentials.** The classic "pwn-request" attack is `pull_request_target` + checkout PR + run build scripts with `GITHUB_TOKEN`. The attack surface includes build scripts (`make`, `build.ps1`), package manager hooks (`npm postinstall`, MSBuild targets), and test runners.
+
+2. **Treating PR contents as passive data is safe.** Reading, analyzing, or diffing PR code is fine — the danger is *executing* it. Our gh-aw workflows read code for evaluation; they never build or run it.
+
+3. **`pull_request_target` grants write permissions and secrets access.** This is by design — the workflow YAML comes from the base branch (trusted). But any step that checks out and runs fork code in this context creates a vulnerability.
+
+4. **`pull_request` from forks has no secrets access.** GitHub withholds secrets because the workflow YAML comes from the fork (untrusted). This is the safe default for CI builds on fork PRs.
+
+5. **The `workflow_run` pattern separates privilege from code execution.** Build in an unprivileged `pull_request` job → pass artifacts → process in a privileged `workflow_run` job. This is architecturally what gh-aw does: agent runs read-only, `safe_outputs` job has write permissions.
+
+### gh-aw Defense Layers
+
+| Layer | What it does | What it doesn't do |
+|-------|-------------|-------------------|
+| **AWF network firewall** | Restricts outbound to allowlisted domains | Doesn't prevent reading env vars inside the container |
+| **`redact_secrets.cjs`** | Scrubs known secret values from logs/artifacts post-agent | Doesn't catch encoded/obfuscated values |
+| **Threat detection agent** | Reviews agent outputs before safe-outputs publishes them | Can miss novel exfiltration techniques |
+| **Safe-outputs permission separation** | Write operations happen in separate job, not the agent | Agent can still request writes via safe-output tools |
+| **`max: 1` on `add-comment`** | Limits agent to one comment | That one comment could contain sensitive data (mitigated by redaction) |
+| **XPIA prompt** | Instructs LLM to resist prompt injection from untrusted content | LLM compliance is probabilistic, not guaranteed |
+| **`pre_activation` role check** | Gates on write-access collaborators | Does not apply if `roles: all` is set |
+
+### Rules for gh-aw Workflow Authors
+
+- ✅ **DO** treat PR contents as passive data (read, analyze, diff)
+- ✅ **DO** run data-gathering scripts in `steps:` (pre-agent, trusted context) not inside the agent
+- ✅ **DO** use `Checkout-GhAwPr.ps1` for `workflow_dispatch` to restore trusted `.github/` from base
+- ❌ **DO NOT** run `dotnet build`, `npm install`, or any build command on untrusted PR code inside the agent — build tool hooks (MSBuild targets, postinstall scripts) can read `COPILOT_TOKEN` from the environment
+- ❌ **DO NOT** execute workspace scripts (`.ps1`, `.sh`, `.py`) after checking out a fork PR in `steps:` — those run with `GITHUB_TOKEN`
+- ❌ **DO NOT** set `roles: all` on workflows that process PR content — this allows any user to trigger the workflow
+
 ## Fork PR Handling
 
 ### The "pwn-request" Threat Model
@@ -65,12 +130,13 @@ Reference: https://securitylab.github.com/resources/github-actions-preventing-pw
 | `workflow_dispatch` | ❌ Skipped | ✅ Works — user steps handle checkout and restore is final |
 | `issue_comment` (same-repo) | ✅ Yes | ✅ Works — files already on PR branch |
 | `issue_comment` (fork) | ✅ Yes | ⚠️ Works — `checkout_pr_branch.cjs` re-checks out fork branch after user steps, potentially overwriting restored infra. Acceptable because agent is sandboxed (no credentials, max 1 comment via safe-outputs). Pre-flight check catches missing `SKILL.md` if fork isn't rebased. |
+| `slash_command` | ✅ Yes (compiles to `issue_comment` internally) | Same behavior as `issue_comment` above, but with platform-managed command matching, emoji reactions, and sanitized input. Prefer `slash_command:` over manual `issue_comment` + `startsWith()`. |
 
 ### The `issue_comment` + Fork Problem
 
 For `/slash-command` triggers on fork PRs, `checkout_pr_branch.cjs` runs AFTER all user steps and re-checks out the fork branch. This overwrites any files restored by user steps (e.g., `.github/skills/`). A fork could include a crafted `SKILL.md` that alters the agent's evaluation behavior.
 
-**Accepted residual risk:** The agent runs in a sandboxed container with all credentials scrubbed. The worst outcome is a manipulated evaluation comment (`safe-outputs: add-comment: max: 1`). The agent has no ability to push code, access secrets, or exfiltrate data. The pre-flight check in the agent prompt catches the case where `SKILL.md` is missing entirely (fork not rebased on `main`).
+**Accepted residual risk:** The agent runs in a sandboxed container with `GITHUB_TOKEN` and `gh` CLI credentials scrubbed. `COPILOT_TOKEN` (for LLM inference) remains in the environment but the AWF network firewall restricts outbound connections to an allowlist of domains, `redact_secrets.cjs` scrubs known secret values from logs/outputs post-agent, and the threat detection agent reviews outputs before they are published. The worst practical outcome is a manipulated evaluation comment (`safe-outputs: add-comment: max: 1`). The pre-flight check in the agent prompt catches the case where `SKILL.md` is missing entirely (fork not rebased on `main`).
 
 **Upstream issue:** [github/gh-aw#18481](https://github.com/github/gh-aw/issues/18481) — "Using gh-aw in forks of repositories"
 
@@ -88,17 +154,15 @@ steps:
 ```
 
 The script:
-1. Captures the base branch SHA before checkout
-2. Checks out the PR branch via `gh pr checkout`
-3. Deletes `.github/skills/` and `.github/instructions/` (prevents fork-added files)
-4. Restores them from the base branch SHA (best-effort, non-fatal)
+1. Verifies the PR author has write access and rejects fork PRs
+2. Captures the base branch SHA before checkout
+3. Checks out the PR branch via `gh pr checkout`
+4. Restores `.github/skills/`, `.github/instructions/`, and `.github/copilot-instructions.md` from the base branch SHA (fatal on failure)
 
 **Behavior by trigger:**
 - **`workflow_dispatch`**: Platform checkout is skipped, so the restore IS the final workspace state (trusted files from base branch)
-- **`pull_request`** (same-repo): User step restores trusted infra. `checkout_pr_branch.cjs` runs after and re-checks out PR branch — for same-repo PRs, skill files typically match main unless the PR modified them.
-- **`pull_request`** (fork with `forks: ["*"]`): Same as above, but fork's skill files may differ. Same residual risk as `issue_comment` fork case — agent is sandboxed, pre-flight catches missing `SKILL.md`.
-- **`issue_comment`** (same-repo): Platform re-checks out PR branch — files already match, effectively a no-op
-- **`issue_comment`** (fork): Platform re-checks out fork branch after us, overwriting restored files. Agent is sandboxed; pre-flight in the prompt catches missing `SKILL.md`
+- **`slash_command`** (same-repo): Platform's `checkout_pr_branch.cjs` handles checkout. Skill files typically match main unless the PR modified them.
+- **`slash_command`** (fork): Platform re-checks out fork branch after user steps, overwriting restored files. Agent is sandboxed; pre-flight in the prompt catches missing `SKILL.md`
 
 ### Anti-Patterns
 
diff --git a/.github/scripts/Checkout-GhAwPr.ps1 b/.github/scripts/Checkout-GhAwPr.ps1
index a2f9533bb7d2..1231e451be7d 100644
--- a/.github/scripts/Checkout-GhAwPr.ps1
+++ b/.github/scripts/Checkout-GhAwPr.ps1
@@ -1,26 +1,25 @@
 <#
 .SYNOPSIS
-    Shared PR checkout for gh-aw (GitHub Agentic Workflows).
+    Shared PR checkout and trusted-infra restore for gh-aw workflows.
 
 .DESCRIPTION
     Checks out a PR branch and restores trusted agent infrastructure (skills,
-    instructions) from the base branch. Works for both same-repo and fork PRs.
+    instructions) from the base branch. This gives the agent the PR's code
+    changes with the latest skills and instructions from main.
 
-    This script is only invoked for workflow_dispatch triggers. For pull_request
-    and issue_comment, the gh-aw platform's checkout_pr_branch.cjs handles PR
-    checkout automatically (it runs as a platform step after all user steps).
-    workflow_dispatch skips the platform checkout entirely, so this script is
-    the only thing that gets the PR code onto disk.
+    Currently used for workflow_dispatch triggers. For slash_command and
+    issue_comment triggers, the gh-aw platform's checkout_pr_branch.cjs
+    handles PR checkout automatically — but may overwrite trusted infra
+    with fork-supplied files. Call this script after platform checkout to
+    restore trusted .github/ from the base branch.
 
-    SECURITY NOTE: This script checks out PR code onto disk. This is safe
-    because NO subsequent user steps execute workspace code — the gh-aw
-    platform copies the workspace into a sandboxed container with scrubbed
-    credentials before starting the agent. The classic "pwn-request" attack
-    requires checkout + execution; we only do checkout.
+    SECURITY: Before checkout, the script verifies the PR author has
+    write access (write, maintain, or admin) and rejects fork PRs.
+    This prevents checkout of untrusted code in privileged contexts.
 
     DO NOT add steps after this that run scripts from the workspace
-    (e.g., ./build.sh, pwsh ./script.ps1). That would create an actual
-    fork code execution vulnerability. See:
+    (e.g., ./build.sh, pwsh ./script.ps1). That would create a code
+    execution vulnerability. See:
     https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/
 
 .NOTES
@@ -42,9 +41,46 @@ if (-not $env:PR_NUMBER -or $env:PR_NUMBER -eq '0') {
 
 $PrNumber = $env:PR_NUMBER
 
+# ── Verify PR is same-repo and author has write access ───────────────────────
+
+$RawJson = gh pr view $PrNumber --repo $env:GITHUB_REPOSITORY --json author,isCrossRepository --jq '{author: .author.login, isFork: .isCrossRepository}'
+if ($LASTEXITCODE -ne 0) {
+    Write-Host "❌ Failed to fetch PR #$PrNumber metadata"
+    exit 1
+}
+
+try {
+    $PrInfo = $RawJson | ConvertFrom-Json
+} catch {
+    Write-Host "❌ PR #$PrNumber returned malformed JSON: $RawJson"
+    exit 1
+}
+
+if (-not $PrInfo -or -not $PrInfo.author) {
+    Write-Host "❌ PR #$PrNumber returned empty or malformed metadata"
+    exit 1
+}
+
+if ($PrInfo.isFork) {
+    Write-Host "⏭️ PR #$PrNumber is from a fork — skipping. Fork PRs are evaluated in the sandboxed agent container via the platform's checkout_pr_branch.cjs."
+    exit 1
+}
+
+$Permission = gh api "repos/$($env:GITHUB_REPOSITORY)/collaborators/$($PrInfo.author)/permission" --jq '.permission'
+if ($LASTEXITCODE -ne 0) {
+    Write-Host "❌ Failed to check permissions for '$($PrInfo.author)'"
+    exit 1
+}
+
+$AllowedRoles = @('admin', 'write', 'maintain')
+if ($Permission -notin $AllowedRoles) {
+    Write-Host "⏭️ PR author '$($PrInfo.author)' has '$Permission' access. workflow_dispatch only processes PRs from authors with write access."
+    exit 1
+}
+
+Write-Host "✅ PR #$PrNumber by '$($PrInfo.author)' ($Permission access, same-repo)"
+
 # ── Save base branch SHA ─────────────────────────────────────────────────────
-# Must be captured BEFORE checkout replaces HEAD.
-# Exported for potential use by downstream platform steps (e.g., checkout_pr_branch.cjs)
 
 $BaseSha = git rev-parse HEAD
 if ($LASTEXITCODE -ne 0) {
@@ -52,6 +88,7 @@ if ($LASTEXITCODE -ne 0) {
     exit 1
 }
 Add-Content -Path $env:GITHUB_ENV -Value "BASE_SHA=$BaseSha"
+Write-Host "Base branch SHA: $BaseSha"
 
 # ── Checkout PR branch ───────────────────────────────────────────────────────
 
@@ -65,17 +102,15 @@ Write-Host "✅ Checked out PR #$PrNumber"
 git log --oneline -1
 
 # ── Restore agent infrastructure from base branch ────────────────────────────
-# This script only runs for workflow_dispatch (other triggers use the platform's
-# checkout_pr_branch.cjs instead). For workflow_dispatch the platform checkout is
-# skipped, so this restore IS the final workspace state.
-# rm -rf first to prevent fork-added files from surviving the restore.
-
-if (Test-Path '.github/skills/') { Remove-Item -Recurse -Force '.github/skills/' }
-if (Test-Path '.github/instructions/') { Remove-Item -Recurse -Force '.github/instructions/' }
+# Replace skills and instructions with base branch versions to ensure the agent
+# always uses trusted infrastructure from main. Uses git checkout to read files
+# directly from the commit tree — works in shallow clones (no history traversal).
+# Restore BEFORE deleting so a failure doesn't leave the workspace without infra.
 
 git checkout $BaseSha -- .github/skills/ .github/instructions/ .github/copilot-instructions.md 2>&1
 if ($LASTEXITCODE -eq 0) {
     Write-Host "✅ Restored agent infrastructure from base branch ($BaseSha)"
 } else {
-    Write-Host "⚠️ Could not restore agent infrastructure from base branch — files may come from the PR branch"
+    Write-Host "❌ Failed to restore agent infrastructure from base branch — aborting to prevent running with untrusted infra"
+    exit 1
 }
diff --git a/.github/workflows/copilot-evaluate-tests.lock.yml b/.github/workflows/copilot-evaluate-tests.lock.yml
index 5b5dc7f9b37d..8f07ab6405a9 100644
--- a/.github/workflows/copilot-evaluate-tests.lock.yml
+++ b/.github/workflows/copilot-evaluate-tests.lock.yml
@@ -22,36 +22,33 @@
 #
 # Evaluates test quality, coverage, and appropriateness on PRs that add or modify tests
 #
-# gh-aw-metadata: {"schema_version":"v2","frontmatter_hash":"d671028235c1b911c7a816a257b07b02793a6b57747b4358f792af183e26ca07","compiler_version":"v0.62.2","strict":true}
+# gh-aw-metadata: {"schema_version":"v2","frontmatter_hash":"cbfd75d7a699f76155135d83866eda6476cef647c384243c8ee991065a2b44d7","compiler_version":"v0.62.2","strict":true}
 
 name: "Evaluate PR Tests"
 "on":
+  # bots: # Bots processed as bot check in pre-activation job
+  # - copilot-swe-agent[bot] # Bots processed as bot check in pre-activation job
   issue_comment:
     types:
     - created
-  pull_request:
-    # forks: # Fork filtering applied via job conditions
-    # - "*" # Fork filtering applied via job conditions
-    paths:
-    - src/**/tests/**
-    - src/**/test/**
-    types:
-    - opened
-    - synchronize
-    - reopened
-    - ready_for_review
+    - edited
   workflow_dispatch:
     inputs:
       pr_number:
         description: PR number to evaluate
         required: true
         type: number
+      suppress_output:
+        default: false
+        description: Dry-run — evaluate but do not post output on the PR
+        required: false
+        type: boolean
 
 permissions: {}
 
 concurrency:
   cancel-in-progress: true
-  group: evaluate-pr-tests-${{ github.event.pull_request.number || github.event.issue.number || inputs.pr_number || github.run_id }}
+  group: evaluate-pr-tests-${{ github.event.issue.number || inputs.pr_number || github.run_id }}
 
 run-name: "Evaluate PR Tests"
 
@@ -59,19 +56,22 @@ jobs:
   activation:
     needs: pre_activation
     if: >
-      (needs.pre_activation.outputs.activated == 'true') && ((github.event_name == 'pull_request' && github.event.pull_request.draft == false) || github.event_name == 'workflow_dispatch' || (github.event_name == 'issue_comment' &&
-      github.event.issue.pull_request &&
-      startsWith(github.event.comment.body, '/evaluate-tests')))
+      (needs.pre_activation.outputs.activated == 'true') && (github.event_name == 'issue_comment' || github.event_name == 'workflow_dispatch')
     runs-on: ubuntu-slim
     permissions:
       contents: read
+      discussions: write
+      issues: write
+      pull-requests: write
     outputs:
       body: ${{ steps.sanitized.outputs.body }}
-      comment_id: ""
-      comment_repo: ""
+      comment_id: ${{ steps.add-comment.outputs.comment-id }}
+      comment_repo: ${{ steps.add-comment.outputs.comment-repo }}
+      comment_url: ${{ steps.add-comment.outputs.comment-url }}
       lockdown_check_failed: ${{ steps.generate_aw_info.outputs.lockdown_check_failed == 'true' }}
       model: ${{ steps.generate_aw_info.outputs.model }}
       secret_verification_result: ${{ steps.validate-secret.outputs.verification_result }}
+      slash_command: ${{ needs.pre_activation.outputs.matched_command }}
       text: ${{ steps.sanitized.outputs.text }}
       title: ${{ steps.sanitized.outputs.title }}
     steps:
@@ -105,6 +105,19 @@ jobs:
             setupGlobals(core, github, context, exec, io);
             const { main } = require('${{ runner.temp }}/gh-aw/actions/generate_aw_info.cjs');
             await main(core, context);
+      - name: Add eyes reaction for immediate feedback
+        id: react
+        if: github.event_name == 'issues' || github.event_name == 'issue_comment' || github.event_name == 'pull_request_review_comment' || github.event_name == 'discussion' || github.event_name == 'discussion_comment' || (github.event_name == 'pull_request') && (github.event.pull_request.head.repo.id == github.repository_id)
+        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8
+        env:
+          GH_AW_REACTION: "eyes"
+        with:
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          script: |
+            const { setupGlobals } = require('${{ runner.temp }}/gh-aw/actions/setup_globals.cjs');
+            setupGlobals(core, github, context, exec, io);
+            const { main } = require('${{ runner.temp }}/gh-aw/actions/add_reaction.cjs');
+            await main();
       - name: Validate COPILOT_GITHUB_TOKEN secret
         id: validate-secret
         run: ${RUNNER_TEMP}/gh-aw/actions/validate_multi_secret.sh COPILOT_GITHUB_TOKEN 'GitHub Copilot CLI' https://github.github.com/gh-aw/reference/engines/#github-copilot-default
@@ -132,17 +145,32 @@ jobs:
       - name: Compute current body text
         id: sanitized
         uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8
+        env:
+          GH_AW_ALLOWED_BOTS: copilot-swe-agent[bot]
         with:
           script: |
             const { setupGlobals } = require('${{ runner.temp }}/gh-aw/actions/setup_globals.cjs');
             setupGlobals(core, github, context, exec, io);
             const { main } = require('${{ runner.temp }}/gh-aw/actions/compute_text.cjs');
             await main();
+      - name: Add comment with workflow run link
+        id: add-comment
+        if: github.event_name == 'issues' || github.event_name == 'issue_comment' || github.event_name == 'pull_request_review_comment' || github.event_name == 'discussion' || github.event_name == 'discussion_comment' || (github.event_name == 'pull_request') && (github.event.pull_request.head.repo.id == github.repository_id)
+        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8
+        env:
+          GH_AW_WORKFLOW_NAME: "Evaluate PR Tests"
+          GH_AW_SAFE_OUTPUT_MESSAGES: "{\"footer\":\"\\u003e 🧪 *Test evaluation by [{workflow_name}]({run_url})*\",\"runStarted\":\"🔬 Evaluating tests on this PR… [{workflow_name}]({run_url})\",\"runSuccess\":\"✅ Test evaluation complete! [{workflow_name}]({run_url})\",\"runFailure\":\"❌ Test evaluation failed. [{workflow_name}]({run_url}) {status}\"}"
+        with:
+          script: |
+            const { setupGlobals } = require('${{ runner.temp }}/gh-aw/actions/setup_globals.cjs');
+            setupGlobals(core, github, context, exec, io);
+            const { main } = require('${{ runner.temp }}/gh-aw/actions/add_workflow_run_comment.cjs');
+            await main();
       - name: Create prompt with built-in context
         env:
           GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt
           GH_AW_SAFE_OUTPUTS: ${{ env.GH_AW_SAFE_OUTPUTS }}
-          GH_AW_EXPR_93C755A4: ${{ github.event.pull_request.number || github.event.issue.number || inputs.pr_number }}
+          GH_AW_EXPR_A77326CF: ${{ github.event.issue.number || inputs.pr_number }}
           GH_AW_GITHUB_ACTOR: ${{ github.actor }}
           GH_AW_GITHUB_EVENT_COMMENT_ID: ${{ github.event.comment.id }}
           GH_AW_GITHUB_EVENT_DISCUSSION_NUMBER: ${{ github.event.discussion.number }}
@@ -151,6 +179,7 @@ jobs:
           GH_AW_GITHUB_REPOSITORY: ${{ github.repository }}
           GH_AW_GITHUB_RUN_ID: ${{ github.run_id }}
           GH_AW_GITHUB_WORKSPACE: ${{ github.workspace }}
+          GH_AW_INPUTS_SUPPRESS_OUTPUT: ${{ inputs.suppress_output }}
           GH_AW_IS_PR_COMMENT: ${{ github.event.issue.pull_request && 'true' || '' }}
         run: |
           bash ${RUNNER_TEMP}/gh-aw/actions/create_prompt_first.sh
@@ -210,8 +239,9 @@ jobs:
         uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8
         env:
           GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt
-          GH_AW_EXPR_93C755A4: ${{ github.event.pull_request.number || github.event.issue.number || inputs.pr_number }}
+          GH_AW_EXPR_A77326CF: ${{ github.event.issue.number || inputs.pr_number }}
           GH_AW_GITHUB_REPOSITORY: ${{ github.repository }}
+          GH_AW_INPUTS_SUPPRESS_OUTPUT: ${{ inputs.suppress_output }}
         with:
           script: |
             const { setupGlobals } = require('${{ runner.temp }}/gh-aw/actions/setup_globals.cjs');
@@ -222,7 +252,7 @@ jobs:
         uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8
         env:
           GH_AW_PROMPT: /tmp/gh-aw/aw-prompts/prompt.txt
-          GH_AW_EXPR_93C755A4: ${{ github.event.pull_request.number || github.event.issue.number || inputs.pr_number }}
+          GH_AW_EXPR_A77326CF: ${{ github.event.issue.number || inputs.pr_number }}
           GH_AW_GITHUB_ACTOR: ${{ github.actor }}
           GH_AW_GITHUB_EVENT_COMMENT_ID: ${{ github.event.comment.id }}
           GH_AW_GITHUB_EVENT_DISCUSSION_NUMBER: ${{ github.event.discussion.number }}
@@ -231,8 +261,10 @@ jobs:
           GH_AW_GITHUB_REPOSITORY: ${{ github.repository }}
           GH_AW_GITHUB_RUN_ID: ${{ github.run_id }}
           GH_AW_GITHUB_WORKSPACE: ${{ github.workspace }}
+          GH_AW_INPUTS_SUPPRESS_OUTPUT: ${{ inputs.suppress_output }}
           GH_AW_IS_PR_COMMENT: ${{ github.event.issue.pull_request && 'true' || '' }}
           GH_AW_NEEDS_PRE_ACTIVATION_OUTPUTS_ACTIVATED: ${{ needs.pre_activation.outputs.activated }}
+          GH_AW_NEEDS_PRE_ACTIVATION_OUTPUTS_MATCHED_COMMAND: ${{ needs.pre_activation.outputs.matched_command }}
         with:
           script: |
             const { setupGlobals } = require('${{ runner.temp }}/gh-aw/actions/setup_globals.cjs');
@@ -244,7 +276,7 @@ jobs:
             return await substitutePlaceholders({
               file: process.env.GH_AW_PROMPT,
               substitutions: {
-                GH_AW_EXPR_93C755A4: process.env.GH_AW_EXPR_93C755A4,
+                GH_AW_EXPR_A77326CF: process.env.GH_AW_EXPR_A77326CF,
                 GH_AW_GITHUB_ACTOR: process.env.GH_AW_GITHUB_ACTOR,
                 GH_AW_GITHUB_EVENT_COMMENT_ID: process.env.GH_AW_GITHUB_EVENT_COMMENT_ID,
                 GH_AW_GITHUB_EVENT_DISCUSSION_NUMBER: process.env.GH_AW_GITHUB_EVENT_DISCUSSION_NUMBER,
@@ -253,8 +285,10 @@ jobs:
                 GH_AW_GITHUB_REPOSITORY: process.env.GH_AW_GITHUB_REPOSITORY,
                 GH_AW_GITHUB_RUN_ID: process.env.GH_AW_GITHUB_RUN_ID,
                 GH_AW_GITHUB_WORKSPACE: process.env.GH_AW_GITHUB_WORKSPACE,
+                GH_AW_INPUTS_SUPPRESS_OUTPUT: process.env.GH_AW_INPUTS_SUPPRESS_OUTPUT,
                 GH_AW_IS_PR_COMMENT: process.env.GH_AW_IS_PR_COMMENT,
-                GH_AW_NEEDS_PRE_ACTIVATION_OUTPUTS_ACTIVATED: process.env.GH_AW_NEEDS_PRE_ACTIVATION_OUTPUTS_ACTIVATED
+                GH_AW_NEEDS_PRE_ACTIVATION_OUTPUTS_ACTIVATED: process.env.GH_AW_NEEDS_PRE_ACTIVATION_OUTPUTS_ACTIVATED,
+                GH_AW_NEEDS_PRE_ACTIVATION_OUTPUTS_MATCHED_COMMAND: process.env.GH_AW_NEEDS_PRE_ACTIVATION_OUTPUTS_MATCHED_COMMAND
               }
             });
       - name: Validate prompt placeholders
@@ -320,10 +354,9 @@ jobs:
           GH_TOKEN: ${{ github.token }}
       - env:
           GH_TOKEN: ${{ github.token }}
-          PR_NUMBER: ${{ github.event.pull_request.number }}
-        if: github.event_name == 'pull_request'
+          PR_NUMBER: ${{ github.event.issue.number || inputs.pr_number }}
         name: Gate — skip if no test source files in diff
-        run: "TEST_FILES=$(gh pr diff \"$PR_NUMBER\" --repo \"$GITHUB_REPOSITORY\" --name-only \\\n | grep -E '\\.(cs|xaml)$' \\\n | grep -iE '(tests?/|TestCases|UnitTests|DeviceTests)' \\\n || true)\nif [ -z \"$TEST_FILES\" ]; then\n echo \"⏭️ No test source files (.cs/.xaml) found in PR diff. Skipping evaluation.\"\n exit 1\nfi\necho \"✅ Found test files to evaluate:\"\necho \"$TEST_FILES\" | head -20\n"
+        run: "# Verify this is an open PR\nif ! STATE=$(gh pr view \"$PR_NUMBER\" --repo \"$GITHUB_REPOSITORY\" --json state --jq .state 2>&1); then\n echo \"❌ Failed to fetch PR #$PR_NUMBER state: $STATE\"\n exit 1\nfi\nif [ \"$STATE\" != \"OPEN\" ]; then\n echo \"⏭️ PR #$PR_NUMBER is $STATE — skipping evaluation.\"\n exit 1\nfi\n# Try gh pr diff first; fall back to REST API only on command failure\nif DIFF_OUTPUT=$(gh pr diff \"$PR_NUMBER\" --repo \"$GITHUB_REPOSITORY\" --name-only 2>/dev/null); then\n TEST_FILES=$(echo \"$DIFF_OUTPUT\" \\\n | grep -E '\\.(cs|xaml)$' \\\n | grep -iE '(tests?/|TestCases|UnitTests|DeviceTests)' \\\n || true)\nelse\n # gh pr diff fails with HTTP 406 for PRs with 300+ files; use paginated files API\n if ! API_FILES=$(gh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate --jq '.[].filename' 2>&1); then\n echo \"❌ gh pr diff failed and REST API fallback also failed: $API_FILES\"\n exit 1\n fi\n TEST_FILES=$(echo \"$API_FILES\" \\\n | grep -E '\\.(cs|xaml)$' \\\n | grep -iE '(tests?/|TestCases|UnitTests|DeviceTests)' \\\n || true)\nfi\nif [ -z \"$TEST_FILES\" ]; then\n echo \"⏭️ No test source files (.cs/.xaml) found in PR diff. Nothing to evaluate.\"\n exit 1\nfi\necho \"✅ Found test files to evaluate:\"\necho \"$TEST_FILES\" | head -20\n"
       - env:
           GH_TOKEN: ${{ github.token }}
           PR_NUMBER: ${{ inputs.pr_number }}
@@ -593,7 +626,7 @@ jobs:
       - name: Execute GitHub Copilot CLI
         id: agentic_execution
         # Copilot CLI tool arguments (sorted):
-        timeout-minutes: 15
+        timeout-minutes: 20
         run: |
           set -o pipefail
           touch /tmp/gh-aw/agent-step-summary.md
@@ -697,6 +730,7 @@ jobs:
           GH_AW_ALLOWED_DOMAINS: "api.business.githubcopilot.com,api.enterprise.githubcopilot.com,api.github.com,api.githubcopilot.com,api.individual.githubcopilot.com,api.snapcraft.io,archive.ubuntu.com,azure.archive.ubuntu.com,crl.geotrust.com,crl.globalsign.com,crl.identrust.com,crl.sectigo.com,crl.thawte.com,crl.usertrust.com,crl.verisign.com,crl3.digicert.com,crl4.digicert.com,crls.ssl.com,github.com,host.docker.internal,json-schema.org,json.schemastore.org,keyserver.ubuntu.com,ocsp.digicert.com,ocsp.geotrust.com,ocsp.globalsign.com,ocsp.identrust.com,ocsp.sectigo.com,ocsp.ssl.com,ocsp.thawte.com,ocsp.usertrust.com,ocsp.verisign.com,packagecloud.io,packages.cloud.google.com,packages.microsoft.com,ppa.launchpad.net,raw.githubusercontent.com,registry.npmjs.org,s.symcb.com,s.symcd.com,security.ubuntu.com,telemetry.enterprise.githubcopilot.com,ts-crl.ws.symantec.com,ts-ocsp.ws.symantec.com,www.googleapis.com"
           GITHUB_SERVER_URL: ${{ github.server_url }}
           GITHUB_API_URL: ${{ github.api_url }}
+          GH_AW_COMMAND: evaluate-tests
         with:
           script: |
             const { setupGlobals } = require('${{ runner.temp }}/gh-aw/actions/setup_globals.cjs');
@@ -961,7 +995,7 @@ jobs:
           GH_AW_SAFE_OUTPUT_MESSAGES: "{\"footer\":\"\\u003e 🧪 *Test evaluation by [{workflow_name}]({run_url})*\",\"runStarted\":\"🔬 Evaluating tests on this PR… [{workflow_name}]({run_url})\",\"runSuccess\":\"✅ Test evaluation complete! [{workflow_name}]({run_url})\",\"runFailure\":\"❌ Test evaluation failed. [{workflow_name}]({run_url}) {status}\"}"
           GH_AW_GROUP_REPORTS: "false"
           GH_AW_FAILURE_REPORT_AS_ISSUE: "true"
-          GH_AW_TIMEOUT_MINUTES: "15"
+          GH_AW_TIMEOUT_MINUTES: "20"
         with:
           github-token: ${{ secrets.GH_AW_GITHUB_TOKEN || secrets.GITHUB_TOKEN }}
           script: |
@@ -978,7 +1012,7 @@ jobs:
           GH_AW_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
           GH_AW_AGENT_CONCLUSION: ${{ needs.agent.result }}
           GH_AW_NOOP_MESSAGE: ${{ steps.noop.outputs.noop_message }}
-          GH_AW_NOOP_REPORT_AS_ISSUE: "true"
+          GH_AW_NOOP_REPORT_AS_ISSUE: "false"
         with:
           github-token: ${{ secrets.GH_AW_GITHUB_TOKEN || secrets.GITHUB_TOKEN }}
           script: |
@@ -986,26 +1020,43 @@ jobs:
             setupGlobals(core, github, context, exec, io);
             const { main } = require('${{ runner.temp }}/gh-aw/actions/handle_noop_message.cjs');
             await main();
+      - name: Update reaction comment with completion status
+        id: conclusion
+        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8
+        env:
+          GH_AW_AGENT_OUTPUT: ${{ env.GH_AW_AGENT_OUTPUT }}
+          GH_AW_COMMENT_ID: ${{ needs.activation.outputs.comment_id }}
+          GH_AW_COMMENT_REPO: ${{ needs.activation.outputs.comment_repo }}
+          GH_AW_RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+          GH_AW_WORKFLOW_NAME: "Evaluate PR Tests"
+          GH_AW_AGENT_CONCLUSION: ${{ needs.agent.result }}
+          GH_AW_DETECTION_CONCLUSION: ${{ needs.agent.outputs.detection_conclusion }}
+          GH_AW_SAFE_OUTPUT_MESSAGES: "{\"footer\":\"\\u003e 🧪 *Test evaluation by [{workflow_name}]({run_url})*\",\"runStarted\":\"🔬 Evaluating tests on this PR… [{workflow_name}]({run_url})\",\"runSuccess\":\"✅ Test evaluation complete! [{workflow_name}]({run_url})\",\"runFailure\":\"❌ Test evaluation failed. [{workflow_name}]({run_url}) {status}\"}"
+        with:
+          github-token: ${{ secrets.GH_AW_GITHUB_TOKEN || secrets.GITHUB_TOKEN }}
+          script: |
+            const { setupGlobals } = require('${{ runner.temp }}/gh-aw/actions/setup_globals.cjs');
+            setupGlobals(core, github, context, exec, io);
+            const { main } = require('${{ runner.temp }}/gh-aw/actions/notify_comment_error.cjs');
+            await main();
 
   pre_activation:
-    if: >
-      (github.event_name == 'pull_request' && github.event.pull_request.draft == false) || github.event_name == 'workflow_dispatch' || (github.event_name == 'issue_comment' &&
-      github.event.issue.pull_request &&
-      startsWith(github.event.comment.body, '/evaluate-tests'))
+    if: github.event_name == 'issue_comment' || github.event_name == 'workflow_dispatch'
     runs-on: ubuntu-slim
     outputs:
-      activated: ${{ steps.check_membership.outputs.is_team_member == 'true' }}
-      matched_command: ''
+      activated: ${{ (steps.check_membership.outputs.is_team_member == 'true') && (steps.check_command_position.outputs.command_position_ok == 'true') }}
+      matched_command: ${{ steps.check_command_position.outputs.matched_command }}
     steps:
       - name: Setup Scripts
         uses: github/gh-aw-actions/setup@20045bbd5ad2632b9809856c389708eab1bd16ef # v0.62.2
         with:
           destination: ${{ runner.temp }}/gh-aw/actions
-      - name: Check team membership for workflow
+      - name: Check team membership for command workflow
         id: check_membership
         uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8
         env:
           GH_AW_REQUIRED_ROLES: admin,maintainer,write
+          GH_AW_ALLOWED_BOTS: copilot-swe-agent[bot]
         with:
           github-token: ${{ secrets.GITHUB_TOKEN }}
           script: |
@@ -1013,6 +1064,17 @@ jobs:
             setupGlobals(core, github, context, exec, io);
             const { main } = require('${{ runner.temp }}/gh-aw/actions/check_membership.cjs');
             await main();
+      - name: Check command position
+        id: check_command_position
+        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8
+        env:
+          GH_AW_COMMANDS: "[\"evaluate-tests\"]"
+        with:
+          script: |
+            const { setupGlobals } = require('${{ runner.temp }}/gh-aw/actions/setup_globals.cjs');
+            setupGlobals(core, github, context, exec, io);
+            const { main } = require('${{ runner.temp }}/gh-aw/actions/check_command_position.cjs');
+            await main();
 
   safe_outputs:
     needs: agent
@@ -1074,7 +1136,7 @@ jobs:
           GH_AW_ALLOWED_DOMAINS: "api.business.githubcopilot.com,api.enterprise.githubcopilot.com,api.github.com,api.githubcopilot.com,api.individual.githubcopilot.com,api.snapcraft.io,archive.ubuntu.com,azure.archive.ubuntu.com,crl.geotrust.com,crl.globalsign.com,crl.identrust.com,crl.sectigo.com,crl.thawte.com,crl.usertrust.com,crl.verisign.com,crl3.digicert.com,crl4.digicert.com,crls.ssl.com,github.com,host.docker.internal,json-schema.org,json.schemastore.org,keyserver.ubuntu.com,ocsp.digicert.com,ocsp.geotrust.com,ocsp.globalsign.com,ocsp.identrust.com,ocsp.sectigo.com,ocsp.ssl.com,ocsp.thawte.com,ocsp.usertrust.com,ocsp.verisign.com,packagecloud.io,packages.cloud.google.com,packages.microsoft.com,ppa.launchpad.net,raw.githubusercontent.com,registry.npmjs.org,s.symcb.com,s.symcd.com,security.ubuntu.com,telemetry.enterprise.githubcopilot.com,ts-crl.ws.symantec.com,ts-ocsp.ws.symantec.com,www.googleapis.com"
           GITHUB_SERVER_URL: ${{ github.server_url }}
           GITHUB_API_URL: ${{ github.api_url }}
-          GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG: "{\"add_comment\":{\"max\":1,\"target\":\"*\"},\"missing_data\":{},\"missing_tool\":{},\"noop\":{\"max\":1,\"report-as-issue\":\"true\"}}"
+          GH_AW_SAFE_OUTPUTS_HANDLER_CONFIG: "{\"add_comment\":{\"hide_older_comments\":true,\"max\":1,\"target\":\"*\"},\"missing_data\":{},\"missing_tool\":{},\"noop\":{\"max\":1,\"report-as-issue\":\"false\"}}"
         with:
           github-token: ${{ secrets.GH_AW_GITHUB_TOKEN || secrets.GITHUB_TOKEN }}
           script: |
diff --git a/.github/workflows/copilot-evaluate-tests.md
b/.github/workflows/copilot-evaluate-tests.md index 854c2ec407d8..7752f29a6fea 100644 --- a/.github/workflows/copilot-evaluate-tests.md +++ b/.github/workflows/copilot-evaluate-tests.md @@ -1,27 +1,36 @@ --- description: Evaluates test quality, coverage, and appropriateness on PRs that add or modify tests on: - pull_request: - types: [opened, synchronize, reopened, ready_for_review] - forks: ["*"] - paths: - - 'src/**/tests/**' - - 'src/**/test/**' - issue_comment: - types: [created] + # pull_request_target is intentionally disabled โ€” we don't want auto-runs on PR create/update. + # pull_request_target: + # types: [opened, synchronize, reopened] + # paths: + # - 'src/**/tests/**' + # - 'src/**/test/**' + slash_command: + name: evaluate-tests + events: [pull_request_comment] workflow_dispatch: inputs: pr_number: description: 'PR number to evaluate' required: true type: number - + suppress_output: + description: 'Dry-run โ€” evaluate but do not post output on the PR' + required: false + type: boolean + default: false + bots: + - "copilot-swe-agent[bot]" + +labels: ["pr-review", "testing"] + +# Trigger filtering: slash_command compiles to issue_comment (platform handles +# command matching). workflow_dispatch is always allowed. 
 if: >-
-  (github.event_name == 'pull_request' && github.event.pull_request.draft == false) ||
-  github.event_name == 'workflow_dispatch' ||
-  (github.event_name == 'issue_comment' &&
-  github.event.issue.pull_request &&
-  startsWith(github.event.comment.body, '/evaluate-tests'))
+  github.event_name == 'issue_comment' ||
+  github.event_name == 'workflow_dispatch'

 permissions:
   contents: read
@@ -36,7 +45,9 @@ safe-outputs:
   add-comment:
     max: 1
     target: "*"
+    hide-older-comments: true
   noop:
+    report-as-issue: false
   messages:
     footer: "> 🧪 *Test evaluation by [{workflow_name}]({run_url})*"
     run-started: "🔬 Evaluating tests on this PR… [{workflow_name}]({run_url})"
@@ -50,32 +61,60 @@ tools:
 network: defaults

 concurrency:
-  group: "evaluate-pr-tests-${{ github.event.pull_request.number || github.event.issue.number || inputs.pr_number || github.run_id }}"
+  group: "evaluate-pr-tests-${{ github.event.issue.number || inputs.pr_number || github.run_id }}"
   cancel-in-progress: true

-timeout-minutes: 15
+timeout-minutes: 20

 steps:
   - name: Gate — skip if no test source files in diff
-    if: github.event_name == 'pull_request'
     env:
       GH_TOKEN: ${{ github.token }}
-      PR_NUMBER: ${{ github.event.pull_request.number }}
+      PR_NUMBER: ${{ github.event.issue.number || inputs.pr_number }}
     run: |
-      TEST_FILES=$(gh pr diff "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --name-only \
-        | grep -E '\.(cs|xaml)$' \
-        | grep -iE '(tests?/|TestCases|UnitTests|DeviceTests)' \
-        || true)
+      # Verify this is an open PR
+      if ! STATE=$(gh pr view "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --json state --jq .state 2>&1); then
+        echo "❌ Failed to fetch PR #$PR_NUMBER state: $STATE"
+        exit 1
+      fi
+      if [ "$STATE" != "OPEN" ]; then
+        echo "⏭️ PR #$PR_NUMBER is $STATE — skipping evaluation."
+        exit 1
+      fi
+      # Try gh pr diff first; fall back to REST API only on command failure
+      if DIFF_OUTPUT=$(gh pr diff "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --name-only 2>/dev/null); then
+        TEST_FILES=$(echo "$DIFF_OUTPUT" \
+          | grep -E '\.(cs|xaml)$' \
+          | grep -iE '(tests?/|TestCases|UnitTests|DeviceTests)' \
+          || true)
+      else
+        # gh pr diff fails with HTTP 406 for PRs with 300+ files; use paginated files API
+        if ! API_FILES=$(gh api "repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files" --paginate --jq '.[].filename' 2>&1); then
+          echo "❌ gh pr diff failed and REST API fallback also failed: $API_FILES"
+          exit 1
+        fi
+        TEST_FILES=$(echo "$API_FILES" \
+          | grep -E '\.(cs|xaml)$' \
+          | grep -iE '(tests?/|TestCases|UnitTests|DeviceTests)' \
+          || true)
+      fi
       if [ -z "$TEST_FILES" ]; then
-        echo "⏭️ No test source files (.cs/.xaml) found in PR diff. Skipping evaluation."
+        echo "⏭️ No test source files (.cs/.xaml) found in PR diff. Nothing to evaluate."
         exit 1
       fi
       echo "✅ Found test files to evaluate:"
       echo "$TEST_FILES" | head -20

-  # Only needed for workflow_dispatch — for pull_request and issue_comment,
-  # the gh-aw platform's checkout_pr_branch.cjs handles PR checkout automatically.
-  # workflow_dispatch skips the platform checkout entirely, so we must do it here.
+  # For slash_command triggers, the gh-aw platform's checkout_pr_branch.cjs runs
+  # AFTER all user steps and overlays the PR branch onto the workspace. This means
+  # fork PRs can supply their own .github/skills/ and .github/instructions/.
+  # We cannot restore trusted infra here because the platform checkout runs later.
+  # Mitigation: agent is sandboxed (no credentials), max 1 comment via safe-outputs,
+  # and the agent prompt includes a pre-flight check that catches missing SKILL.md.
+  # See: .github/instructions/gh-aw-workflows.instructions.md "The issue_comment + Fork Problem"
+
+  # For workflow_dispatch, the platform skips checkout entirely — this step is the
+  # only thing that gets the PR code onto disk and restores trusted infra from main.
   - name: Checkout PR and restore agent infrastructure
     if: github.event_name == 'workflow_dispatch'
     env:
@@ -91,7 +130,7 @@ Invoke the **evaluate-pr-tests** skill: read and follow `.github/skills/evaluate
 ## Context

 - **Repository**: ${{ github.repository }}
-- **PR Number**: ${{ github.event.pull_request.number || github.event.issue.number || inputs.pr_number }}
+- **PR Number**: ${{ github.event.issue.number || inputs.pr_number }}

 The PR branch has been checked out for you. All files from the PR are available locally.

@@ -110,21 +149,39 @@ If the file is **missing**, the fork PR branch is likely not rebased on the late
 ❌ **Cannot evaluate**: this PR's branch does not include the evaluate-pr-tests skill (`.github/skills/evaluate-pr-tests/SKILL.md` is missing).

-**Fix**: rebase your fork on the latest `main` branch, or use the **workflow_dispatch** trigger (Actions tab → "Evaluate PR Tests" → "Run workflow" → enter PR number) which handles this automatically.
+**Fix**: rebase your fork on the latest `main` branch and push again. The evaluation will trigger automatically once the skill file is available.
 ```

 Then stop — do not proceed with the evaluation.

+## Dry-run mode
+
+When triggered via `workflow_dispatch`, the `suppress_output` input controls behavior.
+- If `${{ inputs.suppress_output }}` == **true**, perform the full evaluation but **do not** post output on the PR. Write the evaluation to the workflow log only. This is useful for testing the skill without spamming the PR.
+- If **false** (default), post the output as normal.
+
+## When no action is needed
+
+If there is nothing to evaluate (PR has no test files, PR is a docs-only change, etc.), you **must** call the `noop` tool with a message explaining why:
+
+```json
+{"noop": {"message": "No action needed: [brief explanation, e.g. 'PR contains no test files']"}}
+```
+
+Do not post a comment and do not silently exit — always use `noop` so the workflow run shows a clear reason.
+
 ## Running the skill

 1. Use `gh pr view ` to fetch PR metadata (title, body, labels, base branch). If `gh` CLI is unavailable, use the GitHub MCP tools instead.
-2. Run `pwsh .github/skills/evaluate-pr-tests/scripts/Gather-TestContext.ps1` to gather automated context
+2. Run `pwsh .github/skills/evaluate-pr-tests/scripts/Gather-TestContext.ps1 -PrNumber ` to gather automated context (use the PR number from the Context section above)
 3. Read the context report and the actual changed files, then evaluate per SKILL.md criteria
 4. Post results using `add_comment` with `item_number` set to the PR number

 ## Posting Results

-Call `add_comment` with `item_number` set to the PR number. Wrap the report in a collapsible `<details>` block:
+If dry-run mode is active (`suppress_output` is true), log the evaluation report to stdout and stop — do **not** call `add_comment`.
+
+Otherwise, call `add_comment` with `item_number` set to the PR number. Wrap the report in a collapsible `<details>` block:

 ```markdown
 ## 🧪 PR Test Evaluation