From 3385cf87f9085fe352e6932dfd2587b1e9f306c2 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Sun, 1 Mar 2026 02:18:20 +0000
Subject: [PATCH 1/4] Initial plan

From 743093c386933c0d99502f59104c0f544342fa8f Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Sun, 1 Mar 2026 02:25:49 +0000
Subject: [PATCH 2/4] Determine agent count in pre-script based on PR size
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add dynamic sub-agent count to the pr-context pre-script:
- ≤10 files: direct review (no sub-agents)
- 11-20 files: 2 sub-agents (A→Z, Z→A)
- >20 files: 3 sub-agents (A→Z, Z→A, largest-first)

The pre-script writes /tmp/pr-context/review-strategy.md with precise
instructions so the agent follows a deterministic path. The PR review
workflow reads this strategy file instead of always spawning 3 sub-agents.

Recompiled all affected lock files.

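For illustration, the thresholds listed above can be sketched as a small
shell function. This is a minimal sketch, not the code in the patch:
`agent_count_for` is a hypothetical name, and the pre-script in the diff
computes the count inline from the length of files.json.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the patch): maps a changed-file count
# to the sub-agent count described in the commit message.
agent_count_for() {
  local file_count="$1"
  if [ "$file_count" -le 10 ]; then
    echo 0   # direct review, no sub-agents
  elif [ "$file_count" -le 20 ]; then
    echo 2   # A→Z and Z→A orderings
  else
    echo 3   # A→Z, Z→A, and largest-first orderings
  fi
}

# Print the mapping for values on each side of the two thresholds.
for n in 0 10 11 20 21; do
  echo "${n} files -> $(agent_count_for "$n") sub-agents"
done
```

The boundary values 10 and 20 fall in the lower bucket because both
comparisons use `-le`, which matches the "≤10" and "11-20" ranges above.
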
Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com> --- .../workflows/gh-aw-fragments/pr-context.md | 79 +++++++++++++++++++ .../gh-aw-mention-in-pr-by-id.lock.yml | 4 +- .../gh-aw-mention-in-pr-no-sandbox.lock.yml | 4 +- .../workflows/gh-aw-mention-in-pr.lock.yml | 4 +- .../gh-aw-pr-review-addresser.lock.yml | 4 +- .github/workflows/gh-aw-pr-review.lock.yml | 22 ++---- .github/workflows/gh-aw-pr-review.md | 18 +---- 7 files changed, 95 insertions(+), 40 deletions(-) diff --git a/.github/workflows/gh-aw-fragments/pr-context.md b/.github/workflows/gh-aw-fragments/pr-context.md index 31b8ee07..2a3c6928 100644 --- a/.github/workflows/gh-aw-fragments/pr-context.md +++ b/.github/workflows/gh-aw-fragments/pr-context.md @@ -37,6 +37,83 @@ steps: jq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \ > /tmp/pr-context/file_order_largest.txt + # Determine sub-agent count based on PR size + FILE_COUNT=$(jq 'length' /tmp/pr-context/files.json) + if [ "$FILE_COUNT" -le 10 ]; then + AGENT_COUNT=0 + elif [ "$FILE_COUNT" -le 20 ]; then + AGENT_COUNT=2 + else + AGENT_COUNT=3 + fi + echo "$AGENT_COUNT" > /tmp/pr-context/agent_count.txt + echo "PR size: ${FILE_COUNT} files → ${AGENT_COUNT} sub-agents" + + # Write review strategy with precise instructions for the agent + echo "# Review Strategy" > /tmp/pr-context/review-strategy.md + echo "" >> /tmp/pr-context/review-strategy.md + echo "**PR size:** ${FILE_COUNT} files | **Sub-agents:** ${AGENT_COUNT}" >> /tmp/pr-context/review-strategy.md + echo "" >> /tmp/pr-context/review-strategy.md + + if [ "$AGENT_COUNT" -eq 0 ]; then + cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_DIRECT' + ## Direct Review (no sub-agents) + + This PR is small enough to review directly. Do NOT spawn sub-agents. + + Review the diff file by file using the ordering in `/tmp/pr-context/file_order_az.txt`. For each changed file: + + 1. 
Read the diff from `/tmp/pr-context/diffs/.diff` + 2. Read the full file from the workspace for context + 3. Check existing threads in `/tmp/pr-context/threads/.json` (if it exists) + 4. Identify issues matching the Code Review Reference criteria + 5. Verify each issue: construct a concrete failure scenario, challenge the finding, check for existing threads + + Proceed to the Verify and Comment step with your findings. + STRATEGY_DIRECT + elif [ "$AGENT_COUNT" -eq 2 ]; then + cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_TWO' + ## Sub-agent Review (2 agents) + + Spawn exactly 2 `code-review` sub-agents in parallel: + + - **Agent 1**: file ordering from `/tmp/pr-context/file_order_az.txt` (A→Z) + - **Agent 2**: file ordering from `/tmp/pr-context/file_order_za.txt` (Z→A) + + Each sub-agent prompt must include: + - Instruction to read `/tmp/pr-context/review-instructions.md` for the review process, criteria, and calibration examples + - Instruction to read `/tmp/pr-context/README.md` for a manifest of all available context files + - The review intensity and minimum severity settings from the workflow + - The path to that sub-agent's file ordering — tell it to read the file for its ordered list (per-file diffs are at `/tmp/pr-context/diffs/.diff`) + - Instruction to read changed files from the workspace (the PR branch is checked out) + + Each sub-agent returns a structured findings list. They do NOT leave inline comments. + + After both sub-agents complete, merge and deduplicate findings per the Pick Three, Keep Many process before proceeding to the Verify and Comment step. 
+ STRATEGY_TWO + else + cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_THREE' + ## Sub-agent Review (3 agents) + + Spawn exactly 3 `code-review` sub-agents in parallel: + + - **Agent 1**: file ordering from `/tmp/pr-context/file_order_az.txt` (A→Z) + - **Agent 2**: file ordering from `/tmp/pr-context/file_order_za.txt` (Z→A) + - **Agent 3**: file ordering from `/tmp/pr-context/file_order_largest.txt` (largest diff first) + + Each sub-agent prompt must include: + - Instruction to read `/tmp/pr-context/review-instructions.md` for the review process, criteria, and calibration examples + - Instruction to read `/tmp/pr-context/README.md` for a manifest of all available context files + - The review intensity and minimum severity settings from the workflow + - The path to that sub-agent's file ordering — tell it to read the file for its ordered list (per-file diffs are at `/tmp/pr-context/diffs/.diff`) + - Instruction to read changed files from the workspace (the PR branch is checked out) + + Each sub-agent returns a structured findings list. They do NOT leave inline comments. + + After all 3 sub-agents complete, merge and deduplicate findings per the Pick Three, Keep Many process before proceeding to the Verify and Comment step. 
+ STRATEGY_THREE + fi + # Existing reviews gh api "repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews" --paginate \ | jq -s 'add // []' > /tmp/pr-context/reviews.json @@ -132,6 +209,8 @@ steps: | `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` | | `comments.json` | PR discussion comments (not inline) | | `issue-{N}.json` | Linked issue details (one file per linked issue, if any) | + | `agent_count.txt` | Pre-computed sub-agent count: `0` (≤10 files, direct review), `2` (11–20 files), or `3` (>20 files) | + | `review-strategy.md` | Pre-computed review strategy with precise instructions for the agent based on PR size | | `agents.md` | Repository conventions from `generate_agents_md` (if written by agent) | | `review-instructions.md` | Review instructions, criteria, and calibration examples (if written by review-process fragment) | MANIFEST diff --git a/.github/workflows/gh-aw-mention-in-pr-by-id.lock.yml b/.github/workflows/gh-aw-mention-in-pr-by-id.lock.yml index 35726214..ffb4125e 100644 --- a/.github/workflows/gh-aw-mention-in-pr-by-id.lock.yml +++ b/.github/workflows/gh-aw-mention-in-pr-by-id.lock.yml @@ -43,7 +43,7 @@ # # inlined-imports: true # -# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"c1596c1d583a1d00154c0d46ffeb93694bd2f8cd2568fa0b0669f1e76848d4b3"} +# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"5fa6384af683a9009b50e125abfb975d3952de0c48aa52d1b8d3ced33d16665c"} name: "Mention in PR by ID" "on": @@ -690,7 +690,7 @@ jobs: GH_TOKEN: ${{ github.token }} PR_NUMBER: ${{ github.event.pull_request.number || inputs.target-pr-number || github.event.issue.number }} name: Fetch PR context to disk - run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! 
gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n 
--jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. 
All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `agents.md` | Repository conventions from `generate_agents_md` (if written by agent) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples (if written by review-process fragment) 
|\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" + run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Determine sub-agent count based on PR size\nFILE_COUNT=$(jq 'length' /tmp/pr-context/files.json)\nif [ \"$FILE_COUNT\" -le 10 ]; then\n AGENT_COUNT=0\nelif [ \"$FILE_COUNT\" -le 20 ]; then\n AGENT_COUNT=2\nelse\n AGENT_COUNT=3\nfi\necho \"$AGENT_COUNT\" > /tmp/pr-context/agent_count.txt\necho \"PR size: ${FILE_COUNT} files → ${AGENT_COUNT} sub-agents\"\n\n# Write review strategy with precise instructions for the agent\necho \"# Review Strategy\" > /tmp/pr-context/review-strategy.md\necho \"\" >> 
/tmp/pr-context/review-strategy.md\necho \"**PR size:** ${FILE_COUNT} files | **Sub-agents:** ${AGENT_COUNT}\" >> /tmp/pr-context/review-strategy.md\necho \"\" >> /tmp/pr-context/review-strategy.md\n\nif [ \"$AGENT_COUNT\" -eq 0 ]; then\n cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_DIRECT'\n## Direct Review (no sub-agents)\n\nThis PR is small enough to review directly. Do NOT spawn sub-agents.\n\nReview the diff file by file using the ordering in `/tmp/pr-context/file_order_az.txt`. For each changed file:\n\n1. Read the diff from `/tmp/pr-context/diffs/.diff`\n2. Read the full file from the workspace for context\n3. Check existing threads in `/tmp/pr-context/threads/.json` (if it exists)\n4. Identify issues matching the Code Review Reference criteria\n5. Verify each issue: construct a concrete failure scenario, challenge the finding, check for existing threads\n\nProceed to the Verify and Comment step with your findings.\nSTRATEGY_DIRECT\nelif [ \"$AGENT_COUNT\" -eq 2 ]; then\n cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_TWO'\n## Sub-agent Review (2 agents)\n\nSpawn exactly 2 `code-review` sub-agents in parallel:\n\n- **Agent 1**: file ordering from `/tmp/pr-context/file_order_az.txt` (A→Z)\n- **Agent 2**: file ordering from `/tmp/pr-context/file_order_za.txt` (Z→A)\n\nEach sub-agent prompt must include:\n- Instruction to read `/tmp/pr-context/review-instructions.md` for the review process, criteria, and calibration examples\n- Instruction to read `/tmp/pr-context/README.md` for a manifest of all available context files\n- The review intensity and minimum severity settings from the workflow\n- The path to that sub-agent's file ordering — tell it to read the file for its ordered list (per-file diffs are at `/tmp/pr-context/diffs/.diff`)\n- Instruction to read changed files from the workspace (the PR branch is checked out)\n\nEach sub-agent returns a structured findings list. 
They do NOT leave inline comments.\n\nAfter both sub-agents complete, merge and deduplicate findings per the Pick Three, Keep Many process before proceeding to the Verify and Comment step.\nSTRATEGY_TWO\nelse\n cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_THREE'\n## Sub-agent Review (3 agents)\n\nSpawn exactly 3 `code-review` sub-agents in parallel:\n\n- **Agent 1**: file ordering from `/tmp/pr-context/file_order_az.txt` (A→Z)\n- **Agent 2**: file ordering from `/tmp/pr-context/file_order_za.txt` (Z→A)\n- **Agent 3**: file ordering from `/tmp/pr-context/file_order_largest.txt` (largest diff first)\n\nEach sub-agent prompt must include:\n- Instruction to read `/tmp/pr-context/review-instructions.md` for the review process, criteria, and calibration examples\n- Instruction to read `/tmp/pr-context/README.md` for a manifest of all available context files\n- The review intensity and minimum severity settings from the workflow\n- The path to that sub-agent's file ordering — tell it to read the file for its ordered list (per-file diffs are at `/tmp/pr-context/diffs/.diff`)\n- Instruction to read changed files from the workspace (the PR branch is checked out)\n\nEach sub-agent returns a structured findings list. 
They do NOT leave inline comments.\n\nAfter all 3 sub-agents complete, merge and deduplicate findings per the Pick Three, Keep Many process before proceeding to the Verify and Comment step.\nSTRATEGY_THREE\nfi\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# 
Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — 
subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `agent_count.txt` | Pre-computed sub-agent count: `0` (≤10 files, direct review), `2` (11–20 files), or `3` (>20 files) |\n| `review-strategy.md` | Pre-computed review strategy with precise instructions for the agent based on PR size |\n| `agents.md` | Repository conventions from `generate_agents_md` (if written by agent) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples (if written by review-process fragment) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" - name: Write review instructions to disk run: "mkdir -p /tmp/pr-context\ncat > /tmp/pr-context/review-instructions.md << 'REVIEW_EOF'\n# Review Instructions for Sub-agents\n\nYou are a code review sub-agent. Read these instructions, then review the PR files in the order provided in your prompt.\n\n## Context\n\nBefore reviewing files, read these to understand the PR:\n\n1. `/tmp/pr-context/pr.json` — PR title, description, author, and branches. Understand what the PR is trying to accomplish.\n2. `/tmp/pr-context/agents.md` — Repository coding conventions and guidelines (if it exists).\n3. `/tmp/pr-context/review_comments.json` — Existing review threads. Note which files already have threads so you don't duplicate.\n4. `/tmp/pr-context/issue-*.json` — Linked issue details (if any). 
Understand the motivation and acceptance criteria.\n\n## Process\n\nReview the PR diff file by file in your assigned order. For each changed file:\n\n1. **Read the diff** for this file from `/tmp/pr-context/diffs/.diff` to understand what changed. If the diff is empty or truncated (e.g., binary files or very large changes), fall back to reading the full file from the workspace and comparing against context.\n2. **Read the full file from the workspace.** The PR branch is checked out locally — open the file directly to get complete contents with line numbers.\n3. **Check existing threads** for this file from `/tmp/pr-context/threads/.json` (if it exists). Skip issues that are already under discussion — each thread has `isResolved` and `isOutdated` fields.\n4. **Identify potential issues** matching the review criteria below.\n5. **Quick-check each issue** before including it:\n - What specific code pattern or change triggers this concern?\n - Is there an obvious guard, handler, or mitigation visible in the immediate context?\n - Can you describe a concrete failure scenario (the `evidence` field)? If you cannot articulate what specific input or state triggers the problem, drop the finding.\n - If the issue is clearly handled, skip it. If you're unsure, include it — the parent will verify.\n6. **Add to your findings list.** Do NOT leave inline comments — you don't have that tool. Return findings in this format:\n\n```\n- file: path/to/file\n line: 42\n severity: HIGH\n title: Brief title\n description: What the issue is and why it matters\n evidence: The specific code pattern and failure scenario\n suggestion: corrected code here (optional — only if you can provide a concrete fix)\n```\n\n**Review every file in your assigned order.** Files reviewed earlier get more attention, which is why different sub-agents use different orderings.\n\n**Check existing threads** — per-file threads are at `/tmp/pr-context/threads/.json` (step 3 above). 
The full list is at `/tmp/pr-context/review_comments.json`. Do not flag issues that are already under discussion (resolved or unresolved). For outdated threads, only re-flag if the issue still applies to the current diff.\n\n**Return your full findings list** when done, or an empty list if no issues were found.\n\n## Review Criteria\n\nFocus on these categories in priority order:\n\n1. Security vulnerabilities (injection, XSS, auth bypass, secrets exposure)\n2. Logic bugs that could cause runtime failures or incorrect behavior\n3. Data integrity issues (race conditions, missing transactions, corruption risk)\n4. Performance bottlenecks (N+1 queries, memory leaks, blocking operations)\n5. Error handling gaps (unhandled exceptions, missing validation)\n6. Breaking changes to public APIs without migration path\n7. Missing or incorrect test coverage for critical paths\n\n## What NOT to Flag\n\nOnly review the diff — do not flag issues in unchanged code, pre-existing problems not introduced by this PR, or style preferences handled by linters or formatters.\n\n**Common false positives** — these patterns look like issues but usually aren't. 
Before flagging anything in these categories, confirm the problem is real by reading the surrounding code:\n\n- **Security — input already sanitized:** Don't flag injection or XSS risks when inputs are sanitized upstream, parameterized queries are used, or the framework auto-escapes output.\n- **Null/undefined — guarded elsewhere:** Don't flag potential null dereferences if the value is guaranteed by a type guard, assertion, schema validation, or upstream null check.\n- **Error handling — handled at a different layer:** Don't flag missing try/catch if the caller, middleware, or framework catches and handles the error (e.g., Express error middleware, React error boundaries).\n- **Performance — theoretical, not practical:** Don't flag algorithmic complexity (e.g., O(n^2)) unless N is demonstrably large enough to matter in the actual usage context. \"This could be slow\" without evidence is not actionable.\n- **Validation — exists at another layer:** Don't flag missing input validation if it's handled by an API gateway, middleware, schema validator, or type system.\n- **Test coverage — trivial or generated code:** Don't flag missing tests for trivial getters/setters, auto-generated code, or simple delegation methods.\n- **Style or naming — not in coding guidelines:** Don't flag naming conventions or code style unless they violate the repository's documented coding guidelines (from `generate_agents_md` or CONTRIBUTING docs).\n\n**Existing review threads** — check BEFORE flagging any issue:\n\n- **Resolved with reviewer reply** (e.g. \"This is intentional\") — reviewer's decision is final. Do NOT re-flag.\n- **Resolved without reply** — author likely fixed it. Do NOT re-raise unless the fix introduced a new problem.\n- **Unresolved** — already flagged. Do NOT duplicate.\n- **Outdated** — only re-flag if the issue still applies to the current diff.\n\nWhen in doubt, do not duplicate. Redundant comments erode trust.\n\nFinding no issues is a valid and valuable outcome. 
An empty findings list is better than findings that waste the author's time or erode trust. Do not manufacture findings to justify your review — if the code is sound, return an empty list.\n\n## Severity Classification\n\nDetermine severity AFTER investigating the issue, not before. First identify the problem and trace through the code, then assign a severity based on the evidence you found.\n\n- 🔴 CRITICAL — Must fix before merge (security vulnerabilities, data corruption, production-breaking bugs)\n- 🟠 HIGH — Should fix before merge (logic errors, missing validation, significant performance issues)\n- 🟡 MEDIUM — Address soon, non-blocking (error handling gaps, suboptimal patterns, missing edge cases)\n- ⚪ LOW — Author discretion (minor improvements, documentation gaps)\n- 💬 NITPICK — Truly optional (stylistic preferences, alternative approaches)\n\n## Review Intensity\n\nThe review intensity is `${{ inputs.intensity || 'balanced' }}`.\n\n- **conservative**: High evidence bar. Only flag when you can demonstrate a concrete failure scenario. If you can construct a reasonable counterargument, do not flag. Approval with zero findings is the expected outcome for most PRs.\n- **balanced**: Standard evidence bar. Flag when you can point to specific code that would fail. If the issue is ambiguous, lean toward not flagging.\n- **aggressive**: Lower evidence bar. Flag when evidence exists even if the failure scenario is not fully confirmed. Improvement suggestions welcome but must cite specific code.\n\n## Calibration Examples\n\nUse these examples to calibrate your judgment. 
Each pair shows a real issue and a similar-looking pattern that is NOT an issue.\n\n### Example 1: Null/Undefined Access\n\n**True positive — flag this:**\n\n```js\n// PR adds this handler\napp.get('/user/:id', async (req, res) => {\n const user = await db.findUser(req.params.id);\n res.json({ name: user.name, email: user.email });\n});\n```\n\nWhy flag: `db.findUser()` can return `null` when no user matches the ID. Accessing `user.name` will throw a TypeError at runtime. No upstream guard exists — the route handler is the entry point.\n\n**False positive — do NOT flag this:**\n\n```ts\n// PR adds this line inside an existing function\nconst settings = user.getSettings();\n```\n\nWhy skip: Reading the full file reveals `user` is typed as `User` (not `User | null`), and the calling function only runs after `authenticateUser()` middleware which guarantees a valid user object. The null case is handled at a different layer.\n\n### Example 2: SQL Injection\n\n**True positive — flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE customer_id = '{customer_id}'\")\n```\n\nWhy flag: String interpolation in a SQL query with user-controlled input (`customer_id` comes from the request). No parameterization or sanitization anywhere in the call chain.\n\n**False positive — do NOT flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE status = '{OrderStatus.PENDING.value}'\")\n```\n\nWhy skip: The interpolated value is a hardcoded enum constant (`OrderStatus.PENDING`), not user input. There is no injection vector.\n\n### Example 3: Borderline — Do NOT Flag\n\n```go\n// PR adds this function\nfunc processItems(items []Item) []Result {\n results := make([]Result, 0)\n for _, item := range items {\n for _, tag := range item.Tags {\n results = append(results, process(item, tag))\n }\n }\n return results\n}\n```\n\nThis looks like an O(n*m) performance concern. 
But without evidence that `items` or `Tags` are large in practice, this is speculative. The function processes a bounded dataset (items from a single user request). Do not flag theoretical performance issues without evidence of real-world impact.\nREVIEW_EOF"
 - env:
diff --git a/.github/workflows/gh-aw-mention-in-pr-no-sandbox.lock.yml b/.github/workflows/gh-aw-mention-in-pr-no-sandbox.lock.yml
index 4cf8e4d8..b3c8239d 100644
--- a/.github/workflows/gh-aw-mention-in-pr-no-sandbox.lock.yml
+++ b/.github/workflows/gh-aw-mention-in-pr-no-sandbox.lock.yml
@@ -44,7 +44,7 @@
 #
 # inlined-imports: true
 #
-# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"b8f87b805cb8e2d0dabcb69a1a14c5bfe167f537bbf66deda2f051163a3bbb21"}
+# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"11b215382228d760088d794373ebca41545e05eb041c254a0e7924e45cff3b40"}

 name: "Mention in PR (no sandbox)"
 "on":
@@ -756,7 +756,7 @@ jobs:
 GH_TOKEN: ${{ github.token }}
 PR_NUMBER: ${{ github.event.pull_request.number || inputs.target-pr-number || github.event.issue.number }}
 name: Fetch PR context to disk
- run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! 
gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n 
--jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. 
All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `agents.md` | Repository conventions from `generate_agents_md` (if written by agent) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples (if written by review-process fragment) 
|\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" + run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Determine sub-agent count based on PR size\nFILE_COUNT=$(jq 'length' /tmp/pr-context/files.json)\nif [ \"$FILE_COUNT\" -le 10 ]; then\n AGENT_COUNT=0\nelif [ \"$FILE_COUNT\" -le 20 ]; then\n AGENT_COUNT=2\nelse\n AGENT_COUNT=3\nfi\necho \"$AGENT_COUNT\" > /tmp/pr-context/agent_count.txt\necho \"PR size: ${FILE_COUNT} files → ${AGENT_COUNT} sub-agents\"\n\n# Write review strategy with precise instructions for the agent\necho \"# Review Strategy\" > /tmp/pr-context/review-strategy.md\necho \"\" >> 
/tmp/pr-context/review-strategy.md\necho \"**PR size:** ${FILE_COUNT} files | **Sub-agents:** ${AGENT_COUNT}\" >> /tmp/pr-context/review-strategy.md\necho \"\" >> /tmp/pr-context/review-strategy.md\n\nif [ \"$AGENT_COUNT\" -eq 0 ]; then\n cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_DIRECT'\n## Direct Review (no sub-agents)\n\nThis PR is small enough to review directly. Do NOT spawn sub-agents.\n\nReview the diff file by file using the ordering in `/tmp/pr-context/file_order_az.txt`. For each changed file:\n\n1. Read the diff from `/tmp/pr-context/diffs/.diff`\n2. Read the full file from the workspace for context\n3. Check existing threads in `/tmp/pr-context/threads/.json` (if it exists)\n4. Identify issues matching the Code Review Reference criteria\n5. Verify each issue: construct a concrete failure scenario, challenge the finding, check for existing threads\n\nProceed to the Verify and Comment step with your findings.\nSTRATEGY_DIRECT\nelif [ \"$AGENT_COUNT\" -eq 2 ]; then\n cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_TWO'\n## Sub-agent Review (2 agents)\n\nSpawn exactly 2 `code-review` sub-agents in parallel:\n\n- **Agent 1**: file ordering from `/tmp/pr-context/file_order_az.txt` (A→Z)\n- **Agent 2**: file ordering from `/tmp/pr-context/file_order_za.txt` (Z→A)\n\nEach sub-agent prompt must include:\n- Instruction to read `/tmp/pr-context/review-instructions.md` for the review process, criteria, and calibration examples\n- Instruction to read `/tmp/pr-context/README.md` for a manifest of all available context files\n- The review intensity and minimum severity settings from the workflow\n- The path to that sub-agent's file ordering — tell it to read the file for its ordered list (per-file diffs are at `/tmp/pr-context/diffs/.diff`)\n- Instruction to read changed files from the workspace (the PR branch is checked out)\n\nEach sub-agent returns a structured findings list. 
They do NOT leave inline comments.\n\nAfter both sub-agents complete, merge and deduplicate findings per the Pick Three, Keep Many process before proceeding to the Verify and Comment step.\nSTRATEGY_TWO\nelse\n cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_THREE'\n## Sub-agent Review (3 agents)\n\nSpawn exactly 3 `code-review` sub-agents in parallel:\n\n- **Agent 1**: file ordering from `/tmp/pr-context/file_order_az.txt` (A→Z)\n- **Agent 2**: file ordering from `/tmp/pr-context/file_order_za.txt` (Z→A)\n- **Agent 3**: file ordering from `/tmp/pr-context/file_order_largest.txt` (largest diff first)\n\nEach sub-agent prompt must include:\n- Instruction to read `/tmp/pr-context/review-instructions.md` for the review process, criteria, and calibration examples\n- Instruction to read `/tmp/pr-context/README.md` for a manifest of all available context files\n- The review intensity and minimum severity settings from the workflow\n- The path to that sub-agent's file ordering — tell it to read the file for its ordered list (per-file diffs are at `/tmp/pr-context/diffs/.diff`)\n- Instruction to read changed files from the workspace (the PR branch is checked out)\n\nEach sub-agent returns a structured findings list. 
They do NOT leave inline comments.\n\nAfter all 3 sub-agents complete, merge and deduplicate findings per the Pick Three, Keep Many process before proceeding to the Verify and Comment step.\nSTRATEGY_THREE\nfi\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# 
Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — 
subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `agent_count.txt` | Pre-computed sub-agent count: `0` (≤10 files, direct review), `2` (11–20 files), or `3` (>20 files) |\n| `review-strategy.md` | Pre-computed review strategy with precise instructions for the agent based on PR size |\n| `agents.md` | Repository conventions from `generate_agents_md` (if written by agent) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples (if written by review-process fragment) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" - name: Write review instructions to disk run: "mkdir -p /tmp/pr-context\ncat > /tmp/pr-context/review-instructions.md << 'REVIEW_EOF'\n# Review Instructions for Sub-agents\n\nYou are a code review sub-agent. Read these instructions, then review the PR files in the order provided in your prompt.\n\n## Context\n\nBefore reviewing files, read these to understand the PR:\n\n1. `/tmp/pr-context/pr.json` — PR title, description, author, and branches. Understand what the PR is trying to accomplish.\n2. `/tmp/pr-context/agents.md` — Repository coding conventions and guidelines (if it exists).\n3. `/tmp/pr-context/review_comments.json` — Existing review threads. Note which files already have threads so you don't duplicate.\n4. `/tmp/pr-context/issue-*.json` — Linked issue details (if any). 
Understand the motivation and acceptance criteria.\n\n## Process\n\nReview the PR diff file by file in your assigned order. For each changed file:\n\n1. **Read the diff** for this file from `/tmp/pr-context/diffs/.diff` to understand what changed. If the diff is empty or truncated (e.g., binary files or very large changes), fall back to reading the full file from the workspace and comparing against context.\n2. **Read the full file from the workspace.** The PR branch is checked out locally — open the file directly to get complete contents with line numbers.\n3. **Check existing threads** for this file from `/tmp/pr-context/threads/.json` (if it exists). Skip issues that are already under discussion — each thread has `isResolved` and `isOutdated` fields.\n4. **Identify potential issues** matching the review criteria below.\n5. **Quick-check each issue** before including it:\n - What specific code pattern or change triggers this concern?\n - Is there an obvious guard, handler, or mitigation visible in the immediate context?\n - Can you describe a concrete failure scenario (the `evidence` field)? If you cannot articulate what specific input or state triggers the problem, drop the finding.\n - If the issue is clearly handled, skip it. If you're unsure, include it — the parent will verify.\n6. **Add to your findings list.** Do NOT leave inline comments — you don't have that tool. Return findings in this format:\n\n```\n- file: path/to/file\n line: 42\n severity: HIGH\n title: Brief title\n description: What the issue is and why it matters\n evidence: The specific code pattern and failure scenario\n suggestion: corrected code here (optional — only if you can provide a concrete fix)\n```\n\n**Review every file in your assigned order.** Files reviewed earlier get more attention, which is why different sub-agents use different orderings.\n\n**Check existing threads** — per-file threads are at `/tmp/pr-context/threads/.json` (step 3 above). 
The full list is at `/tmp/pr-context/review_comments.json`. Do not flag issues that are already under discussion (resolved or unresolved). For outdated threads, only re-flag if the issue still applies to the current diff.\n\n**Return your full findings list** when done, or an empty list if no issues were found.\n\n## Review Criteria\n\nFocus on these categories in priority order:\n\n1. Security vulnerabilities (injection, XSS, auth bypass, secrets exposure)\n2. Logic bugs that could cause runtime failures or incorrect behavior\n3. Data integrity issues (race conditions, missing transactions, corruption risk)\n4. Performance bottlenecks (N+1 queries, memory leaks, blocking operations)\n5. Error handling gaps (unhandled exceptions, missing validation)\n6. Breaking changes to public APIs without migration path\n7. Missing or incorrect test coverage for critical paths\n\n## What NOT to Flag\n\nOnly review the diff — do not flag issues in unchanged code, pre-existing problems not introduced by this PR, or style preferences handled by linters or formatters.\n\n**Common false positives** — these patterns look like issues but usually aren't. 
Before flagging anything in these categories, confirm the problem is real by reading the surrounding code:\n\n- **Security — input already sanitized:** Don't flag injection or XSS risks when inputs are sanitized upstream, parameterized queries are used, or the framework auto-escapes output.\n- **Null/undefined — guarded elsewhere:** Don't flag potential null dereferences if the value is guaranteed by a type guard, assertion, schema validation, or upstream null check.\n- **Error handling — handled at a different layer:** Don't flag missing try/catch if the caller, middleware, or framework catches and handles the error (e.g., Express error middleware, React error boundaries).\n- **Performance — theoretical, not practical:** Don't flag algorithmic complexity (e.g., O(n^2)) unless N is demonstrably large enough to matter in the actual usage context. \"This could be slow\" without evidence is not actionable.\n- **Validation — exists at another layer:** Don't flag missing input validation if it's handled by an API gateway, middleware, schema validator, or type system.\n- **Test coverage — trivial or generated code:** Don't flag missing tests for trivial getters/setters, auto-generated code, or simple delegation methods.\n- **Style or naming — not in coding guidelines:** Don't flag naming conventions or code style unless they violate the repository's documented coding guidelines (from `generate_agents_md` or CONTRIBUTING docs).\n\n**Existing review threads** — check BEFORE flagging any issue:\n\n- **Resolved with reviewer reply** (e.g. \"This is intentional\") — reviewer's decision is final. Do NOT re-flag.\n- **Resolved without reply** — author likely fixed it. Do NOT re-raise unless the fix introduced a new problem.\n- **Unresolved** — already flagged. Do NOT duplicate.\n- **Outdated** — only re-flag if the issue still applies to the current diff.\n\nWhen in doubt, do not duplicate. Redundant comments erode trust.\n\nFinding no issues is a valid and valuable outcome. 
An empty findings list is better than findings that waste the author's time or erode trust. Do not manufacture findings to justify your review — if the code is sound, return an empty list.\n\n## Severity Classification\n\nDetermine severity AFTER investigating the issue, not before. First identify the problem and trace through the code, then assign a severity based on the evidence you found.\n\n- 🔴 CRITICAL — Must fix before merge (security vulnerabilities, data corruption, production-breaking bugs)\n- 🟠 HIGH — Should fix before merge (logic errors, missing validation, significant performance issues)\n- 🟡 MEDIUM — Address soon, non-blocking (error handling gaps, suboptimal patterns, missing edge cases)\n- ⚪ LOW — Author discretion (minor improvements, documentation gaps)\n- 💬 NITPICK — Truly optional (stylistic preferences, alternative approaches)\n\n## Review Intensity\n\nThe review intensity is `${{ inputs.intensity || 'balanced' }}`.\n\n- **conservative**: High evidence bar. Only flag when you can demonstrate a concrete failure scenario. If you can construct a reasonable counterargument, do not flag. Approval with zero findings is the expected outcome for most PRs.\n- **balanced**: Standard evidence bar. Flag when you can point to specific code that would fail. If the issue is ambiguous, lean toward not flagging.\n- **aggressive**: Lower evidence bar. Flag when evidence exists even if the failure scenario is not fully confirmed. Improvement suggestions welcome but must cite specific code.\n\n## Calibration Examples\n\nUse these examples to calibrate your judgment. 
Each pair shows a real issue and a similar-looking pattern that is NOT an issue.\n\n### Example 1: Null/Undefined Access\n\n**True positive — flag this:**\n\n```js\n// PR adds this handler\napp.get('/user/:id', async (req, res) => {\n const user = await db.findUser(req.params.id);\n res.json({ name: user.name, email: user.email });\n});\n```\n\nWhy flag: `db.findUser()` can return `null` when no user matches the ID. Accessing `user.name` will throw a TypeError at runtime. No upstream guard exists — the route handler is the entry point.\n\n**False positive — do NOT flag this:**\n\n```ts\n// PR adds this line inside an existing function\nconst settings = user.getSettings();\n```\n\nWhy skip: Reading the full file reveals `user` is typed as `User` (not `User | null`), and the calling function only runs after `authenticateUser()` middleware which guarantees a valid user object. The null case is handled at a different layer.\n\n### Example 2: SQL Injection\n\n**True positive — flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE customer_id = '{customer_id}'\")\n```\n\nWhy flag: String interpolation in a SQL query with user-controlled input (`customer_id` comes from the request). No parameterization or sanitization anywhere in the call chain.\n\n**False positive — do NOT flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE status = '{OrderStatus.PENDING.value}'\")\n```\n\nWhy skip: The interpolated value is a hardcoded enum constant (`OrderStatus.PENDING`), not user input. There is no injection vector.\n\n### Example 3: Borderline — Do NOT Flag\n\n```go\n// PR adds this function\nfunc processItems(items []Item) []Result {\n results := make([]Result, 0)\n for _, item := range items {\n for _, tag := range item.Tags {\n results = append(results, process(item, tag))\n }\n }\n return results\n}\n```\n\nThis looks like an O(n*m) performance concern. 
But without evidence that `items` or `Tags` are large in practice, this is speculative. The function processes a bounded dataset (items from a single user request). Do not flag theoretical performance issues without evidence of real-world impact.\nREVIEW_EOF"
 - env:
diff --git a/.github/workflows/gh-aw-mention-in-pr.lock.yml b/.github/workflows/gh-aw-mention-in-pr.lock.yml
index 8397c3e9..87bb9f7d 100644
--- a/.github/workflows/gh-aw-mention-in-pr.lock.yml
+++ b/.github/workflows/gh-aw-mention-in-pr.lock.yml
@@ -44,7 +44,7 @@
 #
 # inlined-imports: true
 #
-# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"93982cd9e65d497aa1db1b9215850ab8773b3d6bcd4c01af401cef6e4059b706"}
+# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"ab4817aae891420f448b1b12fc5b6e7def188f00d7b9e3a03869d86e1b9045c5"}

 name: "Mention in PR"
 "on":
@@ -784,7 +784,7 @@ jobs:
 GH_TOKEN: ${{ github.token }}
 PR_NUMBER: ${{ github.event.pull_request.number || inputs.target-pr-number || github.event.issue.number }}
 name: Fetch PR context to disk
- run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! 
gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n 
--jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. 
All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `agents.md` | Repository conventions from `generate_agents_md` (if written by agent) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples (if written by review-process fragment) 
|\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" + run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Determine sub-agent count based on PR size\nFILE_COUNT=$(jq 'length' /tmp/pr-context/files.json)\nif [ \"$FILE_COUNT\" -le 10 ]; then\n AGENT_COUNT=0\nelif [ \"$FILE_COUNT\" -le 20 ]; then\n AGENT_COUNT=2\nelse\n AGENT_COUNT=3\nfi\necho \"$AGENT_COUNT\" > /tmp/pr-context/agent_count.txt\necho \"PR size: ${FILE_COUNT} files → ${AGENT_COUNT} sub-agents\"\n\n# Write review strategy with precise instructions for the agent\necho \"# Review Strategy\" > /tmp/pr-context/review-strategy.md\necho \"\" >> 
/tmp/pr-context/review-strategy.md\necho \"**PR size:** ${FILE_COUNT} files | **Sub-agents:** ${AGENT_COUNT}\" >> /tmp/pr-context/review-strategy.md\necho \"\" >> /tmp/pr-context/review-strategy.md\n\nif [ \"$AGENT_COUNT\" -eq 0 ]; then\n cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_DIRECT'\n## Direct Review (no sub-agents)\n\nThis PR is small enough to review directly. Do NOT spawn sub-agents.\n\nReview the diff file by file using the ordering in `/tmp/pr-context/file_order_az.txt`. For each changed file:\n\n1. Read the diff from `/tmp/pr-context/diffs/.diff`\n2. Read the full file from the workspace for context\n3. Check existing threads in `/tmp/pr-context/threads/.json` (if it exists)\n4. Identify issues matching the Code Review Reference criteria\n5. Verify each issue: construct a concrete failure scenario, challenge the finding, check for existing threads\n\nProceed to the Verify and Comment step with your findings.\nSTRATEGY_DIRECT\nelif [ \"$AGENT_COUNT\" -eq 2 ]; then\n cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_TWO'\n## Sub-agent Review (2 agents)\n\nSpawn exactly 2 `code-review` sub-agents in parallel:\n\n- **Agent 1**: file ordering from `/tmp/pr-context/file_order_az.txt` (A→Z)\n- **Agent 2**: file ordering from `/tmp/pr-context/file_order_za.txt` (Z→A)\n\nEach sub-agent prompt must include:\n- Instruction to read `/tmp/pr-context/review-instructions.md` for the review process, criteria, and calibration examples\n- Instruction to read `/tmp/pr-context/README.md` for a manifest of all available context files\n- The review intensity and minimum severity settings from the workflow\n- The path to that sub-agent's file ordering — tell it to read the file for its ordered list (per-file diffs are at `/tmp/pr-context/diffs/.diff`)\n- Instruction to read changed files from the workspace (the PR branch is checked out)\n\nEach sub-agent returns a structured findings list. 
They do NOT leave inline comments.\n\nAfter both sub-agents complete, merge and deduplicate findings per the Pick Three, Keep Many process before proceeding to the Verify and Comment step.\nSTRATEGY_TWO\nelse\n cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_THREE'\n## Sub-agent Review (3 agents)\n\nSpawn exactly 3 `code-review` sub-agents in parallel:\n\n- **Agent 1**: file ordering from `/tmp/pr-context/file_order_az.txt` (A→Z)\n- **Agent 2**: file ordering from `/tmp/pr-context/file_order_za.txt` (Z→A)\n- **Agent 3**: file ordering from `/tmp/pr-context/file_order_largest.txt` (largest diff first)\n\nEach sub-agent prompt must include:\n- Instruction to read `/tmp/pr-context/review-instructions.md` for the review process, criteria, and calibration examples\n- Instruction to read `/tmp/pr-context/README.md` for a manifest of all available context files\n- The review intensity and minimum severity settings from the workflow\n- The path to that sub-agent's file ordering — tell it to read the file for its ordered list (per-file diffs are at `/tmp/pr-context/diffs/.diff`)\n- Instruction to read changed files from the workspace (the PR branch is checked out)\n\nEach sub-agent returns a structured findings list. 
They do NOT leave inline comments.\n\nAfter all 3 sub-agents complete, merge and deduplicate findings per the Pick Three, Keep Many process before proceeding to the Verify and Comment step.\nSTRATEGY_THREE\nfi\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# 
Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — 
subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `agent_count.txt` | Pre-computed sub-agent count: `0` (≤10 files, direct review), `2` (11–20 files), or `3` (>20 files) |\n| `review-strategy.md` | Pre-computed review strategy with precise instructions for the agent based on PR size |\n| `agents.md` | Repository conventions from `generate_agents_md` (if written by agent) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples (if written by review-process fragment) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" - name: Write review instructions to disk run: "mkdir -p /tmp/pr-context\ncat > /tmp/pr-context/review-instructions.md << 'REVIEW_EOF'\n# Review Instructions for Sub-agents\n\nYou are a code review sub-agent. Read these instructions, then review the PR files in the order provided in your prompt.\n\n## Context\n\nBefore reviewing files, read these to understand the PR:\n\n1. `/tmp/pr-context/pr.json` — PR title, description, author, and branches. Understand what the PR is trying to accomplish.\n2. `/tmp/pr-context/agents.md` — Repository coding conventions and guidelines (if it exists).\n3. `/tmp/pr-context/review_comments.json` — Existing review threads. Note which files already have threads so you don't duplicate.\n4. `/tmp/pr-context/issue-*.json` — Linked issue details (if any). 
Understand the motivation and acceptance criteria.\n\n## Process\n\nReview the PR diff file by file in your assigned order. For each changed file:\n\n1. **Read the diff** for this file from `/tmp/pr-context/diffs/.diff` to understand what changed. If the diff is empty or truncated (e.g., binary files or very large changes), fall back to reading the full file from the workspace and comparing against context.\n2. **Read the full file from the workspace.** The PR branch is checked out locally — open the file directly to get complete contents with line numbers.\n3. **Check existing threads** for this file from `/tmp/pr-context/threads/.json` (if it exists). Skip issues that are already under discussion — each thread has `isResolved` and `isOutdated` fields.\n4. **Identify potential issues** matching the review criteria below.\n5. **Quick-check each issue** before including it:\n - What specific code pattern or change triggers this concern?\n - Is there an obvious guard, handler, or mitigation visible in the immediate context?\n - Can you describe a concrete failure scenario (the `evidence` field)? If you cannot articulate what specific input or state triggers the problem, drop the finding.\n - If the issue is clearly handled, skip it. If you're unsure, include it — the parent will verify.\n6. **Add to your findings list.** Do NOT leave inline comments — you don't have that tool. Return findings in this format:\n\n```\n- file: path/to/file\n line: 42\n severity: HIGH\n title: Brief title\n description: What the issue is and why it matters\n evidence: The specific code pattern and failure scenario\n suggestion: corrected code here (optional — only if you can provide a concrete fix)\n```\n\n**Review every file in your assigned order.** Files reviewed earlier get more attention, which is why different sub-agents use different orderings.\n\n**Check existing threads** — per-file threads are at `/tmp/pr-context/threads/.json` (step 3 above). 
The full list is at `/tmp/pr-context/review_comments.json`. Do not flag issues that are already under discussion (resolved or unresolved). For outdated threads, only re-flag if the issue still applies to the current diff.\n\n**Return your full findings list** when done, or an empty list if no issues were found.\n\n## Review Criteria\n\nFocus on these categories in priority order:\n\n1. Security vulnerabilities (injection, XSS, auth bypass, secrets exposure)\n2. Logic bugs that could cause runtime failures or incorrect behavior\n3. Data integrity issues (race conditions, missing transactions, corruption risk)\n4. Performance bottlenecks (N+1 queries, memory leaks, blocking operations)\n5. Error handling gaps (unhandled exceptions, missing validation)\n6. Breaking changes to public APIs without migration path\n7. Missing or incorrect test coverage for critical paths\n\n## What NOT to Flag\n\nOnly review the diff — do not flag issues in unchanged code, pre-existing problems not introduced by this PR, or style preferences handled by linters or formatters.\n\n**Common false positives** — these patterns look like issues but usually aren't. 
Before flagging anything in these categories, confirm the problem is real by reading the surrounding code:\n\n- **Security — input already sanitized:** Don't flag injection or XSS risks when inputs are sanitized upstream, parameterized queries are used, or the framework auto-escapes output.\n- **Null/undefined — guarded elsewhere:** Don't flag potential null dereferences if the value is guaranteed by a type guard, assertion, schema validation, or upstream null check.\n- **Error handling — handled at a different layer:** Don't flag missing try/catch if the caller, middleware, or framework catches and handles the error (e.g., Express error middleware, React error boundaries).\n- **Performance — theoretical, not practical:** Don't flag algorithmic complexity (e.g., O(n^2)) unless N is demonstrably large enough to matter in the actual usage context. \"This could be slow\" without evidence is not actionable.\n- **Validation — exists at another layer:** Don't flag missing input validation if it's handled by an API gateway, middleware, schema validator, or type system.\n- **Test coverage — trivial or generated code:** Don't flag missing tests for trivial getters/setters, auto-generated code, or simple delegation methods.\n- **Style or naming — not in coding guidelines:** Don't flag naming conventions or code style unless they violate the repository's documented coding guidelines (from `generate_agents_md` or CONTRIBUTING docs).\n\n**Existing review threads** — check BEFORE flagging any issue:\n\n- **Resolved with reviewer reply** (e.g. \"This is intentional\") — reviewer's decision is final. Do NOT re-flag.\n- **Resolved without reply** — author likely fixed it. Do NOT re-raise unless the fix introduced a new problem.\n- **Unresolved** — already flagged. Do NOT duplicate.\n- **Outdated** — only re-flag if the issue still applies to the current diff.\n\nWhen in doubt, do not duplicate. Redundant comments erode trust.\n\nFinding no issues is a valid and valuable outcome. 
An empty findings list is better than findings that waste the author's time or erode trust. Do not manufacture findings to justify your review — if the code is sound, return an empty list.\n\n## Severity Classification\n\nDetermine severity AFTER investigating the issue, not before. First identify the problem and trace through the code, then assign a severity based on the evidence you found.\n\n- 🔴 CRITICAL — Must fix before merge (security vulnerabilities, data corruption, production-breaking bugs)\n- 🟠 HIGH — Should fix before merge (logic errors, missing validation, significant performance issues)\n- 🟡 MEDIUM — Address soon, non-blocking (error handling gaps, suboptimal patterns, missing edge cases)\n- ⚪ LOW — Author discretion (minor improvements, documentation gaps)\n- 💬 NITPICK — Truly optional (stylistic preferences, alternative approaches)\n\n## Review Intensity\n\nThe review intensity is `${{ inputs.intensity || 'balanced' }}`.\n\n- **conservative**: High evidence bar. Only flag when you can demonstrate a concrete failure scenario. If you can construct a reasonable counterargument, do not flag. Approval with zero findings is the expected outcome for most PRs.\n- **balanced**: Standard evidence bar. Flag when you can point to specific code that would fail. If the issue is ambiguous, lean toward not flagging.\n- **aggressive**: Lower evidence bar. Flag when evidence exists even if the failure scenario is not fully confirmed. Improvement suggestions welcome but must cite specific code.\n\n## Calibration Examples\n\nUse these examples to calibrate your judgment. 
Each pair shows a real issue and a similar-looking pattern that is NOT an issue.\n\n### Example 1: Null/Undefined Access\n\n**True positive — flag this:**\n\n```js\n// PR adds this handler\napp.get('/user/:id', async (req, res) => {\n const user = await db.findUser(req.params.id);\n res.json({ name: user.name, email: user.email });\n});\n```\n\nWhy flag: `db.findUser()` can return `null` when no user matches the ID. Accessing `user.name` will throw a TypeError at runtime. No upstream guard exists — the route handler is the entry point.\n\n**False positive — do NOT flag this:**\n\n```ts\n// PR adds this line inside an existing function\nconst settings = user.getSettings();\n```\n\nWhy skip: Reading the full file reveals `user` is typed as `User` (not `User | null`), and the calling function only runs after `authenticateUser()` middleware which guarantees a valid user object. The null case is handled at a different layer.\n\n### Example 2: SQL Injection\n\n**True positive — flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE customer_id = '{customer_id}'\")\n```\n\nWhy flag: String interpolation in a SQL query with user-controlled input (`customer_id` comes from the request). No parameterization or sanitization anywhere in the call chain.\n\n**False positive — do NOT flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE status = '{OrderStatus.PENDING.value}'\")\n```\n\nWhy skip: The interpolated value is a hardcoded enum constant (`OrderStatus.PENDING`), not user input. There is no injection vector.\n\n### Example 3: Borderline — Do NOT Flag\n\n```go\n// PR adds this function\nfunc processItems(items []Item) []Result {\n results := make([]Result, 0)\n for _, item := range items {\n for _, tag := range item.Tags {\n results = append(results, process(item, tag))\n }\n }\n return results\n}\n```\n\nThis looks like an O(n*m) performance concern. 
But without evidence that `items` or `Tags` are large in practice, this is speculative. The function processes a bounded dataset (items from a single user request). Do not flag theoretical performance issues without evidence of real-world impact.\nREVIEW_EOF" - env: diff --git a/.github/workflows/gh-aw-pr-review-addresser.lock.yml b/.github/workflows/gh-aw-pr-review-addresser.lock.yml index 5a23878b..636c8733 100644 --- a/.github/workflows/gh-aw-pr-review-addresser.lock.yml +++ b/.github/workflows/gh-aw-pr-review-addresser.lock.yml @@ -40,7 +40,7 @@ # # inlined-imports: true # -# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"e14207a7410bee2f30a37e655d590e23de4f5a211b2f562a06622289c3b028b2"} +# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"d9fa0709f0a70a945e449233acf9c972cd8d26929e2945756237ef124d8a7838"} name: "PR Review Addresser" "on": @@ -599,7 +599,7 @@ jobs: GH_TOKEN: ${{ github.token }} PR_NUMBER: ${{ github.event.pull_request.number || inputs.target-pr-number || github.event.issue.number }} name: Fetch PR context to disk - run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! 
gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n 
--jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. 
All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `agents.md` | Repository conventions from `generate_agents_md` (if written by agent) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples (if written by review-process fragment) 
|\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" + run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Determine sub-agent count based on PR size\nFILE_COUNT=$(jq 'length' /tmp/pr-context/files.json)\nif [ \"$FILE_COUNT\" -le 10 ]; then\n AGENT_COUNT=0\nelif [ \"$FILE_COUNT\" -le 20 ]; then\n AGENT_COUNT=2\nelse\n AGENT_COUNT=3\nfi\necho \"$AGENT_COUNT\" > /tmp/pr-context/agent_count.txt\necho \"PR size: ${FILE_COUNT} files → ${AGENT_COUNT} sub-agents\"\n\n# Write review strategy with precise instructions for the agent\necho \"# Review Strategy\" > /tmp/pr-context/review-strategy.md\necho \"\" >> 
/tmp/pr-context/review-strategy.md\necho \"**PR size:** ${FILE_COUNT} files | **Sub-agents:** ${AGENT_COUNT}\" >> /tmp/pr-context/review-strategy.md\necho \"\" >> /tmp/pr-context/review-strategy.md\n\nif [ \"$AGENT_COUNT\" -eq 0 ]; then\n cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_DIRECT'\n## Direct Review (no sub-agents)\n\nThis PR is small enough to review directly. Do NOT spawn sub-agents.\n\nReview the diff file by file using the ordering in `/tmp/pr-context/file_order_az.txt`. For each changed file:\n\n1. Read the diff from `/tmp/pr-context/diffs/.diff`\n2. Read the full file from the workspace for context\n3. Check existing threads in `/tmp/pr-context/threads/.json` (if it exists)\n4. Identify issues matching the Code Review Reference criteria\n5. Verify each issue: construct a concrete failure scenario, challenge the finding, check for existing threads\n\nProceed to the Verify and Comment step with your findings.\nSTRATEGY_DIRECT\nelif [ \"$AGENT_COUNT\" -eq 2 ]; then\n cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_TWO'\n## Sub-agent Review (2 agents)\n\nSpawn exactly 2 `code-review` sub-agents in parallel:\n\n- **Agent 1**: file ordering from `/tmp/pr-context/file_order_az.txt` (A→Z)\n- **Agent 2**: file ordering from `/tmp/pr-context/file_order_za.txt` (Z→A)\n\nEach sub-agent prompt must include:\n- Instruction to read `/tmp/pr-context/review-instructions.md` for the review process, criteria, and calibration examples\n- Instruction to read `/tmp/pr-context/README.md` for a manifest of all available context files\n- The review intensity and minimum severity settings from the workflow\n- The path to that sub-agent's file ordering — tell it to read the file for its ordered list (per-file diffs are at `/tmp/pr-context/diffs/.diff`)\n- Instruction to read changed files from the workspace (the PR branch is checked out)\n\nEach sub-agent returns a structured findings list. 
They do NOT leave inline comments.\n\nAfter both sub-agents complete, merge and deduplicate findings per the Pick Three, Keep Many process before proceeding to the Verify and Comment step.\nSTRATEGY_TWO\nelse\n cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_THREE'\n## Sub-agent Review (3 agents)\n\nSpawn exactly 3 `code-review` sub-agents in parallel:\n\n- **Agent 1**: file ordering from `/tmp/pr-context/file_order_az.txt` (A→Z)\n- **Agent 2**: file ordering from `/tmp/pr-context/file_order_za.txt` (Z→A)\n- **Agent 3**: file ordering from `/tmp/pr-context/file_order_largest.txt` (largest diff first)\n\nEach sub-agent prompt must include:\n- Instruction to read `/tmp/pr-context/review-instructions.md` for the review process, criteria, and calibration examples\n- Instruction to read `/tmp/pr-context/README.md` for a manifest of all available context files\n- The review intensity and minimum severity settings from the workflow\n- The path to that sub-agent's file ordering — tell it to read the file for its ordered list (per-file diffs are at `/tmp/pr-context/diffs/.diff`)\n- Instruction to read changed files from the workspace (the PR branch is checked out)\n\nEach sub-agent returns a structured findings list. 
They do NOT leave inline comments.\n\nAfter all 3 sub-agents complete, merge and deduplicate findings per the Pick Three, Keep Many process before proceeding to the Verify and Comment step.\nSTRATEGY_THREE\nfi\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# 
Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — 
subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `agent_count.txt` | Pre-computed sub-agent count: `0` (≤10 files, direct review), `2` (11–20 files), or `3` (>20 files) |\n| `review-strategy.md` | Pre-computed review strategy with precise instructions for the agent based on PR size |\n| `agents.md` | Repository conventions from `generate_agents_md` (if written by agent) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples (if written by review-process fragment) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" - env: GITHUB_TOKEN: ${{ github.token }} REPO_NAME: ${{ github.repository }} diff --git a/.github/workflows/gh-aw-pr-review.lock.yml b/.github/workflows/gh-aw-pr-review.lock.yml index be4ad4d8..dfd96ac1 100644 --- a/.github/workflows/gh-aw-pr-review.lock.yml +++ b/.github/workflows/gh-aw-pr-review.lock.yml @@ -40,7 +40,7 @@ # # inlined-imports: true # -# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"f7adb4454527b9d061ef790aece9abe299347d6fc5caa2d0d62cb5797e7392e9"} +# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"fea51dad6ede0be930d94b3797849c328d1b4d2f633b1afc94114731eb857123"} name: "PR Review" "on": @@ -480,25 +480,13 @@ jobs: 4. Read `/tmp/pr-context/reviews.json` to check prior review submissions from this bot. Note any prior verdicts to avoid redundant reviews. 5. 
Read `/tmp/pr-context/review_comments.json` to check existing review threads. Note which files already have threads and whether they are resolved, unresolved, or outdated. - ### Step 2: Sub-agent Review + ### Step 2: Review - **File orderings are pre-computed** at `/tmp/pr-context/`: - - **Agent 1**: `/tmp/pr-context/file_order_az.txt` — alphabetical (A → Z) - - **Agent 2**: `/tmp/pr-context/file_order_za.txt` — reverse alphabetical (Z → A) - - **Agent 3**: `/tmp/pr-context/file_order_largest.txt` — by diff size descending - - **Spawn sub-agents:** Follow the **Pick Three, Keep Many** process — spawn 3 `code-review` sub-agents to review the PR diff in parallel. Each sub-agent prompt must include: - - Instruction to read `/tmp/pr-context/review-instructions.md` for the review process, criteria, and calibration examples - - Instruction to read `/tmp/pr-context/README.md` for a manifest of all available context files - - The review intensity (`__GH_AW_INPUTS_INTENSITY__`) and minimum severity (`__GH_AW_INPUTS_MINIMUM_SEVERITY__`) - - The path to that sub-agent's file ordering (e.g., `/tmp/pr-context/file_order_az.txt`) — tell it to read the file for its ordered list (per-file diffs are at `/tmp/pr-context/diffs/.diff`) - - Instruction to read changed files from the workspace (the PR branch is checked out) - - Each sub-agent returns a structured findings list. They do NOT leave inline comments. + Read `/tmp/pr-context/review-strategy.md` for the pre-computed review strategy. The strategy is determined by PR size and specifies the exact number of sub-agents (0, 2, or 3). Follow the instructions in that file exactly. ### Step 3: Verify and Comment - After merging and deduplicating sub-agent findings per the Pick Three, Keep Many process, verify each finding before leaving a comment. For every finding: + If sub-agents were used, merge and deduplicate findings per the Pick Three, Keep Many process. Verify each finding before leaving a comment. For every finding: 1. 
**Read the file and surrounding context** — open the full file, not just the diff. Understand the broader code. 2. **Construct a concrete failure scenario** — what specific input or state causes the bug? If you cannot describe one, drop the finding. @@ -710,7 +698,7 @@ jobs: GH_TOKEN: ${{ github.token }} PR_NUMBER: ${{ github.event.pull_request.number || inputs.target-pr-number || github.event.issue.number }} name: Fetch PR context to disk - run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review 
threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' 
\"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| 
`outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `agents.md` | Repository conventions from `generate_agents_md` (if written by agent) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples (if written by review-process fragment) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" + run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions 
// 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Determine sub-agent count based on PR size\nFILE_COUNT=$(jq 'length' /tmp/pr-context/files.json)\nif [ \"$FILE_COUNT\" -le 10 ]; then\n AGENT_COUNT=0\nelif [ \"$FILE_COUNT\" -le 20 ]; then\n AGENT_COUNT=2\nelse\n AGENT_COUNT=3\nfi\necho \"$AGENT_COUNT\" > /tmp/pr-context/agent_count.txt\necho \"PR size: ${FILE_COUNT} files → ${AGENT_COUNT} sub-agents\"\n\n# Write review strategy with precise instructions for the agent\necho \"# Review Strategy\" > /tmp/pr-context/review-strategy.md\necho \"\" >> /tmp/pr-context/review-strategy.md\necho \"**PR size:** ${FILE_COUNT} files | **Sub-agents:** ${AGENT_COUNT}\" >> /tmp/pr-context/review-strategy.md\necho \"\" >> /tmp/pr-context/review-strategy.md\n\nif [ \"$AGENT_COUNT\" -eq 0 ]; then\n cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_DIRECT'\n## Direct Review (no sub-agents)\n\nThis PR is small enough to review directly. Do NOT spawn sub-agents.\n\nReview the diff file by file using the ordering in `/tmp/pr-context/file_order_az.txt`. For each changed file:\n\n1. Read the diff from `/tmp/pr-context/diffs/.diff`\n2. Read the full file from the workspace for context\n3. Check existing threads in `/tmp/pr-context/threads/.json` (if it exists)\n4. Identify issues matching the Code Review Reference criteria\n5. 
Verify each issue: construct a concrete failure scenario, challenge the finding, check for existing threads\n\nProceed to the Verify and Comment step with your findings.\nSTRATEGY_DIRECT\nelif [ \"$AGENT_COUNT\" -eq 2 ]; then\n cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_TWO'\n## Sub-agent Review (2 agents)\n\nSpawn exactly 2 `code-review` sub-agents in parallel:\n\n- **Agent 1**: file ordering from `/tmp/pr-context/file_order_az.txt` (A→Z)\n- **Agent 2**: file ordering from `/tmp/pr-context/file_order_za.txt` (Z→A)\n\nEach sub-agent prompt must include:\n- Instruction to read `/tmp/pr-context/review-instructions.md` for the review process, criteria, and calibration examples\n- Instruction to read `/tmp/pr-context/README.md` for a manifest of all available context files\n- The review intensity and minimum severity settings from the workflow\n- The path to that sub-agent's file ordering — tell it to read the file for its ordered list (per-file diffs are at `/tmp/pr-context/diffs/.diff`)\n- Instruction to read changed files from the workspace (the PR branch is checked out)\n\nEach sub-agent returns a structured findings list. 
They do NOT leave inline comments.\n\nAfter both sub-agents complete, merge and deduplicate findings per the Pick Three, Keep Many process before proceeding to the Verify and Comment step.\nSTRATEGY_TWO\nelse\n cat >> /tmp/pr-context/review-strategy.md << 'STRATEGY_THREE'\n## Sub-agent Review (3 agents)\n\nSpawn exactly 3 `code-review` sub-agents in parallel:\n\n- **Agent 1**: file ordering from `/tmp/pr-context/file_order_az.txt` (A→Z)\n- **Agent 2**: file ordering from `/tmp/pr-context/file_order_za.txt` (Z→A)\n- **Agent 3**: file ordering from `/tmp/pr-context/file_order_largest.txt` (largest diff first)\n\nEach sub-agent prompt must include:\n- Instruction to read `/tmp/pr-context/review-instructions.md` for the review process, criteria, and calibration examples\n- Instruction to read `/tmp/pr-context/README.md` for a manifest of all available context files\n- The review intensity and minimum severity settings from the workflow\n- The path to that sub-agent's file ordering — tell it to read the file for its ordered list (per-file diffs are at `/tmp/pr-context/diffs/.diff`)\n- Instruction to read changed files from the workspace (the PR branch is checked out)\n\nEach sub-agent returns a structured findings list. 
They do NOT leave inline comments.\n\nAfter all 3 sub-agents complete, merge and deduplicate findings per the Pick Three, Keep Many process before proceeding to the Verify and Comment step.\nSTRATEGY_THREE\nfi\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# 
Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — 
subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `agent_count.txt` | Pre-computed sub-agent count: `0` (≤10 files, direct review), `2` (11–20 files), or `3` (>20 files) |\n| `review-strategy.md` | Pre-computed review strategy with precise instructions for the agent based on PR size |\n| `agents.md` | Repository conventions from `generate_agents_md` (if written by agent) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples (if written by review-process fragment) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" - name: Write review instructions to disk run: "mkdir -p /tmp/pr-context\ncat > /tmp/pr-context/review-instructions.md << 'REVIEW_EOF'\n# Review Instructions for Sub-agents\n\nYou are a code review sub-agent. Read these instructions, then review the PR files in the order provided in your prompt.\n\n## Context\n\nBefore reviewing files, read these to understand the PR:\n\n1. `/tmp/pr-context/pr.json` — PR title, description, author, and branches. Understand what the PR is trying to accomplish.\n2. `/tmp/pr-context/agents.md` — Repository coding conventions and guidelines (if it exists).\n3. `/tmp/pr-context/review_comments.json` — Existing review threads. Note which files already have threads so you don't duplicate.\n4. `/tmp/pr-context/issue-*.json` — Linked issue details (if any). 
Understand the motivation and acceptance criteria.\n\n## Process\n\nReview the PR diff file by file in your assigned order. For each changed file:\n\n1. **Read the diff** for this file from `/tmp/pr-context/diffs/.diff` to understand what changed. If the diff is empty or truncated (e.g., binary files or very large changes), fall back to reading the full file from the workspace and comparing against context.\n2. **Read the full file from the workspace.** The PR branch is checked out locally — open the file directly to get complete contents with line numbers.\n3. **Check existing threads** for this file from `/tmp/pr-context/threads/.json` (if it exists). Skip issues that are already under discussion — each thread has `isResolved` and `isOutdated` fields.\n4. **Identify potential issues** matching the review criteria below.\n5. **Quick-check each issue** before including it:\n - What specific code pattern or change triggers this concern?\n - Is there an obvious guard, handler, or mitigation visible in the immediate context?\n - Can you describe a concrete failure scenario (the `evidence` field)? If you cannot articulate what specific input or state triggers the problem, drop the finding.\n - If the issue is clearly handled, skip it. If you're unsure, include it — the parent will verify.\n6. **Add to your findings list.** Do NOT leave inline comments — you don't have that tool. Return findings in this format:\n\n```\n- file: path/to/file\n line: 42\n severity: HIGH\n title: Brief title\n description: What the issue is and why it matters\n evidence: The specific code pattern and failure scenario\n suggestion: corrected code here (optional — only if you can provide a concrete fix)\n```\n\n**Review every file in your assigned order.** Files reviewed earlier get more attention, which is why different sub-agents use different orderings.\n\n**Check existing threads** — per-file threads are at `/tmp/pr-context/threads/.json` (step 3 above). 
The full list is at `/tmp/pr-context/review_comments.json`. Do not flag issues that are already under discussion (resolved or unresolved). For outdated threads, only re-flag if the issue still applies to the current diff.\n\n**Return your full findings list** when done, or an empty list if no issues were found.\n\n## Review Criteria\n\nFocus on these categories in priority order:\n\n1. Security vulnerabilities (injection, XSS, auth bypass, secrets exposure)\n2. Logic bugs that could cause runtime failures or incorrect behavior\n3. Data integrity issues (race conditions, missing transactions, corruption risk)\n4. Performance bottlenecks (N+1 queries, memory leaks, blocking operations)\n5. Error handling gaps (unhandled exceptions, missing validation)\n6. Breaking changes to public APIs without migration path\n7. Missing or incorrect test coverage for critical paths\n\n## What NOT to Flag\n\nOnly review the diff — do not flag issues in unchanged code, pre-existing problems not introduced by this PR, or style preferences handled by linters or formatters.\n\n**Common false positives** — these patterns look like issues but usually aren't. 
Before flagging anything in these categories, confirm the problem is real by reading the surrounding code:\n\n- **Security — input already sanitized:** Don't flag injection or XSS risks when inputs are sanitized upstream, parameterized queries are used, or the framework auto-escapes output.\n- **Null/undefined — guarded elsewhere:** Don't flag potential null dereferences if the value is guaranteed by a type guard, assertion, schema validation, or upstream null check.\n- **Error handling — handled at a different layer:** Don't flag missing try/catch if the caller, middleware, or framework catches and handles the error (e.g., Express error middleware, React error boundaries).\n- **Performance — theoretical, not practical:** Don't flag algorithmic complexity (e.g., O(n^2)) unless N is demonstrably large enough to matter in the actual usage context. \"This could be slow\" without evidence is not actionable.\n- **Validation — exists at another layer:** Don't flag missing input validation if it's handled by an API gateway, middleware, schema validator, or type system.\n- **Test coverage — trivial or generated code:** Don't flag missing tests for trivial getters/setters, auto-generated code, or simple delegation methods.\n- **Style or naming — not in coding guidelines:** Don't flag naming conventions or code style unless they violate the repository's documented coding guidelines (from `generate_agents_md` or CONTRIBUTING docs).\n\n**Existing review threads** — check BEFORE flagging any issue:\n\n- **Resolved with reviewer reply** (e.g. \"This is intentional\") — reviewer's decision is final. Do NOT re-flag.\n- **Resolved without reply** — author likely fixed it. Do NOT re-raise unless the fix introduced a new problem.\n- **Unresolved** — already flagged. Do NOT duplicate.\n- **Outdated** — only re-flag if the issue still applies to the current diff.\n\nWhen in doubt, do not duplicate. Redundant comments erode trust.\n\nFinding no issues is a valid and valuable outcome. 
An empty findings list is better than findings that waste the author's time or erode trust. Do not manufacture findings to justify your review — if the code is sound, return an empty list.\n\n## Severity Classification\n\nDetermine severity AFTER investigating the issue, not before. First identify the problem and trace through the code, then assign a severity based on the evidence you found.\n\n- 🔴 CRITICAL — Must fix before merge (security vulnerabilities, data corruption, production-breaking bugs)\n- 🟠 HIGH — Should fix before merge (logic errors, missing validation, significant performance issues)\n- 🟡 MEDIUM — Address soon, non-blocking (error handling gaps, suboptimal patterns, missing edge cases)\n- ⚪ LOW — Author discretion (minor improvements, documentation gaps)\n- 💬 NITPICK — Truly optional (stylistic preferences, alternative approaches)\n\n## Review Intensity\n\nThe review intensity is `${{ inputs.intensity || 'balanced' }}`.\n\n- **conservative**: High evidence bar. Only flag when you can demonstrate a concrete failure scenario. If you can construct a reasonable counterargument, do not flag. Approval with zero findings is the expected outcome for most PRs.\n- **balanced**: Standard evidence bar. Flag when you can point to specific code that would fail. If the issue is ambiguous, lean toward not flagging.\n- **aggressive**: Lower evidence bar. Flag when evidence exists even if the failure scenario is not fully confirmed. Improvement suggestions welcome but must cite specific code.\n\n## Calibration Examples\n\nUse these examples to calibrate your judgment. 
Each pair shows a real issue and a similar-looking pattern that is NOT an issue.\n\n### Example 1: Null/Undefined Access\n\n**True positive — flag this:**\n\n```js\n// PR adds this handler\napp.get('/user/:id', async (req, res) => {\n const user = await db.findUser(req.params.id);\n res.json({ name: user.name, email: user.email });\n});\n```\n\nWhy flag: `db.findUser()` can return `null` when no user matches the ID. Accessing `user.name` will throw a TypeError at runtime. No upstream guard exists — the route handler is the entry point.\n\n**False positive — do NOT flag this:**\n\n```ts\n// PR adds this line inside an existing function\nconst settings = user.getSettings();\n```\n\nWhy skip: Reading the full file reveals `user` is typed as `User` (not `User | null`), and the calling function only runs after `authenticateUser()` middleware which guarantees a valid user object. The null case is handled at a different layer.\n\n### Example 2: SQL Injection\n\n**True positive — flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE customer_id = '{customer_id}'\")\n```\n\nWhy flag: String interpolation in a SQL query with user-controlled input (`customer_id` comes from the request). No parameterization or sanitization anywhere in the call chain.\n\n**False positive — do NOT flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE status = '{OrderStatus.PENDING.value}'\")\n```\n\nWhy skip: The interpolated value is a hardcoded enum constant (`OrderStatus.PENDING`), not user input. There is no injection vector.\n\n### Example 3: Borderline — Do NOT Flag\n\n```go\n// PR adds this function\nfunc processItems(items []Item) []Result {\n results := make([]Result, 0)\n for _, item := range items {\n for _, tag := range item.Tags {\n results = append(results, process(item, tag))\n }\n }\n return results\n}\n```\n\nThis looks like an O(n*m) performance concern. 
But without evidence that `items` or `Tags` are large in practice, this is speculative. The function processes a bounded dataset (items from a single user request). Do not flag theoretical performance issues without evidence of real-world impact.\nREVIEW_EOF" - env: diff --git a/.github/workflows/gh-aw-pr-review.md b/.github/workflows/gh-aw-pr-review.md index aaab7018..c36bea38 100644 --- a/.github/workflows/gh-aw-pr-review.md +++ b/.github/workflows/gh-aw-pr-review.md @@ -120,25 +120,13 @@ Follow these steps in order. 4. Read `/tmp/pr-context/reviews.json` to check prior review submissions from this bot. Note any prior verdicts to avoid redundant reviews. 5. Read `/tmp/pr-context/review_comments.json` to check existing review threads. Note which files already have threads and whether they are resolved, unresolved, or outdated. -### Step 2: Sub-agent Review +### Step 2: Review -**File orderings are pre-computed** at `/tmp/pr-context/`: -- **Agent 1**: `/tmp/pr-context/file_order_az.txt` — alphabetical (A → Z) -- **Agent 2**: `/tmp/pr-context/file_order_za.txt` — reverse alphabetical (Z → A) -- **Agent 3**: `/tmp/pr-context/file_order_largest.txt` — by diff size descending - -**Spawn sub-agents:** Follow the **Pick Three, Keep Many** process — spawn 3 `code-review` sub-agents to review the PR diff in parallel. 
Each sub-agent prompt must include: -- Instruction to read `/tmp/pr-context/review-instructions.md` for the review process, criteria, and calibration examples -- Instruction to read `/tmp/pr-context/README.md` for a manifest of all available context files -- The review intensity (`${{ inputs.intensity }}`) and minimum severity (`${{ inputs.minimum_severity }}`) -- The path to that sub-agent's file ordering (e.g., `/tmp/pr-context/file_order_az.txt`) — tell it to read the file for its ordered list (per-file diffs are at `/tmp/pr-context/diffs/.diff`) -- Instruction to read changed files from the workspace (the PR branch is checked out) - -Each sub-agent returns a structured findings list. They do NOT leave inline comments. +Read `/tmp/pr-context/review-strategy.md` for the pre-computed review strategy. The strategy is determined by PR size and specifies the exact number of sub-agents (0, 2, or 3). Follow the instructions in that file exactly. ### Step 3: Verify and Comment -After merging and deduplicating sub-agent findings per the Pick Three, Keep Many process, verify each finding before leaving a comment. For every finding: +If sub-agents were used, merge and deduplicate findings per the Pick Three, Keep Many process. Verify each finding before leaving a comment. For every finding: 1. **Read the file and surrounding context** — open the full file, not just the diff. Understand the broader code. 2. **Construct a concrete failure scenario** — what specific input or state causes the bug? If you cannot describe one, drop the finding. 
From 736867627244642cbc09a157be74ba19b307a5c9 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Sun, 1 Mar 2026 03:01:43 +0000 Subject: [PATCH 3/4] Add PR review workflow copy without hardcoded 3-agent fragment Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- github/workflows/gh-aw-pr-review.md | 160 ++++++++++++++++++++++++++++ 1 file changed, 160 insertions(+) create mode 100644 github/workflows/gh-aw-pr-review.md diff --git a/github/workflows/gh-aw-pr-review.md b/github/workflows/gh-aw-pr-review.md new file mode 100644 index 00000000..7d6f22c8 --- /dev/null +++ b/github/workflows/gh-aw-pr-review.md @@ -0,0 +1,160 @@ +--- +inlined-imports: true +name: "PR Review" +description: "AI code review with inline comments on pull requests" +imports: + - gh-aw-fragments/elastic-tools.md + - gh-aw-fragments/runtime-setup.md + - gh-aw-fragments/formatting.md + - gh-aw-fragments/rigor.md + - gh-aw-fragments/mcp-pagination.md + - gh-aw-fragments/pr-context.md + - gh-aw-fragments/review-process.md + - gh-aw-fragments/messages-footer.md + - gh-aw-fragments/safe-output-review-comment.md + - gh-aw-fragments/safe-output-submit-review.md + - gh-aw-fragments/network-ecosystems.md +engine: + id: copilot + model: ${{ inputs.model }} + concurrency: + group: "gh-aw-copilot-${{ github.workflow }}-pr-review-${{ github.event.pull_request.number }}" +on: + workflow_call: + inputs: + model: + description: "AI model to use" + type: string + required: false + default: "gpt-5.3-codex" + additional-instructions: + description: "Repo-specific instructions appended to the agent prompt" + type: string + required: false + default: "" + setup-commands: + description: "Shell commands to run before the agent starts (dependency install, build, etc.)" + type: string + required: false + default: "" + allowed-bot-users: + description: "Allowlisted bot actor usernames (comma-separated)" + type: string + required: false + 
default: "github-actions[bot]" + intensity: + description: "Review intensity: conservative, balanced, or aggressive" + type: string + required: false + default: "balanced" + minimum_severity: + description: "Minimum severity for inline comments: critical, high, medium, low, or nitpick. Issues below this threshold go in a collapsible section of the review body instead." + type: string + required: false + default: "low" + messages-footer: + description: "Footer appended to all agent comments and reviews" + type: string + required: false + default: "" + create-pull-request-review-comment-max: + description: "Maximum number of review comments the agent can create per run" + type: string + required: false + default: "30" + secrets: + COPILOT_GITHUB_TOKEN: + required: true + roles: [admin, maintainer, write] + bots: + - "${{ inputs.allowed-bot-users }}" +concurrency: + group: ${{ github.workflow }}-pr-review-${{ github.event.pull_request.number }} + cancel-in-progress: true +permissions: + actions: read + contents: read + pull-requests: read + issues: read +tools: + github: + toolsets: [repos, issues, pull_requests, search, actions] + bash: true + web-fetch: +safe-outputs: + activation-comments: false +strict: false +timeout-minutes: 90 +steps: + - name: Repo-specific setup + if: ${{ inputs.setup-commands != '' }} + env: + SETUP_COMMANDS: ${{ inputs.setup-commands }} + run: eval "$SETUP_COMMANDS" +--- + +# PR Review Agent + +Review pull requests in ${{ github.repository }} and provide actionable feedback via inline review comments on specific code lines. + +## Context + +- **Repository**: ${{ github.repository }} +- **PR**: #${{ github.event.pull_request.number }} — ${{ github.event.pull_request.title }} +- **PR context on disk**: `/tmp/pr-context/` — PR metadata, diff, files, reviews, comments, and linked issues are pre-fetched. Read from these files instead of calling the API. + +## Constraints + +This workflow is read-only. 
You can read files, search code, run commands, and interact with PRs and issues — but your only outputs are inline review comments and a review submission. + +## Review Process + +Follow these steps in order. + +### Step 1: Gather Context + +1. Call `generate_agents_md` to get the repository's coding guidelines and conventions. Write the result to `/tmp/pr-context/agents.md` so sub-agents can read it. If `generate_agents_md` fails, continue without it. +2. Read `/tmp/pr-context/pr.json` for PR details (author, description, branches). +3. Read `/tmp/pr-context/issue-*.json` files if any exist to understand linked issue motivation and acceptance criteria. +4. Read `/tmp/pr-context/reviews.json` to check prior review submissions from this bot. Note any prior verdicts to avoid redundant reviews. +5. Read `/tmp/pr-context/review_comments.json` to check existing review threads. Note which files already have threads and whether they are resolved, unresolved, or outdated. + +### Step 2: Review + +Read `/tmp/pr-context/review-strategy.md` for the pre-computed review strategy. The strategy is determined by PR size and specifies the exact number of sub-agents (0, 2, or 3). Follow the instructions in that file exactly. + +### Step 3: Verify and Comment + +If sub-agents were used, merge and deduplicate findings per the Pick Three, Keep Many process. Verify each finding before leaving a comment. For every finding: + +1. **Read the file and surrounding context** — open the full file, not just the diff. Understand the broader code. +2. **Construct a concrete failure scenario** — what specific input or state causes the bug? If you cannot describe one, drop the finding. +3. **Challenge the finding** — would a senior engineer familiar with this codebase agree this is a real issue? If "probably not" or "unsure", drop it. +4. **Check existing threads** — if this issue was already flagged in a prior review (resolved or unresolved), do not duplicate. 
+ +Only leave a comment if the finding survives all four checks. Findings flagged independently by multiple sub-agents are stronger candidates. Findings from only one sub-agent deserve extra scrutiny. + +Leave inline comments (`create_pull_request_review_comment`) per the **Code Review Reference** above for each finding that survives verification. Comment on each file's findings before moving to the next file. If no findings survive verification, proceed directly to Step 4. + +### Step 4: Submit the Review + +**Skip if nothing new:** If you left zero inline comments during this review AND your verdict would be the same as the most recent review from this bot (compare against reviews in Step 1), call `noop` with a message like "No new findings — prior review still applies" and stop. Do not submit a redundant review. + +After all comments are posted, step back and consider the PR as a whole. Call **`submit_pull_request_review`** with: +- The review type (REQUEST_CHANGES, COMMENT, or APPROVE) +- A review body that is **only the verdict and only if the verdict is not APPROVE**. If you have cross-cutting feedback that spans multiple files or cannot be expressed as inline comments, include it here. Otherwise, leave the review body empty — your inline comments already contain the detail. + +**Bot-authored PRs:** If the PR author is `github-actions[bot]`, you can only submit a `COMMENT` review — `APPROVE` and `REQUEST_CHANGES` will fail because GitHub does not allow bot accounts to approve or request changes on their own PRs. Use `COMMENT` and state your verdict in the review body instead. + +**Do NOT** describe what the PR does, list the files you reviewed, summarize inline comments, or restate prior review feedback. The PR author already knows what their PR does. Your inline comments already contain all the detail. The review body exists solely to communicate the approve/request-changes decision and important/critical feedback that cannot be covered in inline comments. 
+ +If you have no issues, or you have only provided NITPICK and LOW issues, submit an APPROVE review. Otherwise, submit a REQUEST_CHANGES review. + +## Review Settings + +- **Intensity**: `${{ inputs.intensity }}` +- **Minimum inline severity**: `${{ inputs.minimum_severity }}` + +These override the defaults defined in the Code Review Reference above. + +${{ inputs.additional-instructions }} From 7b2304a867f7a91d8a72e8636ae96118055b719d Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Sun, 1 Mar 2026 03:06:44 +0000 Subject: [PATCH 4/4] Add conditional pick-three fragment copy Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../gh-aw-fragments/pick-three-keep-many.md | 22 +++++++++++++++++++ github/workflows/gh-aw-pr-review.md | 1 + 2 files changed, 23 insertions(+) create mode 100644 github/workflows/gh-aw-fragments/pick-three-keep-many.md diff --git a/github/workflows/gh-aw-fragments/pick-three-keep-many.md b/github/workflows/gh-aw-fragments/pick-three-keep-many.md new file mode 100644 index 00000000..d76bdd70 --- /dev/null +++ b/github/workflows/gh-aw-fragments/pick-three-keep-many.md @@ -0,0 +1,22 @@ +### Pick Three, Keep Many + +If your review strategy requires sub-agents, parallelize your work using sub-agents. Spawn the exact number of sub-agents specified by `/tmp/pr-context/review-strategy.md`, with each sub-agent approaching the task from a different angle (for example, different focus areas, heuristics, or file order). If the strategy says direct review, do not spawn sub-agents. + +**How to spawn sub-agents:** Call `runSubagent` with the `agentType` and `model` specified by the workflow instructions below (defaulting to `agentType: "general-purpose"` and `model: "${{ inputs.model }}"` if none are specified). Sub-agents cannot see your conversation history, the other sub-agents' results, or any context you have gathered so far. 
Each prompt must be **fully self-contained** — include everything the sub-agent needs: + +- The full task description and objective (restate it, don't summarize) +- All repository context, conventions, and constraints you've gathered (e.g., from `generate_agents_md`) +- Any relevant data the sub-agent needs to do its job (diffs, file contents, existing threads) +- The quality criteria and output format you expect +- The specific angle that distinguishes this sub-agent from the others + +Err on the side of providing too much context rather than too little. A well-informed sub-agent with a 10,000-token prompt will produce far better results than one that has to rediscover the codebase from scratch. + +**Wait for all spawned sub-agents to complete.** Do not proceed until every sub-agent you started has returned its result. + +**Merge and deduplicate findings** across all sub-agents: +1. If multiple sub-agents flagged the same issue, keep the version with the strongest evidence and clearest explanation. +2. If a finding is unique to one sub-agent, include it only if it passes the quality gate on its own merits — a finding flagged by only one sub-agent deserves extra scrutiny. +3. Drop any finding that does not meet the verification criteria. + +**Filter aggressively for quality.** Your job as the parent agent is to be the quality gate. Sub-agents cast a wide net; you decide what's worth keeping. For each surviving finding, verify it yourself — check that file paths exist, line numbers are accurate, the problem is real, and the finding is actionable. Discard anything vague, speculative, or already addressed. If no findings survive filtering, call `noop`. 
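Reviewer note, not part of the patch: the merge-and-deduplicate rules in the fragment above could be sketched in Python roughly as follows. The `Finding` shape, the `evidence_score` field, and the `(path, line, issue)` grouping key are assumptions made for illustration. The fragment does not define a data format, and the quality gate stays a judgment call by the parent agent rather than a numeric score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape for one sub-agent finding (not defined by the workflow)."""
    path: str
    line: int
    issue: str
    evidence_score: int  # assumed stand-in for "strongest evidence"
    agent: str           # which sub-agent reported it, e.g. "az", "za", "largest"


def merge_findings(findings: list[Finding]) -> list[tuple[Finding, int]]:
    """Group findings that describe the same issue and keep the strongest version.

    Returns (finding, agent_count) pairs. agent_count == 1 marks a finding that
    only one sub-agent reported, which the fragment says deserves extra scrutiny.
    """
    groups: dict[tuple[str, int, str], list[Finding]] = {}
    for f in findings:
        groups.setdefault((f.path, f.line, f.issue), []).append(f)

    merged = []
    for group in groups.values():
        # Keep the version with the highest (assumed) evidence score.
        best = max(group, key=lambda f: f.evidence_score)
        # Count distinct sub-agents that flagged this issue.
        agents = {f.agent for f in group}
        merged.append((best, len(agents)))
    return merged
```

Each surviving pair would still go through the four verification checks in Step 3 before any comment is posted; this sketch covers only the grouping step.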
diff --git a/github/workflows/gh-aw-pr-review.md b/github/workflows/gh-aw-pr-review.md index 7d6f22c8..c36bea38 100644 --- a/github/workflows/gh-aw-pr-review.md +++ b/github/workflows/gh-aw-pr-review.md @@ -13,6 +13,7 @@ imports: - gh-aw-fragments/messages-footer.md - gh-aw-fragments/safe-output-review-comment.md - gh-aw-fragments/safe-output-submit-review.md + - gh-aw-fragments/pick-three-keep-many.md - gh-aw-fragments/network-ecosystems.md engine: id: copilot
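Reviewer note, not part of the patch: the verdict rule in Step 4 of the workflow prompt (approve when there are no findings or only NITPICK and LOW ones, otherwise request changes, and fall back to `COMMENT` for PRs authored by `github-actions[bot]`) could be sketched like this. The function name, the lowercase severity strings, and the `(event, verdict)` return shape are assumptions for illustration only; the agent applies this rule from the prompt, not from code.

```python
# Severities that do not block approval, per the Step 4 rule.
NON_BLOCKING = {"nitpick", "low"}


def choose_review_event(severities: list[str], pr_author: str) -> tuple[str, str]:
    """Return (event, verdict) for the review submission.

    `event` is the review type to submit; `verdict` is the decision that would
    be stated in the review body when the event has to be COMMENT.
    """
    has_blocking = any(s.lower() not in NON_BLOCKING for s in severities)
    verdict = "REQUEST_CHANGES" if has_blocking else "APPROVE"

    # The prompt says APPROVE and REQUEST_CHANGES fail on PRs authored by
    # github-actions[bot], so the verdict goes in the body of a COMMENT review.
    if pr_author == "github-actions[bot]":
        return "COMMENT", verdict
    return verdict, verdict
```

For example, a review with one `medium` finding on a bot-authored PR would yield a `COMMENT` event carrying a `REQUEST_CHANGES` verdict in the body.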