diff --git a/.github/workflows/gh-aw-estc-docs-pr-review.lock.yml b/.github/workflows/gh-aw-estc-docs-pr-review.lock.yml index 3351230d..5b8a9cfe 100644 --- a/.github/workflows/gh-aw-estc-docs-pr-review.lock.yml +++ b/.github/workflows/gh-aw-estc-docs-pr-review.lock.yml @@ -302,9 +302,9 @@ jobs: ## create-pull-request-review-comment - **Required fields**: `path` (file path), `line` (line number), and `body` (comment text). - - **Line**: Must be within the diff — an added or context line in the patch. Must be the **exact line number from reading the file** (not estimated from the patch). Lines outside the diff will fail. + - **Line**: Must be a numbered line in the per-file diff (`/tmp/pr-context/diffs/.diff`). Only lines with a number prefix are commentable. Lines outside the diff will fail with a GitHub API error. - **Body**: Sanitized with the standard pipeline (mentions neutralized, HTML filtered, URLs restricted). GitHub API limit is ~65,536 characters. - - **Side**: Defaults to `RIGHT` (the new code). Use `LEFT` only when commenting on deleted lines. + - **Side**: Always `RIGHT` (the new code). Deleted lines have no number prefix in the diff and cannot be targeted for inline comments — include findings about deleted code in the review body instead. - **Suggestion blocks**: Use ` ```suggestion ` fences for concrete code fixes. The suggestion must actually change the code — don't suggest identical code. Only include a `suggestion` block when you can provide a concrete code fix that **actually changes** the code. Only flag issues you are confident are real problems — false positives erode trust. Once you have flagged an issue, you cannot unflag it. diff --git a/.github/workflows/gh-aw-fragments/pr-context.md b/.github/workflows/gh-aw-fragments/pr-context.md index a3a74fc9..48bbc902 100644 --- a/.github/workflows/gh-aw-fragments/pr-context.md +++ b/.github/workflows/gh-aw-fragments/pr-context.md @@ -22,11 +22,19 @@ steps: gh api "repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files" --paginate \ | jq -s 'add // []' > /tmp/pr-context/files.json - # Per-file diffs + # Per-file diffs with line numbers + # Lines with a number prefix are commentable (RIGHT side: added or context). + # Deleted lines get no number (LEFT side only). jq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do filename=$(echo "$entry" | jq -r '.filename') mkdir -p "/tmp/pr-context/diffs/$(dirname "$filename")" - echo "$entry" | jq -r '.patch // empty' > "/tmp/pr-context/diffs/${filename}.diff" + echo "$entry" | jq -r '.patch // empty' | awk ' + /^@@/ { s=$3; gsub(/\+/,"",s); split(s,a,","); line=a[1]+0; print; next } + /^\+/ { printf "%d\t%s\n", line++, $0; next } + /^-/ { printf " \t%s\n", $0; next } + /^ / { printf "%d\t%s\n", line++, $0; next } + { print } + ' > "/tmp/pr-context/diffs/${filename}.diff" done # File orderings for sub-agent review (3 strategies) @@ -126,7 +134,7 @@ steps: | `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL | | `pr.diff` | Full unified diff of all changes | | `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` | - | `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` | + | `diffs/.diff` | Per-file diffs with line numbers — numbered lines (e.g. `405\t+ code`) are commentable; lines without numbers are deleted (LEFT side only) | | `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line | | `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line | | `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line | @@ -143,9 +151,7 @@ steps: | `review-instructions.md` | Review instructions, criteria, and calibration examples for sub-agents | | `agent-review.md` | Main agent instructions — review approach resolved from PR size (written when `ready_to_code_review` is called) | | `parent-review.md` | Comment format and inline severity threshold for the parent agent (written when `ready_to_code_review` is called) | - | `subagent-az.md` | Sub-agent instructions: review files A → Z (written when `ready_to_code_review` is called) | - | `subagent-za.md` | Sub-agent instructions: review files Z → A (written when `ready_to_code_review` is called) | - | `subagent-largest.md` | Sub-agent instructions: review files largest diff first (written when `ready_to_code_review` is called) | + | `subagent-*.md` | Sub-agent instructions (only written for medium/large PRs when `ready_to_code_review` is called — check `agent-review.md` for which exist) | MANIFEST echo "PR context written to /tmp/pr-context/" diff --git a/.github/workflows/gh-aw-fragments/review-process.md b/.github/workflows/gh-aw-fragments/review-process.md index 36b1620a..029ccdb9 100644 --- a/.github/workflows/gh-aw-fragments/review-process.md +++ b/.github/workflows/gh-aw-fragments/review-process.md @@ -21,7 +21,7 @@ steps: Review the PR diff file by file in your assigned order. For each changed file: - 1. **Read the diff** for this file from `/tmp/pr-context/diffs/.diff` to understand what changed. If the diff is empty or truncated (e.g., binary files or very large changes), fall back to reading the full file from the workspace and comparing against context. + 1. **Read the diff** for this file from `/tmp/pr-context/diffs/.diff` to understand what changed. Each line is prefixed with its line number (e.g., `405\t+ code`). Lines with numbers are commentable; lines without numbers are deleted lines (LEFT side only). If the diff is empty or truncated (e.g., binary files or very large changes), fall back to reading the full file from the workspace and comparing against context. 2. **Read the full file from the workspace.** The PR branch is checked out locally — open the file directly to get complete contents with line numbers. 3. **Check existing threads** for this file from `/tmp/pr-context/threads/.json` (if it exists). Skip issues that are already under discussion — each thread has `isResolved` and `isOutdated` fields. 4. **Identify potential issues** matching the review criteria below. diff --git a/.github/workflows/gh-aw-fragments/safe-output-code-review.md b/.github/workflows/gh-aw-fragments/safe-output-code-review.md index c318f705..29435cb4 100644 --- a/.github/workflows/gh-aw-fragments/safe-output-code-review.md +++ b/.github/workflows/gh-aw-fragments/safe-output-code-review.md @@ -17,9 +17,9 @@ safe-inputs: pr_size = 'unknown size' diff_lines = 0 - # Write one instruction file per sub-agent file ordering strategy - for key, desc in [('az', 'A \u2192 Z (alphabetical)'), ('za', 'Z \u2192 A (reverse alphabetical)'), ('largest', 'largest diff first')]: - lines = [ + # Determine review approach based on PR size + def write_subagent(key, desc): + sa_lines = [ '# PR Review Sub-Agent', '', 'Review the PR as a code review sub-agent. Return findings only \u2014 do NOT leave inline comments.', @@ -40,16 +40,17 @@ safe-inputs: '', f'Review files in this order: `/tmp/pr-context/file_order_{key}.txt` ({desc})', ] - with open(f'/tmp/pr-context/subagent-{key}.md', 'w') as f: - f.write('\n'.join(lines) + '\n') + with open(f'/tmp/pr-context/subagent-{key}.md', 'w') as sa_f: + sa_f.write('\n'.join(sa_lines) + '\n') - # Determine review approach based on PR size if diff_lines < 200: approach_lines = [ f'**Small PR ({pr_size}):** Review directly \u2014 no sub-agents. Review files in order from `/tmp/pr-context/file_order_az.txt`, reading each diff from `/tmp/pr-context/diffs/.diff` and the full file from the workspace.', ] size_key = 'small' elif diff_lines < 800: + write_subagent('az', 'A \u2192 Z (alphabetical)') + write_subagent('za', 'Z \u2192 A (reverse alphabetical)') approach_lines = [ f'**Medium PR ({pr_size}):** Use the **Pick Three, Keep Many** process \u2014 spawn 2 `code-review` sub-agents in parallel:', '', @@ -60,6 +61,9 @@ safe-inputs: ] size_key = 'medium' else: + write_subagent('az', 'A \u2192 Z (alphabetical)') + write_subagent('za', 'Z \u2192 A (reverse alphabetical)') + write_subagent('largest', 'largest diff first') approach_lines = [ f'**Large PR ({pr_size}):** Use the **Pick Three, Keep Many** process \u2014 spawn 3 `code-review` sub-agents in parallel:', '', @@ -84,8 +88,7 @@ safe-inputs: '## Comment Format', '', 'Call **`create_pull_request_review_comment`** with:', - '- The file path and the **exact line number from reading the file** (not estimated from the diff)', - '- The line must be within the diff (an added or context line in the patch)', + '- The file path and a **line number that appears in the numbered diff** (`/tmp/pr-context/diffs/.diff`). Only lines with a number prefix are commentable. If your target line has no number in the diff, include the finding in the review body instead.', '', t5, '**[SEVERITY] Brief title**', diff --git a/.github/workflows/gh-aw-fragments/safe-output-review-comment.md b/.github/workflows/gh-aw-fragments/safe-output-review-comment.md index 57979762..76794447 100644 --- a/.github/workflows/gh-aw-fragments/safe-output-review-comment.md +++ b/.github/workflows/gh-aw-fragments/safe-output-review-comment.md @@ -7,9 +7,9 @@ safe-outputs: ## create-pull-request-review-comment - **Required fields**: `path` (file path), `line` (line number), and `body` (comment text). -- **Line**: Must be within the diff — an added or context line in the patch. Must be the **exact line number from reading the file** (not estimated from the patch). Lines outside the diff will fail. +- **Line**: Must be a numbered line in the per-file diff (`/tmp/pr-context/diffs/.diff`). Only lines with a number prefix are commentable. Lines outside the diff will fail with a GitHub API error. - **Body**: Sanitized with the standard pipeline (mentions neutralized, HTML filtered, URLs restricted). GitHub API limit is ~65,536 characters. -- **Side**: Defaults to `RIGHT` (the new code). Use `LEFT` only when commenting on deleted lines. +- **Side**: Always `RIGHT` (the new code). Deleted lines have no number prefix in the diff and cannot be targeted for inline comments — include findings about deleted code in the review body instead. - **Suggestion blocks**: Use ` ```suggestion ` fences for concrete code fixes. The suggestion must actually change the code — don't suggest identical code. Only include a `suggestion` block when you can provide a concrete code fix that **actually changes** the code. Only flag issues you are confident are real problems — false positives erode trust. Once you have flagged an issue, you cannot unflag it. \ No newline at end of file diff --git a/.github/workflows/gh-aw-mention-in-pr-by-id.lock.yml b/.github/workflows/gh-aw-mention-in-pr-by-id.lock.yml index 922ab4d5..f9b73c35 100644 --- a/.github/workflows/gh-aw-mention-in-pr-by-id.lock.yml +++ b/.github/workflows/gh-aw-mention-in-pr-by-id.lock.yml @@ -46,7 +46,7 @@ # # inlined-imports: true # -# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"0dca38278f9109c09297caa208f8cdfbe276764e3859deac129bf82e4c953b23"} +# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"6e38d9fdce08dc549a7302d86c523d0e504363054f235e56bc7abe644f8c6d35"} name: "Mention in PR by ID" "on": @@ -389,9 +389,9 @@ jobs: ## create-pull-request-review-comment - **Required fields**: `path` (file path), `line` (line number), and `body` (comment text). - - **Line**: Must be within the diff — an added or context line in the patch. Must be the **exact line number from reading the file** (not estimated from the patch). Lines outside the diff will fail. + - **Line**: Must be a numbered line in the per-file diff (`/tmp/pr-context/diffs/.diff`). Only lines with a number prefix are commentable. Lines outside the diff will fail with a GitHub API error. - **Body**: Sanitized with the standard pipeline (mentions neutralized, HTML filtered, URLs restricted). GitHub API limit is ~65,536 characters. - - **Side**: Defaults to `RIGHT` (the new code). Use `LEFT` only when commenting on deleted lines. + - **Side**: Always `RIGHT` (the new code). Deleted lines have no number prefix in the diff and cannot be targeted for inline comments — include findings about deleted code in the review body instead. - **Suggestion blocks**: Use ` ```suggestion ` fences for concrete code fixes. The suggestion must actually change the code — don't suggest identical code. Only include a `suggestion` block when you can provide a concrete code fix that **actually changes** the code. Only flag issues you are confident are real problems — false positives erode trust. Once you have flagged an issue, you cannot unflag it. @@ -648,9 +648,9 @@ jobs: GH_TOKEN: ${{ github.token }} PR_NUMBER: ${{ github.event.pull_request.number || inputs.target-pr-number || github.event.issue.number }} name: Fetch PR context to disk - run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Compute PR size metrics for review fan-out decisions\nFILE_COUNT=$(jq 'length' /tmp/pr-context/files.json)\nDIFF_LINES=$(wc -l < /tmp/pr-context/pr.diff | tr -d ' ')\necho \"${FILE_COUNT} files, ${DIFF_LINES} diff lines\" > /tmp/pr-context/pr-size.txt\necho \"PR size: ${FILE_COUNT} files, ${DIFF_LINES} diff lines\"\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `pr-size.txt` | PR size metrics: `{N} files, {M} diff lines` — used by the agent to decide review fan-out |\n| `agents.md` | Not used in PR context — repository conventions are at `/tmp/agents.md` (pre-fetched) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples for sub-agents |\n| `agent-review.md` | Main agent instructions — review approach resolved from PR size (written when `ready_to_code_review` is called) |\n| `parent-review.md` | Comment format and inline severity threshold for the parent agent (written when `ready_to_code_review` is called) |\n| `subagent-az.md` | Sub-agent instructions: review files A → Z (written when `ready_to_code_review` is called) |\n| `subagent-za.md` | Sub-agent instructions: review files Z → A (written when `ready_to_code_review` is called) |\n| `subagent-largest.md` | Sub-agent instructions: review files largest diff first (written when `ready_to_code_review` is called) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" + run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs with line numbers\n# Lines with a number prefix are commentable (RIGHT side: added or context).\n# Deleted lines get no number (LEFT side only).\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' | awk '\n /^@@/ { s=$3; gsub(/\\+/,\"\",s); split(s,a,\",\"); line=a[1]+0; print; next }\n /^\\+/ { printf \"%d\\t%s\\n\", line++, $0; next }\n /^-/ { printf \" \\t%s\\n\", $0; next }\n /^ / { printf \"%d\\t%s\\n\", line++, $0; next }\n { print }\n ' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Compute PR size metrics for review fan-out decisions\nFILE_COUNT=$(jq 'length' /tmp/pr-context/files.json)\nDIFF_LINES=$(wc -l < /tmp/pr-context/pr.diff | tr -d ' ')\necho \"${FILE_COUNT} files, ${DIFF_LINES} diff lines\" > /tmp/pr-context/pr-size.txt\necho \"PR size: ${FILE_COUNT} files, ${DIFF_LINES} diff lines\"\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs with line numbers — numbered lines (e.g. `405\\t+ code`) are commentable; lines without numbers are deleted (LEFT side only) |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `pr-size.txt` | PR size metrics: `{N} files, {M} diff lines` — used by the agent to decide review fan-out |\n| `agents.md` | Not used in PR context — repository conventions are at `/tmp/agents.md` (pre-fetched) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples for sub-agents |\n| `agent-review.md` | Main agent instructions — review approach resolved from PR size (written when `ready_to_code_review` is called) |\n| `parent-review.md` | Comment format and inline severity threshold for the parent agent (written when `ready_to_code_review` is called) |\n| `subagent-*.md` | Sub-agent instructions (only written for medium/large PRs when `ready_to_code_review` is called — check `agent-review.md` for which exist) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" - name: Write review instructions to disk - run: "mkdir -p /tmp/pr-context\ncat > /tmp/pr-context/review-instructions.md << 'REVIEW_EOF'\n# Review Instructions for Sub-agents\n\nYou are a code review sub-agent. Read these instructions, then review the PR files in the order provided in your prompt.\n\n## Context\n\nBefore reviewing files, read these to understand the PR:\n\n1. `/tmp/pr-context/pr.json` — PR title, description, author, and branches. Understand what the PR is trying to accomplish.\n2. `/tmp/agents.md` — Repository coding conventions and guidelines (skip if missing).\n3. `/tmp/pr-context/review_comments.json` — Existing review threads. Note which files already have threads so you don't duplicate.\n4. `/tmp/pr-context/issue-*.json` — Linked issue details (if any). Understand the motivation and acceptance criteria.\n\n## Process\n\nReview the PR diff file by file in your assigned order. For each changed file:\n\n1. **Read the diff** for this file from `/tmp/pr-context/diffs/.diff` to understand what changed. If the diff is empty or truncated (e.g., binary files or very large changes), fall back to reading the full file from the workspace and comparing against context.\n2. **Read the full file from the workspace.** The PR branch is checked out locally — open the file directly to get complete contents with line numbers.\n3. **Check existing threads** for this file from `/tmp/pr-context/threads/.json` (if it exists). Skip issues that are already under discussion — each thread has `isResolved` and `isOutdated` fields.\n4. **Identify potential issues** matching the review criteria below.\n5. **Quick-check each issue** before including it:\n - What specific code pattern or change triggers this concern?\n - Is there an obvious guard, handler, or mitigation visible in the immediate context?\n - Can you describe a concrete failure scenario (the `evidence` field)? If you cannot articulate what specific input or state triggers the problem, drop the finding.\n - If the issue is clearly handled, skip it. If you're unsure, include it — the parent will verify.\n6. **Add to your findings list.** Do NOT leave inline comments — you don't have that tool. Return findings in this format:\n\n```\n- file: path/to/file\n line: 42\n severity: HIGH\n title: Brief title\n description: What the issue is and why it matters\n evidence: The specific code pattern and failure scenario\n suggestion: corrected code here (optional — only if you can provide a concrete fix)\n```\n\n**Review every file in your assigned order.** Files reviewed earlier get more attention, which is why different sub-agents use different orderings.\n\n**Check existing threads** — per-file threads are at `/tmp/pr-context/threads/.json` (step 3 above). The full list is at `/tmp/pr-context/review_comments.json`. Do not flag issues that are already under discussion (resolved or unresolved). For outdated threads, only re-flag if the issue still applies to the current diff.\n\n**Return your full findings list** when done, or an empty list if no issues were found.\n\n## Review Criteria\n\nFocus on these categories in priority order:\n\n1. Security vulnerabilities (injection, XSS, auth bypass, secrets exposure)\n2. Logic bugs that could cause runtime failures or incorrect behavior\n3. Data integrity issues (race conditions, missing transactions, corruption risk)\n4. Performance bottlenecks (N+1 queries, memory leaks, blocking operations)\n5. Error handling gaps (unhandled exceptions, missing validation)\n6. Breaking changes to public APIs without migration path\n7. Missing or incorrect test coverage for critical paths\n\n## What NOT to Flag\n\nOnly review the diff — do not flag issues in unchanged code, pre-existing problems not introduced by this PR, or style preferences handled by linters or formatters.\n\n**Common false positives** — these patterns look like issues but usually aren't. Before flagging anything in these categories, confirm the problem is real by reading the surrounding code:\n\n- **Security — input already sanitized:** Don't flag injection or XSS risks when inputs are sanitized upstream, parameterized queries are used, or the framework auto-escapes output.\n- **Null/undefined — guarded elsewhere:** Don't flag potential null dereferences if the value is guaranteed by a type guard, assertion, schema validation, or upstream null check.\n- **Error handling — handled at a different layer:** Don't flag missing try/catch if the caller, middleware, or framework catches and handles the error (e.g., Express error middleware, React error boundaries).\n- **Performance — theoretical, not practical:** Don't flag algorithmic complexity (e.g., O(n^2)) unless N is demonstrably large enough to matter in the actual usage context. \"This could be slow\" without evidence is not actionable.\n- **Validation — exists at another layer:** Don't flag missing input validation if it's handled by an API gateway, middleware, schema validator, or type system.\n- **Test coverage — trivial or generated code:** Don't flag missing tests for trivial getters/setters, auto-generated code, or simple delegation methods.\n- **Style or naming — not in coding guidelines:** Don't flag naming conventions or code style unless they violate the repository's documented coding guidelines (from `/tmp/agents.md` or CONTRIBUTING docs).\n\n**Existing review threads** — check BEFORE flagging any issue:\n\n- **Resolved with reviewer reply** (e.g. \"This is intentional\") — reviewer's decision is final. Do NOT re-flag.\n- **Resolved without reply** — author likely fixed it. Do NOT re-raise unless the fix introduced a new problem.\n- **Unresolved** — already flagged. Do NOT duplicate.\n- **Outdated** — only re-flag if the issue still applies to the current diff.\n\nWhen in doubt, do not duplicate. Redundant comments erode trust.\n\nFinding no issues is a valid and valuable outcome. An empty findings list is better than findings that waste the author's time or erode trust. Do not manufacture findings to justify your review — if the code is sound, return an empty list.\n\n## Severity Classification\n\nDetermine severity AFTER investigating the issue, not before. First identify the problem and trace through the code, then assign a severity based on the evidence you found.\n\n- 🔴 CRITICAL — Must fix before merge (security vulnerabilities, data corruption, production-breaking bugs)\n- 🟠 HIGH — Should fix before merge (logic errors, missing validation, significant performance issues)\n- 🟡 MEDIUM — Address soon, non-blocking (error handling gaps, suboptimal patterns, missing edge cases)\n- ⚪ LOW — Author discretion (minor improvements, documentation gaps)\n- 💬 NITPICK — Truly optional (stylistic preferences, alternative approaches)\n\n## Review Intensity\n\nThe review intensity is `${{ inputs.intensity || 'balanced' }}`.\n\n- **conservative**: High evidence bar. Only flag when you can demonstrate a concrete failure scenario. If you can construct a reasonable counterargument, do not flag. Approval with zero findings is the expected outcome for most PRs.\n- **balanced**: Standard evidence bar. Flag when you can point to specific code that would fail. If the issue is ambiguous, lean toward not flagging.\n- **aggressive**: Lower evidence bar. Flag when evidence exists even if the failure scenario is not fully confirmed. Improvement suggestions welcome but must cite specific code.\n\n## Calibration Examples\n\nUse these examples to calibrate your judgment. Each pair shows a real issue and a similar-looking pattern that is NOT an issue.\n\n### Example 1: Null/Undefined Access\n\n**True positive — flag this:**\n\n```js\n// PR adds this handler\napp.get('/user/:id', async (req, res) => {\n const user = await db.findUser(req.params.id);\n res.json({ name: user.name, email: user.email });\n});\n```\n\nWhy flag: `db.findUser()` can return `null` when no user matches the ID. Accessing `user.name` will throw a TypeError at runtime. No upstream guard exists — the route handler is the entry point.\n\n**False positive — do NOT flag this:**\n\n```ts\n// PR adds this line inside an existing function\nconst settings = user.getSettings();\n```\n\nWhy skip: Reading the full file reveals `user` is typed as `User` (not `User | null`), and the calling function only runs after `authenticateUser()` middleware which guarantees a valid user object. The null case is handled at a different layer.\n\n### Example 2: SQL Injection\n\n**True positive — flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE customer_id = '{customer_id}'\")\n```\n\nWhy flag: String interpolation in a SQL query with user-controlled input (`customer_id` comes from the request). No parameterization or sanitization anywhere in the call chain.\n\n**False positive — do NOT flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE status = '{OrderStatus.PENDING.value}'\")\n```\n\nWhy skip: The interpolated value is a hardcoded enum constant (`OrderStatus.PENDING`), not user input. There is no injection vector.\n\n### Example 3: Borderline — Do NOT Flag\n\n```go\n// PR adds this function\nfunc processItems(items []Item) []Result {\n results := make([]Result, 0)\n for _, item := range items {\n for _, tag := range item.Tags {\n results = append(results, process(item, tag))\n }\n }\n return results\n}\n```\n\nThis looks like an O(n*m) performance concern. But without evidence that `items` or `Tags` are large in practice, this is speculative. The function processes a bounded dataset (items from a single user request). Do not flag theoretical performance issues without evidence of real-world impact.\nREVIEW_EOF" + run: "mkdir -p /tmp/pr-context\ncat > /tmp/pr-context/review-instructions.md << 'REVIEW_EOF'\n# Review Instructions for Sub-agents\n\nYou are a code review sub-agent. Read these instructions, then review the PR files in the order provided in your prompt.\n\n## Context\n\nBefore reviewing files, read these to understand the PR:\n\n1. `/tmp/pr-context/pr.json` — PR title, description, author, and branches. Understand what the PR is trying to accomplish.\n2. `/tmp/agents.md` — Repository coding conventions and guidelines (skip if missing).\n3. `/tmp/pr-context/review_comments.json` — Existing review threads. Note which files already have threads so you don't duplicate.\n4. `/tmp/pr-context/issue-*.json` — Linked issue details (if any). Understand the motivation and acceptance criteria.\n\n## Process\n\nReview the PR diff file by file in your assigned order. For each changed file:\n\n1. **Read the diff** for this file from `/tmp/pr-context/diffs/.diff` to understand what changed. Each line is prefixed with its line number (e.g., `405\\t+ code`). Lines with numbers are commentable; lines without numbers are deleted lines (LEFT side only). If the diff is empty or truncated (e.g., binary files or very large changes), fall back to reading the full file from the workspace and comparing against context.\n2. **Read the full file from the workspace.** The PR branch is checked out locally — open the file directly to get complete contents with line numbers.\n3. **Check existing threads** for this file from `/tmp/pr-context/threads/.json` (if it exists). Skip issues that are already under discussion — each thread has `isResolved` and `isOutdated` fields.\n4. **Identify potential issues** matching the review criteria below.\n5. **Quick-check each issue** before including it:\n - What specific code pattern or change triggers this concern?\n - Is there an obvious guard, handler, or mitigation visible in the immediate context?\n - Can you describe a concrete failure scenario (the `evidence` field)? If you cannot articulate what specific input or state triggers the problem, drop the finding.\n - If the issue is clearly handled, skip it. If you're unsure, include it — the parent will verify.\n6. **Add to your findings list.** Do NOT leave inline comments — you don't have that tool. Return findings in this format:\n\n```\n- file: path/to/file\n line: 42\n severity: HIGH\n title: Brief title\n description: What the issue is and why it matters\n evidence: The specific code pattern and failure scenario\n suggestion: corrected code here (optional — only if you can provide a concrete fix)\n```\n\n**Review every file in your assigned order.** Files reviewed earlier get more attention, which is why different sub-agents use different orderings.\n\n**Check existing threads** — per-file threads are at `/tmp/pr-context/threads/.json` (step 3 above). The full list is at `/tmp/pr-context/review_comments.json`. Do not flag issues that are already under discussion (resolved or unresolved). For outdated threads, only re-flag if the issue still applies to the current diff.\n\n**Return your full findings list** when done, or an empty list if no issues were found.\n\n## Review Criteria\n\nFocus on these categories in priority order:\n\n1. Security vulnerabilities (injection, XSS, auth bypass, secrets exposure)\n2. Logic bugs that could cause runtime failures or incorrect behavior\n3. Data integrity issues (race conditions, missing transactions, corruption risk)\n4. Performance bottlenecks (N+1 queries, memory leaks, blocking operations)\n5. Error handling gaps (unhandled exceptions, missing validation)\n6. Breaking changes to public APIs without migration path\n7. Missing or incorrect test coverage for critical paths\n\n## What NOT to Flag\n\nOnly review the diff — do not flag issues in unchanged code, pre-existing problems not introduced by this PR, or style preferences handled by linters or formatters.\n\n**Common false positives** — these patterns look like issues but usually aren't. Before flagging anything in these categories, confirm the problem is real by reading the surrounding code:\n\n- **Security — input already sanitized:** Don't flag injection or XSS risks when inputs are sanitized upstream, parameterized queries are used, or the framework auto-escapes output.\n- **Null/undefined — guarded elsewhere:** Don't flag potential null dereferences if the value is guaranteed by a type guard, assertion, schema validation, or upstream null check.\n- **Error handling — handled at a different layer:** Don't flag missing try/catch if the caller, middleware, or framework catches and handles the error (e.g., Express error middleware, React error boundaries).\n- **Performance — theoretical, not practical:** Don't flag algorithmic complexity (e.g., O(n^2)) unless N is demonstrably large enough to matter in the actual usage context. \"This could be slow\" without evidence is not actionable.\n- **Validation — exists at another layer:** Don't flag missing input validation if it's handled by an API gateway, middleware, schema validator, or type system.\n- **Test coverage — trivial or generated code:** Don't flag missing tests for trivial getters/setters, auto-generated code, or simple delegation methods.\n- **Style or naming — not in coding guidelines:** Don't flag naming conventions or code style unless they violate the repository's documented coding guidelines (from `/tmp/agents.md` or CONTRIBUTING docs).\n\n**Existing review threads** — check BEFORE flagging any issue:\n\n- **Resolved with reviewer reply** (e.g. \"This is intentional\") — reviewer's decision is final. Do NOT re-flag.\n- **Resolved without reply** — author likely fixed it. Do NOT re-raise unless the fix introduced a new problem.\n- **Unresolved** — already flagged. Do NOT duplicate.\n- **Outdated** — only re-flag if the issue still applies to the current diff.\n\nWhen in doubt, do not duplicate. Redundant comments erode trust.\n\nFinding no issues is a valid and valuable outcome. An empty findings list is better than findings that waste the author's time or erode trust. Do not manufacture findings to justify your review — if the code is sound, return an empty list.\n\n## Severity Classification\n\nDetermine severity AFTER investigating the issue, not before. First identify the problem and trace through the code, then assign a severity based on the evidence you found.\n\n- 🔴 CRITICAL — Must fix before merge (security vulnerabilities, data corruption, production-breaking bugs)\n- 🟠 HIGH — Should fix before merge (logic errors, missing validation, significant performance issues)\n- 🟡 MEDIUM — Address soon, non-blocking (error handling gaps, suboptimal patterns, missing edge cases)\n- ⚪ LOW — Author discretion (minor improvements, documentation gaps)\n- 💬 NITPICK — Truly optional (stylistic preferences, alternative approaches)\n\n## Review Intensity\n\nThe review intensity is `${{ inputs.intensity || 'balanced' }}`.\n\n- **conservative**: High evidence bar. Only flag when you can demonstrate a concrete failure scenario. If you can construct a reasonable counterargument, do not flag. Approval with zero findings is the expected outcome for most PRs.\n- **balanced**: Standard evidence bar. Flag when you can point to specific code that would fail. If the issue is ambiguous, lean toward not flagging.\n- **aggressive**: Lower evidence bar. Flag when evidence exists even if the failure scenario is not fully confirmed. Improvement suggestions welcome but must cite specific code.\n\n## Calibration Examples\n\nUse these examples to calibrate your judgment. Each pair shows a real issue and a similar-looking pattern that is NOT an issue.\n\n### Example 1: Null/Undefined Access\n\n**True positive — flag this:**\n\n```js\n// PR adds this handler\napp.get('/user/:id', async (req, res) => {\n const user = await db.findUser(req.params.id);\n res.json({ name: user.name, email: user.email });\n});\n```\n\nWhy flag: `db.findUser()` can return `null` when no user matches the ID. Accessing `user.name` will throw a TypeError at runtime. No upstream guard exists — the route handler is the entry point.\n\n**False positive — do NOT flag this:**\n\n```ts\n// PR adds this line inside an existing function\nconst settings = user.getSettings();\n```\n\nWhy skip: Reading the full file reveals `user` is typed as `User` (not `User | null`), and the calling function only runs after `authenticateUser()` middleware which guarantees a valid user object. The null case is handled at a different layer.\n\n### Example 2: SQL Injection\n\n**True positive — flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE customer_id = '{customer_id}'\")\n```\n\nWhy flag: String interpolation in a SQL query with user-controlled input (`customer_id` comes from the request). No parameterization or sanitization anywhere in the call chain.\n\n**False positive — do NOT flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE status = '{OrderStatus.PENDING.value}'\")\n```\n\nWhy skip: The interpolated value is a hardcoded enum constant (`OrderStatus.PENDING`), not user input. There is no injection vector.\n\n### Example 3: Borderline — Do NOT Flag\n\n```go\n// PR adds this function\nfunc processItems(items []Item) []Result {\n results := make([]Result, 0)\n for _, item := range items {\n for _, tag := range item.Tags {\n results = append(results, process(item, tag))\n }\n }\n return results\n}\n```\n\nThis looks like an O(n*m) performance concern. But without evidence that `items` or `Tags` are large in practice, this is speculative. The function processes a bounded dataset (items from a single user request). Do not flag theoretical performance issues without evidence of real-world impact.\nREVIEW_EOF" - name: Write Playwright instructions to disk run: "cat > /tmp/playwright-instructions.md << 'EOF'\n# Playwright MCP Tools\n\nUse Playwright MCP tools for interactive browser automation.\nUse these tools to explore the app step by step — do NOT write Node.js scripts.\n\n## Available tools\n\n- `browser_navigate` — go to a URL\n- `browser_click` — click an element\n- `browser_type` — type text into an input\n- `browser_snapshot` — get an accessibility tree (YAML) of the current page\n- `browser_take_screenshot` — capture a screenshot\n- `browser_console_execute` — run JavaScript in the browser console\n\n## Why MCP tools instead of scripts\n\nMCP tools are interactive: you see the page state after each action and\ndecide what to do next. This is ideal for exploratory testing where you\nneed to adapt based on what you find. Scripts are fire-and-forget — if\na selector is wrong, you don't find out until the script fails.\n\n## Measuring DOM properties\n\nFor programmatic checks (e.g. element heights, contrast), use\n`browser_console_execute`:\n\n```javascript\n(() => {\n const els = document.querySelectorAll('input, button, [role=\"combobox\"], [role=\"button\"]');\n return JSON.stringify(Array.from(els)\n .map(el => {\n const r = el.getBoundingClientRect();\n return { tag: el.tagName, h: Math.round(r.height), top: Math.round(r.top), text: el.textContent?.trim().slice(0, 20) };\n })\n .filter(el => el.top > 50 && el.top < 250));\n})()\n```\n\n## Handling failures\n\n- Do not retry the same action more than twice — the page is in a different state than expected.\n- Diagnose before moving on: use `browser_take_screenshot` and `browser_snapshot` to see what's on the page.\n- Adapt (different selector, different path) or report the failure as a finding.\n- Never claim you verified something you didn't — if it failed and you skipped it, say so.\nEOF" - env: @@ -1201,9 +1201,9 @@ jobs: pr_size = 'unknown size' diff_lines = 0 - # Write one instruction file per sub-agent file ordering strategy - for key, desc in [('az', 'A \u2192 Z (alphabetical)'), ('za', 'Z \u2192 A (reverse alphabetical)'), ('largest', 'largest diff first')]: - lines = [ + # Determine review approach based on PR size + def write_subagent(key, desc): + sa_lines = [ '# PR Review Sub-Agent', '', 'Review the PR as a code review sub-agent. Return findings only \u2014 do NOT leave inline comments.', @@ -1224,16 +1224,17 @@ jobs: '', f'Review files in this order: `/tmp/pr-context/file_order_{key}.txt` ({desc})', ] - with open(f'/tmp/pr-context/subagent-{key}.md', 'w') as f: - f.write('\n'.join(lines) + '\n') + with open(f'/tmp/pr-context/subagent-{key}.md', 'w') as sa_f: + sa_f.write('\n'.join(sa_lines) + '\n') - # Determine review approach based on PR size if diff_lines < 200: approach_lines = [ f'**Small PR ({pr_size}):** Review directly \u2014 no sub-agents. Review files in order from `/tmp/pr-context/file_order_az.txt`, reading each diff from `/tmp/pr-context/diffs/.diff` and the full file from the workspace.', ] size_key = 'small' elif diff_lines < 800: + write_subagent('az', 'A \u2192 Z (alphabetical)') + write_subagent('za', 'Z \u2192 A (reverse alphabetical)') approach_lines = [ f'**Medium PR ({pr_size}):** Use the **Pick Three, Keep Many** process \u2014 spawn 2 `code-review` sub-agents in parallel:', '', @@ -1244,6 +1245,9 @@ jobs: ] size_key = 'medium' else: + write_subagent('az', 'A \u2192 Z (alphabetical)') + write_subagent('za', 'Z \u2192 A (reverse alphabetical)') + write_subagent('largest', 'largest diff first') approach_lines = [ f'**Large PR ({pr_size}):** Use the **Pick Three, Keep Many** process \u2014 spawn 3 `code-review` sub-agents in parallel:', '', @@ -1268,8 +1272,7 @@ jobs: '## Comment Format', '', 'Call **`create_pull_request_review_comment`** with:', - '- The file path and the **exact line number from reading the file** (not estimated from the diff)', - '- The line must be within the diff (an added or context line in the patch)', + '- The file path and a **line number that appears in the numbered diff** (`/tmp/pr-context/diffs/.diff`). Only lines with a number prefix are commentable. If your target line has no number in the diff, include the finding in the review body instead.', '', t5, '**[SEVERITY] Brief title**', diff --git a/.github/workflows/gh-aw-mention-in-pr-no-sandbox.lock.yml b/.github/workflows/gh-aw-mention-in-pr-no-sandbox.lock.yml index 06cd711d..2d1a7b3d 100644 --- a/.github/workflows/gh-aw-mention-in-pr-no-sandbox.lock.yml +++ b/.github/workflows/gh-aw-mention-in-pr-no-sandbox.lock.yml @@ -47,7 +47,7 @@ # # inlined-imports: true # -# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"6039e02b70c9c36fa05ad607482e6a9f0a7bf7b144d4985deeaf0f93b15b826e"} +# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"191daaa7ef21c5c5e6e00f87c6b6c26122d660e5dd8da2e756960d83b698c1dd"} name: "Mention in PR (no sandbox)" "on": @@ -399,9 +399,9 @@ jobs: ## create-pull-request-review-comment - **Required fields**: `path` (file path), `line` (line number), and `body` (comment text). - - **Line**: Must be within the diff — an added or context line in the patch. Must be the **exact line number from reading the file** (not estimated from the patch). Lines outside the diff will fail. + - **Line**: Must be a numbered line in the per-file diff (`/tmp/pr-context/diffs/.diff`). Only lines with a number prefix are commentable. Lines outside the diff will fail with a GitHub API error. - **Body**: Sanitized with the standard pipeline (mentions neutralized, HTML filtered, URLs restricted). GitHub API limit is ~65,536 characters. - - **Side**: Defaults to `RIGHT` (the new code). Use `LEFT` only when commenting on deleted lines. + - **Side**: Always `RIGHT` (the new code). Deleted lines have no number prefix in the diff and cannot be targeted for inline comments — include findings about deleted code in the review body instead. - **Suggestion blocks**: Use ` ```suggestion ` fences for concrete code fixes. The suggestion must actually change the code — don't suggest identical code. Only include a `suggestion` block when you can provide a concrete code fix that **actually changes** the code. Only flag issues you are confident are real problems — false positives erode trust. Once you have flagged an issue, you cannot unflag it. @@ -703,9 +703,9 @@ jobs: GH_TOKEN: ${{ github.token }} PR_NUMBER: ${{ github.event.pull_request.number || inputs.target-pr-number || github.event.issue.number }} name: Fetch PR context to disk - run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Compute PR size metrics for review fan-out decisions\nFILE_COUNT=$(jq 'length' /tmp/pr-context/files.json)\nDIFF_LINES=$(wc -l < /tmp/pr-context/pr.diff | tr -d ' ')\necho \"${FILE_COUNT} files, ${DIFF_LINES} diff lines\" > /tmp/pr-context/pr-size.txt\necho \"PR size: ${FILE_COUNT} files, ${DIFF_LINES} diff lines\"\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `pr-size.txt` | PR size metrics: `{N} files, {M} diff lines` — used by the agent to decide review fan-out |\n| `agents.md` | Not used in PR context — repository conventions are at `/tmp/agents.md` (pre-fetched) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples for sub-agents |\n| `agent-review.md` | Main agent instructions — review approach resolved from PR size (written when `ready_to_code_review` is called) |\n| `parent-review.md` | Comment format and inline severity threshold for the parent agent (written when `ready_to_code_review` is called) |\n| `subagent-az.md` | Sub-agent instructions: review files A → Z (written when `ready_to_code_review` is called) |\n| `subagent-za.md` | Sub-agent instructions: review files Z → A (written when `ready_to_code_review` is called) |\n| `subagent-largest.md` | Sub-agent instructions: review files largest diff first (written when `ready_to_code_review` is called) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" + run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs with line numbers\n# Lines with a number prefix are commentable (RIGHT side: added or context).\n# Deleted lines get no number (LEFT side only).\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' | awk '\n /^@@/ { s=$3; gsub(/\\+/,\"\",s); split(s,a,\",\"); line=a[1]+0; print; next }\n /^\\+/ { printf \"%d\\t%s\\n\", line++, $0; next }\n /^-/ { printf \" \\t%s\\n\", $0; next }\n /^ / { printf \"%d\\t%s\\n\", line++, $0; next }\n { print }\n ' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Compute PR size metrics for review fan-out decisions\nFILE_COUNT=$(jq 'length' /tmp/pr-context/files.json)\nDIFF_LINES=$(wc -l < /tmp/pr-context/pr.diff | tr -d ' ')\necho \"${FILE_COUNT} files, ${DIFF_LINES} diff lines\" > /tmp/pr-context/pr-size.txt\necho \"PR size: ${FILE_COUNT} files, ${DIFF_LINES} diff lines\"\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs with line numbers — numbered lines (e.g. `405\\t+ code`) are commentable; lines without numbers are deleted (LEFT side only) |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `pr-size.txt` | PR size metrics: `{N} files, {M} diff lines` — used by the agent to decide review fan-out |\n| `agents.md` | Not used in PR context — repository conventions are at `/tmp/agents.md` (pre-fetched) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples for sub-agents |\n| `agent-review.md` | Main agent instructions — review approach resolved from PR size (written when `ready_to_code_review` is called) |\n| `parent-review.md` | Comment format and inline severity threshold for the parent agent (written when `ready_to_code_review` is called) |\n| `subagent-*.md` | Sub-agent instructions (only written for medium/large PRs when `ready_to_code_review` is called — check `agent-review.md` for which exist) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" - name: Write review instructions to disk - run: "mkdir -p /tmp/pr-context\ncat > /tmp/pr-context/review-instructions.md << 'REVIEW_EOF'\n# Review Instructions for Sub-agents\n\nYou are a code review sub-agent. Read these instructions, then review the PR files in the order provided in your prompt.\n\n## Context\n\nBefore reviewing files, read these to understand the PR:\n\n1. `/tmp/pr-context/pr.json` — PR title, description, author, and branches. Understand what the PR is trying to accomplish.\n2. `/tmp/agents.md` — Repository coding conventions and guidelines (skip if missing).\n3. `/tmp/pr-context/review_comments.json` — Existing review threads. Note which files already have threads so you don't duplicate.\n4. `/tmp/pr-context/issue-*.json` — Linked issue details (if any). Understand the motivation and acceptance criteria.\n\n## Process\n\nReview the PR diff file by file in your assigned order. For each changed file:\n\n1. **Read the diff** for this file from `/tmp/pr-context/diffs/.diff` to understand what changed. If the diff is empty or truncated (e.g., binary files or very large changes), fall back to reading the full file from the workspace and comparing against context.\n2. **Read the full file from the workspace.** The PR branch is checked out locally — open the file directly to get complete contents with line numbers.\n3. **Check existing threads** for this file from `/tmp/pr-context/threads/.json` (if it exists). Skip issues that are already under discussion — each thread has `isResolved` and `isOutdated` fields.\n4. **Identify potential issues** matching the review criteria below.\n5. **Quick-check each issue** before including it:\n - What specific code pattern or change triggers this concern?\n - Is there an obvious guard, handler, or mitigation visible in the immediate context?\n - Can you describe a concrete failure scenario (the `evidence` field)? If you cannot articulate what specific input or state triggers the problem, drop the finding.\n - If the issue is clearly handled, skip it. If you're unsure, include it — the parent will verify.\n6. **Add to your findings list.** Do NOT leave inline comments — you don't have that tool. Return findings in this format:\n\n```\n- file: path/to/file\n line: 42\n severity: HIGH\n title: Brief title\n description: What the issue is and why it matters\n evidence: The specific code pattern and failure scenario\n suggestion: corrected code here (optional — only if you can provide a concrete fix)\n```\n\n**Review every file in your assigned order.** Files reviewed earlier get more attention, which is why different sub-agents use different orderings.\n\n**Check existing threads** — per-file threads are at `/tmp/pr-context/threads/.json` (step 3 above). The full list is at `/tmp/pr-context/review_comments.json`. Do not flag issues that are already under discussion (resolved or unresolved). For outdated threads, only re-flag if the issue still applies to the current diff.\n\n**Return your full findings list** when done, or an empty list if no issues were found.\n\n## Review Criteria\n\nFocus on these categories in priority order:\n\n1. Security vulnerabilities (injection, XSS, auth bypass, secrets exposure)\n2. Logic bugs that could cause runtime failures or incorrect behavior\n3. Data integrity issues (race conditions, missing transactions, corruption risk)\n4. Performance bottlenecks (N+1 queries, memory leaks, blocking operations)\n5. Error handling gaps (unhandled exceptions, missing validation)\n6. Breaking changes to public APIs without migration path\n7. Missing or incorrect test coverage for critical paths\n\n## What NOT to Flag\n\nOnly review the diff — do not flag issues in unchanged code, pre-existing problems not introduced by this PR, or style preferences handled by linters or formatters.\n\n**Common false positives** — these patterns look like issues but usually aren't. Before flagging anything in these categories, confirm the problem is real by reading the surrounding code:\n\n- **Security — input already sanitized:** Don't flag injection or XSS risks when inputs are sanitized upstream, parameterized queries are used, or the framework auto-escapes output.\n- **Null/undefined — guarded elsewhere:** Don't flag potential null dereferences if the value is guaranteed by a type guard, assertion, schema validation, or upstream null check.\n- **Error handling — handled at a different layer:** Don't flag missing try/catch if the caller, middleware, or framework catches and handles the error (e.g., Express error middleware, React error boundaries).\n- **Performance — theoretical, not practical:** Don't flag algorithmic complexity (e.g., O(n^2)) unless N is demonstrably large enough to matter in the actual usage context. \"This could be slow\" without evidence is not actionable.\n- **Validation — exists at another layer:** Don't flag missing input validation if it's handled by an API gateway, middleware, schema validator, or type system.\n- **Test coverage — trivial or generated code:** Don't flag missing tests for trivial getters/setters, auto-generated code, or simple delegation methods.\n- **Style or naming — not in coding guidelines:** Don't flag naming conventions or code style unless they violate the repository's documented coding guidelines (from `/tmp/agents.md` or CONTRIBUTING docs).\n\n**Existing review threads** — check BEFORE flagging any issue:\n\n- **Resolved with reviewer reply** (e.g. \"This is intentional\") — reviewer's decision is final. Do NOT re-flag.\n- **Resolved without reply** — author likely fixed it. Do NOT re-raise unless the fix introduced a new problem.\n- **Unresolved** — already flagged. Do NOT duplicate.\n- **Outdated** — only re-flag if the issue still applies to the current diff.\n\nWhen in doubt, do not duplicate. Redundant comments erode trust.\n\nFinding no issues is a valid and valuable outcome. An empty findings list is better than findings that waste the author's time or erode trust. Do not manufacture findings to justify your review — if the code is sound, return an empty list.\n\n## Severity Classification\n\nDetermine severity AFTER investigating the issue, not before. First identify the problem and trace through the code, then assign a severity based on the evidence you found.\n\n- 🔴 CRITICAL — Must fix before merge (security vulnerabilities, data corruption, production-breaking bugs)\n- 🟠 HIGH — Should fix before merge (logic errors, missing validation, significant performance issues)\n- 🟡 MEDIUM — Address soon, non-blocking (error handling gaps, suboptimal patterns, missing edge cases)\n- ⚪ LOW — Author discretion (minor improvements, documentation gaps)\n- 💬 NITPICK — Truly optional (stylistic preferences, alternative approaches)\n\n## Review Intensity\n\nThe review intensity is `${{ inputs.intensity || 'balanced' }}`.\n\n- **conservative**: High evidence bar. Only flag when you can demonstrate a concrete failure scenario. If you can construct a reasonable counterargument, do not flag. Approval with zero findings is the expected outcome for most PRs.\n- **balanced**: Standard evidence bar. Flag when you can point to specific code that would fail. If the issue is ambiguous, lean toward not flagging.\n- **aggressive**: Lower evidence bar. Flag when evidence exists even if the failure scenario is not fully confirmed. Improvement suggestions welcome but must cite specific code.\n\n## Calibration Examples\n\nUse these examples to calibrate your judgment. Each pair shows a real issue and a similar-looking pattern that is NOT an issue.\n\n### Example 1: Null/Undefined Access\n\n**True positive — flag this:**\n\n```js\n// PR adds this handler\napp.get('/user/:id', async (req, res) => {\n const user = await db.findUser(req.params.id);\n res.json({ name: user.name, email: user.email });\n});\n```\n\nWhy flag: `db.findUser()` can return `null` when no user matches the ID. Accessing `user.name` will throw a TypeError at runtime. No upstream guard exists — the route handler is the entry point.\n\n**False positive — do NOT flag this:**\n\n```ts\n// PR adds this line inside an existing function\nconst settings = user.getSettings();\n```\n\nWhy skip: Reading the full file reveals `user` is typed as `User` (not `User | null`), and the calling function only runs after `authenticateUser()` middleware which guarantees a valid user object. The null case is handled at a different layer.\n\n### Example 2: SQL Injection\n\n**True positive — flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE customer_id = '{customer_id}'\")\n```\n\nWhy flag: String interpolation in a SQL query with user-controlled input (`customer_id` comes from the request). No parameterization or sanitization anywhere in the call chain.\n\n**False positive — do NOT flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE status = '{OrderStatus.PENDING.value}'\")\n```\n\nWhy skip: The interpolated value is a hardcoded enum constant (`OrderStatus.PENDING`), not user input. There is no injection vector.\n\n### Example 3: Borderline — Do NOT Flag\n\n```go\n// PR adds this function\nfunc processItems(items []Item) []Result {\n results := make([]Result, 0)\n for _, item := range items {\n for _, tag := range item.Tags {\n results = append(results, process(item, tag))\n }\n }\n return results\n}\n```\n\nThis looks like an O(n*m) performance concern. But without evidence that `items` or `Tags` are large in practice, this is speculative. The function processes a bounded dataset (items from a single user request). Do not flag theoretical performance issues without evidence of real-world impact.\nREVIEW_EOF" + run: "mkdir -p /tmp/pr-context\ncat > /tmp/pr-context/review-instructions.md << 'REVIEW_EOF'\n# Review Instructions for Sub-agents\n\nYou are a code review sub-agent. Read these instructions, then review the PR files in the order provided in your prompt.\n\n## Context\n\nBefore reviewing files, read these to understand the PR:\n\n1. `/tmp/pr-context/pr.json` — PR title, description, author, and branches. Understand what the PR is trying to accomplish.\n2. `/tmp/agents.md` — Repository coding conventions and guidelines (skip if missing).\n3. `/tmp/pr-context/review_comments.json` — Existing review threads. Note which files already have threads so you don't duplicate.\n4. `/tmp/pr-context/issue-*.json` — Linked issue details (if any). Understand the motivation and acceptance criteria.\n\n## Process\n\nReview the PR diff file by file in your assigned order. For each changed file:\n\n1. **Read the diff** for this file from `/tmp/pr-context/diffs/.diff` to understand what changed. Each line is prefixed with its line number (e.g., `405\\t+ code`). Lines with numbers are commentable; lines without numbers are deleted lines (LEFT side only). If the diff is empty or truncated (e.g., binary files or very large changes), fall back to reading the full file from the workspace and comparing against context.\n2. **Read the full file from the workspace.** The PR branch is checked out locally — open the file directly to get complete contents with line numbers.\n3. **Check existing threads** for this file from `/tmp/pr-context/threads/.json` (if it exists). Skip issues that are already under discussion — each thread has `isResolved` and `isOutdated` fields.\n4. **Identify potential issues** matching the review criteria below.\n5. **Quick-check each issue** before including it:\n - What specific code pattern or change triggers this concern?\n - Is there an obvious guard, handler, or mitigation visible in the immediate context?\n - Can you describe a concrete failure scenario (the `evidence` field)? If you cannot articulate what specific input or state triggers the problem, drop the finding.\n - If the issue is clearly handled, skip it. If you're unsure, include it — the parent will verify.\n6. **Add to your findings list.** Do NOT leave inline comments — you don't have that tool. Return findings in this format:\n\n```\n- file: path/to/file\n line: 42\n severity: HIGH\n title: Brief title\n description: What the issue is and why it matters\n evidence: The specific code pattern and failure scenario\n suggestion: corrected code here (optional — only if you can provide a concrete fix)\n```\n\n**Review every file in your assigned order.** Files reviewed earlier get more attention, which is why different sub-agents use different orderings.\n\n**Check existing threads** — per-file threads are at `/tmp/pr-context/threads/.json` (step 3 above). The full list is at `/tmp/pr-context/review_comments.json`. Do not flag issues that are already under discussion (resolved or unresolved). For outdated threads, only re-flag if the issue still applies to the current diff.\n\n**Return your full findings list** when done, or an empty list if no issues were found.\n\n## Review Criteria\n\nFocus on these categories in priority order:\n\n1. Security vulnerabilities (injection, XSS, auth bypass, secrets exposure)\n2. Logic bugs that could cause runtime failures or incorrect behavior\n3. Data integrity issues (race conditions, missing transactions, corruption risk)\n4. Performance bottlenecks (N+1 queries, memory leaks, blocking operations)\n5. Error handling gaps (unhandled exceptions, missing validation)\n6. Breaking changes to public APIs without migration path\n7. Missing or incorrect test coverage for critical paths\n\n## What NOT to Flag\n\nOnly review the diff — do not flag issues in unchanged code, pre-existing problems not introduced by this PR, or style preferences handled by linters or formatters.\n\n**Common false positives** — these patterns look like issues but usually aren't. Before flagging anything in these categories, confirm the problem is real by reading the surrounding code:\n\n- **Security — input already sanitized:** Don't flag injection or XSS risks when inputs are sanitized upstream, parameterized queries are used, or the framework auto-escapes output.\n- **Null/undefined — guarded elsewhere:** Don't flag potential null dereferences if the value is guaranteed by a type guard, assertion, schema validation, or upstream null check.\n- **Error handling — handled at a different layer:** Don't flag missing try/catch if the caller, middleware, or framework catches and handles the error (e.g., Express error middleware, React error boundaries).\n- **Performance — theoretical, not practical:** Don't flag algorithmic complexity (e.g., O(n^2)) unless N is demonstrably large enough to matter in the actual usage context. \"This could be slow\" without evidence is not actionable.\n- **Validation — exists at another layer:** Don't flag missing input validation if it's handled by an API gateway, middleware, schema validator, or type system.\n- **Test coverage — trivial or generated code:** Don't flag missing tests for trivial getters/setters, auto-generated code, or simple delegation methods.\n- **Style or naming — not in coding guidelines:** Don't flag naming conventions or code style unless they violate the repository's documented coding guidelines (from `/tmp/agents.md` or CONTRIBUTING docs).\n\n**Existing review threads** — check BEFORE flagging any issue:\n\n- **Resolved with reviewer reply** (e.g. \"This is intentional\") — reviewer's decision is final. Do NOT re-flag.\n- **Resolved without reply** — author likely fixed it. Do NOT re-raise unless the fix introduced a new problem.\n- **Unresolved** — already flagged. Do NOT duplicate.\n- **Outdated** — only re-flag if the issue still applies to the current diff.\n\nWhen in doubt, do not duplicate. Redundant comments erode trust.\n\nFinding no issues is a valid and valuable outcome. An empty findings list is better than findings that waste the author's time or erode trust. Do not manufacture findings to justify your review — if the code is sound, return an empty list.\n\n## Severity Classification\n\nDetermine severity AFTER investigating the issue, not before. First identify the problem and trace through the code, then assign a severity based on the evidence you found.\n\n- 🔴 CRITICAL — Must fix before merge (security vulnerabilities, data corruption, production-breaking bugs)\n- 🟠 HIGH — Should fix before merge (logic errors, missing validation, significant performance issues)\n- 🟡 MEDIUM — Address soon, non-blocking (error handling gaps, suboptimal patterns, missing edge cases)\n- ⚪ LOW — Author discretion (minor improvements, documentation gaps)\n- 💬 NITPICK — Truly optional (stylistic preferences, alternative approaches)\n\n## Review Intensity\n\nThe review intensity is `${{ inputs.intensity || 'balanced' }}`.\n\n- **conservative**: High evidence bar. Only flag when you can demonstrate a concrete failure scenario. If you can construct a reasonable counterargument, do not flag. Approval with zero findings is the expected outcome for most PRs.\n- **balanced**: Standard evidence bar. Flag when you can point to specific code that would fail. If the issue is ambiguous, lean toward not flagging.\n- **aggressive**: Lower evidence bar. Flag when evidence exists even if the failure scenario is not fully confirmed. Improvement suggestions welcome but must cite specific code.\n\n## Calibration Examples\n\nUse these examples to calibrate your judgment. Each pair shows a real issue and a similar-looking pattern that is NOT an issue.\n\n### Example 1: Null/Undefined Access\n\n**True positive — flag this:**\n\n```js\n// PR adds this handler\napp.get('/user/:id', async (req, res) => {\n const user = await db.findUser(req.params.id);\n res.json({ name: user.name, email: user.email });\n});\n```\n\nWhy flag: `db.findUser()` can return `null` when no user matches the ID. Accessing `user.name` will throw a TypeError at runtime. No upstream guard exists — the route handler is the entry point.\n\n**False positive — do NOT flag this:**\n\n```ts\n// PR adds this line inside an existing function\nconst settings = user.getSettings();\n```\n\nWhy skip: Reading the full file reveals `user` is typed as `User` (not `User | null`), and the calling function only runs after `authenticateUser()` middleware which guarantees a valid user object. The null case is handled at a different layer.\n\n### Example 2: SQL Injection\n\n**True positive — flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE customer_id = '{customer_id}'\")\n```\n\nWhy flag: String interpolation in a SQL query with user-controlled input (`customer_id` comes from the request). No parameterization or sanitization anywhere in the call chain.\n\n**False positive — do NOT flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE status = '{OrderStatus.PENDING.value}'\")\n```\n\nWhy skip: The interpolated value is a hardcoded enum constant (`OrderStatus.PENDING`), not user input. There is no injection vector.\n\n### Example 3: Borderline — Do NOT Flag\n\n```go\n// PR adds this function\nfunc processItems(items []Item) []Result {\n results := make([]Result, 0)\n for _, item := range items {\n for _, tag := range item.Tags {\n results = append(results, process(item, tag))\n }\n }\n return results\n}\n```\n\nThis looks like an O(n*m) performance concern. But without evidence that `items` or `Tags` are large in practice, this is speculative. The function processes a bounded dataset (items from a single user request). Do not flag theoretical performance issues without evidence of real-world impact.\nREVIEW_EOF" - name: Write Playwright instructions to disk run: "cat > /tmp/playwright-instructions.md << 'EOF'\n# Playwright MCP Tools\n\nUse Playwright MCP tools for interactive browser automation.\nUse these tools to explore the app step by step — do NOT write Node.js scripts.\n\n## Available tools\n\n- `browser_navigate` — go to a URL\n- `browser_click` — click an element\n- `browser_type` — type text into an input\n- `browser_snapshot` — get an accessibility tree (YAML) of the current page\n- `browser_take_screenshot` — capture a screenshot\n- `browser_console_execute` — run JavaScript in the browser console\n\n## Why MCP tools instead of scripts\n\nMCP tools are interactive: you see the page state after each action and\ndecide what to do next. This is ideal for exploratory testing where you\nneed to adapt based on what you find. Scripts are fire-and-forget — if\na selector is wrong, you don't find out until the script fails.\n\n## Measuring DOM properties\n\nFor programmatic checks (e.g. element heights, contrast), use\n`browser_console_execute`:\n\n```javascript\n(() => {\n const els = document.querySelectorAll('input, button, [role=\"combobox\"], [role=\"button\"]');\n return JSON.stringify(Array.from(els)\n .map(el => {\n const r = el.getBoundingClientRect();\n return { tag: el.tagName, h: Math.round(r.height), top: Math.round(r.top), text: el.textContent?.trim().slice(0, 20) };\n })\n .filter(el => el.top > 50 && el.top < 250));\n})()\n```\n\n## Handling failures\n\n- Do not retry the same action more than twice — the page is in a different state than expected.\n- Diagnose before moving on: use `browser_take_screenshot` and `browser_snapshot` to see what's on the page.\n- Adapt (different selector, different path) or report the failure as a finding.\n- Never claim you verified something you didn't — if it failed and you skipped it, say so.\nEOF" - env: @@ -1286,9 +1286,9 @@ jobs: pr_size = 'unknown size' diff_lines = 0 - # Write one instruction file per sub-agent file ordering strategy - for key, desc in [('az', 'A \u2192 Z (alphabetical)'), ('za', 'Z \u2192 A (reverse alphabetical)'), ('largest', 'largest diff first')]: - lines = [ + # Determine review approach based on PR size + def write_subagent(key, desc): + sa_lines = [ '# PR Review Sub-Agent', '', 'Review the PR as a code review sub-agent. Return findings only \u2014 do NOT leave inline comments.', @@ -1309,16 +1309,17 @@ jobs: '', f'Review files in this order: `/tmp/pr-context/file_order_{key}.txt` ({desc})', ] - with open(f'/tmp/pr-context/subagent-{key}.md', 'w') as f: - f.write('\n'.join(lines) + '\n') + with open(f'/tmp/pr-context/subagent-{key}.md', 'w') as sa_f: + sa_f.write('\n'.join(sa_lines) + '\n') - # Determine review approach based on PR size if diff_lines < 200: approach_lines = [ f'**Small PR ({pr_size}):** Review directly \u2014 no sub-agents. Review files in order from `/tmp/pr-context/file_order_az.txt`, reading each diff from `/tmp/pr-context/diffs/.diff` and the full file from the workspace.', ] size_key = 'small' elif diff_lines < 800: + write_subagent('az', 'A \u2192 Z (alphabetical)') + write_subagent('za', 'Z \u2192 A (reverse alphabetical)') approach_lines = [ f'**Medium PR ({pr_size}):** Use the **Pick Three, Keep Many** process \u2014 spawn 2 `code-review` sub-agents in parallel:', '', @@ -1329,6 +1330,9 @@ jobs: ] size_key = 'medium' else: + write_subagent('az', 'A \u2192 Z (alphabetical)') + write_subagent('za', 'Z \u2192 A (reverse alphabetical)') + write_subagent('largest', 'largest diff first') approach_lines = [ f'**Large PR ({pr_size}):** Use the **Pick Three, Keep Many** process \u2014 spawn 3 `code-review` sub-agents in parallel:', '', @@ -1353,8 +1357,7 @@ jobs: '## Comment Format', '', 'Call **`create_pull_request_review_comment`** with:', - '- The file path and the **exact line number from reading the file** (not estimated from the diff)', - '- The line must be within the diff (an added or context line in the patch)', + '- The file path and a **line number that appears in the numbered diff** (`/tmp/pr-context/diffs/.diff`). Only lines with a number prefix are commentable. If your target line has no number in the diff, include the finding in the review body instead.', '', t5, '**[SEVERITY] Brief title**', diff --git a/.github/workflows/gh-aw-mention-in-pr.lock.yml b/.github/workflows/gh-aw-mention-in-pr.lock.yml index 226cbe5e..09018261 100644 --- a/.github/workflows/gh-aw-mention-in-pr.lock.yml +++ b/.github/workflows/gh-aw-mention-in-pr.lock.yml @@ -47,7 +47,7 @@ # # inlined-imports: true # -# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"aec76f363ebbb5b8454ecae61cd57b014397bdabd3406eeb3d309602bee21330"} +# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"442d5d958f872e46c29a3f6e9fbabe2f27ca539e1f91137e9428673410941e9c"} name: "Mention in PR" "on": @@ -411,9 +411,9 @@ jobs: ## create-pull-request-review-comment - **Required fields**: `path` (file path), `line` (line number), and `body` (comment text). - - **Line**: Must be within the diff — an added or context line in the patch. Must be the **exact line number from reading the file** (not estimated from the patch). Lines outside the diff will fail. + - **Line**: Must be a numbered line in the per-file diff (`/tmp/pr-context/diffs/.diff`). Only lines with a number prefix are commentable. Lines outside the diff will fail with a GitHub API error. - **Body**: Sanitized with the standard pipeline (mentions neutralized, HTML filtered, URLs restricted). GitHub API limit is ~65,536 characters. - - **Side**: Defaults to `RIGHT` (the new code). Use `LEFT` only when commenting on deleted lines. + - **Side**: Always `RIGHT` (the new code). Deleted lines have no number prefix in the diff and cannot be targeted for inline comments — include findings about deleted code in the review body instead. - **Suggestion blocks**: Use ` ```suggestion ` fences for concrete code fixes. The suggestion must actually change the code — don't suggest identical code. Only include a `suggestion` block when you can provide a concrete code fix that **actually changes** the code. Only flag issues you are confident are real problems — false positives erode trust. Once you have flagged an issue, you cannot unflag it. @@ -731,9 +731,9 @@ jobs: GH_TOKEN: ${{ github.token }} PR_NUMBER: ${{ github.event.pull_request.number || inputs.target-pr-number || github.event.issue.number }} name: Fetch PR context to disk - run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Compute PR size metrics for review fan-out decisions\nFILE_COUNT=$(jq 'length' /tmp/pr-context/files.json)\nDIFF_LINES=$(wc -l < /tmp/pr-context/pr.diff | tr -d ' ')\necho \"${FILE_COUNT} files, ${DIFF_LINES} diff lines\" > /tmp/pr-context/pr-size.txt\necho \"PR size: ${FILE_COUNT} files, ${DIFF_LINES} diff lines\"\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `pr-size.txt` | PR size metrics: `{N} files, {M} diff lines` — used by the agent to decide review fan-out |\n| `agents.md` | Not used in PR context — repository conventions are at `/tmp/agents.md` (pre-fetched) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples for sub-agents |\n| `agent-review.md` | Main agent instructions — review approach resolved from PR size (written when `ready_to_code_review` is called) |\n| `parent-review.md` | Comment format and inline severity threshold for the parent agent (written when `ready_to_code_review` is called) |\n| `subagent-az.md` | Sub-agent instructions: review files A → Z (written when `ready_to_code_review` is called) |\n| `subagent-za.md` | Sub-agent instructions: review files Z → A (written when `ready_to_code_review` is called) |\n| `subagent-largest.md` | Sub-agent instructions: review files largest diff first (written when `ready_to_code_review` is called) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" + run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs with line numbers\n# Lines with a number prefix are commentable (RIGHT side: added or context).\n# Deleted lines get no number (LEFT side only).\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' | awk '\n /^@@/ { s=$3; gsub(/\\+/,\"\",s); split(s,a,\",\"); line=a[1]+0; print; next }\n /^\\+/ { printf \"%d\\t%s\\n\", line++, $0; next }\n /^-/ { printf \" \\t%s\\n\", $0; next }\n /^ / { printf \"%d\\t%s\\n\", line++, $0; next }\n { print }\n ' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Compute PR size metrics for review fan-out decisions\nFILE_COUNT=$(jq 'length' /tmp/pr-context/files.json)\nDIFF_LINES=$(wc -l < /tmp/pr-context/pr.diff | tr -d ' ')\necho \"${FILE_COUNT} files, ${DIFF_LINES} diff lines\" > /tmp/pr-context/pr-size.txt\necho \"PR size: ${FILE_COUNT} files, ${DIFF_LINES} diff lines\"\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs with line numbers — numbered lines (e.g. `405\\t+ code`) are commentable; lines without numbers are deleted (LEFT side only) |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `pr-size.txt` | PR size metrics: `{N} files, {M} diff lines` — used by the agent to decide review fan-out |\n| `agents.md` | Not used in PR context — repository conventions are at `/tmp/agents.md` (pre-fetched) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples for sub-agents |\n| `agent-review.md` | Main agent instructions — review approach resolved from PR size (written when `ready_to_code_review` is called) |\n| `parent-review.md` | Comment format and inline severity threshold for the parent agent (written when `ready_to_code_review` is called) |\n| `subagent-*.md` | Sub-agent instructions (only written for medium/large PRs when `ready_to_code_review` is called — check `agent-review.md` for which exist) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" - name: Write review instructions to disk - run: "mkdir -p /tmp/pr-context\ncat > /tmp/pr-context/review-instructions.md << 'REVIEW_EOF'\n# Review Instructions for Sub-agents\n\nYou are a code review sub-agent. Read these instructions, then review the PR files in the order provided in your prompt.\n\n## Context\n\nBefore reviewing files, read these to understand the PR:\n\n1. `/tmp/pr-context/pr.json` — PR title, description, author, and branches. Understand what the PR is trying to accomplish.\n2. `/tmp/agents.md` — Repository coding conventions and guidelines (skip if missing).\n3. `/tmp/pr-context/review_comments.json` — Existing review threads. Note which files already have threads so you don't duplicate.\n4. `/tmp/pr-context/issue-*.json` — Linked issue details (if any). Understand the motivation and acceptance criteria.\n\n## Process\n\nReview the PR diff file by file in your assigned order. For each changed file:\n\n1. **Read the diff** for this file from `/tmp/pr-context/diffs/.diff` to understand what changed. If the diff is empty or truncated (e.g., binary files or very large changes), fall back to reading the full file from the workspace and comparing against context.\n2. **Read the full file from the workspace.** The PR branch is checked out locally — open the file directly to get complete contents with line numbers.\n3. **Check existing threads** for this file from `/tmp/pr-context/threads/.json` (if it exists). Skip issues that are already under discussion — each thread has `isResolved` and `isOutdated` fields.\n4. **Identify potential issues** matching the review criteria below.\n5. **Quick-check each issue** before including it:\n - What specific code pattern or change triggers this concern?\n - Is there an obvious guard, handler, or mitigation visible in the immediate context?\n - Can you describe a concrete failure scenario (the `evidence` field)? If you cannot articulate what specific input or state triggers the problem, drop the finding.\n - If the issue is clearly handled, skip it. If you're unsure, include it — the parent will verify.\n6. **Add to your findings list.** Do NOT leave inline comments — you don't have that tool. Return findings in this format:\n\n```\n- file: path/to/file\n line: 42\n severity: HIGH\n title: Brief title\n description: What the issue is and why it matters\n evidence: The specific code pattern and failure scenario\n suggestion: corrected code here (optional — only if you can provide a concrete fix)\n```\n\n**Review every file in your assigned order.** Files reviewed earlier get more attention, which is why different sub-agents use different orderings.\n\n**Check existing threads** — per-file threads are at `/tmp/pr-context/threads/.json` (step 3 above). The full list is at `/tmp/pr-context/review_comments.json`. Do not flag issues that are already under discussion (resolved or unresolved). For outdated threads, only re-flag if the issue still applies to the current diff.\n\n**Return your full findings list** when done, or an empty list if no issues were found.\n\n## Review Criteria\n\nFocus on these categories in priority order:\n\n1. Security vulnerabilities (injection, XSS, auth bypass, secrets exposure)\n2. Logic bugs that could cause runtime failures or incorrect behavior\n3. Data integrity issues (race conditions, missing transactions, corruption risk)\n4. Performance bottlenecks (N+1 queries, memory leaks, blocking operations)\n5. Error handling gaps (unhandled exceptions, missing validation)\n6. Breaking changes to public APIs without migration path\n7. Missing or incorrect test coverage for critical paths\n\n## What NOT to Flag\n\nOnly review the diff — do not flag issues in unchanged code, pre-existing problems not introduced by this PR, or style preferences handled by linters or formatters.\n\n**Common false positives** — these patterns look like issues but usually aren't. Before flagging anything in these categories, confirm the problem is real by reading the surrounding code:\n\n- **Security — input already sanitized:** Don't flag injection or XSS risks when inputs are sanitized upstream, parameterized queries are used, or the framework auto-escapes output.\n- **Null/undefined — guarded elsewhere:** Don't flag potential null dereferences if the value is guaranteed by a type guard, assertion, schema validation, or upstream null check.\n- **Error handling — handled at a different layer:** Don't flag missing try/catch if the caller, middleware, or framework catches and handles the error (e.g., Express error middleware, React error boundaries).\n- **Performance — theoretical, not practical:** Don't flag algorithmic complexity (e.g., O(n^2)) unless N is demonstrably large enough to matter in the actual usage context. \"This could be slow\" without evidence is not actionable.\n- **Validation — exists at another layer:** Don't flag missing input validation if it's handled by an API gateway, middleware, schema validator, or type system.\n- **Test coverage — trivial or generated code:** Don't flag missing tests for trivial getters/setters, auto-generated code, or simple delegation methods.\n- **Style or naming — not in coding guidelines:** Don't flag naming conventions or code style unless they violate the repository's documented coding guidelines (from `/tmp/agents.md` or CONTRIBUTING docs).\n\n**Existing review threads** — check BEFORE flagging any issue:\n\n- **Resolved with reviewer reply** (e.g. \"This is intentional\") — reviewer's decision is final. Do NOT re-flag.\n- **Resolved without reply** — author likely fixed it. Do NOT re-raise unless the fix introduced a new problem.\n- **Unresolved** — already flagged. Do NOT duplicate.\n- **Outdated** — only re-flag if the issue still applies to the current diff.\n\nWhen in doubt, do not duplicate. Redundant comments erode trust.\n\nFinding no issues is a valid and valuable outcome. An empty findings list is better than findings that waste the author's time or erode trust. Do not manufacture findings to justify your review — if the code is sound, return an empty list.\n\n## Severity Classification\n\nDetermine severity AFTER investigating the issue, not before. First identify the problem and trace through the code, then assign a severity based on the evidence you found.\n\n- 🔴 CRITICAL — Must fix before merge (security vulnerabilities, data corruption, production-breaking bugs)\n- 🟠 HIGH — Should fix before merge (logic errors, missing validation, significant performance issues)\n- 🟡 MEDIUM — Address soon, non-blocking (error handling gaps, suboptimal patterns, missing edge cases)\n- ⚪ LOW — Author discretion (minor improvements, documentation gaps)\n- 💬 NITPICK — Truly optional (stylistic preferences, alternative approaches)\n\n## Review Intensity\n\nThe review intensity is `${{ inputs.intensity || 'balanced' }}`.\n\n- **conservative**: High evidence bar. Only flag when you can demonstrate a concrete failure scenario. If you can construct a reasonable counterargument, do not flag. Approval with zero findings is the expected outcome for most PRs.\n- **balanced**: Standard evidence bar. Flag when you can point to specific code that would fail. If the issue is ambiguous, lean toward not flagging.\n- **aggressive**: Lower evidence bar. Flag when evidence exists even if the failure scenario is not fully confirmed. Improvement suggestions welcome but must cite specific code.\n\n## Calibration Examples\n\nUse these examples to calibrate your judgment. Each pair shows a real issue and a similar-looking pattern that is NOT an issue.\n\n### Example 1: Null/Undefined Access\n\n**True positive — flag this:**\n\n```js\n// PR adds this handler\napp.get('/user/:id', async (req, res) => {\n const user = await db.findUser(req.params.id);\n res.json({ name: user.name, email: user.email });\n});\n```\n\nWhy flag: `db.findUser()` can return `null` when no user matches the ID. Accessing `user.name` will throw a TypeError at runtime. No upstream guard exists — the route handler is the entry point.\n\n**False positive — do NOT flag this:**\n\n```ts\n// PR adds this line inside an existing function\nconst settings = user.getSettings();\n```\n\nWhy skip: Reading the full file reveals `user` is typed as `User` (not `User | null`), and the calling function only runs after `authenticateUser()` middleware which guarantees a valid user object. The null case is handled at a different layer.\n\n### Example 2: SQL Injection\n\n**True positive — flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE customer_id = '{customer_id}'\")\n```\n\nWhy flag: String interpolation in a SQL query with user-controlled input (`customer_id` comes from the request). No parameterization or sanitization anywhere in the call chain.\n\n**False positive — do NOT flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE status = '{OrderStatus.PENDING.value}'\")\n```\n\nWhy skip: The interpolated value is a hardcoded enum constant (`OrderStatus.PENDING`), not user input. There is no injection vector.\n\n### Example 3: Borderline — Do NOT Flag\n\n```go\n// PR adds this function\nfunc processItems(items []Item) []Result {\n results := make([]Result, 0)\n for _, item := range items {\n for _, tag := range item.Tags {\n results = append(results, process(item, tag))\n }\n }\n return results\n}\n```\n\nThis looks like an O(n*m) performance concern. But without evidence that `items` or `Tags` are large in practice, this is speculative. The function processes a bounded dataset (items from a single user request). Do not flag theoretical performance issues without evidence of real-world impact.\nREVIEW_EOF" + run: "mkdir -p /tmp/pr-context\ncat > /tmp/pr-context/review-instructions.md << 'REVIEW_EOF'\n# Review Instructions for Sub-agents\n\nYou are a code review sub-agent. Read these instructions, then review the PR files in the order provided in your prompt.\n\n## Context\n\nBefore reviewing files, read these to understand the PR:\n\n1. `/tmp/pr-context/pr.json` — PR title, description, author, and branches. Understand what the PR is trying to accomplish.\n2. `/tmp/agents.md` — Repository coding conventions and guidelines (skip if missing).\n3. `/tmp/pr-context/review_comments.json` — Existing review threads. Note which files already have threads so you don't duplicate.\n4. `/tmp/pr-context/issue-*.json` — Linked issue details (if any). Understand the motivation and acceptance criteria.\n\n## Process\n\nReview the PR diff file by file in your assigned order. For each changed file:\n\n1. **Read the diff** for this file from `/tmp/pr-context/diffs/.diff` to understand what changed. Each line is prefixed with its line number (e.g., `405\\t+ code`). Lines with numbers are commentable; lines without numbers are deleted lines (LEFT side only). If the diff is empty or truncated (e.g., binary files or very large changes), fall back to reading the full file from the workspace and comparing against context.\n2. **Read the full file from the workspace.** The PR branch is checked out locally — open the file directly to get complete contents with line numbers.\n3. **Check existing threads** for this file from `/tmp/pr-context/threads/.json` (if it exists). Skip issues that are already under discussion — each thread has `isResolved` and `isOutdated` fields.\n4. **Identify potential issues** matching the review criteria below.\n5. **Quick-check each issue** before including it:\n - What specific code pattern or change triggers this concern?\n - Is there an obvious guard, handler, or mitigation visible in the immediate context?\n - Can you describe a concrete failure scenario (the `evidence` field)? If you cannot articulate what specific input or state triggers the problem, drop the finding.\n - If the issue is clearly handled, skip it. If you're unsure, include it — the parent will verify.\n6. **Add to your findings list.** Do NOT leave inline comments — you don't have that tool. Return findings in this format:\n\n```\n- file: path/to/file\n line: 42\n severity: HIGH\n title: Brief title\n description: What the issue is and why it matters\n evidence: The specific code pattern and failure scenario\n suggestion: corrected code here (optional — only if you can provide a concrete fix)\n```\n\n**Review every file in your assigned order.** Files reviewed earlier get more attention, which is why different sub-agents use different orderings.\n\n**Check existing threads** — per-file threads are at `/tmp/pr-context/threads/.json` (step 3 above). The full list is at `/tmp/pr-context/review_comments.json`. Do not flag issues that are already under discussion (resolved or unresolved). For outdated threads, only re-flag if the issue still applies to the current diff.\n\n**Return your full findings list** when done, or an empty list if no issues were found.\n\n## Review Criteria\n\nFocus on these categories in priority order:\n\n1. Security vulnerabilities (injection, XSS, auth bypass, secrets exposure)\n2. Logic bugs that could cause runtime failures or incorrect behavior\n3. Data integrity issues (race conditions, missing transactions, corruption risk)\n4. Performance bottlenecks (N+1 queries, memory leaks, blocking operations)\n5. Error handling gaps (unhandled exceptions, missing validation)\n6. Breaking changes to public APIs without migration path\n7. Missing or incorrect test coverage for critical paths\n\n## What NOT to Flag\n\nOnly review the diff — do not flag issues in unchanged code, pre-existing problems not introduced by this PR, or style preferences handled by linters or formatters.\n\n**Common false positives** — these patterns look like issues but usually aren't. Before flagging anything in these categories, confirm the problem is real by reading the surrounding code:\n\n- **Security — input already sanitized:** Don't flag injection or XSS risks when inputs are sanitized upstream, parameterized queries are used, or the framework auto-escapes output.\n- **Null/undefined — guarded elsewhere:** Don't flag potential null dereferences if the value is guaranteed by a type guard, assertion, schema validation, or upstream null check.\n- **Error handling — handled at a different layer:** Don't flag missing try/catch if the caller, middleware, or framework catches and handles the error (e.g., Express error middleware, React error boundaries).\n- **Performance — theoretical, not practical:** Don't flag algorithmic complexity (e.g., O(n^2)) unless N is demonstrably large enough to matter in the actual usage context. \"This could be slow\" without evidence is not actionable.\n- **Validation — exists at another layer:** Don't flag missing input validation if it's handled by an API gateway, middleware, schema validator, or type system.\n- **Test coverage — trivial or generated code:** Don't flag missing tests for trivial getters/setters, auto-generated code, or simple delegation methods.\n- **Style or naming — not in coding guidelines:** Don't flag naming conventions or code style unless they violate the repository's documented coding guidelines (from `/tmp/agents.md` or CONTRIBUTING docs).\n\n**Existing review threads** — check BEFORE flagging any issue:\n\n- **Resolved with reviewer reply** (e.g. \"This is intentional\") — reviewer's decision is final. Do NOT re-flag.\n- **Resolved without reply** — author likely fixed it. Do NOT re-raise unless the fix introduced a new problem.\n- **Unresolved** — already flagged. Do NOT duplicate.\n- **Outdated** — only re-flag if the issue still applies to the current diff.\n\nWhen in doubt, do not duplicate. Redundant comments erode trust.\n\nFinding no issues is a valid and valuable outcome. An empty findings list is better than findings that waste the author's time or erode trust. Do not manufacture findings to justify your review — if the code is sound, return an empty list.\n\n## Severity Classification\n\nDetermine severity AFTER investigating the issue, not before. First identify the problem and trace through the code, then assign a severity based on the evidence you found.\n\n- 🔴 CRITICAL — Must fix before merge (security vulnerabilities, data corruption, production-breaking bugs)\n- 🟠 HIGH — Should fix before merge (logic errors, missing validation, significant performance issues)\n- 🟡 MEDIUM — Address soon, non-blocking (error handling gaps, suboptimal patterns, missing edge cases)\n- ⚪ LOW — Author discretion (minor improvements, documentation gaps)\n- 💬 NITPICK — Truly optional (stylistic preferences, alternative approaches)\n\n## Review Intensity\n\nThe review intensity is `${{ inputs.intensity || 'balanced' }}`.\n\n- **conservative**: High evidence bar. Only flag when you can demonstrate a concrete failure scenario. If you can construct a reasonable counterargument, do not flag. Approval with zero findings is the expected outcome for most PRs.\n- **balanced**: Standard evidence bar. Flag when you can point to specific code that would fail. If the issue is ambiguous, lean toward not flagging.\n- **aggressive**: Lower evidence bar. Flag when evidence exists even if the failure scenario is not fully confirmed. Improvement suggestions welcome but must cite specific code.\n\n## Calibration Examples\n\nUse these examples to calibrate your judgment. Each pair shows a real issue and a similar-looking pattern that is NOT an issue.\n\n### Example 1: Null/Undefined Access\n\n**True positive — flag this:**\n\n```js\n// PR adds this handler\napp.get('/user/:id', async (req, res) => {\n const user = await db.findUser(req.params.id);\n res.json({ name: user.name, email: user.email });\n});\n```\n\nWhy flag: `db.findUser()` can return `null` when no user matches the ID. Accessing `user.name` will throw a TypeError at runtime. No upstream guard exists — the route handler is the entry point.\n\n**False positive — do NOT flag this:**\n\n```ts\n// PR adds this line inside an existing function\nconst settings = user.getSettings();\n```\n\nWhy skip: Reading the full file reveals `user` is typed as `User` (not `User | null`), and the calling function only runs after `authenticateUser()` middleware which guarantees a valid user object. The null case is handled at a different layer.\n\n### Example 2: SQL Injection\n\n**True positive — flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE customer_id = '{customer_id}'\")\n```\n\nWhy flag: String interpolation in a SQL query with user-controlled input (`customer_id` comes from the request). No parameterization or sanitization anywhere in the call chain.\n\n**False positive — do NOT flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE status = '{OrderStatus.PENDING.value}'\")\n```\n\nWhy skip: The interpolated value is a hardcoded enum constant (`OrderStatus.PENDING`), not user input. There is no injection vector.\n\n### Example 3: Borderline — Do NOT Flag\n\n```go\n// PR adds this function\nfunc processItems(items []Item) []Result {\n results := make([]Result, 0)\n for _, item := range items {\n for _, tag := range item.Tags {\n results = append(results, process(item, tag))\n }\n }\n return results\n}\n```\n\nThis looks like an O(n*m) performance concern. But without evidence that `items` or `Tags` are large in practice, this is speculative. The function processes a bounded dataset (items from a single user request). Do not flag theoretical performance issues without evidence of real-world impact.\nREVIEW_EOF" - name: Write Playwright instructions to disk run: "cat > /tmp/playwright-instructions.md << 'EOF'\n# Playwright MCP Tools\n\nUse Playwright MCP tools for interactive browser automation.\nUse these tools to explore the app step by step — do NOT write Node.js scripts.\n\n## Available tools\n\n- `browser_navigate` — go to a URL\n- `browser_click` — click an element\n- `browser_type` — type text into an input\n- `browser_snapshot` — get an accessibility tree (YAML) of the current page\n- `browser_take_screenshot` — capture a screenshot\n- `browser_console_execute` — run JavaScript in the browser console\n\n## Why MCP tools instead of scripts\n\nMCP tools are interactive: you see the page state after each action and\ndecide what to do next. This is ideal for exploratory testing where you\nneed to adapt based on what you find. Scripts are fire-and-forget — if\na selector is wrong, you don't find out until the script fails.\n\n## Measuring DOM properties\n\nFor programmatic checks (e.g. element heights, contrast), use\n`browser_console_execute`:\n\n```javascript\n(() => {\n const els = document.querySelectorAll('input, button, [role=\"combobox\"], [role=\"button\"]');\n return JSON.stringify(Array.from(els)\n .map(el => {\n const r = el.getBoundingClientRect();\n return { tag: el.tagName, h: Math.round(r.height), top: Math.round(r.top), text: el.textContent?.trim().slice(0, 20) };\n })\n .filter(el => el.top > 50 && el.top < 250));\n})()\n```\n\n## Handling failures\n\n- Do not retry the same action more than twice — the page is in a different state than expected.\n- Diagnose before moving on: use `browser_take_screenshot` and `browser_snapshot` to see what's on the page.\n- Adapt (different selector, different path) or report the failure as a finding.\n- Never claim you verified something you didn't — if it failed and you skipped it, say so.\nEOF" - env: @@ -1316,9 +1316,9 @@ jobs: pr_size = 'unknown size' diff_lines = 0 - # Write one instruction file per sub-agent file ordering strategy - for key, desc in [('az', 'A \u2192 Z (alphabetical)'), ('za', 'Z \u2192 A (reverse alphabetical)'), ('largest', 'largest diff first')]: - lines = [ + # Determine review approach based on PR size + def write_subagent(key, desc): + sa_lines = [ '# PR Review Sub-Agent', '', 'Review the PR as a code review sub-agent. Return findings only \u2014 do NOT leave inline comments.', @@ -1339,16 +1339,17 @@ jobs: '', f'Review files in this order: `/tmp/pr-context/file_order_{key}.txt` ({desc})', ] - with open(f'/tmp/pr-context/subagent-{key}.md', 'w') as f: - f.write('\n'.join(lines) + '\n') + with open(f'/tmp/pr-context/subagent-{key}.md', 'w') as sa_f: + sa_f.write('\n'.join(sa_lines) + '\n') - # Determine review approach based on PR size if diff_lines < 200: approach_lines = [ f'**Small PR ({pr_size}):** Review directly \u2014 no sub-agents. Review files in order from `/tmp/pr-context/file_order_az.txt`, reading each diff from `/tmp/pr-context/diffs/.diff` and the full file from the workspace.', ] size_key = 'small' elif diff_lines < 800: + write_subagent('az', 'A \u2192 Z (alphabetical)') + write_subagent('za', 'Z \u2192 A (reverse alphabetical)') approach_lines = [ f'**Medium PR ({pr_size}):** Use the **Pick Three, Keep Many** process \u2014 spawn 2 `code-review` sub-agents in parallel:', '', @@ -1359,6 +1360,9 @@ jobs: ] size_key = 'medium' else: + write_subagent('az', 'A \u2192 Z (alphabetical)') + write_subagent('za', 'Z \u2192 A (reverse alphabetical)') + write_subagent('largest', 'largest diff first') approach_lines = [ f'**Large PR ({pr_size}):** Use the **Pick Three, Keep Many** process \u2014 spawn 3 `code-review` sub-agents in parallel:', '', @@ -1383,8 +1387,7 @@ jobs: '## Comment Format', '', 'Call **`create_pull_request_review_comment`** with:', - '- The file path and the **exact line number from reading the file** (not estimated from the diff)', - '- The line must be within the diff (an added or context line in the patch)', + '- The file path and a **line number that appears in the numbered diff** (`/tmp/pr-context/diffs/.diff`). Only lines with a number prefix are commentable. If your target line has no number in the diff, include the finding in the review body instead.', '', t5, '**[SEVERITY] Brief title**', diff --git a/.github/workflows/gh-aw-pr-review-addresser.lock.yml b/.github/workflows/gh-aw-pr-review-addresser.lock.yml index 17fc486e..47b341e2 100644 --- a/.github/workflows/gh-aw-pr-review-addresser.lock.yml +++ b/.github/workflows/gh-aw-pr-review-addresser.lock.yml @@ -40,7 +40,7 @@ # # inlined-imports: true # -# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"13741b39aaf4036f92846e5f2b6441746df050f49c42458469d4bcd2b5f64730"} +# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"6e17208d9f4d770fb5184a3a523a8975cfb5f2b6158304cb2f4abd38ae81db6a"} name: "PR Review Addresser" "on": @@ -595,7 +595,7 @@ jobs: GH_TOKEN: ${{ github.token }} PR_NUMBER: ${{ github.event.pull_request.number || inputs.target-pr-number || github.event.issue.number }} name: Fetch PR context to disk - run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Compute PR size metrics for review fan-out decisions\nFILE_COUNT=$(jq 'length' /tmp/pr-context/files.json)\nDIFF_LINES=$(wc -l < /tmp/pr-context/pr.diff | tr -d ' ')\necho \"${FILE_COUNT} files, ${DIFF_LINES} diff lines\" > /tmp/pr-context/pr-size.txt\necho \"PR size: ${FILE_COUNT} files, ${DIFF_LINES} diff lines\"\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `pr-size.txt` | PR size metrics: `{N} files, {M} diff lines` — used by the agent to decide review fan-out |\n| `agents.md` | Not used in PR context — repository conventions are at `/tmp/agents.md` (pre-fetched) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples for sub-agents |\n| `agent-review.md` | Main agent instructions — review approach resolved from PR size (written when `ready_to_code_review` is called) |\n| `parent-review.md` | Comment format and inline severity threshold for the parent agent (written when `ready_to_code_review` is called) |\n| `subagent-az.md` | Sub-agent instructions: review files A → Z (written when `ready_to_code_review` is called) |\n| `subagent-za.md` | Sub-agent instructions: review files Z → A (written when `ready_to_code_review` is called) |\n| `subagent-largest.md` | Sub-agent instructions: review files largest diff first (written when `ready_to_code_review` is called) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" + run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs with line numbers\n# Lines with a number prefix are commentable (RIGHT side: added or context).\n# Deleted lines get no number (LEFT side only).\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' | awk '\n /^@@/ { s=$3; gsub(/\\+/,\"\",s); split(s,a,\",\"); line=a[1]+0; print; next }\n /^\\+/ { printf \"%d\\t%s\\n\", line++, $0; next }\n /^-/ { printf \" \\t%s\\n\", $0; next }\n /^ / { printf \"%d\\t%s\\n\", line++, $0; next }\n { print }\n ' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Compute PR size metrics for review fan-out decisions\nFILE_COUNT=$(jq 'length' /tmp/pr-context/files.json)\nDIFF_LINES=$(wc -l < /tmp/pr-context/pr.diff | tr -d ' ')\necho \"${FILE_COUNT} files, ${DIFF_LINES} diff lines\" > /tmp/pr-context/pr-size.txt\necho \"PR size: ${FILE_COUNT} files, ${DIFF_LINES} diff lines\"\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs with line numbers — numbered lines (e.g. `405\\t+ code`) are commentable; lines without numbers are deleted (LEFT side only) |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `pr-size.txt` | PR size metrics: `{N} files, {M} diff lines` — used by the agent to decide review fan-out |\n| `agents.md` | Not used in PR context — repository conventions are at `/tmp/agents.md` (pre-fetched) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples for sub-agents |\n| `agent-review.md` | Main agent instructions — review approach resolved from PR size (written when `ready_to_code_review` is called) |\n| `parent-review.md` | Comment format and inline severity threshold for the parent agent (written when `ready_to_code_review` is called) |\n| `subagent-*.md` | Sub-agent instructions (only written for medium/large PRs when `ready_to_code_review` is called — check `agent-review.md` for which exist) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" - env: GITHUB_TOKEN: ${{ github.token }} REPO_NAME: ${{ github.repository }} diff --git a/.github/workflows/gh-aw-pr-review.lock.yml b/.github/workflows/gh-aw-pr-review.lock.yml index bf850a8f..24364905 100644 --- a/.github/workflows/gh-aw-pr-review.lock.yml +++ b/.github/workflows/gh-aw-pr-review.lock.yml @@ -41,7 +41,7 @@ # # inlined-imports: true # -# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"fd7dc0e8bc82541146175ab8479c15fbb4acc7a5c0af5c5ec011a6a8b3e2f30f"} +# gh-aw-metadata: {"schema_version":"v1","frontmatter_hash":"7d7cbe037102d003f74f2d84bf72e75b56d16df6fd88178d28df62a7ca826910"} name: "PR Review" "on": @@ -316,9 +316,9 @@ jobs: ## create-pull-request-review-comment - **Required fields**: `path` (file path), `line` (line number), and `body` (comment text). - - **Line**: Must be within the diff — an added or context line in the patch. Must be the **exact line number from reading the file** (not estimated from the patch). Lines outside the diff will fail. + - **Line**: Must be a numbered line in the per-file diff (`/tmp/pr-context/diffs/.diff`). Only lines with a number prefix are commentable. Lines outside the diff will fail with a GitHub API error. - **Body**: Sanitized with the standard pipeline (mentions neutralized, HTML filtered, URLs restricted). GitHub API limit is ~65,536 characters. - - **Side**: Defaults to `RIGHT` (the new code). Use `LEFT` only when commenting on deleted lines. + - **Side**: Always `RIGHT` (the new code). Deleted lines have no number prefix in the diff and cannot be targeted for inline comments — include findings about deleted code in the review body instead. - **Suggestion blocks**: Use ` ```suggestion ` fences for concrete code fixes. The suggestion must actually change the code — don't suggest identical code. Only include a `suggestion` block when you can provide a concrete code fix that **actually changes** the code. Only flag issues you are confident are real problems — false positives erode trust. Once you have flagged an issue, you cannot unflag it. @@ -407,6 +407,8 @@ jobs: Only leave a comment if the finding survives all four checks. Findings flagged independently by multiple sub-agents are stronger candidates. Findings from only one sub-agent deserve extra scrutiny. + Before posting each comment, **verify the target line is in the numbered diff**: check `/tmp/pr-context/diffs/.diff` — only lines with a number prefix are commentable. If the line has no number in the diff, do NOT attempt an inline comment — include the finding in the review body instead. + Leave inline comments (`create_pull_request_review_comment`) per the **Code Review Reference** above for each finding that survives verification. Comment on each file's findings before moving to the next file. If no findings survive verification, proceed directly to Step 4. ### Step 4: Submit the Review @@ -609,9 +611,9 @@ jobs: GH_TOKEN: ${{ github.token }} PR_NUMBER: ${{ github.event.pull_request.number || inputs.target-pr-number || github.event.issue.number }} name: Fetch PR context to disk - run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Compute PR size metrics for review fan-out decisions\nFILE_COUNT=$(jq 'length' /tmp/pr-context/files.json)\nDIFF_LINES=$(wc -l < /tmp/pr-context/pr.diff | tr -d ' ')\necho \"${FILE_COUNT} files, ${DIFF_LINES} diff lines\" > /tmp/pr-context/pr-size.txt\necho \"PR size: ${FILE_COUNT} files, ${DIFF_LINES} diff lines\"\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs — one file per changed file, mirroring the repo path under `diffs/` |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `pr-size.txt` | PR size metrics: `{N} files, {M} diff lines` — used by the agent to decide review fan-out |\n| `agents.md` | Not used in PR context — repository conventions are at `/tmp/agents.md` (pre-fetched) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples for sub-agents |\n| `agent-review.md` | Main agent instructions — review approach resolved from PR size (written when `ready_to_code_review` is called) |\n| `parent-review.md` | Comment format and inline severity threshold for the parent agent (written when `ready_to_code_review` is called) |\n| `subagent-az.md` | Sub-agent instructions: review files A → Z (written when `ready_to_code_review` is called) |\n| `subagent-za.md` | Sub-agent instructions: review files Z → A (written when `ready_to_code_review` is called) |\n| `subagent-largest.md` | Sub-agent instructions: review files largest diff first (written when `ready_to_code_review` is called) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" + run: "set -euo pipefail\nmkdir -p /tmp/pr-context\n\n# PR metadata\ngh pr view \"$PR_NUMBER\" --json title,body,author,baseRefName,headRefName,headRefOid,url \\\n > /tmp/pr-context/pr.json\n\n# Full diff\nif ! gh pr diff \"$PR_NUMBER\" > /tmp/pr-context/pr.diff; then\n echo \"::warning::Failed to fetch full PR diff; per-file diffs from files.json are still available.\"\n : > /tmp/pr-context/pr.diff\nfi\n\n# Changed files list (--paginate may output concatenated arrays; jq -s 'add' merges them)\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/files\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/files.json\n\n# Per-file diffs with line numbers\n# Lines with a number prefix are commentable (RIGHT side: added or context).\n# Deleted lines get no number (LEFT side only).\njq -c '.[]' /tmp/pr-context/files.json | while IFS= read -r entry; do\n filename=$(echo \"$entry\" | jq -r '.filename')\n mkdir -p \"/tmp/pr-context/diffs/$(dirname \"$filename\")\"\n echo \"$entry\" | jq -r '.patch // empty' | awk '\n /^@@/ { s=$3; gsub(/\\+/,\"\",s); split(s,a,\",\"); line=a[1]+0; print; next }\n /^\\+/ { printf \"%d\\t%s\\n\", line++, $0; next }\n /^-/ { printf \" \\t%s\\n\", $0; next }\n /^ / { printf \"%d\\t%s\\n\", line++, $0; next }\n { print }\n ' > \"/tmp/pr-context/diffs/${filename}.diff\"\ndone\n\n# File orderings for sub-agent review (3 strategies)\njq -r '[.[] | .filename] | sort | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_az.txt\njq -r '[.[] | .filename] | sort | reverse | .[]' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_za.txt\njq -r '[.[] | {filename, size: ((.additions // 0) + (.deletions // 0))}] | sort_by(-.size) | .[].filename' /tmp/pr-context/files.json \\\n > /tmp/pr-context/file_order_largest.txt\n\n# Compute PR size metrics for review fan-out decisions\nFILE_COUNT=$(jq 'length' /tmp/pr-context/files.json)\nDIFF_LINES=$(wc -l < /tmp/pr-context/pr.diff | tr -d ' ')\necho \"${FILE_COUNT} files, ${DIFF_LINES} diff lines\" > /tmp/pr-context/pr-size.txt\necho \"PR size: ${FILE_COUNT} files, ${DIFF_LINES} diff lines\"\n\n# Existing reviews\ngh api \"repos/$GITHUB_REPOSITORY/pulls/$PR_NUMBER/reviews\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/reviews.json\n\n# Review threads with resolution status (GraphQL — REST lacks isResolved/isOutdated)\ngh api graphql --paginate -f query='\n query($owner: String!, $repo: String!, $number: Int!, $endCursor: String) {\n repository(owner: $owner, name: $repo) {\n pullRequest(number: $number) {\n reviewThreads(first: 100, after: $endCursor) {\n pageInfo { hasNextPage endCursor }\n nodes {\n id\n isResolved\n isOutdated\n isCollapsed\n path\n line\n startLine\n comments(first: 100) {\n nodes {\n id\n databaseId\n body\n author { login }\n createdAt\n }\n }\n }\n }\n }\n }\n }\n' -F owner=\"${GITHUB_REPOSITORY%/*}\" -F repo=\"${GITHUB_REPOSITORY#*/}\" -F \"number=$PR_NUMBER\" \\\n --jq '.data.repository.pullRequest.reviewThreads.nodes' \\\n | jq -s 'add // []' > /tmp/pr-context/review_comments.json\n\n# Filtered review thread views (pre-computed so agents don't need to parse review_comments.json)\njq '[.[] | select(.isResolved == false)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/unresolved_threads.json\njq '[.[] | select(.isResolved == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/resolved_threads.json\njq '[.[] | select(.isOutdated == true)]' /tmp/pr-context/review_comments.json \\\n > /tmp/pr-context/outdated_threads.json\n\n# Per-file review threads (mirrors diffs/ structure)\njq -c '.[]' /tmp/pr-context/review_comments.json | while IFS= read -r thread; do\n filepath=$(echo \"$thread\" | jq -r '.path // empty')\n [ -z \"$filepath\" ] && continue\n mkdir -p \"/tmp/pr-context/threads/$(dirname \"$filepath\")\"\n echo \"$thread\" >> \"/tmp/pr-context/threads/${filepath}.jsonl\"\ndone\n# Convert per-file JSONL to proper JSON arrays\nmkdir -p /tmp/pr-context/threads\nfind /tmp/pr-context/threads -name '*.jsonl' 2>/dev/null | while IFS= read -r jsonl; do\n jq -s '.' \"$jsonl\" > \"${jsonl%.jsonl}.json\"\n rm \"$jsonl\"\ndone\n\n# PR discussion comments\ngh api \"repos/$GITHUB_REPOSITORY/issues/$PR_NUMBER/comments\" --paginate \\\n | jq -s 'add // []' > /tmp/pr-context/comments.json\n\n# Linked issues\njq -r '.body // \"\"' /tmp/pr-context/pr.json 2>/dev/null \\\n | grep -oiE '(fixes|closes|resolves)\\s+#[0-9]+' \\\n | grep -oE '[0-9]+$' \\\n | sort -u \\\n | while read -r issue; do\n gh api \"repos/$GITHUB_REPOSITORY/issues/$issue\" > \"/tmp/pr-context/issue-${issue}.json\" || true\n done || true\n\n# Write manifest\ncat > /tmp/pr-context/README.md << 'MANIFEST'\n# PR Context\n\nPre-fetched PR data. All files are in `/tmp/pr-context/`.\n\n| File | Description |\n| --- | --- |\n| `pr.json` | PR metadata — title, body, author, base/head branches, head commit SHA (`headRefOid`), URL |\n| `pr.diff` | Full unified diff of all changes |\n| `files.json` | Changed files array — each entry has `filename`, `status`, `additions`, `deletions`, `patch` |\n| `diffs/.diff` | Per-file diffs with line numbers — numbered lines (e.g. `405\\t+ code`) are commentable; lines without numbers are deleted (LEFT side only) |\n| `file_order_az.txt` | Changed files sorted alphabetically (A→Z), one filename per line |\n| `file_order_za.txt` | Changed files sorted reverse-alphabetically (Z→A), one filename per line |\n| `file_order_largest.txt` | Changed files sorted by diff size descending (largest first), one filename per line |\n| `reviews.json` | Prior review submissions — author, state (APPROVED/CHANGES_REQUESTED/COMMENTED), body |\n| `review_comments.json` | All review threads (GraphQL) — each thread has `id` (node ID for resolving), `isResolved`, `isOutdated`, `path`, `line`, and nested `comments` with `id`, `databaseId` (numeric REST ID for replies), body/author |\n| `unresolved_threads.json` | Unresolved review threads — subset of `review_comments.json` where `isResolved` is false |\n| `resolved_threads.json` | Resolved review threads — subset of `review_comments.json` where `isResolved` is true |\n| `outdated_threads.json` | Outdated review threads — subset of `review_comments.json` where `isOutdated` is true (code changed since comment) |\n| `threads/.json` | Per-file review threads — one file per changed file with existing threads, mirroring the repo path under `threads/` |\n| `comments.json` | PR discussion comments (not inline) |\n| `issue-{N}.json` | Linked issue details (one file per linked issue, if any) |\n| `pr-size.txt` | PR size metrics: `{N} files, {M} diff lines` — used by the agent to decide review fan-out |\n| `agents.md` | Not used in PR context — repository conventions are at `/tmp/agents.md` (pre-fetched) |\n| `review-instructions.md` | Review instructions, criteria, and calibration examples for sub-agents |\n| `agent-review.md` | Main agent instructions — review approach resolved from PR size (written when `ready_to_code_review` is called) |\n| `parent-review.md` | Comment format and inline severity threshold for the parent agent (written when `ready_to_code_review` is called) |\n| `subagent-*.md` | Sub-agent instructions (only written for medium/large PRs when `ready_to_code_review` is called — check `agent-review.md` for which exist) |\nMANIFEST\n\necho \"PR context written to /tmp/pr-context/\"\nls -la /tmp/pr-context/" - name: Write review instructions to disk - run: "mkdir -p /tmp/pr-context\ncat > /tmp/pr-context/review-instructions.md << 'REVIEW_EOF'\n# Review Instructions for Sub-agents\n\nYou are a code review sub-agent. Read these instructions, then review the PR files in the order provided in your prompt.\n\n## Context\n\nBefore reviewing files, read these to understand the PR:\n\n1. `/tmp/pr-context/pr.json` — PR title, description, author, and branches. Understand what the PR is trying to accomplish.\n2. `/tmp/agents.md` — Repository coding conventions and guidelines (skip if missing).\n3. `/tmp/pr-context/review_comments.json` — Existing review threads. Note which files already have threads so you don't duplicate.\n4. `/tmp/pr-context/issue-*.json` — Linked issue details (if any). Understand the motivation and acceptance criteria.\n\n## Process\n\nReview the PR diff file by file in your assigned order. For each changed file:\n\n1. **Read the diff** for this file from `/tmp/pr-context/diffs/.diff` to understand what changed. If the diff is empty or truncated (e.g., binary files or very large changes), fall back to reading the full file from the workspace and comparing against context.\n2. **Read the full file from the workspace.** The PR branch is checked out locally — open the file directly to get complete contents with line numbers.\n3. **Check existing threads** for this file from `/tmp/pr-context/threads/.json` (if it exists). Skip issues that are already under discussion — each thread has `isResolved` and `isOutdated` fields.\n4. **Identify potential issues** matching the review criteria below.\n5. **Quick-check each issue** before including it:\n - What specific code pattern or change triggers this concern?\n - Is there an obvious guard, handler, or mitigation visible in the immediate context?\n - Can you describe a concrete failure scenario (the `evidence` field)? If you cannot articulate what specific input or state triggers the problem, drop the finding.\n - If the issue is clearly handled, skip it. If you're unsure, include it — the parent will verify.\n6. **Add to your findings list.** Do NOT leave inline comments — you don't have that tool. Return findings in this format:\n\n```\n- file: path/to/file\n line: 42\n severity: HIGH\n title: Brief title\n description: What the issue is and why it matters\n evidence: The specific code pattern and failure scenario\n suggestion: corrected code here (optional — only if you can provide a concrete fix)\n```\n\n**Review every file in your assigned order.** Files reviewed earlier get more attention, which is why different sub-agents use different orderings.\n\n**Check existing threads** — per-file threads are at `/tmp/pr-context/threads/.json` (step 3 above). The full list is at `/tmp/pr-context/review_comments.json`. Do not flag issues that are already under discussion (resolved or unresolved). For outdated threads, only re-flag if the issue still applies to the current diff.\n\n**Return your full findings list** when done, or an empty list if no issues were found.\n\n## Review Criteria\n\nFocus on these categories in priority order:\n\n1. Security vulnerabilities (injection, XSS, auth bypass, secrets exposure)\n2. Logic bugs that could cause runtime failures or incorrect behavior\n3. Data integrity issues (race conditions, missing transactions, corruption risk)\n4. Performance bottlenecks (N+1 queries, memory leaks, blocking operations)\n5. Error handling gaps (unhandled exceptions, missing validation)\n6. Breaking changes to public APIs without migration path\n7. Missing or incorrect test coverage for critical paths\n\n## What NOT to Flag\n\nOnly review the diff — do not flag issues in unchanged code, pre-existing problems not introduced by this PR, or style preferences handled by linters or formatters.\n\n**Common false positives** — these patterns look like issues but usually aren't. Before flagging anything in these categories, confirm the problem is real by reading the surrounding code:\n\n- **Security — input already sanitized:** Don't flag injection or XSS risks when inputs are sanitized upstream, parameterized queries are used, or the framework auto-escapes output.\n- **Null/undefined — guarded elsewhere:** Don't flag potential null dereferences if the value is guaranteed by a type guard, assertion, schema validation, or upstream null check.\n- **Error handling — handled at a different layer:** Don't flag missing try/catch if the caller, middleware, or framework catches and handles the error (e.g., Express error middleware, React error boundaries).\n- **Performance — theoretical, not practical:** Don't flag algorithmic complexity (e.g., O(n^2)) unless N is demonstrably large enough to matter in the actual usage context. \"This could be slow\" without evidence is not actionable.\n- **Validation — exists at another layer:** Don't flag missing input validation if it's handled by an API gateway, middleware, schema validator, or type system.\n- **Test coverage — trivial or generated code:** Don't flag missing tests for trivial getters/setters, auto-generated code, or simple delegation methods.\n- **Style or naming — not in coding guidelines:** Don't flag naming conventions or code style unless they violate the repository's documented coding guidelines (from `/tmp/agents.md` or CONTRIBUTING docs).\n\n**Existing review threads** — check BEFORE flagging any issue:\n\n- **Resolved with reviewer reply** (e.g. \"This is intentional\") — reviewer's decision is final. Do NOT re-flag.\n- **Resolved without reply** — author likely fixed it. Do NOT re-raise unless the fix introduced a new problem.\n- **Unresolved** — already flagged. Do NOT duplicate.\n- **Outdated** — only re-flag if the issue still applies to the current diff.\n\nWhen in doubt, do not duplicate. Redundant comments erode trust.\n\nFinding no issues is a valid and valuable outcome. An empty findings list is better than findings that waste the author's time or erode trust. Do not manufacture findings to justify your review — if the code is sound, return an empty list.\n\n## Severity Classification\n\nDetermine severity AFTER investigating the issue, not before. First identify the problem and trace through the code, then assign a severity based on the evidence you found.\n\n- 🔴 CRITICAL — Must fix before merge (security vulnerabilities, data corruption, production-breaking bugs)\n- 🟠 HIGH — Should fix before merge (logic errors, missing validation, significant performance issues)\n- 🟡 MEDIUM — Address soon, non-blocking (error handling gaps, suboptimal patterns, missing edge cases)\n- ⚪ LOW — Author discretion (minor improvements, documentation gaps)\n- 💬 NITPICK — Truly optional (stylistic preferences, alternative approaches)\n\n## Review Intensity\n\nThe review intensity is `${{ inputs.intensity || 'balanced' }}`.\n\n- **conservative**: High evidence bar. Only flag when you can demonstrate a concrete failure scenario. If you can construct a reasonable counterargument, do not flag. Approval with zero findings is the expected outcome for most PRs.\n- **balanced**: Standard evidence bar. Flag when you can point to specific code that would fail. If the issue is ambiguous, lean toward not flagging.\n- **aggressive**: Lower evidence bar. Flag when evidence exists even if the failure scenario is not fully confirmed. Improvement suggestions welcome but must cite specific code.\n\n## Calibration Examples\n\nUse these examples to calibrate your judgment. Each pair shows a real issue and a similar-looking pattern that is NOT an issue.\n\n### Example 1: Null/Undefined Access\n\n**True positive — flag this:**\n\n```js\n// PR adds this handler\napp.get('/user/:id', async (req, res) => {\n const user = await db.findUser(req.params.id);\n res.json({ name: user.name, email: user.email });\n});\n```\n\nWhy flag: `db.findUser()` can return `null` when no user matches the ID. Accessing `user.name` will throw a TypeError at runtime. No upstream guard exists — the route handler is the entry point.\n\n**False positive — do NOT flag this:**\n\n```ts\n// PR adds this line inside an existing function\nconst settings = user.getSettings();\n```\n\nWhy skip: Reading the full file reveals `user` is typed as `User` (not `User | null`), and the calling function only runs after `authenticateUser()` middleware which guarantees a valid user object. The null case is handled at a different layer.\n\n### Example 2: SQL Injection\n\n**True positive — flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE customer_id = '{customer_id}'\")\n```\n\nWhy flag: String interpolation in a SQL query with user-controlled input (`customer_id` comes from the request). No parameterization or sanitization anywhere in the call chain.\n\n**False positive — do NOT flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE status = '{OrderStatus.PENDING.value}'\")\n```\n\nWhy skip: The interpolated value is a hardcoded enum constant (`OrderStatus.PENDING`), not user input. There is no injection vector.\n\n### Example 3: Borderline — Do NOT Flag\n\n```go\n// PR adds this function\nfunc processItems(items []Item) []Result {\n results := make([]Result, 0)\n for _, item := range items {\n for _, tag := range item.Tags {\n results = append(results, process(item, tag))\n }\n }\n return results\n}\n```\n\nThis looks like an O(n*m) performance concern. But without evidence that `items` or `Tags` are large in practice, this is speculative. The function processes a bounded dataset (items from a single user request). Do not flag theoretical performance issues without evidence of real-world impact.\nREVIEW_EOF" + run: "mkdir -p /tmp/pr-context\ncat > /tmp/pr-context/review-instructions.md << 'REVIEW_EOF'\n# Review Instructions for Sub-agents\n\nYou are a code review sub-agent. Read these instructions, then review the PR files in the order provided in your prompt.\n\n## Context\n\nBefore reviewing files, read these to understand the PR:\n\n1. `/tmp/pr-context/pr.json` — PR title, description, author, and branches. Understand what the PR is trying to accomplish.\n2. `/tmp/agents.md` — Repository coding conventions and guidelines (skip if missing).\n3. `/tmp/pr-context/review_comments.json` — Existing review threads. Note which files already have threads so you don't duplicate.\n4. `/tmp/pr-context/issue-*.json` — Linked issue details (if any). Understand the motivation and acceptance criteria.\n\n## Process\n\nReview the PR diff file by file in your assigned order. For each changed file:\n\n1. **Read the diff** for this file from `/tmp/pr-context/diffs/.diff` to understand what changed. Each line is prefixed with its line number (e.g., `405\\t+ code`). Lines with numbers are commentable; lines without numbers are deleted lines (LEFT side only). If the diff is empty or truncated (e.g., binary files or very large changes), fall back to reading the full file from the workspace and comparing against context.\n2. **Read the full file from the workspace.** The PR branch is checked out locally — open the file directly to get complete contents with line numbers.\n3. **Check existing threads** for this file from `/tmp/pr-context/threads/.json` (if it exists). Skip issues that are already under discussion — each thread has `isResolved` and `isOutdated` fields.\n4. **Identify potential issues** matching the review criteria below.\n5. **Quick-check each issue** before including it:\n - What specific code pattern or change triggers this concern?\n - Is there an obvious guard, handler, or mitigation visible in the immediate context?\n - Can you describe a concrete failure scenario (the `evidence` field)? If you cannot articulate what specific input or state triggers the problem, drop the finding.\n - If the issue is clearly handled, skip it. If you're unsure, include it — the parent will verify.\n6. **Add to your findings list.** Do NOT leave inline comments — you don't have that tool. Return findings in this format:\n\n```\n- file: path/to/file\n line: 42\n severity: HIGH\n title: Brief title\n description: What the issue is and why it matters\n evidence: The specific code pattern and failure scenario\n suggestion: corrected code here (optional — only if you can provide a concrete fix)\n```\n\n**Review every file in your assigned order.** Files reviewed earlier get more attention, which is why different sub-agents use different orderings.\n\n**Check existing threads** — per-file threads are at `/tmp/pr-context/threads/.json` (step 3 above). The full list is at `/tmp/pr-context/review_comments.json`. Do not flag issues that are already under discussion (resolved or unresolved). For outdated threads, only re-flag if the issue still applies to the current diff.\n\n**Return your full findings list** when done, or an empty list if no issues were found.\n\n## Review Criteria\n\nFocus on these categories in priority order:\n\n1. Security vulnerabilities (injection, XSS, auth bypass, secrets exposure)\n2. Logic bugs that could cause runtime failures or incorrect behavior\n3. Data integrity issues (race conditions, missing transactions, corruption risk)\n4. Performance bottlenecks (N+1 queries, memory leaks, blocking operations)\n5. Error handling gaps (unhandled exceptions, missing validation)\n6. Breaking changes to public APIs without migration path\n7. Missing or incorrect test coverage for critical paths\n\n## What NOT to Flag\n\nOnly review the diff — do not flag issues in unchanged code, pre-existing problems not introduced by this PR, or style preferences handled by linters or formatters.\n\n**Common false positives** — these patterns look like issues but usually aren't. Before flagging anything in these categories, confirm the problem is real by reading the surrounding code:\n\n- **Security — input already sanitized:** Don't flag injection or XSS risks when inputs are sanitized upstream, parameterized queries are used, or the framework auto-escapes output.\n- **Null/undefined — guarded elsewhere:** Don't flag potential null dereferences if the value is guaranteed by a type guard, assertion, schema validation, or upstream null check.\n- **Error handling — handled at a different layer:** Don't flag missing try/catch if the caller, middleware, or framework catches and handles the error (e.g., Express error middleware, React error boundaries).\n- **Performance — theoretical, not practical:** Don't flag algorithmic complexity (e.g., O(n^2)) unless N is demonstrably large enough to matter in the actual usage context. \"This could be slow\" without evidence is not actionable.\n- **Validation — exists at another layer:** Don't flag missing input validation if it's handled by an API gateway, middleware, schema validator, or type system.\n- **Test coverage — trivial or generated code:** Don't flag missing tests for trivial getters/setters, auto-generated code, or simple delegation methods.\n- **Style or naming — not in coding guidelines:** Don't flag naming conventions or code style unless they violate the repository's documented coding guidelines (from `/tmp/agents.md` or CONTRIBUTING docs).\n\n**Existing review threads** — check BEFORE flagging any issue:\n\n- **Resolved with reviewer reply** (e.g. \"This is intentional\") — reviewer's decision is final. Do NOT re-flag.\n- **Resolved without reply** — author likely fixed it. Do NOT re-raise unless the fix introduced a new problem.\n- **Unresolved** — already flagged. Do NOT duplicate.\n- **Outdated** — only re-flag if the issue still applies to the current diff.\n\nWhen in doubt, do not duplicate. Redundant comments erode trust.\n\nFinding no issues is a valid and valuable outcome. An empty findings list is better than findings that waste the author's time or erode trust. Do not manufacture findings to justify your review — if the code is sound, return an empty list.\n\n## Severity Classification\n\nDetermine severity AFTER investigating the issue, not before. First identify the problem and trace through the code, then assign a severity based on the evidence you found.\n\n- 🔴 CRITICAL — Must fix before merge (security vulnerabilities, data corruption, production-breaking bugs)\n- 🟠 HIGH — Should fix before merge (logic errors, missing validation, significant performance issues)\n- 🟡 MEDIUM — Address soon, non-blocking (error handling gaps, suboptimal patterns, missing edge cases)\n- ⚪ LOW — Author discretion (minor improvements, documentation gaps)\n- 💬 NITPICK — Truly optional (stylistic preferences, alternative approaches)\n\n## Review Intensity\n\nThe review intensity is `${{ inputs.intensity || 'balanced' }}`.\n\n- **conservative**: High evidence bar. Only flag when you can demonstrate a concrete failure scenario. If you can construct a reasonable counterargument, do not flag. Approval with zero findings is the expected outcome for most PRs.\n- **balanced**: Standard evidence bar. Flag when you can point to specific code that would fail. If the issue is ambiguous, lean toward not flagging.\n- **aggressive**: Lower evidence bar. Flag when evidence exists even if the failure scenario is not fully confirmed. Improvement suggestions welcome but must cite specific code.\n\n## Calibration Examples\n\nUse these examples to calibrate your judgment. Each pair shows a real issue and a similar-looking pattern that is NOT an issue.\n\n### Example 1: Null/Undefined Access\n\n**True positive — flag this:**\n\n```js\n// PR adds this handler\napp.get('/user/:id', async (req, res) => {\n const user = await db.findUser(req.params.id);\n res.json({ name: user.name, email: user.email });\n});\n```\n\nWhy flag: `db.findUser()` can return `null` when no user matches the ID. Accessing `user.name` will throw a TypeError at runtime. No upstream guard exists — the route handler is the entry point.\n\n**False positive — do NOT flag this:**\n\n```ts\n// PR adds this line inside an existing function\nconst settings = user.getSettings();\n```\n\nWhy skip: Reading the full file reveals `user` is typed as `User` (not `User | null`), and the calling function only runs after `authenticateUser()` middleware which guarantees a valid user object. The null case is handled at a different layer.\n\n### Example 2: SQL Injection\n\n**True positive — flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE customer_id = '{customer_id}'\")\n```\n\nWhy flag: String interpolation in a SQL query with user-controlled input (`customer_id` comes from the request). No parameterization or sanitization anywhere in the call chain.\n\n**False positive — do NOT flag this:**\n\n```python\n# PR adds this query\ncursor.execute(f\"SELECT * FROM orders WHERE status = '{OrderStatus.PENDING.value}'\")\n```\n\nWhy skip: The interpolated value is a hardcoded enum constant (`OrderStatus.PENDING`), not user input. There is no injection vector.\n\n### Example 3: Borderline — Do NOT Flag\n\n```go\n// PR adds this function\nfunc processItems(items []Item) []Result {\n results := make([]Result, 0)\n for _, item := range items {\n for _, tag := range item.Tags {\n results = append(results, process(item, tag))\n }\n }\n return results\n}\n```\n\nThis looks like an O(n*m) performance concern. But without evidence that `items` or `Tags` are large in practice, this is speculative. The function processes a bounded dataset (items from a single user request). Do not flag theoretical performance issues without evidence of real-world impact.\nREVIEW_EOF" - env: SETUP_COMMANDS: ${{ inputs.setup-commands }} if: ${{ inputs.setup-commands != '' }} @@ -1031,9 +1033,9 @@ jobs: pr_size = 'unknown size' diff_lines = 0 - # Write one instruction file per sub-agent file ordering strategy - for key, desc in [('az', 'A \u2192 Z (alphabetical)'), ('za', 'Z \u2192 A (reverse alphabetical)'), ('largest', 'largest diff first')]: - lines = [ + # Determine review approach based on PR size + def write_subagent(key, desc): + sa_lines = [ '# PR Review Sub-Agent', '', 'Review the PR as a code review sub-agent. Return findings only \u2014 do NOT leave inline comments.', @@ -1054,16 +1056,17 @@ jobs: '', f'Review files in this order: `/tmp/pr-context/file_order_{key}.txt` ({desc})', ] - with open(f'/tmp/pr-context/subagent-{key}.md', 'w') as f: - f.write('\n'.join(lines) + '\n') + with open(f'/tmp/pr-context/subagent-{key}.md', 'w') as sa_f: + sa_f.write('\n'.join(sa_lines) + '\n') - # Determine review approach based on PR size if diff_lines < 200: approach_lines = [ f'**Small PR ({pr_size}):** Review directly \u2014 no sub-agents. Review files in order from `/tmp/pr-context/file_order_az.txt`, reading each diff from `/tmp/pr-context/diffs/.diff` and the full file from the workspace.', ] size_key = 'small' elif diff_lines < 800: + write_subagent('az', 'A \u2192 Z (alphabetical)') + write_subagent('za', 'Z \u2192 A (reverse alphabetical)') approach_lines = [ f'**Medium PR ({pr_size}):** Use the **Pick Three, Keep Many** process \u2014 spawn 2 `code-review` sub-agents in parallel:', '', @@ -1074,6 +1077,9 @@ jobs: ] size_key = 'medium' else: + write_subagent('az', 'A \u2192 Z (alphabetical)') + write_subagent('za', 'Z \u2192 A (reverse alphabetical)') + write_subagent('largest', 'largest diff first') approach_lines = [ f'**Large PR ({pr_size}):** Use the **Pick Three, Keep Many** process \u2014 spawn 3 `code-review` sub-agents in parallel:', '', @@ -1098,8 +1104,7 @@ jobs: '## Comment Format', '', 'Call **`create_pull_request_review_comment`** with:', - '- The file path and the **exact line number from reading the file** (not estimated from the diff)', - '- The line must be within the diff (an added or context line in the patch)', + '- The file path and a **line number that appears in the numbered diff** (`/tmp/pr-context/diffs/.diff`). Only lines with a number prefix are commentable. If your target line has no number in the diff, include the finding in the review body instead.', '', t5, '**[SEVERITY] Brief title**', diff --git a/.github/workflows/gh-aw-pr-review.md b/.github/workflows/gh-aw-pr-review.md index 46836d40..8100fe92 100644 --- a/.github/workflows/gh-aw-pr-review.md +++ b/.github/workflows/gh-aw-pr-review.md @@ -137,6 +137,8 @@ If sub-agents were used, merge and deduplicate findings per the Pick Three, Keep Only leave a comment if the finding survives all four checks. Findings flagged independently by multiple sub-agents are stronger candidates. Findings from only one sub-agent deserve extra scrutiny. +Before posting each comment, **verify the target line is in the numbered diff**: check `/tmp/pr-context/diffs/.diff` — only lines with a number prefix are commentable. If the line has no number in the diff, do NOT attempt an inline comment — include the finding in the review body instead. + Leave inline comments (`create_pull_request_review_comment`) per the **Code Review Reference** above for each finding that survives verification. Comment on each file's findings before moving to the next file. If no findings survive verification, proceed directly to Step 4. ### Step 4: Submit the Review