Parallelize assessment phase: one agent per finding #3
Merged
Assessment was a bottleneck: one agent validating all findings (up to 30) plus the full diff in a single context. On complex PRs this would time out, blow context limits, or produce shallow analysis.

Split into per-finding parallel agents. The orchestrator (SKILL.md) reads consolidated.md, extracts each finding section, and dispatches one assessor per finding with the finding text inline. After all complete, the orchestrator reads the individual assessment files and assembles assessed.md.

No Python added: assessment dispatch is pure LLM orchestration (reading and dispatching), matching how the expert-reviewer system works. Python stays limited to deterministic tasks (diffs, file discovery, chunking).

The assessor agent is rewritten for single-finding input: it receives finding_text inline, plus diff_path, rules_dir, and output_path. Same 3-check validation procedure, same output format. Full context budget for deep code tracing.

Downstream phases (rebuttal, reporter) are unchanged: the assessed.md format is identical.
The concerns (bugs, security, architecture) each contained a duplicate framework: output format, evidence standards structure, the NO FINDINGS sentinel, and constraints. Adding 'discovery phase awareness' would have meant editing all three (and any future user-defined concerns).

New: concern-framework.md, a generic wrapper loaded by Python and applied around any concern body via a {concern_body} placeholder. Concern files now contain only domain content: Role, What to Check, Evidence Requirements, Anti-patterns. Users can define custom concerns with any structure; the framework handles output format, phase awareness, and common constraints.

This also adds discovery-phase framing: 'You are a discovery agent — a separate assessment phase will verify each finding. Optimize for recall over precision.' This reduces context spent on exhaustive proof during the find phase.
Problem
Assessment ran as a single agent validating all findings (up to 30) plus the full diff in one context. On complex PRs (like dotnet/msbuild#13350 — 36 files, 2500+ lines) this would time out, blow context limits, or produce shallow analysis.
This is the same architectural insight behind the MSBuild expert-reviewer's separate Find/Validate waves: discovery needs breadth (full diff), but validation needs depth (one claim at a time). Separate contexts let each agent spend its full budget on proving or disproving a single finding.
Solution
Split assessment into per-finding parallel agents, matching the existing dispatch pattern used by rule agents and concern agents.
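The fan-out/fan-in shape of that per-finding dispatch can be sketched as follows. This is an illustration of the pattern only: in the skill itself the orchestrating LLM performs the dispatch (no Python is added for it), and `dispatch_assessor` is a hypothetical stub standing in for launching one assessor agent.

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch_assessor(finding_text: str, diff_path: str,
                      rules_dir: str, output_path: str) -> str:
    """HYPOTHETICAL stub: stands in for the orchestrator launching one
    assessor agent with the finding text inline. Returns the path where
    that assessor writes its assessment."""
    return output_path

def assess_all(findings: list[str], diff_path: str, rules_dir: str) -> list[str]:
    """Fan out one assessor per finding, then collect results in order."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(dispatch_assessor, text, diff_path, rules_dir,
                        f"assessments/finding-{i}.md")
            for i, text in enumerate(findings, start=1)
        ]
        # Fan-in: results come back in dispatch order regardless of
        # which assessor finishes first.
        return [f.result() for f in futures]
```

Because each assessor sees exactly one finding plus the diff, every agent gets its full context budget for deep code tracing instead of sharing it across up to 30 findings.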
Python (mechanical text ops)
Orchestrator (SKILL.md)
Phase 3 now runs:
Assessor agent
Downstream unchanged
`assessed.md` format is identical — rebuttal and reporter work as before.
Testing