Parallelize assessment phase: one agent per finding #3
Merged
Assessment was a bottleneck: one agent validating all findings (up to 30) plus the full diff in a single context. On complex PRs this would time out, blow context limits, or produce shallow analysis.

Split into per-finding parallel agents. The orchestrator (SKILL.md) reads consolidated.md, extracts each finding section, and dispatches one assessor per finding with the finding text inline. After all complete, the orchestrator reads the individual assessment files and assembles assessed.md.

No Python added: assessment dispatch is pure LLM orchestration (reading and dispatching), matching how the expert-reviewer system works. Python stays limited to deterministic tasks (diffs, file discovery, chunking).

The assessor agent is rewritten for single-finding input: it receives finding_text inline, plus diff_path, rules_dir, and output_path. Same 3-check validation procedure, same output format. Full context budget for deep code tracing.

Downstream phases (rebuttal, reporter) are unchanged: the assessed.md format is identical.
The concerns (bugs, security, architecture) each contained a duplicate framework: output format, evidence standards structure, the NO FINDINGS sentinel, and constraints. Adding 'discovery phase awareness' would have meant editing all three (and any future user-defined concerns).

New: concern-framework.md, a generic wrapper loaded by Python and applied around any concern body via a {concern_body} placeholder. Concern files now contain only domain content: Role, What to Check, Evidence Requirements, Anti-patterns. Users can define custom concerns with any structure; the framework handles output format, phase awareness, and common constraints.

This also adds discovery-phase framing: 'You are a discovery agent — a separate assessment phase will verify each finding. Optimize for recall over precision.' This reduces context spent on exhaustive proof during the find phase.
Problem
Assessment ran as a single agent validating all findings (up to 30) plus the full diff in one context. On complex PRs (like dotnet/msbuild#13350 — 36 files, 2500+ lines) this would time out, blow context limits, or produce shallow analysis.
This is the same architectural insight behind the MSBuild expert-reviewer's separate Find/Validate waves: discovery needs breadth (full diff), but validation needs depth (one claim at a time). Separate contexts let each agent spend its full budget on proving or disproving a single finding.
Solution
Split assessment into per-finding parallel agents, matching the existing dispatch pattern used by rule agents and concern agents.
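The fan-out/fan-in shape of that per-finding dispatch can be sketched as follows. This is an illustration of the pattern only: in the skill itself the orchestrating LLM performs the dispatch (no Python is added for it), and `dispatch_assessor` is a hypothetical stub standing in for launching one assessor agent.

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch_assessor(finding_text: str, diff_path: str,
                      rules_dir: str, output_path: str) -> str:
    """HYPOTHETICAL stub: stands in for the orchestrator launching one
    assessor agent with the finding text inline. Returns the path where
    that assessor writes its assessment."""
    return output_path

def assess_all(findings: list[str], diff_path: str, rules_dir: str) -> list[str]:
    """Fan out one assessor per finding, then collect results in order."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(dispatch_assessor, text, diff_path, rules_dir,
                        f"assessments/finding-{i}.md")
            for i, text in enumerate(findings, start=1)
        ]
        # Fan-in: results come back in dispatch order regardless of
        # which assessor finishes first.
        return [f.result() for f in futures]
```

Because each assessor sees exactly one finding plus the diff, every agent gets its full context budget for deep code tracing instead of sharing it across up to 30 findings.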
Python (mechanical text ops)
Orchestrator (SKILL.md)
Phase 3 now runs:
Assessor agent
Downstream unchanged
`assessed.md` format is identical — rebuttal and reporter work as before.
Testing