
Parallelize assessment phase: one agent per finding #3

Merged
0101 merged 2 commits into main from expert-reviewer
Mar 26, 2026
Conversation


@0101 0101 commented Mar 26, 2026

Problem

Assessment ran as a single agent validating all findings (up to 30) plus the full diff in one context. On complex PRs (like dotnet/msbuild#13350 — 36 files, 2500+ lines) this would time out, blow context limits, or produce shallow analysis.

This is the same architectural insight behind the MSBuild expert-reviewer's separate Find/Validate waves: discovery needs breadth (full diff), but validation needs depth (one claim at a time). Separate contexts let each agent spend its full budget on proving or disproving a single finding.

Solution

Split assessment into per-finding parallel agents, matching the existing dispatch pattern used by rule agents and concern agents.

Python (mechanical text ops)

  • `prepare-assessment`: splits `consolidated.md` on `### C-XX` section headers → individual finding files + `assessment-dispatch.json`
  • `merge-assessments`: concatenates individual `A-XX.md` assessment files → `assessed.md` with verdict counts
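For illustration, the split step could look roughly like this. This is a minimal sketch, not the actual script: the `findings/` output directory and the dispatch JSON schema (a list of `finding_id`/`finding_path` entries) are assumptions.

```python
import json
import re
from pathlib import Path

def prepare_assessment(repo: Path) -> list[dict]:
    """Split consolidated.md into one file per `### C-XX` finding section
    and emit assessment-dispatch.json listing the pieces (sketch)."""
    text = (repo / "consolidated.md").read_text(encoding="utf-8")
    # Lookahead split keeps each `### C-XX` header attached to its body.
    sections = re.split(r"(?m)^(?=### C-\d\d)", text)
    dispatch = []
    for section in sections:
        match = re.match(r"### (C-\d\d)", section)
        if not match:
            continue  # skip any preamble before the first finding
        finding_id = match.group(1)
        out = repo / "findings" / f"{finding_id}.md"
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(section, encoding="utf-8")
        dispatch.append({"finding_id": finding_id, "finding_path": str(out)})
    (repo / "assessment-dispatch.json").write_text(
        json.dumps(dispatch, indent=2), encoding="utf-8"
    )
    return dispatch
```

Splitting on a lookahead rather than the header itself keeps the operation purely mechanical, which is the point of putting it in Python.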

Orchestrator (SKILL.md)

Phase 3 now runs:

  1. `python prepare-assessment --repo .` → dispatch JSON
  2. Dispatch N parallel assessor agents (batches of 12)
  3. `python merge-assessments --repo .` → `assessed.md`
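The batching in step 2 is mechanical; a minimal sketch (the batch size of 12 comes from the steps above, the helper name is made up):

```python
def batched(items: list, size: int = 12) -> list[list]:
    """Split a dispatch list into consecutive batches; the orchestrator
    runs one batch of assessor agents in parallel at a time (sketch)."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

With 30 findings this yields batches of 12, 12, and 6, keeping parallel fan-out bounded.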

Assessor agent

  • Rewritten for single-finding input (`finding_path`, `diff_path`, `output_path`)
  • Same 3-check validation procedure, same output format per finding
  • Full context budget available for deep code tracing on each finding

Downstream unchanged

`assessed.md` format is identical — rebuttal and reporter work as before.
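For illustration, the merge side could be sketched as below. The `assessments/` directory name and the `Verdict:` line format are assumptions, not the actual file layout.

```python
import re
from pathlib import Path

def merge_assessments(repo: Path) -> str:
    """Concatenate A-XX.md assessment files into assessed.md,
    prefixed with verdict counts (sketch)."""
    parts: list[str] = []
    counts: dict[str, int] = {}
    for path in sorted((repo / "assessments").glob("A-*.md")):
        text = path.read_text(encoding="utf-8")
        parts.append(text)
        # Tally one verdict per assessment file, e.g. "Verdict: Confirmed".
        m = re.search(r"(?m)^Verdict:\s*(\w+)", text)
        if m:
            verdict = m.group(1).lower()
            counts[verdict] = counts.get(verdict, 0) + 1
    summary = "## Verdicts\n" + "\n".join(
        f"- {k}: {v}" for k, v in sorted(counts.items())
    )
    merged = summary + "\n\n" + "\n\n".join(parts)
    (repo / "assessed.md").write_text(merged, encoding="utf-8")
    return merged
```

Because the output is a plain concatenation plus counts, the downstream rebuttal and reporter phases see the same format as before.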

Testing

  • 301 pytest tests pass (0 failures)
  • Manual integration test of prepare-assessment → merge-assessments pipeline
  • Tested on MSBuild PR #13350 with default rules (earlier in investigation session)

Assessment was a bottleneck — one agent validating all findings (up to 30)
plus the full diff in a single context. On complex PRs this would time out,
blow context limits, or produce shallow analysis.

Split into per-finding parallel agents. The orchestrator (SKILL.md) reads
consolidated.md, extracts each finding section, and dispatches one assessor
per finding with the finding text inline. After all complete, the
orchestrator reads individual assessment files and assembles assessed.md.

No Python added — assessment dispatch is pure LLM orchestration (reading
and dispatching), matching how the expert-reviewer system works. Python
stays limited to deterministic tasks (diffs, file discovery, chunking).

Assessor agent rewritten for single-finding input: receives finding_text
inline, diff_path, rules_dir, output_path. Same 3-check validation
procedure, same output format. Full context budget for deep code tracing.

Downstream phases (rebuttal, reporter) unchanged — assessed.md format
is identical.
0101 force-pushed the expert-reviewer branch from ed7d7aa to 732eacc on March 26, 2026 at 13:25
Concerns (bugs, security, architecture) each contained duplicate
framework: output format, evidence standards structure, NO FINDINGS
sentinel, constraints. Adding 'discovery phase awareness' would have
meant editing all three (and any future user-defined concerns).

New: concern-framework.md — generic wrapper loaded by Python and
applied around any concern body via {concern_body} placeholder.

Concern files now contain only domain content: Role, What to Check,
Evidence Requirements, Anti-patterns. Users can define custom concerns
with any structure — the framework handles output format, phase
awareness, and common constraints.
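The wrapper itself can be as small as a placeholder substitution. A sketch, assuming plain-text files; `apply_framework` is a hypothetical helper name:

```python
from pathlib import Path

def apply_framework(framework_path: Path, concern_path: Path) -> str:
    """Wrap a concern body in the shared framework via the
    {concern_body} placeholder (sketch)."""
    framework = framework_path.read_text(encoding="utf-8")
    body = concern_path.read_text(encoding="utf-8")
    # str.replace rather than str.format, so literal braces
    # elsewhere in either file are left untouched.
    return framework.replace("{concern_body}", body)
```

Keeping the substitution this dumb is deliberate: user-defined concern files can then have any internal structure without escaping anything.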

Also adds discovery-phase framing: 'You are a discovery agent —
a separate assessment phase will verify each finding. Optimize for
recall over precision.' This reduces context spent on exhaustive
proof during the find phase.
@0101 0101 merged commit f6884b1 into main Mar 26, 2026
