refactor: eval quality fixes — dedup, types, docs, tests by soilSpoon · Pull Request #359 · soilSpoon/opengoose

soilSpoon · 2026-03-26T04:33:49Z

Summary

home_dir() 3중 중복 제거: opengoose-rig에 단일 pub fn home_dir() 통합. evolver에서 dirs 의존 제거.
RigId 타입 일관성: Board stamp/query API의 rig_id: &str → &RigId로 통일. AsRef<str> 구현 추가. completed_by_rig → completed_by 이름 통일.
ARCHITECTURE.md 동기화: 크레이트 수 수정 (4→5), sandbox 섹션 추가, 설계 결정 ADR 2건
Board→Worker 통합 테스트: 4개 시나리오 (lifecycle, blocked skip, retry→stuck, concurrent claim)
Doc-tests: board/rig/skills 핵심 공개 API에 /// # Examples 추가
Runtime graceful degradation: Worker 생성 실패 시 warn 로깅 후 계속 진행

Test plan

cargo check — 클린 빌드
cargo test — 모든 테스트 통과 (0 failures)
cargo clippy --all-targets — 경고 없음
cargo fmt — 포맷 적용

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Created opengoose-evolver crate for autonomous skill evolution workflows
- Added agent factory utilities (AgentConfig, create_agent) in opengoose-rig for flexible agent initialization
- Introduced RigId type with validation support (try_new())
- Optional worker initialization with graceful error handling
Improvements
- Unified home directory resolution across crates
- Enhanced type safety using typed RigId in board APIs
Documentation
- Added doctests and examples to public APIs
- Updated architecture documentation reflecting new crate topology
- Added design specifications for recent changes
Tests
- Added integration tests for worker lifecycle management

Covers all 7 improvement points from architecture evaluation: ARCHITECTURE.md sync, integration tests, doc-tests, runtime error handling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace ADR-1 (don't split binary crate) with evolver extraction plan. Keep ADR on Board dual responsibility. Update crate count to 6. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

8 tasks: evolver extraction, ARCHITECTURE.md sync, integration tests, doc-tests, runtime graceful degradation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…tory Extracts the standalone agent creation logic from the binary crate into opengoose-rig so it can be shared by both the binary and the upcoming opengoose-evolver crate without circular dependencies. runtime.rs re-exports the symbols to keep existing call sites working unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move the evolver module (2,490 LOC) into a standalone library crate. This decouples the skill evolution pipeline from the CLI binary, improving build parallelism and enabling independent testing. - Create crates/opengoose-evolver with lib.rs, loop_driver.rs, pipeline.rs, sweep.rs (all 55 tests migrated and passing) - Replace crate::skills::{evolve, load} references with direct opengoose_skills imports - Move read_conversation_log to the evolver crate (its only consumer) - Add local home_dir() and test_env_lock() to avoid binary crate deps - Gate now-test-only re-exports in skills/evolve.rs and skills/load.rs behind #[cfg(test)] to eliminate unused-import warnings - Update runtime.rs to use opengoose_evolver::run - Remove mod evolver from main.rs and delete old evolver directory Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add no_run doc-test to WorkMode trait demonstrating TaskMode.session_for(), and a runnable doc-test to SkillMetadata demonstrating JSON deserialization. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Consolidate three copies of home_dir() into opengoose-rig (single source of truth). Unify Board stamp/query APIs to use &RigId instead of raw &str, add AsRef<str> impl, and rename completed_by_rig to completed_by for naming consistency with claimed_by. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-03-26T04:34:03Z

📝 Walkthrough

Walkthrough

The PR extracts the evolver subsystem into a dedicated opengoose-evolver crate, refactors APIs to consistently use typed RigId instead of &str, moves agent factory utilities to opengoose-rig, makes the runtime worker optional for graceful degradation, and reorganizes skill-evolution module dependencies to use opengoose_skills directly.

Changes

Cohort / File(s)	Summary
Workspace Setup `Cargo.toml`, `crates/opengoose-evolver/Cargo.toml`, `crates/opengoose/Cargo.toml`	Added new `opengoose-evolver` workspace member and dependency, configured new crate manifest with dependencies on board, rig, skills, goose, async-trait, anyhow, tokio, tracing, futures.
RigId API Enhancement `crates/opengoose-board/src/work_item.rs`	Added `RigId::try_new()` validation constructor (rejects empty, >64 chars, `..`, `/`), implemented `AsRef<str>` trait, added doc comments for `RigId::new()` and `Status::precedence()`.
Board API Type Consistency `crates/opengoose-board/src/stamp_ops.rs`, `crates/opengoose-board/src/work_items/queries.rs`	Updated public methods to accept `&RigId` instead of `&str`: `stamps_for_rig`, `weighted_score`, `trust_level`, `stamps_with_scores`, `claimed_by`, `completed_by` (renamed from `completed_by_rig`). Updated SeaORM filters to use `rig_id.as_ref()`.
Board Documentation & Utilities `crates/opengoose-board/src/beads.rs`, `crates/opengoose-board/src/board.rs`, `crates/opengoose-board/src/work_items/mod.rs`	Added doc-test examples to `filter_ready`, `prime_summary`, and new `Board::in_memory()` constructor. Updated module documentation references.
Board Test Updates `crates/opengoose-board/src/rigs.rs`	Updated test calls to `trust_level` to construct `RigId` values instead of passing string literals.
Agent Factory Extraction `crates/opengoose-rig/src/agent_factory.rs`, `crates/opengoose-rig/src/lib.rs`, `crates/opengoose-rig/src/work_mode.rs`	New module `agent_factory` with `AgentConfig` struct and async `create_agent()` function. Added `home_dir()` helper and doc-tests. Exposed these in lib.rs.
Home Directory Refactoring `crates/opengoose-rig/src/conversation_log/io.rs`	Changed `opengoose_home_dir()` to use `crate::home_dir()` instead of `dirs::home_dir()` fallback.
RigId Integration Tests `crates/opengoose-rig/tests/worker_integration.rs`	Added integration tests covering Board→Worker claim/submit/retry/stuck transitions, blocked-item filtering, and concurrent claim prevention.
Skills Metadata Documentation `crates/opengoose-skills/src/metadata.rs`	Added doc-test example for `SkillMetadata` JSON parsing and field access.
Evolver Crate Implementation `crates/opengoose-evolver/src/lib.rs`, `crates/opengoose-evolver/src/loop_driver.rs`, `crates/opengoose-evolver/src/pipeline.rs`, `crates/opengoose-evolver/src/sweep.rs`	New evolver crate with module structure. Updated imports to use `opengoose_skills` modules directly (`evolution/{parser,prompts,validator,writer}`, `loader`, `metadata`). Added `read_conversation_log()` helper and `test_env_lock()`. Changed visibility from `pub(super)` to `pub(crate)` for `process_stamp` and `run_sweep`.
Runtime & Worker Optionality `crates/opengoose/src/runtime.rs`	Changed `worker: Arc<Worker>` to `worker: Option<Arc<Worker>>`. Moved `AgentConfig` and `create_agent` to re-exports from `opengoose_rig::agent_factory`. Made worker agent creation non-fatal (logs warning, continues with `None`). Updated evolver spawn to call `opengoose_evolver::run()`.
CLI & Web API RigId Updates `crates/opengoose/src/cli/commands.rs`, `crates/opengoose/src/cli/setup.rs`, `crates/opengoose/src/commands/board.rs`, `crates/opengoose/src/commands/rigs.rs`, `crates/opengoose/src/tui/event/rigs.rs`, `crates/opengoose/src/tui/log_entry.rs`, `crates/opengoose/src/web/api/rigs.rs`	Updated all call sites to construct and pass `RigId` values to board methods instead of strings. Made worker cancellation conditional on `Option`. Replaced local `home_dir()` with import from `opengoose_rig`. Updated TUI log path construction to use `crate::home_dir()`.
Skills Module Re-exports `crates/opengoose/src/skills/evolve.rs`, `crates/opengoose/src/skills/load.rs`	Removed local `read_conversation_log()`. Added/updated re-exports from `opengoose_skills` modules (`prompts`, `validator`, `writer`, `loader`, `metadata`). Made `LoadedSkill`, `SkillScope`, `load_skills` unconditionally exported.
Evolver Module Removal `crates/opengoose/src/main.rs`	Removed `mod evolver;` declaration (functionality moved to `opengoose-evolver` crate).
Architecture & Design Documentation `docs/v0.2/ARCHITECTURE.md`, `docs/superpowers/plans/2026-03-26-eval-quality-fixes.md`, `docs/superpowers/specs/2026-03-26-eval-quality-fixes-design.md`	Updated architecture docs to reflect new 6-crate topology (`opengoose-evolver`, `opengoose-sandbox` experimental). Added ADR-1 on Board ownership. Added design spec and implementation plan documenting evolver extraction, agent factory, RigId typing, worker optionality, and integration test strategy.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

FP quality sweep v4: assertions, newtypes, error hygiene #357 — Both PRs add RigId::try_new() validation and update board/evolver callsites to use typed RigId instead of &str strings.
refactor: FP quality sweep v3 — tests, decomposition, code quality #355 — Both PRs refactor evolver pipeline/sweep to wire directly to opengoose_skills modules and reorganize loader/writer dependencies.
feat: upgrade goose v1.28.0 and rmcp 1.2 #343 — Both PRs modify Goose session creation (GooseMode::Auto, session context) affecting agent initialization in the extracted agent factory.

Poem

🐰 A new crate hopped into view,
With RigId typed strong and true!
The evolver escaped its nest,
And workers now rest when stressed—
Graceful degradation shines anew! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'refactor: eval quality fixes — dedup, types, docs, tests' accurately summarizes the main changes: deduplication of home_dir(), RigId type consistency improvements, documentation updates, and test additions across the codebase.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch soilSpoon/eval-quality-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

devin-ai-integration

Devin Review found 2 potential issues.

View 4 additional findings in Devin Review.

devin-ai-integration · 2026-03-26T04:38:07Z

🔴 Stale tracing target filter: evolver logs no longer detected as structured after crate extraction

The evolver module was moved from crates/opengoose/src/evolver/ to crates/opengoose-evolver/src/. Tracing macros (e.g., info!, warn!) in the new crate will automatically use the target prefix opengoose_evolver::* (e.g., opengoose_evolver::loop_driver). However, is_structured_target() at crates/opengoose/src/tui/log_entry.rs:18 still checks target.starts_with("opengoose::evolver"), which no longer matches. This means evolver log messages in the TUI will not be marked as structured, breaking the log filtering behavior in the Logs tab. The test at crates/opengoose/src/tui/tui_layer.rs:188 also hardcodes the stale target "opengoose::evolver", masking the real issue.

(Refers to line 18)

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-03-26T04:38:09Z

+[package]
+name = "opengoose-evolver"


🔴 AGENTS.md not updated to reflect new crate — violates mandatory rule file

AGENTS.md line 9 states "3개 크레이트만 존재" (only 3 crates exist), listing only opengoose, opengoose-board, and opengoose-rig. This PR creates a new crate opengoose-evolver (and the workspace already had opengoose-skills and opengoose-sandbox), but AGENTS.md was not updated. The PR does update docs/v0.2/ARCHITECTURE.md to say "6개 크레이트" but leaves the mandatory AGENTS.md rule file inconsistent with the actual codebase.

Prompt for agents

Update AGENTS.md line 9 to reflect the actual crate structure. Change "3개 크레이트만 존재: opengoose (CLI), opengoose-board (데이터), opengoose-rig (에이전트)" to "6개 크레이트: opengoose (CLI), opengoose-board (데이터), opengoose-rig (에이전트), opengoose-skills (스킬), opengoose-evolver (스킬 진화), opengoose-sandbox (실험적)". Also update the dependency rule on line 16 to include the evolver and skills crates in the dependency chain description.

Was this helpful? React with 👍 or 👎 to provide feedback.

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

crates/opengoose-board/src/work_item.rs (1)
42-55: ⚠️ Potential issue | 🟠 Major

Close the path-separator validation gap in RigId::try_new.

try_new blocks / but not \. If RigId is ever used as a path segment, this leaves a traversal/sanitization bypass on Windows-style separators.
Suggested hardening
         if s.contains('/') {
             return Err(format!("RigId cannot contain '/': {s}"));
         }
+        if s.contains('\\') {
+            return Err(format!("RigId cannot contain '\\\\': {s}"));
+        }
         Ok(Self(s))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose-board/src/work_item.rs` around lines 42 - 55, RigId::try_new
currently forbids '/' but not Windows backslash, allowing path-separator bypass;
update RigId::try_new to also reject '\' (backslash) characters (and optionally
canonical path-separators if your platform has others) by adding a check similar
to the existing '/' check that returns an Err with a clear message (e.g., "RigId
cannot contain '\\'"). Ensure the check references the same s variable and
follows the same error formatting used in try_new so validation behavior is
consistent.
docs/v0.2/ARCHITECTURE.md (1)
197-226: ⚠️ Potential issue | 🟡 Minor

Section 3 still shows two different evolver locations.

This adds opengoose-evolver as a standalone crate, but the same crate tree above still lists crates/opengoose/src/evolver/* under the binary. That leaves the architecture page internally inconsistent about where the evolver lives after the extraction. Please remove or rewrite the old in-binary evolver subtree so the topology reads one way.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/v0.2/ARCHITECTURE.md` around lines 197 - 226, Section 3 currently
documents the evolver in two places; remove the outdated in-binary subtree under
crates/opengoose/src/evolver/* and update the topology to reference only the
extracted crate opengoose-evolver (and its src/ files) so the dependency graph
and the binary tree are consistent; specifically delete or rewrite the lines
listing crates/opengoose/src/evolver/* and ensure opengoose only references
opengoose-evolver in the dependency list and directory tree.
crates/opengoose/src/web/api/rigs.rs (1)
87-112: ⚠️ Potential issue | 🟠 Major

Validate the route param before turning it into RigId.

RigId::new() is the unchecked constructor, so this endpoint now lets malformed path values skip the RigId::try_new() rules and flow into stamps_with_scores() / completed_by(). Parse the id once at the HTTP boundary and return 400 Bad Request on failure instead of constructing an unchecked domain id from user input.
Suggested fix
-    let rig = state
-        .board
-        .get_rig(&id)
+    let rig_id = opengoose_board::RigId::try_new(&id)
+        .map_err(|_| StatusCode::BAD_REQUEST)?;
+
+    let rig = state
+        .board
+        .get_rig(rig_id.as_ref())
         .await
         .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?
         .ok_or(StatusCode::NOT_FOUND)?;
-
-    let rig_id = opengoose_board::RigId::new(&id);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose/src/web/api/rigs.rs` around lines 87 - 112, Parse and
validate the incoming route param before constructing a RigId: replace the
unchecked opengoose_board::RigId::new(&id) with the fallible
opengoose_board::RigId::try_new(&id) (or equivalent) and if it fails return
StatusCode::BAD_REQUEST; then pass the validated rig_id into
state.board.stamps_with_scores(...) and state.board.completed_by(...). Ensure
any error mapping around stamps_with_scores / completed_by still returns
INTERNAL_SERVER_ERROR only for backend failures, not for malformed route params.
crates/opengoose-evolver/src/pipeline.rs (1)
336-364: ⚠️ Potential issue | 🟠 Major

Don't leave failed evolver items in Claimed.

prepare_context() claims the generated work item before execute_action(). If any later step fails, the error arm here calls board.abandon(ctx.evolver_item_id), but that is not a valid transition from Claimed, so the item can stay stranded instead of becoming retryable or visibly failed. Use the valid claimed-state failure path here, such as the rig-aware release/mark-stuck flow, instead of abandoning directly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose-evolver/src/pipeline.rs` around lines 336 - 364,
process_stamp currently calls board.abandon(ctx.evolver_item_id) on action
failure but prepare_context() already claimed the item, so abandoning is an
invalid transition; replace that abandon call with the rig-aware claimed-failure
path using the evolver rig (RigId::new("evolver")) — e.g., call the Board API
that releases/marks-stuck a claimed item from a specific rig (use the same
evolver_rig used on success) such as board.release_with_rig(ctx.evolver_item_id,
&evolver_rig).await or board.mark_stuck(ctx.evolver_item_id, &evolver_rig,
...).await (choose the available method on Board), preserve the existing logging
of both the original error and any error from the failure-handling call, and
update the match arm in process_stamp to use that rig-aware method instead of
board.abandon(...).

🧹 Nitpick comments (6)

crates/opengoose/src/tui/event/rigs.rs (1)
60-65: Consider batching trust lookups to avoid N+1 queries.

board.trust_level(...) is called once per rig in the loop. If this path runs frequently, switching to a single batched score/level fetch will scale better.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose/src/tui/event/rigs.rs` around lines 60 - 65, The loop is
performing an N+1 call by calling board.trust_level(&rig_id) per rig; change to
a batched lookup: add or use a bulk API (e.g., board.trust_levels(&[RigId]) or
board.list_trust_levels) to fetch all rigs' trust levels in one call after
board.list_rigs() and then map results by RigId before building infos; update
the code that currently constructs RigId and calls board.trust_level inside the
for loop (and keep checks against app.board.items.iter().any(...)) to use the
pre-fetched trust map instead.
crates/opengoose/src/commands/rigs.rs (1)
39-41: Trust command does duplicate score work.

This path fetches score, then fetches trust level (which recomputes score). Consider a single-query/single-computation path to avoid extra DB work and potential drift between displayed pts and level.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose/src/commands/rigs.rs` around lines 39 - 41, The code
currently calls board.weighted_score(&rig_id).await? and then
board.trust_level(&rig_id).await?, causing duplicate score computation; change
this to a single call that returns both values (e.g. add/consume a
board.weighted_score_and_trust_level(&rig_id).await? or a
board.trust_level_from_score(&rig_id, pts).await?), update the call site in
rigs.rs to use that combined result (replace separate weighted_score and
trust_level calls), and implement the combined method on opengoose_board to
compute score and trust in one DB/query pass to avoid extra work and drift
between pts and level.
crates/opengoose-rig/tests/worker_integration.rs (1)
1-142: These are Board transition tests, not Board→Worker integration tests.

Every scenario here drives Board directly, and the file never instantiates Worker or any other opengoose_rig API. Even the retry/stuck case jumps straight to mark_stuck(), so regressions in the real claim/execute/retry loop can still slip through. I'd either add at least one test around the real Worker with a stub agent, or rename the suite/comments so the scope is clear.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose-rig/tests/worker_integration.rs` around lines 1 - 142,
Rename the test suite and its module doc comment to reflect these are unit tests
that exercise Board state transitions (e.g., change the file header and module
name from "Board → Worker integration tests" to "Board state transition tests")
or alternatively add at least one integration test that instantiates
opengoose_rig::Worker and a stub agent to exercise the real claim/execute/retry
loop (create a new async test that spawns Worker and uses functions like
Worker::start/Worker::handle_loop with a stubbed agent to perform claim ->
execute -> retry/mark_stuck flow); update or add tests with names such as
post_claim_submit_lifecycle, claim_then_mark_stuck, or a new
worker_integration_with_stub to make intent explicit.
crates/opengoose/src/tui/log_entry.rs (1)
23-40: Surface the current-directory fallback here.

These helpers now inherit crate::home_dir()'s "." fallback, so a process without a resolvable home directory will start creating and pruning ./.opengoose/logs under whatever directory launched it. The previous code surfaced that as an error; at minimum I'd log a warning here so operators can tell why logs ended up in the working directory.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose/src/tui/log_entry.rs` around lines 23 - 40,
create_session_log_file and cleanup_old_logs currently use crate::home_dir()
which falls back to "." silently; detect when crate::home_dir() is the
current-directory fallback (e.g., home_dir().as_os_str() == "." or
home_dir().is_relative() && home_dir() == Path::new(".")) and emit a visible
warning (via log::warn! or eprintln!) before creating/pruning logs so operators
know logs will be created under the working directory; apply the same
check+warning in both create_session_log_file and cleanup_old_logs (and before
calling cleanup_old_logs_in) so the behavior is surfaced consistently.
crates/opengoose-evolver/src/lib.rs (1)
80-83: Reuse evolve_session_id() for log lookup.

RealAgentCaller writes evolver conversations under evolve_session_id(work_item_id), but this helper reconstructs the key by hand. If those ever drift, evolver will stop finding its own logs and silently lose conversation context. Reuse the shared helper here instead of duplicating the format string.
Suggested change
 pub(crate) fn read_conversation_log(work_item_id: i64) -> String {
-    let session_id = format!("task-{work_item_id}");
+    let session_id = evolve_session_id(work_item_id);
     opengoose_rig::conversation_log::read_log(&session_id)
         .map(|content| opengoose_skills::evolution::prompts::summarize_for_prompt(&content, 4000))
         .unwrap_or_default()
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose-evolver/src/lib.rs` around lines 80 - 83, The
read_conversation_log helper currently constructs the session key with
format!("task-{work_item_id}") instead of using the shared evolve_session_id
function used by RealAgentCaller; change read_conversation_log to call
evolve_session_id(work_item_id) to obtain the session_id and pass that to
opengoose_rig::conversation_log::read_log so the evolver reads the same log key
as RealAgentCaller (keep the rest of the mapping to summarize_for_prompt as-is).
docs/superpowers/plans/2026-03-26-eval-quality-fixes.md (1)
29-29: Consider adding language specifiers to fenced code blocks.

Several fenced code blocks are missing language identifiers. For better rendering and syntax highlighting, consider adding appropriate specifiers:

text for expected output

plaintext for directory structures

Language-specific identifiers for code snippets

This improves documentation readability in rendered Markdown.

Also applies to: 35-35, 72-72, 86-86, 105-105, 121-121, 219-219, 414-414, 488-488, 639-639
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-03-26-eval-quality-fixes.md` at line 29, Several
fenced code blocks in the document are missing language specifiers; update each
triple-backtick fenced block (``` ... ```) to include an appropriate language
identifier: use ```text for expected output, ```plaintext for directory
structures, and proper language tags (e.g., ```bash, ```json, ```ts, ```js,
etc.) for code snippets so syntax highlighting renders correctly; scan the file
for all fenced code blocks (look for ``` markers) and apply the mapping
consistently to the blocks noted in the review.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/opengoose-evolver/src/sweep.rs`:
- Around line 132-133: The sweep currently always calls caller.call(&prompt, 0)
which forces RealAgentCaller to reuse the global session "task-0"; update the
sweep to use a dedicated sweep session id (e.g., generate a fresh id per sweep
from a UUID/timestamp or a sweep-specific counter) and pass that id into
AgentCaller::call instead of 0, or alternatively extend the AgentCaller::call
signature to accept an explicit session key (string or u64) and update
RealAgentCaller to map that to its session namespace; locate the call site
(build_sweep_prompt and caller.call) and the AgentCaller/RealAgentCaller
implementations to wire the new session id through so each sweep gets an
isolated session.

In `@crates/opengoose-rig/src/agent_factory.rs`:
- Around line 60-68: The code calls agent.extend_system_prompt with
config.session_id but the agent is created/updated for the actual session ID
stored in session.id (used in agent.update_provider), so change the call to use
session.id instead of config.session_id; locate the agent.extend_system_prompt
invocation and replace the session identifier argument with session.id to ensure
the system prompt is attached to the same session the agent was updated for.

In `@crates/opengoose/src/runtime.rs`:
- Around line 15-16: The Operator UI still claims a Worker will auto-claim tasks
even when runtime.worker is None; update the prompt-generation and task-enqueue
logic to reflect actual worker availability: where the Operator prompt is built
(the code that currently prints the "separate Worker rig will automatically
claim and execute tasks" message), check the runtime struct's worker:
Option<Arc<opengoose_rig::rig::Worker>> and only show that message when
worker.is_some(); additionally, gate task creation/enqueue paths to prevent
submitting tasks when worker.is_none() (or show a clear warning and require
explicit confirmation), so both the prompt text and the enqueue code consult
runtime.worker before advertising or accepting work.

In `@docs/superpowers/specs/2026-03-26-eval-quality-fixes-design.md`:
- Around line 52-63: The fenced code blocks (e.g., the block showing the module
hierarchy around opengoose-board / opengoose-rig / opengoose-evolver and the
other blocks at ranges 69-77, 93-96, 102-121, 125-136, 218-226) need language
identifiers to satisfy markdownlint MD040; edit each triple-backtick fence
(including the example tree/diff blocks) to add an appropriate language token
such as `text`, `tree`, or `diff` (e.g., ```text or ```diff) so the linter
recognizes the block type and the document stops failing MD040. Ensure every
fenced block in the file is annotated consistently.

---

Outside diff comments:
In `@crates/opengoose-board/src/work_item.rs`:
- Around line 42-55: RigId::try_new currently forbids '/' but not Windows
backslash, allowing path-separator bypass; update RigId::try_new to also reject
'\' (backslash) characters (and optionally canonical path-separators if your
platform has others) by adding a check similar to the existing '/' check that
returns an Err with a clear message (e.g., "RigId cannot contain '\\'"). Ensure
the check references the same s variable and follows the same error formatting
used in try_new so validation behavior is consistent.

In `@crates/opengoose-evolver/src/pipeline.rs`:
- Around line 336-364: process_stamp currently calls
board.abandon(ctx.evolver_item_id) on action failure but prepare_context()
already claimed the item, so abandoning is an invalid transition; replace that
abandon call with the rig-aware claimed-failure path using the evolver rig
(RigId::new("evolver")) — e.g., call the Board API that releases/marks-stuck a
claimed item from a specific rig (use the same evolver_rig used on success) such
as board.release_with_rig(ctx.evolver_item_id, &evolver_rig).await or
board.mark_stuck(ctx.evolver_item_id, &evolver_rig, ...).await (choose the
available method on Board), preserve the existing logging of both the original
error and any error from the failure-handling call, and update the match arm in
process_stamp to use that rig-aware method instead of board.abandon(...).

In `@crates/opengoose/src/web/api/rigs.rs`:
- Around line 87-112: Parse and validate the incoming route param before
constructing a RigId: replace the unchecked opengoose_board::RigId::new(&id)
with the fallible opengoose_board::RigId::try_new(&id) (or equivalent) and if it
fails return StatusCode::BAD_REQUEST; then pass the validated rig_id into
state.board.stamps_with_scores(...) and state.board.completed_by(...). Ensure
any error mapping around stamps_with_scores / completed_by still returns
INTERNAL_SERVER_ERROR only for backend failures, not for malformed route params.

In `@docs/v0.2/ARCHITECTURE.md`:
- Around line 197-226: Section 3 currently documents the evolver in two places;
remove the outdated in-binary subtree under crates/opengoose/src/evolver/* and
update the topology to reference only the extracted crate opengoose-evolver (and
its src/ files) so the dependency graph and the binary tree are consistent;
specifically delete or rewrite the lines listing crates/opengoose/src/evolver/*
and ensure opengoose only references opengoose-evolver in the dependency list
and directory tree.

---

Nitpick comments:
In `@crates/opengoose-evolver/src/lib.rs`:
- Around line 80-83: The read_conversation_log helper currently constructs the
session key with format!("task-{work_item_id}") instead of using the shared
evolve_session_id function used by RealAgentCaller; change read_conversation_log
to call evolve_session_id(work_item_id) to obtain the session_id and pass that
to opengoose_rig::conversation_log::read_log so the evolver reads the same log
key as RealAgentCaller (keep the rest of the mapping to summarize_for_prompt
as-is).

In `@crates/opengoose-rig/tests/worker_integration.rs`:
- Around line 1-142: Rename the test suite and its module doc comment to reflect
these are unit tests that exercise Board state transitions (e.g., change the
file header and module name from "Board → Worker integration tests" to "Board
state transition tests") or alternatively add at least one integration test that
instantiates opengoose_rig::Worker and a stub agent to exercise the real
claim/execute/retry loop (create a new async test that spawns Worker and uses
functions like Worker::start/Worker::handle_loop with a stubbed agent to perform
claim -> execute -> retry/mark_stuck flow); update or add tests with names such
as post_claim_submit_lifecycle, claim_then_mark_stuck, or a new
worker_integration_with_stub to make intent explicit.

In `@crates/opengoose/src/commands/rigs.rs`:
- Around line 39-41: The code currently calls
board.weighted_score(&rig_id).await? and then board.trust_level(&rig_id).await?,
causing duplicate score computation; change this to a single call that returns
both values (e.g. add/consume a
board.weighted_score_and_trust_level(&rig_id).await? or a
board.trust_level_from_score(&rig_id, pts).await?), update the call site in
rigs.rs to use that combined result (replace separate weighted_score and
trust_level calls), and implement the combined method on opengoose_board to
compute score and trust in one DB/query pass to avoid extra work and drift
between pts and level.

In `@crates/opengoose/src/tui/event/rigs.rs`:
- Around line 60-65: The loop is performing an N+1 call by calling
board.trust_level(&rig_id) per rig; change to a batched lookup: add or use a
bulk API (e.g., board.trust_levels(&[RigId]) or board.list_trust_levels) to
fetch all rigs' trust levels in one call after board.list_rigs() and then map
results by RigId before building infos; update the code that currently
constructs RigId and calls board.trust_level inside the for loop (and keep
checks against app.board.items.iter().any(...)) to use the pre-fetched trust map
instead.

In `@crates/opengoose/src/tui/log_entry.rs`:
- Around line 23-40: create_session_log_file and cleanup_old_logs currently use
crate::home_dir() which falls back to "." silently; detect when
crate::home_dir() is the current-directory fallback (e.g.,
home_dir().as_os_str() == "." or home_dir().is_relative() && home_dir() ==
Path::new(".")) and emit a visible warning (via log::warn! or eprintln!) before
creating/pruning logs so operators know logs will be created under the working
directory; apply the same check+warning in both create_session_log_file and
cleanup_old_logs (and before calling cleanup_old_logs_in) so the behavior is
surfaced consistently.

In `@docs/superpowers/plans/2026-03-26-eval-quality-fixes.md`:
- Line 29: Several fenced code blocks in the document are missing language
specifiers; update each triple-backtick fenced block (``` ... ```) to include an
appropriate language identifier: use ```text for expected output, ```plaintext
for directory structures, and proper language tags (e.g., ```bash, ```json,
```ts, ```js, etc.) for code snippets so syntax highlighting renders correctly;
scan the file for all fenced code blocks (look for ``` markers) and apply the
mapping consistently to the blocks noted in the review.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8d53676a-c445-4ded-9c11-f458caa818a3

📥 Commits

Reviewing files that changed from the base of the PR and between 668a564 and 8ed7362.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (34)

Cargo.toml
crates/opengoose-board/src/beads.rs
crates/opengoose-board/src/board.rs
crates/opengoose-board/src/rigs.rs
crates/opengoose-board/src/stamp_ops.rs
crates/opengoose-board/src/work_item.rs
crates/opengoose-board/src/work_items/mod.rs
crates/opengoose-board/src/work_items/queries.rs
crates/opengoose-evolver/Cargo.toml
crates/opengoose-evolver/src/lib.rs
crates/opengoose-evolver/src/loop_driver.rs
crates/opengoose-evolver/src/pipeline.rs
crates/opengoose-evolver/src/sweep.rs
crates/opengoose-rig/src/agent_factory.rs
crates/opengoose-rig/src/conversation_log/io.rs
crates/opengoose-rig/src/lib.rs
crates/opengoose-rig/src/work_mode.rs
crates/opengoose-rig/tests/worker_integration.rs
crates/opengoose-skills/src/metadata.rs
crates/opengoose/Cargo.toml
crates/opengoose/src/cli/commands.rs
crates/opengoose/src/cli/setup.rs
crates/opengoose/src/commands/board.rs
crates/opengoose/src/commands/rigs.rs
crates/opengoose/src/main.rs
crates/opengoose/src/runtime.rs
crates/opengoose/src/skills/evolve.rs
crates/opengoose/src/skills/load.rs
crates/opengoose/src/tui/event/rigs.rs
crates/opengoose/src/tui/log_entry.rs
crates/opengoose/src/web/api/rigs.rs
docs/superpowers/plans/2026-03-26-eval-quality-fixes.md
docs/superpowers/specs/2026-03-26-eval-quality-fixes-design.md
docs/v0.2/ARCHITECTURE.md

💤 Files with no reviewable changes (1)

crates/opengoose/src/main.rs

coderabbitai · 2026-03-26T04:46:28Z

+    let prompt = build_sweep_prompt(&skill_summaries, &failure_summaries);
    let response = caller.call(&prompt, 0).await?;


⚠️ Potential issue | 🟠 Major

Give sweeps their own session namespace.

Line 133 always passes 0 into AgentCaller::call(). RealAgentCaller turns that into a Goose session id, so every sweep reuses the same persisted conversation (task-0). That lets old sweep history bleed into later runs and makes decisions depend on stale context instead of the current dormant-skill set. Use a fresh sweep session id, or extend the trait so sweeps can pass an explicit session key instead of hard-coding 0.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/opengoose-evolver/src/sweep.rs` around lines 132 - 133, The sweep currently always calls caller.call(&prompt, 0) which forces RealAgentCaller to reuse the global session "task-0"; update the sweep to use a dedicated sweep session id (e.g., generate a fresh id per sweep from a UUID/timestamp or a sweep-specific counter) and pass that id into AgentCaller::call instead of 0, or alternatively extend the AgentCaller::call signature to accept an explicit session key (string or u64) and update RealAgentCaller to map that to its session namespace; locate the call site (build_sweep_prompt and caller.call) and the AgentCaller/RealAgentCaller implementations to wire the new session id through so each sweep gets an isolated session.

coderabbitai · 2026-03-26T04:46:28Z

+    agent
+        .update_provider(provider, &session.id)
+        .await
+        .context("failed to set provider")?;
+
+    if let Some(prompt) = config.system_prompt {
+        agent
+            .extend_system_prompt(config.session_id.clone(), prompt)
+            .await;


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In the Goose Agent API, when session_manager.create_session(...)returns a session, issession.idguaranteed to equal the caller-provided session name, and shouldAgent::extend_system_prompt(...)be called withsession.id or with the requested name?

💡 Result:

No, session.id is not guaranteed to equal the caller-provided session name. Goose generates session IDs automatically (typically timestamp-based like yyyymmdd_hhmmss), and CLI flags like --name or -n attempt to resume existing sessions rather than create new ones with custom names. Use --session-id for explicit control, but names and IDs are distinct concepts. Use session.id (the generated/returned ID) with Agent::extend_system_prompt(...), as agents are created per-session using the session ID for isolation via AgentManager/SessionManager.

Citations:

1: Cant name a session in cli aaif-goose/goose#4893

2: Session manager aaif-goose/goose#4648

3: feat: AgentManager - foundation for unified execution (#4389) aaif-goose/goose#4684

4: Add session to agents aaif-goose/goose#4216

5: Expose GOOSE_SESSION_ID environment variable in CLI sessions aaif-goose/goose#6308

Use session.id instead of config.session_id for extend_system_prompt().

Goose generates session IDs automatically (distinct from caller-supplied names), and agents are isolated per session using the session ID. Line 67 should pass session.id to match the session used in update_provider() on line 61. Passing config.session_id will attach the role-specific system prompt to the wrong session or fail silently, causing the agent to run without its Operator/Worker/Evolver instructions.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/opengoose-rig/src/agent_factory.rs` around lines 60 - 68, The code calls agent.extend_system_prompt with config.session_id but the agent is created/updated for the actual session ID stored in session.id (used in agent.update_provider), so change the call to use session.id instead of config.session_id; locate the agent.extend_system_prompt invocation and replace the session identifier argument with session.id to ensure the system prompt is attached to the same session the agent was updated for.

coderabbitai · 2026-03-26T04:46:28Z

+    pub worker: Option<Arc<opengoose_rig::rig::Worker>>,
 }


⚠️ Potential issue | 🟠 Major

Degraded mode still tells the Operator that a worker exists.

After this branch returns worker: None, the Operator prompt in Lines 63-71 still says a separate Worker rig will automatically claim and execute tasks and that the Operator should not handle them. In the degraded path that's false, so users can enqueue work that nothing will ever pick up. Plumb worker availability into the Operator/UI prompt, or gate task creation while the worker is offline.

Also applies to: 28-50

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/opengoose/src/runtime.rs` around lines 15 - 16, The Operator UI still claims a Worker will auto-claim tasks even when runtime.worker is None; update the prompt-generation and task-enqueue logic to reflect actual worker availability: where the Operator prompt is built (the code that currently prints the "separate Worker rig will automatically claim and execute tasks" message), check the runtime struct's worker: Option<Arc<opengoose_rig::rig::Worker>> and only show that message when worker.is_some(); additionally, gate task creation/enqueue paths to prevent submitting tasks when worker.is_none() (or show a clear warning and require explicit confirmation), so both the prompt text and the enqueue code consult runtime.worker before advertising or accepting work.

coderabbitai · 2026-03-26T04:46:28Z

+```
+opengoose-board           (독립)
+       ↑
+opengoose-rig             (board, goose)
+       ↑
+opengoose-evolver  [NEW]  (board, rig, skills, goose)
+       ↑
+opengoose                 (board, rig, skills, evolver)
+
+opengoose-skills          (독립)
+opengoose-sandbox         (독립, 실험적)
+```


⚠️ Potential issue | 🟡 Minor

Add language identifiers to the fenced code blocks.

markdownlint is already flagging these fences with MD040, so this document will keep failing lint until the blocks are annotated with a language like text, tree, or diff.

Also applies to: 69-77, 93-96, 102-121, 125-136, 218-226

🧰 Tools

🪛 markdownlint-cli2 (0.22.0)

[warning] 52-52: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/specs/2026-03-26-eval-quality-fixes-design.md` around lines 52 - 63, The fenced code blocks (e.g., the block showing the module hierarchy around opengoose-board / opengoose-rig / opengoose-evolver and the other blocks at ranges 69-77, 93-96, 102-121, 125-136, 218-226) need language identifiers to satisfy markdownlint MD040; edit each triple-backtick fence (including the example tree/diff blocks) to add an appropriate language token such as `text`, `tree`, or `diff` (e.g., ```text or ```diff) so the linter recognizes the block type and the document stops failing MD040. Ensure every fenced block in the file is annotated consistently.

soilSpoon and others added 13 commits March 26, 2026 10:59

docs: add design spec for code quality evaluation fixes

b3240b5

Covers all 7 improvement points from architecture evaluation: ARCHITECTURE.md sync, integration tests, doc-tests, runtime error handling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: update eval quality spec — evolver crate extraction, ADR revision

5bc6da4

Replace ADR-1 (don't split binary crate) with evolver extraction plan. Keep ADR on Board dual responsibility. Update crate count to 6. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add implementation plan for eval quality fixes

c37ced3

8 tasks: evolver extraction, ARCHITECTURE.md sync, integration tests, doc-tests, runtime graceful degradation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: move test-only deps to dev-dependencies in opengoose-evolver

844e316

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: sync ARCHITECTURE.md — 6 crates, evolver + sandbox, ADR-1

97997b5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: add Board→Worker integration tests (4 scenarios)

481d99c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: add doc-tests to opengoose-rig and opengoose-skills

34b84da

Add no_run doc-test to WorkMode trait demonstrating TaskMode.session_for(), and a runnable doc-test to SkillMetadata demonstrating JSON deserialization. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: add doc-tests to opengoose-board core APIs

b728874

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: graceful degradation when worker agent creation fails

b55bafe

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

style: cargo fmt

0e90850

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

devin-ai-integration Bot reviewed Mar 26, 2026

View reviewed changes

coderabbitai Bot reviewed Mar 26, 2026

View reviewed changes

soilSpoon merged commit 6061697 into main Mar 26, 2026
18 checks passed

soilSpoon deleted the soilSpoon/eval-quality-fixes branch March 26, 2026 05:06

coderabbitai Bot mentioned this pull request Mar 26, 2026

refactor: code quality evaluation fixes — evolver extraction, tests, docs #358

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: eval quality fixes — dedup, types, docs, tests#359

refactor: eval quality fixes — dedup, types, docs, tests#359
soilSpoon merged 13 commits into
mainfrom
soilSpoon/eval-quality-fixes

soilSpoon commented Mar 26, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 26, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

devin-ai-integration Bot Mar 26, 2026

Uh oh!

devin-ai-integration Bot Mar 26, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Mar 26, 2026

Uh oh!

coderabbitai Bot Mar 26, 2026

Uh oh!

coderabbitai Bot Mar 26, 2026

Uh oh!

coderabbitai Bot Mar 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		let prompt = build_sweep_prompt(&skill_summaries, &failure_summaries);
		let response = caller.call(&prompt, 0).await?;

Conversation

soilSpoon commented Mar 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

soilSpoon commented Mar 26, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 26, 2026 •

edited

Loading