Skip to content

refactor: eval quality fixes — dedup, types, docs, tests#359

Merged
soilSpoon merged 13 commits into
mainfrom
soilSpoon/eval-quality-fixes
Mar 26, 2026
Merged

refactor: eval quality fixes — dedup, types, docs, tests#359
soilSpoon merged 13 commits into
mainfrom
soilSpoon/eval-quality-fixes

Conversation

@soilSpoon
Copy link
Copy Markdown
Owner

@soilSpoon soilSpoon commented Mar 26, 2026

Summary

  • home_dir() 3중 중복 제거: opengoose-rig에 단일 pub fn home_dir() 통합. evolver에서 dirs 의존 제거.
  • RigId 타입 일관성: Board stamp/query API의 rig_id: &str&RigId로 통일. AsRef<str> 구현 추가. completed_by_rigcompleted_by 이름 통일.
  • ARCHITECTURE.md 동기화: 크레이트 수 수정 (4→5), sandbox 섹션 추가, 설계 결정 ADR 2건
  • Board→Worker 통합 테스트: 4개 시나리오 (lifecycle, blocked skip, retry→stuck, concurrent claim)
  • Doc-tests: board/rig/skills 핵심 공개 API에 /// # Examples 추가
  • Runtime graceful degradation: Worker 생성 실패 시 warn 로깅 후 계속 진행

Test plan

  • cargo check — 클린 빌드
  • cargo test — 모든 테스트 통과 (0 failures)
  • cargo clippy --all-targets — 경고 없음
  • cargo fmt — 포맷 적용

🤖 Generated with Claude Code


Open with Devin

Summary by CodeRabbit

Release Notes

  • New Features

    • Created opengoose-evolver crate for autonomous skill evolution workflows
    • Added agent factory utilities (AgentConfig, create_agent) in opengoose-rig for flexible agent initialization
    • Introduced RigId type with validation support (try_new())
    • Optional worker initialization with graceful error handling
  • Improvements

    • Unified home directory resolution across crates
    • Enhanced type safety using typed RigId in board APIs
  • Documentation

    • Added doctests and examples to public APIs
    • Updated architecture documentation reflecting new crate topology
    • Added design specifications for recent changes
  • Tests

    • Added integration tests for worker lifecycle management

soilSpoon and others added 13 commits March 26, 2026 10:59
Covers all 7 improvement points from architecture evaluation:
ARCHITECTURE.md sync, integration tests, doc-tests, runtime error handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace ADR-1 (don't split binary crate) with evolver extraction plan.
Keep ADR on Board dual responsibility. Update crate count to 6.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8 tasks: evolver extraction, ARCHITECTURE.md sync, integration tests,
doc-tests, runtime graceful degradation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tory

Extracts the standalone agent creation logic from the binary crate into
opengoose-rig so it can be shared by both the binary and the upcoming
opengoose-evolver crate without circular dependencies. runtime.rs
re-exports the symbols to keep existing call sites working unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move the evolver module (2,490 LOC) into a standalone library crate.
This decouples the skill evolution pipeline from the CLI binary,
improving build parallelism and enabling independent testing.

- Create crates/opengoose-evolver with lib.rs, loop_driver.rs,
  pipeline.rs, sweep.rs (all 55 tests migrated and passing)
- Replace crate::skills::{evolve, load} references with direct
  opengoose_skills imports
- Move read_conversation_log to the evolver crate (its only consumer)
- Add local home_dir() and test_env_lock() to avoid binary crate deps
- Gate now-test-only re-exports in skills/evolve.rs and skills/load.rs
  behind #[cfg(test)] to eliminate unused-import warnings
- Update runtime.rs to use opengoose_evolver::run
- Remove mod evolver from main.rs and delete old evolver directory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add no_run doc-test to WorkMode trait demonstrating TaskMode.session_for(),
and a runnable doc-test to SkillMetadata demonstrating JSON deserialization.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidate three copies of home_dir() into opengoose-rig (single
source of truth). Unify Board stamp/query APIs to use &RigId instead
of raw &str, add AsRef<str> impl, and rename completed_by_rig to
completed_by for naming consistency with claimed_by.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 26, 2026

📝 Walkthrough

Walkthrough

The PR extracts the evolver subsystem into a dedicated opengoose-evolver crate, refactors APIs to consistently use typed RigId instead of &str, moves agent factory utilities to opengoose-rig, makes the runtime worker optional for graceful degradation, and reorganizes skill-evolution module dependencies to use opengoose_skills directly.

Changes

Cohort / File(s) Summary
Workspace Setup
Cargo.toml, crates/opengoose-evolver/Cargo.toml, crates/opengoose/Cargo.toml
Added new opengoose-evolver workspace member and dependency, configured new crate manifest with dependencies on board, rig, skills, goose, async-trait, anyhow, tokio, tracing, futures.
RigId API Enhancement
crates/opengoose-board/src/work_item.rs
Added RigId::try_new() validation constructor (rejects empty, >64 chars, .., /), implemented AsRef<str> trait, added doc comments for RigId::new() and Status::precedence().
Board API Type Consistency
crates/opengoose-board/src/stamp_ops.rs, crates/opengoose-board/src/work_items/queries.rs
Updated public methods to accept &RigId instead of &str: stamps_for_rig, weighted_score, trust_level, stamps_with_scores, claimed_by, completed_by (renamed from completed_by_rig). Updated SeaORM filters to use rig_id.as_ref().
Board Documentation & Utilities
crates/opengoose-board/src/beads.rs, crates/opengoose-board/src/board.rs, crates/opengoose-board/src/work_items/mod.rs
Added doc-test examples to filter_ready, prime_summary, and new Board::in_memory() constructor. Updated module documentation references.
Board Test Updates
crates/opengoose-board/src/rigs.rs
Updated test calls to trust_level to construct RigId values instead of passing string literals.
Agent Factory Extraction
crates/opengoose-rig/src/agent_factory.rs, crates/opengoose-rig/src/lib.rs, crates/opengoose-rig/src/work_mode.rs
New module agent_factory with AgentConfig struct and async create_agent() function. Added home_dir() helper and doc-tests. Exposed these in lib.rs.
Home Directory Refactoring
crates/opengoose-rig/src/conversation_log/io.rs
Changed opengoose_home_dir() to use crate::home_dir() instead of dirs::home_dir() fallback.
RigId Integration Tests
crates/opengoose-rig/tests/worker_integration.rs
Added integration tests covering Board→Worker claim/submit/retry/stuck transitions, blocked-item filtering, and concurrent claim prevention.
Skills Metadata Documentation
crates/opengoose-skills/src/metadata.rs
Added doc-test example for SkillMetadata JSON parsing and field access.
Evolver Crate Implementation
crates/opengoose-evolver/src/lib.rs, crates/opengoose-evolver/src/loop_driver.rs, crates/opengoose-evolver/src/pipeline.rs, crates/opengoose-evolver/src/sweep.rs
New evolver crate with module structure. Updated imports to use opengoose_skills modules directly (evolution/{parser,prompts,validator,writer}, loader, metadata). Added read_conversation_log() helper and test_env_lock(). Changed visibility from pub(super) to pub(crate) for process_stamp and run_sweep.
Runtime & Worker Optionality
crates/opengoose/src/runtime.rs
Changed worker: Arc<Worker> to worker: Option<Arc<Worker>>. Moved AgentConfig and create_agent to re-exports from opengoose_rig::agent_factory. Made worker agent creation non-fatal (logs warning, continues with None). Updated evolver spawn to call opengoose_evolver::run().
CLI & Web API RigId Updates
crates/opengoose/src/cli/commands.rs, crates/opengoose/src/cli/setup.rs, crates/opengoose/src/commands/board.rs, crates/opengoose/src/commands/rigs.rs, crates/opengoose/src/tui/event/rigs.rs, crates/opengoose/src/tui/log_entry.rs, crates/opengoose/src/web/api/rigs.rs
Updated all call sites to construct and pass RigId values to board methods instead of strings. Made worker cancellation conditional on Option. Replaced local home_dir() with import from opengoose_rig. Updated TUI log path construction to use crate::home_dir().
Skills Module Re-exports
crates/opengoose/src/skills/evolve.rs, crates/opengoose/src/skills/load.rs
Removed local read_conversation_log(). Added/updated re-exports from opengoose_skills modules (prompts, validator, writer, loader, metadata). Made LoadedSkill, SkillScope, load_skills unconditionally exported.
Evolver Module Removal
crates/opengoose/src/main.rs
Removed mod evolver; declaration (functionality moved to opengoose-evolver crate).
Architecture & Design Documentation
docs/v0.2/ARCHITECTURE.md, docs/superpowers/plans/2026-03-26-eval-quality-fixes.md, docs/superpowers/specs/2026-03-26-eval-quality-fixes-design.md
Updated architecture docs to reflect new 6-crate topology (opengoose-evolver, opengoose-sandbox experimental). Added ADR-1 on Board ownership. Added design spec and implementation plan documenting evolver extraction, agent factory, RigId typing, worker optionality, and integration test strategy.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 A new crate hopped into view,
With RigId typed strong and true!
The evolver escaped its nest,
And workers now rest when stressed—
Graceful degradation shines anew! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'refactor: eval quality fixes — dedup, types, docs, tests' accurately summarizes the main changes: deduplication of home_dir(), RigId type consistency improvements, documentation updates, and test additions across the codebase.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch soilSpoon/eval-quality-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 potential issues.

View 4 additional findings in Devin Review.

Open in Devin Review

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Stale tracing target filter: evolver logs no longer detected as structured after crate extraction

The evolver module was moved from crates/opengoose/src/evolver/ to crates/opengoose-evolver/src/. Tracing macros (e.g., info!, warn!) in the new crate will automatically use the target prefix opengoose_evolver::* (e.g., opengoose_evolver::loop_driver). However, is_structured_target() at crates/opengoose/src/tui/log_entry.rs:18 still checks target.starts_with("opengoose::evolver"), which no longer matches. This means evolver log messages in the TUI will not be marked as structured, breaking the log filtering behavior in the Logs tab. The test at crates/opengoose/src/tui/tui_layer.rs:188 also hardcodes the stale target "opengoose::evolver", masking the real issue.

(Refers to line 18)

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +1 to +2
[package]
name = "opengoose-evolver"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 AGENTS.md not updated to reflect new crate — violates mandatory rule file

AGENTS.md line 9 states "3개 크레이트만 존재" (only 3 crates exist), listing only opengoose, opengoose-board, and opengoose-rig. This PR creates a new crate opengoose-evolver (and the workspace already had opengoose-skills and opengoose-sandbox), but AGENTS.md was not updated. The PR does update docs/v0.2/ARCHITECTURE.md to say "6개 크레이트" but leaves the mandatory AGENTS.md rule file inconsistent with the actual codebase.

Prompt for agents
Update AGENTS.md line 9 to reflect the actual crate structure. Change "3개 크레이트만 존재: opengoose (CLI), opengoose-board (데이터), opengoose-rig (에이전트)" to "6개 크레이트: opengoose (CLI), opengoose-board (데이터), opengoose-rig (에이전트), opengoose-skills (스킬), opengoose-evolver (스킬 진화), opengoose-sandbox (실험적)". Also update the dependency rule on line 16 to include the evolver and skills crates in the dependency chain description.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
crates/opengoose-board/src/work_item.rs (1)

42-55: ⚠️ Potential issue | 🟠 Major

Close the path-separator validation gap in RigId::try_new.

try_new blocks / but not \. If RigId is ever used as a path segment, this leaves a traversal/sanitization bypass on Windows-style separators.

Suggested hardening
         if s.contains('/') {
             return Err(format!("RigId cannot contain '/': {s}"));
         }
+        if s.contains('\\') {
+            return Err(format!("RigId cannot contain '\\\\': {s}"));
+        }
         Ok(Self(s))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose-board/src/work_item.rs` around lines 42 - 55, RigId::try_new
currently forbids '/' but not Windows backslash, allowing path-separator bypass;
update RigId::try_new to also reject '\' (backslash) characters (and optionally
canonical path-separators if your platform has others) by adding a check similar
to the existing '/' check that returns an Err with a clear message (e.g., "RigId
cannot contain '\\'"). Ensure the check references the same s variable and
follows the same error formatting used in try_new so validation behavior is
consistent.
docs/v0.2/ARCHITECTURE.md (1)

197-226: ⚠️ Potential issue | 🟡 Minor

Section 3 still shows two different evolver locations.

This adds opengoose-evolver as a standalone crate, but the same crate tree above still lists crates/opengoose/src/evolver/* under the binary. That leaves the architecture page internally inconsistent about where the evolver lives after the extraction. Please remove or rewrite the old in-binary evolver subtree so the topology reads one way.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/v0.2/ARCHITECTURE.md` around lines 197 - 226, Section 3 currently
documents the evolver in two places; remove the outdated in-binary subtree under
crates/opengoose/src/evolver/* and update the topology to reference only the
extracted crate opengoose-evolver (and its src/ files) so the dependency graph
and the binary tree are consistent; specifically delete or rewrite the lines
listing crates/opengoose/src/evolver/* and ensure opengoose only references
opengoose-evolver in the dependency list and directory tree.
crates/opengoose/src/web/api/rigs.rs (1)

87-112: ⚠️ Potential issue | 🟠 Major

Validate the route param before turning it into RigId.

RigId::new() is the unchecked constructor, so this endpoint now lets malformed path values skip the RigId::try_new() rules and flow into stamps_with_scores() / completed_by(). Parse the id once at the HTTP boundary and return 400 Bad Request on failure instead of constructing an unchecked domain id from user input.

Suggested fix
-    let rig = state
-        .board
-        .get_rig(&id)
+    let rig_id = opengoose_board::RigId::try_new(&id)
+        .map_err(|_| StatusCode::BAD_REQUEST)?;
+
+    let rig = state
+        .board
+        .get_rig(rig_id.as_ref())
         .await
         .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?
         .ok_or(StatusCode::NOT_FOUND)?;
-
-    let rig_id = opengoose_board::RigId::new(&id);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose/src/web/api/rigs.rs` around lines 87 - 112, Parse and
validate the incoming route param before constructing a RigId: replace the
unchecked opengoose_board::RigId::new(&id) with the fallible
opengoose_board::RigId::try_new(&id) (or equivalent) and if it fails return
StatusCode::BAD_REQUEST; then pass the validated rig_id into
state.board.stamps_with_scores(...) and state.board.completed_by(...). Ensure
any error mapping around stamps_with_scores / completed_by still returns
INTERNAL_SERVER_ERROR only for backend failures, not for malformed route params.
crates/opengoose-evolver/src/pipeline.rs (1)

336-364: ⚠️ Potential issue | 🟠 Major

Don't leave failed evolver items in Claimed.

prepare_context() claims the generated work item before execute_action(). If any later step fails, the error arm here calls board.abandon(ctx.evolver_item_id), but that is not a valid transition from Claimed, so the item can stay stranded instead of becoming retryable or visibly failed. Use the valid claimed-state failure path here, such as the rig-aware release/mark-stuck flow, instead of abandoning directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose-evolver/src/pipeline.rs` around lines 336 - 364,
process_stamp currently calls board.abandon(ctx.evolver_item_id) on action
failure but prepare_context() already claimed the item, so abandoning is an
invalid transition; replace that abandon call with the rig-aware claimed-failure
path using the evolver rig (RigId::new("evolver")) — e.g., call the Board API
that releases/marks-stuck a claimed item from a specific rig (use the same
evolver_rig used on success) such as board.release_with_rig(ctx.evolver_item_id,
&evolver_rig).await or board.mark_stuck(ctx.evolver_item_id, &evolver_rig,
...).await (choose the available method on Board), preserve the existing logging
of both the original error and any error from the failure-handling call, and
update the match arm in process_stamp to use that rig-aware method instead of
board.abandon(...).
🧹 Nitpick comments (6)
crates/opengoose/src/tui/event/rigs.rs (1)

60-65: Consider batching trust lookups to avoid N+1 queries.

board.trust_level(...) is called once per rig in the loop. If this path runs frequently, switching to a single batched score/level fetch will scale better.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose/src/tui/event/rigs.rs` around lines 60 - 65, The loop is
performing an N+1 call by calling board.trust_level(&rig_id) per rig; change to
a batched lookup: add or use a bulk API (e.g., board.trust_levels(&[RigId]) or
board.list_trust_levels) to fetch all rigs' trust levels in one call after
board.list_rigs() and then map results by RigId before building infos; update
the code that currently constructs RigId and calls board.trust_level inside the
for loop (and keep checks against app.board.items.iter().any(...)) to use the
pre-fetched trust map instead.
crates/opengoose/src/commands/rigs.rs (1)

39-41: Trust command does duplicate score work.

This path fetches score, then fetches trust level (which recomputes score). Consider a single-query/single-computation path to avoid extra DB work and potential drift between displayed pts and level.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose/src/commands/rigs.rs` around lines 39 - 41, The code
currently calls board.weighted_score(&rig_id).await? and then
board.trust_level(&rig_id).await?, causing duplicate score computation; change
this to a single call that returns both values (e.g. add/consume a
board.weighted_score_and_trust_level(&rig_id).await? or a
board.trust_level_from_score(&rig_id, pts).await?), update the call site in
rigs.rs to use that combined result (replace separate weighted_score and
trust_level calls), and implement the combined method on opengoose_board to
compute score and trust in one DB/query pass to avoid extra work and drift
between pts and level.
crates/opengoose-rig/tests/worker_integration.rs (1)

1-142: These are Board transition tests, not Board→Worker integration tests.

Every scenario here drives Board directly, and the file never instantiates Worker or any other opengoose_rig API. Even the retry/stuck case jumps straight to mark_stuck(), so regressions in the real claim/execute/retry loop can still slip through. I'd either add at least one test around the real Worker with a stub agent, or rename the suite/comments so the scope is clear.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose-rig/tests/worker_integration.rs` around lines 1 - 142,
Rename the test suite and its module doc comment to reflect these are unit tests
that exercise Board state transitions (e.g., change the file header and module
name from "Board → Worker integration tests" to "Board state transition tests")
or alternatively add at least one integration test that instantiates
opengoose_rig::Worker and a stub agent to exercise the real claim/execute/retry
loop (create a new async test that spawns Worker and uses functions like
Worker::start/Worker::handle_loop with a stubbed agent to perform claim ->
execute -> retry/mark_stuck flow); update or add tests with names such as
post_claim_submit_lifecycle, claim_then_mark_stuck, or a new
worker_integration_with_stub to make intent explicit.
crates/opengoose/src/tui/log_entry.rs (1)

23-40: Surface the current-directory fallback here.

These helpers now inherit crate::home_dir()'s "." fallback, so a process without a resolvable home directory will start creating and pruning ./.opengoose/logs under whatever directory launched it. The previous code surfaced that as an error; at minimum I'd log a warning here so operators can tell why logs ended up in the working directory.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose/src/tui/log_entry.rs` around lines 23 - 40,
create_session_log_file and cleanup_old_logs currently use crate::home_dir()
which falls back to "." silently; detect when crate::home_dir() is the
current-directory fallback (e.g., home_dir().as_os_str() == "." or
home_dir().is_relative() && home_dir() == Path::new(".")) and emit a visible
warning (via log::warn! or eprintln!) before creating/pruning logs so operators
know logs will be created under the working directory; apply the same
check+warning in both create_session_log_file and cleanup_old_logs (and before
calling cleanup_old_logs_in) so the behavior is surfaced consistently.
crates/opengoose-evolver/src/lib.rs (1)

80-83: Reuse evolve_session_id() for log lookup.

RealAgentCaller writes evolver conversations under evolve_session_id(work_item_id), but this helper reconstructs the key by hand. If those ever drift, evolver will stop finding its own logs and silently lose conversation context. Reuse the shared helper here instead of duplicating the format string.

Suggested change
 pub(crate) fn read_conversation_log(work_item_id: i64) -> String {
-    let session_id = format!("task-{work_item_id}");
+    let session_id = evolve_session_id(work_item_id);
     opengoose_rig::conversation_log::read_log(&session_id)
         .map(|content| opengoose_skills::evolution::prompts::summarize_for_prompt(&content, 4000))
         .unwrap_or_default()
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose-evolver/src/lib.rs` around lines 80 - 83, The
read_conversation_log helper currently constructs the session key with
format!("task-{work_item_id}") instead of using the shared evolve_session_id
function used by RealAgentCaller; change read_conversation_log to call
evolve_session_id(work_item_id) to obtain the session_id and pass that to
opengoose_rig::conversation_log::read_log so the evolver reads the same log key
as RealAgentCaller (keep the rest of the mapping to summarize_for_prompt as-is).
docs/superpowers/plans/2026-03-26-eval-quality-fixes.md (1)

29-29: Consider adding language specifiers to fenced code blocks.

Several fenced code blocks are missing language identifiers. For better rendering and syntax highlighting, consider adding appropriate specifiers:

  • text for expected output
  • plaintext for directory structures
  • Language-specific identifiers for code snippets

This improves documentation readability in rendered Markdown.

Also applies to: 35-35, 72-72, 86-86, 105-105, 121-121, 219-219, 414-414, 488-488, 639-639

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-03-26-eval-quality-fixes.md` at line 29, Several
fenced code blocks in the document are missing language specifiers; update each
triple-backtick fenced block (``` ... ```) to include an appropriate language
identifier: use ```text for expected output, ```plaintext for directory
structures, and proper language tags (e.g., ```bash, ```json, ```ts, ```js,
etc.) for code snippets so syntax highlighting renders correctly; scan the file
for all fenced code blocks (look for ``` markers) and apply the mapping
consistently to the blocks noted in the review.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/opengoose-evolver/src/sweep.rs`:
- Around line 132-133: The sweep currently always calls caller.call(&prompt, 0)
which forces RealAgentCaller to reuse the global session "task-0"; update the
sweep to use a dedicated sweep session id (e.g., generate a fresh id per sweep
from a UUID/timestamp or a sweep-specific counter) and pass that id into
AgentCaller::call instead of 0, or alternatively extend the AgentCaller::call
signature to accept an explicit session key (string or u64) and update
RealAgentCaller to map that to its session namespace; locate the call site
(build_sweep_prompt and caller.call) and the AgentCaller/RealAgentCaller
implementations to wire the new session id through so each sweep gets an
isolated session.

In `@crates/opengoose-rig/src/agent_factory.rs`:
- Around line 60-68: The code calls agent.extend_system_prompt with
config.session_id but the agent is created/updated for the actual session ID
stored in session.id (used in agent.update_provider), so change the call to use
session.id instead of config.session_id; locate the agent.extend_system_prompt
invocation and replace the session identifier argument with session.id to ensure
the system prompt is attached to the same session the agent was updated for.

In `@crates/opengoose/src/runtime.rs`:
- Around line 15-16: The Operator UI still claims a Worker will auto-claim tasks
even when runtime.worker is None; update the prompt-generation and task-enqueue
logic to reflect actual worker availability: where the Operator prompt is built
(the code that currently prints the "separate Worker rig will automatically
claim and execute tasks" message), check the runtime struct's worker:
Option<Arc<opengoose_rig::rig::Worker>> and only show that message when
worker.is_some(); additionally, gate task creation/enqueue paths to prevent
submitting tasks when worker.is_none() (or show a clear warning and require
explicit confirmation), so both the prompt text and the enqueue code consult
runtime.worker before advertising or accepting work.

In `@docs/superpowers/specs/2026-03-26-eval-quality-fixes-design.md`:
- Around line 52-63: The fenced code blocks (e.g., the block showing the module
hierarchy around opengoose-board / opengoose-rig / opengoose-evolver and the
other blocks at ranges 69-77, 93-96, 102-121, 125-136, 218-226) need language
identifiers to satisfy markdownlint MD040; edit each triple-backtick fence
(including the example tree/diff blocks) to add an appropriate language token
such as `text`, `tree`, or `diff` (e.g., ```text or ```diff) so the linter
recognizes the block type and the document stops failing MD040. Ensure every
fenced block in the file is annotated consistently.

---

Outside diff comments:
In `@crates/opengoose-board/src/work_item.rs`:
- Around line 42-55: RigId::try_new currently forbids '/' but not Windows
backslash, allowing path-separator bypass; update RigId::try_new to also reject
'\' (backslash) characters (and optionally canonical path-separators if your
platform has others) by adding a check similar to the existing '/' check that
returns an Err with a clear message (e.g., "RigId cannot contain '\\'"). Ensure
the check references the same s variable and follows the same error formatting
used in try_new so validation behavior is consistent.

In `@crates/opengoose-evolver/src/pipeline.rs`:
- Around line 336-364: process_stamp currently calls
board.abandon(ctx.evolver_item_id) on action failure but prepare_context()
already claimed the item, so abandoning is an invalid transition; replace that
abandon call with the rig-aware claimed-failure path using the evolver rig
(RigId::new("evolver")) — e.g., call the Board API that releases/marks-stuck a
claimed item from a specific rig (use the same evolver_rig used on success) such
as board.release_with_rig(ctx.evolver_item_id, &evolver_rig).await or
board.mark_stuck(ctx.evolver_item_id, &evolver_rig, ...).await (choose the
available method on Board), preserve the existing logging of both the original
error and any error from the failure-handling call, and update the match arm in
process_stamp to use that rig-aware method instead of board.abandon(...).

In `@crates/opengoose/src/web/api/rigs.rs`:
- Around line 87-112: Parse and validate the incoming route param before
constructing a RigId: replace the unchecked opengoose_board::RigId::new(&id)
with the fallible opengoose_board::RigId::try_new(&id) (or equivalent) and if it
fails return StatusCode::BAD_REQUEST; then pass the validated rig_id into
state.board.stamps_with_scores(...) and state.board.completed_by(...). Ensure
any error mapping around stamps_with_scores / completed_by still returns
INTERNAL_SERVER_ERROR only for backend failures, not for malformed route params.

In `@docs/v0.2/ARCHITECTURE.md`:
- Around line 197-226: Section 3 currently documents the evolver in two places;
remove the outdated in-binary subtree under crates/opengoose/src/evolver/* and
update the topology to reference only the extracted crate opengoose-evolver (and
its src/ files) so the dependency graph and the binary tree are consistent;
specifically delete or rewrite the lines listing crates/opengoose/src/evolver/*
and ensure opengoose only references opengoose-evolver in the dependency list
and directory tree.

---

Nitpick comments:
In `@crates/opengoose-evolver/src/lib.rs`:
- Around line 80-83: The read_conversation_log helper currently constructs the
session key with format!("task-{work_item_id}") instead of using the shared
evolve_session_id function used by RealAgentCaller; change read_conversation_log
to call evolve_session_id(work_item_id) to obtain the session_id and pass that
to opengoose_rig::conversation_log::read_log so the evolver reads the same log
key as RealAgentCaller (keep the rest of the mapping to summarize_for_prompt
as-is).

In `@crates/opengoose-rig/tests/worker_integration.rs`:
- Around line 1-142: Rename the test suite and its module doc comment to reflect
these are unit tests that exercise Board state transitions (e.g., change the
file header and module name from "Board → Worker integration tests" to "Board
state transition tests") or alternatively add at least one integration test that
instantiates opengoose_rig::Worker and a stub agent to exercise the real
claim/execute/retry loop (create a new async test that spawns Worker and uses
functions like Worker::start/Worker::handle_loop with a stubbed agent to perform
claim -> execute -> retry/mark_stuck flow); update or add tests with names such
as post_claim_submit_lifecycle, claim_then_mark_stuck, or a new
worker_integration_with_stub to make intent explicit.

In `@crates/opengoose/src/commands/rigs.rs`:
- Around line 39-41: The code currently calls
board.weighted_score(&rig_id).await? and then board.trust_level(&rig_id).await?,
causing duplicate score computation; change this to a single call that returns
both values (e.g. add/consume a
board.weighted_score_and_trust_level(&rig_id).await? or a
board.trust_level_from_score(&rig_id, pts).await?), update the call site in
rigs.rs to use that combined result (replace separate weighted_score and
trust_level calls), and implement the combined method on opengoose_board to
compute score and trust in one DB/query pass to avoid extra work and drift
between pts and level.

In `@crates/opengoose/src/tui/event/rigs.rs`:
- Around line 60-65: The loop is performing an N+1 call by calling
board.trust_level(&rig_id) per rig; change to a batched lookup: add or use a
bulk API (e.g., board.trust_levels(&[RigId]) or board.list_trust_levels) to
fetch all rigs' trust levels in one call after board.list_rigs() and then map
results by RigId before building infos; update the code that currently
constructs RigId and calls board.trust_level inside the for loop (and keep
checks against app.board.items.iter().any(...)) to use the pre-fetched trust map
instead.

In `@crates/opengoose/src/tui/log_entry.rs`:
- Around line 23-40: create_session_log_file and cleanup_old_logs currently use
crate::home_dir() which falls back to "." silently; detect when
crate::home_dir() is the current-directory fallback (e.g.,
home_dir().as_os_str() == "." or home_dir().is_relative() && home_dir() ==
Path::new(".")) and emit a visible warning (via log::warn! or eprintln!) before
creating/pruning logs so operators know logs will be created under the working
directory; apply the same check+warning in both create_session_log_file and
cleanup_old_logs (and before calling cleanup_old_logs_in) so the behavior is
surfaced consistently.

In `@docs/superpowers/plans/2026-03-26-eval-quality-fixes.md`:
- Line 29: Several fenced code blocks in the document are missing language
specifiers; update each triple-backtick fenced block (``` ... ```) to include an
appropriate language identifier: use ```text for expected output, ```plaintext
for directory structures, and proper language tags (e.g., ```bash, ```json,
```ts, ```js, etc.) for code snippets so syntax highlighting renders correctly;
scan the file for all fenced code blocks (look for ``` markers) and apply the
mapping consistently to the blocks noted in the review.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8d53676a-c445-4ded-9c11-f458caa818a3

📥 Commits

Reviewing files that changed from the base of the PR and between 668a564 and 8ed7362.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (34)
  • Cargo.toml
  • crates/opengoose-board/src/beads.rs
  • crates/opengoose-board/src/board.rs
  • crates/opengoose-board/src/rigs.rs
  • crates/opengoose-board/src/stamp_ops.rs
  • crates/opengoose-board/src/work_item.rs
  • crates/opengoose-board/src/work_items/mod.rs
  • crates/opengoose-board/src/work_items/queries.rs
  • crates/opengoose-evolver/Cargo.toml
  • crates/opengoose-evolver/src/lib.rs
  • crates/opengoose-evolver/src/loop_driver.rs
  • crates/opengoose-evolver/src/pipeline.rs
  • crates/opengoose-evolver/src/sweep.rs
  • crates/opengoose-rig/src/agent_factory.rs
  • crates/opengoose-rig/src/conversation_log/io.rs
  • crates/opengoose-rig/src/lib.rs
  • crates/opengoose-rig/src/work_mode.rs
  • crates/opengoose-rig/tests/worker_integration.rs
  • crates/opengoose-skills/src/metadata.rs
  • crates/opengoose/Cargo.toml
  • crates/opengoose/src/cli/commands.rs
  • crates/opengoose/src/cli/setup.rs
  • crates/opengoose/src/commands/board.rs
  • crates/opengoose/src/commands/rigs.rs
  • crates/opengoose/src/main.rs
  • crates/opengoose/src/runtime.rs
  • crates/opengoose/src/skills/evolve.rs
  • crates/opengoose/src/skills/load.rs
  • crates/opengoose/src/tui/event/rigs.rs
  • crates/opengoose/src/tui/log_entry.rs
  • crates/opengoose/src/web/api/rigs.rs
  • docs/superpowers/plans/2026-03-26-eval-quality-fixes.md
  • docs/superpowers/specs/2026-03-26-eval-quality-fixes-design.md
  • docs/v0.2/ARCHITECTURE.md
💤 Files with no reviewable changes (1)
  • crates/opengoose/src/main.rs

Comment on lines +132 to 133
let prompt = build_sweep_prompt(&skill_summaries, &failure_summaries);
let response = caller.call(&prompt, 0).await?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Give sweeps their own session namespace.

Line 133 always passes 0 into AgentCaller::call(). RealAgentCaller turns that into a Goose session id, so every sweep reuses the same persisted conversation (task-0). That lets old sweep history bleed into later runs and makes decisions depend on stale context instead of the current dormant-skill set. Use a fresh sweep session id, or extend the trait so sweeps can pass an explicit session key instead of hard-coding 0.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose-evolver/src/sweep.rs` around lines 132 - 133, The sweep
currently always calls caller.call(&prompt, 0) which forces RealAgentCaller to
reuse the global session "task-0"; update the sweep to use a dedicated sweep
session id (e.g., generate a fresh id per sweep from a UUID/timestamp or a
sweep-specific counter) and pass that id into AgentCaller::call instead of 0, or
alternatively extend the AgentCaller::call signature to accept an explicit
session key (string or u64) and update RealAgentCaller to map that to its
session namespace; locate the call site (build_sweep_prompt and caller.call) and
the AgentCaller/RealAgentCaller implementations to wire the new session id
through so each sweep gets an isolated session.

Comment on lines +60 to +68
agent
.update_provider(provider, &session.id)
.await
.context("failed to set provider")?;

if let Some(prompt) = config.system_prompt {
agent
.extend_system_prompt(config.session_id.clone(), prompt)
.await;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In the Goose Agent API, when session_manager.create_session(...)returns a session, issession.idguaranteed to equal the caller-provided session name, and shouldAgent::extend_system_prompt(...)be called withsession.id or with the requested name?

💡 Result:

No, session.id is not guaranteed to equal the caller-provided session name. Goose generates session IDs automatically (typically timestamp-based like yyyymmdd_hhmmss), and CLI flags like --name or -n attempt to resume existing sessions rather than create new ones with custom names. Use --session-id for explicit control, but names and IDs are distinct concepts. Use session.id (the generated/returned ID) with Agent::extend_system_prompt(...), as agents are created per-session using the session ID for isolation via AgentManager/SessionManager.

Citations:


Use session.id instead of config.session_id for extend_system_prompt().

Goose generates session IDs automatically (distinct from caller-supplied names), and agents are isolated per session using the session ID. Line 67 should pass session.id to match the session used in update_provider() on line 61. Passing config.session_id will attach the role-specific system prompt to the wrong session or fail silently, causing the agent to run without its Operator/Worker/Evolver instructions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose-rig/src/agent_factory.rs` around lines 60 - 68, The code
calls agent.extend_system_prompt with config.session_id but the agent is
created/updated for the actual session ID stored in session.id (used in
agent.update_provider), so change the call to use session.id instead of
config.session_id; locate the agent.extend_system_prompt invocation and replace
the session identifier argument with session.id to ensure the system prompt is
attached to the same session the agent was updated for.

Comment on lines +15 to 16
pub worker: Option<Arc<opengoose_rig::rig::Worker>>,
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Degraded mode still tells the Operator that a worker exists.

After this branch returns worker: None, the Operator prompt in Lines 63-71 still says a separate Worker rig will automatically claim and execute tasks and that the Operator should not handle them. In the degraded path that's false, so users can enqueue work that nothing will ever pick up. Plumb worker availability into the Operator/UI prompt, or gate task creation while the worker is offline.

Also applies to: 28-50

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/opengoose/src/runtime.rs` around lines 15 - 16, The Operator UI still
claims a Worker will auto-claim tasks even when runtime.worker is None; update
the prompt-generation and task-enqueue logic to reflect actual worker
availability: where the Operator prompt is built (the code that currently prints
the "separate Worker rig will automatically claim and execute tasks" message),
check the runtime struct's worker: Option<Arc<opengoose_rig::rig::Worker>> and
only show that message when worker.is_some(); additionally, gate task
creation/enqueue paths to prevent submitting tasks when worker.is_none() (or
show a clear warning and require explicit confirmation), so both the prompt text
and the enqueue code consult runtime.worker before advertising or accepting
work.

Comment on lines +52 to +63
```
opengoose-board (독립)
opengoose-rig (board, goose)
opengoose-evolver [NEW] (board, rig, skills, goose)
opengoose (board, rig, skills, evolver)

opengoose-skills (독립)
opengoose-sandbox (독립, 실험적)
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add language identifiers to the fenced code blocks.

markdownlint is already flagging these fences with MD040, so this document will keep failing lint until the blocks are annotated with a language like text, tree, or diff.

Also applies to: 69-77, 93-96, 102-121, 125-136, 218-226

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 52-52: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-03-26-eval-quality-fixes-design.md` around lines
52 - 63, The fenced code blocks (e.g., the block showing the module hierarchy
around opengoose-board / opengoose-rig / opengoose-evolver and the other blocks
at ranges 69-77, 93-96, 102-121, 125-136, 218-226) need language identifiers to
satisfy markdownlint MD040; edit each triple-backtick fence (including the
example tree/diff blocks) to add an appropriate language token such as `text`,
`tree`, or `diff` (e.g., ```text or ```diff) so the linter recognizes the block
type and the document stops failing MD040. Ensure every fenced block in the file
is annotated consistently.

@soilSpoon soilSpoon merged commit 6061697 into main Mar 26, 2026
18 checks passed
@soilSpoon soilSpoon deleted the soilSpoon/eval-quality-fixes branch March 26, 2026 05:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant