feat: discovery-evidence dedupe + schema versioning#43
Conversation
- Add schemaVersion field to discovery-evidence.json (versioned wrapper format) - Dedupe on append by (category + normalized description + fixture) key with last-write-wins semantics, preventing duplicate accumulation - Support legacy plain-array format with transparent fallback on load - Auto-migrate to versioned format on next write (append/prune) - Add comprehensive tests: dedupe, case-insensitivity, legacy compat, cross-call and intra-call dedup, migration on append Closes #29 https://claude.ai/code/session_01RuwPZ9dwyq37M5rR9CUQnG
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds a versioned on-disk discovery-evidence format with a schema constant and Zod schema; loader accepts legacy arrays or the versioned object and validates entries; append/prune normalize, deduplicate (last-write-wins) and persist the versioned format; CLI/reporting gains defensive error handling. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant Collector as EvidenceCollector
participant FS as FileSystem
participant Validator as SchemaValidator
Client->>Collector: loadDiscoveryEvidence()
Collector->>FS: read file
FS-->>Collector: raw JSON
Collector->>Validator: detect shape (legacy array or object) & validate entries
Validator-->>Collector: validated entries (invalid skipped)
Collector-->>Client: return entries
Client->>Collector: appendDiscoveryEvidence(newEntries)
Collector->>FS: read file
FS-->>Collector: raw JSON
Collector->>Validator: validate existing + new entries
Validator-->>Collector: validated entries
Collector->>Collector: deduplicate (normalized key, last-write-wins)
Collector->>FS: write `{ schemaVersion: 1, entries: [...] }`
Client->>Collector: pruneDiscoveryEvidence(categories)
Collector->>FS: read file
FS-->>Collector: raw JSON
Collector->>Validator: validate entries
Collector->>Collector: filter/prune (trim + case-insensitive)
Collector->>FS: write `{ schemaVersion: 1, entries: [...] }`
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/agents/contracts/evidence.ts`:
- Around line 41-44: Change the Zod schema for DiscoveryEvidenceFileSchema so
schemaVersion is constrained to the current constant
DISCOVERY_EVIDENCE_SCHEMA_VERSION instead of a free number: replace z.number()
with a literal/enum that matches DISCOVERY_EVIDENCE_SCHEMA_VERSION (e.g.
z.literal(DISCOVERY_EVIDENCE_SCHEMA_VERSION) or z.union if needed) so
readDiscoveryEvidence and any append/prune operations will reject unsupported
versions rather than silently downgrading; update DiscoveryEvidenceFileSchema in
src/agents/contracts/evidence.ts and ensure imports/reference to
DISCOVERY_EVIDENCE_SCHEMA_VERSION and DiscoveryEvidenceEntrySchema are present.
In `@src/agents/evidence-collector.ts`:
- Around line 147-161: The current versioned-path uses
DiscoveryEvidenceFileSchema.safeParse(raw) and returns
versionedParse.data.entries as a whole, which makes the load all-or-nothing;
change it to first parse the wrapper (use DiscoveryEvidenceFileSchema.safeParse
to confirm wrapper structure and read the entries array) and then iterate over
wrapper.entries validating each item with
DiscoveryEvidenceEntrySchema.safeParse, collecting only successfully parsed
entries (like the legacy Array.isArray raw branch) so malformed rows are skipped
but valid rows preserved; ensure you still return an empty array only if the
wrapper is invalid or no entries pass validation.
- Around line 228-230: The pruning misses categories with surrounding whitespace
because catSet is built using categories.map(c => c.toLowerCase()) while
discoveryDedupeKey normalizes with trim().toLowerCase(); update the
normalization so both sides trim and lowercase: build catSet using
categories.map(c => c.trim().toLowerCase()) and compare using
e.category.trim().toLowerCase() when filtering existing (functions/variables to
edit: categories, catSet, existing, pruned, readDiscoveryEvidence; refer to
discoveryDedupeKey normalization for exact behavior).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 04019b68-e5bd-455c-bfe6-abcfac08a328
⛔ Files ignored due to path filters (1)
data/discovery-evidence.jsonis excluded by!data/**
📒 Files selected for processing (3)
src/agents/contracts/evidence.tssrc/agents/evidence-collector.test.tssrc/agents/evidence-collector.ts
| export const DiscoveryEvidenceFileSchema = z.object({ | ||
| schemaVersion: z.number(), | ||
| entries: z.array(DiscoveryEvidenceEntrySchema), | ||
| }); |
There was a problem hiding this comment.
Constrain schemaVersion to the current format.
schemaVersion: number means a future { schemaVersion: 2, entries: [...] } file still satisfies this contract. readDiscoveryEvidence() will then load it as if it were v1, and the next append/prune can silently rewrite it back as version 1. Make this field accept only DISCOVERY_EVIDENCE_SCHEMA_VERSION so unsupported formats are rejected instead of downgraded. As per coding guidelines, "Validate all external inputs with Zod schemas".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/agents/contracts/evidence.ts` around lines 41 - 44, Change the Zod schema
for DiscoveryEvidenceFileSchema so schemaVersion is constrained to the current
constant DISCOVERY_EVIDENCE_SCHEMA_VERSION instead of a free number: replace
z.number() with a literal/enum that matches DISCOVERY_EVIDENCE_SCHEMA_VERSION
(e.g. z.literal(DISCOVERY_EVIDENCE_SCHEMA_VERSION) or z.union if needed) so
readDiscoveryEvidence and any append/prune operations will reject unsupported
versions rather than silently downgrading; update DiscoveryEvidenceFileSchema in
src/agents/contracts/evidence.ts and ensure imports/reference to
DISCOVERY_EVIDENCE_SCHEMA_VERSION and DiscoveryEvidenceEntrySchema are present.
- Constrain schemaVersion to literal(1) so unsupported versions are rejected - Validate versioned entries individually (partial corruption preserves valid rows) - Trim categories in pruneDiscoveryEvidence for consistent matching - Add tests: partial corruption, unsupported version, category trim on prune https://claude.ai/code/session_01RuwPZ9dwyq37M5rR9CUQnG
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/agents/evidence-collector.ts`:
- Around line 147-173: The reader currently treats a versioned object that fails
DiscoveryEvidenceFileSchema.safeParse as an empty list and risks overwriting
unknown schema data; change the parsing logic in the evidence reader to detect a
versioned object with an unrecognized schemaVersion and return/throw an explicit
"unsupported schema version" error/state instead of returning [] (i.e., when raw
is an object with a schemaVersion field but versionedParse.success is false).
Then update appendDiscoveryEvidence and pruneDiscoveryEvidence to check for that
unsupported-state and become no-op or fail-fast (propagate the error) rather
than writing a new v1 file, referencing DiscoveryEvidenceFileSchema,
DiscoveryEvidenceEntrySchema, appendDiscoveryEvidence and pruneDiscoveryEvidence
to find the relevant code paths.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 67da2dfe-bf12-43f0-8d2d-53d2456a6301
📒 Files selected for processing (3)
src/agents/contracts/evidence.tssrc/agents/evidence-collector.test.tssrc/agents/evidence-collector.ts
src/agents/evidence-collector.ts
Outdated
| const versionedParse = DiscoveryEvidenceFileSchema.safeParse(raw); | ||
| if (versionedParse.success) { | ||
| // Validate entries individually so one bad row doesn't discard all | ||
| const result: DiscoveryEvidenceEntry[] = []; | ||
| for (const item of versionedParse.data.entries) { | ||
| const parsed = DiscoveryEvidenceEntrySchema.safeParse(item); | ||
| if (parsed.success && parsed.data !== undefined) { | ||
| result.push(parsed.data); | ||
| } | ||
| } | ||
| return result; | ||
| } | ||
|
|
||
| // Legacy format: plain array (v0, before schemaVersion was introduced) | ||
| if (Array.isArray(raw)) { | ||
| const result: DiscoveryEvidenceEntry[] = []; | ||
| for (const item of raw) { | ||
| const parsed = DiscoveryEvidenceEntrySchema.safeParse(item); | ||
| if (parsed.success && parsed.data !== undefined) { | ||
| result.push(parsed.data); | ||
| } | ||
| } | ||
| return result; | ||
| } | ||
|
|
||
| return []; | ||
| } catch { |
There was a problem hiding this comment.
Unsupported schema versions can be silently overwritten by later writes.
When the file is a versioned object with an unknown schemaVersion, this reader returns []. Then appendDiscoveryEvidence/pruneDiscoveryEvidence proceed and write v1, which can erase unsupported-version data. Please return an explicit “unsupported version” state (or throw) and make append/prune no-op/fail-fast for that state.
As per coding guidelines, src/agents/**: "Check for proper error handling in pipeline steps".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/agents/evidence-collector.ts` around lines 147 - 173, The reader
currently treats a versioned object that fails
DiscoveryEvidenceFileSchema.safeParse as an empty list and risks overwriting
unknown schema data; change the parsing logic in the evidence reader to detect a
versioned object with an unrecognized schemaVersion and return/throw an explicit
"unsupported schema version" error/state instead of returning [] (i.e., when raw
is an object with a schemaVersion field but versionedParse.success is false).
Then update appendDiscoveryEvidence and pruneDiscoveryEvidence to check for that
unsupported-state and become no-op or fail-fast (propagate the error) rather
than writing a new v1 file, referencing DiscoveryEvidenceFileSchema,
DiscoveryEvidenceEntrySchema, appendDiscoveryEvidence and pruneDiscoveryEvidence
to find the relevant code paths.
When the file contains a versioned object with an unrecognized schemaVersion, throw instead of returning []. This prevents append/prune from silently overwriting future-version data with v1. https://claude.ai/code/session_01RuwPZ9dwyq37M5rR9CUQnG
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/agents/evidence-collector.test.ts`:
- Around line 261-284: The two tests that assert appendDiscoveryEvidence and
pruneDiscoveryEvidence throw on unsupported schemaVersion are currently placed
under the loadDiscoveryEvidence describe block; move the append test (the one
calling appendDiscoveryEvidence) into the describe("appendDiscoveryEvidence")
block and move the prune test (the one calling pruneDiscoveryEvidence) into the
describe("pruneDiscoveryEvidence") block so each function's error behavior is
tested within its own describe for better organization and discoverability;
ensure you copy the entire test bodies (including file setup, writeFileSync,
expectations, and the final file-content check) into the respective describe
blocks and remove the originals from loadDiscoveryEvidence.
In `@src/agents/evidence-collector.ts`:
- Around line 170-173: Wrap calls to loadDiscoveryEvidence() in
gap-rule-report.ts and pruneDiscoveryEvidence() in cli/index.ts with try/catch
blocks following the appendDiscoveryEvidence() pattern used in orchestrator.ts:
catch any thrown error from the call and emit a non-fatal warning via
console.warn() (include the caught error message and context), then continue
without rethrowing so a future-version discovery-evidence file does not crash
the process. Locate the invocations by the function names loadDiscoveryEvidence
and pruneDiscoveryEvidence and apply the same error-handling structure as
appendDiscoveryEvidence.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f88cd3ad-b4e2-4b0f-99f0-20f01941f324
📒 Files selected for processing (2)
src/agents/evidence-collector.test.tssrc/agents/evidence-collector.ts
| throw new Error( | ||
| `Unsupported discovery-evidence schemaVersion: ${String(version)} (expected ${DISCOVERY_EVIDENCE_SCHEMA_VERSION}). ` + | ||
| `Upgrade canicode to read this file, or delete it to start fresh.` | ||
| ); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Find all call sites of discovery evidence functions and check for error handling
echo "=== Call sites for loadDiscoveryEvidence ==="
rg -n -B5 -A5 'loadDiscoveryEvidence\s*\(' --type=ts -g '!*.test.ts'
echo ""
echo "=== Call sites for appendDiscoveryEvidence ==="
rg -n -B5 -A5 'appendDiscoveryEvidence\s*\(' --type=ts -g '!*.test.ts'
echo ""
echo "=== Call sites for pruneDiscoveryEvidence ==="
rg -n -B5 -A5 'pruneDiscoveryEvidence\s*\(' --type=ts -g '!*.test.ts'Repository: let-sunny/canicode
Length of output: 4355
🏁 Script executed:
# Get context around the loadDiscoveryEvidence() call in gap-rule-report.ts
cat -n src/agents/gap-rule-report.ts | sed -n '500,520p'Repository: let-sunny/canicode
Length of output: 1061
🏁 Script executed:
# Get context around the pruneDiscoveryEvidence() call in cli/index.ts
cat -n src/cli/index.ts | sed -n '690,710p'Repository: let-sunny/canicode
Length of output: 844
🏁 Script executed:
# Get full context of the orchestrator.ts error handling for comparison
cat -n src/agents/orchestrator.ts | sed -n '300,320p'Repository: let-sunny/canicode
Length of output: 827
Add error handling to discovery evidence function calls in gap-rule-report.ts and cli/index.ts to match orchestrator.ts pattern.
Calls to loadDiscoveryEvidence() (gap-rule-report.ts:511) and pruneDiscoveryEvidence() (cli/index.ts:703) lack try/catch wrappers, unlike appendDiscoveryEvidence() in orchestrator.ts (lines 313-315). If a user has a future-version evidence file, these code paths will crash with an unhandled exception. Follow the orchestrator.ts pattern: wrap calls in try/catch and log as non-fatal with console.warn().
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/agents/evidence-collector.ts` around lines 170 - 173, Wrap calls to
loadDiscoveryEvidence() in gap-rule-report.ts and pruneDiscoveryEvidence() in
cli/index.ts with try/catch blocks following the appendDiscoveryEvidence()
pattern used in orchestrator.ts: catch any thrown error from the call and emit a
non-fatal warning via console.warn() (include the caught error message and
context), then continue without rethrowing so a future-version
discovery-evidence file does not crash the process. Locate the invocations by
the function names loadDiscoveryEvidence and pruneDiscoveryEvidence and apply
the same error-handling structure as appendDiscoveryEvidence.
- Wrap loadDiscoveryEvidence in gap-rule-report.ts with try/catch (non-fatal) - Wrap pruneDiscoveryEvidence in cli/index.ts with try/catch (exit code 1) - Move append/prune error tests to their respective describe blocks https://claude.ai/code/session_01RuwPZ9dwyq37M5rR9CUQnG
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/agents/evidence-collector.test.ts`:
- Around line 303-315: Add a new unit test in
src/agents/evidence-collector.test.ts that mirrors the existing "dedupe is
case-insensitive for category and description" case but exercises fixture
casing: call appendDiscoveryEvidence twice with identical description/category
but differing fixture case (e.g., "FX1" vs "fx1"), read the file via
readFileSync and JSON.parse into { entries: DiscoveryEvidenceEntry[] }, then
assert entries length is 1 and that the later entry's impact wins (e.g., expect
entries[0]!.impact toBe the second call's value); reference
appendDiscoveryEvidence, disPath, and DiscoveryEvidenceEntry to locate where to
add the test.
In `@src/cli/index.ts`:
- Around line 706-708: The catch block that currently does
console.error("[evidence] Failed to prune discovery evidence:", err) should
normalize the error into a stable string before logging; capture the thrown
value (e.g., via a local variable msg = err instanceof Error ? err.message :
String(err)) and log only the prefix plus that msg using the existing
console.error call pattern (keep process.exitCode = 1 unchanged) so CLI output
is deterministic and free of raw stack/object dumps.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 328c465d-95c1-4afc-8136-694283dbf3b8
📒 Files selected for processing (3)
src/agents/evidence-collector.test.tssrc/agents/gap-rule-report.tssrc/cli/index.ts
…utput - Add test for fixture casing in dedupe key (FX1 vs fx1) - Normalize error to message string in CLI catch block https://claude.ai/code/session_01RuwPZ9dwyq37M5rR9CUQnG
Summary
discovery-evidence.json에schemaVersion필드 도입 (versioned wrapper 포맷)appendDiscoveryEvidence시(category + normalized description + fixture)키로 dedupe (last-write-wins)Changes
src/agents/contracts/evidence.tsDISCOVERY_EVIDENCE_SCHEMA_VERSION,DiscoveryEvidenceFileSchema추가src/agents/evidence-collector.tsappendDiscoveryEvidencededupe 로직src/agents/evidence-collector.test.tsdata/discovery-evidence.json[]→{ schemaVersion: 1, entries: [] }마이그레이션Test plan
pnpm lint(tsc --noEmit) 통과Closes #29
https://claude.ai/code/session_01RuwPZ9dwyq37M5rR9CUQnG
Summary by CodeRabbit
New Features
Bug Fixes / Improvements
Tests