CLI: Improve self-healing scoring observability#34699
Conversation
📝 WalkthroughWalkthroughAdds persistent per-story test history and integrates it into telemetry: a disk-backed cache merges run results by storyId, analyzeTestResults now returns both run- and cumulative-scoped metrics, and the Vitest telemetry reporter uses the cache to emit run + cumulative analysis. ChangesStory Test History Tracking
Sequence DiagramsequenceDiagram
participant Reporter as Test Reporter
participant Cache as History Cache
participant Disk as Disk Storage
participant Analyzer as Analysis Engine
participant Telemetry as Telemetry Sink
Reporter->>Reporter: Collect test results for current run
Reporter->>Cache: mergeAndWriteStoryHistory(runResults)
Cache->>Disk: Read persisted history
Disk-->>Cache: Return cached entries
Cache->>Cache: Merge run results into history (keep newest timestamp per storyId)
Cache->>Disk: Write updated history back
Cache-->>Reporter: Return merged cumulative results
Reporter->>Analyzer: analyzeTestResults(runResults, cumulativeResults)
Analyzer->>Analyzer: Compute run* metrics from runResults and cumulative* from cumulativeResults
Analyzer-->>Reporter: Return TestRunAnalysis with run* and cumulative* fields
Reporter->>Telemetry: Emit telemetry with run and cumulative stats
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly Related PRs
✨ Finishing Touches📝 Generate docstrings
Review rate limit: 4/5 reviews remaining, refill in 12 minutes. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.ts (1)
68-93:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard cache failures so telemetry/reporter state stays stable.
If history merge fails,
onTestRunEndcan abort before telemetry emission and before resettingthis.testResults. Add fallback analysis +finallyreset.💡 Proposed resiliency patch
async onTestRunEnd( testModules: readonly TestModule[], unhandledErrors: readonly SerializedError[] ) { - // Merge the current run into the persisted per-story history (kept on - // disk only — storyIds never enter telemetry) and use the merged set - // to compute cumulative stats across runs. - const cumulativeResults = await mergeAndWriteStoryHistory(this.testResults); - const analysis = analyzeTestResults(this.testResults, cumulativeResults); + let analysis = analyzeTestResults(this.testResults); + try { + // Merge the current run into the persisted per-story history (kept on + // disk only — storyIds never enter telemetry) and use the merged set + // to compute cumulative stats across runs. + const cumulativeResults = await mergeAndWriteStoryHistory(this.testResults); + analysis = analyzeTestResults(this.testResults, cumulativeResults); + } catch { + // Best-effort telemetry: keep run-scoped analysis when cache is unavailable. + } const duration = Date.now() - this.startTime; @@ - // Reset for next run (watch mode) - this.testResults = []; + // Reset for next run (watch mode) + this.testResults = []; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.ts` around lines 68 - 93, Wrap the history merge + analysis in a try/catch so a failing merge does not abort telemetry emission or leave state dirty: call mergeAndWriteStoryHistory(this.testResults) inside try, compute analysis = analyzeTestResults(this.testResults, cumulativeResults) normally, but in catch set a safe fallback (e.g., empty cumulativeResults / empty history) and run analyzeTestResults with that fallback, and include the caught error (mergeError) in the telemetry payload if helpful; always call telemetry afterwards and use a finally block to reset this.testResults = [] so watch-mode state is cleared even on errors.
🧹 Nitpick comments (1)
code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts (1)
15-17: ⚡ Quick winUse a spy mock declaration here and keep behavior in
beforeEach.This inline factory mock conflicts with the test mocking rules and duplicates behavior already configured in
beforeEach.✅ Suggested adjustment
-vi.mock('./agent-story-history-cache.ts', () => ({ - mergeAndWriteStoryHistory: vi.fn(async (results) => results), -})); +vi.mock('./agent-story-history-cache.ts', { spy: true });As per coding guidelines, "Use
vi.mock()with thespy: trueoption for all package and file mocks in Vitest tests" and "Implement mock behaviors inbeforeEachblocks in Vitest tests."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts` around lines 15 - 17, Replace the inline factory mock for agent-story-history-cache.ts with a spy-style mock and move the implementation into the test setup: call vi.mock('./agent-story-history-cache.ts', undefined, { spy: true }) (so the real exported names are spied) or otherwise enable { spy: true } for that module, remove the inline async factory that returns mergeAndWriteStoryHistory: vi.fn(async (results) => results), and in beforeEach assign the behavior via the spy (e.g., set mergeAndWriteStoryHistory.mockResolvedValue or mockImplementation to return results) so the mock declaration is only a spy and the actual behavior is configured in beforeEach.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@code/addons/vitest/src/vitest-plugin/agent-story-history-cache.ts`:
- Around line 24-35: The current read/merge/write in the block around
readStoryHistory(), history and historyCache.set(CACHE_KEY, ...) is not atomic
and can lose concurrent updates; change it to an atomic update with a retry/CAS
or lock: fetch the current cache from historyCache.get(CACHE_KEY) (or use
historyCache's lock/CAS API if available), merge in results into that latest
snapshot (using result.storyId keys and timestamp logic currently in the loop),
then attempt to write back only if the snapshot hasn't changed (or hold the
lock) and retry a few times on conflict; update the code paths that call
readStoryHistory()/historyCache.set(CACHE_KEY, ...) so they use this looped CAS
or an acquired lock to guarantee no concurrent overwrite of other reporters'
updates.
---
Outside diff comments:
In `@code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.ts`:
- Around line 68-93: Wrap the history merge + analysis in a try/catch so a
failing merge does not abort telemetry emission or leave state dirty: call
mergeAndWriteStoryHistory(this.testResults) inside try, compute analysis =
analyzeTestResults(this.testResults, cumulativeResults) normally, but in catch
set a safe fallback (e.g., empty cumulativeResults / empty history) and run
analyzeTestResults with that fallback, and include the caught error (mergeError)
in the telemetry payload if helpful; always call telemetry afterwards and use a
finally block to reset this.testResults = [] so watch-mode state is cleared even
on errors.
---
Nitpick comments:
In `@code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts`:
- Around line 15-17: Replace the inline factory mock for
agent-story-history-cache.ts with a spy-style mock and move the implementation
into the test setup: call vi.mock('./agent-story-history-cache.ts', undefined, {
spy: true }) (so the real exported names are spied) or otherwise enable { spy:
true } for that module, remove the inline async factory that returns
mergeAndWriteStoryHistory: vi.fn(async (results) => results), and in beforeEach
assign the behavior via the spy (e.g., set
mergeAndWriteStoryHistory.mockResolvedValue or mockImplementation to return
results) so the mock declaration is only a spy and the actual behavior is
configured in beforeEach.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 90f9930f-19a2-40a9-8180-f6dcd24f2ea7
📒 Files selected for processing (8)
code/addons/vitest/src/vitest-plugin/agent-story-history-cache.tscode/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.tscode/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.tscode/core/src/core-server/index.tscode/core/src/core-server/utils/ghost-stories/parse-vitest-report.test.tscode/core/src/shared/utils/analyze-test-results.test.tscode/core/src/shared/utils/analyze-test-results.tscode/core/src/shared/utils/test-result-types.ts
There was a problem hiding this comment.
🧹 Nitpick comments (1)
code/core/src/core-server/server-channel/ghost-stories-channel.test.ts (1)
323-330: ⚡ Quick winAlso assert
cumulative*metrics in the failure-path payload.On Line 323 onward, this test only validates
run*metrics. Since cumulative scoring is the core behavior change, assertingcumulative*fields here would prevent regressions in failed-run scenarios.Proposed test expectation update
results: expect.objectContaining({ runTotal: 2, runPassed: 0, runSuccessRate: 0, // runCategorizedErrors is an object keyed by error category runCategorizedErrors: expect.any(Object), runCssCheck: 'not-run', runUniqueErrorCount: expect.any(Number), runPassedButEmptyRender: 0, + cumulativeTotal: 2, + cumulativePassed: 0, + cumulativeSuccessRate: 0, + cumulativeSuccessRateWithoutEmptyRender: 0, + cumulativeCategorizedErrors: expect.any(Object), + cumulativeCssCheck: 'not-run', + cumulativeUniqueErrorCount: expect.any(Number), + cumulativePassedButEmptyRender: 0, }),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code/core/src/core-server/server-channel/ghost-stories-channel.test.ts` around lines 323 - 330, The test in ghost-stories-channel.test.ts currently asserts only the run* metrics (e.g., runTotal, runPassed, runSuccessRate, runCategorizedErrors, runCssCheck, runUniqueErrorCount, runPassedButEmptyRender) for the failure-path payload; update the expectation to also assert the corresponding cumulative* metrics (e.g., cumulativeTotal, cumulativePassed, cumulativeSuccessRate, cumulativeCategorizedErrors, cumulativeCssCheck, cumulativeUniqueErrorCount, cumulativePassedButEmptyRender) using the same matcher types (expect.any(Number) or expect.any(Object) or exact string values) so the failure-path verifies cumulative scoring fields alongside the run metrics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@code/core/src/core-server/server-channel/ghost-stories-channel.test.ts`:
- Around line 323-330: The test in ghost-stories-channel.test.ts currently
asserts only the run* metrics (e.g., runTotal, runPassed, runSuccessRate,
runCategorizedErrors, runCssCheck, runUniqueErrorCount, runPassedButEmptyRender)
for the failure-path payload; update the expectation to also assert the
corresponding cumulative* metrics (e.g., cumulativeTotal, cumulativePassed,
cumulativeSuccessRate, cumulativeCategorizedErrors, cumulativeCssCheck,
cumulativeUniqueErrorCount, cumulativePassedButEmptyRender) using the same
matcher types (expect.any(Number) or expect.any(Object) or exact string values)
so the failure-path verifies cumulative scoring fields alongside the run
metrics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 06481320-031d-4ada-9283-a383c2671c2a
📒 Files selected for processing (3)
code/core/src/core-server/server-channel/ghost-stories-channel.test.tsscripts/eval/lib/grade.tsscripts/eval/lib/story-render.ts
Closes #
What I did
This PR adds logic to accumulate self-healing scoring over an agentic session
Checklist for Contributors
Testing
The changes in this PR are covered in the following automated tests:
Manual testing
Caution
This section is mandatory for all contributions. If you believe no manual test is necessary, please state so explicitly. Thanks!
Documentation
MIGRATION.MD
Checklist for Maintainers
When this PR is ready for testing, make sure to add
ci:normal,ci:mergedorci:dailyGH label to it to run a specific set of sandboxes. The particular set of sandboxes can be found incode/lib/cli-storybook/src/sandbox-templates.tsMake sure this PR contains one of the labels below:
Available labels
bug: Internal changes that fixes incorrect behavior.maintenance: User-facing maintenance tasks.dependencies: Upgrading (sometimes downgrading) dependencies.build: Internal-facing build tooling & test updates. Will not show up in release changelog.cleanup: Minor cleanup style change. Will not show up in release changelog.documentation: Documentation only changes. Will not show up in release changelog.feature request: Introducing a new feature.BREAKING CHANGE: Changes that break compatibility in some way with current major version.other: Changes that don't fit in the above categories.🦋 Canary release
This pull request has been released as version
0.0.0-pr-34699-sha-a88c0eaa. Try it out in a new sandbox by runningnpx storybook@0.0.0-pr-34699-sha-a88c0eaa sandboxor in an existing project withnpx storybook@0.0.0-pr-34699-sha-a88c0eaa upgrade.More information
0.0.0-pr-34699-sha-a88c0eaayann/cummulative-self-healing-scorea88c0eaa1777894123)To request a new release of this pull request, mention the
@storybookjs/coreteam.core team members can create a new canary release here or locally with
gh workflow run --repo storybookjs/storybook publish.yml --field pr=34699Summary by CodeRabbit
New Features
Tests