CLI: Improve self-healing scoring observability by yannbf · Pull Request #34699 · storybookjs/storybook

yannbf · 2026-05-04T10:46:22Z

Closes #

What I did

This PR adds logic to accumulate self-healing scoring over an agentic session

Checklist for Contributors

Testing

The changes in this PR are covered in the following automated tests:

stories
unit tests
integration tests
end-to-end tests

Manual testing

Caution

This section is mandatory for all contributions. If you believe no manual test is necessary, please state so explicitly. Thanks!

Documentation

Add or update documentation reflecting your changes
If you are deprecating/removing a feature, make sure to update
MIGRATION.MD

Checklist for Maintainers

When this PR is ready for testing, make sure to add ci:normal, ci:merged or ci:daily GH label to it to run a specific set of sandboxes. The particular set of sandboxes can be found in code/lib/cli-storybook/src/sandbox-templates.ts
Make sure this PR contains one of the labels below:
Available labels
- bug: Internal changes that fixes incorrect behavior.
- maintenance: User-facing maintenance tasks.
- dependencies: Upgrading (sometimes downgrading) dependencies.
- build: Internal-facing build tooling & test updates. Will not show up in release changelog.
- cleanup: Minor cleanup style change. Will not show up in release changelog.
- documentation: Documentation only changes. Will not show up in release changelog.
- feature request: Introducing a new feature.
- BREAKING CHANGE: Changes that break compatibility in some way with current major version.
- other: Changes that don't fit in the above categories.

🦋 Canary release

This pull request has been released as version 0.0.0-pr-34699-sha-a88c0eaa. Try it out in a new sandbox by running npx storybook@0.0.0-pr-34699-sha-a88c0eaa sandbox or in an existing project with npx storybook@0.0.0-pr-34699-sha-a88c0eaa upgrade.

More information


Published version	`0.0.0-pr-34699-sha-a88c0eaa`
Triggered by	@yannbf
Repository	storybookjs/storybook
Branch	`yann/cummulative-self-healing-score`
Commit	`a88c0eaa`
Datetime	Mon May 4 11:28:43 UTC 2026 (`1777894123`)
Workflow run	25316410456

To request a new release of this pull request, mention the @storybookjs/core team.

core team members can create a new canary release here or locally with gh workflow run --repo storybookjs/storybook publish.yml --field pr=34699

Summary by CodeRabbit

New Features
- Story test history is now persisted across runs, enabling cumulative metrics and combined run+cumulative analysis.
- Telemetry and reports now provide separate run- and cumulative-scoped metrics for clearer trends; telemetry no longer includes raw story identifiers.
Tests
- Test suites and assertions updated to validate the new run/cumulative metrics shape and cumulative-history behavior.

…ry test results

coderabbitai · 2026-05-04T10:53:57Z

📝 Walkthrough

Walkthrough

Adds persistent per-story test history and integrates it into telemetry: a disk-backed cache merges run results by storyId, analyzeTestResults now returns both run- and cumulative-scoped metrics, and the Vitest telemetry reporter uses the cache to emit run + cumulative analysis.

Changes

Story Test History Tracking

Layer / File(s)	Summary
Type Definitions `code/core/src/shared/utils/test-result-types.ts`	Adds `StoryTestResultHistoryEntry` (extends `StoryTestResult` with `timestamp`), `StoryTestResultHistory` (map of storyId → entry), `CssCheckOutcome` type, and refactors `TestRunAnalysis` to use `run` and `cumulative` fields.
Type Exports `code/core/src/core-server/index.ts`	Expands re-exports to include `StoryTestResultHistory` and `StoryTestResultHistoryEntry` alongside `StoryTestResult`.
Cache Implementation `code/addons/vitest/src/vitest-plugin/agent-story-history-cache.ts`	New on-disk cache module with `readStoryHistory()` to load persisted history and `mergeAndWriteStoryHistory(results)` to merge current `StoryTestResult[]` into persisted history (keep newest timestamp per storyId), persist merged history, and return merged results (timestamps stripped).
Reporter Integration `code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.ts`	Imports and calls `mergeAndWriteStoryHistory(this.testResults)` on run end and passes returned cumulative results into `analyzeTestResults(this.testResults, cumulativeResults)`.
Core Analysis Logic `code/core/src/shared/utils/analyze-test-results.ts`	Introduces `summarizeResults()` and refactors `analyzeTestResults(results, cumulativeResults?)` to produce separate `run` and `cumulative` metrics (including `runCssCheck` / `cumulativeCssCheck`).
Tests / Consumers `code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts`, `code/core/src/core-server/utils/ghost-stories/parse-vitest-report.test.ts`, `code/core/src/shared/utils/analyze-test-results.test.ts`, `code/core/src/core-server/server-channel/ghost-stories-channel.test.ts`, `scripts/eval/*`, `scripts/eval/lib/story-render.ts`	Mocks `mergeAndWriteStoryHistory`; updates assertions to `run`/`cumulative` field names; adds cumulative-stat tests; adjusts consumers to read `run*` fields (e.g., grade/eval and story-render mapping).

Sequence Diagram

sequenceDiagram
    participant Reporter as Test Reporter
    participant Cache as History Cache
    participant Disk as Disk Storage
    participant Analyzer as Analysis Engine
    participant Telemetry as Telemetry Sink

    Reporter->>Reporter: Collect test results for current run
    Reporter->>Cache: mergeAndWriteStoryHistory(runResults)
    Cache->>Disk: Read persisted history
    Disk-->>Cache: Return cached entries
    Cache->>Cache: Merge run results into history (keep newest timestamp per storyId)
    Cache->>Disk: Write updated history back
    Cache-->>Reporter: Return merged cumulative results
    Reporter->>Analyzer: analyzeTestResults(runResults, cumulativeResults)
    Analyzer->>Analyzer: Compute run* metrics from runResults and cumulative* from cumulativeResults
    Analyzer-->>Reporter: Return TestRunAnalysis with run* and cumulative* fields
    Reporter->>Telemetry: Emit telemetry with run and cumulative stats

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related PRs

Telemetry: Report CssCheck result in ai-setup-final-scoring #34600 — Related changes to cssCheck/CssCheckOutcome and updates to TestRunAnalysis shape and tests.
Core: Agentic observability for vitest and ghost stories #34537 — Prior work on the Vitest agent reporter and tests that this PR extends with history merging.
Build: Update tests to expect any numeric value and fix flake #33547 — Related telemetry test adjustments in ghost-stories-channel.test.ts (assertion updates and telemetry payload expectations).

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

_{Review rate limit: 4/5 reviews remaining, refill in 12 minutes.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.ts (1)

68-93: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard cache failures so telemetry/reporter state stays stable.

If history merge fails, onTestRunEnd can abort before telemetry emission and before resetting this.testResults. Add fallback analysis + finally reset.

💡 Proposed resiliency patch

   async onTestRunEnd(
     testModules: readonly TestModule[],
     unhandledErrors: readonly SerializedError[]
   ) {
-    // Merge the current run into the persisted per-story history (kept on
-    // disk only — storyIds never enter telemetry) and use the merged set
-    // to compute cumulative stats across runs.
-    const cumulativeResults = await mergeAndWriteStoryHistory(this.testResults);
-    const analysis = analyzeTestResults(this.testResults, cumulativeResults);
+    let analysis = analyzeTestResults(this.testResults);
+    try {
+      // Merge the current run into the persisted per-story history (kept on
+      // disk only — storyIds never enter telemetry) and use the merged set
+      // to compute cumulative stats across runs.
+      const cumulativeResults = await mergeAndWriteStoryHistory(this.testResults);
+      analysis = analyzeTestResults(this.testResults, cumulativeResults);
+    } catch {
+      // Best-effort telemetry: keep run-scoped analysis when cache is unavailable.
+    }
     const duration = Date.now() - this.startTime;
@@
-    // Reset for next run (watch mode)
-    this.testResults = [];
+    // Reset for next run (watch mode)
+    this.testResults = [];
   }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.ts` around
lines 68 - 93, Wrap the history merge + analysis in a try/catch so a failing
merge does not abort telemetry emission or leave state dirty: call
mergeAndWriteStoryHistory(this.testResults) inside try, compute analysis =
analyzeTestResults(this.testResults, cumulativeResults) normally, but in catch
set a safe fallback (e.g., empty cumulativeResults / empty history) and run
analyzeTestResults with that fallback, and include the caught error (mergeError)
in the telemetry payload if helpful; always call telemetry afterwards and use a
finally block to reset this.testResults = [] so watch-mode state is cleared even
on errors.

🧹 Nitpick comments (1)

code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts (1)
15-17: ⚡ Quick win

Use a spy mock declaration here and keep behavior in beforeEach.

This inline factory mock conflicts with the test mocking rules and duplicates behavior already configured in beforeEach.
✅ Suggested adjustment
-vi.mock('./agent-story-history-cache.ts', () => ({
-  mergeAndWriteStoryHistory: vi.fn(async (results) => results),
-}));
+vi.mock('./agent-story-history-cache.ts', { spy: true });
As per coding guidelines, "Use vi.mock() with the spy: true option for all package and file mocks in Vitest tests" and "Implement mock behaviors in beforeEach blocks in Vitest tests."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts` around
lines 15 - 17, Replace the inline factory mock for agent-story-history-cache.ts
with a spy-style mock and move the implementation into the test setup: call
vi.mock('./agent-story-history-cache.ts', undefined, { spy: true }) (so the real
exported names are spied) or otherwise enable { spy: true } for that module,
remove the inline async factory that returns mergeAndWriteStoryHistory:
vi.fn(async (results) => results), and in beforeEach assign the behavior via the
spy (e.g., set mergeAndWriteStoryHistory.mockResolvedValue or mockImplementation
to return results) so the mock declaration is only a spy and the actual behavior
is configured in beforeEach.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@code/addons/vitest/src/vitest-plugin/agent-story-history-cache.ts`:
- Around line 24-35: The current read/merge/write in the block around
readStoryHistory(), history and historyCache.set(CACHE_KEY, ...) is not atomic
and can lose concurrent updates; change it to an atomic update with a retry/CAS
or lock: fetch the current cache from historyCache.get(CACHE_KEY) (or use
historyCache's lock/CAS API if available), merge in results into that latest
snapshot (using result.storyId keys and timestamp logic currently in the loop),
then attempt to write back only if the snapshot hasn't changed (or hold the
lock) and retry a few times on conflict; update the code paths that call
readStoryHistory()/historyCache.set(CACHE_KEY, ...) so they use this looped CAS
or an acquired lock to guarantee no concurrent overwrite of other reporters'
updates.

---

Outside diff comments:
In `@code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.ts`:
- Around line 68-93: Wrap the history merge + analysis in a try/catch so a
failing merge does not abort telemetry emission or leave state dirty: call
mergeAndWriteStoryHistory(this.testResults) inside try, compute analysis =
analyzeTestResults(this.testResults, cumulativeResults) normally, but in catch
set a safe fallback (e.g., empty cumulativeResults / empty history) and run
analyzeTestResults with that fallback, and include the caught error (mergeError)
in the telemetry payload if helpful; always call telemetry afterwards and use a
finally block to reset this.testResults = [] so watch-mode state is cleared even
on errors.

---

Nitpick comments:
In `@code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts`:
- Around line 15-17: Replace the inline factory mock for
agent-story-history-cache.ts with a spy-style mock and move the implementation
into the test setup: call vi.mock('./agent-story-history-cache.ts', undefined, {
spy: true }) (so the real exported names are spied) or otherwise enable { spy:
true } for that module, remove the inline async factory that returns
mergeAndWriteStoryHistory: vi.fn(async (results) => results), and in beforeEach
assign the behavior via the spy (e.g., set
mergeAndWriteStoryHistory.mockResolvedValue or mockImplementation to return
results) so the mock declaration is only a spy and the actual behavior is
configured in beforeEach.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 90f9930f-19a2-40a9-8180-f6dcd24f2ea7

📥 Commits

Reviewing files that changed from the base of the PR and between 5393fa9 and 1804ced.

📒 Files selected for processing (8)

code/addons/vitest/src/vitest-plugin/agent-story-history-cache.ts
code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.test.ts
code/addons/vitest/src/vitest-plugin/agent-telemetry-reporter.ts
code/core/src/core-server/index.ts
code/core/src/core-server/utils/ghost-stories/parse-vitest-report.test.ts
code/core/src/shared/utils/analyze-test-results.test.ts
code/core/src/shared/utils/analyze-test-results.ts
code/core/src/shared/utils/test-result-types.ts

coderabbitai

🧹 Nitpick comments (1)

code/core/src/core-server/server-channel/ghost-stories-channel.test.ts (1)

323-330: ⚡ Quick win

Also assert cumulative* metrics in the failure-path payload.

On Line 323 onward, this test only validates run* metrics. Since cumulative scoring is the core behavior change, asserting cumulative* fields here would prevent regressions in failed-run scenarios.

Proposed test expectation update

           results: expect.objectContaining({
             runTotal: 2,
             runPassed: 0,
             runSuccessRate: 0,
             // runCategorizedErrors is an object keyed by error category
             runCategorizedErrors: expect.any(Object),
             runCssCheck: 'not-run',
             runUniqueErrorCount: expect.any(Number),
             runPassedButEmptyRender: 0,
+            cumulativeTotal: 2,
+            cumulativePassed: 0,
+            cumulativeSuccessRate: 0,
+            cumulativeSuccessRateWithoutEmptyRender: 0,
+            cumulativeCategorizedErrors: expect.any(Object),
+            cumulativeCssCheck: 'not-run',
+            cumulativeUniqueErrorCount: expect.any(Number),
+            cumulativePassedButEmptyRender: 0,
           }),

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@code/core/src/core-server/server-channel/ghost-stories-channel.test.ts`
around lines 323 - 330, The test in ghost-stories-channel.test.ts currently
asserts only the run* metrics (e.g., runTotal, runPassed, runSuccessRate,
runCategorizedErrors, runCssCheck, runUniqueErrorCount, runPassedButEmptyRender)
for the failure-path payload; update the expectation to also assert the
corresponding cumulative* metrics (e.g., cumulativeTotal, cumulativePassed,
cumulativeSuccessRate, cumulativeCategorizedErrors, cumulativeCssCheck,
cumulativeUniqueErrorCount, cumulativePassedButEmptyRender) using the same
matcher types (expect.any(Number) or expect.any(Object) or exact string values)
so the failure-path verifies cumulative scoring fields alongside the run
metrics.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@code/core/src/core-server/server-channel/ghost-stories-channel.test.ts`:
- Around line 323-330: The test in ghost-stories-channel.test.ts currently
asserts only the run* metrics (e.g., runTotal, runPassed, runSuccessRate,
runCategorizedErrors, runCssCheck, runUniqueErrorCount, runPassedButEmptyRender)
for the failure-path payload; update the expectation to also assert the
corresponding cumulative* metrics (e.g., cumulativeTotal, cumulativePassed,
cumulativeSuccessRate, cumulativeCategorizedErrors, cumulativeCssCheck,
cumulativeUniqueErrorCount, cumulativePassedButEmptyRender) using the same
matcher types (expect.any(Number) or expect.any(Object) or exact string values)
so the failure-path verifies cumulative scoring fields alongside the run
metrics.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 06481320-031d-4ada-9283-a383c2671c2a

📥 Commits

Reviewing files that changed from the base of the PR and between 1804ced and a88c0ea.

📒 Files selected for processing (3)

code/core/src/core-server/server-channel/ghost-stories-channel.test.ts
scripts/eval/lib/grade.ts
scripts/eval/lib/story-render.ts

Implement cumulative self-healing scoring and history caching for sto…

1804ced

…ry test results

yannbf self-assigned this May 4, 2026

yannbf added bug cli ci:normal labels May 4, 2026

coderabbitai Bot reviewed May 4, 2026

View reviewed changes

Comment thread code/addons/vitest/src/vitest-plugin/agent-story-history-cache.ts

fix issues

a88c0ea

coderabbitai Bot reviewed May 4, 2026

View reviewed changes

Sidnioulz approved these changes May 4, 2026

View reviewed changes

Comment thread code/core/src/shared/utils/analyze-test-results.ts

Comment thread code/core/src/shared/utils/analyze-test-results.test.ts

Sidnioulz merged commit 250db3f into next May 4, 2026
124 checks passed

Sidnioulz deleted the yann/cummulative-self-healing-score branch May 4, 2026 13:49

github-actions Bot mentioned this pull request May 4, 2026

Release: Prerelease 10.4.0-alpha.16 #34695

Merged

14 tasks

coderabbitai Bot mentioned this pull request May 4, 2026

Maintenance: Enhance ghost stories internal tests #34707

Merged

8 tasks

coderabbitai Bot mentioned this pull request May 13, 2026

Maintenance: Fix self healing payload #34782

Merged

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

CLI: Improve self-healing scoring observability#34699

CLI: Improve self-healing scoring observability#34699
Sidnioulz merged 2 commits into
nextfrom
yann/cummulative-self-healing-score

yannbf commented May 4, 2026 •

edited by storybook-bot

Loading

Uh oh!

coderabbitai Bot commented May 4, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Possibly Related PRs

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

yannbf commented May 4, 2026 • edited by storybook-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What I did

Checklist for Contributors

Testing

The changes in this PR are covered in the following automated tests:

Manual testing

Documentation

Checklist for Maintainers

🦋 Canary release

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Possibly Related PRs

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

yannbf commented May 4, 2026 •

edited by storybook-bot

Loading

coderabbitai Bot commented May 4, 2026 •

edited

Loading