Skip to content

Telemetry: Add timeout to event-log POST to prevent build hang#35085

Merged
Sidnioulz merged 1 commit into
storybookjs:nextfrom
badams:fix/telemetry-fetch-timeout
Jun 8, 2026
Merged

Telemetry: Add timeout to event-log POST to prevent build hang#35085
Sidnioulz merged 1 commit into
storybookjs:nextfrom
badams:fix/telemetry-fetch-timeout

Conversation

@badams

@badams badams commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Closes #35084 (and likely the root cause of #29828).

What I did

storybook build can hang after compiling: the post-build telemetry POST is a fetch with no timeout, so a stalled connection to the event-log endpoint never settles, the awaited call never returns, and the open socket keeps the event loop alive — the CLI never exits. (try/catch can't catch a non-rejection; fetch-retry only retries a settled attempt.)

Time-box the request in prepareRequest (core/src/telemetry/telemetry.ts):

  1. Timeoutsignal: AbortSignal.timeout(30_000), created once so it bounds the whole request.
  2. Stop retrying on abortretryOn becomes a function returning false once signal.aborted. Otherwise fetch-retry treats the AbortError as retryable and burns the 1s/2s/4s backoff on instant-aborting attempts (~37s, not 30s). The function also bounds attempts explicitly (attempt >= maxRetries), which the array form did implicitly.
const signal = AbortSignal.timeout(30_000);
const maxRetries = 3;

return retryingFetch(URL, {
  method: 'post',
  body: JSON.stringify(body),
  headers: { 'Content-Type': 'application/json' },
  retryDelay: (attempt) => 2 ** attempt * (options?.retryDelay ?? 1000),
  retryOn: (attempt, error, response) => {
    if (signal.aborted || attempt >= maxRetries) return false;
    return Boolean(error) || response?.status === 503 || response?.status === 504;
  },
  signal,
});

503/504 and transient errors still retry; sendTelemetry already swallows errors, so a timed-out event is just dropped — no user-visible change.

Checklist for Contributors

Testing

The changes in this PR are covered in the following automated tests:

  • stories
  • unit tests
  • integration tests
  • end-to-end tests

core/src/telemetry/telemetry.test.ts adds a regression test: a fetch that only settles once aborted is abandoned after the timeout and not retried (exactly one attempt). Existing 503/give-up tests still pass.

Manual testing

The repro ships this exact change as a patch, so you can see before/after on the released build:

git clone https://github.com/badams/storybook-telemetry-hang-repro
cd storybook-telemetry-hang-repro && npm install   # unpatched storybook 10.4.1

# blackhole endpoint: accepts the connection, never replies
node -e 'require("net").createServer(s => s.on("data", () => {})).listen(9999)' &

# 1. Before — reproduce the hang
STORYBOOK_TELEMETRY_URL=http://localhost:9999/event-log npx storybook build
#    → compiles, then stalls instead of exiting (ctrl-C to stop)

# 2. Apply this fix (patches/storybook+10.4.1.patch == the change in this PR)
npx patch-package

# 3. After — same command now exits
STORYBOOK_TELEMETRY_URL=http://localhost:9999/event-log npx storybook build
#    → aborts the telemetry POST at 30s and exits cleanly

Documentation

  • No documentation change needed (internal bugfix).

Notes

  • AbortSignal.timeout() needs Node ≥17.3; Storybook requires Node ≥20.
  • Scope: time-boxes the event-log POST only; other telemetry fetches (version-check / notify) are a follow-up.
  • Suggested label: bug.

Disclosure (per CONTRIBUTING.md): I used Claude heavily for the investigation, reproduction, and fix — all under my direction and review. Happy to go deeper on any of it.

Summary by CodeRabbit

  • Bug Fixes

    • Telemetry requests now include a 30-second timeout to prevent indefinite hangs and improve application responsiveness.
    • Improved retry behavior to handle transient network errors and temporary server issues more effectively.
  • Tests

    • Added comprehensive test coverage to verify telemetry requests complete properly without hanging or unnecessary retries.

A stalled connection to the telemetry endpoint could keep the Node event
loop alive indefinitely, hanging `storybook build` after compilation. The
post-build telemetry POST used `fetch` with no timeout/AbortSignal, so a
connection that never settles was never abandoned, and the awaited call
never returned.

Time-box the request with `AbortSignal.timeout(30_000)` and switch
`retryOn` to a function that stops retrying once the signal aborts (the
array form treats the resulting AbortError as retryable, burning the
backoff). 503/504 and transient network errors still retry.

Closes storybookjs#35084
@badams badams changed the title fix(telemetry): add timeout to event-log POST to prevent build hang Telemetry: Add timeout to event-log POST to prevent build hang Jun 7, 2026
@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds explicit timeout handling to telemetry requests by introducing a 30-second abort signal and refactoring retry logic to respect abort conditions. The implementation prevents hanging requests by aborting on timeout and skipping retries for aborted requests. A new test validates the timeout behavior without hanging.

Changes

Telemetry timeout and abort signal handling

Layer / File(s) Summary
Timeout and retry implementation
code/core/src/telemetry/telemetry.ts
prepareRequest creates a 30-second AbortSignal timeout and configures retryingFetch with maxRetries = 3. Retry logic stops when the request is aborted or attempt count reaches the maximum, and only retries on transient network errors or server overload (503/504 responses). The abort signal is passed into the fetch request.
Test setup and timeout validation
code/core/src/telemetry/telemetry.test.ts
Import afterEach and add a cleanup hook that restores mocks after each test. New test case verifies sendTelemetry does not hang when fetch times out: wires AbortSignal.timeout() to a controlled AbortController, mocks fetch to reject only on abort, asserts the 30-second timeout is set, aborts the controller, and confirms the aborted request is not retried.

Sequence Diagram

sequenceDiagram
  participant sendTelemetry
  participant prepareRequest
  participant retryingFetch
  participant fetch
  participant AbortSignal as AbortSignal.timeout()
  
  sendTelemetry->>prepareRequest: initiate request
  prepareRequest->>AbortSignal: create 30s timeout signal
  prepareRequest->>retryingFetch: call with signal and maxRetries=3
  loop Retry on network error or 503/504
    retryingFetch->>fetch: call with abort signal
    Note over fetch: fetch rejects on abort
    fetch-->>retryingFetch: error or response
    alt Request aborted
      retryingFetch-->>sendTelemetry: stop retrying, settle promise
    else Network error or 503/504 and attempt < maxRetries
      retryingFetch->>fetch: retry
    else Success or other error
      retryingFetch-->>sendTelemetry: settle with result
    end
  end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@code/core/src/telemetry/telemetry.test.ts`:
- Around line 40-77: Wrap this test in a describe block and move all mock
setup/behavior into a beforeEach: relocate the vi.spyOn(AbortSignal,
'timeout').mockReturnValue(...), fetchMock.mockImplementation(...) and any
related AbortController creation to the describe-level beforeEach so the test
body only performs invocation and assertions for sendTelemetry; keep the test
name and assertions unchanged and reference fetchMock, sendTelemetry,
AbortSignal.timeout, controller.abort and vi.waitFor when moving the setup so
the mock behavior is applied before each test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5646a024-583c-46f2-b51b-998373c09a98

📥 Commits

Reviewing files that changed from the base of the PR and between 78e4393 and 7ef5919.

📒 Files selected for processing (2)
  • code/core/src/telemetry/telemetry.test.ts
  • code/core/src/telemetry/telemetry.ts

Comment thread code/core/src/telemetry/telemetry.test.ts
@Sidnioulz Sidnioulz added bug ci:normal Run our default set of CI jobs (choose this for most PRs). qa:skip Pull Requests that do not need any QA. labels Jun 8, 2026
@Sidnioulz Sidnioulz self-assigned this Jun 8, 2026
@Sidnioulz Sidnioulz added the patch:yes Bugfix & documentation PR that need to be picked to main branch label Jun 8, 2026

@Sidnioulz Sidnioulz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the report, investigation and PR @badams!

I've enabled CI, let's see how it goes. I'll notify you if further work is needed on your end.

@storybook-app-bot

Copy link
Copy Markdown

Package Benchmarks

Commit: 7ef5919, ran on 8 June 2026 at 09:22:06 UTC

The following packages have significant changes to their size or dependencies:

storybook

Before After Difference
Dependency count 72 74 🚨 +2 🚨
Self size 20.73 MB 20.40 MB 🎉 -333 KB 🎉
Dependency size 36.11 MB 36.65 MB 🚨 +539 KB 🚨
Bundle Size Analyzer Link Link

@storybook/cli

Before After Difference
Dependency count 203 205 🚨 +2 🚨
Self size 908 KB 908 KB 🎉 -144 B 🎉
Dependency size 88.98 MB 89.19 MB 🚨 +205 KB 🚨
Bundle Size Analyzer Link Link

@storybook/codemod

Before After Difference
Dependency count 196 198 🚨 +2 🚨
Self size 32 KB 32 KB 0 B
Dependency size 87.47 MB 87.67 MB 🚨 +205 KB 🚨
Bundle Size Analyzer Link Link

create-storybook

Before After Difference
Dependency count 73 75 🚨 +2 🚨
Self size 1.08 MB 1.08 MB 0 B
Dependency size 56.84 MB 57.05 MB 🚨 +205 KB 🚨
Bundle Size Analyzer node node

@Sidnioulz Sidnioulz merged commit 9b194a2 into storybookjs:next Jun 8, 2026
148 of 152 checks passed
@badams badams deleted the fix/telemetry-fetch-timeout branch June 10, 2026 02:48
@github-actions github-actions Bot mentioned this pull request Jun 10, 2026
24 tasks
@github-actions github-actions Bot added the patch:done Patch/release PRs already cherry-picked to main/release branch label Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-scan:human bug ci:normal Run our default set of CI jobs (choose this for most PRs). patch:done Patch/release PRs already cherry-picked to main/release branch patch:yes Bugfix & documentation PR that need to be picked to main branch qa:skip Pull Requests that do not need any QA.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: storybook build hangs forever when the telemetry endpoint stalls (no fetch timeout)

2 participants