Skip to content

Build: Improve NX Cloud performance and flakiness with enterprise plan#34122

Merged
kasperpeulen merged 63 commits into
nextfrom
kasper/stabilize-onboarding-e2e-waits
Mar 23, 2026
Merged

Build: Improve NX Cloud performance and flakiness with enterprise plan#34122
kasperpeulen merged 63 commits into
nextfrom
kasper/stabilize-onboarding-e2e-waits

Conversation

@kasperpeulen
Copy link
Copy Markdown
Member

@kasperpeulen kasperpeulen commented Mar 13, 2026

Closes #

What I did

We're trying out NX Cloud as an alternative for Circle CI — we have the free enterprise plan for a month to evaluate. This PR improves the NX Cloud CI pipeline.

Removed uncacheable prepare-sandbox NX target

The sandbox preparation step (copying from cache, waiting for verdaccio, running yarn install) is now inlined into each downstream task's run() function. These targets were uncacheable because they moved cached output to the working directory — NX Cloud reruns work best when every target in the graph is cacheable. This also makes NX Cloud runs significantly faster.

Dynamic changeset-based agent distribution

Now that we're on the NX enterprise plan for a month, we can use more agents. Agents now scale based on changeset size (small/medium/large/extra-large) instead of a fixed 20-agent pool. A separate daily distribution config uses even more agents for the larger daily sandbox set — the daily run now takes less than 10 minutes.

Reduced agent disk usage

NX Cloud agents were running out of disk space. To fix this:

  • Switched to the global yarn cache (enableGlobalCache: true) so each sandbox doesn't duplicate its own .yarn/cache directory.
  • Removed node_modules from the NX Cloud cache paths (the yarn global cache is sufficient).
  • Removed Cypress browser cache and installed only Playwright Chromium instead of all browsers.

Because the global cache is shared across sandboxes, stale @storybook/* packages (same version, different contents from verdaccio) can cause yarn install to pick up published versions instead of local builds. A cleanup step purges these entries before each run to force fresh copies from verdaccio.

Core compile DTS retry logic

Found a way to make the core compile task less flaky. Parallel DTS generation processes can race when one reads another's partially-written .d.ts output (TS2306). Instead of immediately bailing and killing all processes on the first failure, failed entries are now retried sequentially after all parallel processes complete. By that point all dependency .d.ts files are fully written, so the retry almost always succeeds. This adds ~8s only when the race actually occurs.

NX Cloud remote caching disabled on Circle CI (temporary)

NX Cloud remote caching is disabled on Circle CI so we can measure the true cost of NX Cloud without also getting cache-hit costs from Circle CI runs. This will be reverted once the NX Cloud experiment concludes.

E2E test stabilization

  • Playwright setup project. Added a dedicated Playwright setup project that waits for Storybook readiness through /index.json, opens example-button--primary, verifies the manager selection, and waits for the preview to settle before the browser tests start.
  • Onboarding test fixes. Tightened E2E synchronization by clearing the checklist cache and using trySelectStory for reliable story selection after creation.

Misc

  • Removed cypress from ALL_TASKS in the NX workflow. We can likely remove cypress from CI altogether, but for now it's removed from the NX experiment.
  • Upgraded NX from 22.1.3 to 22.6.1.
  • Added isNxTaskExecution() helper replacing scattered NX_CLI_SET checks.
  • Added $schema references to project.json files.
  • Added Playwright result outputs to NX task cache configuration.

Checklist for Contributors

Testing

The changes in this PR are covered in the following automated tests:

  • stories
  • unit tests
  • integration tests
  • end-to-end tests

Manual testing

Verify in the NX Cloud dashboard that runs are faster and less flaky compared to previous runs.

Documentation

  • Add or update documentation reflecting your changes
  • If you are deprecating/removing a feature, make sure to update
    MIGRATION.MD

Checklist for Maintainers

  • When this PR is ready for testing, make sure to add ci:normal, ci:merged or ci:daily GH label to it to run a specific set of sandboxes. The particular set of sandboxes can be found in code/lib/cli-storybook/src/sandbox-templates.ts

  • Make sure this PR contains one of the labels below:

    Available labels
    • bug: Internal changes that fixes incorrect behavior.
    • maintenance: User-facing maintenance tasks.
    • dependencies: Upgrading (sometimes downgrading) dependencies.
    • build: Internal-facing build tooling & test updates. Will not show up in release changelog.
    • cleanup: Minor cleanup style change. Will not show up in release changelog.
    • documentation: Documentation only changes. Will not show up in release changelog.
    • feature request: Introducing a new feature.
    • BREAKING CHANGE: Changes that break compatibility in some way with current major version.
    • other: Changes that don't fit in the above categories.

🦋 Canary release

This PR does not have a canary release associated. You can request a canary release of this pull request by mentioning the @storybookjs/core team here.

core team members can create a canary release here or locally with gh workflow run --repo storybookjs/storybook publish.yml --field pr=<PR_NUMBER>

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new QA GitHub Actions workflow that runs Nx/Playwright e2e attempts, collects per-attempt artifacts and summaries; updates an onboarding e2e test to wait for story navigation and perform survey interactions inside the "Storybook user survey" dialog.

Changes

Cohort / File(s) Summary
E2E Test Flow Adjustment
code/e2e-tests/addon-onboarding.spec.ts
After creating a story, wait for and validate story URL and "Last" control visibility. Move subsequent user-input interactions (checkboxes, referrer selection, submit) into the "Storybook user survey" dialog, asserting dialog visibility before interacting.
QA CI Workflow
.github/workflows/qa.yml
Add new "QA" workflow that prepares an attempts matrix and runs a matrix job per attempt: checkout, Node setup, install deps and Playwright deps, start Nx Cloud distributed run, run e2e target with production config, extract Nx Cloud run URL, upload nx-output.log and conditional failure artifacts, and write a QA summary to the workflow step summary.

Sequence Diagram(s)

sequenceDiagram
    participant GH as GitHub Actions
    participant Matrix as Matrix Coordinator
    participant Job as QA Job (attempt)
    participant Runner as Job Runner
    participant NxCloud as Nx Cloud
    participant Artifact as Artifact Storage

    GH->>Matrix: trigger QA workflow
    Matrix->>Job: spawn attempt jobs
    loop per attempt
      Job->>Runner: checkout + setup node + install deps + playwright deps
      Runner->>NxCloud: start nx distributed run (nx run-many ...)
      NxCloud-->>Runner: return nx_cloud_run link & status
      Runner->>Artifact: upload nx-output.log
      alt failure
        Runner->>Artifact: upload test-results and playwright artifacts
      end
      Runner->>GH: emit per-attempt qa-status/result.json
    end
    GH->>Artifact: collect per-attempt artifacts
    GH->>GH: write workflow step summary with links
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/qa.yml:
- Line 155: The f-string passed to handle.write currently uses shell-style
`${...}` so it will not interpolate; update the string in the handle.write call
that references results (the expression "handle.write(f\"- Target:
`${results[0]['template']}:{results[0]['target']}`\\n\" if results else \"-
Target: unknown\\n\")") to use Python curly-brace interpolation (e.g.
`{results[0]['template']}` and `{results[0]['target']}`) so the template and
target values are correctly formatted when results is non-empty.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ebecf822-b5a5-4044-b71d-4a1017c020ea

📥 Commits

Reviewing files that changed from the base of the PR and between a8e8078 and 0e9c2ce.

📒 Files selected for processing (1)
  • .github/workflows/qa.yml

Comment thread .github/workflows/qa.yml Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
.github/workflows/qa.yml (1)

59-59: Avoid nx@latest in CI to keep runs reproducible.

Line 59 pulls whatever Nx is latest at runtime, which can introduce sudden breakage or behavior drift across attempts.

♻️ Suggested change
-          npx nx@latest start-ci-run
+          yarn nx start-ci-run
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/qa.yml at line 59, The CI step currently runs "npx
nx@latest start-ci-run", which makes runs non-reproducible; update that step to
pin NX to a specific version or use the repository's pinned local binary (for
example replace "npx nx@latest start-ci-run" with a pinned invocation like "npx
nx@<SPECIFIC_VERSION> start-ci-run" or call the project script/local binary such
as "npm run start-ci-run" / "pnpm run start-ci-run" to ensure deterministic CI
behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/qa.yml:
- Around line 36-43: Job-level expressions use the env context incorrectly
(QA_TEMPLATE, QA_TARGET, QA_MAX_PARALLEL) and will fail evaluation; update the
job "name" and the "strategy.max-parallel" and "matrix.attempt" expressions to
use a valid context (e.g., vars.*, inputs.*, or needs.*) or hardcoded values:
replace `${{ env.QA_TEMPLATE }}`, `${{ env.QA_TARGET }}`, and `${{
fromJSON(env.QA_MAX_PARALLEL) }}` with appropriate `vars`/`inputs`/`needs`
references and ensure the matrix `attempt` source
(`needs.prepare.outputs.matrix`) is parsed in a valid context (use needs.prepare
outputs or vars instead of env). Locate these in the job definition where "name"
and "strategy" are defined and update the expressions accordingly.

---

Nitpick comments:
In @.github/workflows/qa.yml:
- Line 59: The CI step currently runs "npx nx@latest start-ci-run", which makes
runs non-reproducible; update that step to pin NX to a specific version or use
the repository's pinned local binary (for example replace "npx nx@latest
start-ci-run" with a pinned invocation like "npx nx@<SPECIFIC_VERSION>
start-ci-run" or call the project script/local binary such as "npm run
start-ci-run" / "pnpm run start-ci-run" to ensure deterministic CI behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 72dc380b-11f8-4b4c-b9b1-632012377286

📥 Commits

Reviewing files that changed from the base of the PR and between 0e9c2ce and 5ed7705.

📒 Files selected for processing (1)
  • .github/workflows/qa.yml

Comment thread .github/workflows/qa.yml Outdated
@nx-cloud
Copy link
Copy Markdown

nx-cloud Bot commented Mar 13, 2026

View your CI Pipeline Execution ↗ for commit a471ffa

Command Status Duration Result
nx run-many -t compile,check,knip,test,pretty-d... ✅ Succeeded 9m 50s View ↗

☁️ Nx Cloud last updated this comment at 2026-03-23 13:11:29 UTC

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/qa.yml:
- Around line 4-8: The workflow trigger is hard-coded to a feature branch
("push" -> "branches" -> kasper/stabilize-onboarding-e2e-waits); update the
trigger so the workflow runs on the intended long-lived branch(s) instead of the
feature branch—for example replace or extend the branches list under on: push:
to include "next" (or "main"/"develop" as appropriate) and/or add a manual
trigger (workflow_dispatch) so the job runs post-merge; modify the on: push:
branches array in the QA workflow accordingly.
- Around line 35-39: Replace the runtime-installed Nx invocation "npx nx@latest
start-ci-run" with the workspace-pinned runner "yarn nx start-ci-run" so the CI
uses the repository's pinned Nx version (matching the later step that uses "yarn
nx"); keep the existing flags (--distribute-on and --stop-agents-after)
unchanged so the behavior is identical except for using the pinned Nx version.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a1316a82-88e3-4ef4-8386-4755fc5fc3b9

📥 Commits

Reviewing files that changed from the base of the PR and between 5ed7705 and 928a5bb.

📒 Files selected for processing (1)
  • .github/workflows/qa.yml

Comment thread .github/workflows/qa.yml Outdated
Comment thread .github/workflows/qa.yml Outdated
@kasperpeulen kasperpeulen changed the title Stabilize onboarding e2e waits Onboarding: Stabilize survey wait in E2E flow Mar 13, 2026
@kasperpeulen kasperpeulen added build Internal-facing build tooling & test updates ci:merged Run the CI jobs that normally run when merged. labels Mar 13, 2026
@kasperpeulen kasperpeulen changed the title Onboarding: Stabilize survey wait in E2E flow Build: Stabilize Playwright readiness and onboarding waits Mar 13, 2026
@storybook-app-bot
Copy link
Copy Markdown

storybook-app-bot Bot commented Mar 13, 2026

Package Benchmarks

Commit: a471ffa, ran on 23 March 2026 at 13:10:30 UTC

No significant changes detected, all good. 👏

@kasperpeulen kasperpeulen changed the title Build: Stabilize Playwright readiness and onboarding waits Onboarding: Stabilize E2E waits and Playwright setup Mar 14, 2026
@kasperpeulen kasperpeulen changed the title Onboarding: Stabilize E2E waits and Playwright setup Build: Improve NX flakiness Mar 14, 2026
Comment thread code/e2e-tests/storybook.setup.ts
@kasperpeulen kasperpeulen added ci:daily Run the CI jobs that normally run in the daily job. and removed ci:merged Run the CI jobs that normally run when merged. labels Mar 14, 2026
@kasperpeulen kasperpeulen force-pushed the kasper/stabilize-onboarding-e2e-waits branch from 7930d4d to dbbfabf Compare March 21, 2026 11:44
Comment thread .nx/workflows/agents.yaml
Comment thread scripts/build/utils/generate-types.ts Outdated
Comment thread scripts/tasks/compile.ts
@kasperpeulen kasperpeulen changed the title Build: Improve NX flakiness Build: Improve NX Cloud performance and flakiness with enterprise plan Mar 23, 2026
@kasperpeulen kasperpeulen added ci:normal and removed ci:daily Run the CI jobs that normally run in the daily job. labels Mar 23, 2026
Copy link
Copy Markdown
Contributor

@valentinpalkovic valentinpalkovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pair-reviewed with @kasperpeulen. LGTM!

@kasperpeulen kasperpeulen merged commit 2649513 into next Mar 23, 2026
123 checks passed
@kasperpeulen kasperpeulen deleted the kasper/stabilize-onboarding-e2e-waits branch March 23, 2026 13:54
@github-actions github-actions Bot mentioned this pull request Mar 23, 2026
17 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

build Internal-facing build tooling & test updates ci:normal

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants