Only run failed tests for snapshot update #5954

benceruleanlu · 2025-10-07T02:13:26Z

Note for reviewers: Contains a lot of bash scripting

Summary

Previously, the snapshot update workflow ran every Playwright test, including tests that had not failed.

With this PR, we now only run the failed tests for snapshot updates.

Changes

Generates a custom text manifest listing each failed test as test_file:line, a format Playwright accepts for running that specific test
The update workflow fetches the latest manifest through the GitHub API and uses it to update snapshots
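
A minimal sketch of how such a manifest could be derived from a Playwright JSON report. The interfaces below are a reduced, assumed subset of the report shape, the body of hasScreenshotSignal is a hypothetical heuristic (the PR's real implementation is not shown in this thread), and the sample file names are made up for illustration.

```typescript
// Reduced subset of the Playwright JSON report shape (assumed for this sketch).
interface Attachment { name: string; contentType: string }
interface Result { status: string; attachments: Attachment[] }
interface Test { projectName: string; results: Result[] }
interface Spec { file: string; line: number; tests: Test[] }
interface Suite { specs: Spec[]; suites?: Suite[] }

// Hypothetical heuristic: any image attachment counts as a screenshot signal.
function hasScreenshotSignal(result: Result): boolean {
  return result.attachments.some((a) => a.contentType.startsWith('image/'))
}

// Walk nested suites and collect "file:line" locations per project,
// keeping only tests whose last result failed with a screenshot signal.
function collectFailedScreenshots(
  suites: Suite[],
  out: Map<string, Set<string>> = new Map()
): Map<string, Set<string>> {
  for (const suite of suites) {
    for (const spec of suite.specs) {
      const loc = `${spec.file}:${spec.line}`
      for (const test of spec.tests) {
        const last = test.results[test.results.length - 1]
        if (!last || last.status !== 'failed' || !hasScreenshotSignal(last)) continue
        const locations = out.get(test.projectName) ?? new Set<string>()
        locations.add(loc)
        out.set(test.projectName, locations)
      }
    }
    if (suite.suites) collectFailedScreenshots(suite.suites, out)
  }
  return out
}

// Made-up sample data: one screenshot failure, one pass, one non-screenshot failure.
const sampleSuites: Suite[] = [
  {
    specs: [
      {
        file: 'tests/a.spec.ts',
        line: 12,
        tests: [
          {
            projectName: 'chromium',
            results: [
              { status: 'failed', attachments: [{ name: 'diff', contentType: 'image/png' }] }
            ]
          },
          { projectName: 'mobile-chrome', results: [{ status: 'passed', attachments: [] }] }
        ]
      }
    ],
    suites: [
      {
        specs: [
          {
            file: 'tests/b.spec.ts',
            line: 7,
            tests: [
              { projectName: 'chromium', results: [{ status: 'failed', attachments: [] }] }
            ]
          }
        ]
      }
    ]
  }
]

const manifest = collectFailedScreenshots(sampleSuites)
manifest.forEach((locations, project) => {
  console.log(`${project}: ${Array.from(locations).join(' ')}`)
})
```

Each collected location can then be passed to the Playwright CLI as a file:line filter for the matching project.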

github-actions · 2025-10-07T02:13:44Z

🎨 Storybook Build Status

✅ Build completed successfully!

⏰ Completed at: 10/08/2025, 02:34:24 AM UTC

🔗 Links

📊 View Workflow Run
🎨 View Storybook

🎉 Your Storybook is ready for review!

github-actions · 2025-10-07T02:13:53Z

🎭 Playwright Test Results

⚠️ Tests passed with flaky tests

⏰ Completed at: 10/08/2025, 02:44:39 AM UTC

📈 Summary

Total Tests: 488
Passed: 455 ✅
Failed: 0
Flaky: 3 ⚠️
Skipped: 30 ⏭️

📊 Test Reports by Browser

✅ chromium: View Report • ✅ 446 / ❌ 0 / ⚠️ 3 / ⏭️ 30
✅ chromium-2x: View Report • ✅ 2 / ❌ 0 / ⚠️ 0 / ⏭️ 0
✅ chromium-0.5x: View Report • ✅ 1 / ❌ 0 / ⚠️ 0 / ⏭️ 0
✅ mobile-chrome: View Report • ✅ 6 / ❌ 0 / ⚠️ 0 / ⏭️ 0

🎉 Click on the links above to view detailed test results for each browser configuration.

benceruleanlu · 2025-10-07T02:14:09Z

@christian-byrne I manually wrote this PR description to be as helpful as possible, LMKWYT if you have time

DrJKL

Discussed offline, but for the record:
I can't block this, but I would strongly oppose adding such complexity before even trying simpler solutions like filtering the update runs using --last-failed with the results we're already storing.
I'm also worried about the mix in here of having: an inlined github-script, an inlined bash script, and a separate typescript script that collectively depend on one another by loose contract.

benceruleanlu · 2025-10-08T02:24:31Z

> Discussed offline, but for the record: I can't block this, but I would strongly oppose adding such complexity before even trying simpler solutions like filtering the update runs using --last-failed with the results we're already storing. I'm also worried about the mix in here of having: an inlined github-script, an inlined bash script, and a separate typescript script that collectively depend on one another by loose contract.

We considered --last-failed, but it relies on Playwright’s local state from the previous run. Our update job runs in a fresh workflow and is sharded, so we’d have to persist and merge Playwright’s internal “last failed” state, which isn’t exposed via the reporters we already store. That ends up being more brittle and doesn’t filter to screenshot-only failures.

The current approach uses artifacts we already produce (merged JSON) and targets only tests that have screenshot attachments, which reduces unnecessary re-runs.

I agree on keeping things simple. If helpful, I can collapse the bash loop and manifest handoff into a single Node step that reads the merged JSON and invokes Playwright per project directly. That removes the extra artifact/loop while keeping the same precision.
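
One way the proposed single Node step could look, as a rough sketch only. The helper name buildUpdateCommands, the sample manifest, and the spec paths are hypothetical; the sketch only builds argument arrays and leaves the actual process launch (for example through Node's child_process without a shell) to the caller.

```typescript
// Hypothetical helper: build one Playwright invocation per project.
// Each command is an argument array, so no shell string is ever assembled.
function buildUpdateCommands(failed: Map<string, Set<string>>): string[][] {
  const commands: string[][] = []
  failed.forEach((locations, project) => {
    if (locations.size === 0) return // nothing to re-run for this project
    commands.push([
      'playwright',
      'test',
      `--project=${project}`,
      '--update-snapshots',
      ...Array.from(locations).sort()
    ])
  })
  return commands
}

// Made-up manifest for illustration.
const failedByProject: Map<string, Set<string>> = new Map([
  ['chromium', new Set(['tests/b.spec.ts:7', 'tests/a.spec.ts:12'])],
  ['mobile-chrome', new Set<string>()]
])

for (const command of buildUpdateCommands(failedByProject)) {
  console.log(command.join(' '))
}
```

Keeping the commands as arrays means file names are never re-parsed by a shell, which also bears on the injection concern raised in review.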

claude · 2025-10-08T17:39:26Z

.github/workflows/tests-ci.yaml

+        if: ${{ needs.playwright-tests-chromium-sharded.result == 'failure' }}
+        run: |
+          set -euo pipefail
+          pnpm tsx scripts/cicd/build-failed-screenshot-manifest.ts


[quality] medium Priority

Issue: Using pnpm tsx without explicit error handling for TypeScript execution
Context: If the TypeScript script fails to compile or execute, the failure might be silently ignored
Suggestion: Add explicit error handling or use --exit-code to ensure script failures propagate properly

claude · 2025-10-08T17:39:34Z

.github/workflows/update-playwright-expectations.yaml

+              continue
+            fi
+            echo "Re-running ${#filtered[@]} tests for project $project"
+            PLAYWRIGHT_JSON_OUTPUT_NAME=playwright-report/report.json \


[security] high Priority

Issue: Potential command injection vulnerability when processing test file names
Context: Files are read from the manifest and passed directly as command arguments without validation. Malicious test file names could inject commands
Suggestion: Add input validation and sanitization for test file names, or use safer parameter passing mechanisms
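
A minimal sketch of the kind of validation this suggestion describes, assuming manifest entries have the form path:line. The allowed character set, the helper names, and the sample entries are assumptions made for illustration, not taken from the PR.

```typescript
// Accept only "path:line" entries built from conservative path characters.
const LOCATION_PATTERN = /^[\w./-]+:\d+$/

function isSafeLocation(entry: string): boolean {
  // A leading "-" could be interpreted as a CLI option by the test runner.
  if (entry.startsWith('-')) return false
  return LOCATION_PATTERN.test(entry)
}

// Split manifest lines into accepted and rejected entries so that
// rejected ones can be logged instead of being dropped silently.
function partitionManifest(lines: string[]): { accepted: string[]; rejected: string[] } {
  const accepted: string[] = []
  const rejected: string[] = []
  for (const line of lines) {
    const entry = line.trim()
    if (entry === '') continue
    if (isSafeLocation(entry)) accepted.push(entry)
    else rejected.push(entry)
  }
  return { accepted, rejected }
}

// Made-up entries for illustration.
const checked = partitionManifest([
  'browser_tests/tests/menu.spec.ts:42',
  'a.spec.ts:1; rm -rf /',
  '$(whoami).spec.ts:3',
  '-x.spec.ts:1',
  ''
])
console.log(checked)
```

The pattern is deliberately strict, so paths containing spaces or other unusual characters would be rejected and would need an explicit allowance.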

claude · 2025-10-08T17:39:40Z

scripts/cicd/build-failed-screenshot-manifest.ts

+  }
+
+  const raw = await fsp.readFile(reportPath, 'utf8')
+  const data = JSON.parse(raw)


[quality] medium Priority

Issue: JSON.parse without error handling for malformed JSON
Context: If the Playwright report JSON is malformed or corrupted, the script will crash with an unclear error message
Suggestion: Add try-catch block around JSON.parse with descriptive error message for debugging
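
The suggestion could be sketched roughly as follows. The function name parseReport and the error wording are hypothetical; only JSON.parse and the reportPath variable come from the diff above.

```typescript
// Parse the report and rethrow with the file path included,
// so a corrupted artifact is easy to identify in CI logs.
function parseReport(raw: string, reportPath: string): unknown {
  try {
    return JSON.parse(raw)
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err)
    throw new Error(`Failed to parse Playwright report at ${reportPath}: ${reason}`)
  }
}

// Usage with a well-formed and a truncated payload.
console.log(parseReport('{"suites": []}', 'report.json'))
try {
  parseReport('{"suites": [', 'report.json')
} catch (err) {
  console.error((err as Error).message)
}
```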

claude · 2025-10-08T17:39:47Z

scripts/cicd/build-failed-screenshot-manifest.ts

+          last && last.status === 'failed' && hasScreenshotSignal(last)
+        if (!failedScreenshot) continue
+        if (!out.has(project)) out.set(project, new Set())
+        out.get(project)!.add(loc)


[quality] medium Priority

Issue: Non-null assertion operator without proper null checking
Context: Using out.get(project)! assumes the Map entry exists, but there's a potential race condition or logic error where it might not
Suggestion: Use safer access pattern like const set = out.get(project); if (set) set.add(loc)
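
A get-or-create helper is one way to avoid the non-null assertion altogether. This is a sketch only; the name addLocation and the sample values are hypothetical, while the Map of Sets mirrors the out variable in the diff above.

```typescript
// Get the Set for a project, creating it on first use, without using "!".
function addLocation(out: Map<string, Set<string>>, project: string, loc: string): void {
  let locations = out.get(project)
  if (!locations) {
    locations = new Set<string>()
    out.set(project, locations)
  }
  locations.add(loc)
}

// Made-up values for illustration; the duplicate is absorbed by the Set.
const byProject = new Map<string, Set<string>>()
addLocation(byProject, 'chromium', 'tests/a.spec.ts:12')
addLocation(byProject, 'chromium', 'tests/a.spec.ts:12')
addLocation(byProject, 'chromium', 'tests/b.spec.ts:7')
console.log(Array.from(byProject.get('chromium') ?? []))
```

Unlike the one-line pattern in the suggestion, this version never skips the add, because the Set is created whenever it is missing.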

claude

Comprehensive PR Review

This review is generated by Claude. It may not always be accurate, as with human reviewers. If you believe that any of the comments are invalid or incorrect, please state why for each. For others, please implement the changes in one way or another.

Review Summary

PR: Only run failed tests for snapshot update (#5954)
Impact: 209 additions, 12 deletions across 3 files

Issue Distribution

Critical: 0
High: 1
Medium: 3
Low: 0

Category Breakdown

Architecture: 0 issues
Security: 1 issue
Performance: 0 issues
Code Quality: 3 issues

Key Findings

Architecture & Design

The PR implements a selective test execution pattern that aligns well with CI optimization practices. The manifest-based approach is a sound architectural choice that reduces unnecessary test runs while maintaining test coverage for failures. The separation of concerns between the manifest generation (TypeScript) and consumption (bash) is appropriate.

Security Considerations

High-Priority Finding: Command injection vulnerability in the bash script that processes test file names. File names from the manifest are passed directly as command arguments without validation, potentially allowing malicious test file names to inject commands. This needs attention before merge, as it could compromise the CI environment.

Performance Impact

The PR significantly improves CI performance by running only failed screenshot tests instead of the full suite. This is a substantial optimization that will reduce CI execution time and resource usage. The selective execution pattern should provide meaningful speed improvements for large test suites.

Integration Points

The integration between the CI workflow and the snapshot update workflow is well-designed. The artifact-based communication pattern using GitHub Actions artifacts is robust and follows best practices. The fallback mechanisms ensure reliability when manifests are unavailable.

Positive Observations

Excellent bash error handling with set -euo pipefail
Proper fallback mechanisms when manifests are not available
Clean TypeScript code with appropriate type annotations
Good separation of concerns between different workflow steps
Comprehensive artifact handling and retention policies

Next Steps

Address the high-priority security issue before merge: sanitize test file names in the bash script
Add error handling for JSON parsing in TypeScript manifest script
Consider safer non-null assertion patterns for TypeScript code
Add explicit error handling for TypeScript script execution in workflow

This is a comprehensive automated review. For architectural decisions requiring human judgment, please request additional manual review.

Only update snapshots of failed

87d3111

dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) Oct 7, 2025

benceruleanlu requested a review from snomiao October 7, 2025 02:14

DrJKL reviewed Oct 7, 2025

Merge remote-tracking branch 'origin/main' into bl-selective-snapshot…

83ff415

…-update

christian-byrne added the claude-review label (Add to trigger a PR code review from Claude Code) Oct 8, 2025

claude bot reviewed Oct 8, 2025

benceruleanlu marked this pull request as draft October 9, 2025 02:08
