benceruleanlu
Member

@benceruleanlu benceruleanlu commented Oct 7, 2025

Note for reviewers: Contains a lot of bash scripting

Summary

Previously, the snapshot update workflow ran every Playwright test, including the ones that had passed.

With this PR, the snapshot update only re-runs the tests that failed.

Changes

  1. The test workflow generates a custom text manifest listing each failed test as test_file:line, a format Playwright accepts as a filter for running that specific test
  2. The update workflow fetches the latest text manifest through the GitHub API and uses it to update only those snapshots
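
For readers who have not opened the diff, the first change can be sketched roughly as follows. This is an illustrative sketch, not the script from this PR: the interfaces declare only a small subset of the fields in Playwright's JSON reporter output, `hasImageAttachment` is a stand-in for the PR's `hasScreenshotSignal`, and the tab-separated manifest line format is hypothetical.

```typescript
// Minimal subset of Playwright's JSON reporter output; only the fields used here.
interface Attachment { name: string; contentType: string; path?: string }
interface TestResult { status: string; attachments?: Attachment[] }
interface TestEntry { projectName: string; results: TestResult[] }
interface Spec { file: string; line: number; tests: TestEntry[] }
interface Suite { specs?: Spec[]; suites?: Suite[] }
interface Report { suites: Suite[] }

// Assumption: an image attachment on the final attempt marks a screenshot failure.
function hasImageAttachment(result: TestResult): boolean {
  return (result.attachments ?? []).some((a) => a.contentType.indexOf('image/') === 0)
}

// Walk the nested suites and collect "file:line" locations per project for tests
// whose last attempt failed with an image attachment.
function collectFailedLocations(report: Report): Map<string, Set<string>> {
  const out = new Map<string, Set<string>>()
  const visit = (suite: Suite): void => {
    for (const spec of suite.specs ?? []) {
      for (const test of spec.tests) {
        const last = test.results[test.results.length - 1]
        if (!last || last.status !== 'failed' || !hasImageAttachment(last)) continue
        const locations = out.get(test.projectName) ?? new Set<string>()
        // Assumption: spec.file can be used as a CLI filter as-is.
        locations.add(`${spec.file}:${spec.line}`)
        out.set(test.projectName, locations)
      }
    }
    for (const child of suite.suites ?? []) visit(child)
  }
  for (const suite of report.suites) visit(suite)
  return out
}

// Serialize as one "project<TAB>file:line" entry per line (hypothetical format).
function toManifest(locations: Map<string, Set<string>>): string {
  const lines: string[] = []
  locations.forEach((locs, project) => {
    locs.forEach((loc) => lines.push(`${project}\t${loc}`))
  })
  return lines.join('\n')
}
```

Grouping by project matters because the same spec can fail in one browser configuration and pass in another, so each project needs its own filter list.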


@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Oct 7, 2025

github-actions bot commented Oct 7, 2025

🎨 Storybook Build Status

Build completed successfully!

⏰ Completed at: 10/08/2025, 02:34:24 AM UTC

🎉 Your Storybook is ready for review!


github-actions bot commented Oct 7, 2025

🎭 Playwright Test Results

⚠️ Tests passed with flaky tests

⏰ Completed at: 10/08/2025, 02:44:39 AM UTC

📈 Summary

  • Total Tests: 488
  • Passed: 455 ✅
  • Failed: 0
  • Flaky: 3 ⚠️
  • Skipped: 30 ⏭️

📊 Test Reports by Browser

  • chromium: View Report • ✅ 446 / ❌ 0 / ⚠️ 3 / ⏭️ 30
  • chromium-2x: View Report • ✅ 2 / ❌ 0 / ⚠️ 0 / ⏭️ 0
  • chromium-0.5x: View Report • ✅ 1 / ❌ 0 / ⚠️ 0 / ⏭️ 0
  • mobile-chrome: View Report • ✅ 6 / ❌ 0 / ⚠️ 0 / ⏭️ 0

🎉 Click on the links above to view detailed test results for each browser configuration.

@benceruleanlu
Member Author

@christian-byrne I manually wrote this PR description to be as helpful as possible, LMKWYT if you have time

@benceruleanlu benceruleanlu requested a review from snomiao October 7, 2025 02:14
Contributor

@DrJKL DrJKL left a comment

Discussed offline, but for the record:
I can't block this, but I would strongly oppose adding such complexity before even trying simpler solutions like filtering the update runs using --last-failed with the results we're already storing.
I'm also worried about the mix in here of having: an inlined github-script, an inlined bash script, and a separate typescript script that collectively depend on one another by loose contract.

@benceruleanlu
Copy link
Member Author

> Discussed offline, but for the record: I can't block this, but I would strongly oppose adding such complexity before even trying simpler solutions like filtering the update runs using --last-failed with the results we're already storing. I'm also worried about the mix in here of having: an inlined github-script, an inlined bash script, and a separate typescript script that collectively depend on one another by loose contract.

  • We considered --last-failed, but it relies on Playwright’s local state from the
    previous run. Our update job runs in a fresh workflow and is sharded, so we’d have to
    persist and merge Playwright’s internal “last failed” state, which isn’t exposed via
    the reporters we already store. That ends up being more brittle and doesn’t filter to
    screenshot-only failures.
  • The current approach uses artifacts we already produce (merged JSON) and targets only
    tests that have screenshot attachments, which reduces unnecessary re-runs.
  • I agree on keeping things simple. If helpful, I can collapse the bash loop and
    manifest handoff into a single Node step that reads the merged JSON and invokes
    Playwright per project directly. That removes the extra artifact/loop while keeping
    the same precision.
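
As a rough illustration of the single Node step offered in the last bullet, the sketch below parses a manifest and builds one Playwright argument list per project. It is a sketch under assumptions rather than the PR's implementation: the "project<TAB>file:line" manifest format and both function names are hypothetical. The `file:line` filter, `--project` and `--update-snapshots` are existing Playwright CLI options.

```typescript
// Parse a manifest with one "project<TAB>file:line" entry per line (hypothetical format).
function parseManifest(text: string): Map<string, string[]> {
  const byProject = new Map<string, string[]>()
  for (const line of text.split('\n')) {
    const trimmed = line.trim()
    if (trimmed === '') continue
    const tab = trimmed.indexOf('\t')
    if (tab <= 0) continue // skip malformed entries instead of guessing
    const project = trimmed.slice(0, tab)
    const location = trimmed.slice(tab + 1)
    const locations = byProject.get(project) ?? []
    locations.push(location)
    byProject.set(project, locations)
  }
  return byProject
}

// Build the argument vector for one project. The positional file:line filters come
// first so that no option taking a value can absorb them by accident.
function buildPlaywrightArgs(project: string, locations: string[]): string[] {
  return ['playwright', 'test', ...locations, `--project=${project}`, '--update-snapshots']
}
```

Each argument list would then be handed to a process runner such as `pnpm exec`, once per project, which replaces the bash loop while keeping the per-project filtering.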

@christian-byrne christian-byrne added the claude-review Add to trigger a PR code review from Claude Code label Oct 8, 2025
if: ${{ needs.playwright-tests-chromium-sharded.result == 'failure' }}
run: |
  set -euo pipefail
  pnpm tsx scripts/cicd/build-failed-screenshot-manifest.ts

[quality] medium Priority

Issue: Using pnpm tsx without explicit error handling for TypeScript execution
Context: If the TypeScript script fails to compile or execute, the failure might be silently ignored
Suggestion: Add explicit error handling or use --exit-code to ensure script failures propagate properly
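
One way to make the failure path explicit inside the script itself is a small entry-point wrapper, sketched below. The name `runMain` is hypothetical and not part of the PR. Note that with `set -euo pipefail` in the step, any non-zero exit already fails the job, and current Node versions exit non-zero on an unhandled rejection by default, so this mainly improves the error message.

```typescript
// Run an async entry point and translate its outcome into an exit code,
// logging a labelled error so the CI log shows which script failed.
async function runMain(main: () => Promise<void>): Promise<number> {
  try {
    await main()
    return 0
  } catch (err) {
    console.error('build-failed-screenshot-manifest failed:', err)
    return 1
  }
}

// Usage at the bottom of a script (assumes a Node runtime):
//   void runMain(main).then((code) => { process.exitCode = code })
```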

continue
fi
echo "Re-running ${#filtered[@]} tests for project $project"
PLAYWRIGHT_JSON_OUTPUT_NAME=playwright-report/report.json \

[security] high Priority

Issue: Potential command injection vulnerability when processing test file names
Context: Files are read from the manifest and passed directly as command arguments without validation. Malicious test file names could inject commands
Suggestion: Add input validation and sanitization for test file names, or use safer parameter passing mechanisms
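
A minimal sketch of the validation this comment asks for, written in TypeScript under the assumption that manifest entries should look like a relative `*.spec.ts` path followed by `:line`. The function name and the allowed character set are hypothetical and would need adjusting to the repository's actual test layout.

```typescript
// Accept only entries shaped like "dir/name.spec.ts:123" built from a conservative
// character set. Anything containing shell metacharacters, whitespace, a leading
// dash (which a CLI could read as an option), or a ".." segment is rejected.
const LOCATION_PATTERN = /^[A-Za-z0-9_./-]+\.spec\.ts:\d+$/

function isSafeLocation(entry: string): boolean {
  if (entry.charAt(0) === '-') return false
  if (entry.split('/').indexOf('..') !== -1) return false
  return LOCATION_PATTERN.test(entry)
}

// Keep the valid entries and report the rest, so a bad manifest line is visible
// in the logs rather than silently dropped.
function partitionLocations(entries: string[]): { valid: string[]; rejected: string[] } {
  const valid: string[] = []
  const rejected: string[] = []
  for (const entry of entries) {
    if (isSafeLocation(entry)) valid.push(entry)
    else rejected.push(entry)
  }
  return { valid, rejected }
}
```

Validation is only half of the suggestion. Passing the entries as an argument array to Node's `child_process.spawn` without `shell: true` avoids shell interpretation altogether, and in bash the equivalent is expanding a quoted array such as `"${filtered[@]}"`.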

}

const raw = await fsp.readFile(reportPath, 'utf8')
const data = JSON.parse(raw)

[quality] medium Priority

Issue: JSON.parse without error handling for malformed JSON
Context: If the Playwright report JSON is malformed or corrupted, the script will crash with an unclear error message
Suggestion: Add try-catch block around JSON.parse with descriptive error message for debugging
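
The suggested try-catch could look something like the sketch below. `parseReport` is a hypothetical helper name; the PR's script calls `JSON.parse` inline.

```typescript
// Parse the report text and, on failure, rethrow with the file path included so the
// CI log points at the artifact that was malformed.
function parseReport(raw: string, reportPath: string): unknown {
  try {
    return JSON.parse(raw)
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err)
    throw new Error(`Failed to parse Playwright report at ${reportPath}: ${reason}`)
  }
}
```

Returning `unknown` rather than `any` forces the caller to check the shape of the report before reading fields from it.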

last && last.status === 'failed' && hasScreenshotSignal(last)
if (!failedScreenshot) continue
if (!out.has(project)) out.set(project, new Set())
out.get(project)!.add(loc)

[quality] medium Priority

Issue: Non-null assertion operator without proper null checking
Context: Using out.get(project)! assumes the Map entry exists, but there's a potential race condition or logic error where it might not
Suggestion: Use safer access pattern like const set = out.get(project); if (set) set.add(loc)
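
In the quoted snippet the `has`/`set` check runs on the line directly before the `get`, so the assertion holds as written. If the goal is to let the compiler verify that instead, a get-or-create helper along these lines would do it. The helper name is hypothetical.

```typescript
// Add a location to the project's set, creating the set on first use.
// No non-null assertion is needed because the local variable is narrowed.
function addLocation(out: Map<string, Set<string>>, project: string, loc: string): void {
  let locations = out.get(project)
  if (!locations) {
    locations = new Set<string>()
    out.set(project, locations)
  }
  locations.add(loc)
}
```

Unlike the `if (set) set.add(loc)` form in the suggestion, this version never silently drops an entry when the key is missing.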

@claude claude bot left a comment

Comprehensive PR Review

This review is generated by Claude. It may not always be accurate, as with human reviewers. If you believe that any of the comments are invalid or incorrect, please state why for each. For others, please implement the changes in one way or another.

Review Summary

PR: Only run failed tests for snapshot update (#5954)
Impact: 209 additions, 12 deletions across 3 files

Issue Distribution

  • Critical: 0
  • High: 1
  • Medium: 3
  • Low: 0

Category Breakdown

  • Architecture: 0 issues
  • Security: 1 issue
  • Performance: 0 issues
  • Code Quality: 3 issues

Key Findings

Architecture & Design

The PR implements a selective test execution pattern that aligns well with CI optimization practices. The manifest-based approach is a sound architectural choice that reduces unnecessary test runs while maintaining test coverage for failures. The separation of concerns between the manifest generation (TypeScript) and consumption (bash) is appropriate.

Security Considerations

High-Priority Finding: Command injection vulnerability in the bash script that processes test file names. File names from the manifest are passed directly as command arguments without validation, potentially allowing maliciously named test files to inject commands. This needs attention before merge, as it could compromise the CI environment.

Performance Impact

The PR significantly improves CI performance by running only failed screenshot tests instead of the full suite. This is a substantial optimization that will reduce CI execution time and resource usage. The selective execution pattern should provide meaningful speed improvements for large test suites.

Integration Points

The integration between the CI workflow and the snapshot update workflow is well-designed. The artifact-based communication pattern using GitHub Actions artifacts is robust and follows best practices. The fallback mechanisms ensure reliability when manifests are unavailable.

Positive Observations

  • Excellent bash error handling with set -euo pipefail
  • Proper fallback mechanisms when manifests are not available
  • Clean TypeScript code with appropriate type annotations
  • Good separation of concerns between different workflow steps
  • Comprehensive artifact handling and retention policies

Next Steps

  1. Address the high-priority security issue before merge: sanitize test file names in the bash script
  2. Add error handling for JSON parsing in TypeScript manifest script
  3. Consider safer alternatives to non-null assertions in the TypeScript code
  4. Add explicit error handling for TypeScript script execution in workflow

This is a comprehensive automated review. For architectural decisions requiring human judgment, please request additional manual review.

@benceruleanlu benceruleanlu marked this pull request as draft October 9, 2025 02:08