-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Improve PR preview site artifact clean-up #6341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the gh-pages cleanup script to remove all non-deployed preview directories under pr-preview/. The implementation switches from a temporary file-based approach with --paths-from-file to a Python callback approach with --filename-callback.
Key changes:
- Replaces temporary file approach with inline Python callback for git-filter-repo
- Removes logic for checking if directories exist before cleanup
- Simplifies filtering by checking all paths in pr-preview/ against the list of directories with visible files
| if [ -n "$dir" ]; then | ||
| if ! echo "$dirs_with_visible_files" | grep -q "^${dir}$"; then | ||
| dir_path="pr-preview/$dir" | ||
| echo "Found directory to remove: $dir_path" | ||
| echo "$dir_path" >> "$REMOVE_PATHS_FILE" | ||
| fi | ||
| fi | ||
| done <<< "$all_dirs" | ||
|
|
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable all_dirs is used in the while loop but is never defined. The line that previously defined it (all_dirs=$(git ls-tree -d origin/gh-pages:pr-preview --name-only 2>/dev/null || true)) was removed in this change. This will cause the while loop to process an empty string, making lines 13-20 effectively dead code that never executes.
| CALLBACK=" | ||
| root, *rest = filename.split(b'/') | ||
| keep = b'''$dirs_with_visible_files'''.splitlines() | ||
| if root != b'pr-preview': | ||
| # keep anything outside of pr-preview | ||
| return filename | ||
| elif rest and rest[0] not in keep: | ||
| return None | ||
| else: | ||
| return filename | ||
| " |
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Embedding dirs_with_visible_files directly into the CALLBACK Python snippet via b'''$dirs_with_visible_files''' allows unescaped directory names from origin/gh-pages:pr-preview to become executable Python code. A malicious directory name containing a sequence like '''; __import__('os').system('...') # could break out of the bytes literal and execute arbitrary commands in the context of uvx git-filter-repo. To fix this, avoid inlining raw directory names into the callback code and instead pass them as data (for example via an environment variable, temp file, or proper escaping) so they are not interpreted as Python source.
|
I think we can actually get rid of this script/step entirely if we set force_orphan on the gh-pages deployment as done in #6340 |
|
closing in favor of #6340 |
The previous logic only cleaned up pr-previews which left behind "empty" directories (containing only a hidden .jekyll-ignore file), which seemed to be the case for some but not all of the since-removed preview directories.
This improves it so that we always remove everything under pr-previews/ that isn't a currently-deployed preview directory.