
@jamadeo (Collaborator) commented Jan 5, 2026

The previous logic cleaned up pr-preview directories only partially, leaving behind "empty" directories (containing only a hidden .jekyll-ignore file). This seemed to affect some, but not all, of the since-removed preview directories.

This change makes the clean-up always remove everything under pr-preview/ that isn't a currently deployed preview directory.

Copilot AI review requested due to automatic review settings January 5, 2026 20:10

Copilot AI (Contributor) left a comment

Pull request overview

This PR refactors the gh-pages cleanup script to remove all non-deployed preview directories under pr-preview/. The implementation switches from a temporary file-based approach with --paths-from-file to a Python callback approach with --filename-callback.

Key changes:

  • Replaces temporary file approach with inline Python callback for git-filter-repo
  • Removes logic for checking if directories exist before cleanup
  • Simplifies filtering by checking all paths in pr-preview/ against the list of directories with visible files
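The keep/drop rule the new callback implements can be exercised on its own, without running git-filter-repo. A minimal sketch, assuming git-filter-repo's documented callback contract (the body receives `filename` as bytes and returns the path to keep, or None to drop the file); `keep_path` and the example directory names are hypothetical, not part of the PR.

```python
from __future__ import annotations


def keep_path(filename: bytes, deployed: set[bytes]) -> bytes | None:
    """Return filename to keep the path, or None to drop it (same rule as the PR's callback)."""
    root, *rest = filename.split(b"/")
    if root != b"pr-preview":
        # Anything outside pr-preview/ is kept unchanged.
        return filename
    if rest and rest[0] not in deployed:
        # Under pr-preview/, drop every path whose first component is not a
        # deployed preview, including hidden files like .jekyll-ignore inside it.
        return None
    # Paths inside deployed previews (and a bare "pr-preview" entry) are kept.
    return filename
```

For example, with `deployed = {b"pr-101"}`, `keep_path(b"pr-preview/pr-99/.jekyll-ignore", deployed)` returns None. Because the check looks only at the first component under pr-preview/, hidden files in a removed preview directory go with everything else, which is the leftover case the PR description mentions.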

Comment on lines 13 to 20
  if [ -n "$dir" ]; then
    if ! echo "$dirs_with_visible_files" | grep -q "^${dir}$"; then
      dir_path="pr-preview/$dir"
      echo "Found directory to remove: $dir_path"
      echo "$dir_path" >> "$REMOVE_PATHS_FILE"
    fi
  fi
done <<< "$all_dirs"

Copilot AI Jan 5, 2026

The variable all_dirs is used in the while loop but is never defined. The line that previously defined it (all_dirs=$(git ls-tree -d origin/gh-pages:pr-preview --name-only 2>/dev/null || true)) was removed in this change. This will cause the while loop to process an empty string, making lines 13-20 effectively dead code that never executes.

Copilot uses AI. Check for mistakes.
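To see why an undefined all_dirs turns the loop into dead code, here is a minimal Python sketch that mirrors the quoted loop, assuming both inputs are newline-separated directory names as the shell variables would hold them and that the script does not run under `set -u` (which would abort on the unset variable instead). An unset variable expands to an empty string; the `<<<` here-string adds a trailing newline, so `read` sees a single empty line, which `[ -n "$dir" ]` skips. `find_dirs_to_remove` is a hypothetical name, and the sketch uses exact set membership where the script uses `grep -q "^${dir}$"` (which treats the name as a regex).

```python
from __future__ import annotations


def find_dirs_to_remove(all_dirs: str, dirs_with_visible_files: str) -> list[str]:
    """Mirror of the shell loop: preview dirs that have no visible files."""
    keep = set(dirs_with_visible_files.splitlines())
    to_remove = []
    # Like `while read -r dir; ... done <<< "$all_dirs"`: one entry per line.
    for d in all_dirs.split("\n"):
        # Like `[ -n "$dir" ]`: skip empty entries; an empty all_dirs yields only one.
        if d and d not in keep:
            to_remove.append(f"pr-preview/{d}")
    return to_remove
```

With `all_dirs` empty, as in the changed script, `find_dirs_to_remove("", "pr-1\npr-2")` returns an empty list no matter what `dirs_with_visible_files` contains.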
Comment on lines +21 to +31
CALLBACK="
root, *rest = filename.split(b'/')
keep = b'''$dirs_with_visible_files'''.splitlines()
if root != b'pr-preview':
    # keep anything outside of pr-preview
    return filename
elif rest and rest[0] not in keep:
    return None
else:
    return filename
"

Copilot AI Jan 5, 2026

Embedding dirs_with_visible_files directly into the CALLBACK Python snippet via b'''$dirs_with_visible_files''' allows unescaped directory names from origin/gh-pages:pr-preview to become executable Python code. A malicious directory name containing a sequence like '''; __import__('os').system('...') # could break out of the bytes literal and execute arbitrary commands in the context of uvx git-filter-repo. To fix this, avoid inlining raw directory names into the callback code and instead pass them as data (for example via an environment variable, temp file, or proper escaping) so they are not interpreted as Python source.

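One way to pass the directory names as data, as the review suggests, is to read them from the environment inside the callback instead of interpolating them into its source. A minimal sketch, assuming the script exports the keep-list in a hypothetical `KEEP_PREVIEW_DIRS` variable (e.g. `KEEP_PREVIEW_DIRS="$dirs_with_visible_files"`) and that git-filter-repo runs the callback text as the body of a function taking `filename`. `compile_callback` only approximates that wrapping for local testing; it is not git-filter-repo's own code.

```python
import os
import textwrap

# Callback body to pass verbatim to --filename-callback (single-quoted in the
# shell so nothing is expanded). Directory names are read from the hypothetical
# KEEP_PREVIEW_DIRS variable at run time, so they are never parsed as Python.
CALLBACK_BODY = textwrap.dedent("""\
    import os
    keep = os.fsencode(os.environ.get('KEEP_PREVIEW_DIRS', '')).splitlines()
    root, *rest = filename.split(b'/')
    if root != b'pr-preview':
        return filename
    if rest and rest[0] not in keep:
        return None
    return filename
""")


def compile_callback(body: str):
    """Wrap a function-body callback in `def callback(filename):` (local emulation only)."""
    namespace: dict = {}
    exec("def callback(filename):\n" + textwrap.indent(body, "    "), namespace)
    return namespace["callback"]
```

A directory name such as `'''; raise SystemExit #` now ends up as one more entry in `keep` rather than as code: the callback source is fixed, and only the environment variable's contents change between runs.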
@jamadeo changed the title from "Better clean files" to "Improve PR preview site artifact clean-up" Jan 5, 2026

@jamadeo (Collaborator, Author) commented Jan 5, 2026

I think we can actually get rid of this script/step entirely if we set force_orphan on the gh-pages deployment, as done in #6340.

@jamadeo (Collaborator, Author) commented Jan 5, 2026

closing in favor of #6340

@jamadeo closed this Jan 5, 2026