fix: scrub private repo names from public issue tracker and add automated sanitization #2303
…ated sanitization

Root cause: issue-sync-helper.sh synced TODO.md task descriptions verbatim to public GitHub issues, and the supervisor posted comments referencing private repo names and PR numbers. No sanitization layer existed between cross-repo task data and public issue creation.

Fix:
- Add _sanitize_for_public_repo() to issue-sync-lib.sh that auto-detects private repos from the supervisor DB and strips their names from issue titles and bodies before publishing to public repos
- Sanitize titles in both cmd_push() and cmd_enrich()
- Pass repo_slug to compose_issue_body() for context-aware sanitization
- Add cross-repo privacy rule to build.txt and AGENTS.md
- Replace all hardcoded private repo names in pulse.md, runners.md, AGENTS.md examples with generic placeholders
- Sanitize t1333/t1334 TODO.md entries and resolve merge conflicts
- Scrub 21 existing issue bodies and 45 comments on GitHub via gh API

Closes #2281
Walkthrough
This pull request implements a privacy-focused sanitization layer for cross-repository issue syncing, preventing private repository names from appearing in public issue trackers. Documentation is updated to generalize repository references and enforce security guidelines.
Sequence Diagram
sequenceDiagram
participant IssueSync as issue-sync-helper.sh
participant IssueSyncLib as issue-sync-lib.sh
participant PrivateCache as _load_private_repo_names()
participant Sanitizer as _sanitize_for_public_repo()
participant PublicRepo as Public Issue Tracker
IssueSync->>IssueSyncLib: compose_issue_body(task_id, project_root, repo_slug)
IssueSyncLib->>IssueSyncLib: Build issue body from TODO.md
alt Is public repo?
IssueSyncLib->>PrivateCache: Load private repo names from supervisor.db
PrivateCache->>PrivateCache: Query GitHub API for isPrivate status
PrivateCache-->>IssueSyncLib: Return _PRIVATE_REPO_NAMES_CACHE
IssueSyncLib->>Sanitizer: _sanitize_for_public_repo(body_text)
Sanitizer->>Sanitizer: Replace private repo names with generic references
Sanitizer-->>IssueSyncLib: Sanitized body
end
IssueSyncLib-->>IssueSync: Sanitized issue body
IssueSync->>IssueSync: Apply title sanitization
IssueSync->>PublicRepo: Post sanitized issue
PublicRepo-->>IssueSync: Success
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
Summary of Changes
Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical security vulnerability where private repository names were inadvertently exposed in public GitHub issues and comments. The primary goal is to prevent information leakage by introducing an automated sanitization layer that scrubs private repository names from public-facing content. This change also updates documentation and examples to reinforce privacy best practices, ensuring that cross-repo task data can be safely synchronized without revealing sensitive project details.
Code Review
This pull request introduces an important security feature by adding an automated sanitization layer to prevent private repository names from leaking into public issue trackers. The changes are well-structured, updating documentation and prompts alongside the core script logic. My review focuses on improving the robustness and efficiency of the new sanitization functions in issue-sync-lib.sh. Specifically, I've suggested removing blanket error suppression in favor of more transparent error handling, which aligns with repository guidelines, and optimizing the text replacement logic for better performance and correctness.
    # Check if repo is private (cache-friendly: gh caches auth)
    local is_private
    is_private=$(gh repo view "$slug" --json isPrivate --jq '.isPrivate' 2>/dev/null || echo "")
Suppressing stderr with 2>/dev/null on gh commands is risky as it can hide critical errors like authentication failures, API rate limits, or an invalid repository slug. These errors are important for debugging. This violates the repository rule against suppressing errors for commands that may have authentication issues.
Suggested change:

    - is_private=$(gh repo view "$slug" --json isPrivate --jq '.isPrivate' 2>/dev/null || echo "")
    + is_private=$(gh repo view "$slug" --json isPrivate --jq '.isPrivate' || echo "")
References
- Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
    # Only sanitize when target repo is public
    if [[ -n "$target_repo_slug" ]]; then
        local is_private
        is_private=$(gh repo view "$target_repo_slug" --json isPrivate --jq '.isPrivate' 2>/dev/null || echo "")
Suppressing stderr with 2>/dev/null on gh commands is risky as it can hide critical errors like authentication failures, API rate limits, or an invalid repository slug. These errors are important for debugging. This violates the repository rule against suppressing errors for commands that may have authentication issues.
Suggested change:

    - is_private=$(gh repo view "$target_repo_slug" --json isPrivate --jq '.isPrivate' 2>/dev/null || echo "")
    + is_private=$(gh repo view "$target_repo_slug" --json isPrivate --jq '.isPrivate' || echo "")
References
- Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
    local result="$text"
    while IFS= read -r name; do
        [[ -z "$name" ]] && continue
        # Replace patterns: "myapp", "myapp#NNN", "in myapp", "the myapp"
        # Case-insensitive replacement using sed
        result=$(printf '%s' "$result" | sed -E "s/${name}#[0-9]+/a private repo PR/gi")
        result=$(printf '%s' "$result" | sed -E "s/(in|the|from|of|for) ${name}/\1 a managed private repo/gi")
        result=$(printf '%s' "$result" | sed -E "s/${name} (CI|PR|pipeline|repo|project|branch|check)/private repo \1/gi")
        result=$(printf '%s' "$result" | sed -E "s/${name}/a managed private repo/gi")
    done <<<"$private_names"
This loop executes sed four times for every private repository name. This can be inefficient if there are many private repos. More importantly, injecting the raw $name variable into the sed expression is unsafe and can lead to errors if a repository name contains characters that are special to sed's regex engine (e.g., /, ., *). A more robust and efficient approach is to build a single, properly escaped regex from all names and run sed only once.
Suggested change:

    - local result="$text"
    - while IFS= read -r name; do
    -     [[ -z "$name" ]] && continue
    -     # Replace patterns: "myapp", "myapp#NNN", "in myapp", "the myapp"
    -     # Case-insensitive replacement using sed
    -     result=$(printf '%s' "$result" | sed -E "s/${name}#[0-9]+/a private repo PR/gi")
    -     result=$(printf '%s' "$result" | sed -E "s/(in|the|from|of|for) ${name}/\1 a managed private repo/gi")
    -     result=$(printf '%s' "$result" | sed -E "s/${name} (CI|PR|pipeline|repo|project|branch|check)/private repo \1/gi")
    -     result=$(printf '%s' "$result" | sed -E "s/${name}/a managed private repo/gi")
    - done <<<"$private_names"
    + local result="$text"
    + if [[ -n "$private_names" ]]; then
    +     local names_regex
    +     # Escape regex special characters and join with |
    +     names_regex=$(echo "$private_names" | sed -e 's/[\[\]\\\/.*^$]/\\&/g' | paste -sd'|')
    +     # Combine all replacements into a single, more efficient sed call.
    +     # The order of expressions is important to handle more specific cases first.
    +     result=$(printf '%s' "$result" | sed -E \
    +         -e "s/($names_regex)#[0-9]+/a private repo PR/gi" \
    +         -e "s/(in|the|from|of|for) ($names_regex)/\1 a managed private repo/gi" \
    +         -e "s/($names_regex) (CI|PR|pipeline|repo|project|branch|check)/private repo \2/gi" \
    +         -e "s/($names_regex)/a managed private repo/gi")
    + fi
References
- Optimize shell script pipelines by replacing 'grep | sed' combinations with a single, more efficient 'sed' command where possible to improve performance.
- In shell scripts, move the calculation of loop-invariant variables outside of loops to improve efficiency.
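The escaping step in the suggestion above is the part most likely to go wrong, so here is a minimal sketch of one way to do it. It is not the code that was merged. The helper names `_escape_ere` and `_sanitize_names` are hypothetical, and the sketch covers only two of the four replacement patterns. It matches case-sensitively, because the `i` flag on `s///` is flagged elsewhere in this review as GNU-specific. The bracket expression lists `]` first so that it is taken literally, and the escaping command uses `,` as its delimiter so that `/` can appear inside the bracket.

```shell
# Sketch only: _escape_ere and _sanitize_names are hypothetical helpers,
# not functions from issue-sync-lib.sh.

# Escape every ERE metacharacter (and "/") on each input line so the line can
# be embedded literally in a `sed -E "s/.../.../"` pattern.
_escape_ere() {
	sed -e 's,[][\.*^$+?(){}|/],\\&,g'
}

# Replace each private repo name (one per line in $2) found in the text ($1).
# Assumption: case-sensitive matching; only two replacement patterns shown.
_sanitize_names() {
	local text="$1" private_names="$2" names_regex
	# Drop blank lines, escape each name, then join the names with "|".
	names_regex=$(printf '%s\n' "$private_names" |
		sed -e '/^$/d' |
		_escape_ere |
		paste -sd'|' -)
	if [ -z "$names_regex" ]; then
		# No private names known: return the text unchanged.
		printf '%s\n' "$text"
		return 0
	fi
	# More specific pattern first, generic fallback last, in a single sed call.
	printf '%s\n' "$text" | sed -E \
		-e "s/(${names_regex})#[0-9]+/a private repo PR/g" \
		-e "s/(${names_regex})/a managed private repo/g"
}
```

With a name list containing `my.app`, the escaped pattern matches the literal name but not `myXapp`, which is the over-matching case the unescaped version would hit.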
    repo_paths=$(sqlite3 "$supervisor_db" \
        "SELECT DISTINCT repo FROM tasks WHERE repo IS NOT NULL AND repo != '';" \
        2>/dev/null || echo "")
Suppressing stderr with 2>/dev/null can hide important errors, such as a malformed database file or permission issues. The repository's general rules advise against blanket error suppression. Since the exit code is already handled by || echo "", it's safer to let stderr be visible for debugging.
Suggested change:

    - repo_paths=$(sqlite3 "$supervisor_db" \
    -     "SELECT DISTINCT repo FROM tasks WHERE repo IS NOT NULL AND repo != '';" \
    -     2>/dev/null || echo "")
    + repo_paths=$(sqlite3 "$supervisor_db" \
    +     "SELECT DISTINCT repo FROM tasks WHERE repo IS NOT NULL AND repo != '';" || echo "")
References
- Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
    while IFS= read -r repo_path; do
        [[ -z "$repo_path" ]] && continue
        local canonical
        canonical=$(realpath "$repo_path" 2>/dev/null || echo "")
Suppressing stderr with 2>/dev/null can hide errors other than 'file not found', such as permission issues on the path components. It's better to allow these errors to be visible for debugging, in line with repository guidelines.
Suggested change:

    - canonical=$(realpath "$repo_path" 2>/dev/null || echo "")
    + canonical=$(realpath "$repo_path" || echo "")
References
- Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
- When reporting errors for failed file operations in shell scripts, such as 'jq' writes, include the file path in the error message. Avoid suppressing stderr with '2>/dev/null' to ensure that diagnostic information about malformed files or write failures is visible.
    # Derive slug from git remote
    local remote_url
    remote_url=$(git -C "$canonical" remote get-url origin 2>/dev/null || echo "")
Suppressing stderr with 2>/dev/null can hide important errors from git, such as a missing remote or a repository in a bad state. The repository's general rules advise against this practice to ensure errors are visible for debugging.
Suggested change:

    - remote_url=$(git -C "$canonical" remote get-url origin 2>/dev/null || echo "")
    + remote_url=$(git -C "$canonical" remote get-url origin || echo "")
References
- When using git commands (like 'init', 'remote') in shell scripts, use the '-q' flag to suppress standard output instead of '2>/dev/null', ensuring that error messages on stderr remain visible for debugging.
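The diff excerpt stops at fetching `remote_url`; how the slug is derived from it is not shown in this conversation. As an illustration only, the following sketch turns common remote URL shapes into an `owner/repo` slug using parameter expansion. The function name `_slug_from_remote_url` and the set of accepted URL forms are assumptions, not the PR's actual logic.

```shell
# Hypothetical helper (not from the PR): derive "owner/repo" from a remote URL.
# Assumed input forms: https://host/owner/repo(.git), ssh://git@host/owner/repo,
# and the scp-like git@host:owner/repo(.git).
_slug_from_remote_url() {
	local url="$1" slug
	url="${url%/}"     # drop one trailing slash, if any
	url="${url%.git}"  # drop the optional .git suffix
	case "$url" in
	*://*/*)
		slug="${url#*://}" # strip the scheme
		slug="${slug#*/}"  # strip host (and user/port) up to the first slash
		;;
	*@*:*/*)
		slug="${url#*:}" # scp-like form: everything after the colon
		;;
	*)
		return 1
		;;
	esac
	# Require at least "owner/repo"; reject anything without a slash.
	case "$slug" in
	*/*) printf '%s\n' "$slug" ;;
	*) return 1 ;;
	esac
}
```

Returning a non-zero status for unrecognised URLs lets the caller decide whether to skip the repo or treat it as an error, rather than continuing silently with an empty slug.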
🤖 Augment PR Summary
Summary: Adds a privacy-focused sanitization layer to prevent leaking private repository names when syncing TODO-driven tasks into public issue trackers.
Technical Notes: Sanitization only applies when publishing to a public target repo (determined via …)
    # Check if repo is private (cache-friendly: gh caches auth)
    local is_private
    is_private=$(gh repo view "$slug" --json isPrivate --jq '.isPrivate' 2>/dev/null || echo "")
If gh repo view fails (missing gh, no auth, rate limiting), is_private becomes empty and the repo won’t be classified as private, so _load_private_repo_names may return an empty list and sanitization becomes a no-op (potential privacy leak). Since this is a security control, consider failing closed or at least emitting a warning when privacy cannot be determined.
Severity: high
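One way to read "failing closed" here is to treat any repo whose visibility cannot be confirmed as private, and to warn when that happens. The sketch below illustrates the idea. The function names `_repo_visibility` and `_should_scrub_name` are hypothetical; the `gh repo view ... --json isPrivate --jq '.isPrivate'` call is the one already used in the diff, with stderr left visible.

```shell
# Hypothetical helpers (not from the PR) illustrating a fail-closed check.

# Print "private", "public", or "unknown" for a repo slug.
# stderr from gh is deliberately not suppressed, so auth or rate-limit errors
# remain visible for debugging.
_repo_visibility() {
	local slug="$1" is_private
	if ! is_private=$(gh repo view "$slug" --json isPrivate --jq '.isPrivate'); then
		echo "unknown"
		return 0
	fi
	case "$is_private" in
	true) echo "private" ;;
	false) echo "public" ;;
	*) echo "unknown" ;;
	esac
}

# Succeed (exit 0) when the repo name should be scrubbed from public text.
# Fail closed: anything not confirmed public is treated as private.
_should_scrub_name() {
	local slug="$1" visibility
	visibility=$(_repo_visibility "$slug")
	if [ "$visibility" = "unknown" ]; then
		echo "WARN: could not determine visibility of ${slug}; treating it as private" >&2
	fi
	[ "$visibility" != "public" ]
}
```

The trade-off is that a `gh` outage can cause the names of public repos to be scrubbed unnecessarily, which is a cosmetic cost compared with leaking a private name.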
    [[ -z "$name" ]] && continue
    # Replace patterns: "myapp", "myapp#NNN", "in myapp", "the myapp"
    # Case-insensitive replacement using sed
    result=$(printf '%s' "$result" | sed -E "s/${name}#[0-9]+/a private repo PR/gi")
These sed -E .../gi substitutions are GNU-sed specific and can fail on BSD sed, which could end up blanking result and producing empty titles/bodies. Also ${name} is interpolated as a regex pattern, so repo names containing regex metacharacters (notably .) may over/under-match unexpectedly.
Severity: medium
Other Locations
- .agents/scripts/issue-sync-lib.sh:765
- .agents/scripts/issue-sync-lib.sh:766
- .agents/scripts/issue-sync-lib.sh:767
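The blanking risk described above (a failed `sed` leaving `result` empty) can be guarded against independently of which `sed` is installed. The sketch below wraps an arbitrary sanitizer command and refuses to return an empty result for non-empty input. The wrapper name `_sanitize_or_fail` is hypothetical and does not appear in the PR.

```shell
# Hypothetical wrapper (not from the PR): run a sanitizer command on the text
# and fail instead of returning an empty or partial result.
# Usage: _sanitize_or_fail "<text>" <command> [args...]
_sanitize_or_fail() {
	local text="$1" sanitized
	shift
	# "$@" is the sanitizer command; it reads the text on stdin.
	# The pipeline's exit status is that of the sanitizer command.
	if ! sanitized=$(printf '%s\n' "$text" | "$@"); then
		echo "ERROR: sanitizer command failed; refusing to publish" >&2
		return 1
	fi
	# A sanitizer that swallows non-empty input is treated as a failure too.
	if [ -n "$text" ] && [ -z "$sanitized" ]; then
		echo "ERROR: sanitizer produced empty output; refusing to publish" >&2
		return 1
	fi
	printf '%s\n' "$sanitized"
}
```

A caller could then write `title=$(_sanitize_or_fail "$title" sed -e 's/myapp/a managed private repo/g') || return 1`, so an unsupported flag or a bad pattern stops the sync instead of publishing an empty title.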
The code-review-monitoring workflow installed @toon-format/cli via Bun but never used it. The npm registry returns 403 in CI, failing the 'Monitor & Auto-Fix Code Quality' check. Remove the Bun setup step and TOON CLI install since neither is used by the monitoring scripts.
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Wed Feb 25 18:26:56 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring



Summary
- Add a sanitization layer (`_sanitize_for_public_repo()`) to `issue-sync-lib.sh` that detects private repos from the supervisor DB and strips their names from issue titles/bodies before publishing to public repos
- Add a cross-repo privacy rule to `build.txt` and `AGENTS.md` as a preventive measure
- Replace hardcoded private repo names in `pulse.md`, `runners.md`, and `AGENTS.md` examples with generic placeholders

Root Cause
`issue-sync-helper.sh` synced TODO.md task descriptions verbatim to public GitHub issues. The supervisor also posted comments referencing private repo names and PR numbers. No sanitization layer existed between cross-repo task data and public issue creation.

Changes
| File | Change |
|---|---|
| `.agents/scripts/issue-sync-lib.sh` | `_sanitize_for_public_repo()` + `_load_private_repo_names()` functions |
| `.agents/scripts/issue-sync-helper.sh` | Sanitize titles in `cmd_push()` and `cmd_enrich()`, pass `repo_slug` to `compose_issue_body()` |
| `.agents/prompts/build.txt` | … |
| `.agents/AGENTS.md` | … |
| `.agents/scripts/commands/pulse.md` | … |
| `.agents/scripts/commands/runners.md` | … |
| `TODO.md` | … |

Closes #2281