fix: scrub private repo names from public issue tracker and add automated sanitization#2303

Merged
marcusquinn merged 2 commits into main from bugfix/private-repo-name-sanitization
Feb 25, 2026

@marcusquinn (Owner) commented Feb 25, 2026

Summary

  • Adds automated sanitization layer (_sanitize_for_public_repo()) to issue-sync-lib.sh that detects private repos from the supervisor DB and strips their names from issue titles/bodies before publishing to public repos
  • Scrubs all existing private repo name references from 21 issue bodies and 45 comments on GitHub
  • Adds cross-repo privacy rule to build.txt and AGENTS.md as a preventive measure
  • Replaces hardcoded private repo names in pulse.md, runners.md, and AGENTS.md examples with generic placeholders

Root Cause

issue-sync-helper.sh synced TODO.md task descriptions verbatim to public GitHub issues. The supervisor also posted comments referencing private repo names and PR numbers. No sanitization layer existed between cross-repo task data and public issue creation.

Changes

| File | Change |
| --- | --- |
| .agents/scripts/issue-sync-lib.sh | New _sanitize_for_public_repo() + _load_private_repo_names() functions |
| .agents/scripts/issue-sync-helper.sh | Sanitize titles in cmd_push() and cmd_enrich(), pass repo_slug to compose_issue_body() |
| .agents/prompts/build.txt | New security rule: never include private repo names in public issues |
| .agents/AGENTS.md | Cross-repo privacy rule + genericized dispatch examples |
| .agents/scripts/commands/pulse.md | Removed hardcoded private repo names, added privacy rule |
| .agents/scripts/commands/runners.md | Removed hardcoded private repo names |
| TODO.md | Sanitized t1333/t1334 entries, resolved merge conflicts |

Closes #2281

Summary by CodeRabbit

  • New Features

    • Added automatic sanitization of private repository names when syncing issues to public repositories, protecting sensitive information.
  • Documentation

    • Updated example commands and guidance to use generic repository path patterns instead of hardcoded references.
    • Enhanced security guidelines for handling repository information in public issue content.
  • Improvements

    • Generalized repository handling across tools for broader compatibility.
  • Chores

    • Removed Bun setup from code review workflows.
    • Updated workflow summary messaging.

…ated sanitization

Root cause: issue-sync-helper.sh synced TODO.md task descriptions verbatim
to public GitHub issues, and the supervisor posted comments referencing
private repo names and PR numbers. No sanitization layer existed between
cross-repo task data and public issue creation.

Fix:
- Add _sanitize_for_public_repo() to issue-sync-lib.sh that auto-detects
  private repos from the supervisor DB and strips their names from issue
  titles and bodies before publishing to public repos
- Sanitize titles in both cmd_push() and cmd_enrich()
- Pass repo_slug to compose_issue_body() for context-aware sanitization
- Add cross-repo privacy rule to build.txt and AGENTS.md
- Replace all hardcoded private repo names in pulse.md, runners.md,
  AGENTS.md examples with generic placeholders
- Sanitize t1333/t1334 TODO.md entries and resolve merge conflicts
- Scrub 21 existing issue bodies and 45 comments on GitHub via gh API

Closes #2281

@coderabbitai bot (Contributor) commented Feb 25, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1ebfac and d6318e9.

📒 Files selected for processing (9)
  • .agents/AGENTS.md
  • .agents/prompts/build.txt
  • .agents/scripts/commands/pulse.md
  • .agents/scripts/commands/runners.md
  • .agents/scripts/issue-sync-helper.sh
  • .agents/scripts/issue-sync-lib.sh
  • .agents/scripts/supervisor-archived/ai-context.sh
  • .github/workflows/code-review-monitoring.yml
  • TODO.md

Walkthrough

This pull request implements a privacy-focused sanitization layer for cross-repository issue syncing, preventing private repository names from appearing in public issue trackers. Documentation is updated to generalize repository references and enforce security guidelines.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Security Implementation (.agents/scripts/issue-sync-lib.sh, .agents/scripts/issue-sync-helper.sh) | Introduces private repo name sanitization when syncing to public trackers. Adds _load_private_repo_names() to cache private repos from supervisor.db, _sanitize_for_public_repo() to replace private names with generic references, and updates compose_issue_body() signature to accept repo_slug parameter for conditional sanitization. |
| Documentation & Guidelines (.agents/AGENTS.md, .agents/prompts/build.txt, .agents/scripts/commands/pulse.md, .agents/scripts/commands/runners.md, .agents/scripts/supervisor-archived/ai-context.sh) | Updates example commands and guidance text to generalize hardcoded repo names (webapp, aidevops) to dynamic placeholders like ~/Git/<repo-name>. Adds security notes discouraging private repo names in public issue content; changes tie-break logic from repo-specific to product-vs-tooling preference. |
| Workflow Configuration (.github/workflows/code-review-monitoring.yml) | Removes Bun setup and installation steps; updates quality status summary from "TOON Format Integration" to "TOON Format Validation." |
| Project Planning (TODO.md) | Contains unresolved merge conflict markers (<<<<<<< Updated upstream, =======, >>>>>>> Stashed changes) requiring manual reconciliation. Includes task reorganization and new CI/quality/self-improvement entries (t1333, t1334). |

Sequence Diagram

sequenceDiagram
    participant IssueSync as issue-sync-helper.sh
    participant IssueSyncLib as issue-sync-lib.sh
    participant PrivateCache as _load_private_repo_names()
    participant Sanitizer as _sanitize_for_public_repo()
    participant PublicRepo as Public Issue Tracker

    IssueSync->>IssueSyncLib: compose_issue_body(task_id, project_root, repo_slug)
    IssueSyncLib->>IssueSyncLib: Build issue body from TODO.md
    
    alt Is public repo?
        IssueSyncLib->>PrivateCache: Load private repo names from supervisor.db
        PrivateCache->>PrivateCache: Query GitHub API for isPrivate status
        PrivateCache-->>IssueSyncLib: Return _PRIVATE_REPO_NAMES_CACHE
        
        IssueSyncLib->>Sanitizer: _sanitize_for_public_repo(body_text)
        Sanitizer->>Sanitizer: Replace private repo names with generic references
        Sanitizer-->>IssueSyncLib: Sanitized body
    end
    
    IssueSyncLib-->>IssueSync: Sanitized issue body
    IssueSync->>IssueSync: Apply title sanitization
    IssueSync->>PublicRepo: Post sanitized issue
    PublicRepo-->>IssueSync: Success
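
The flow in the diagram can be approximated in a few lines of Bash. This is only a sketch under stated assumptions: `publish_body` and `scrub_names` are hypothetical stand-ins for `compose_issue_body()` and `_sanitize_for_public_repo()`, the target's visibility is passed in as a plain string rather than looked up via `gh`, and matching is literal and case-sensitive, unlike the sed-based code reviewed in this PR.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for _sanitize_for_public_repo(): replace each private
# repo name (one per line in $2) with a generic phrase. Quoting "$name" inside
# the expansion makes the match literal, so regex metacharacters are harmless.
scrub_names() {
  local text="$1" names="$2" name
  while IFS= read -r name; do
    [[ -z "$name" ]] && continue
    text="${text//"$name"/a managed private repo}"
  done <<<"$names"
  printf '%s\n' "$text"
}

# Hypothetical stand-in for the publish path: scrub only when the target
# tracker is public; private targets receive the text unchanged.
publish_body() {
  local body="$1" target_visibility="$2" private_names="$3"
  if [[ "$target_visibility" == "public" ]]; then
    body=$(scrub_names "$body" "$private_names")
  fi
  printf '%s\n' "$body"
}
```

For example, `publish_body "Fix CI in myapp" public "myapp"` would go through the scrubber, while the same call with `private` as the second argument would pass the text through untouched.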

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested labels

needs-review

Poem

🔐 Private names fade to whispered "managed repos,"
GitHub's public walls now guard what nobody knows,
Sanitize the stream before it flows,
Secrets tucked where the issue tracker glows! ✨


@gemini-code-assist

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical security vulnerability where private repository names were inadvertently exposed in public GitHub issues and comments. The primary goal is to prevent information leakage by introducing an automated sanitization layer that scrubs private repository names from public-facing content. This change also updates documentation and examples to reinforce privacy best practices, ensuring that cross-repo task data can be safely synchronized without revealing sensitive project details.

Highlights

  • Automated Sanitization Layer: Implemented a new _sanitize_for_public_repo() function in issue-sync-lib.sh to automatically detect private repository names from the supervisor database and strip them from issue titles and bodies before publishing to public repositories.
  • Existing Data Scrubbing: Manually scrubbed all existing private repository name references from 21 issue bodies and 45 comments on GitHub to ensure historical data compliance.
  • Cross-Repo Privacy Rules: Added explicit cross-repo privacy rules to build.txt and AGENTS.md as a preventive measure, instructing users to avoid including private repository names in public-facing content.
  • Generic Placeholders in Examples: Replaced hardcoded private repository names in pulse.md, runners.md, and AGENTS.md examples with generic placeholders to prevent accidental leakage and promote best practices.
Changelog
  • .agents/AGENTS.md
    • Updated example commands to use generic project names.
    • Added a new section on cross-repo privacy rules.
  • .agents/prompts/build.txt
    • Added a new security rule regarding private repository names.
  • .agents/scripts/commands/pulse.md
    • Removed hardcoded private repository names from example gh commands.
    • Generalized repository preference rules.
    • Updated dispatch rules to use generic repository names.
    • Added a new privacy rule.
  • .agents/scripts/commands/runners.md
    • Updated dispatch rules to use generic repository names.
    • Generalized GitHub issue URL examples.
  • .agents/scripts/issue-sync-helper.sh
    • Integrated _sanitize_for_public_repo for issue titles in cmd_push and cmd_enrich.
    • Updated compose_issue_body calls to pass the target repo_slug for sanitization.
  • .agents/scripts/issue-sync-lib.sh
    • Introduced _load_private_repo_names to dynamically identify private repositories.
    • Implemented _sanitize_for_public_repo to replace private repo names with generic placeholders.
    • Modified compose_issue_body to utilize the new sanitization logic.
  • .agents/scripts/supervisor-archived/ai-context.sh
    • Updated a comment for improved generalization.
  • TODO.md
    • Sanitized task descriptions t1333 and t1334 to remove private repository names and specific PR references.
Activity
  • No human activity has been recorded on this pull request yet.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an important security feature by adding an automated sanitization layer to prevent private repository names from leaking into public issue trackers. The changes are well-structured, updating documentation and prompts alongside the core script logic. My review focuses on improving the robustness and efficiency of the new sanitization functions in issue-sync-lib.sh. Specifically, I've suggested removing blanket error suppression in favor of more transparent error handling, which aligns with repository guidelines, and optimizing the text replacement logic for better performance and correctness.


# Check if repo is private (cache-friendly: gh caches auth)
local is_private
is_private=$(gh repo view "$slug" --json isPrivate --jq '.isPrivate' 2>/dev/null || echo "")

high

Suppressing stderr with 2>/dev/null on gh commands is risky as it can hide critical errors like authentication failures, API rate limits, or an invalid repository slug. These errors are important for debugging. This violates the repository rule against suppressing errors for commands that may have authentication issues.

Suggested change
- is_private=$(gh repo view "$slug" --json isPrivate --jq '.isPrivate' 2>/dev/null || echo "")
+ is_private=$(gh repo view "$slug" --json isPrivate --jq '.isPrivate' || echo "")
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
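
One way to keep diagnostics without letting stderr spill into the terminal on every call is to redirect it to a log file instead of `/dev/null`. The following is a minimal sketch, not part of the PR: the `SYNC_LOG` path and the `capture_or_empty` helper are hypothetical names introduced only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical log location; override SYNC_LOG to change it.
SYNC_LOG="${SYNC_LOG:-${TMPDIR:-/tmp}/issue-sync.log}"

# Run "$@", print its stdout on success, and append its stderr to the log.
# On failure, print nothing on stdout and point at the log on stderr.
capture_or_empty() {
  local out
  if out=$("$@" 2>>"$SYNC_LOG"); then
    printf '%s\n' "$out"
  else
    printf 'warning: "%s" failed; details in %s\n' "$*" "$SYNC_LOG" >&2
  fi
}
```

With such a helper, the reviewed line could be written as `is_private=$(capture_or_empty gh repo view "$slug" --json isPrivate --jq '.isPrivate')`, which keeps the empty-string fallback while preserving the error text for later debugging.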

# Only sanitize when target repo is public
if [[ -n "$target_repo_slug" ]]; then
local is_private
is_private=$(gh repo view "$target_repo_slug" --json isPrivate --jq '.isPrivate' 2>/dev/null || echo "")

high

Suppressing stderr with 2>/dev/null on gh commands is risky as it can hide critical errors like authentication failures, API rate limits, or an invalid repository slug. These errors are important for debugging. This violates the repository rule against suppressing errors for commands that may have authentication issues.

Suggested change
- is_private=$(gh repo view "$target_repo_slug" --json isPrivate --jq '.isPrivate' 2>/dev/null || echo "")
+ is_private=$(gh repo view "$target_repo_slug" --json isPrivate --jq '.isPrivate' || echo "")
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

Comment on lines +759 to +768
local result="$text"
while IFS= read -r name; do
[[ -z "$name" ]] && continue
# Replace patterns: "myapp", "myapp#NNN", "in myapp", "the myapp"
# Case-insensitive replacement using sed
result=$(printf '%s' "$result" | sed -E "s/${name}#[0-9]+/a private repo PR/gi")
result=$(printf '%s' "$result" | sed -E "s/(in|the|from|of|for) ${name}/\1 a managed private repo/gi")
result=$(printf '%s' "$result" | sed -E "s/${name} (CI|PR|pipeline|repo|project|branch|check)/private repo \1/gi")
result=$(printf '%s' "$result" | sed -E "s/${name}/a managed private repo/gi")
done <<<"$private_names"

high

This loop executes sed four times for every private repository name. This can be inefficient if there are many private repos. More importantly, injecting the raw $name variable into the sed expression is unsafe and can lead to errors if a repository name contains characters that are special to sed's regex engine (e.g., /, ., *). A more robust and efficient approach is to build a single, properly escaped regex from all names and run sed only once.

Suggested change
- local result="$text"
- while IFS= read -r name; do
- [[ -z "$name" ]] && continue
- # Replace patterns: "myapp", "myapp#NNN", "in myapp", "the myapp"
- # Case-insensitive replacement using sed
- result=$(printf '%s' "$result" | sed -E "s/${name}#[0-9]+/a private repo PR/gi")
- result=$(printf '%s' "$result" | sed -E "s/(in|the|from|of|for) ${name}/\1 a managed private repo/gi")
- result=$(printf '%s' "$result" | sed -E "s/${name} (CI|PR|pipeline|repo|project|branch|check)/private repo \1/gi")
- result=$(printf '%s' "$result" | sed -E "s/${name}/a managed private repo/gi")
- done <<<"$private_names"
+ local result="$text"
+ if [[ -n "$private_names" ]]; then
+ local names_regex
+ # Escape regex special characters and join with |
+ names_regex=$(echo "$private_names" | sed -e 's/[\[\]\\\/.*^$]/\\&/g' | paste -sd'|')
+ # Combine all replacements into a single, more efficient sed call.
+ # The order of expressions is important to handle more specific cases first.
+ result=$(printf '%s' "$result" | sed -E \
+ -e "s/($names_regex)#[0-9]+/a private repo PR/gi" \
+ -e "s/(in|the|from|of|for) ($names_regex)/\1 a managed private repo/gi" \
+ -e "s/($names_regex) (CI|PR|pipeline|repo|project|branch|check)/private repo \2/gi" \
+ -e "s/($names_regex)/a managed private repo/gi")
+ fi
References
  1. Optimize shell script pipelines by replacing 'grep | sed' combinations with a single, more efficient 'sed' command where possible to improve performance.
  2. In shell scripts, move the calculation of loop-invariant variables outside of loops to improve efficiency.
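
The escape-then-join idea can be sketched as follows. Two caveats about the suggested change itself seem worth noting: in a POSIX bracket expression a backslash is literal, so `[\[\]` closes at the first `]` and the escaping step may not match what it intends, and `paste -sd'|'` without a file operand is not accepted by every `paste` implementation. The sketch below is an illustration under assumptions, not the PR's code: `escape_ere`, `build_names_regex`, and `sanitize_once` are hypothetical names, and matching is case-sensitive because the `I` flag used in the original is a GNU sed extension.

```bash
#!/usr/bin/env bash
# Escape every ERE metacharacter, plus the "/" delimiter used further down.
# "]" comes first inside the bracket expression so that it is taken literally.
escape_ere() {
  printf '%s' "$1" | sed -e 's@[][\.|$(){}?+*^/]@\\&@g'
}

# Join the escaped names (one per line in $1) into a single alternation.
build_names_regex() {
  local names="$1" name pattern=""
  while IFS= read -r name; do
    [[ -z "$name" ]] && continue
    pattern+="${pattern:+|}$(escape_ere "$name")"
  done <<<"$names"
  printf '%s' "$pattern"
}

# Apply all four substitutions in one sed invocation, most specific first.
sanitize_once() {
  local text="$1" regex
  regex=$(build_names_regex "$2")
  if [[ -z "$regex" ]]; then
    printf '%s\n' "$text"
    return 0
  fi
  printf '%s\n' "$text" | sed -E \
    -e "s/(${regex})#[0-9]+/a private repo PR/g" \
    -e "s/(in|the|from|of|for) (${regex})/\1 a managed private repo/g" \
    -e "s/(${regex}) (CI|PR|pipeline|repo|project|branch|check)/private repo \2/g" \
    -e "s/(${regex})/a managed private repo/g"
}
```

Because `.` is escaped, a name such as `my.app` no longer matches unrelated text like `myXapp`, which is the over-matching risk raised in the review.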

Comment on lines +690 to +692
repo_paths=$(sqlite3 "$supervisor_db" \
"SELECT DISTINCT repo FROM tasks WHERE repo IS NOT NULL AND repo != '';" \
2>/dev/null || echo "")

medium

Suppressing stderr with 2>/dev/null can hide important errors, such as a malformed database file or permission issues. The repository's general rules advise against blanket error suppression. Since the exit code is already handled by || echo "", it's safer to let stderr be visible for debugging.

Suggested change
- repo_paths=$(sqlite3 "$supervisor_db" \
- "SELECT DISTINCT repo FROM tasks WHERE repo IS NOT NULL AND repo != '';" \
- 2>/dev/null || echo "")
+ repo_paths=$(sqlite3 "$supervisor_db" \
+ "SELECT DISTINCT repo FROM tasks WHERE repo IS NOT NULL AND repo != '';" || echo "")
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

while IFS= read -r repo_path; do
[[ -z "$repo_path" ]] && continue
local canonical
canonical=$(realpath "$repo_path" 2>/dev/null || echo "")

medium

Suppressing stderr with 2>/dev/null can hide errors other than 'file not found', such as permission issues on the path components. It's better to allow these errors to be visible for debugging, in line with repository guidelines.

Suggested change
- canonical=$(realpath "$repo_path" 2>/dev/null || echo "")
+ canonical=$(realpath "$repo_path" || echo "")
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
  2. When reporting errors for failed file operations in shell scripts, such as 'jq' writes, include the file path in the error message. Avoid suppressing stderr with '2>/dev/null' to ensure that diagnostic information about malformed files or write failures is visible.


# Derive slug from git remote
local remote_url
remote_url=$(git -C "$canonical" remote get-url origin 2>/dev/null || echo "")

medium

Suppressing stderr with 2>/dev/null can hide important errors from git, such as a missing remote or a repository in a bad state. The repository's general rules advise against this practice to ensure errors are visible for debugging.

Suggested change
- remote_url=$(git -C "$canonical" remote get-url origin 2>/dev/null || echo "")
+ remote_url=$(git -C "$canonical" remote get-url origin || echo "")
References
  1. When using git commands (like 'init', 'remote') in shell scripts, use the '-q' flag to suppress standard output instead of '2>/dev/null', ensuring that error messages on stderr remain visible for debugging.

@augmentcode bot commented Feb 25, 2026

🤖 Augment PR Summary

Summary: Adds a privacy-focused sanitization layer to prevent leaking private repository names when syncing TODO-driven tasks into public issue trackers.

Changes:

  • Introduced _sanitize_for_public_repo() and _load_private_repo_names() in .agents/scripts/issue-sync-lib.sh to detect private repos from the supervisor DB and scrub their names from outbound content.
  • Updated issue-sync-helper.sh to sanitize issue titles in cmd_push()/cmd_enrich() and to pass repo_slug into compose_issue_body() for body sanitization.
  • Added explicit cross-repo privacy rules to agent prompts/docs (.agents/prompts/build.txt, .agents/AGENTS.md, and command docs) to prevent leaks at the source.
  • Genericized examples that previously referenced specific private repo names and cleaned up TODO entries that contained private identifiers/merge markers.

Technical Notes: Sanitization only applies when publishing to a public target repo (determined via gh repo view) and uses a cached list of private repo short-names derived from supervisor DB repo paths.
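
How a remote URL becomes a slug and then a short name is not shown in this thread. As a rough illustration only, assuming GitHub remotes in either SSH or HTTPS form, the derivation could look like the sketch below; `slug_from_remote` and `short_name` are hypothetical helpers, not functions from the PR.

```bash
#!/usr/bin/env bash
# Derive "owner/repo" from a GitHub remote URL (SSH or HTTPS form).
# Returns non-zero when the URL does not look like a GitHub remote.
slug_from_remote() {
  local url="${1%.git}" # drop a trailing ".git" if present
  if [[ "$url" =~ github\.com[:/]([^/]+)/([^/]+)$ ]]; then
    printf '%s/%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

# Reduce "owner/repo" to the short name that the sanitizer would look for.
short_name() {
  printf '%s\n' "${1##*/}"
}
```

The input would come from the command quoted in the review, `git -C "$canonical" remote get-url origin`, and the resulting slug is what `gh repo view` expects.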

@augmentcode augmentcode bot left a comment

Review completed. 2 suggestions posted.


# Check if repo is private (cache-friendly: gh caches auth)
local is_private
is_private=$(gh repo view "$slug" --json isPrivate --jq '.isPrivate' 2>/dev/null || echo "")

If gh repo view fails (missing gh, no auth, rate limiting), is_private becomes empty and the repo won’t be classified as private, so _load_private_repo_names may return an empty list and sanitization becomes a no-op (potential privacy leak). Since this is a security control, consider failing closed or at least emitting a warning when privacy cannot be determined.

Severity: high
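
A fail-closed variant of the visibility check might look like the sketch below. It is an illustration under assumptions rather than the PR's implementation: `treat_as_private` is a hypothetical helper that takes the lookup command as an argument, and `gh_is_private` simply wraps the `gh repo view` call quoted above.

```bash
#!/usr/bin/env bash
# Lookup taken from the reviewed code; prints "true" or "false".
gh_is_private() {
  gh repo view "$1" --json isPrivate --jq '.isPrivate'
}

# Returns 0 ("treat as private") unless the lookup positively says "false".
# A failed lookup emits a warning and is treated as private.
# Usage: treat_as_private <slug> <lookup-command...>
treat_as_private() {
  local slug="$1" answer
  shift
  if ! answer=$("$@" "$slug"); then
    printf 'warning: cannot determine visibility of %s; treating as private\n' "$slug" >&2
    return 0
  fi
  [[ "$answer" != "false" ]]
}
```

Called as `treat_as_private "$slug" gh_is_private`, a missing `gh`, an auth failure, or a rate limit would keep the repo name on the scrub list instead of silently dropping it.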

[[ -z "$name" ]] && continue
# Replace patterns: "myapp", "myapp#NNN", "in myapp", "the myapp"
# Case-insensitive replacement using sed
result=$(printf '%s' "$result" | sed -E "s/${name}#[0-9]+/a private repo PR/gi")

These sed -E .../gi substitutions are GNU-sed specific and can fail on BSD sed, which could end up blanking result and producing empty titles/bodies. Also ${name} is interpolated as a regex pattern, so repo names containing regex metacharacters (notably .) may over/under-match unexpectedly.

Severity: medium

Other Locations
  • .agents/scripts/issue-sync-lib.sh:765
  • .agents/scripts/issue-sync-lib.sh:766
  • .agents/scripts/issue-sync-lib.sh:767
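
Both concerns (the GNU-only `I` flag and regex interpolation) could be sidestepped with a literal, case-insensitive replacement in POSIX awk. This is a sketch, not the PR's code: `replace_literal_ci` is a hypothetical helper, and it covers only the plain-name substitution, not the `#NNN` or keyword-aware patterns.

```bash
#!/usr/bin/env bash
# Replace every case-insensitive, literal occurrence of <name> in stdin.
# index() does plain substring search, so "." and other metacharacters in the
# name are never interpreted as a pattern. Note that awk -v processes
# backslash escapes, so names containing backslashes would need extra care.
# Usage: replace_literal_ci <name> <replacement> < input
replace_literal_ci() {
  awk -v needle="$1" -v rep="$2" '
    BEGIN { low_needle = tolower(needle); len = length(needle) }
    {
      rest = $0; low = tolower(rest); out = ""
      while (len > 0 && (pos = index(low, low_needle)) > 0) {
        out = out substr(rest, 1, pos - 1) rep
        rest = substr(rest, pos + len)
        low = substr(low, pos + len)
      }
      print out rest
    }'
}
```

A call such as `printf '%s' "$result" | replace_literal_ci "$name" "a managed private repo"` should behave the same under GNU and BSD userlands, since it relies only on `tolower`, `index`, and `substr`.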

The code-review-monitoring workflow installed @toon-format/cli via Bun
but never used it. The npm registry returns 403 in CI, failing the
'Monitor & Auto-Fix Code Quality' check. Remove the Bun setup step
and TOON CLI install since neither is used by the monitoring scripts.

@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 90 code smells

[INFO] Recent monitoring activity:
Wed Feb 25 18:26:53 UTC 2026: Code review monitoring started
Wed Feb 25 18:26:54 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 90

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 90
  • VULNERABILITIES: 0

Generated on: Wed Feb 25 18:26:56 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Development

Successfully merging this pull request may close these issues.

t1333: Investigate cross-repo CI failures on merged PRs
