t2846: Add secret-leaking prevention guardrails for agent conversations #2847
alex-solovyev merged 3 commits into main
…s (t2846)
Add comprehensive rules to prompts/build.txt preventing agents from suggesting or running commands that expose secret values in conversation transcripts. Replaces the previous single-line blocklist with:
- Principle-based rule (not just a command blocklist) with common violations
- Safe alternatives for debugging env var issues (key names only)
- Pre-staging guidance for credential lookups in user's terminal
- Credential-paste detection with rotation warning
Closes #2846
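The credential-paste detection item could be approximated in shell roughly as follows. This is only a sketch under stated assumptions: the function name `looks_like_credential` and the three patterns (AWS access key ID, GitHub personal access token, PEM private key header) are illustrative choices, not the rule text added to prompts/build.txt.

```shell
# Hypothetical helper: return 0 if the text on stdin resembles a pasted credential.
# The patterns below are illustrative assumptions, not the rules from build.txt.
looks_like_credential() {
  grep -qE \
    -e 'AKIA[0-9A-Z]{16}' \
    -e 'ghp_[A-Za-z0-9]{36}' \
    -e '-----BEGIN [A-Z ]*PRIVATE KEY-----'
}

# Emit a rotation warning without ever echoing the matched value back.
warn_if_credential() {
  if looks_like_credential; then
    echo "Warning: this looks like a credential. Treat it as compromised and rotate it."
  fi
}
```

For example, `printf 'key: AKIAIOSFODNN7EXAMPLE\n' | warn_if_credential` would print the warning, while ordinary text prints nothing. The warning deliberately omits the matched text so the value is not repeated in the transcript.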
Walkthrough
This PR enhances security guidance in the agent prompt configuration by replacing a single-line prohibition on secret-exposing commands with a detailed security policy block. The expansion covers threat modeling, root causes, prohibited command patterns, safer debugging practices, and credential handling protocols.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the security posture of agent conversations by implementing robust guardrails against accidental secret exposure. It shifts from a limited blocklist to a comprehensive, principle-based strategy, ensuring that sensitive information like credentials remains protected within the conversation context. The changes aim to prevent incidents where agents might inadvertently reveal secrets, thereby safeguarding user data and system integrity.
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Wed Mar 4 20:19:26 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Code Review
This pull request significantly enhances the security guardrails for the agent by providing detailed, principle-based rules to prevent secret leakage in conversations. The new instructions are comprehensive, covering a wide range of commands and scenarios, and providing safe alternatives. I've added a couple of suggestions to improve the robustness and consistency of the example commands, which should help the agent interpret them more reliably.
Note: Security Review has been skipped due to the limited scope of the PR.
.agents/prompts/build.txt
Outdated
- `heroku config`, `vercel env pull`, `fly secrets list` (with values)
- Any `grep` or `rg` command targeting files known to contain secrets
- When debugging env var issues, show key NAMES only, never values:
- SAFE: `pm2 show <app> | grep -oP '^\s+\K[A-Z_]+(?=\s)'` (key names only)
The provided grep command for pm2 relies on parsing the text output, which can be fragile if the pm2 version or configuration changes the output format. A more robust method would be to use pm2's JSON output feature and parse it with jq, which appears to be a tool used in this project. This avoids reliance on text formatting and makes the safe alternative more reliable.
- SAFE: `pm2 show <app> --json | jq -r '.[0].pm2_env | keys_unsorted[]'` (key names only, robust)
.agents/prompts/build.txt
Outdated
- SAFE: `pm2 show <app> | grep -oP '^\s+\K[A-Z_]+(?=\s)'` (key names only)
- SAFE: `printenv | cut -d= -f1 | sort` (list env var names without values)
- SAFE: `grep -oP '^[A-Z_]+(?==)' .env` (key names from .env without values)
- SAFE: `docker inspect <c> --format '{{range .Config.Env}}{{println .}}{{end}}' | cut -d= -f1`
For consistency with the unsafe command example on line 183 (docker inspect <container>), it would be clearer to use <container> as the placeholder here instead of <c>. Consistent placeholders help the agent generalize better from examples.
- SAFE: `docker inspect <container> --format '{{range .Config.Env}}{{println .}}{{end}}' | cut -d= -f1`
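The name-only idea behind these SAFE lines can be tried on a throwaway file. The sketch below is an illustration rather than part of the PR: the temporary `.env` file and its keys are made up, and it uses the `cut -d= -f1` approach from the examples because it does not depend on `grep -P` support.

```shell
#!/usr/bin/env bash
# Hypothetical demo file; key names and values are invented for illustration.
tmpdir=$(mktemp -d)
cat > "$tmpdir/.env" <<'EOF'
DB_PASSWORD=not-a-real-secret
API_KEY=also-not-real
EOF

# Keep only the text before the first '=' on each line, so values never
# reach stdout (and therefore never reach the conversation transcript).
env_names=$(cut -d= -f1 "$tmpdir/.env" | sort)
printf '%s\n' "$env_names"

# Clean up the demo file.
rm -rf "$tmpdir"
```

The same `cut -d= -f1` stage works after `printenv` or after the `docker inspect ... --format` template, since both emit `NAME=value` lines.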
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/prompts/build.txt:
- Around line 188-193: The policy text has a contradiction: the rule "Any `grep`
or `rg` command targeting files known to contain secrets" is absolute while the
SAFE examples (e.g., `grep -oP '^[A-Z_]+(?==)' .env`, `printenv | cut -d= -f1 |
sort`, and the `pm2` example) explicitly allow name-only inspections; update the
wording to forbid any grep/rg that can expose secret VALUES but permit name-only
inspections using patterns or pipelines that explicitly strip values (mention
the safe patterns shown: `-oP` with a regex capturing only keys or piping to
`cut -d= -f1`), and revise the unsafe rule to read something like "Disallow
grep/rg that may display secret values; allow grep/rg only when using patterns
or processing steps that guarantee values are not printed (see `grep -oP
'^[A-Z_]+(?==)' .env`, `printenv | cut -d= -f1`)." Ensure the SAFE example lines
remain and the unsafe line is replaced with the clarified prohibition text.
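One way to read "processing steps that guarantee values are not printed" is as a check on a pipeline's output. The following is a minimal sketch of such a check, assuming a hypothetical helper named `names_only` that is not part of the PR or of build.txt.

```shell
# Hypothetical guard: succeed only if every line on stdin is a bare key name,
# i.e. contains no '=' or other characters that could carry a value.
names_only() {
  # grep -v selects lines that are NOT a plain identifier; -q reports whether
  # any such line exists. Negating the status makes "no offending lines" succeed.
  ! grep -qvE '^[A-Za-z_][A-Za-z0-9_]*$'
}
```

For instance, `printenv | cut -d= -f1 | names_only` would be expected to succeed when every variable name is a plain identifier, whereas `printf 'API_KEY=abc\n' | names_only` fails because the line still carries a value.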
…(t2846)
- Replace fragile pm2 grep text-parsing with pm2 JSON output + jq
- Use consistent <container> placeholder matching unsafe example on line 183
- Resolve policy contradiction: revise absolute grep/rg ban to forbid printing secret VALUES while explicitly allowing key-name-only patterns
🔍 Code Quality Report
Generated on: Wed Mar 4 20:28:15 UTC 2026
Revise the grep/rg prohibition to use explicit allow/disallow language instead of a parenthetical exception. The rule now reads: disallow commands that may display secret values, allow grep/rg only when using patterns or processing steps that guarantee values are not printed. SAFE examples preserved unchanged.
🔍 Code Quality Report
Generated on: Wed Mar 4 20:34:20 UTC 2026
Summary
Add comprehensive rules to `prompts/build.txt` preventing agents from suggesting or running commands that expose secret values in conversation transcripts.
Why
During an ILDS session, the agent repeatedly suggested commands (`gopass show`, `pm2 env`, `cat dump.pm2`) that would expose secret values in the conversation transcript. Credentials were exposed and had to be rotated. The existing rule (line 168) only listed 3 specific commands; this expands to a principle-based approach that covers the full attack surface.
Closes #2846
Summary by CodeRabbit