t2846: Add secret-leaking prevention guardrails for agent conversations #2847

Merged
alex-solovyev merged 3 commits into main from feature/secret-leaking-guardrails on Mar 4, 2026

alex-solovyev (Collaborator) commented Mar 4, 2026

Summary

  • Adds comprehensive rules to prompts/build.txt preventing agents from suggesting or running commands that expose secret values in conversation transcripts
  • Replaces the previous single-line command blocklist with a principle-based approach covering password managers, env dumps, container inspection, cloud CLI tools, and scripting one-liners
  • Adds safe alternatives for debugging env var issues (key names only, never values)
  • Adds pre-staging guidance so credential lookups happen in the user's terminal, not in conversation
  • Adds credential-paste detection: when a user pastes a credential value, the agent warns about compromise and suggests rotation
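
The "key names only, never values" alternative can be illustrated with a minimal sketch. The helper name `env_key_names` and the sample file contents are hypothetical and do not come from prompts/build.txt; the sketch assumes only a dotenv-style `KEY=value` file and the standard `grep`, `cut`, and `sort` utilities.

```shell
# Hypothetical helper (not part of this PR): print only the variable NAMES
# from a dotenv-style file. Values never reach stdout because cut keeps
# nothing after the first '='.
env_key_names() {
  # Keep KEY=value lines (comments and blank lines are skipped), drop values.
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | cut -d= -f1 | sort
}

# Demo against a throwaway file that holds fake values only.
demo_env=$(mktemp)
printf '%s\n' '# sample file' 'DB_PASSWORD=not-a-real-secret' 'API_KEY=also-fake' > "$demo_env"
env_key_names "$demo_env"
rm -f "$demo_env"
```

Run as written, the demo prints `API_KEY` and `DB_PASSWORD` on separate lines, which is enough to confirm that a variable is defined without putting its value into the transcript.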

Why

During an ILDS session, the agent repeatedly suggested commands (gopass show, pm2 env, cat dump.pm2) that would expose secret values in the conversation transcript. Credentials were exposed and had to be rotated. The existing rule (line 168) listed only 3 specific commands; this PR replaces it with a principle-based rule intended to cover the broader class of commands that can print secret values.

Design decisions

  • Principle over blocklist: The rule says "apply judgment to ANY command that could print credentials" rather than relying on an exhaustive list. This follows the "Intelligence Over Determinism" framework principle — the examples teach the pattern, but the agent should catch novel violations too.
  • Safe alternatives included: Rather than just saying "don't do X", the rules show exactly how to debug env var issues safely (key names only).
  • No new scripts or automation: This is a prompt-level guardrail, not a deterministic filter. The agent's judgment is the enforcement mechanism, which is appropriate for this class of problem (infinite command variations that could leak secrets).

Closes #2846

Summary by CodeRabbit

  • Documentation
    • Expanded and clarified security guidelines for handling sensitive credentials, including prohibited commands, safer debugging practices, and procedures for managing pasted credentials.

…s (t2846)

Add comprehensive rules to prompts/build.txt preventing agents from
suggesting or running commands that expose secret values in conversation
transcripts. Replaces the previous single-line blocklist with:

- Principle-based rule (not just a command blocklist) with common violations
- Safe alternatives for debugging env var issues (key names only)
- Pre-staging guidance for credential lookups in user's terminal
- Credential-paste detection with rotation warning

Closes #2846

coderabbitai bot (Contributor) commented Mar 4, 2026

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8f3d2195-4ee5-4a31-8e02-73be19a63296

📥 Commits

Reviewing files that changed from the base of the PR and between 2de205e and a207e7e.

📒 Files selected for processing (1)
  • .agents/prompts/build.txt

Walkthrough

This PR enhances security guidance in the agent prompt configuration by replacing a single-line prohibition on secret-exposing commands with a detailed security policy block. The expansion covers threat modeling, root causes, prohibited command patterns, safer debugging practices, and credential handling protocols.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Secret-handling security policy (.agents/prompts/build.txt) | Replaced brief defensive guideline with comprehensive section #8 detailing secret value leaking prevention: includes threat description, root cause analysis, expanded prohibited command list (gopass show, pm2 env, echo $SECRET variants), safer debugging alternatives, credential lookup protocols, and guidance for handling user-pasted credentials. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • PR #2039: Centralizes security/secret-handling rules in prompts/build.txt as single source-of-truth
  • PR #2048: Tightens language in the same prompts/build.txt secrets/security guidance block
  • PR #2710: Adds new security-rule blocks to prompts/build.txt (prompt-injection rules alongside secret-handling)

Suggested labels

enhancement

Poem

🔐 Secrets sealed within the shell,
No gopass show shall break the spell,
Credentials guarded, warnings clear,
The agent learns what not to sear.
t2846 protects us all.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The PR title directly addresses the main change: adding secret-leaking prevention guardrails to agent conversation prompts, which aligns with the primary objective of preventing credential exposure. |
| Linked Issues check | ✅ Passed | The PR implements all key coding requirements from issue #2846: principle-based command blocklist, safe env-var debugging alternatives, credential-paste detection warnings, and pre-staging guidance in prompts/build.txt. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: the PR modifies only prompts/build.txt with security guardrails directly addressing secret-leaking prevention, with no unrelated alterations. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the security posture of agent conversations by implementing robust guardrails against accidental secret exposure. It shifts from a limited blocklist to a comprehensive, principle-based strategy, ensuring that sensitive information like credentials remains protected within the conversation context. The changes aim to prevent incidents where agents might inadvertently reveal secrets, thereby safeguarding user data and system integrity.

Highlights

  • Secret Leak Prevention Rules: Added comprehensive rules to prompts/build.txt to prevent agents from suggesting or running commands that expose secret values in conversation transcripts.
  • Principle-Based Approach: Replaced the previous single-line command blocklist with a principle-based approach covering various secret exposure vectors like password managers, env dumps, container inspection, and cloud CLI tools.
  • Safe Debugging Alternatives: Included safe alternatives for debugging environment variable issues, focusing on showing key names only, never values.
  • Credential Lookup Guidance: Provided pre-staging guidance for credential lookups, instructing users to perform these actions in their terminal rather than in conversation.
  • Credential-Paste Detection: Implemented credential-paste detection, where the agent warns users about compromise and suggests rotation if a credential value is pasted directly into the conversation.
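
In this PR, credential-paste detection is prompt guidance that relies on the agent's judgment; no detection code is added. Purely as an illustration of what a pasted credential can look like, the sketch below checks a message against three well-known formats. The function name `looks_like_credential` and the choice of patterns (AWS access key ID prefix, GitHub personal access token prefix, PEM private key header) are assumptions made for this example, and real credentials take many other forms.

```shell
# Hypothetical heuristic (not part of this PR): succeed when the text contains
# something shaped like a well-known credential format.
looks_like_credential() {
  printf '%s' "$1" | grep -Eq \
    -e 'AKIA[0-9A-Z]{16}' \
    -e 'ghp_[A-Za-z0-9]{36}' \
    -e '-----BEGIN [A-Z ]*PRIVATE KEY-----'
}

# Demo with the placeholder access key ID used in AWS documentation.
sample='here is my key: AKIAIOSFODNN7EXAMPLE'
if looks_like_credential "$sample"; then
  # Warn without echoing the matched value back into the conversation.
  echo "Warning: that looks like a credential. Treat it as compromised and rotate it."
fi
```

The warning deliberately does not repeat the matched text, in line with the rule that values should never be printed.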

Changelog

  • .agents/prompts/build.txt
    • Expanded secret-leaking prevention rules, replacing a simple blocklist with a principle-based approach.
    • Added safe alternatives for environment variable debugging.
    • Introduced credential-paste detection and warning mechanisms.

github-actions bot (Contributor) commented Mar 4, 2026

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 107 code smells

[INFO] Recent monitoring activity:
Wed Mar 4 20:19:24 UTC 2026: Code review monitoring started
Wed Mar 4 20:19:24 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 107

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 107
  • VULNERABILITIES: 0

Generated on: Wed Mar 4 20:19:26 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

gemini-code-assist bot left a comment

Code Review

This pull request significantly enhances the security guardrails for the agent by providing detailed, principle-based rules to prevent secret leakage in conversations. The new instructions are comprehensive, covering a wide range of commands and scenarios, and providing safe alternatives. I've added a couple of suggestions to improve the robustness and consistency of the example commands, which should help the agent interpret them more reliably.

Note: Security Review has been skipped due to the limited scope of the PR.

- `heroku config`, `vercel env pull`, `fly secrets list` (with values)
- Any `grep` or `rg` command targeting files known to contain secrets
- When debugging env var issues, show key NAMES only, never values:
- SAFE: `pm2 show <app> | grep -oP '^\s+\K[A-Z_]+(?=\s)'` (key names only)

medium

The provided grep command for pm2 relies on parsing the text output, which can be fragile if the pm2 version or configuration changes the output format. A more robust method would be to use pm2's JSON output feature and parse it with jq, which appears to be a tool used in this project. This avoids reliance on text formatting and makes the safe alternative more reliable.

  - SAFE: `pm2 show <app> --json | jq -r '.[0].pm2_env | keys_unsorted[]'` (key names only, robust)

- SAFE: `pm2 show <app> | grep -oP '^\s+\K[A-Z_]+(?=\s)'` (key names only)
- SAFE: `printenv | cut -d= -f1 | sort` (list env var names without values)
- SAFE: `grep -oP '^[A-Z_]+(?==)' .env` (key names from .env without values)
- SAFE: `docker inspect <c> --format '{{range .Config.Env}}{{println .}}{{end}}' | cut -d= -f1`

medium

For consistency with the unsafe command example on line 183 (docker inspect <container>), it would be clearer to use <container> as the placeholder here instead of <c>. Consistent placeholders help the agent generalize better from examples.

  - SAFE: `docker inspect <container> --format '{{range .Config.Env}}{{println .}}{{end}}' | cut -d= -f1`

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/prompts/build.txt:
- Around line 188-193: The policy text has a contradiction: the rule "Any `grep`
or `rg` command targeting files known to contain secrets" is absolute while the
SAFE examples (e.g., `grep -oP '^[A-Z_]+(?==)' .env`, `printenv | cut -d= -f1 |
sort`, and the `pm2` example) explicitly allow name-only inspections; update the
wording to forbid any grep/rg that can expose secret VALUES but permit name-only
inspections using patterns or pipelines that explicitly strip values (mention
the safe patterns shown: `-oP` with a regex capturing only keys or piping to
`cut -d= -f1`), and revise the unsafe rule to read something like "Disallow
grep/rg that may display secret values; allow grep/rg only when using patterns
or processing steps that guarantee values are not printed (see `grep -oP
'^[A-Z_]+(?==)' .env`, `printenv | cut -d= -f1`)." Ensure the SAFE example lines
remain and the unsafe line is replaced with the clarified prohibition text.
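
The distinction drawn in this comment, between commands that print values and commands that print only names, can be checked mechanically with a decoy file. The sketch below is a minimal illustration and not part of the PR; the names `leaks_values` and `names_only` and the marker string are hypothetical.

```shell
# Hypothetical check (not part of this PR): run a candidate command against a
# decoy file whose only "secret" is a known marker, and report whether the
# marker reaches stdout.
leaks_values() {
  decoy=$(mktemp)
  printf '%s\n' 'API_KEY=CANARY_VALUE_123' 'DB_PASSWORD=CANARY_VALUE_123' > "$decoy"
  # "$@" is the command under test; the decoy path is passed as its last argument.
  if "$@" "$decoy" | grep -q 'CANARY_VALUE_123'; then
    status=0   # marker seen: the command exposes values
  else
    status=1   # marker absent: only names (or nothing) were printed
  fi
  rm -f "$decoy"
  return "$status"
}

# A name-only pipeline stage in the spirit of the SAFE examples.
names_only() { cut -d= -f1 "$1"; }

leaks_values cat && echo "cat: exposes values"
leaks_values names_only || echo "names_only: values stripped"
```

Because the decoy contains no real credential, the check itself is safe to run inside a conversation, and both messages are printed when the script runs as written.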

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8ff0b261-c8f5-4db3-a23e-3da5faadb9dc

📥 Commits

Reviewing files that changed from the base of the PR and between 26803f3 and 2de205e.

📒 Files selected for processing (1)
  • .agents/prompts/build.txt

…(t2846)

- Replace fragile pm2 grep text-parsing with pm2 JSON output + jq
- Use consistent <container> placeholder matching unsafe example on line 183
- Resolve policy contradiction: revise absolute grep/rg ban to forbid
  printing secret VALUES while explicitly allowing key-name-only patterns

github-actions bot (Contributor) commented Mar 4, 2026

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 107 code smells

[INFO] Recent monitoring activity:
Wed Mar 4 20:28:12 UTC 2026: Code review monitoring started
Wed Mar 4 20:28:13 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 107

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 107
  • VULNERABILITIES: 0

Generated on: Wed Mar 4 20:28:15 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Revise the grep/rg prohibition to use explicit allow/disallow language
instead of a parenthetical exception. The rule now reads: disallow
commands that may display secret values, allow grep/rg only when using
patterns or processing steps that guarantee values are not printed.

SAFE examples preserved unchanged.

github-actions bot (Contributor) commented Mar 4, 2026

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 107 code smells

[INFO] Recent monitoring activity:
Wed Mar 4 20:34:17 UTC 2026: Code review monitoring started
Wed Mar 4 20:34:18 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 107

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 107
  • VULNERABILITIES: 0

Generated on: Wed Mar 4 20:34:20 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud bot commented Mar 4, 2026

alex-solovyev merged commit f9b52ec into main on Mar 4, 2026
12 checks passed
alex-solovyev deleted the feature/secret-leaking-guardrails branch on March 4, 2026 21:41

Labels

enhancement Auto-created from TODO.md tag

Development

Successfully merging this pull request may close these issues.

Secret leaking prevention: add guardrails for credential exposure in agent conversations
