
fix(security): reduce prompt injection guard false positives#126

Merged
stranske merged 2 commits intomainfrom
fix/security-guard-false-positives
Dec 24, 2025


@stranske
Owner

@stranske stranske commented Dec 24, 2025

Problem

The security guard was way too aggressive and blocked PR #124 because:

  • Pattern <!--[\s\S]*?(ignore|instruction|prompt|secret|token|password)[\s\S]*?--> matched ordinary HTML comment markers
  • Words like "instruction" and "prompt" appear in legitimate PR bodies constantly
  • Base64-like strings, markdown comments, etc. triggered false positives
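A quick way to see the false-positive problem is to run the old HTML-comment rule against lines from this PR's own task list. This is a sketch that assumes the old rule began with the HTML comment opener and ended with the closing marker, as described above. Because [\s\S]*? also matches the closing marker, a match can open in one comment and close in a later one, so any body with two comment markers and one of the keywords anywhere between them is flagged.

```javascript
// Assumed form of the old rule: HTML comment opener, any text, a keyword,
// any text, then a closing marker.
const OLD_HTML_COMMENT_RULE =
  /<!--[\s\S]*?(ignore|instruction|prompt|secret|token|password)[\s\S]*?-->/;

// Legitimate checklist lines taken from this PR's own task list.
const legitimateBody = [
  'Suppress <!-- gate-summary: --> comment posting (use step summary instead)',
  'Suppress <!-- keepalive-round: N --> instruction comments (task appendix replaces this)',
  'Update <!-- keepalive-loop-summary --> to be the single source of truth',
].join('\n');

// The match starts at the first opener, runs past its "-->" (because [\s\S]
// matches anything), finds "instruction" on the next line, and ends at a
// later "-->". No keyword ever appears inside a single comment.
const match = legitimateBody.match(OLD_HTML_COMMENT_RULE);
console.log(match !== null); // true
console.log(match[1]); // instruction
```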

Solution

Dramatically reduce the scope. The guard should only catch obvious attacks, not slow down every third PR:

  1. Pattern count: 24 → 7. The guard now only catches:

    • Explicit injection phrases ("ignore all previous instructions")
    • Actual leaked tokens (ghp_, gho_, ghs_, sk-)
  2. Skip content scanning for collaborators: if you have write access, you're trusted

  3. Add a bypass label, security:bypass-guard, for edge cases

  4. Removed:

    • HTML comment keyword detection (too broad)
    • Base64 pattern (false positives on long strings)
    • Unicode/zero-width detection (flagged legitimate content)
    • Shell pattern detection (eval, curl, etc.)
    • Generic secret patterns (matched too much)
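The reduced rule set can be sketched as below. Only the ghp_ and gho_ regexes appear in this PR (in the review context further down); the extra injection phrases, the ghs_ and sk- regexes, and the helper name findRedFlags are assumptions for illustration, not the merged code.

```javascript
// Illustrative minimal rule set: explicit injection phrases plus real-looking
// tokens. Entries marked "assumed" are not taken from the PR.
const RED_FLAG_PATTERNS = [
  // Explicit injection phrases
  /\bignore\s+all\s+previous\s+instructions\b/i,
  /\bdisregard\s+all\s+previous\s+instructions\b/i, // assumed
  /\bforget\s+all\s+previous\s+instructions\b/i, // assumed
  // Actual leaked tokens, not the word "token"
  /\bghp_[A-Za-z0-9]{36}\b/, // GitHub personal access token
  /\bgho_[A-Za-z0-9]{36}\b/, // GitHub OAuth token
  /\bghs_[A-Za-z0-9]{36}\b/, // assumed: GitHub server-to-server token
  /\bsk-[A-Za-z0-9]{20,}\b/, // assumed: OpenAI-style secret key
];

// Hypothetical helper: returns the source of every pattern that matches.
function findRedFlags(text) {
  return RED_FLAG_PATTERNS.filter((re) => re.test(text)).map((re) => re.source);
}

// Ordinary PR prose no longer trips the guard...
console.log(findRedFlags('Update the prompt instructions; see <!-- keepalive-loop-summary -->')); // []
// ...but an explicit injection phrase or a real-looking token still does.
console.log(findRedFlags('Please ignore all previous instructions.').length); // 1
console.log(findRedFlags(`token: ghp_${'a'.repeat(36)}`).length); // 1
```

Anchoring each token rule on an exact prefix and length is what keeps words like "secret" or "token", and long unrelated strings, from matching.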

Testing

After this merges, PR #124's keepalive should run without the security gate blocking.

Automated Status Summary

Scope

After merging PR #103 (chore(codex): bootstrap PR for issue #101, the multi-agent routing infrastructure), we need to:

  1. Validate that the CLI agent pipeline works end-to-end with the new task-focused prompts
  2. Add GITHUB_STEP_SUMMARY output so iteration results are visible in the Actions UI
  3. Streamline the Automated Status Summary to reduce clutter when using CLI agents
  4. Clean up comment patterns to avoid a mix of old UI-agent and new CLI-agent comments

Tasks

Pipeline Validation

  • After PR #103 merges, create a test PR with the agent:codex label
  • Verify the task appendix appears in the Codex prompt (check workflow logs)
  • Verify Codex works on the actual tasks (not random infrastructure work)
  • Verify the keepalive comment updates with iteration progress

GITHUB_STEP_SUMMARY

  • Add step summary output to agents-keepalive-loop.yml after the agent run
  • Include: iteration number, tasks completed, files changed, outcome
  • Ensure the summary is visible in the workflow run UI
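One way to cover the step summary tasks from a Node-based step is to append Markdown to the file whose path GitHub Actions places in the GITHUB_STEP_SUMMARY environment variable. The helper names and the iteration fields below are hypothetical, chosen to mirror the task list; this is a sketch, not the workflow's actual code.

```javascript
const fs = require('node:fs');

// Hypothetical iteration record; field names mirror the task list
// (iteration number, tasks completed, files changed, outcome).
function buildIterationSummary({ iteration, tasksCompleted, tasksTotal, filesChanged, outcome }) {
  return [
    `## Keepalive iteration ${iteration}`,
    '',
    '| Field | Value |',
    '|-------|-------|',
    `| Tasks completed | ${tasksCompleted}/${tasksTotal} |`,
    `| Files changed | ${filesChanged} |`,
    `| Outcome | ${outcome} |`,
  ].join('\n');
}

// GitHub Actions sets GITHUB_STEP_SUMMARY to a file path for the current step;
// Markdown appended to that file is rendered on the workflow run page.
function writeStepSummary(markdown, env = process.env) {
  const summaryPath = env.GITHUB_STEP_SUMMARY;
  if (!summaryPath) return false; // not running inside a workflow step
  fs.appendFileSync(summaryPath, `${markdown}\n`);
  return true;
}

// Example call after an agent run (values are placeholders).
writeStepSummary(
  buildIterationSummary({ iteration: 2, tasksCompleted: 3, tasksTotal: 8, filesChanged: 5, outcome: 'success' }),
);
```

Inside actions/github-script, core.summary from @actions/core offers a builder over the same file; the plain fs version is shown because it has no dependencies.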

Conditional Status Summary

  • Modify buildStatusBlock() in agents_pr_meta_update_body.js to accept an agentType parameter
  • When agentType is set (CLI agent): hide the workflow table, the head SHA, and the required checks
  • Keep Scope/Tasks/Acceptance checkboxes for all cases
  • Pass the agent type from the workflow to the update_body job
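A minimal sketch of the conditional status block follows. buildStatusBlockSketch and its argument object are hypothetical; the actual buildStatusBlock() signature in agents_pr_meta_update_body.js is not shown in this PR.

```javascript
// Hypothetical simplification; the real buildStatusBlock() may take
// different arguments and render more sections.
function buildStatusBlockSketch({ agentType = '', checklist, headSha, required, workflowRows = [] }) {
  // Scope/Tasks/Acceptance checkboxes are kept for every agent type.
  const lines = ['Automated Status Summary', '', checklist];
  if (agentType) {
    // CLI agent: omit head SHA, required checks, and the workflow table.
    return lines.join('\n');
  }
  lines.push('', `Head SHA: ${headSha}`, `Required: ${required}`, '');
  lines.push('| Workflow / Job | Result | Logs |', '|----------------|--------|------|');
  for (const row of workflowRows) {
    lines.push(`| ${row.name} | ${row.result} | ${row.logs} |`);
  }
  return lines.join('\n');
}

const example = {
  checklist: '- [ ] Add step summary output',
  headSha: 'b665a4a',
  required: 'gate: ✅ success',
  workflowRows: [{ name: 'Gate', result: '✅ success', logs: 'View run' }],
};
console.log(buildStatusBlockSketch({ ...example, agentType: 'codex' }).includes('Head SHA')); // false
console.log(buildStatusBlockSketch(example).includes('| Gate |')); // true
```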

Comment Pattern Cleanup

  • For CLI agents (agent:* label):
    • Suppress <!-- gate-summary: --> comment posting (use the step summary instead)
    • Suppress <!-- keepalive-round: N --> instruction comments (the task appendix replaces them)
    • Update <!-- keepalive-loop-summary --> to be the single source of truth
    • Ensure the state marker is embedded in the summary comment (not posted separately)
  • For UI Codex (no agent:* label):
    • Keep existing comment patterns (instruction comments, connector bot reports)
    • Keep the <!-- gate-summary: --> comment
  • Add an agent_type output to the detect job so downstream workflows know the mode
  • Update agents-pr-meta.yml to conditionally skip the gate summary for CLI agent PRs
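The agent-type split that the cleanup tasks depend on could look like this. detectAgentType and commentPlan are hypothetical helpers; the only rule taken from the task list is that an agent:* label marks a CLI-agent PR and its absence means the UI Codex path.

```javascript
// Hypothetical helper: an "agent:*" label (for example agent:codex) marks a
// CLI-agent PR; no such label means the UI Codex path (empty string).
function detectAgentType(labels = []) {
  for (const label of labels) {
    const name = (typeof label === 'string' ? label : label.name || '').toLowerCase();
    if (name.startsWith('agent:')) return name.slice('agent:'.length);
  }
  return '';
}

// Which legacy comments to post for a given mode, following the cleanup tasks.
function commentPlan(agentType) {
  const isCliAgent = agentType !== '';
  return {
    gateSummaryComment: !isCliAgent, // gate-summary marker comment only for UI Codex
    keepaliveRoundComments: !isCliAgent, // replaced by the task appendix for CLI agents
  };
}

const detected = detectAgentType([{ name: 'agent:codex' }, 'bug']);
console.log(detected); // codex
console.log(commentPlan(detected).gateSummaryComment); // false
```

In a github-script step, the detected value could then be exposed with core.setOutput('agent_type', detected) so downstream workflows can read the mode.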

Acceptance criteria

  • CLI agent receives explicit tasks in prompt and works on them
  • Iteration results visible in Actions workflow run summary
  • PR body shows checkboxes but not workflow clutter when using CLI agents
  • UI Codex path (no agent label) continues to show full status summary
  • CLI agent PRs have ≤3 bot comments total (summary, one per iteration update) instead of 10+
  • State tracking is consolidated in the summary comment, not scattered

Dependencies

  • Requires PR #103 (chore(codex): bootstrap PR for issue #101) to be merged first
  • Head SHA: b665a4a
  • Latest Runs: ✅ success — Gate
  • Required: gate: ✅ success

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| Agents PR meta manager | ❔ in progress | View run |
| CI Autofix Loop | ✅ success | View run |
| Gate | ✅ success | View run |
| Health 40 Sweep | ✅ success | View run |
| Health 44 Gate Branch Protection | ✅ success | View run |
| Health 45 Agents Guard | ✅ success | View run |
| Health 50 Security Scan | ✅ success | View run |
| Maint 52 Validate Workflows | ✅ success | View run |
| PR 11 - Minimal invariant CI | ✅ success | View run |
| Selftest CI | ✅ success | View run |

Head SHA: 53d7b46
Latest Runs: ✅ success — Gate
Required: gate: ✅ success

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| Agents PR meta manager | ❔ in progress | View run |
| CI Autofix Loop | ✅ success | View run |
| Gate | ✅ success | View run |
| Health 40 Sweep | ✅ success | View run |
| Health 44 Gate Branch Protection | ✅ success | View run |
| Health 45 Agents Guard | ✅ success | View run |
| Health 50 Security Scan | ✅ success | View run |
| Maint 52 Validate Workflows | ✅ success | View run |
| PR 11 - Minimal invariant CI | ✅ success | View run |
| Selftest CI | ✅ success | View run |

… guard

Key changes:
- Reduce red-flag patterns from 24 to 7 (only obvious injections + leaked tokens)
- Skip content scanning for collaborators (trusted users)
- Add security:bypass-guard label for explicit opt-out
- Remove patterns that triggered on normal words (instruction, prompt, etc.)
- Remove base64, unicode, shell pattern detections (too aggressive)

The guard now only blocks:
1. Forked PRs (unchanged)
2. Non-collaborators (unchanged)
3. Actual token leaks (ghp_, gho_, ghs_, sk-)
4. Explicit injection phrases (ignore all previous instructions, etc.)

Collaborators bypass content scanning entirely since they have write access.
Copilot AI review requested due to automatic review settings December 24, 2025 20:37
@github-actions
Contributor

github-actions bot commented Dec 24, 2025

Automated Status Summary

Head SHA: bf53f3e
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

| Workflow / Job | Result | Logs |
|----------------|--------|------|
| (no jobs reported) | ⏳ pending | |

Coverage Overview

  • Coverage history entries: 1

Coverage Trend

| Metric | Value |
|--------|-------|
| Current | 77.97% |
| Baseline | 0.00% |
| Delta | +77.97% |
| Minimum | 70.00% |
| Status | ✅ Pass |

Updated automatically; will refresh on subsequent CI/Docker completions.




@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +327 to +329
// 5. Content red-flag scanning - SKIP for collaborators (they're trusted)
const isCollaborator = details.actor.allowed || details.collaborator.isCollaborator;
if (scanContent && promptContent && !isCollaborator) {


P1: Re-enable content scanning when allowlist is empty

The new collaborator skip at lines 327‑329 uses details.actor.allowed to decide whether to scan, but validateActorAllowList returns allowed: true whenever no explicit allowlist is configured (the default). As a result isCollaborator is always true in the default configuration and scanForRedFlags never runs, effectively disabling all prompt scanning for every actor (including fork contributors if blockForks is set to false or any non-collaborator trigger) — a regression from the previous behavior where content was always scanned. This removes the main protection against prompt injections in the default setup.
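The failure mode described above can be reproduced with a small stand-in. validateActorAllowListSketch mimics only the behavior the review describes (allowed: true when no allowlist is configured); the explicit field and both isTrusted helpers are hypothetical, and the fix shown is one possible option, not what was merged.

```javascript
// Stand-in for validateActorAllowList(): per the review, it reports
// allowed: true when no allowlist is configured. "explicit" is hypothetical.
function validateActorAllowListSketch(actor, allowList = []) {
  if (allowList.length === 0) return { allowed: true, explicit: false };
  return { allowed: allowList.includes(actor), explicit: true };
}

// Check as written in the diff: true for everyone under the default config,
// so content scanning never runs.
function isTrustedAsMerged(actorResult, isCollaborator) {
  return actorResult.allowed || isCollaborator;
}

// One possible fix: honor the allowlist only when one is actually configured.
function isTrustedFixed(actorResult, isCollaborator) {
  return (actorResult.explicit && actorResult.allowed) || isCollaborator;
}

const outsider = validateActorAllowListSketch('drive-by-user'); // default: no allowlist
console.log(isTrustedAsMerged(outsider, false)); // true: scan skipped
console.log(isTrustedFixed(outsider, false)); // false: scan runs
```

If the real function cannot report whether an allowlist exists, the same effect can be had by checking the configured allowlist's length at the call site.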


Update tests to reflect the new minimal security patterns:
- 'ignore previous instructions' -> 'ignore all previous instructions' (more specific)
- HTML comment, base64, zero-width, secrets.*, curl, eval patterns removed
- Collaborators now skip content scanning (trusted users)
Contributor

Copilot AI left a comment


Pull request overview

This PR reduces false positives in the prompt injection security guard by dramatically scaling back pattern detection from 24 to 7 patterns, introducing a bypass label mechanism, and skipping content scanning for trusted collaborators.

Key Changes:

  • Reduced red-flag patterns to only catch explicit injection attempts and actual leaked tokens
  • Added security:bypass-guard label to bypass security checks for edge cases
  • Collaborators now skip content scanning entirely as they are considered trusted


Comment on lines +314 to +325
// 4. Check for bypass label
const prLabels = (pr?.labels || []).map(l => (typeof l === 'string' ? l : l.name || '').toLowerCase());
const hasBypassLabel = prLabels.includes(BYPASS_LABEL.toLowerCase());
if (hasBypassLabel) {
  if (core) core.info(`Security gate bypassed via ${BYPASS_LABEL} label`);
  return {
    allowed: true,
    blocked: false,
    reason: 'bypass-label',
    details,
  };
}

Copilot AI Dec 24, 2025


The bypass label check occurs after fork and collaborator checks, which means forked PRs and non-collaborators will be blocked before the bypass label can take effect. If the bypass label is intended to override all security checks (including fork blocking), it should be moved earlier in the evaluation logic, right after extracting the PR labels and before the fork detection check.

Copilot uses AI. Check for mistakes.
Comment on lines +314 to +329
// 4. Check for bypass label
const prLabels = (pr?.labels || []).map(l => (typeof l === 'string' ? l : l.name || '').toLowerCase());
const hasBypassLabel = prLabels.includes(BYPASS_LABEL.toLowerCase());
if (hasBypassLabel) {
  if (core) core.info(`Security gate bypassed via ${BYPASS_LABEL} label`);
  return {
    allowed: true,
    blocked: false,
    reason: 'bypass-label',
    details,
  };
}

// 5. Content red-flag scanning - SKIP for collaborators (they're trusted)
const isCollaborator = details.actor.allowed || details.collaborator.isCollaborator;
if (scanContent && promptContent && !isCollaborator) {

Copilot AI Dec 24, 2025


The newly introduced bypass label mechanism and the updated collaborator-based content scanning skip logic lack test coverage. Since the test file exists at .github/scripts/__tests__/prompt-injection-guard.test.js and tests other functions in this module, these new behaviors should have corresponding tests to verify: 1) the bypass label properly allows PRs through, 2) collaborators skip content scanning, and 3) non-collaborators still undergo content scanning.

/\bsecrets\.[A-Z_]+\b/,
];

// Label that bypasses security gate content scanning

Copilot AI Dec 24, 2025


The comment says "Label that bypasses security gate content scanning" but based on the implementation at lines 314-325, the bypass label actually bypasses all security checks including fork detection and collaborator checks, not just content scanning. The comment should be updated to accurately reflect that it bypasses the entire security gate, or the implementation should be changed to only bypass content scanning as the comment suggests.

Suggested change
// Label that bypasses security gate content scanning
// Label that bypasses the entire security gate (fork detection, collaborator checks, and content scanning)

// Actual leaked secrets (not the word "secret", but actual tokens)
/\bghp_[A-Za-z0-9]{36}\b/, // GitHub personal access token
/\bgho_[A-Za-z0-9]{36}\b/, // GitHub OAuth token
/\bgho_[A-Za-z0-9]{36}\b/, // GitHub OAuth token

Copilot AI Dec 24, 2025


Trailing whitespace found at the end of this line. This should be removed for consistency with code style standards.

Suggested change
/\bgho_[A-Za-z0-9]{36}\b/, // GitHub OAuth token
/\bgho_[A-Za-z0-9]{36}\b/, // GitHub OAuth token

@stranske stranske merged commit 640a6b5 into main Dec 24, 2025
121 checks passed
@stranske stranske deleted the fix/security-guard-false-positives branch December 24, 2025 20:41

Labels

None yet

Projects

None yet


2 participants