Conversation

Collaborator

@dorien-koelemeijer commented Jan 8, 2026

Summary

There have been some complaints about the high false-positive rate of the prompt injection detection feature. This PR lowers the confidence scores assigned to several detections to reduce it. The default threshold will also be set to 0.8 for the time being, until the detections have been tuned properly.

Type of Change

  • Feature
  • Bug fix
  • Refactor / Code quality
  • Performance improvement
  • Documentation
  • Tests
  • Security fix
  • Build / Release
  • Other (specify below)

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

Tested locally by running just run-ui.

Copilot AI review requested due to automatic review settings January 8, 2026 00:11
Contributor

Copilot AI left a comment

Pull request overview

This PR reduces the false-positive rate of the prompt injection detection system by adjusting confidence scores and risk level classifications. The changes lower the confidence scores assigned to the High, Medium, and Low risk levels by 0.10 each, and downgrade three remote code execution patterns from Critical to High risk.

Key changes:

  • Reduced confidence scores for High (0.85→0.75), Medium (0.70→0.60), and Low (0.55→0.45) risk levels
  • Downgraded bash_process_substitution, python_remote_exec, and powershell_download_exec from Critical to High
  • Maintained Critical status for truly destructive operations like filesystem destruction and privilege escalation
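
The score changes listed above can be sketched as a pair of lookup functions. This is a minimal illustration rather than the PR's actual code: the enum definition and the function names `confidence_before` and `confidence_after` are hypothetical. The High, Medium, and Low values come from the list above; the Critical value of 0.95 comes from the review comment further down and is assumed to be unchanged by this PR.

```rust
/// Risk levels named in the PR overview (hypothetical definition).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RiskLevel {
    Critical,
    High,
    Medium,
    Low,
}

/// Confidence score per risk level before this PR (hypothetical helper).
pub fn confidence_before(level: RiskLevel) -> f32 {
    match level {
        RiskLevel::Critical => 0.95, // assumed unchanged by this PR
        RiskLevel::High => 0.85,
        RiskLevel::Medium => 0.70,
        RiskLevel::Low => 0.55,
    }
}

/// Confidence score per risk level after this PR (hypothetical helper).
pub fn confidence_after(level: RiskLevel) -> f32 {
    match level {
        RiskLevel::Critical => 0.95, // assumed unchanged by this PR
        RiskLevel::High => 0.75,
        RiskLevel::Medium => 0.60,
        RiskLevel::Low => 0.45,
    }
}
```

Each non-Critical level drops by 0.10, so a pattern downgraded from Critical to High would go from 0.95 to 0.75 under these assumptions.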

@dorien-koelemeijer force-pushed the fix/update-confidence-levels-prompt-injection branch from 1d4100e to 2c0d390 January 8, 2026 02:26
Copilot AI review requested due to automatic review settings January 8, 2026 02:36
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings January 8, 2026 04:34
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

- RiskLevel::High => 0.85,
- RiskLevel::Medium => 0.70,
- RiskLevel::Low => 0.55,
+ RiskLevel::High => 0.75,

Copilot AI Jan 8, 2026

The new threshold of 0.8 is higher than the High risk level confidence score of 0.75. This means High-level threats (including rm -rf /, PowerShell remote execution, data exfiltration, and many other dangerous patterns) will NOT trigger the security prompt by default. Only Critical-level threats with 0.95 confidence will be detected. This significantly weakens security protection and contradicts the goal of reducing false positives while maintaining threat detection.

Suggested change
- RiskLevel::High => 0.75,
+ RiskLevel::High => 0.85,
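
The interaction this comment describes can be sketched as follows. It is an illustration under stated assumptions, not the project's implementation: the names `should_prompt`, `levels_that_prompt`, and `DEFAULT_THRESHOLD`, and the use of a `>=` comparison, are hypothetical. The scores and the 0.8 default are the values quoted in this thread.

```rust
/// Risk levels discussed in this thread (hypothetical definition).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RiskLevel {
    Critical,
    High,
    Medium,
    Low,
}

/// Default threshold mentioned in the PR summary.
pub const DEFAULT_THRESHOLD: f32 = 0.8;

const ALL_LEVELS: [RiskLevel; 4] = [
    RiskLevel::Critical,
    RiskLevel::High,
    RiskLevel::Medium,
    RiskLevel::Low,
];

/// Post-PR confidence scores as quoted in the review.
pub fn confidence(level: RiskLevel) -> f32 {
    match level {
        RiskLevel::Critical => 0.95,
        RiskLevel::High => 0.75,
        RiskLevel::Medium => 0.60,
        RiskLevel::Low => 0.45,
    }
}

/// Assumption: a detection triggers the security prompt when its
/// confidence score meets or exceeds the configured threshold.
pub fn should_prompt(level: RiskLevel, threshold: f32) -> bool {
    confidence(level) >= threshold
}

/// Risk levels that would trigger the prompt at a given threshold.
pub fn levels_that_prompt(threshold: f32) -> Vec<RiskLevel> {
    ALL_LEVELS
        .iter()
        .copied()
        .filter(|&level| should_prompt(level, threshold))
        .collect()
}
```

Under these assumptions, `levels_that_prompt(DEFAULT_THRESHOLD)` contains only `RiskLevel::Critical`, which is the gap the comment points out; a High score of 0.85, as suggested, would clear the 0.8 threshold.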

@dorien-koelemeijer merged commit 3d415fc into main Jan 8, 2026
26 checks passed
@dorien-koelemeijer deleted the fix/update-confidence-levels-prompt-injection branch January 8, 2026 05:51
zanesq added a commit that referenced this pull request Jan 8, 2026
* 'main' of github.com:block/goose:
  Fixed fonts (#6389)
  Update confidence levels prompt injection detection to reduce false positive rates (#6390)
  Add ML-based prompt injection detection  (#5623)
  docs: update custom extensions tutorial (#6388)
  fix ResultsFormat error when loading old sessions (#6385)
  docs: add MCP Apps tutorial and documentation updates (#6384)
  changed z-index to make sure the search highlighter does not appear on modal overlay (#6386)
  Handling special claude model response in github copilot provider (#6369)
  fix: prevent duplicate rendering when tool returns both mcp-ui and mcp-apps resources (#6378)
  fix: update MCP Apps _meta.ui.resourceUri to use nested format (SEP-1865) (#6372)
  feat(providers): add streaming support for Google Gemini provider (#6191)
  Blog: edit links in mcp apps post (#6371)
  fix: prevent infinite loop of tool-input notifications in MCP Apps (#6374)
michaelneale added a commit that referenced this pull request Jan 8, 2026
* main: (31 commits)
  added validation and debug for invalid call tool result (#6368)
  Update MCP apps tutorial: fix _meta structure and version prereq (#6404)
  Fixed fonts (#6389)
  Update confidence levels prompt injection detection to reduce false positive rates (#6390)
  Add ML-based prompt injection detection  (#5623)
  docs: update custom extensions tutorial (#6388)
  fix ResultsFormat error when loading old sessions (#6385)
  docs: add MCP Apps tutorial and documentation updates (#6384)
  changed z-index to make sure the search highlighter does not appear on modal overlay (#6386)
  Handling special claude model response in github copilot provider (#6369)
  fix: prevent duplicate rendering when tool returns both mcp-ui and mcp-apps resources (#6378)
  fix: update MCP Apps _meta.ui.resourceUri to use nested format (SEP-1865) (#6372)
  feat(providers): add streaming support for Google Gemini provider (#6191)
  Blog: edit links in mcp apps post (#6371)
  fix: prevent infinite loop of tool-input notifications in MCP Apps (#6374)
  fix: Show platform-specific keyboard shortcuts in UI (#6323)
  fix: we load extensions when agent starts so don't do it up front (#6350)
  docs: credit HumanLayer in RPI tutorial (#6365)
  Blog: Goose Lands MCP Apps (#6172)
  Claude 3.7 is out. we had some harcoded stuff (#6197)
  ...
wpfleger96 added a commit that referenced this pull request Jan 9, 2026
* main: (89 commits)
  fix(google): treat signed text as regular content in streaming (#6400)
  Add frameDomains and baseUriDomains CSP support for MCP Apps (#6399)
  fix(ci): add missing dependencies to openapi-schema-check job (#6367)
  feat: http proxy support
  Add support for changing working dir and extensions in same window/session (#6057)
  Sort keys in canonical models (#6403)
  added validation and debug for invalid call tool result (#6368)
  Update MCP apps tutorial: fix _meta structure and version prereq (#6404)
  Fixed fonts (#6389)
  Update confidence levels prompt injection detection to reduce false positive rates (#6390)
  Add ML-based prompt injection detection  (#5623)
  docs: update custom extensions tutorial (#6388)
  fix ResultsFormat error when loading old sessions (#6385)
  docs: add MCP Apps tutorial and documentation updates (#6384)
  changed z-index to make sure the search highlighter does not appear on modal overlay (#6386)
  Handling special claude model response in github copilot provider (#6369)
  fix: prevent duplicate rendering when tool returns both mcp-ui and mcp-apps resources (#6378)
  fix: update MCP Apps _meta.ui.resourceUri to use nested format (SEP-1865) (#6372)
  feat(providers): add streaming support for Google Gemini provider (#6191)
  Blog: edit links in mcp apps post (#6371)
  ...
fbalicchia pushed a commit to fbalicchia/goose that referenced this pull request Jan 13, 2026

4 participants