Update confidence levels prompt injection detection to reduce false positive rates #6390

dorien-koelemeijer · 2026-01-08T00:11:31Z

Summary

There have been some complaints about high false positive rates for the prompt injection detection feature. This PR adjusts the confidence levels of various detections to reduce them. It also sets the default threshold to 0.8 for the time being, until we've adjusted things properly.

Type of Change

AI Assistance

This PR was created or reviewed with AI assistance

Testing

Local testing using just run-ui

Copilot

Pull request overview

This PR reduces false positive rates in the prompt injection detection system by adjusting confidence scores and risk level classifications. The changes lower the confidence scores assigned to the High, Medium, and Low risk levels by 0.10 each, and downgrade three remote code execution patterns from Critical to High risk level.

Key changes:

Reduced confidence scores for High (0.85→0.75), Medium (0.70→0.60), and Low (0.55→0.45) risk levels
Downgraded bash_process_substitution, python_remote_exec, and powershell_download_exec from Critical to High
Maintained Critical status for truly destructive operations like filesystem destruction and privilege escalation
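The score changes listed above can be sketched as a before/after mapping. This is only an illustration: the enum shape and the helper names `confidence_before` and `confidence_after` are assumptions made for this example, not the actual code in crates/goose/src/security/patterns.rs. The Critical value of 0.95 is taken from the review comment on this PR and is assumed to be unchanged.

```rust
/// Risk levels named in the PR overview. The exact enum definition is an
/// assumption for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RiskLevel {
    Critical,
    High,
    Medium,
    Low,
}

/// Confidence scores before this PR (hypothetical helper name).
pub fn confidence_before(level: RiskLevel) -> f32 {
    match level {
        RiskLevel::Critical => 0.95, // assumed unchanged by this PR
        RiskLevel::High => 0.85,
        RiskLevel::Medium => 0.70,
        RiskLevel::Low => 0.55,
    }
}

/// Confidence scores after this PR (hypothetical helper name).
pub fn confidence_after(level: RiskLevel) -> f32 {
    match level {
        RiskLevel::Critical => 0.95, // assumed unchanged by this PR
        RiskLevel::High => 0.75,
        RiskLevel::Medium => 0.60,
        RiskLevel::Low => 0.45,
    }
}

/// Size of the reduction for one level (zero for Critical in this sketch).
pub fn reduction(level: RiskLevel) -> f32 {
    confidence_before(level) - confidence_after(level)
}
```

Calling `reduction` on High, Medium, or Low gives roughly 0.10 (up to floating-point rounding), which matches the "10 percentage points" described in the overview.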

Signed-off-by: Dorien Koelemeijer <[email protected]>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Copilot · 2026-01-08T04:37:06Z

crates/goose/src/security/patterns.rs

-            RiskLevel::High => 0.85,
-            RiskLevel::Medium => 0.70,
-            RiskLevel::Low => 0.55,
+            RiskLevel::High => 0.75,


The new threshold of 0.8 is higher than the High risk level confidence score of 0.75. This means High-level threats (including rm -rf /, PowerShell remote execution, data exfiltration, and many other dangerous patterns) will NOT trigger the security prompt by default. Only Critical-level threats with 0.95 confidence will be detected. This significantly weakens security protection and contradicts the goal of reducing false positives while maintaining threat detection.
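The interaction described in this comment can be checked with a small sketch. It assumes a detection triggers the security prompt when its confidence score is greater than or equal to the configured threshold; the comparison operator and the names `confidence`, `triggers_prompt`, and `levels_that_trigger` are hypothetical and are not taken from the goose codebase.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RiskLevel {
    Critical,
    High,
    Medium,
    Low,
}

/// Post-PR confidence scores as described in the review
/// (Critical at 0.95 is quoted from the comment above).
pub fn confidence(level: RiskLevel) -> f32 {
    match level {
        RiskLevel::Critical => 0.95,
        RiskLevel::High => 0.75,
        RiskLevel::Medium => 0.60,
        RiskLevel::Low => 0.45,
    }
}

/// Assumption: a detection triggers when its score meets the threshold.
pub fn triggers_prompt(level: RiskLevel, threshold: f32) -> bool {
    confidence(level) >= threshold
}

/// Lists the risk levels that would trigger at a given threshold.
pub fn levels_that_trigger(threshold: f32) -> Vec<RiskLevel> {
    [
        RiskLevel::Critical,
        RiskLevel::High,
        RiskLevel::Medium,
        RiskLevel::Low,
    ]
    .iter()
    .copied()
    .filter(|&level| triggers_prompt(level, threshold))
    .collect()
}
```

Under these assumptions, `levels_that_trigger(0.8)` contains only `RiskLevel::Critical`, which is the gap the comment points out: a High score of 0.75 falls below the 0.8 default.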

Suggested change

-            RiskLevel::High => 0.75,
+            RiskLevel::High => 0.85,

Update confidence levels prompt injection detection to reduce false positive rates

24db249

Copilot AI review requested due to automatic review settings January 8, 2026 00:11

Copilot started reviewing on behalf of dorien-koelemeijer January 8, 2026 00:12

Put default threshold to 0.8 - will do same in goose-releases

e759f6c

Copilot AI reviewed Jan 8, 2026

michaelneale approved these changes Jan 8, 2026

Merge branch 'main' into fix/update-confidence-levels-prompt-injection

2c0d390

Signed-off-by: Dorien Koelemeijer <[email protected]>

dorien-koelemeijer force-pushed the fix/update-confidence-levels-prompt-injection branch from 1d4100e to 2c0d390 January 8, 2026 02:26

Update threshold default in UI settings

624d29d

Copilot AI review requested due to automatic review settings January 8, 2026 02:36

Copilot started reviewing on behalf of dorien-koelemeijer January 8, 2026 02:36

Copilot AI reviewed Jan 8, 2026

shellz-n-stuff approved these changes Jan 8, 2026

dorien-koelemeijer added 2 commits January 8, 2026 12:50

fix

3198848

fix tests due to changes in confidence levels and default threshold

774b6c6

Copilot AI review requested due to automatic review settings January 8, 2026 04:34

Copilot started reviewing on behalf of dorien-koelemeijer January 8, 2026 04:34

Copilot AI reviewed Jan 8, 2026

dorien-koelemeijer merged commit 3d415fc into main Jan 8, 2026
26 checks passed

dorien-koelemeijer deleted the fix/update-confidence-levels-prompt-injection branch January 8, 2026 05:51

github-actions bot mentioned this pull request Jan 13, 2026

chore(release): release version 1.20.0 (minor) #6457

Merged

fbalicchia pushed a commit to fbalicchia/goose that referenced this pull request Jan 13, 2026

Update confidence levels prompt injection detection to reduce false positive rates (block#6390) Signed-off-by: Dorien Koelemeijer <[email protected]>

211f622

4 participants