-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Update confidence levels prompt injection detection to reduce false positive rates #6390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update confidence levels prompt injection detection to reduce false positive rates #6390
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR reduces false positive rates in the prompt injection detection system by adjusting confidence scores and risk level classifications. The changes lower confidence thresholds across High, Medium, and Low risk levels by 10 percentage points each, and downgrade three remote code execution patterns from Critical to High risk level.
Key changes:
- Reduced confidence scores for High (0.85→0.75), Medium (0.70→0.60), and Low (0.55→0.45) risk levels
- Downgraded bash_process_substitution, python_remote_exec, and powershell_download_exec from Critical to High
- Maintained Critical status for truly destructive operations like filesystem destruction and privilege escalation
Signed-off-by: Dorien Koelemeijer <[email protected]>
1d4100e to
2c0d390
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| RiskLevel::High => 0.85, | ||
| RiskLevel::Medium => 0.70, | ||
| RiskLevel::Low => 0.55, | ||
| RiskLevel::High => 0.75, |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new threshold of 0.8 is higher than the High risk level confidence score of 0.75. This means High-level threats (including rm -rf /, PowerShell remote execution, data exfiltration, and many other dangerous patterns) will NOT trigger the security prompt by default. Only Critical-level threats with 0.95 confidence will be detected. This significantly weakens security protection and contradicts the goal of reducing false positives while maintaining threat detection.
| RiskLevel::High => 0.75, | |
| RiskLevel::High => 0.85, |
* 'main' of github.com:block/goose: Fixed fonts (#6389) Update confidence levels prompt injection detection to reduce false positive rates (#6390) Add ML-based prompt injection detection (#5623) docs: update custom extensions tutorial (#6388) fix ResultsFormat error when loading old sessions (#6385) docs: add MCP Apps tutorial and documentation updates (#6384) changed z-index to make sure the search highlighter does not appear on modal overlay (#6386) Handling special claude model response in github copilot provider (#6369) fix: prevent duplicate rendering when tool returns both mcp-ui and mcp-apps resources (#6378) fix: update MCP Apps _meta.ui.resourceUri to use nested format (SEP-1865) (#6372) feat(providers): add streaming support for Google Gemini provider (#6191) Blog: edit links in mcp apps post (#6371) fix: prevent infinite loop of tool-input notifications in MCP Apps (#6374)
* main: (31 commits) added validation and debug for invalid call tool result (#6368) Update MCP apps tutorial: fix _meta structure and version prereq (#6404) Fixed fonts (#6389) Update confidence levels prompt injection detection to reduce false positive rates (#6390) Add ML-based prompt injection detection (#5623) docs: update custom extensions tutorial (#6388) fix ResultsFormat error when loading old sessions (#6385) docs: add MCP Apps tutorial and documentation updates (#6384) changed z-index to make sure the search highlighter does not appear on modal overlay (#6386) Handling special claude model response in github copilot provider (#6369) fix: prevent duplicate rendering when tool returns both mcp-ui and mcp-apps resources (#6378) fix: update MCP Apps _meta.ui.resourceUri to use nested format (SEP-1865) (#6372) feat(providers): add streaming support for Google Gemini provider (#6191) Blog: edit links in mcp apps post (#6371) fix: prevent infinite loop of tool-input notifications in MCP Apps (#6374) fix: Show platform-specific keyboard shortcuts in UI (#6323) fix: we load extensions when agent starts so don't do it up front (#6350) docs: credit HumanLayer in RPI tutorial (#6365) Blog: Goose Lands MCP Apps (#6172) Claude 3.7 is out. we had some harcoded stuff (#6197) ...
* main: (89 commits) fix(google): treat signed text as regular content in streaming (#6400) Add frameDomains and baseUriDomains CSP support for MCP Apps (#6399) fix(ci): add missing dependencies to openapi-schema-check job (#6367) feat: http proxy support Add support for changing working dir and extensions in same window/session (#6057) Sort keys in canonical models (#6403) added validation and debug for invalid call tool result (#6368) Update MCP apps tutorial: fix _meta structure and version prereq (#6404) Fixed fonts (#6389) Update confidence levels prompt injection detection to reduce false positive rates (#6390) Add ML-based prompt injection detection (#5623) docs: update custom extensions tutorial (#6388) fix ResultsFormat error when loading old sessions (#6385) docs: add MCP Apps tutorial and documentation updates (#6384) changed z-index to make sure the search highlighter does not appear on modal overlay (#6386) Handling special claude model response in github copilot provider (#6369) fix: prevent duplicate rendering when tool returns both mcp-ui and mcp-apps resources (#6378) fix: update MCP Apps _meta.ui.resourceUri to use nested format (SEP-1865) (#6372) feat(providers): add streaming support for Google Gemini provider (#6191) Blog: edit links in mcp apps post (#6371) ...
* main: (89 commits) fix(google): treat signed text as regular content in streaming (#6400) Add frameDomains and baseUriDomains CSP support for MCP Apps (#6399) fix(ci): add missing dependencies to openapi-schema-check job (#6367) feat: http proxy support Add support for changing working dir and extensions in same window/session (#6057) Sort keys in canonical models (#6403) added validation and debug for invalid call tool result (#6368) Update MCP apps tutorial: fix _meta structure and version prereq (#6404) Fixed fonts (#6389) Update confidence levels prompt injection detection to reduce false positive rates (#6390) Add ML-based prompt injection detection (#5623) docs: update custom extensions tutorial (#6388) fix ResultsFormat error when loading old sessions (#6385) docs: add MCP Apps tutorial and documentation updates (#6384) changed z-index to make sure the search highlighter does not appear on modal overlay (#6386) Handling special claude model response in github copilot provider (#6369) fix: prevent duplicate rendering when tool returns both mcp-ui and mcp-apps resources (#6378) fix: update MCP Apps _meta.ui.resourceUri to use nested format (SEP-1865) (#6372) feat(providers): add streaming support for Google Gemini provider (#6191) Blog: edit links in mcp apps post (#6371) ...
…ositive rates (block#6390) Signed-off-by: Dorien Koelemeijer <[email protected]>
Summary
There's been some complaints about high false positive rates for the prompt injection detection feature. This PR adjusts the confidence levels of various detections to reduce this. Another update that will be made is to set the default to 0.8 for the time being until we've adjusted things properly.
Type of Change
AI Assistance
Testing
Local testing using
just run-ui