Fix: Small update in how ML-based prompt injection determines final result #6439
Conversation
Pull request overview
This PR improves the ML-based prompt injection detection logic by adding context-aware suppression of non-critical pattern matches when the conversation context is deemed safe by the ML classifier.
Changes:
- Modified `select_result_with_context_awareness` to suppress non-critical pattern detections when the conversation-context ML confidence is below the threshold
- Added detailed debug logging for suppression events
- Added a `Clone` derive to the `DetailedScanResult` struct
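The decision the bullet points describe can be sketched roughly as follows. This is a hypothetical, standalone simplification: the struct fields mirror the diff below, but `should_suppress`, the `RiskLevel` variants, and the threshold handling are illustrative stand-ins, not the actual goose implementation.

```rust
// Hypothetical sketch of the context-aware suppression decision.
// Types and names mirror the quoted diff; this is not the real goose code.

#[derive(Clone, Debug, PartialEq)]
enum RiskLevel {
    Low,
    Medium,
    High,
    Critical,
}

#[derive(Clone, Debug)]
struct PatternMatch {
    risk_level: RiskLevel,
}

#[derive(Clone, Debug)]
struct DetailedScanResult {
    confidence: f32,
    pattern_matches: Vec<PatternMatch>,
    ml_confidence: Option<f32>,
}

/// Suppress non-critical pattern hits when the ML context scan says "safe".
fn should_suppress(
    tool: &DetailedScanResult,
    context: &DetailedScanResult,
    threshold: f32,
) -> bool {
    // Context counts as "safe" only when the ML classifier actually ran
    // and scored the conversation below the user-configured threshold.
    let context_is_safe = matches!(context.ml_confidence, Some(c) if c < threshold);
    // Critical pattern matches are never suppressed.
    let only_non_critical = !tool.pattern_matches.is_empty()
        && tool
            .pattern_matches
            .iter()
            .all(|m| m.risk_level != RiskLevel::Critical);
    context_is_safe && only_non_critical
}
```

Note that when `context.ml_confidence` is `None` (ML disabled), the sketch never suppresses, which matches the "be safe rather than sorry" reasoning later in this thread.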
```rust
if context_is_safe && tool_has_only_non_critical {
    tracing::info!(
        "Suppressing non-critical pattern match due to safe context evaluation"
    );
    tracing::debug!(
        context_ml_confidence = ?context_result.ml_confidence,
        pattern_count = tool_result.pattern_matches.len(),
        max_pattern_risk = ?tool_result.pattern_matches.first().map(|m| &m.threat.risk_level),
        "Suppression conditions met: safe context + non-critical patterns"
    );

    DetailedScanResult {
        confidence: 0.0,
        pattern_matches: Vec::new(),
        ml_confidence: context_result.ml_confidence,
    }
```
Copilot AI (Jan 12, 2026)
This new suppression logic lacks test coverage. Add a test that verifies the behavior when context is safe (low ML confidence) and the tool result contains only non-critical pattern matches to ensure the suppression works as intended.
…ODEL_MAPPING' is set but 'SECURITY_PROMPT_CLASSIFIER_MODEL' is set to empty string

Force-pushed from 668fecf to 7143412
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
```rust
if context_is_safe && tool_has_only_non_critical {
    tracing::info!("Suppressing non-critical pattern match due to safe context evaluation");
    tracing::debug!(
        context_ml_confidence = ?context_result.ml_confidence,
        pattern_count = tool_result.pattern_matches.len(),
        max_pattern_risk = ?tool_result.pattern_matches.first().map(|m| &m.threat.risk_level),
        "Suppression conditions met: safe context + non-critical patterns"
    );

    DetailedScanResult {
        confidence: 0.0,
        pattern_matches: Vec::new(),
        ml_confidence: context_result.ml_confidence,
    }
} else if tool_result.confidence >= context_result.confidence {
```
Copilot AI (Jan 12, 2026)
Suppression only checks pattern risk levels but ignores ML confidence from the tool scan. If `tool_result` has high ML confidence with non-Critical patterns, it will be suppressed even though the ML model flagged the tool content as malicious. Add a check that `tool_result.ml_confidence` is `None` or below the threshold before suppressing.
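The extra guard Copilot is asking for would look roughly like this. It is a hypothetical sketch of the proposed condition only (the maintainer declines it in the reply that follows), and `tool_ml_allows_suppression` is an invented helper name, not part of the goose scanner API:

```rust
// Hypothetical sketch of the additional guard Copilot suggests:
// only allow suppression when the tool scan's own ML score is
// absent (ML did not run) or below the user-configured threshold.
fn tool_ml_allows_suppression(tool_ml_confidence: Option<f32>, threshold: f32) -> bool {
    match tool_ml_confidence {
        None => true,             // ML did not evaluate the tool content
        Some(c) => c < threshold, // ML ran but did not flag it
    }
}
```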
This is the entire point of the PR? Maybe I'm misunderstanding Copilot, but the idea is to reduce false positives for the time being and generally increase sensitivity again later on. Same goes for the comment below.
crates/goose/src/security/scanner.rs
Outdated
```rust
#[test]
fn test_context_aware_suppression() {
    let scanner = PromptInjectionScanner::new();
    let threshold = 0.8;

    // Test case 1: Safe context + non-critical pattern → should suppress
    let result = scanner.select_result_with_context_awareness(
        DetailedScanResult {
            confidence: 0.6,
            pattern_matches: vec![PatternMatch {
                matched_text: "test".to_string(),
                threat: crate::security::patterns::ThreatPattern {
                    name: "test_pattern",
                    pattern: "test",
                    description: "Test pattern",
                    risk_level: crate::security::patterns::RiskLevel::Medium,
                    category: crate::security::patterns::ThreatCategory::CommandInjection,
                },
                start_pos: 0,
                end_pos: 4,
            }],
            ml_confidence: None,
        },
        DetailedScanResult {
            confidence: 0.3,
            pattern_matches: Vec::new(),
            ml_confidence: Some(0.3),
        },
        threshold,
    );
    assert_eq!(
        result.confidence, 0.0,
        "Should suppress non-critical pattern with safe context"
    );
    assert!(result.pattern_matches.is_empty());

    // Test case 2: Safe context + critical pattern → should NOT suppress
    let result = scanner.select_result_with_context_awareness(
        DetailedScanResult {
            confidence: 0.95,
            pattern_matches: vec![PatternMatch {
                matched_text: "rm -rf /".to_string(),
                threat: crate::security::patterns::ThreatPattern {
                    name: "rm_rf_root",
                    pattern: r"rm\s+-rf",
                    description: "Dangerous command",
                    risk_level: crate::security::patterns::RiskLevel::Critical,
                    category: crate::security::patterns::ThreatCategory::FileSystemDestruction,
                },
                start_pos: 0,
                end_pos: 9,
            }],
            ml_confidence: None,
        },
        DetailedScanResult {
            confidence: 0.3,
            pattern_matches: Vec::new(),
            ml_confidence: Some(0.3),
        },
        threshold,
    );
    assert!(
        result.confidence > 0.0,
        "Should NOT suppress critical pattern even with safe context"
    );
    assert!(!result.pattern_matches.is_empty());

    // Test case 3: Unsafe context + non-critical pattern → should NOT suppress
    let result = scanner.select_result_with_context_awareness(
        DetailedScanResult {
            confidence: 0.6,
            pattern_matches: vec![PatternMatch {
                matched_text: "test".to_string(),
                threat: crate::security::patterns::ThreatPattern {
                    name: "test_pattern",
                    pattern: "test",
                    description: "Test pattern",
                    risk_level: crate::security::patterns::RiskLevel::Medium,
                    category: crate::security::patterns::ThreatCategory::CommandInjection,
                },
                start_pos: 0,
                end_pos: 4,
            }],
            ml_confidence: None,
        },
        DetailedScanResult {
            confidence: 0.9,
            pattern_matches: Vec::new(),
            ml_confidence: Some(0.9),
        },
        threshold,
    );
    assert!(
        result.confidence > 0.0,
        "Should NOT suppress with unsafe context"
    );

    // Test case 4: No ML confidence (ML disabled) + non-critical pattern → should NOT suppress
    let result = scanner.select_result_with_context_awareness(
        DetailedScanResult {
            confidence: 0.6,
            pattern_matches: vec![PatternMatch {
                matched_text: "test".to_string(),
                threat: crate::security::patterns::ThreatPattern {
                    name: "test_pattern",
                    pattern: "test",
                    description: "Test pattern",
                    risk_level: crate::security::patterns::RiskLevel::Medium,
                    category: crate::security::patterns::ThreatCategory::CommandInjection,
                },
                start_pos: 0,
                end_pos: 4,
            }],
            ml_confidence: None,
        },
        DetailedScanResult {
            confidence: 0.0,
            pattern_matches: Vec::new(),
            ml_confidence: None,
        },
        threshold,
    );
    assert!(
        result.confidence > 0.0,
        "Should NOT suppress when ML is disabled"
    );
}
```
Copilot AI (Jan 12, 2026)
Test coverage is missing for the case where `tool_result` has high ML confidence with non-Critical patterns in a safe context. This scenario could incorrectly suppress ML-detected threats.
```rust
if let Some(first_model) = mapping.models.keys().next() {
    tracing::info!(
        default_model = %first_model,
        "SECURITY_ML_MODEL_MAPPING available but no model selected - using first available model as default"
```
Copilot AI (Jan 12, 2026)
HashMap iteration order is randomized in Rust, making default model selection non-deterministic across runs. For security features, consider sorting keys or using the first key alphabetically to ensure consistent behavior.
```diff
-if let Some(first_model) = mapping.models.keys().next() {
-    tracing::info!(
-        default_model = %first_model,
-        "SECURITY_ML_MODEL_MAPPING available but no model selected - using first available model as default"
+let mut model_names: Vec<_> = mapping.models.keys().cloned().collect();
+model_names.sort();
+if let Some(first_model) = model_names.first() {
+    tracing::info!(
+        default_model = %first_model,
+        "SECURITY_ML_MODEL_MAPPING available but no model selected - using first available model as default (lexicographically smallest)"
```
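An alternative to sorting on every call is to store the mapping in a `BTreeMap`, whose keys always iterate in sorted order, so `keys().next()` is deterministic by construction. This is a minimal sketch assuming the mapping's actual type could be changed; `default_model` is an illustrative helper, not goose code:

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for the model mapping. Unlike HashMap, a BTreeMap
// iterates its keys in lexicographic order, so the "first" key is the same
// on every run without an explicit sort.
fn default_model(models: &BTreeMap<String, String>) -> Option<&String> {
    models.keys().next()
}
```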
Not really a critical comment
DOsinga left a comment
Given that we increased the cut-off for pattern matching, does this make a difference?
crates/goose/src/security/scanner.rs
Outdated
```rust
    assert!(result.explanation.contains("Security threat"));
}

#[test]
```
This is 130 lines of code to essentially test 2 booleans and how they interact. Delete.
crates/goose/src/security/scanner.rs
Outdated
```rust
    pattern_count = tool_result.pattern_matches.len(),
    max_pattern_risk = ?tool_result.pattern_matches.first().map(|m| &m.threat.risk_level),
    "Suppression conditions met: safe context + non-critical patterns"
);
```
Do you want to both info- and debug-log here?
Good point, this was helpful for local testing but can probably get rid of these for now, ty!
It will, slightly. If the conversation context seems fully safe, I don't think it's necessary to alert on a tool call. However, if it's a critical tool call finding, we'd probably want to be safe rather than sorry (it's not only for pattern matching, but also going forward when we expand on the pattern matching).
Summary
Small update to how ML-based prompt injection detection, which evaluates the conversation context, determines which findings are flagged and which are not. To reduce false positives, if the prompt injection evaluation of the conversation context produces a SAFE result, only tool call findings with a CRITICAL risk level are flagged, just to be on the safe side (assuming, of course, that they score above the user-configured threshold).
Type of Change
AI Assistance
Testing
Local testing using `just run-ui` and various test cases