Fix: Small update in how ML-based prompt injection determines final result #6439
Conversation
Pull request overview
This PR improves the ML-based prompt injection detection logic by adding context-aware suppression of non-critical pattern matches when the conversation context is deemed safe by the ML classifier.
Changes:
- Modified `select_result_with_context_awareness` to suppress non-critical pattern detections when the conversation-context ML confidence is below the threshold
- Added detailed debug logging for suppression events
- Added a `Clone` derive to the `DetailedScanResult` struct
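The decision the bullet points describe can be sketched roughly as follows. This is a hypothetical, standalone simplification: the struct fields mirror the diff below, but `should_suppress`, the `RiskLevel` variants, and the threshold handling are illustrative stand-ins, not the actual goose implementation.

```rust
// Hypothetical sketch of the context-aware suppression decision.
// Types and names mirror the quoted diff; this is not the real goose code.

#[derive(Clone, Debug, PartialEq)]
enum RiskLevel {
    Low,
    Medium,
    High,
    Critical,
}

#[derive(Clone, Debug)]
struct PatternMatch {
    risk_level: RiskLevel,
}

#[derive(Clone, Debug)]
struct DetailedScanResult {
    confidence: f32,
    pattern_matches: Vec<PatternMatch>,
    ml_confidence: Option<f32>,
}

/// Suppress non-critical pattern hits when the ML context scan says "safe".
fn should_suppress(
    tool: &DetailedScanResult,
    context: &DetailedScanResult,
    threshold: f32,
) -> bool {
    // Context counts as "safe" only when the ML classifier actually ran
    // and scored the conversation below the user-configured threshold.
    let context_is_safe = matches!(context.ml_confidence, Some(c) if c < threshold);
    // Critical pattern matches are never suppressed.
    let only_non_critical = !tool.pattern_matches.is_empty()
        && tool
            .pattern_matches
            .iter()
            .all(|m| m.risk_level != RiskLevel::Critical);
    context_is_safe && only_non_critical
}
```

Note that when `context.ml_confidence` is `None` (ML disabled), the sketch never suppresses, which matches the "be safe rather than sorry" reasoning later in this thread.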
```rust
if context_is_safe && tool_has_only_non_critical {
    tracing::info!(
        "Suppressing non-critical pattern match due to safe context evaluation"
    );
    tracing::debug!(
        context_ml_confidence = ?context_result.ml_confidence,
        pattern_count = tool_result.pattern_matches.len(),
        max_pattern_risk = ?tool_result.pattern_matches.first().map(|m| &m.threat.risk_level),
        "Suppression conditions met: safe context + non-critical patterns"
    );

    DetailedScanResult {
        confidence: 0.0,
        pattern_matches: Vec::new(),
        ml_confidence: context_result.ml_confidence,
    }
```
Copilot AI (Jan 12, 2026)
This new suppression logic lacks test coverage. Add a test that verifies the behavior when context is safe (low ML confidence) and the tool result contains only non-critical pattern matches to ensure the suppression works as intended.
…ODEL_MAPPING' is set but 'SECURITY_PROMPT_CLASSIFIER_MODEL' is set to empty string

Force-pushed from 668fecf to 7143412
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
```rust
if context_is_safe && tool_has_only_non_critical {
    tracing::info!("Suppressing non-critical pattern match due to safe context evaluation");
    tracing::debug!(
        context_ml_confidence = ?context_result.ml_confidence,
        pattern_count = tool_result.pattern_matches.len(),
        max_pattern_risk = ?tool_result.pattern_matches.first().map(|m| &m.threat.risk_level),
        "Suppression conditions met: safe context + non-critical patterns"
    );

    DetailedScanResult {
        confidence: 0.0,
        pattern_matches: Vec::new(),
        ml_confidence: context_result.ml_confidence,
    }
} else if tool_result.confidence >= context_result.confidence {
```
Copilot AI (Jan 12, 2026)
Suppression only checks pattern risk levels but ignores ML confidence from the tool scan. If `tool_result` has high ML confidence with non-Critical patterns, it will be suppressed even though the ML model flagged the tool content as malicious. Add a check that `tool_result.ml_confidence` is `None` or below the threshold before suppressing.
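The extra guard Copilot is asking for would look roughly like this. It is a hypothetical sketch of the proposed condition only (the maintainer declines it in the reply that follows), and `tool_ml_allows_suppression` is an invented helper name, not part of the goose scanner API:

```rust
// Hypothetical sketch of the additional guard Copilot suggests:
// only allow suppression when the tool scan's own ML score is
// absent (ML did not run) or below the user-configured threshold.
fn tool_ml_allows_suppression(tool_ml_confidence: Option<f32>, threshold: f32) -> bool {
    match tool_ml_confidence {
        None => true,             // ML did not evaluate the tool content
        Some(c) => c < threshold, // ML ran but did not flag it
    }
}
```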
This is the entire point of the PR? Maybe I'm misunderstanding Copilot, but the idea is to reduce false positives for the time being and generally increase sensitivity again later on. Same goes for the comment below.
crates/goose/src/security/scanner.rs
Outdated
```rust
#[test]
fn test_context_aware_suppression() {
    let scanner = PromptInjectionScanner::new();
    let threshold = 0.8;

    // Test case 1: Safe context + non-critical pattern → should suppress
    let result = scanner.select_result_with_context_awareness(
        DetailedScanResult {
            confidence: 0.6,
            pattern_matches: vec![PatternMatch {
                matched_text: "test".to_string(),
                threat: crate::security::patterns::ThreatPattern {
                    name: "test_pattern",
                    pattern: "test",
                    description: "Test pattern",
                    risk_level: crate::security::patterns::RiskLevel::Medium,
                    category: crate::security::patterns::ThreatCategory::CommandInjection,
                },
                start_pos: 0,
                end_pos: 4,
            }],
            ml_confidence: None,
        },
        DetailedScanResult {
            confidence: 0.3,
            pattern_matches: Vec::new(),
            ml_confidence: Some(0.3),
        },
        threshold,
    );
    assert_eq!(
        result.confidence, 0.0,
        "Should suppress non-critical pattern with safe context"
    );
    assert!(result.pattern_matches.is_empty());

    // Test case 2: Safe context + critical pattern → should NOT suppress
    let result = scanner.select_result_with_context_awareness(
        DetailedScanResult {
            confidence: 0.95,
            pattern_matches: vec![PatternMatch {
                matched_text: "rm -rf /".to_string(),
                threat: crate::security::patterns::ThreatPattern {
                    name: "rm_rf_root",
                    pattern: r"rm\s+-rf",
                    description: "Dangerous command",
                    risk_level: crate::security::patterns::RiskLevel::Critical,
                    category: crate::security::patterns::ThreatCategory::FileSystemDestruction,
                },
                start_pos: 0,
                end_pos: 9,
            }],
            ml_confidence: None,
        },
        DetailedScanResult {
            confidence: 0.3,
            pattern_matches: Vec::new(),
            ml_confidence: Some(0.3),
        },
        threshold,
    );
    assert!(
        result.confidence > 0.0,
        "Should NOT suppress critical pattern even with safe context"
    );
    assert!(!result.pattern_matches.is_empty());

    // Test case 3: Unsafe context + non-critical pattern → should NOT suppress
    let result = scanner.select_result_with_context_awareness(
        DetailedScanResult {
            confidence: 0.6,
            pattern_matches: vec![PatternMatch {
                matched_text: "test".to_string(),
                threat: crate::security::patterns::ThreatPattern {
                    name: "test_pattern",
                    pattern: "test",
                    description: "Test pattern",
                    risk_level: crate::security::patterns::RiskLevel::Medium,
                    category: crate::security::patterns::ThreatCategory::CommandInjection,
                },
                start_pos: 0,
                end_pos: 4,
            }],
            ml_confidence: None,
        },
        DetailedScanResult {
            confidence: 0.9,
            pattern_matches: Vec::new(),
            ml_confidence: Some(0.9),
        },
        threshold,
    );
    assert!(
        result.confidence > 0.0,
        "Should NOT suppress with unsafe context"
    );

    // Test case 4: No ML confidence (ML disabled) + non-critical pattern → should NOT suppress
    let result = scanner.select_result_with_context_awareness(
        DetailedScanResult {
            confidence: 0.6,
            pattern_matches: vec![PatternMatch {
                matched_text: "test".to_string(),
                threat: crate::security::patterns::ThreatPattern {
                    name: "test_pattern",
                    pattern: "test",
                    description: "Test pattern",
                    risk_level: crate::security::patterns::RiskLevel::Medium,
                    category: crate::security::patterns::ThreatCategory::CommandInjection,
                },
                start_pos: 0,
                end_pos: 4,
            }],
            ml_confidence: None,
        },
        DetailedScanResult {
            confidence: 0.0,
            pattern_matches: Vec::new(),
            ml_confidence: None,
        },
        threshold,
    );
    assert!(
        result.confidence > 0.0,
        "Should NOT suppress when ML is disabled"
    );
}
```
Copilot AI (Jan 12, 2026)
Test coverage is missing for the case where `tool_result` has high ML confidence with non-Critical patterns in a safe context. This scenario could incorrectly suppress ML-detected threats.
```rust
if let Some(first_model) = mapping.models.keys().next() {
    tracing::info!(
        default_model = %first_model,
        "SECURITY_ML_MODEL_MAPPING available but no model selected - using first available model as default"
```
Copilot AI (Jan 12, 2026)
HashMap iteration order is randomized in Rust, making default model selection non-deterministic across runs. For security features, consider sorting keys or using the first key alphabetically to ensure consistent behavior.
```diff
-if let Some(first_model) = mapping.models.keys().next() {
-    tracing::info!(
-        default_model = %first_model,
-        "SECURITY_ML_MODEL_MAPPING available but no model selected - using first available model as default"
+let mut model_names: Vec<_> = mapping.models.keys().cloned().collect();
+model_names.sort();
+if let Some(first_model) = model_names.first() {
+    tracing::info!(
+        default_model = %first_model,
+        "SECURITY_ML_MODEL_MAPPING available but no model selected - using first available model as default (lexicographically smallest)"
```
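An alternative to sorting on every call is to store the mapping in a `BTreeMap`, whose keys always iterate in sorted order, so `keys().next()` is deterministic by construction. This is a minimal sketch assuming the mapping's actual type could be changed; `default_model` is an illustrative helper, not goose code:

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for the model mapping. Unlike HashMap, a BTreeMap
// iterates its keys in lexicographic order, so the "first" key is the same
// on every run without an explicit sort.
fn default_model(models: &BTreeMap<String, String>) -> Option<&String> {
    models.keys().next()
}
```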
Not really a critical comment
DOsinga left a comment
Given that we increased the cut-off for pattern matching, does this make a difference?
crates/goose/src/security/scanner.rs
Outdated
```rust
    assert!(result.explanation.contains("Security threat"));
}

#[test]
```
This is 130 lines of code to essentially test 2 booleans and how they interact. Delete.
crates/goose/src/security/scanner.rs
Outdated
```rust
    pattern_count = tool_result.pattern_matches.len(),
    max_pattern_risk = ?tool_result.pattern_matches.first().map(|m| &m.threat.risk_level),
    "Suppression conditions met: safe context + non-critical patterns"
);
```
Do you want to both info- and debug-log here?
Good point, this was helpful for local testing but can probably get rid of these for now, ty!
It will, slightly. If the conversation context seems fully safe, I don't think it's necessary to alert on a tool call. However, if it's a critical tool call finding, we'd probably want to be safe rather than sorry (it's not only for pattern matching, but also going forward when we expand on the pattern matching).
Summary
Small update to how ML-based prompt injection detection, which evaluates the conversation context, determines which findings are flagged and which are not. To reduce false positives, if the prompt injection evaluation of the conversation context produces a SAFE result, only tool call findings with a CRITICAL risk level are flagged, just to be on the safe side (assuming, of course, that they score above the user-configured threshold).
Type of Change
AI Assistance
Testing
Local testing using `just run-ui` and various test cases