
Conversation

@dorien-koelemeijer (Collaborator) commented Jan 12, 2026

Summary

Small update to how ML-based prompt injection detection uses the conversation-context evaluation to decide which findings are flagged and which are not. To reduce false positives, if the prompt injection evaluation of the conversation context returns a SAFE result, only tool call findings with a CRITICAL risk level are flagged (just to be on the safe side).

This means:

  • Safe context + non-critical pattern → suppressed
  • Safe context + critical pattern → flagged
  • Unsafe context + any pattern → flagged
    (provided it is above the user-configured threshold, of course; see the sketch after this list)
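
In other words, the decision reduces to two booleans plus the existing confidence threshold. A minimal sketch of the intent (the helper below is illustrative only; the real logic lives in select_result_with_context_awareness, quoted further down in this review):

// Illustrative sketch of the decision described above, not the actual diff.
fn should_flag(
    context_is_safe: bool,            // ML evaluation of the conversation context came back SAFE
    tool_has_only_non_critical: bool, // no pattern match on the tool call is CRITICAL
    tool_confidence: f64,             // confidence of the tool call finding
    threshold: f64,                   // user-configured threshold
) -> bool {
    if context_is_safe && tool_has_only_non_critical {
        // Safe context + only non-critical patterns → suppress to cut false positives
        return false;
    }
    // Critical pattern or unsafe context → flag, provided the user threshold is met
    tool_confidence >= threshold
}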

Type of Change

  • Feature
  • Bug fix
  • Refactor / Code quality
  • Performance improvement
  • Documentation
  • Tests
  • Security fix
  • Build / Release
  • Other (specify below)

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

Local testing using `just run-ui` and various test cases.

Copilot AI review requested due to automatic review settings January 12, 2026 03:14
@dorien-koelemeijer marked this pull request as draft January 12, 2026 03:14
Copilot AI (Contributor) left a comment

Pull request overview

This PR improves the ML-based prompt injection detection logic by adding context-aware suppression of non-critical pattern matches when the conversation context is deemed safe by the ML classifier.

Changes:

  • Modified select_result_with_context_awareness to suppress non-critical pattern detections when conversation context ML confidence is below threshold
  • Added detailed debug logging for suppression events
  • Added Clone derive to DetailedScanResult struct
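
For reference, the last item above amounts to something like the following; the field types are inferred from the call sites quoted in this review, so treat this as an approximation rather than the real definition:

// Approximate shape of DetailedScanResult as used in this PR; field types are
// inferred from the quoted code (confidence: 0.0, ml_confidence: Some(0.3), ...),
// and any other derives on the real struct are omitted here.
#[derive(Clone)]
pub struct DetailedScanResult {
    pub confidence: f64,
    pub pattern_matches: Vec<PatternMatch>,
    pub ml_confidence: Option<f64>,
}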

Comment on lines 192 to 207
if context_is_safe && tool_has_only_non_critical {
    tracing::info!(
        "Suppressing non-critical pattern match due to safe context evaluation"
    );
    tracing::debug!(
        context_ml_confidence = ?context_result.ml_confidence,
        pattern_count = tool_result.pattern_matches.len(),
        max_pattern_risk = ?tool_result.pattern_matches.first().map(|m| &m.threat.risk_level),
        "Suppression conditions met: safe context + non-critical patterns"
    );

    DetailedScanResult {
        confidence: 0.0,
        pattern_matches: Vec::new(),
        ml_confidence: context_result.ml_confidence,
    }
Copilot AI commented Jan 12, 2026

This new suppression logic lacks test coverage. Add a test that verifies the behavior when context is safe (low ML confidence) and the tool result contains only non-critical pattern matches to ensure the suppression works as intended.

…ODEL_MAPPING' is set but 'SECURITY_PROMPT_CLASSIFIER_MODEL' is set to empty string
@block deleted a comment from Copilot AI Jan 12, 2026
@block deleted a comment from Copilot AI Jan 12, 2026
@dorien-koelemeijer force-pushed the fix/small-update-ml-based-prompt-injection-detection branch from 668fecf to 7143412 on January 12, 2026 06:07
@dorien-koelemeijer marked this pull request as ready for review January 12, 2026 06:25
Copilot AI review requested due to automatic review settings January 12, 2026 06:25
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment on lines 208 to 222
if context_is_safe && tool_has_only_non_critical {
    tracing::info!("Suppressing non-critical pattern match due to safe context evaluation");
    tracing::debug!(
        context_ml_confidence = ?context_result.ml_confidence,
        pattern_count = tool_result.pattern_matches.len(),
        max_pattern_risk = ?tool_result.pattern_matches.first().map(|m| &m.threat.risk_level),
        "Suppression conditions met: safe context + non-critical patterns"
    );

    DetailedScanResult {
        confidence: 0.0,
        pattern_matches: Vec::new(),
        ml_confidence: context_result.ml_confidence,
    }
} else if tool_result.confidence >= context_result.confidence {
Copilot AI commented Jan 12, 2026

Suppression only checks pattern risk levels but ignores ML confidence from the tool scan. If tool_result has high ML confidence with non-Critical patterns, it will be suppressed despite the ML model flagging the tool content as malicious. Add a check: tool_result.ml_confidence is None or below threshold before suppressing.
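
A minimal sketch of the extra guard being suggested, reusing the names from the diff above (how the threshold is plumbed through here is an assumption):

// Sketch of Copilot's suggested extra condition, not code from the PR:
// only suppress when the tool scan's own ML confidence is also absent or low.
let tool_ml_is_low = tool_result
    .ml_confidence
    .map(|c| c < threshold)
    .unwrap_or(true); // None (ML disabled for the tool scan) counts as low

if context_is_safe && tool_has_only_non_critical && tool_ml_is_low {
    // ... suppression path as in the diff above
}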

@dorien-koelemeijer (Collaborator, Author) replied

This is the entire point of the PR? Maybe I'm misunderstanding Copilot, but the idea is to reduce false positives for the time being and generally increase sensitivity again later on. The same goes for the comment below.

Comment on lines 363 to 489
#[test]
fn test_context_aware_suppression() {
    let scanner = PromptInjectionScanner::new();
    let threshold = 0.8;

    // Test case 1: Safe context + non-critical pattern → should suppress
    let result = scanner.select_result_with_context_awareness(
        DetailedScanResult {
            confidence: 0.6,
            pattern_matches: vec![PatternMatch {
                matched_text: "test".to_string(),
                threat: crate::security::patterns::ThreatPattern {
                    name: "test_pattern",
                    pattern: "test",
                    description: "Test pattern",
                    risk_level: crate::security::patterns::RiskLevel::Medium,
                    category: crate::security::patterns::ThreatCategory::CommandInjection,
                },
                start_pos: 0,
                end_pos: 4,
            }],
            ml_confidence: None,
        },
        DetailedScanResult {
            confidence: 0.3,
            pattern_matches: Vec::new(),
            ml_confidence: Some(0.3),
        },
        threshold,
    );
    assert_eq!(
        result.confidence, 0.0,
        "Should suppress non-critical pattern with safe context"
    );
    assert!(result.pattern_matches.is_empty());

    // Test case 2: Safe context + critical pattern → should NOT suppress
    let result = scanner.select_result_with_context_awareness(
        DetailedScanResult {
            confidence: 0.95,
            pattern_matches: vec![PatternMatch {
                matched_text: "rm -rf /".to_string(),
                threat: crate::security::patterns::ThreatPattern {
                    name: "rm_rf_root",
                    pattern: r"rm\s+-rf",
                    description: "Dangerous command",
                    risk_level: crate::security::patterns::RiskLevel::Critical,
                    category: crate::security::patterns::ThreatCategory::FileSystemDestruction,
                },
                start_pos: 0,
                end_pos: 9,
            }],
            ml_confidence: None,
        },
        DetailedScanResult {
            confidence: 0.3,
            pattern_matches: Vec::new(),
            ml_confidence: Some(0.3),
        },
        threshold,
    );
    assert!(
        result.confidence > 0.0,
        "Should NOT suppress critical pattern even with safe context"
    );
    assert!(!result.pattern_matches.is_empty());

    // Test case 3: Unsafe context + non-critical pattern → should NOT suppress
    let result = scanner.select_result_with_context_awareness(
        DetailedScanResult {
            confidence: 0.6,
            pattern_matches: vec![PatternMatch {
                matched_text: "test".to_string(),
                threat: crate::security::patterns::ThreatPattern {
                    name: "test_pattern",
                    pattern: "test",
                    description: "Test pattern",
                    risk_level: crate::security::patterns::RiskLevel::Medium,
                    category: crate::security::patterns::ThreatCategory::CommandInjection,
                },
                start_pos: 0,
                end_pos: 4,
            }],
            ml_confidence: None,
        },
        DetailedScanResult {
            confidence: 0.9,
            pattern_matches: Vec::new(),
            ml_confidence: Some(0.9),
        },
        threshold,
    );
    assert!(
        result.confidence > 0.0,
        "Should NOT suppress with unsafe context"
    );

    // Test case 4: No ML confidence (ML disabled) + non-critical pattern → should NOT suppress
    let result = scanner.select_result_with_context_awareness(
        DetailedScanResult {
            confidence: 0.6,
            pattern_matches: vec![PatternMatch {
                matched_text: "test".to_string(),
                threat: crate::security::patterns::ThreatPattern {
                    name: "test_pattern",
                    pattern: "test",
                    description: "Test pattern",
                    risk_level: crate::security::patterns::RiskLevel::Medium,
                    category: crate::security::patterns::ThreatCategory::CommandInjection,
                },
                start_pos: 0,
                end_pos: 4,
            }],
            ml_confidence: None,
        },
        DetailedScanResult {
            confidence: 0.0,
            pattern_matches: Vec::new(),
            ml_confidence: None,
        },
        threshold,
    );
    assert!(
        result.confidence > 0.0,
        "Should NOT suppress when ML is disabled"
    );
}
Copilot AI commented Jan 12, 2026

Test coverage is missing for the case where tool_result has high ML confidence with non-Critical patterns in a safe context. This scenario could incorrectly suppress ML-detected threats.
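
A sketch of the missing case, mirroring the structure of the test above; the test name is hypothetical and the assertion encodes Copilot's expectation rather than the merged behaviour:

#[test]
fn test_high_tool_ml_confidence_with_safe_context() {
    // Hypothetical test mirroring test case 1 above, but with the tool scan's
    // own ML confidence set high. Under the merged logic this is suppressed;
    // Copilot's point is that it arguably should not be.
    let scanner = PromptInjectionScanner::new();
    let threshold = 0.8;

    let result = scanner.select_result_with_context_awareness(
        DetailedScanResult {
            confidence: 0.6,
            pattern_matches: vec![PatternMatch {
                matched_text: "test".to_string(),
                threat: crate::security::patterns::ThreatPattern {
                    name: "test_pattern",
                    pattern: "test",
                    description: "Test pattern",
                    risk_level: crate::security::patterns::RiskLevel::Medium,
                    category: crate::security::patterns::ThreatCategory::CommandInjection,
                },
                start_pos: 0,
                end_pos: 4,
            }],
            ml_confidence: Some(0.9), // tool scan's ML flags the content
        },
        DetailedScanResult {
            confidence: 0.3,
            pattern_matches: Vec::new(),
            ml_confidence: Some(0.3), // conversation context looks safe
        },
        threshold,
    );

    // Copilot's expected behaviour (would fail against the merged logic):
    assert!(
        result.confidence > 0.0,
        "High tool ML confidence should not be suppressed"
    );
}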

Comment on lines +68 to +71
if let Some(first_model) = mapping.models.keys().next() {
    tracing::info!(
        default_model = %first_model,
        "SECURITY_ML_MODEL_MAPPING available but no model selected - using first available model as default"
Copilot AI commented Jan 12, 2026

HashMap iteration order is randomized in Rust, making default model selection non-deterministic across runs. For security features, consider sorting keys or using the first key alphabetically to ensure consistent behavior.

Suggested change

-   if let Some(first_model) = mapping.models.keys().next() {
-       tracing::info!(
-           default_model = %first_model,
-           "SECURITY_ML_MODEL_MAPPING available but no model selected - using first available model as default"
+   let mut model_names: Vec<_> = mapping.models.keys().cloned().collect();
+   model_names.sort();
+   if let Some(first_model) = model_names.first() {
+       tracing::info!(
+           default_model = %first_model,
+           "SECURITY_ML_MODEL_MAPPING available but no model selected - using first available model as default (lexicographically smallest)"

@dorien-koelemeijer (Collaborator, Author) replied

Not really a critical comment

@dorien-koelemeijer marked this pull request as draft January 12, 2026 06:57
@dorien-koelemeijer marked this pull request as ready for review January 12, 2026 07:00
@DOsinga (Collaborator) left a comment

Given that we increased the cut-off for pattern matching, does this make a difference?

    assert!(result.explanation.contains("Security threat"));
}

#[test]
@DOsinga (Collaborator) commented

this is 130 lines of code to essentially test 2 booleans and how they interact. Delete

    pattern_count = tool_result.pattern_matches.len(),
    max_pattern_risk = ?tool_result.pattern_matches.first().map(|m| &m.threat.risk_level),
    "Suppression conditions met: safe context + non-critical patterns"
);
@DOsinga (Collaborator) commented

Do you want to both info and debug log here?

@dorien-koelemeijer (Collaborator, Author) replied

Good point, this was helpful for local testing, but I can probably get rid of these for now, ty!

@dorien-koelemeijer (Collaborator, Author) commented

Given that we increased the cut-off for pattern matching, does this make a difference?

It will slightly. If the conversation context seems fully safe, I don't think it's necessary to alert on a tool call. However, if it's a critical tool call finding, we'd probably want to be safe rather than sorry (this isn't only for pattern matching, but also for going forward when we expand on the pattern matching).

@dorien-koelemeijer merged commit 8a9b090 into main Jan 12, 2026
18 of 20 checks passed
@dorien-koelemeijer deleted the fix/small-update-ml-based-prompt-injection-detection branch January 12, 2026 22:20
fbalicchia pushed a commit to fbalicchia/goose that referenced this pull request Jan 13, 2026
zanesq added a commit that referenced this pull request Jan 13, 2026
…ased

* 'main' of github.com:block/goose: (23 commits)
  Use Intl.NumberFormat for token formatting in SessionsInsights (#6466)
  feat(ui): format large and small token counts for readability (#6449)
  fix: apply subrecipes when using slash commands (#6460)
  Fix: exclude platform_schedule_tool in CLI (#6442)
  Fix: Small update in how ML-based prompt injection determines final result (#6439)
  docs: remove SSE transport and rename to Streamable HTTP (#6319)
  fix: correct Cloudinary extension command and env variable (#6453)
  fix: add gap between buttons in MacDesktopInstallButtons.js (#6452)
  refactor: include hidden dotfiles folders in file picker search (#6315)
  upgraded safe npm packages (#6450)
  chore(deps): bump react-router and react-router-dom in /ui/desktop (#6408)
  chore(deps): bump lru from 0.12.5 to 0.16.3 (#6379)
  chore(deps-dev): bump @modelcontextprotocol/sdk from 1.24.0 to 1.25.2 in /ui/desktop (#6375)
  fix: inconsistent API url requirement between desktop and CLI versions (#6419)
  feat(vertexai): Add streaming support (#6409)
  fix deeplink recipe launch cold start (#6210)
  Spell check setting (#6446)
  File bug directly (#6413)
  fix(cli): incorrect bin name in shell completions (#6444)
  Use crunchy from crates instead of git fork (#6415)
  ...
wpfleger96 added a commit that referenced this pull request Jan 13, 2026
* main: (41 commits)
  Allow customizing the new line keybinding in the CLI (#5956)
  Ask for permission in the CLI (#6475)
  docs: add Ralph Loop tutorial for multi-model iterative development (#6455)
  Remove gitignore fallback from gooseignore docs (#6480)
  fix: clean up result recording for code mode (#6343)
  fix(code_execution): handle model quirks with tool calls (#6352)
  feat(ui): support prefersBorder option for MCP Apps (#6465)
  fixed line breaks (#6459)
  Use Intl.NumberFormat for token formatting in SessionsInsights (#6466)
  feat(ui): format large and small token counts for readability (#6449)
  fix: apply subrecipes when using slash commands (#6460)
  Fix: exclude platform_schedule_tool in CLI (#6442)
  Fix: Small update in how ML-based prompt injection determines final result (#6439)
  docs: remove SSE transport and rename to Streamable HTTP (#6319)
  fix: correct Cloudinary extension command and env variable (#6453)
  fix: add gap between buttons in MacDesktopInstallButtons.js (#6452)
  refactor: include hidden dotfiles folders in file picker search (#6315)
  upgraded safe npm packages (#6450)
  chore(deps): bump react-router and react-router-dom in /ui/desktop (#6408)
  chore(deps): bump lru from 0.12.5 to 0.16.3 (#6379)
  ...