Skip to content

Conversation

@dorien-koelemeijer
Copy link
Collaborator

@dorien-koelemeijer dorien-koelemeijer commented Jan 21, 2026

Summary

This PR adds BERT-based command injection detection to complement the existing prompt injection detection system. The implementation introduces a dual-classifier architecture that can analyse both tool call commands (for command injection) and conversation history (for prompt injection) independently.

Changes:

  • Added ClassifierType enum to distinguish between command and prompt classifiers
  • Refactored UI to extract ClassifierEndpointInputs component and added command classifier settings
  • Only developer__shell tool commands are evaluated to prevent false positives from non-executable content (e.g., rm -rf appearing in code examples or test cases)

Type of Change

  • Feature
  • Bug fix
  • Refactor / Code quality
  • Performance improvement
  • Documentation
  • Tests
  • Security fix
  • Build / Release
  • Other (specify below)

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

Local testing using just run-ui.
🚨 Need to do some final testing after cleanup before merging (in progress)

Notes

🚨 This PR shouldn't be merged until the SECURITY_ML_MODEL_MAPPING var in internal-releases has been updated (edit: this has now been merged, so should be all good).

Screenshots/Demos (for UX changes)

Prompt injection detection settings in UI have been updated (new toggle added):

Screenshot 2026-01-21 at 12 41 37 pm Screenshot 2026-01-21 at 12 42 35 pm

Copilot AI review requested due to automatic review settings January 21, 2026 01:19
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds BERT-based command injection detection to complement the existing prompt injection detection system. The implementation introduces a dual-classifier architecture that can analyze both tool call commands (for command injection) and conversation history (for prompt injection) independently.

Changes:

  • Added ClassifierType enum to distinguish between command and prompt classifiers
  • Refactored UI to extract ClassifierEndpointInputs component and added command classifier settings
  • Extended ClassificationClient with from_model_type method to support model type-based lookups
  • Modified security alert messages to remove confidence percentage and add command previews

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

File Description
ui/desktop/src/components/settings/security/SecurityToggle.tsx Refactored endpoint input UI into reusable component; added command classifier configuration section with toggle and endpoint inputs
crates/goose/src/security/scanner.rs Introduced dual-classifier architecture with separate command and prompt classifiers; updated explanation builder to include command previews; modified logging messages
crates/goose/src/security/classification_client.rs Added model_type field to ModelEndpointInfo; implemented from_model_type method for type-based model lookup; removed detailed error context and logging statements
crates/goose/src/security/security_inspector.rs Simplified security alert message format by removing confidence percentage display

@block block deleted a comment from Copilot AI Jan 21, 2026
Copilot AI review requested due to automatic review settings January 21, 2026 02:37
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Copilot AI review requested due to automatic review settings January 21, 2026 03:46
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comment on lines 47 to 60
pub fn with_ml_detection() -> Result<Self> {
let classifier_client = Self::create_classifier_from_config()?;
let command_classifier = Self::create_classifier(ClassifierType::Command).ok();
let prompt_classifier = Self::create_classifier(ClassifierType::Prompt).ok();

if command_classifier.is_none() && prompt_classifier.is_none() {
anyhow::bail!("ML detection enabled but no classifiers could be initialized");
}

Ok(Self {
pattern_matcher: PatternMatcher::new(),
classifier_client: Some(classifier_client),
command_classifier,
prompt_classifier,
})
}
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command classifier functionality lacks test coverage. Consider adding tests that verify: 1) command classifier is used when enabled and available, 2) falls back to pattern matching when command classifier is not available, 3) the dual-classifier architecture works correctly with both classifiers enabled.

Copilot uses AI. Check for mistakes.
Comment on lines +91 to +116
pub fn from_model_type(model_type: &str, timeout_ms: Option<u64>) -> Result<Self> {
let mapping = serde_json::from_str::<ModelMappingConfig>(
&std::env::var("SECURITY_ML_MODEL_MAPPING")
.context("SECURITY_ML_MODEL_MAPPING environment variable not set")?,
)
.context("Failed to parse SECURITY_ML_MODEL_MAPPING JSON")?;

let (_, model_info) = mapping
.models
.iter()
.find(|(_, info)| info.model_type.as_deref() == Some(model_type))
.context(format!(
"No model with type '{}' found in SECURITY_ML_MODEL_MAPPING",
model_type
))?;

Self::new(
model_info.endpoint.clone(),
timeout_ms,
None,
Some(model_info.extra_params.clone()),
)
}
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new from_model_type method lacks test coverage. Consider adding tests for: 1) successfully finding a model by type, 2) handling cases where no model with the specified type exists, 3) handling invalid JSON in SECURITY_ML_MODEL_MAPPING.

Copilot uses AI. Check for mistakes.
@dorien-koelemeijer dorien-koelemeijer force-pushed the feat/command-injection-detection-tool-calls branch from 8be782f to 8eade1b Compare January 21, 2026 04:22
Copilot AI review requested due to automatic review settings January 21, 2026 04:29
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Copilot AI review requested due to automatic review settings January 22, 2026 06:10
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

@dorien-koelemeijer dorien-koelemeijer changed the title BERT-based command injection detection in tool calls [WIP] BERT-based command injection detection in tool calls Jan 22, 2026
Copilot AI review requested due to automatic review settings January 22, 2026 07:53
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

@dorien-koelemeijer dorien-koelemeijer force-pushed the feat/command-injection-detection-tool-calls branch from dbc571b to 189c6f0 Compare January 22, 2026 07:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Collaborator

@michaelneale michaelneale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - didn't get a chance to hand test it. if you update to main soon it should hopefiully clear out the live failure.

@dorien-koelemeijer dorien-koelemeijer force-pushed the feat/command-injection-detection-tool-calls branch from 189c6f0 to 028effb Compare January 23, 2026 02:11
@zanesq
Copy link
Collaborator

zanesq commented Jan 26, 2026

hi @dorien-koelemeijer checking if you are still planning on merging?

@dorien-koelemeijer dorien-koelemeijer merged commit 3dae127 into main Jan 27, 2026
19 checks passed
@dorien-koelemeijer dorien-koelemeijer deleted the feat/command-injection-detection-tool-calls branch January 27, 2026 03:46
@dorien-koelemeijer dorien-koelemeijer changed the title BERT-based command injection detection in tool calls Classifier command injection detection in tool calls Jan 27, 2026
@dorien-koelemeijer dorien-koelemeijer changed the title Classifier command injection detection in tool calls Classifier model command injection detection in tool calls Jan 27, 2026
zanesq added a commit that referenced this pull request Jan 27, 2026
…upport

* origin/main: (79 commits)
  fix[format/openai]: return error on empty msg. (#6511)
  Fix: ElevenLabs API Key Not Persisting (#6557)
  Logging uplift for model training purposes (command injection model) [Small change] (#6330)
  fix(goose): only send agent-session-id when a session exists (#6657)
  BERT-based command injection detection in tool calls (#6599)
  chore: [CONTRIBUTING.md] add Hermit to instructions (#6518)
  fix: update Gemini context limits (#6536)
  Document r slash command (#6724)
  Upgrade GitHub Actions to latest versions (#6700)
  fix: Manual compaction does not update context window. (#6682)
  Removed the Acceptable Usage Policy (#6204)
  Document spellcheck toggle (#6721)
  fix: docs workflow cleanup and prevent cancellations (#6713)
  Docs: file bug directly (#6718)
  fix: dispatch ADD_ACTIVE_SESSION event before navigating from "View All" (#6679)
  Speed up Databricks provider init by removing fetch of supported models (#6616)
  fix: correct typos in documentation and Justfile (#6686)
  docs: frameDomains and baseUriDomains for mcp apps (#6684)
  docs: add Remotion video creation tutorial (#6675)
  docs: export recipe and copy yaml (#6680)
  ...

# Conflicts:
#	ui/desktop/src/hooks/useChatStream.ts
katzdave added a commit that referenced this pull request Jan 27, 2026
…ovider

* 'main' of github.com:block/goose:
  fix slash and @ keyboard navigation popover background color (#6550)
  fix[format/openai]: return error on empty msg. (#6511)
  Fix: ElevenLabs API Key Not Persisting (#6557)
  Logging uplift for model training purposes (command injection model) [Small change] (#6330)
  fix(goose): only send agent-session-id when a session exists (#6657)
  BERT-based command injection detection in tool calls (#6599)
  chore: [CONTRIBUTING.md] add Hermit to instructions (#6518)
  fix: update Gemini context limits (#6536)
  Document r slash command (#6724)
  Upgrade GitHub Actions to latest versions (#6700)
zanesq added a commit that referenced this pull request Jan 27, 2026
* 'main' of github.com:block/goose:
  Create default gooseignore file when missing (#6498)
  fix slash and @ keyboard navigation popover background color (#6550)
  fix[format/openai]: return error on empty msg. (#6511)
  Fix: ElevenLabs API Key Not Persisting (#6557)
  Logging uplift for model training purposes (command injection model) [Small change] (#6330)
  fix(goose): only send agent-session-id when a session exists (#6657)
  BERT-based command injection detection in tool calls (#6599)
  chore: [CONTRIBUTING.md] add Hermit to instructions (#6518)
  fix: update Gemini context limits (#6536)
  Document r slash command (#6724)
  Upgrade GitHub Actions to latest versions (#6700)
  fix: Manual compaction does not update context window. (#6682)
  Removed the Acceptable Usage Policy (#6204)
  Document spellcheck toggle (#6721)
  fix: docs workflow cleanup and prevent cancellations (#6713)
  Docs: file bug directly (#6718)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants