Classifier model command injection detection in tool calls #6599

dorien-koelemeijer · 2026-01-21T01:19:03Z

Summary

This PR adds BERT-based command injection detection to complement the existing prompt injection detection system. The implementation introduces a dual-classifier architecture that can analyse both tool call commands (for command injection) and conversation history (for prompt injection) independently.

Changes:

Added ClassifierType enum to distinguish between command and prompt classifiers
Refactored UI to extract ClassifierEndpointInputs component and added command classifier settings
Only developer__shell tool commands are evaluated to prevent false positives from non-executable content (e.g., rm -rf appearing in code examples or test cases)
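
The shell-only gating described in the last item could look roughly like the sketch below. This is not the PR's code: `ToolCall`, `command_to_classify`, and the `command` argument key are hypothetical names chosen for illustration. Only the `developer__shell` tool name comes from the description above.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a tool call; the real type differs.
pub struct ToolCall {
    pub name: String,
    pub arguments: HashMap<String, String>,
}

/// Returns the command text to send to the command classifier, or `None`
/// when the call is not a `developer__shell` call and should be skipped.
pub fn command_to_classify(call: &ToolCall) -> Option<&str> {
    if call.name != "developer__shell" {
        return None;
    }
    // Assumption: the shell command lives under a "command" argument key.
    call.arguments.get("command").map(String::as_str)
}
```

With a filter like this, a string such as rm -rf inside a file-editing tool call never reaches the command classifier, which is the false-positive case the description mentions.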

Type of Change

AI Assistance

This PR was created or reviewed with AI assistance

Testing

Tested locally using `just run-ui`.
🚨 Need to do some final testing after cleanup before merging (in progress)

Notes

🚨 This PR shouldn't be merged until the SECURITY_ML_MODEL_MAPPING var in internal-releases has been updated (edit: this has now been merged, so should be all good).

Screenshots/Demos (for UX changes)

Prompt injection detection settings in UI have been updated (new toggle added):

Copilot

Pull request overview

This PR adds BERT-based command injection detection to complement the existing prompt injection detection system. The implementation introduces a dual-classifier architecture that can analyze both tool call commands (for command injection) and conversation history (for prompt injection) independently.

Changes:

Added ClassifierType enum to distinguish between command and prompt classifiers
Refactored UI to extract ClassifierEndpointInputs component and added command classifier settings
Extended ClassificationClient with from_model_type method to support model type-based lookups
Modified security alert messages to remove confidence percentage and add command previews

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| ui/desktop/src/components/settings/security/SecurityToggle.tsx | Refactored endpoint input UI into reusable component; added command classifier configuration section with toggle and endpoint inputs |
| crates/goose/src/security/scanner.rs | Introduced dual-classifier architecture with separate command and prompt classifiers; updated explanation builder to include command previews; modified logging messages |
| crates/goose/src/security/classification_client.rs | Added `model_type` field to `ModelEndpointInfo`; implemented `from_model_type` method for type-based model lookup; removed detailed error context and logging statements |
| crates/goose/src/security/security_inspector.rs | Simplified security alert message format by removing confidence percentage display |

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Copilot · 2026-01-21T03:50:15Z

crates/goose/src/security/scanner.rs

    pub fn with_ml_detection() -> Result<Self> {
-        let classifier_client = Self::create_classifier_from_config()?;
+        let command_classifier = Self::create_classifier(ClassifierType::Command).ok();
+        let prompt_classifier = Self::create_classifier(ClassifierType::Prompt).ok();
+
+        if command_classifier.is_none() && prompt_classifier.is_none() {
+            anyhow::bail!("ML detection enabled but no classifiers could be initialized");
+        }
+
        Ok(Self {
            pattern_matcher: PatternMatcher::new(),
-            classifier_client: Some(classifier_client),
+            command_classifier,
+            prompt_classifier,
        })
    }


The command classifier functionality lacks test coverage. Consider adding tests that verify: 1) command classifier is used when enabled and available, 2) falls back to pattern matching when command classifier is not available, 3) the dual-classifier architecture works correctly with both classifiers enabled.
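
The cases listed in this comment could be exercised along the following lines. This is a minimal sketch under stated assumptions, not the project's test suite: `StubClassifier`, `CommandDetection`, `with_classifiers`, and `command_detection` are hypothetical stand-ins so the example compiles without a live endpoint. Only the "fail when neither classifier initialises" rule and its error message are taken from the diff above.

```rust
/// Hypothetical stand-in for the real classification client.
#[derive(Debug, Clone, PartialEq)]
pub struct StubClassifier {
    pub endpoint: String,
}

/// Which path a shell command would be checked with (illustrative only).
#[derive(Debug, PartialEq)]
pub enum CommandDetection {
    Classifier,   // the ML command classifier is available
    PatternMatch, // fall back to pattern matching
}

#[derive(Debug)]
pub struct Scanner {
    command_classifier: Option<StubClassifier>,
    prompt_classifier: Option<StubClassifier>,
}

impl Scanner {
    /// Mirrors the rule in the diff: error only if neither classifier exists.
    pub fn with_classifiers(
        command: Option<StubClassifier>,
        prompt: Option<StubClassifier>,
    ) -> Result<Self, String> {
        if command.is_none() && prompt.is_none() {
            return Err("ML detection enabled but no classifiers could be initialized".to_string());
        }
        Ok(Self { command_classifier: command, prompt_classifier: prompt })
    }

    pub fn command_detection(&self) -> CommandDetection {
        match self.command_classifier {
            Some(_) => CommandDetection::Classifier,
            None => CommandDetection::PatternMatch,
        }
    }

    pub fn has_prompt_classifier(&self) -> bool {
        self.prompt_classifier.is_some()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    fn stub() -> Option<StubClassifier> {
        Some(StubClassifier { endpoint: "https://example.invalid".to_string() })
    }

    #[test]
    fn fails_when_no_classifier_is_available() {
        assert!(Scanner::with_classifiers(None, None).is_err());
    }

    #[test]
    fn falls_back_to_pattern_matching_without_command_classifier() {
        let scanner = Scanner::with_classifiers(None, stub()).unwrap();
        assert_eq!(scanner.command_detection(), CommandDetection::PatternMatch);
    }

    #[test]
    fn uses_both_classifiers_when_enabled() {
        let scanner = Scanner::with_classifiers(stub(), stub()).unwrap();
        assert_eq!(scanner.command_detection(), CommandDetection::Classifier);
        assert!(scanner.has_prompt_classifier());
    }
}
```

Injecting the optional classifiers through a constructor, rather than reading configuration inside it, is one way to make the three cases testable without network access.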

Copilot · 2026-01-21T03:50:15Z

crates/goose/src/security/classification_client.rs

+    pub fn from_model_type(model_type: &str, timeout_ms: Option<u64>) -> Result<Self> {
+        let mapping = serde_json::from_str::<ModelMappingConfig>(
+            &std::env::var("SECURITY_ML_MODEL_MAPPING")
+                .context("SECURITY_ML_MODEL_MAPPING environment variable not set")?,
+        )
+        .context("Failed to parse SECURITY_ML_MODEL_MAPPING JSON")?;
+
+        let (_, model_info) = mapping
+            .models
+            .iter()
+            .find(|(_, info)| info.model_type.as_deref() == Some(model_type))
+            .context(format!(
+                "No model with type '{}' found in SECURITY_ML_MODEL_MAPPING",
+                model_type
+            ))?;
+
+        Self::new(
+            model_info.endpoint.clone(),
+            timeout_ms,
+            None,
+            Some(model_info.extra_params.clone()),
+        )
+    }


The new from_model_type method lacks test coverage. Consider adding tests for: 1) successfully finding a model by type, 2) handling cases where no model with the specified type exists, 3) handling invalid JSON in SECURITY_ML_MODEL_MAPPING.
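
The first two cases can be tested without touching the environment if the lookup step is separated from parsing. The sketch below assumes exactly that split: `find_by_model_type` is a hypothetical helper, `ModelEndpointInfo` is reduced to two fields, and the type strings "command" and "prompt" and the example.invalid URLs are placeholders. The invalid-JSON case is left out because it depends on `serde_json`, which this standard-library-only example does not pull in.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `ModelEndpointInfo`; field names follow the diff.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelEndpointInfo {
    pub endpoint: String,
    pub model_type: Option<String>,
}

/// Finds a model whose `model_type` matches, using the same predicate as the
/// diff. Parsing SECURITY_ML_MODEL_MAPPING into the map is not shown here.
pub fn find_by_model_type<'a>(
    models: &'a HashMap<String, ModelEndpointInfo>,
    model_type: &str,
) -> Result<&'a ModelEndpointInfo, String> {
    models
        .values()
        .find(|info| info.model_type.as_deref() == Some(model_type))
        .ok_or_else(|| {
            format!(
                "No model with type '{}' found in SECURITY_ML_MODEL_MAPPING",
                model_type
            )
        })
}

#[cfg(test)]
mod tests {
    use super::*;

    fn sample_models() -> HashMap<String, ModelEndpointInfo> {
        let mut models = HashMap::new();
        models.insert(
            "model-a".to_string(),
            ModelEndpointInfo {
                endpoint: "https://example.invalid/command".to_string(),
                model_type: Some("command".to_string()),
            },
        );
        // An entry without a type must never match a typed lookup.
        models.insert(
            "model-b".to_string(),
            ModelEndpointInfo {
                endpoint: "https://example.invalid/untyped".to_string(),
                model_type: None,
            },
        );
        models
    }

    #[test]
    fn finds_model_by_type() {
        let models = sample_models();
        let info = find_by_model_type(&models, "command").unwrap();
        assert_eq!(info.endpoint, "https://example.invalid/command");
    }

    #[test]
    fn errors_when_type_is_missing() {
        let models = sample_models();
        let err = find_by_model_type(&models, "prompt").unwrap_err();
        assert!(err.contains("'prompt'"));
    }
}
```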

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

michaelneale

LGTM - didn't get a chance to hand test it. If you update to main soon it should hopefully clear out the live failure.

zanesq · 2026-01-26T23:38:38Z

hi @dorien-koelemeijer checking if you are still planning on merging?

Copilot AI review requested due to automatic review settings January 21, 2026 01:19

Copilot started reviewing on behalf of dorien-koelemeijer January 21, 2026 01:19

Copilot AI reviewed Jan 21, 2026

block deleted a comment from Copilot AI Jan 21, 2026

Copilot AI review requested due to automatic review settings January 21, 2026 02:37

Copilot started reviewing on behalf of dorien-koelemeijer January 21, 2026 02:37

Copilot AI reviewed Jan 21, 2026

Copilot AI review requested due to automatic review settings January 21, 2026 03:46

Copilot started reviewing on behalf of dorien-koelemeijer January 21, 2026 03:46

Copilot AI reviewed Jan 21, 2026

dorien-koelemeijer force-pushed the feat/command-injection-detection-tool-calls branch from 8be782f to 8eade1b January 21, 2026 04:22

Copilot AI review requested due to automatic review settings January 21, 2026 04:29

Copilot started reviewing on behalf of dorien-koelemeijer January 21, 2026 04:30

Copilot AI reviewed Jan 21, 2026

Copilot AI review requested due to automatic review settings January 22, 2026 06:10

Copilot started reviewing on behalf of dorien-koelemeijer January 22, 2026 06:11

Copilot AI reviewed Jan 22, 2026

dorien-koelemeijer changed the title ~~BERT-based command injection detection in tool calls [WIP]~~ BERT-based command injection detection in tool calls Jan 22, 2026

Copilot AI review requested due to automatic review settings January 22, 2026 07:53

Copilot started reviewing on behalf of dorien-koelemeijer January 22, 2026 07:54

Copilot AI reviewed Jan 22, 2026

dorien-koelemeijer force-pushed the feat/command-injection-detection-tool-calls branch from dbc571b to 189c6f0 January 22, 2026 07:58

Copilot AI reviewed Jan 22, 2026

michaelneale approved these changes Jan 23, 2026

dorien-koelemeijer added 6 commits January 23, 2026 12:05

first commit - still messy, needs to be cleaned up

fec954d

save wip - fixes to be made, average confidence score determination is buggy

15cd7f2

Specify 'model_type' in model mapping dict so we can distinguish between prompt and command injection models

1b29b0a

formatting changes (not sure why)

03632f7

Frontend change to allow OSS users to provide endpoint and token for command injection model

5da2df0

mini cleanup

700f9a6

dorien-koelemeijer added 9 commits January 23, 2026 12:05

fix log levels

88f83da

minor fix - copilot comments

d28e74f

fmt

d79a6bd

Fix build failures

0aa9cf4

fix

b28ae56

Only evaluate developer__shell tool calls

a540bf2

small fixes

b13dfcb

revert some changes to make code review little easier

4bc98a7

more cleanup - not tested yet

028effb

dorien-koelemeijer force-pushed the feat/command-injection-detection-tool-calls branch from 189c6f0 to 028effb January 23, 2026 02:11

dorien-koelemeijer merged commit 3dae127 into main Jan 27, 2026
19 checks passed

dorien-koelemeijer deleted the feat/command-injection-detection-tool-calls branch January 27, 2026 03:46

dorien-koelemeijer changed the title ~~BERT-based command injection detection in tool calls~~ Classifier command injection detection in tool calls Jan 27, 2026

dorien-koelemeijer changed the title ~~Classifier command injection detection in tool calls~~ Classifier model command injection detection in tool calls Jan 27, 2026

This was referenced Jan 29, 2026

chore(release): release version 1.22.0 (minor) #6812

Closed

chore(release): release version 1.22.0 (minor) #6813

Open


4 participants
