-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Classifier model command injection detection in tool calls #6599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Classifier model command injection detection in tool calls #6599
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds BERT-based command injection detection to complement the existing prompt injection detection system. The implementation introduces a dual-classifier architecture that can analyze both tool call commands (for command injection) and conversation history (for prompt injection) independently.
Changes:
- Added
ClassifierTypeenum to distinguish between command and prompt classifiers - Refactored UI to extract
ClassifierEndpointInputscomponent and added command classifier settings - Extended
ClassificationClientwithfrom_model_typemethod to support model type-based lookups - Modified security alert messages to remove confidence percentage and add command previews
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| ui/desktop/src/components/settings/security/SecurityToggle.tsx | Refactored endpoint input UI into reusable component; added command classifier configuration section with toggle and endpoint inputs |
| crates/goose/src/security/scanner.rs | Introduced dual-classifier architecture with separate command and prompt classifiers; updated explanation builder to include command previews; modified logging messages |
| crates/goose/src/security/classification_client.rs | Added model_type field to ModelEndpointInfo; implemented from_model_type method for type-based model lookup; removed detailed error context and logging statements |
| crates/goose/src/security/security_inspector.rs | Simplified security alert message format by removing confidence percentage display |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| pub fn with_ml_detection() -> Result<Self> { | ||
| let classifier_client = Self::create_classifier_from_config()?; | ||
| let command_classifier = Self::create_classifier(ClassifierType::Command).ok(); | ||
| let prompt_classifier = Self::create_classifier(ClassifierType::Prompt).ok(); | ||
|
|
||
| if command_classifier.is_none() && prompt_classifier.is_none() { | ||
| anyhow::bail!("ML detection enabled but no classifiers could be initialized"); | ||
| } | ||
|
|
||
| Ok(Self { | ||
| pattern_matcher: PatternMatcher::new(), | ||
| classifier_client: Some(classifier_client), | ||
| command_classifier, | ||
| prompt_classifier, | ||
| }) | ||
| } |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The command classifier functionality lacks test coverage. Consider adding tests that verify: 1) command classifier is used when enabled and available, 2) falls back to pattern matching when command classifier is not available, 3) the dual-classifier architecture works correctly with both classifiers enabled.
| pub fn from_model_type(model_type: &str, timeout_ms: Option<u64>) -> Result<Self> { | ||
| let mapping = serde_json::from_str::<ModelMappingConfig>( | ||
| &std::env::var("SECURITY_ML_MODEL_MAPPING") | ||
| .context("SECURITY_ML_MODEL_MAPPING environment variable not set")?, | ||
| ) | ||
| .context("Failed to parse SECURITY_ML_MODEL_MAPPING JSON")?; | ||
|
|
||
| let (_, model_info) = mapping | ||
| .models | ||
| .iter() | ||
| .find(|(_, info)| info.model_type.as_deref() == Some(model_type)) | ||
| .context(format!( | ||
| "No model with type '{}' found in SECURITY_ML_MODEL_MAPPING", | ||
| model_type | ||
| ))?; | ||
|
|
||
| Self::new( | ||
| model_info.endpoint.clone(), | ||
| timeout_ms, | ||
| None, | ||
| Some(model_info.extra_params.clone()), | ||
| ) | ||
| } |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new from_model_type method lacks test coverage. Consider adding tests for: 1) successfully finding a model by type, 2) handling cases where no model with the specified type exists, 3) handling invalid JSON in SECURITY_ML_MODEL_MAPPING.
8be782f to
8eade1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
dbc571b to
189c6f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
michaelneale
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - didn't get a chance to hand test it. if you update to main soon it should hopefiully clear out the live failure.
…een prompt and command injection models
…command injection model
189c6f0 to
028effb
Compare
|
hi @dorien-koelemeijer checking if you are still planning on merging? |
…upport * origin/main: (79 commits) fix[format/openai]: return error on empty msg. (#6511) Fix: ElevenLabs API Key Not Persisting (#6557) Logging uplift for model training purposes (command injection model) [Small change] (#6330) fix(goose): only send agent-session-id when a session exists (#6657) BERT-based command injection detection in tool calls (#6599) chore: [CONTRIBUTING.md] add Hermit to instructions (#6518) fix: update Gemini context limits (#6536) Document r slash command (#6724) Upgrade GitHub Actions to latest versions (#6700) fix: Manual compaction does not update context window. (#6682) Removed the Acceptable Usage Policy (#6204) Document spellcheck toggle (#6721) fix: docs workflow cleanup and prevent cancellations (#6713) Docs: file bug directly (#6718) fix: dispatch ADD_ACTIVE_SESSION event before navigating from "View All" (#6679) Speed up Databricks provider init by removing fetch of supported models (#6616) fix: correct typos in documentation and Justfile (#6686) docs: frameDomains and baseUriDomains for mcp apps (#6684) docs: add Remotion video creation tutorial (#6675) docs: export recipe and copy yaml (#6680) ... # Conflicts: # ui/desktop/src/hooks/useChatStream.ts
…ovider * 'main' of github.com:block/goose: fix slash and @ keyboard navigation popover background color (#6550) fix[format/openai]: return error on empty msg. (#6511) Fix: ElevenLabs API Key Not Persisting (#6557) Logging uplift for model training purposes (command injection model) [Small change] (#6330) fix(goose): only send agent-session-id when a session exists (#6657) BERT-based command injection detection in tool calls (#6599) chore: [CONTRIBUTING.md] add Hermit to instructions (#6518) fix: update Gemini context limits (#6536) Document r slash command (#6724) Upgrade GitHub Actions to latest versions (#6700)
* 'main' of github.com:block/goose: Create default gooseignore file when missing (#6498) fix slash and @ keyboard navigation popover background color (#6550) fix[format/openai]: return error on empty msg. (#6511) Fix: ElevenLabs API Key Not Persisting (#6557) Logging uplift for model training purposes (command injection model) [Small change] (#6330) fix(goose): only send agent-session-id when a session exists (#6657) BERT-based command injection detection in tool calls (#6599) chore: [CONTRIBUTING.md] add Hermit to instructions (#6518) fix: update Gemini context limits (#6536) Document r slash command (#6724) Upgrade GitHub Actions to latest versions (#6700) fix: Manual compaction does not update context window. (#6682) Removed the Acceptable Usage Policy (#6204) Document spellcheck toggle (#6721) fix: docs workflow cleanup and prevent cancellations (#6713) Docs: file bug directly (#6718)
Summary
This PR adds BERT-based command injection detection to complement the existing prompt injection detection system. The implementation introduces a dual-classifier architecture that can analyse both tool call commands (for command injection) and conversation history (for prompt injection) independently.
Changes:
developer__shelltool commands are evaluated to prevent false positives from non-executable content (e.g.,rm -rfappearing in code examples or test cases)Type of Change
AI Assistance
Testing
Local testing using
just run-ui.🚨 Need to do some final testing after cleanup before merging (in progress)
Notes
🚨 This PR shouldn't be merged until the
SECURITY_ML_MODEL_MAPPINGvar in internal-releases has been updated (edit: this has now been merged, so should be all good).Screenshots/Demos (for UX changes)
Prompt injection detection settings in UI have been updated (new toggle added):