Skip to content

Conversation

@dorien-koelemeijer
Copy link
Collaborator

@dorien-koelemeijer dorien-koelemeijer commented Jan 27, 2026

Summary

Simplifies the logic for combining tool and context confidence scores; the previous approach was slightly too aggressive in tuning down false positives by zeroing out confidence in certain cases. This refactor uses a more balanced threshold-based approach using dampening/boosting rules to reduce false positives.

We'll need to do some user testing to determine whether this approach is solid or whether we need to tweak this slightly later on.

Type of Change

  • Feature
  • Bug fix
  • Refactor / Code quality
  • Performance improvement
  • Documentation
  • Tests
  • Security fix
  • Build / Release
  • Other (specify below)

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

Quick local testing, but I can only test so much that way - will need broader user testing to determine if this is actually an improvement or if things need to be tweaked further.

Copilot AI review requested due to automatic review settings January 27, 2026 07:52
@dorien-koelemeijer dorien-koelemeijer changed the title Different approach to determining final confidence level Different approach to determining final confidence level [WIP] Jan 27, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts how the prompt-injection scanner combines the tool-level and conversation-context classifier outputs to derive a final security confidence score and logging details. The goal appears to be a more nuanced confidence-combination heuristic and richer logging while keeping the external ScanResult interface stable.

Changes:

  • Replace context-aware result selection (select_result_with_context_awareness) with a new numeric combination heuristic in combine_confidences, using both tool and context confidences.
  • Update logging in analyze_tool_call_with_context to structured fields, including per-signal confidences, presence of ML and pattern matches, and the effective malicious decision.
  • Build the final ScanResult from a synthetic DetailedScanResult that uses the combined confidence along with the tool’s pattern matches and ML confidence.

@dorien-koelemeijer dorien-koelemeijer changed the title Different approach to determining final confidence level [WIP] Different approach to determining final confidence level of prompt injection evaluation outcomes Jan 27, 2026
Copilot AI review requested due to automatic review settings January 27, 2026 08:27
@dorien-koelemeijer dorien-koelemeijer force-pushed the feat/improve-prompt-injection-final-confidence-determination branch from c098b4e to 95465f8 Compare January 27, 2026 08:28
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

Copy link
Collaborator

@michaelneale michaelneale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems good

@dorien-koelemeijer dorien-koelemeijer merged commit 66e4a1a into main Jan 29, 2026
17 checks passed
@dorien-koelemeijer dorien-koelemeijer deleted the feat/improve-prompt-injection-final-confidence-determination branch January 29, 2026 00:41
michaelneale added a commit that referenced this pull request Jan 29, 2026
* main: (30 commits)
  Different approach to determining final confidence level of prompt injection evaluation outcomes (#6729)
  fix: read_resource_tool deadlock causing test_compaction to hang (#6737)
  Upgrade error handling (#6747)
  Fix/filter audience 6703 local (#6773)
  chore: re-sync package-lock.json (#6783)
  upgrade electron to 39.3.0 (#6779)
  allow skipping providers in test_providers.sh (#6778)
  fix: enable custom model entry for OpenRouter provider (#6761)
  Remove codex skills flag support (#6775)
  Improve mcp test (#6671)
  Feat/anthropic custom headers (#6774)
  Fix/GitHub copilot error handling 5845 (#6771)
  fix(ui): respect width parameter in MCP app size-changed notifications (#6376)
  fix: address compilation issue in main (#6776)
  Upgrade GitHub Actions for Node 24 compatibility (#6699)
  fix(google): preserve thought signatures in streaming responses (#6708)
  added reduce motion support for css animations and streaming text (#6551)
  fix: Re-enable subagents for Gemini models (#6513)
  fix(google): use parametersJsonSchema for full JSON Schema support (#6555)
  fix: respect GOOSE_CLI_MIN_PRIORITY for shell streaming output (#6558)
  ...
zanesq added a commit that referenced this pull request Jan 29, 2026
* 'main' of github.com:block/goose: (62 commits)
  Swap canonical model from openrouter to models.dev (#6625)
  Hook thinking status (#6815)
  Fetch new skills hourly (#6814)
  copilot instructions: Update "No prerelease docs" instruction (#6795)
  refactor: centralize audience filtering before providers receive messages (#6728)
  update doc to remind contributors to activate hermit and document minimal npm and node version (#6727)
  nit: don't spit out compaction when in term mode as it fills up the screen (#6799)
  fix: correct tool support detection in Tetrate provider model fetching (#6808)
  Session manager fixes (#6809)
  fix(desktop): handle quoted paths with spaces in extension commands (#6430)
  fix: we can default gooseignore without writing it out (#6802)
  fix broken link (#6810)
  docs: add Beads MCP extension tutorial (#6792)
  feat(goose): add support for AWS_BEARER_TOKEN_BEDROCK environment variable (#6739)
  [docs] Add OSS Skills Marketplace (#6752)
  feat: make skills available in codemode (#6763)
  Fix: Recipe Extensions Not Loading in Desktop (#6777)
  Different approach to determining final confidence level of prompt injection evaluation outcomes (#6729)
  fix: read_resource_tool deadlock causing test_compaction to hang (#6737)
  Upgrade error handling (#6747)
  ...
zanesq added a commit that referenced this pull request Jan 29, 2026
…sion-session

* 'main' of github.com:block/goose: (78 commits)
  copilot instructions: Update "No prerelease docs" instruction (#6795)
  refactor: centralize audience filtering before providers receive messages (#6728)
  update doc to remind contributors to activate hermit and document minimal npm and node version (#6727)
  nit: don't spit out compaction when in term mode as it fills up the screen (#6799)
  fix: correct tool support detection in Tetrate provider model fetching (#6808)
  Session manager fixes (#6809)
  fix(desktop): handle quoted paths with spaces in extension commands (#6430)
  fix: we can default gooseignore without writing it out (#6802)
  fix broken link (#6810)
  docs: add Beads MCP extension tutorial (#6792)
  feat(goose): add support for AWS_BEARER_TOKEN_BEDROCK environment variable (#6739)
  [docs] Add OSS Skills Marketplace (#6752)
  feat: make skills available in codemode (#6763)
  Fix: Recipe Extensions Not Loading in Desktop (#6777)
  Different approach to determining final confidence level of prompt injection evaluation outcomes (#6729)
  fix: read_resource_tool deadlock causing test_compaction to hang (#6737)
  Upgrade error handling (#6747)
  Fix/filter audience 6703 local (#6773)
  chore: re-sync package-lock.json (#6783)
  upgrade electron to 39.3.0 (#6779)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants