@lifeizhou-ap lifeizhou-ap commented Jan 27, 2026

Summary

Why:
The test_compaction smoke test hung indefinitely until GitHub Actions cancelled it after 6 hours. This now happens frequently: https://github.com/block/goose/actions/workflows/pr-smoke-test.yml?query=is%3Acancelled

Root Cause:
The compaction smoke test does not enable the developer extension. Without it, the LLM sometimes tried to use extensionmanager__read_resource to read a file (e.g. hello.txt). This caused a deadlock:

dispatch_tool_call("extensionmanager__read_resource")
└── Acquires LOCK on extensionmanager
    └── read_resource_tool() iterates ALL extensions
        └── Tries extensionmanager → needs same LOCK → 💀 DEADLOCK
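
The lock pattern in the diagram above can be sketched in Rust. This is a minimal illustration, not the goose code: `LockedExtension` and `would_block_while_holding` are hypothetical names, and it uses `std::sync::Mutex` with `try_lock` so that the re-entrant acquisition is reported instead of hanging (the real code may well use a different lock type).

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for an extension guarded by its own lock.
pub struct LockedExtension {
    pub name: &'static str,
    pub lock: Mutex<()>,
}

/// Mimics a dispatcher that holds the lock of `extensions[held]` while it
/// probes every extension. Returns the names whose lock could not be taken,
/// i.e. the ones where a blocking `lock()` would never return.
pub fn would_block_while_holding(
    extensions: &[LockedExtension],
    held: usize,
) -> Vec<&'static str> {
    // The dispatcher acquires the lock first and keeps it for the whole call.
    let _guard = extensions[held].lock.lock().unwrap();

    extensions
        .iter()
        // `try_lock` fails immediately if the lock is already held,
        // which makes the would-be deadlock observable.
        .filter(|ext| ext.lock.try_lock().is_err())
        .map(|ext| ext.name)
        .collect()
}
```

With two extensions named `apps` and `extensionmanager`, holding the second one while probing both should report only `extensionmanager`, which is the step that hangs in the diagram.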

The bug was intermittent because of HashMap's random iteration order: when apps was checked first the call succeeded; when extensionmanager was checked first it deadlocked.
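
Rust's default `HashMap` hasher is randomly seeded, so iteration order is not stable between runs. When debugging an order-dependent bug like this one, a fixed iteration order can make the behaviour reproducible. The helper below is only a sketch of that idea; `names_in_fixed_order` is a hypothetical name and is not part of this PR.

```rust
use std::collections::HashMap;

/// Returns the map's keys in sorted order, so the result does not depend on
/// the hash iteration order of the map.
pub fn names_in_fixed_order<V>(extensions: &HashMap<String, V>) -> Vec<String> {
    let mut names: Vec<String> = extensions.keys().cloned().collect();
    names.sort();
    names
}
```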

How to reproduce the problem

goose run --text "use the read_resource tool to read the resource with uri ui://apps/clock"

This command does not always hang (it depends on the iteration order of the extensions HashMap), but it hangs much more often than the compaction smoke test.

Fix:
Only iterate over extensions that actually support resources (matching the existing list_resources() pattern).
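
The shape of the fix can be sketched as follows. This is a simplified model under stated assumptions, not the actual change in extension_manager.rs: `Manager`, `Extension`, `supports_resources`, and the method names are hypothetical, and a `std::sync::Mutex` stands in for whatever lock the real code uses. The sketch snapshots the names of resource-capable extensions, releases the lock, and only then reads from each candidate.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical, simplified extension record.
pub struct Extension {
    pub supports_resources: bool,
    pub resources: HashMap<String, String>,
}

/// Hypothetical manager holding all extensions behind one lock.
pub struct Manager {
    extensions: Mutex<HashMap<String, Extension>>,
}

impl Manager {
    pub fn new(extensions: HashMap<String, Extension>) -> Self {
        Manager { extensions: Mutex::new(extensions) }
    }

    /// Snapshot the names of extensions that support resources.
    /// The guard is dropped when this function returns.
    pub fn resource_capable_names(&self) -> Vec<String> {
        let guard = self.extensions.lock().unwrap();
        let mut names: Vec<String> = guard
            .iter()
            .filter(|(_, ext)| ext.supports_resources)
            .map(|(name, _)| name.clone())
            .collect();
        names.sort(); // fixed order keeps the sketch deterministic
        names
    }

    /// Read one resource from one extension, taking the lock only briefly.
    fn read_from(&self, name: &str, uri: &str) -> Option<String> {
        let guard = self.extensions.lock().unwrap();
        let value = guard
            .get(name)
            .and_then(|ext| ext.resources.get(uri))
            .cloned();
        value
    }

    /// Try each resource-capable extension in turn. No lock is held across
    /// the loop, so `read_from` can always acquire it.
    pub fn read_resource(&self, uri: &str) -> Option<String> {
        self.resource_capable_names()
            .iter()
            .find_map(|name| self.read_from(name, uri))
    }
}
```

In this model an extension that does not support resources is never visited, so a manager-like extension that would need its own lock is skipped altogether.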

Note
In current test_compaction.sh

(cd "$TESTDIR" && "$GOOSE_BIN" run --text "list files and read hello.txt" 2>&1) | tee "$OUTPUT"

We could later change the prompt to something simpler that does not need the developer tool (e.g. "describe Melbourne AU in 100 words"), since the test mainly exercises the compaction feature. For now I have left the prompt as it is so that we can observe whether the problem is fixed.

Type of Change

  • Feature
  • Bug fix
  • Refactor / Code quality
  • Performance improvement
  • Documentation
  • Tests
  • Security fix
  • Build / Release
  • Other (specify below)

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

Manual testing with the command that reproduces the problem

@lifeizhou-ap lifeizhou-ap marked this pull request as ready for review January 28, 2026 05:45
Copilot AI review requested due to automatic review settings January 28, 2026 05:45
@lifeizhou-ap lifeizhou-ap changed the title from "debug" to "fix: deadlock on read_resource tool" Jan 28, 2026

crates/goose/src/prompts/system.md
- {{extension.name}} supports resources, you can use platform__read_resource,
- and platform__list_resources on this extension.
+ {{extension.name}} supports resources, you can use extensionmanager__read_resource,
+ and extensionmanager__list_resources on this extension.

Collaborator Author

At the moment, even with platform__read_resource in the system prompt, the LLM is smart enough to call the extensionmanager__read_resource tool. However, it is better to update the prompt with the proper extension name.

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a critical deadlock bug in the read_resource_tool method that caused the test_compaction smoke test to hang indefinitely. The deadlock occurred when the LLM tried to use extensionmanager__read_resource without specifying an extension name, causing the code to iterate through all extensions while holding a lock that was needed by the called methods.

Changes:

  • Fixed deadlock in read_resource_tool by collecting extension names before iterating and filtering to only resource-capable extensions
  • Corrected tool name documentation from incorrect platform__* to correct extensionmanager__* prefix

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File | Description
crates/goose/src/agents/extension_manager.rs | Fixed deadlock by filtering extensions to only those supporting resources and collecting names before iteration, matching the pattern used in list_resources()
crates/goose/src/prompts/system.md | Corrected tool names from platform__read_resource and platform__list_resources to extensionmanager__read_resource and extensionmanager__list_resources

@lifeizhou-ap lifeizhou-ap changed the title from "fix: deadlock on read_resource tool" to "fix: read_resource_tool deadlock causing test_compaction to hang" Jan 28, 2026
Collaborator

@DOsinga DOsinga left a comment

thanks! nice work

* main: (47 commits)
  Upgrade error handling (#6747)
  Fix/filter audience 6703 local (#6773)
  chore: re-sync package-lock.json (#6783)
  upgrade electron to 39.3.0 (#6779)
  allow skipping providers in test_providers.sh (#6778)
  fix: enable custom model entry for OpenRouter provider (#6761)
  Remove codex skills flag support (#6775)
  Improve mcp test (#6671)
  Feat/anthropic custom headers (#6774)
  Fix/GitHub copilot error handling 5845 (#6771)
  fix(ui): respect width parameter in MCP app size-changed notifications (#6376)
  fix: address compilation issue in main (#6776)
  Upgrade GitHub Actions for Node 24 compatibility (#6699)
  fix(google): preserve thought signatures in streaming responses (#6708)
  added reduce motion support for css animations and streaming text (#6551)
  fix: Re-enable subagents for Gemini models (#6513)
  fix(google): use parametersJsonSchema for full JSON Schema support (#6555)
  fix: respect GOOSE_CLI_MIN_PRIORITY for shell streaming output (#6558)
  feat: add requires_auth flag for custom providers without authentication (#6705)
  fix: normalize extension names consistently in ExtensionManager (#6529)
  ...
Copilot AI review requested due to automatic review settings January 28, 2026 23:34
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@lifeizhou-ap lifeizhou-ap merged commit d5bac14 into main Jan 29, 2026
24 checks passed
@lifeizhou-ap lifeizhou-ap deleted the lifei/debug-long-compact-test branch January 29, 2026 00:09
michaelneale added a commit that referenced this pull request Jan 29, 2026
* main: (30 commits)
  Different approach to determining final confidence level of prompt injection evaluation outcomes (#6729)
  fix: read_resource_tool deadlock causing test_compaction to hang (#6737)
  Upgrade error handling (#6747)
  Fix/filter audience 6703 local (#6773)
  chore: re-sync package-lock.json (#6783)
  upgrade electron to 39.3.0 (#6779)
  allow skipping providers in test_providers.sh (#6778)
  fix: enable custom model entry for OpenRouter provider (#6761)
  Remove codex skills flag support (#6775)
  Improve mcp test (#6671)
  Feat/anthropic custom headers (#6774)
  Fix/GitHub copilot error handling 5845 (#6771)
  fix(ui): respect width parameter in MCP app size-changed notifications (#6376)
  fix: address compilation issue in main (#6776)
  Upgrade GitHub Actions for Node 24 compatibility (#6699)
  fix(google): preserve thought signatures in streaming responses (#6708)
  added reduce motion support for css animations and streaming text (#6551)
  fix: Re-enable subagents for Gemini models (#6513)
  fix(google): use parametersJsonSchema for full JSON Schema support (#6555)
  fix: respect GOOSE_CLI_MIN_PRIORITY for shell streaming output (#6558)
  ...
zanesq added a commit that referenced this pull request Jan 29, 2026
* 'main' of github.com:block/goose: (62 commits)
  Swap canonical model from openrouter to models.dev (#6625)
  Hook thinking status (#6815)
  Fetch new skills hourly (#6814)
  copilot instructions: Update "No prerelease docs" instruction (#6795)
  refactor: centralize audience filtering before providers receive messages (#6728)
  update doc to remind contributors to activate hermit and document minimal npm and node version (#6727)
  nit: don't spit out compaction when in term mode as it fills up the screen (#6799)
  fix: correct tool support detection in Tetrate provider model fetching (#6808)
  Session manager fixes (#6809)
  fix(desktop): handle quoted paths with spaces in extension commands (#6430)
  fix: we can default gooseignore without writing it out (#6802)
  fix broken link (#6810)
  docs: add Beads MCP extension tutorial (#6792)
  feat(goose): add support for AWS_BEARER_TOKEN_BEDROCK environment variable (#6739)
  [docs] Add OSS Skills Marketplace (#6752)
  feat: make skills available in codemode (#6763)
  Fix: Recipe Extensions Not Loading in Desktop (#6777)
  Different approach to determining final confidence level of prompt injection evaluation outcomes (#6729)
  fix: read_resource_tool deadlock causing test_compaction to hang (#6737)
  Upgrade error handling (#6747)
  ...
zanesq added a commit that referenced this pull request Jan 29, 2026
…sion-session

* 'main' of github.com:block/goose: (78 commits)
  copilot instructions: Update "No prerelease docs" instruction (#6795)
  refactor: centralize audience filtering before providers receive messages (#6728)
  update doc to remind contributors to activate hermit and document minimal npm and node version (#6727)
  nit: don't spit out compaction when in term mode as it fills up the screen (#6799)
  fix: correct tool support detection in Tetrate provider model fetching (#6808)
  Session manager fixes (#6809)
  fix(desktop): handle quoted paths with spaces in extension commands (#6430)
  fix: we can default gooseignore without writing it out (#6802)
  fix broken link (#6810)
  docs: add Beads MCP extension tutorial (#6792)
  feat(goose): add support for AWS_BEARER_TOKEN_BEDROCK environment variable (#6739)
  [docs] Add OSS Skills Marketplace (#6752)
  feat: make skills available in codemode (#6763)
  Fix: Recipe Extensions Not Loading in Desktop (#6777)
  Different approach to determining final confidence level of prompt injection evaluation outcomes (#6729)
  fix: read_resource_tool deadlock causing test_compaction to hang (#6737)
  Upgrade error handling (#6747)
  Fix/filter audience 6703 local (#6773)
  chore: re-sync package-lock.json (#6783)
  upgrade electron to 39.3.0 (#6779)
  ...