-
Notifications
You must be signed in to change notification settings - Fork 2.6k
fix(code_execution): handle model quirks with tool calls #6352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Auto-prefix unprefixed tool names (read_module -> code_execution__read_module) - Parse stringified JSON arrays in search_modules terms parameter - Add note that all tool calls are synchronous (no async/await) - Improve search_modules documentation to clarify array parameter format Fixes issues where some models (e.g. Nemotron) strip tool prefixes or stringify array parameters. Signed-off-by: rabi <ramishra@redhat.com>
| - WRONG: Multiple execute_code calls, each with one tool | ||
| - RIGHT: One execute_code call with a script that calls all needed tools | ||
| IMPORTANT: All tool calls are SYNCHRONOUS. Do NOT use async/await. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hrm - I guess this is true - are we sure?
Also, I wonder if there is some other way to enforce this than prompting, but I think this is is ok @alexhancock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by this part of the code @rabi . Isn't the builtin function we provide async, and therefor don't any calls the model writes to functions which call a tool need to be awaited?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems tools are exposed to the model as synchronous JavaScript functions via the Boa JS engine. create_tool_function wraps each tool using blocking_recv() which blocks until the async Rust handler responds. So, model-generated code I think should use const x = tool({...}) not await tool({...}).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense @rabi. I had misremembered. Merged
| - Multiple terms: terms=["git", "shell"] (a JSON array, NOT a string) | ||
| - Regex patterns: terms="sh.*", regex=true | ||
| IMPORTANT: Do NOT stringify arrays. Use terms=["a","b"] not terms="[\"a\",\"b\"]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have found saying NOT can use up attention and hurt performance, wonder if we need that so explicitly (if there is a way to feedback errors then that can help)
| tool_call: CallToolRequestParam, | ||
| cancellation_token: CancellationToken, | ||
| ) -> Result<ToolCallResult> { | ||
| // Some models strip the tool prefix, so auto-add it for known code_execution tools |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this special casing for code execution as it needs to prefix it, but in other cases - they work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think code_execution tools (execute_code, read_module, search_modules) are the only tools the model calls directly when code_execution extension is enabled. Model is expected to call the tools with the extension prefix, but I suspect when all visible tools share the same prefix and the instruction is to use without prefix, some models strip them.
|
thanks @rabi - cc @alexhancock shoudl we do some benchmarks/checking with frontier models to make sure this is no worse (I am worryed about the use of "not" in the prompts as it consumes extra attention heads). |
michaelneale
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems a functional improvement, would like a +1 from @alexhancock though
alexhancock
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM other than the Q I asked
* main: (41 commits) Allow customizing the new line keybinding in the CLI (#5956) Ask for permission in the CLI (#6475) docs: add Ralph Loop tutorial for multi-model iterative development (#6455) Remove gitignore fallback from gooseignore docs (#6480) fix: clean up result recording for code mode (#6343) fix(code_execution): handle model quirks with tool calls (#6352) feat(ui): support prefersBorder option for MCP Apps (#6465) fixed line breaks (#6459) Use Intl.NumberFormat for token formatting in SessionsInsights (#6466) feat(ui): format large and small token counts for readability (#6449) fix: apply subrecipes when using slash commands (#6460) Fix: exclude platform_schedule_tool in CLI (#6442) Fix: Small update in how ML-based prompt injection determines final result (#6439) docs: remove SSE transport and rename to Streamable HTTP (#6319) fix: correct Cloudinary extension command and env variable (#6453) fix: add gap between buttons in MacDesktopInstallButtons.js (#6452) refactor: include hidden dotfiles folders in file picker search (#6315) upgraded safe npm packages (#6450) chore(deps): bump react-router and react-router-dom in /ui/desktop (#6408) chore(deps): bump lru from 0.12.5 to 0.16.3 (#6379) ...
* main: fix: require auth when running goose on non loopback address (#6478) chore(deps): bump hono from 4.11.3 to 4.11.4 in /ui/desktop (#6485) feat(cli): graceful fallback for keyring failures (#5808) fix: support global .gooseignore and negation patterns (#6157) docs: manual config for jetbrains (#6490) fix: Recipe slash command doesn't work with single optional parameter (#6235) fix(openrouter): Handle Gemini thoughtSignature for tool calls (#6370) docs: fix extensions page (#6484) Allow customizing the new line keybinding in the CLI (#5956) Ask for permission in the CLI (#6475) docs: add Ralph Loop tutorial for multi-model iterative development (#6455) Remove gitignore fallback from gooseignore docs (#6480) fix: clean up result recording for code mode (#6343) fix(code_execution): handle model quirks with tool calls (#6352) feat(ui): support prefersBorder option for MCP Apps (#6465) fixed line breaks (#6459) Use Intl.NumberFormat for token formatting in SessionsInsights (#6466) feat(ui): format large and small token counts for readability (#6449) fix: apply subrecipes when using slash commands (#6460)
Signed-off-by: rabi <ramishra@redhat.com> Signed-off-by: ThanhNguyxn <thanhnguyentuan2007@gmail.com>
…ased * 'main' of github.com:block/goose: fix(code_execution): serialize record_result output as JSON (#6495) perf(google): avoid accumulating thoughtSignatures across conversation history (#6462) fix(openai): make tool_call arguments optional and fix silent stream termination (#6309) fix: Improve error messages for invalid tool calls (#6483) fix: require auth when running goose on non loopback address (#6478) chore(deps): bump hono from 4.11.3 to 4.11.4 in /ui/desktop (#6485) feat(cli): graceful fallback for keyring failures (#5808) fix: support global .gooseignore and negation patterns (#6157) docs: manual config for jetbrains (#6490) fix: Recipe slash command doesn't work with single optional parameter (#6235) fix(openrouter): Handle Gemini thoughtSignature for tool calls (#6370) docs: fix extensions page (#6484) Allow customizing the new line keybinding in the CLI (#5956) Ask for permission in the CLI (#6475) docs: add Ralph Loop tutorial for multi-model iterative development (#6455) Remove gitignore fallback from gooseignore docs (#6480) fix: clean up result recording for code mode (#6343) fix(code_execution): handle model quirks with tool calls (#6352) feat(ui): support prefersBorder option for MCP Apps (#6465) fixed line breaks (#6459)
Signed-off-by: rabi <ramishra@redhat.com> Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Summary
Fixes issues where some models (e.g. Nemotron) strip tool prefixes or stringify array parameters.
Type of Change
AI Assistance
Testing
Tested with vLLM and Nemotron Hybrid MoE Model (nemotron-3-nano-30b-a3b)