feat(developer_shell): Add output_tmpfile parameter to shell tool
#2817
Conversation
```rust
n = stdout_reader.read_until(b'\n', &mut stdout_buf), if !stdout_done => {
    if n? == 0 {
        stdout_done = true;
    } else {
```
I think this was actually a pre-existing race condition where basically we could lose the last hunk of a process output.
This is a giant problem in general with select!. What would make this code a lot more elegant I think is to iterate over a stream - https://docs.rs/tokio-stream/0.1.17/tokio_stream/wrappers/struct.LinesStream.html
And then we'd unify those two streams into a single stream of `enum Message { Stdout(String), Stderr(String) }`, which is one pattern recommended by https://blog.yoshuawuyts.com/futures-concurrency-3 - we then just drive that stream to completion, and don't need to carry these buffers.
OK I went ahead and did this and yeah I think the result is a lot cleaner!
The goal is to lower the number of tokens the LLM needs to parse for large commands like `make` and `cargo build`. Most of the time it only needs stderr if they fail, and if they *don't* fail there's usually no need to parse the output at all. As noted in the text, this takes an optional regular expression; for key tasks like `cargo build` I think a regex filter of `^(warning|error):` works well here, avoiding a latency hit where the LLM may need to check the redirected output file for warnings. I also changed things so that we don't have a hardcoded command output limit; instead we just automatically redirect to a tempfile anyway if the output is large. Finally, as part of the implementation of this I cleaned up the way we read from stdout/stderr, greatly deduplicating code by iterating over a stream. Signed-off-by: Colin Walters <walters@verbum.org>
Force-pushed f69017e to 748c48f
@cgwalters yeah, missed this one somehow - something like this could work well, very nice. It is clear goose gobbles up tokens like candy. I have this change here: #3750, which doesn't require any tool signature change and is in line with how other things work (we can layer on smarts; I like the idea of looking for patterns and rules and heuristics, eg stack traces etc).

Related to this, we have a similar issue with text_view_file, which can easily swallow too much vs pushing back (that probably does need a signature change). In general, I have found that having the tools return feedback vs up-front prompting can work quite well and doesn't add work to the LLM up front to work out what to do when (ie if something is large, you can tell it at that point with instructions on how to not hit that, or how to override). If you are interested in updating this branch I would happily take a look - I think we really need things like this; you are totally right.
Thanks for looking at this. The out-of-the-box developer_shell and other core MCP tools are IMO a really key and sensitive part of what tools like goose provide (in combination with the model). Probably what we should actually do is start with a survey of what similar tools provide (e.g. Gemini CLI, Claude Code, Cursor etc). Just as one example, Gemini's
Yes. And thanks for reviewing my PRs. I would like to advocate for more usage of Goose in my team and organization, and it's helpful. Again, I can't make hard promises, but if I can demonstrate engagement it helps make the argument.
@cgwalters yeah - 100% agree on a survey; I think we should start from that and work backwards? I like the idea of multiple matches, and even separating out styles of editor (ie we can use editor models, which do many powerful things IF you enable them - which Cursor does, for example; they may be a better approach than being too clever with editing, but then it is worth seeing what is out there and what works well). @cgwalters also, what is your nick on Discord? (if on there)
I updated https://github.com/cgwalters/cgwalters/blob/main/README.md for this (it's colinwalters)
@cgwalters do you want to look at this again - or should we take another run at it given the subsequent changes? We can focus on shell in one and, like you said, do some work on the editor in the others (I know editor view needs work so it doesn't choke on long lines - curious how others do it).
Closing this for now, as things have happened in other PRs (but we have much more follow-on work to do - though we should do that in an issue or discussion, I think?)
Yes, #3750 basically ended up re-implementing this in a different way, right? I think the logic here is more elegant, though, in that instead of buffering a giant string in memory and then writing it to a temporary file at the end, the code in this PR automatically switched to a temporary file while we were reading from the child process. And actually, now that I compare the two more carefully, I see https://github.com/block/goose/pull/3750/files#r2257616339. I can try looking at re-doing this on main at some point.
Stream command output directly to a temp file when it exceeds 50KB, avoiding memory bloat for commands with huge output. Inspired by block/goose#2817.

New features:
- output_filter param: pass a regex (e.g. "^(warning|error):") to capture only matching lines inline while full output goes to file
- Automatic GC: only keep the 3 most recent output files to avoid polluting $HOME

The agent is informed that Grep and Read can be used to analyze the temp files without triggering GC.

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
Stream command output directly to a temp file when it exceeds 50KB, avoiding memory bloat for commands with huge output. Inspired by block/goose#2817.

New feature:
- output_filter param: pass a regex (e.g. "^(warning|error):") to capture only matching lines inline while full output goes to file

The agent is informed that Grep and Read can be used to analyze the temp files.

UI now shows filtered output stats (e.g. "Filtered: 7 matches from 2.1 KB (1.9 KB omitted)") so users can see how much output was filtered and the total size.

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
Stream command output directly to a temp file when it exceeds 50KB, avoiding memory bloat for commands with huge output. Inspired by block/goose#2817.

New feature:
- output_filter param: pass a regex (e.g. "^(warning|error):") to capture only matching lines inline while full output goes to file

Both filtered and non-filtered modes buffer in memory until the threshold is exceeded, avoiding unnecessary filesystem I/O for small output.

The agent is informed that Grep and Read can be used to analyze the temp files.

UI now shows filtered output stats (e.g. "Filtered: 7 matches from 2.1 KB (1.9 KB omitted)") so users can see how much output was filtered and the total size.

Assisted-by: OpenCode (Claude Sonnet 4)
The goal is to lower the amount of tokens the LLM needs to parse for large commands like `make` and `cargo build`. Most of the time they only need stderr if they fail, and if they *don't* fail then there's usually no need to parse the output at all.

As noted in the text, this takes an optional regular expression, and for key tasks like `cargo build` I think a regex filter of `^(warning|error):` works well here, avoiding a latency hit where the LLM may need to check the redirected output file for warnings.

I also changed things so that we don't have a hardcoded command output limit; instead we just automatically redirect to a tempfile anyway if it's large.
This is my first contribution to this project, but looking forward to more!
Benchmarking
Of course I asked Goose to benchmark itself this way, by comparing token results from a build of itself when using `output_tmpfile` vs not. It claimed it worked, with up to a 10x reduction in processed tokens. But I didn't do serious benchmarking, and as I've been playing with things more I think I'm getting throttled by Gemini... p95 latency there in the console shot up to 60+ seconds (maybe it's just me?). My next step here is going to be playing with local models.
output_tmpfilevs not. It claimed it worked with an up to 10x reduction in processed tokens.But I didn't do serious benchmarking, and as I've been playing with things more I think I'm getting throttled by Gemini...p95% latency there in the console shot up to 60+ seconds (maybe it's just me?). My next step here is going to be playing with local models.