@cgwalters cgwalters commented Jun 7, 2025

The goal is to reduce the number of tokens the LLM needs to parse for large commands like `make` and `cargo build`. Most of the time they only need stderr if they fail, and if they *don't* fail then there's usually no need to parse the output at all.

As noted in the text, this takes an optional regular expression; for key tasks like `cargo build` I think a regex filter of `^(warning|error):` works well, avoiding a latency hit where the LLM may need to check the redirected output file for warnings.

I also changed things so that we don't have a hardcoded command output limit, but instead we just automatically redirect to a tempfile anyway if it's large.
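The line-filtering idea above can be sketched with a plain predicate (a minimal std-only sketch: `filter_output` is a hypothetical helper, and the actual PR takes a user-supplied regex rather than a closure):

```rust
/// Collect only the lines the caller cares about (e.g. warnings/errors),
/// counting how many lines were omitted so the model can be told what it
/// is not seeing.
fn filter_output(output: &str, is_match: impl Fn(&str) -> bool) -> (Vec<String>, usize) {
    let mut matched = Vec::new();
    let mut omitted = 0;
    for line in output.lines() {
        if is_match(line) {
            matched.push(line.to_string());
        } else {
            omitted += 1;
        }
    }
    (matched, omitted)
}
```

With a real regex the predicate would just wrap `is_match` from the `regex` crate; either way only the matching lines go back to the model, while the full text goes to the redirected output file.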


This is my first contribution to this project, but looking forward to more!

Benchmarking

Of course I asked Goose to benchmark itself this way, comparing token counts from a build of itself with output_tmpfile vs. without. It claimed up to a 10x reduction in processed tokens.

But I didn't do serious benchmarking, and as I've been playing with things more I think I'm getting throttled by Gemini... p95 latency there in the console shot up to 60+ seconds (maybe it's just me?). My next step here is going to be playing with local models.

```rust
n = stdout_reader.read_until(b'\n', &mut stdout_buf), if !stdout_done => {
    if n? == 0 {
        stdout_done = true;
    } else {
```
Contributor Author
I think this was actually a pre-existing race condition where basically we could lose the last hunk of a process output.

This is a giant problem in general with select!. What would make this code a lot more elegant I think is to iterate over a stream - https://docs.rs/tokio-stream/0.1.17/tokio_stream/wrappers/struct.LinesStream.html

And then we'd unify those two streams into a single stream of enum Message { Stdout(String), Stderr(String) } which is one pattern recommended by https://blog.yoshuawuyts.com/futures-concurrency-3 - we then just drive that stream to completion, and don't need to carry these buffers.
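For illustration, that unified-stream pattern can be sketched without tokio, using std threads and an mpsc channel to merge the two pipes into one sequence of `Message` values (`merge_outputs` is a hypothetical name; the actual code would use tokio-stream's `LinesStream` and a stream merge rather than threads):

```rust
use std::io::{BufRead, BufReader, Read};
use std::sync::mpsc;
use std::thread;

enum Message {
    Stdout(String),
    Stderr(String),
}

/// Read two byte streams line by line and merge them into one sequence.
/// The channel closes only after *both* senders are dropped, i.e. after
/// both streams hit EOF -- so no trailing hunk of output can be lost.
fn merge_outputs<R1, R2>(stdout: R1, stderr: R2) -> Vec<Message>
where
    R1: Read + Send + 'static,
    R2: Read + Send + 'static,
{
    let (tx_out, rx) = mpsc::channel();
    let tx_err = tx_out.clone();
    let h1 = thread::spawn(move || {
        for line in BufReader::new(stdout).lines().flatten() {
            tx_out.send(Message::Stdout(line)).ok();
        }
    });
    let h2 = thread::spawn(move || {
        for line in BufReader::new(stderr).lines().flatten() {
            tx_err.send(Message::Stderr(line)).ok();
        }
    });
    // Drive the merged sequence to completion; no per-stream buffers needed.
    let msgs: Vec<Message> = rx.into_iter().collect();
    h1.join().ok();
    h2.join().ok();
    msgs
}
```

The key property is the same one the stream merge gives you: completion of the merged sequence implies completion of both underlying streams.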

Contributor Author

OK I went ahead and did this and yeah I think the result is a lot cleaner!

The goal is to lower the amount of tokens the LLM needs to parse for large
commands like `make` and `cargo build`. Most of the time they only need
stderr if they fail, and if they *don't* fail then there's no need to parse
the output at all usually.

As noted in the text this takes an optional regular expression and for
key tasks like `cargo build` I think a regex filter of `^(warning|error):`
works well here, avoiding a latency hit where the LLM may need
to check the redirected output file for warnings.

I also changed things so that we don't have a hardcoded command output
limit, but instead we just automatically redirect to a tempfile
anyways if it's large.

Finally as part of the implementation of this I cleaned up the way
we read from stdout/stderr to greatly deduplicate code by iterating
over a stream.

Signed-off-by: Colin Walters <walters@verbum.org>
michaelneale commented Jul 31, 2025

@cgwalters yeah, missed this one somehow - something like this could work well, very nice. It is clear Goose gobbles up tokens like candy.

I have this change here: #3750 which doesn't require any tool signature change - and is in line with how other things work (we can layer on smarts; I like the idea of looking for patterns, rules, and heuristics - e.g. stack traces etc.).

Related to this, we have a similar issue with text_view_file - which can easily swallow too much vs. pushing back (that probably does need a signature change). In general I have found that having the tools return feedback, rather than prompting up front, can work quite well and doesn't add up-front work for the LLM to figure out what to do when (i.e. if something is large, you can tell it at that point, with instructions on how to avoid that or to override it).

If you are interested in updating this branch I would happily take a look; I think we really need things like this, and you are totally right.

@michaelneale michaelneale self-assigned this Jul 31, 2025
@cgwalters

Thanks for looking at this. The out-of-the-box developer_shell and other core MCP tools are IMO a really key and sensitive part of what tools like Goose provide (in combination with the model).

Probably what we should do is actually start with a survey of what similar tools provide (e.g. Gemini CLI, Claude Code, Cursor, etc.). Just as one example, Gemini's edit tool supports specifying the number of matches https://github.com/google-gemini/gemini-cli/blob/6f7beb414cae67df59aea3f8a3e99389c3a82aec/packages/core/src/tools/edit.ts#L53 which I think is quite helpful for the model.

> If you are interested in updating this branch I would happily take a look; I think we really need things like this, and you are totally right.

Yes. And thanks for reviewing my PRs. I would like to advocate for more usage of Goose in my team and organization and it's helpful. Again I can't make hard promises but if I can demonstrate engagement it helps make the argument.

michaelneale commented Aug 1, 2025

@cgwalters yeah - 100% agree on a survey; I think we should start from that and work backwards? I like the idea of multiple matches, and even separating out styles of editor (i.e. we can use editor models - which do many powerful things IF you enable them, which Cursor does, for example; they may be a better approach than being too clever with editing). But then it is worth seeing what is out there and what works well.

@cgwalters also what is your nick on discord? (if on there)

@cgwalters

> @cgwalters also what is your nick on discord? (if on there)

I updated https://github.com/cgwalters/cgwalters/blob/main/README.md for this (it's colinwalters)

@michaelneale

@cgwalters do you want to look at this again - or should we take another run at it given the subsequent changes? We can focus on shell in one and, like you said, do some work on the editor in the others (I know the editor view needs work so it doesn't choke on long lines - curious how others do it).

@michaelneale

Closing this for now, as things have happened in other PRs (but we have much more follow-on work to do - though we should do that in an issue or discussion, I think?).

@cgwalters

Yes, #3750 basically ended up re-implementing this in a different way, right?

I think the logic here is more elegant though: instead of buffering a giant string in memory and then writing it to a temporary file at the end, the code in this PR switched to a temporary file automatically while reading from the child process.

And actually now that I more carefully compare the two I see https://github.com/block/goose/pull/3750/files#r2257616339

I can try looking at re-doing this on main at some point.

cgwalters added a commit to cgwalters/opencode that referenced this pull request Jan 16, 2026
Stream command output directly to a temp file when it exceeds 50KB,
avoiding memory bloat for commands with huge output. Inspired by
block/goose#2817.

New features:
- output_filter param: pass a regex (e.g. "^(warning|error):") to
  capture only matching lines inline while full output goes to file
- Automatic GC: only keep 3 most recent output files to avoid
  polluting $HOME

The agent is informed that Grep and Read can be used to analyze the
temp files without triggering GC.

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
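The buffer-then-spill behavior described in the commit can be sketched like this (a std-only sketch; `SpillBuffer` and the file naming are illustrative assumptions, not the actual goose or opencode code):

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::PathBuf;

/// Buffers output in memory, then transparently spills to a temp file
/// once the total size crosses `threshold` (e.g. 50KB in the commit above).
struct SpillBuffer {
    threshold: usize,
    mem: Vec<u8>,
    file: Option<(PathBuf, File)>,
}

impl SpillBuffer {
    fn new(threshold: usize) -> Self {
        SpillBuffer { threshold, mem: Vec::new(), file: None }
    }

    fn write(&mut self, chunk: &[u8]) -> io::Result<()> {
        // Already spilled: append straight to the file.
        if let Some((_, f)) = &mut self.file {
            return f.write_all(chunk);
        }
        self.mem.extend_from_slice(chunk);
        if self.mem.len() > self.threshold {
            // Crossed the threshold: flush what we buffered, then switch sinks.
            let path = std::env::temp_dir()
                .join(format!("cmd-output-{}.log", std::process::id()));
            let mut f = File::create(&path)?;
            f.write_all(&self.mem)?;
            self.mem.clear();
            self.file = Some((path, f));
        }
        Ok(())
    }

    /// Path of the spill file, if the threshold was crossed.
    fn spilled_path(&self) -> Option<&PathBuf> {
        self.file.as_ref().map(|(p, _)| p)
    }
}
```

The point of the design is that small outputs never touch the filesystem, while large ones cost one flush of the buffered prefix plus streaming writes thereafter.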
cgwalters added a commit to cgwalters/opencode that referenced this pull request Jan 16, 2026
Stream command output directly to a temp file when it exceeds 50KB,
avoiding memory bloat for commands with huge output. Inspired by
block/goose#2817.

New feature:
- output_filter param: pass a regex (e.g. "^(warning|error):") to
  capture only matching lines inline while full output goes to file

The agent is informed that Grep and Read can be used to analyze the
temp files.

UI now shows filtered output stats (e.g. "Filtered: 7 matches from
2.1 KB (1.9 KB omitted)") so users can see how much output was
filtered and the total size.

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
cgwalters added a commit to cgwalters/opencode that referenced this pull request Jan 16, 2026
Stream command output directly to a temp file when it exceeds 50KB,
avoiding memory bloat for commands with huge output. Inspired by
block/goose#2817.

New feature:
- output_filter param: pass a regex (e.g. "^(warning|error):") to
  capture only matching lines inline while full output goes to file

Both filtered and non-filtered modes buffer in memory until the
threshold is exceeded, avoiding unnecessary filesystem I/O for
small output.

The agent is informed that Grep and Read can be used to analyze the
temp files.

UI now shows filtered output stats (e.g. "Filtered: 7 matches from
2.1 KB (1.9 KB omitted)") so users can see how much output was
filtered and the total size.

Assisted-by: OpenCode (Claude Sonnet 4)