
Conversation

@cgwalters
Contributor

@cgwalters cgwalters commented Aug 6, 2025

I was working on a previous PR with much larger output sizes, and I think I was seeing the last chunk of output sometimes being lost.

That said, I asked Gemini to try to reproduce the failure with a test case, and I couldn't get it to fail offhand. Looking at the original code here, it looks OK...

Async Rust is hard; tokio::select! in particular can cause subtle bugs. For a lot more on this see
https://blog.yoshuawuyts.com/futures-concurrency-3 and various previous and future blog entries.

Anyway, I think this code is cleaner: it deduplicates the code that sends the notification message.

@michaelneale
Collaborator

michaelneale commented Aug 7, 2025

@cgwalters very interesting - I know from past experience that stderr/stdout handling for shell commands can be tricky across platforms, and when running bundled in the signed Electron app on macOS - are you able to verify it works well in that scenario as well? (I lost a day or two to that once; there are different ways to spawn.) How can we validate that?

edit: seems fine to me - it doesn't change how the streams are started.

@michaelneale michaelneale added p2 Priority 2 - Medium performance Performance related labels Aug 7, 2025
@michaelneale michaelneale self-assigned this Aug 7, 2025
@michaelneale michaelneale requested a review from DOsinga August 7, 2025 04:18
@michaelneale
Collaborator

I do like this and it does look better. I am sure there are latent bugs in the old code here, so I think this one is worth a real close look - nice one @cgwalters. How has it been performing for you?

@michaelneale
Collaborator

michaelneale commented Aug 7, 2025

This seems to work really well. It's hard to know exactly what to look for, but so far so good, and it's much, much better than the old code (and I think the dependency was already pulled in from another crate, so there's no real extra cost). This could wipe out a few tricky bugs - and yes, I think the old way could lose the end of a stream if it ends without a newline. @DOsinga I think this may be worth getting in ASAP.

very nice @cgwalters

edit: if goose is right about this, it's a big deal:

1. Deadlock Scenario with tokio::select!

The old code could hang if one stream closes while the other still has data:

// Old code (simplified)
loop {
    tokio::select! {
        n = stdout_reader.read_until(b'\n', &mut stdout_buf), if !stdout_done => {
            if n? == 0 {
                stdout_done = true; // EOF on stdout; disables this branch
            }
            // ... process stdout_buf
        }

        n = stderr_reader.read_until(b'\n', &mut stderr_buf), if !stderr_done => {
            if n? == 0 {
                stderr_done = true; // EOF on stderr; disables this branch
            }
            // ... process stderr_buf
        }

        // Taken only once both branches are disabled.
        else => break,
    }
}

@michaelneale michaelneale added p0 Priority 0 - Critical/Urgent and removed p2 Priority 2 - Medium labels Aug 7, 2025
@cgwalters cgwalters force-pushed the developer-shell-tokio-cleanup branch from ee7b82c to 032f920 Compare August 7, 2025 13:31
@cgwalters
Contributor Author

and yes I think the old way could lose the end of a stream if there is no newline and it ends.

You're right! I asked AI to generate a test case based on that and indeed it fails with the old code and passes with the new.

Collaborator

@michaelneale michaelneale left a comment


Yeah, I have given this a good run and it seems solid.

@michaelneale michaelneale merged commit 719b569 into block:main Aug 8, 2025
10 checks passed
katzdave added a commit that referenced this pull request Aug 8, 2025
* 'main' of github.com:block/goose:
  remove fallback routing to hub/home for unknown routes (#3954)
  Use cross in linux bundle workflow (#3950)
  fix: disable signing for release branches until we figure out keys for this flow (#3951)
  Sanitize Tags Unicode Block (#3920)
  Add a message about DCO to CONTRIBUTING.md (#3741)
  Move hardcoded LLM prompts to template files (#3934)
  docs: migrate streamable config to consolidated component (#3936)
  feat: streamline list args on cli (#3937)
  mcp/developer: Refactor to use tokio SplitStream (#3894)
  feat: first time automated ollama install experience and openrouter (#3881)
  chore: rmcp 0.5.0 (#3935)
  add gpt-5 to openai provider format (#3924)
  added gpt5 context limit (#3927)
  show status of osx codesigning and increase timeout (#3926)
  Bump auto-compact threshold to 80% (#3925)
  FIX: gemini tool call hanging (#3898)
  feat(deps): upgrade rmcp to 0.4.1 (#3918)
  Fix dark mode rendering of config form and centered providers grid for wider screens. (#3837)
  fix: extension list not refreshing after installing from deeplink (#3878)
ayax79 pushed a commit to ayax79/goose that referenced this pull request Aug 21, 2025