Conversation

@michaelneale michaelneale commented Aug 1, 2025

By default, the text_editor_view tool will read any file under 400KB, but even files well below that size can be harmful because they fill up the context very quickly.

With this change, the tool encourages the agent to search the file and read it in ranges if and only if it is more than 2000 lines long (files below that behave as they do today). It works by telling the agent, on the first call without a range, that the file is large and that it should either search for the content it needs or call back with the full range specified explicitly if it still wants to load everything.

I picked "2000 lines" as the cut-off because, on a human level, that is already a large file (e.g. for a document or source code).
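
The behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `LINE_READ_LIMIT`, `ViewOutcome` and `view_text` are hypothetical names, and the range is assumed to be 1-indexed and inclusive.

```rust
/// Threshold taken from the PR description. The constant name is hypothetical.
const LINE_READ_LIMIT: usize = 2000;

/// Hypothetical result type for this sketch (not the tool's real return type).
#[derive(Debug, PartialEq)]
enum ViewOutcome {
    /// Full or ranged content to hand back to the agent.
    Content(String),
    /// A short notice asking the agent to search or pass an explicit range.
    TooLarge(String),
}

/// Views `content`, optionally restricted to a 1-indexed, inclusive line range.
fn view_text(content: &str, range: Option<(usize, usize)>) -> ViewOutcome {
    let lines: Vec<&str> = content.lines().collect();
    let total_lines = lines.len();

    match range {
        // No range on a large file: return guidance instead of the content.
        None if total_lines > LINE_READ_LIMIT => ViewOutcome::TooLarge(format!(
            "File has {} lines (limit {}). Search for what you need, or call again with an explicit range such as 1-{}.",
            total_lines, LINE_READ_LIMIT, total_lines
        )),
        // Small file without a range: same behaviour as before.
        None => ViewOutcome::Content(lines.join("\n")),
        // Explicit range: clamp it to the file and return only those lines.
        Some((start, end)) => {
            let start = start.max(1);
            let end = end.min(total_lines);
            if start > end {
                return ViewOutcome::Content(String::new());
            }
            ViewOutcome::Content(lines[start - 1..end].join("\n"))
        }
    }
}
```

Because an explicit range always bypasses the notice, the agent can still load the whole file deliberately; the limit only removes the accidental full read.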

This means that you can run a query like this:

image

which is a large file, and would normally significantly use up the context, but now is only sipping it:

image

Before this change it would be at least this usage:

image

and much worse for larger files (which can now be handled in the same way).

cc @cgwalters for idea

@katzdave katzdave left a comment

Nice simple solution!

remove slop comment
@michaelneale michaelneale changed the title from "fix: don't always read large file content" to "fix: optimise reading large file content" Aug 1, 2025
@michaelneale michaelneale added the "performance" (Performance related) and "mcp" (MCP/Extension related) labels Aug 1, 2025
@cgwalters
Contributor

You said the word "optimize" and you tagged me to review and this made me look at the surrounding code... so I did #3781 which is orthogonal to this but related.

(The next optimization is to build up the Vec<&str> as we're reading the file, so we can apply line-based limits as we read instead of iterating again. That's also where we'd want to apply constraints on line length, if desired.)

Also all this clearly needs deduplication with the shell mcp
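
One way the "apply limits as we read" idea from this comment might look is sketched below. It is an assumption-laden illustration rather than the follow-up implementation: `read_lines_capped` is a hypothetical helper, and it collects owned `String`s through `BufRead::lines` instead of the `Vec<&str>` mentioned above, since borrowing slices would require the whole content to be in memory first.

```rust
use std::io::{self, BufRead};

/// Reads at most `max_lines` lines from `reader`, stopping as soon as the cap
/// is exceeded instead of loading the whole input and iterating a second time.
/// Returns the collected lines and a flag saying whether more lines existed.
fn read_lines_capped<R: BufRead>(reader: R, max_lines: usize) -> io::Result<(Vec<String>, bool)> {
    let mut lines = Vec::new();
    for line in reader.lines() {
        // Propagate I/O errors, including invalid UTF-8.
        let line = line?;
        if lines.len() == max_lines {
            // One line beyond the cap was seen: report truncation and stop.
            return Ok((lines, true));
        }
        lines.push(line);
    }
    Ok((lines, false))
}
```

A per-line length check could be added inside the same loop, which is where the comment suggests such a constraint would live.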

* main: (34 commits)
  Token counting in Auto-compact uses provider metadata (#3788)
  docs: Add YouTube link to Git MCP Tutorial (#3831)
  feat: more robust client initialization for the app (#3830)
  Build app bundles on release branches always (#3789)
  fix param order of debug_conversation_fixer (#3796)
  Fix directory switcher not working in active chat sessions and file browser not defaulting to current session directory path (#3791)
  File completion in CLI (#3822)
  docs: Dynamic linux install buttons (#3810)
  tests: Add missing `#[serial]` to two tests (#3816)
  Chore: apply more clippy rules to prevent from code complexity (#3813)
  chore(mcp): Add helpers to parse parameters (#2821)
  feat: enable docusaurus respectPrefersColorScheme (#3746)
  fix session resume in new window (#3800)
  Add settings field documentation to recipe guides (#3809)
  chore(deps): bump on-headers and compression in /documentation (#3532)
  fix(ui): refresh provider related issues (#3385)
  feat: Add comprehensive Linux build support (#3673)
  developer: Optimize text_editor_view a bit (#3781)
  Override session name generator for ollama provider (#3710)
  docs: fix markdown for cognee tutorial (#3801)
  ...
@michaelneale
Collaborator Author

ugh, the new clippy rules mean this will have a lot of unrelated (but non-functional) changes

@michaelneale michaelneale requested a review from katzdave August 5, 2025 01:44
* main:
  chore: upgrade morph to use new model with instruction (#3745)
  add CODEOWNERS file with /documentation owners (#3840)
@michaelneale
Collaborator Author

thanks @cgwalters @katzdave for the feedback - the change appears slightly larger than it is because it simplifies the top-level text_editor_view function (hard to see in the diff, but the resulting code looks simpler/neater)

}
};
let lines: Vec<&str> = content.lines().collect();
let total_lines = lines.len();
Collaborator

could be paranoid here and check the max line length of lines

Collaborator Author

yeah, I wasn't sure how to break that down, as it is a real problem (e.g. if you tell goose to look at your session jsonl files, it will get heavy).

The issue is that the view editor tool has no range that looks inside a line, so it would need to be enhanced. Either as part of this change or as a follow-on, it could take a column range as well as the start and end rows?

Collaborator Author

will have a follow on task to make it read by column too when needed
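
To make the thread above concrete, here is a rough sketch of the two pieces discussed: checking the longest line, and reading only a column range within a line. Both helpers (`max_line_len`, `slice_columns`) are hypothetical, the column range is assumed to be 0-indexed and half-open, and columns are counted in characters rather than bytes so that multi-byte UTF-8 is never split.

```rust
/// Length, in characters, of the longest line (0 for an empty slice).
fn max_line_len(lines: &[&str]) -> usize {
    lines.iter().map(|l| l.chars().count()).max().unwrap_or(0)
}

/// Returns the characters of `line` in the half-open column range
/// `[start_col, end_col)`. Out-of-range or inverted ranges yield a shorter
/// or empty string instead of panicking.
fn slice_columns(line: &str, start_col: usize, end_col: usize) -> String {
    line.chars()
        .skip(start_col)
        .take(end_col.saturating_sub(start_col))
        .collect()
}
```

With something like this, a file such as a session jsonl with a few very long lines could be flagged by `max_line_len` and then read piece by piece, instead of loading an entire line at once.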

};
let mut content = String::new();
f.read_to_string(&mut content)
.map_err(|e| ToolError::ExecutionError(format!("Failed to read file: {}", e)))?;
Collaborator Author

@DOsinga this would return early (with the error) if the file is binary, since read_to_string fails on content that is not valid UTF-8
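
The early return noted here relies on `read_to_string` failing on bytes that are not valid UTF-8. A small sketch of making that case explicit is below. It is an illustration only: `read_text_or_binary` is a hypothetical helper, and treating invalid UTF-8 as "probably binary" is an assumption of the sketch, whereas the snippet under review maps the error to a `ToolError`.

```rust
use std::io::{self, Read};

/// Reads `reader` as UTF-8 text. Returns `Ok(None)` when the bytes are not
/// valid UTF-8 (treated here as "probably binary") and propagates any other
/// I/O error unchanged.
fn read_text_or_binary<R: Read>(mut reader: R) -> io::Result<Option<String>> {
    let mut content = String::new();
    match reader.read_to_string(&mut content) {
        Ok(_) => Ok(Some(content)),
        // Invalid UTF-8 is reported by the standard library as `InvalidData`.
        Err(e) if e.kind() == io::ErrorKind::InvalidData => Ok(None),
        Err(e) => Err(e),
    }
}
```

Separating the binary case from other read failures would let the tool give the agent a more specific message than a generic "Failed to read file".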

* main:
  Fix leaky env variable causing flaky test (#3761)
  Update gemini error msg (#3847)
  Generic retry and error parsing (#3558)
  Clear the current line on ctrl-c in line with other tools (#3764)
* main:
  fix: replace glob/grep tool with shell (#3834)
  docs: Add Youtube Link to dev.to tutorial (#3869)
  Changed app settings configuration form to match settings panels (#3829)
  Tell the user to hit compact (#3851)
  Pin @mcp-ui/client in package.json (#3860)
  blog for mcp-jupyter server (#3059)
  docs: Adding dev.to Tutorial & Update CLI Component (#3828)
  Detect client disconnects and cancel tool calls (#3782)
  Suppress ansi with pipes (#3775)
@michaelneale michaelneale merged commit 8f54fa8 into main Aug 5, 2025
11 checks passed
@michaelneale michaelneale deleted the micn/large-file-read branch August 5, 2025 23:38
kathawthorne added a commit to kathawthorne/goose that referenced this pull request Aug 5, 2025
…e-editable-displayable-title

* upstream/main: (134 commits)
  fix: optimise reading large file content (block#3767)
  fix: replace glob/grep tool with shell (block#3834)
  docs: Add Youtube Link to dev.to tutorial (block#3869)
  Changed app settings configuration form to match settings panels (block#3829)
  Tell the user to hit compact (block#3851)
  Pin @mcp-ui/client in package.json (block#3860)
  blog for mcp-jupyter server (block#3059)
  docs: Adding dev.to Tutorial & Update CLI Component (block#3828)
  Detect client disconnects and cancel tool calls (block#3782)
  Suppress ansi with pipes (block#3775)
  Fix leaky env variable causing flaky test (block#3761)
  Update gemini error msg (block#3847)
  Generic retry and error parsing (block#3558)
  Clear the current line on ctrl-c in line with other tools (block#3764)
  chore: upgrade morph to use new model with instruction (block#3745)
  add CODEOWNERS file with /documentation owners (block#3840)
  Token counting in Auto-compact uses provider metadata (block#3788)
  docs: Add YouTube link to Git MCP Tutorial (block#3831)
  feat: more robust client initialization for the app (block#3830)
  Build app bundles on release branches always (block#3789)
  ...
michaelneale added a commit that referenced this pull request Aug 5, 2025
* main: (33 commits)
  fix: optimise reading large file content (#3767)
  fix: replace glob/grep tool with shell (#3834)
  docs: Add Youtube Link to dev.to tutorial (#3869)
  Changed app settings configuration form to match settings panels (#3829)
  Tell the user to hit compact (#3851)
  Pin @mcp-ui/client in package.json (#3860)
  blog for mcp-jupyter server (#3059)
  docs: Adding dev.to Tutorial & Update CLI Component (#3828)
  Detect client disconnects and cancel tool calls (#3782)
  Suppress ansi with pipes (#3775)
  Fix leaky env variable causing flaky test (#3761)
  Update gemini error msg (#3847)
  Generic retry and error parsing (#3558)
  Clear the current line on ctrl-c in line with other tools (#3764)
  chore: upgrade morph to use new model with instruction (#3745)
  add CODEOWNERS file with /documentation owners (#3840)
  Token counting in Auto-compact uses provider metadata (#3788)
  docs: Add YouTube link to Git MCP Tutorial (#3831)
  feat: more robust client initialization for the app (#3830)
  Build app bundles on release branches always (#3789)
  ...
michaelneale added a commit that referenced this pull request Aug 6, 2025
* main:
  Fix OpenAI Provider with GitHub Models (#3875)
  Cmd click open finder (#3807)
  fix: recipe parameter form max height and not scrolling (#3879)
  fix: optimise reading large file content (#3767)
  fix: replace glob/grep tool with shell (#3834)
  docs: Add Youtube Link to dev.to tutorial (#3869)
katzdave added a commit that referenced this pull request Aug 6, 2025
* 'main' of github.com:block/goose:
  Make the window title reflect what we are doing (#3883)
  additional metrics + Ui implementation (#3871)
  feat: Add session description editing functionality (#3819)
  Update filename in contributing docs (#3866)
  Fix voice dictation provider selection bug (#3862)
  doc: Update supported container runtimes (#3874)
  feat: add OAuth provider abstraction for CLI configuration (#3157)
  Don't ignore lockfiles on linux/windows builds (#3859)
  Use RMCP for StreamableHTTP OAuth support (#3845)
  Try to keep key order for Databricks (#3876)
  Fix OpenAI Provider with GitHub Models (#3875)
  Cmd click open finder (#3807)
  fix: recipe parameter form max height and not scrolling (#3879)
  fix: optimise reading large file content (#3767)
  fix: replace glob/grep tool with shell (#3834)
  docs: Add Youtube Link to dev.to tutorial (#3869)
Labels

mcp MCP/Extension related performance Performance related

5 participants