-
Notifications
You must be signed in to change notification settings - Fork 2.5k
fix: optimise reading large file content #3767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
katzdave
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice simple solution!
remove slop comment
|
You said the word "optimize" and you tagged me to review and this made me look at the surrounding code... so I did #3781 which is orthogonal to this but related. (The next optimization is to build up the Also all this clearly needs deduplication with the shell mcp |
* main: (34 commits) Token counting in Auto-compact uses provider metadata (#3788) docs: Add YouTube link to Git MCP Tutorial (#3831) feat: more robust client initialization for the app (#3830) Build app bundles on release branches always (#3789) fix param order of debug_conversation_fixer (#3796) Fix directory switcher not working in active chat sessions and file browser not defaulting to current session directory path (#3791) File completion in CLI (#3822) docs: Dynamic linux install buttons (#3810) tests: Add missing `#[serial]` to two tests (#3816) Chore: apply more clippy rules to prevent from code complexity (#3813) chore(mcp): Add helpers to parse parameters (#2821) feat: enable docusaurus respectPrefersColorScheme (#3746) fix session resume in new window (#3800) Add settings field documentation to recipe guides (#3809) chore(deps): bump on-headers and compression in /documentation (#3532) fix(ui): refresh provider related issues (#3385) feat: Add comprehensive Linux build support (#3673) developer: Optimize text_editor_view a bit (#3781) Override session name generator for ollama provider (#3710) docs: fix markdown for cognee tutorial (#3801) ...
|
ugh new clippy rules means this will have a lot of unrelated changes (but not functional changes) |
|
thanks @cgwalters @katzdave for feedback - it is a slightly larger appearing change as it simplifies the text_editor_view top level function (hard to see that in diffs, but if you see the code it looks simpler/neater) |
| } | ||
| }; | ||
| let lines: Vec<&str> = content.lines().collect(); | ||
| let total_lines = lines.len(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be paranoid here and check the max line length of lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I wasn't sure how to break that down - as it is a problem (ie if you tell goose to look at your session jsonl files - will get heavy).
The issue then is - the view editor tool doesn't have a range that looks inside a line (so it would need to be enhanced) so either part of this change or a follow on to have it take a column range as well as row start and finish?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will have a follow on task to make it read by column too when needed
| }; | ||
| let mut content = String::new(); | ||
| f.read_to_string(&mut content) | ||
| .map_err(|e| ToolError::ExecutionError(format!("Failed to read file: {}", e)))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DOsinga this would return if it is a binary
* main: fix: replace glob/grep tool with shell (#3834) docs: Add Youtube Link to dev.to tutorial (#3869) Changed app settings configuration form to match settings panels (#3829) Tell the user to hit compact (#3851) Pin @mcp-ui/client in package.json (#3860) blog for mcp-jupyter server (#3059) docs: Adding dev.to Tutorial & Update CLI Component (#3828) Detect client disconnects and cancel tool calls (#3782) Suppress ansi with pipes (#3775)
…e-editable-displayable-title * upstream/main: (134 commits) fix: optimise reading large file content (block#3767) fix: replace glob/grep tool with shell (block#3834) docs: Add Youtube Link to dev.to tutorial (block#3869) Changed app settings configuration form to match settings panels (block#3829) Tell the user to hit compact (block#3851) Pin @mcp-ui/client in package.json (block#3860) blog for mcp-jupyter server (block#3059) docs: Adding dev.to Tutorial & Update CLI Component (block#3828) Detect client disconnects and cancel tool calls (block#3782) Suppress ansi with pipes (block#3775) Fix leaky env variable causing flaky test (block#3761) Update gemini error msg (block#3847) Generic retry and error parsing (block#3558) Clear the current line on ctrl-c in line with other tools (block#3764) chore: upgrade morph to use new model with instruction (block#3745) add CODEOWNERS file with /documentation owners (block#3840) Token counting in Auto-compact uses provider metadata (block#3788) docs: Add YouTube link to Git MCP Tutorial (block#3831) feat: more robust client initialization for the app (block#3830) Build app bundles on release branches always (block#3789) ...
* main: (33 commits) fix: optimise reading large file content (#3767) fix: replace glob/grep tool with shell (#3834) docs: Add Youtube Link to dev.to tutorial (#3869) Changed app settings configuration form to match settings panels (#3829) Tell the user to hit compact (#3851) Pin @mcp-ui/client in package.json (#3860) blog for mcp-jupyter server (#3059) docs: Adding dev.to Tutorial & Update CLI Component (#3828) Detect client disconnects and cancel tool calls (#3782) Suppress ansi with pipes (#3775) Fix leaky env variable causing flaky test (#3761) Update gemini error msg (#3847) Generic retry and error parsing (#3558) Clear the current line on ctrl-c in line with other tools (#3764) chore: upgrade morph to use new model with instruction (#3745) add CODEOWNERS file with /documentation owners (#3840) Token counting in Auto-compact uses provider metadata (#3788) docs: Add YouTube link to Git MCP Tutorial (#3831) feat: more robust client initialization for the app (#3830) Build app bundles on release branches always (#3789) ...
* 'main' of github.com:block/goose: Make the window title reflect what we are doing (#3883) additional metrics + Ui implementation (#3871) feat: Add session description editing functionality (#3819) Update filename in contributing docs (#3866) Fix voice dictation provider selection bug (#3862) doc: Update supported container runtimes (#3874) feat: add OAuth provider abstraction for CLI configuration (#3157) Don't ignore lockfiles on linux/windows builds (#3859) Use RMCP for StreamableHTTP OAuth support (#3845) Try to keep key order for Databricks (#3876) Fix OpenAI Provider with GitHub Models (#3875) Cmd click open finder (#3807) fix: recipe parameter form max height and not scrolling (#3879) fix: optimise reading large file content (#3767) fix: replace glob/grep tool with shell (#3834) docs: Add Youtube Link to dev.to tutorial (#3869)
* 'main' of github.com:block/goose: Make the window title reflect what we are doing (#3883) additional metrics + Ui implementation (#3871) feat: Add session description editing functionality (#3819) Update filename in contributing docs (#3866) Fix voice dictation provider selection bug (#3862) doc: Update supported container runtimes (#3874) feat: add OAuth provider abstraction for CLI configuration (#3157) Don't ignore lockfiles on linux/windows builds (#3859) Use RMCP for StreamableHTTP OAuth support (#3845) Try to keep key order for Databricks (#3876) Fix OpenAI Provider with GitHub Models (#3875) Cmd click open finder (#3807) fix: recipe parameter form max height and not scrolling (#3879) fix: optimise reading large file content (#3767) fix: replace glob/grep tool with shell (#3834) docs: Add Youtube Link to dev.to tutorial (#3869)
by default the text_editor_view tool will read any file (under 400KB) but this can be harmful even at smaller size in filling up the context at a high clip.
With this change it will encourage the agent to search the file and read in ranges IFF it is > 2000 lines long (below that works as of today). It works by letting the agent know on first call that it is a large file (if it gets no range) and that it should either search for content, or call back specifying the full range explicitly to load if it still wants to.
I picked "2000 lines" as a reasonable size (eg for a document or source code) on a human level that is large to cut off at.
This means that you can run a query like this:
which is a large file, and would normally significantly use up the context, but now is only sipping it:
Before this change it would be at least this usage:
and much worse for larger files (yet can now do the same thing)
cc @cgwalters for idea