@tlongwell-block
Collaborator

@tlongwell-block tlongwell-block commented Aug 6, 2025

🎯 Add To-Do Tools for Task Management in Goose

Summary

Implements session-scoped todo list functionality to help agents track and manage complex multi-step tasks. This feature provides two simple platform tools (todo__read and todo__write) that allow agents to maintain a working memory throughout their session.

Motivation

Agents often work on complex tasks that span multiple steps, files, or conversation turns. Without a way to track progress, agents may:

  • Forget to complete all requested steps
  • Lose context between operations
  • Struggle to communicate progress to users
  • Have difficulty resuming interrupted work

This PR addresses these issues by providing a simple, reliable task tracking mechanism.

Implementation Details

Core Changes

  • New module: crates/goose/src/agents/todo_tools.rs (~140 lines)

    • Thread-safe implementation using Arc<Mutex<String>>
    • Two tools: read (returns todo content) and write (replaces todo content)
    • Format-agnostic: agents decide structure (markdown, plain text, etc.)
    • Session-scoped: todo list exists only during agent lifetime
  • Integration: Minimal changes to crates/goose/src/agents/agent.rs (~20 lines)

    • Registers todo tools alongside existing platform tools
    • Follows established patterns exactly
  • System prompt: Added Task Management section to guide agent usage

    • Describes when and how to use todo tools
    • Provides markdown checkbox examples
    • Maintains consistent tone with existing documentation
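For readers who want to picture the shape of the module, here is a minimal sketch of a session-scoped store behind the two tools. It assumes `std::sync::Mutex` (the description only says `Arc<Mutex<String>>`, so the mutex flavor is a guess), and `TodoStore`, `read`, and `write` are hypothetical names for illustration, not the contents of todo_tools.rs.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical session-scoped todo store. Cloning shares the same text,
/// and the text disappears when the last clone is dropped.
#[derive(Clone, Default)]
pub struct TodoStore {
    content: Arc<Mutex<String>>,
}

impl TodoStore {
    pub fn new() -> Self {
        Self::default()
    }

    /// Read: returns a copy of the current todo text (empty if never written).
    pub fn read(&self) -> String {
        self.content.lock().expect("todo lock poisoned").clone()
    }

    /// Write: replaces the entire todo text; there is no append or delete.
    pub fn write(&self, new_content: &str) {
        let mut guard = self.content.lock().expect("todo lock poisoned");
        *guard = new_content.to_string();
    }
}
```

Because the store is format-agnostic, the string can hold markdown checkboxes, plain text, or anything else the agent chooses.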

Testing

  • Comprehensive test suite: crates/goose/tests/todo_tools_test.rs (11 tests)
    • Unit tests for tool creation and configuration
    • Integration tests for read/write operations
    • Edge cases: empty lists, large content (100KB), unicode/emoji
    • Concurrency testing with 10 simultaneous operations
    • All tests passing ✅
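The concurrency case could look roughly like the sketch below: several threads each replace the shared text once, and the final content must be exactly one writer's value. This assumes plain OS threads and `std::sync::Mutex`; the function name `run_concurrent_writes` is hypothetical and the actual test in todo_tools_test.rs may be structured differently.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `writers` threads that each replace the shared todo text once,
/// waits for all of them, and returns the final content.
pub fn run_concurrent_writes(writers: usize) -> String {
    let todo = Arc::new(Mutex::new(String::new()));

    let handles: Vec<_> = (0..writers)
        .map(|i| {
            let todo = Arc::clone(&todo);
            thread::spawn(move || {
                // Full replacement under the lock, so writes never interleave.
                let mut guard = todo.lock().expect("todo lock poisoned");
                *guard = format!("- [ ] task from writer {}", i);
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("writer thread panicked");
    }

    // Bind to a local so the lock guard is released before `todo` is dropped.
    let final_content = todo.lock().expect("todo lock poisoned").clone();
    final_content
}
```

Which writer wins is not deterministic; the property being checked is only that the result is one complete write, never a mix of two.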

Design Decisions

  1. Minimal API: Just read/write operations (no append, update, delete)
  2. Complete replacement: Write replaces entire content for simplicity
  3. No persistence: Session-scoped by design
  4. No size limits: Agents self-regulate content
  5. Thread-safe: Proper synchronization for concurrent access

Usage Example

// Agent writes a todo list
todo__write(content: "- [ ] Review code\n- [ ] Run tests\n- [ ] Update docs")

// Agent reads current todos
todo__read() // Returns: "- [ ] Review code\n- [ ] Run tests\n- [ ] Update docs"

// Agent updates with progress
todo__write(content: "- [x] Review code\n- [ ] Run tests\n- [ ] Update docs")

Quality Checklist

  • All tests passing (11/11)
  • Code compiles without warnings
  • Formatted with cargo fmt
  • Clippy clean (./scripts/clippy-lint.sh)
  • Thread-safe implementation verified
  • Documentation added to system prompt
  • Follows existing Goose patterns

Impact

  • No breaking changes: Additive feature only
  • Minimal footprint: ~160 lines of new code
  • Zero dependencies: Uses only standard library
  • Backward compatible: Existing agents unaffected

Files Changed

  • crates/goose/src/agents/agent.rs - Register todo tools
  • crates/goose/src/agents/todo_tools.rs - Core implementation (new)
  • crates/goose/src/agents/mod.rs - Module export
  • crates/goose/src/prompts/system.md - Task Management section
  • crates/goose/tests/todo_tools_test.rs - Test suite (new)

This PR delivers a simple solution for task tracking that enhances agent capabilities without adding complexity or breaking existing functionality.

@michaelneale michaelneale self-assigned this Aug 6, 2025
@michaelneale
Collaborator

michaelneale commented Aug 6, 2025

@tlongwell-block very nice - some ideas:

  • can we have it return an error if it is greater than a certain size (not sure what it is, but it should keep the list modest and brief)
  • is it worth looking at how to throw it in the latest message each time (but only appended to last)
  • is it worth benchmarking a fine grained (vs blob) version?
  • have you tried it with the goose bench tests (or any rough benchmark or even casual A/B tests - would be good to know).

I like this version more than mine as it is in the right place

Collaborator

@michaelneale michaelneale left a comment

I think it would be good to make this granular (and cap the number of tasks) rather than a markdown blob, since that keeps the system prompt much smaller (the old linked PR shows what those signatures would look like). It would also be good to see before/after benchmarks to make sure it doesn't degrade things (or even a casual A/B on the same task). Otherwise I think this is good, but I would love to hear from @katzdave and @DOsinga. I think we need this, but it needs to be nice and lean: just one tool with simple actions (maybe the markdown blob approach is good in that sense).

bonus: is there an obvious way to tack on the list to each latest message as it is sent (but not kept in the session)?

@michaelneale michaelneale added the labels "status: in progress", "p1" (Priority 1 - High, supports roadmap), and "performance" (Performance related) Aug 7, 2025
@tlongwell-block
Collaborator Author

@tlongwell-block very nice - some ideas:

  • can we have it return an error if it is greater than a certain size (not sure what it is, but it should keep the list modest and brief)

Yes, though I wonder whether amending the prompt to say something like "keep the todo list brief" would be sufficient. LLMs using the tool should self-limit its size. I will definitely set some tunable upper limit, though.

  • is it worth looking at how to throw it in the latest message each time (but only appended to last)

We did discuss this a bit in a chat earlier. I had some concerns about how to avoid polluting the sessions file and/or context. We discussed either dynamically editing sessions to keep the to-do from being a part of every turn, or simply tacking the todo on transparently before submitting the session context to the LLM provider. Neither seemed... great.
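To make the second option concrete, here is a rough sketch of tacking the todo onto the outgoing payload without touching the stored session. Everything in it is hypothetical: `Message` is a stand-in far simpler than Goose's real message type, and `with_todo_appended` is an invented name. It illustrates the idea under discussion, not something this PR implements.

```rust
/// Hypothetical, simplified message type used only for this illustration.
#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub text: String,
}

/// Builds the payload for the provider: a copy of the session history with
/// the todo appended to the last message only. `history` is never mutated,
/// so the todo does not end up in the persisted session.
pub fn with_todo_appended(history: &[Message], todo: &str) -> Vec<Message> {
    let mut outgoing = history.to_vec();
    if todo.is_empty() {
        return outgoing;
    }
    if let Some(last) = outgoing.last_mut() {
        last.text.push_str("\n\nCurrent todo list:\n");
        last.text.push_str(todo);
    }
    outgoing
}
```

The cost of this approach is that what the provider sees diverges from what the session file records, which is part of why it did not seem great.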

@tlongwell-block
Collaborator Author

  • is it worth benchmarking a fine grained (vs blob) version?

I don't think so. The blob is just as effective at conveying the information as a structured list, perhaps even more so since individual LLMs can format it as they please.

The lack of structure won't impact the LLM's ability to read the list. And it will definitely minimize the cognitive overhead required to use it.

@tlongwell-block
Collaborator Author

Add Character Limit to Todo List Tool

Summary

Implements a configurable character-based size limit for the todo list tool to prevent unbounded growth and provide clear feedback to agents about usage.

Changes

Core Implementation

  • Added character limit validation in dispatch_tool_call for TODO_WRITE_TOOL_NAME

    • Validates content size while holding the lock (prevents race conditions)
    • Rejects writes that exceed the limit with clear error message
    • Returns character count on successful writes
  • Configuration via environment variable

    • GOOSE_TODO_MAX_CHARS with default of 50,000 characters
    • Setting to 0 disables the limit (unlimited)
  • Clean read responses

    • TODO_READ_TOOL_NAME returns pure content without metadata
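The write path described above might be sketched as follows, assuming `std::sync::Mutex` and a limit passed in as a parameter. The function name `write_todo` and its signature are hypothetical; only the message formats are taken from this comment.

```rust
use std::sync::Mutex;

/// Replaces the todo text if it fits within `max_chars`; `0` means unlimited.
/// Returns the success message, or an error message if the content is too large.
pub fn write_todo(
    todo: &Mutex<String>,
    content: &str,
    max_chars: usize,
) -> Result<String, String> {
    // Take the lock before validating so the check and the replacement
    // happen as one step with respect to other writers.
    let mut guard = todo
        .lock()
        .map_err(|_| "todo lock poisoned".to_string())?;

    let char_count = content.chars().count();
    if max_chars != 0 && char_count > max_chars {
        // Rejected writes leave the previous content untouched.
        return Err(format!(
            "Todo list too large: {} chars (max: {})",
            char_count, max_chars
        ));
    }

    *guard = content.to_string();
    Ok(format!("Updated ({} chars)", char_count))
}
```

A read would simply return the stored string, with no character count or other metadata attached.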

Files Modified

  • crates/goose/src/agents/agent.rs: Added limit validation and helper function
  • crates/goose/tests/todo_tools_test.rs: Added comprehensive tests

Testing

Added tests for:

  • Character limit enforcement
  • Character count in write responses
  • Clean read responses (no metadata)
  • Unlimited mode with GOOSE_TODO_MAX_CHARS=0
  • Unicode character counting
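The Unicode case matters because Rust strings report their length in bytes. The sketch below contrasts the two counts; it assumes the limit is meant to count Unicode scalar values via `chars()`, which this comment implies but does not spell out. The helper names are hypothetical.

```rust
/// Counts Unicode scalar values, which is what `str::chars` yields.
/// Note this is not the same as user-perceived grapheme clusters.
pub fn char_len(text: &str) -> usize {
    text.chars().count()
}

/// Counts UTF-8 bytes, which is what `str::len` returns.
pub fn byte_len(text: &str) -> usize {
    text.len()
}
```

For example, a todo item such as "✅ done" is 6 characters but 8 bytes, so a byte-based limit would reject emoji-heavy lists earlier than a character-based one.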

Benefits

  • Prevents memory issues from unbounded todo list growth
  • Clear feedback to agents about size constraints
  • Thread-safe validation with proper lock ordering
  • Configurable via environment variable
  • Simple implementation with minimal code changes (~20 lines)

Configuration

Set the environment variable to customize the limit:

export GOOSE_TODO_MAX_CHARS=100000  # Allow up to 100k characters
export GOOSE_TODO_MAX_CHARS=0       # Disable limit (unlimited)

Default is 50,000 characters (~12,500 tokens).
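Reading the setting could be sketched like this. The variable name and the 50,000 default come from this comment; the helper names are hypothetical, and falling back to the default when the value is missing or unparsable is an assumption about the behavior, not something confirmed here.

```rust
use std::env;

/// Default limit stated in the PR comment.
pub const DEFAULT_TODO_MAX_CHARS: usize = 50_000;

/// Parses a raw setting. A missing or unparsable value falls back to the
/// default (assumed behavior); `0` is passed through and means "unlimited".
pub fn parse_max_chars(raw: Option<&str>) -> usize {
    raw.and_then(|value| value.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_TODO_MAX_CHARS)
}

/// Reads GOOSE_TODO_MAX_CHARS from the process environment.
pub fn todo_max_chars() -> usize {
    parse_max_chars(env::var("GOOSE_TODO_MAX_CHARS").ok().as_deref())
}
```

The token estimate above corresponds to roughly four characters per token: 50,000 / 4 = 12,500.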

Error Messages

When limit is exceeded:

"Todo list too large: 51234 chars (max: 50000)"

On successful write:

"Updated (8456 chars)"

Migration

  • No breaking changes for existing sessions
  • Todo lists are session-scoped and not persisted
  • Generous default limit unlikely to affect normal usage

@michaelneale
Collaborator

yeah I wouldn't worry about making a variable for the limit - as long as there is one - say 2000 chars (and it should throw back an error if it tries to update one that is larger telling it to be briefer)

@michaelneale michaelneale removed their assignment Aug 8, 2025
Collaborator

@katzdave katzdave left a comment

Nice job, really excited about this change!

@tlongwell-block tlongwell-block merged commit 566b9dc into main Aug 11, 2025
11 checks passed
@tlongwell-block tlongwell-block deleted the tlongwell/todo branch August 11, 2025 19:57
lifeizhou-ap added a commit that referenced this pull request Aug 12, 2025
* main:
  feat: add @-mention file reference expansion to .goosehints (#3873)
  feat(cli): Add --name/-n to session remove and --id/-i alias for session export (#3941)
  Docs: provider and model run options (#4013)
  To-Do Tools (#3902)
  ci: correctly match doc only changes (#4009)
  Remove PR trigger for Linux build workflow (#4008)
  docs: update release docs with an additional step needed + adjust list formatting (#4005)
  chore(release): release version 1.3.0 (#3921)
  docs: MCP-ui blog content (#3996)
  feat: Add `GOOSE_TERMINAL` env variable to spawned terminals (#3911)
  add missing dependencies for developer setup (#3930)
ayax79 pushed a commit to ayax79/goose that referenced this pull request Aug 21, 2025
Co-authored-by: David Katz <[email protected]>
Signed-off-by: Jack Wright <[email protected]>