Add self-test recipe for goose validation #5111
zanesq
left a comment
Nice! Assuming this is CLI only, right?
Yes, this one is. But @DOsinga and I were talking about using Playwright to test the desktop app. Will try to explore that in a subsequent PR.
cc @angiejones you might think this new feature is fun
can you add it to the checklist @tlongwell-block that we run on a new release?
This PR introduces goose-self-test.yaml, a meta-testing recipe that enables goose to validate its own capabilities through first-person integration testing.
What is First-Person Integration Testing?
Traditional testing approaches rely on external test harnesses, unit tests, or integration suites that examine a system from the outside. This recipe takes a different approach: it has a running goose instance test itself using its own tools and capabilities.
This is meta-testing - the system under test is also the tester, examining its own behavior from within an active session. For an AI agent like goose, this approach offers unique insights into behavioral consistency and tool reliability that external testing cannot provide.
Primary Use Case: Goose Testing Goose
The most powerful application of this recipe is when goose itself is developing new goose features. A goose instance working on the codebase can:
cargo build --release
This creates a recursive development loop where goose can autonomously develop, test, and validate improvements to itself. The goose doing the development can examine test outputs, debug failures, and iterate on fixes - all while using the self-test recipe to validate each iteration.
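One iteration of that loop (build, then self-test) could be sketched roughly as below. This is an illustration under stated assumptions, not the PR's implementation: it assumes recipes are launched with `goose run --recipe <file> --params key=value`, that the freshly built binary is at `./target/release/goose` (Cargo's default release output directory), and the helper name `build_commands` is hypothetical. Only `cargo build --release`, the recipe filename, and the `test_depth` values quick/standard/exhaustive come from the description itself. The sketch only prints the commands rather than executing them.

```python
import shlex
from typing import Dict, List, Optional

# Depth values are taken from the PR description's test_depth parameter.
TEST_DEPTHS = ("quick", "standard", "exhaustive")


def build_commands(
    recipe: str = "goose-self-test.yaml",
    params: Optional[Dict[str, str]] = None,
    goose_bin: str = "./target/release/goose",  # assumed path of the built binary
) -> List[List[str]]:
    """Return argv lists for one build-then-self-test iteration (hypothetical helper)."""
    params = dict(params or {})
    depth = params.get("test_depth")
    if depth is not None and depth not in TEST_DEPTHS:
        raise ValueError(f"test_depth must be one of {TEST_DEPTHS}, got {depth!r}")

    build = ["cargo", "build", "--release"]
    # Assumption: the CLI accepts `run --recipe <file>` with repeated `--params key=value`.
    self_test = [goose_bin, "run", "--recipe", recipe]
    for key, value in sorted(params.items()):
        self_test += ["--params", f"{key}={value}"]
    return [build, self_test]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in build_commands(params={"test_depth": "quick"}):
        print(shlex.join(argv))
```

Keeping the commands as argv lists, rather than one shell string, means they can later be handed to `subprocess.run` without quoting problems if actual execution is wanted.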
How It Works
The self-test recipe guides goose through a structured validation process:
The recipe uses goose's own capabilities to create test scenarios, execute them, and validate the outcomes. Each test phase builds on the previous, creating a comprehensive assessment of functionality.
Design Principles
What Can Be Tested
From within a running session, goose can test:
What Cannot Be Tested
Certain aspects require external observation:
The recipe focuses on what's testable from within, providing meaningful validation of user-facing functionality.
Key Features
Flexible Execution
The recipe supports parameterized testing:
- test_phases: Select specific test categories or run all
- test_depth: Choose between quick, standard, or exhaustive testing
- parallel_tests: Enable/disable parallel test execution
- workspace_dir: Specify test artifact location
Self-Documenting
The test generates comprehensive reports:
Clean Artifacts
Test artifacts are organized in a single gooseselftest directory, which is automatically added to .gitignore to keep the repository clean.
Why This Matters
Continuous Validation
Provides a standardized method to verify goose functionality across different:
Behavioral Testing
Unlike unit tests that verify code correctness, this tests actual agent behavior - crucial for AI systems where behavior can vary with context and model.
Meta-Cognitive Assessment
The successful completion of self-testing demonstrates goose's ability to:
Quality Assurance
Enables rapid validation after:
Initial Validation
The recipe has been successfully tested with:
Results from initial testing: