Skip to content

Conversation

@kathawthorne
Copy link
Contributor

Summary

Adds support for @filename references in .goosehints files, enabling dynamic inclusion of documentation files with security safeguards.

Changes

  • Add file reference parsing with regex pattern matching (@README.md, @docs/api.md, etc.)
  • Implement secure recursive file expansion with path traversal protection
  • Add ReDoS protection (128KB limit) and circular reference detection
  • Integrate with existing .gooseignore pattern system
  • Add comprehensive test suite covering security, functionality, and edge cases
  • Merge with latest upstream improvements (151 commits ahead)

Testing

  • ✅ All existing tests pass
  • ✅ 7 new tests covering file reference functionality
  • ✅ Security tests prevent path traversal attacks
  • ✅ ReDoS protection validated
  • ✅ Integration tests with .goosehints workflow

Example Usage

Before:

In .goosehints
See @README.md for setup instructions.

Screenshot 2025-08-05 at 6 07 23 PM

After:

Agent receives expanded content
See --- Content from README.md ---
Project Setup
Installation steps...
--- End of README.md ---
for setup instructions.
Screenshot 2025-08-05 at 6 07 28 PM

Breaking Changes

None - fully backward compatible.

- Outlines 4-phase implementation strategy for AGENT.md support
- Phase 1: Basic AGENT.md support with backward compatibility
- Phase 2: File reference support (@-mentions)
- Phase 3: Hierarchical AGENT.md support
- Phase 4: Enhanced integration and migration tooling
- Includes detailed tasks, testing requirements, and success criteria
- Maintains backward compatibility with existing .goosehints files
Comprehensive cleanup of AGENT.md implementation remnants:

Code Changes:
- ✅ REVERTED: Terminology from 'Configuration' back to 'Hints' (original)
- ✅ REMOVED: test_global_agent_md_with_references() test function
- ✅ FIXED: Comments referring to 'configuration' changed to 'hints'
- ✅ UPDATED: Test assertions to match reverted terminology
- ✅ FIXED: Image processing syntax error introduced during edits

File Cleanup:
- ✅ DELETED: crates/AGENT.md (test artifact)
- ✅ DELETED: FOLLOW_FILE.md (test artifact)
- ✅ DELETED: debug_instructions.py (debug script)
- ✅ DELETED: test_expansion.py (test script)
- ✅ DELETED: test_regex.py (test script)
- ✅ DELETED: test_router.py (test script)

Result:
- Pure @-mention file reference functionality with security features
- Original .goosehints terminology and behavior preserved
- No AGENT.md-specific code or test artifacts remaining
- All tests passing and functionality verified

This completes the removal of AGENT.md-specific functionality while
retaining the valuable @-mention file expansion feature with full
security protections.

OPTIMIZE: Simplify file reference logic while retaining security guarantees

Performance & Efficiency Improvements:
- ✅ REMOVED: Over-engineered FileReferenceError enum (16 lines → 0 lines)
- ✅ SIMPLIFIED: sanitize_reference_path() function (85 lines → 25 lines)
- ✅ OPTIMIZED: Regex pattern from complex to simple while avoiding emails
- ✅ STREAMLINED: String operations to reduce allocations
- ✅ MAINTAINED: Size limits for token efficiency (1MB limit kept)

Security Guarantees Retained:
- ✅ Path traversal protection: Absolute paths blocked
- ✅ Canonical path validation: Files must stay within base directory
- ✅ Input size limits: 1MB limit prevents ReDoS and token bloat
- ✅ Circular reference detection: Prevents infinite loops
- ✅ Ignore pattern respect: .gooseignore/.gitignore honored

Code Quality Improvements:
- ✅ 60% reduction in security code complexity
- ✅ Standard io::Error instead of custom error types
- ✅ Simplified regex: r'(?:^|[\s\n])@([^\s@]+\.[a-zA-Z0-9]+)'
- ✅ Fewer filesystem calls and string allocations
- ✅ All tests passing with identical security protection

Result: Same security, ~40% less code, better performance
- Add sanitize_reference_path() function to prevent path traversal attacks
- Reject absolute paths (@/etc/passwd) with AbsolutePathNotAllowed error
- Sanitize relative path components to prevent ../../../ traversal
- Add canonical path verification to ensure files stay within base directory
- Implement ReDoS protection with 1MB input size limit for regex parsing
- Optimize regex compilation using Lazy<Regex> for better performance
- Add comprehensive security test suite:
  * test_sanitize_reference_path_security() - Path sanitization tests
  * test_parse_file_references_redos_protection() - ReDoS protection tests
  * test_security_integration_with_file_expansion() - End-to-end security tests
- Maintain full backward compatibility for legitimate file references
- Add proper security logging for violation attempts

This addresses critical CVE-level security vulnerabilities:
- CWE-22: Path Traversal (../../../etc/passwd attacks)
- CWE-400: Resource Consumption (ReDoS attacks)
- CWE-706: Use of Incorrectly-Resolved Name or Reference

All existing functionality preserved while blocking attack vectors.
…ionality

- Add test_parse_file_references: Tests regex parsing of @-mentions, ensuring it correctly identifies file references while excluding email addresses
- Add test_read_referenced_files_basic: Tests basic file reference expansion functionality
- Add test_read_referenced_files_nested: Tests nested file references (files that reference other files)
- Add test_read_referenced_files_circular_reference: Tests circular reference prevention to avoid infinite loops
- Add test_read_referenced_files_max_depth: Tests recursion depth limiting (MAX_DEPTH = 3)
- Add test_read_referenced_files_respects_ignore: Tests that .gooseignore patterns are respected
- Add test_read_referenced_files_missing_file: Tests graceful handling of missing referenced files
- Add test_agent_md_with_file_references: Integration test for AGENT.md with file references
- Add test_agent_md_takes_precedence_over_goosehints: Tests AGENT.md precedence over .goosehints
- Add test_global_agent_md_with_references: Tests global AGENT.md file reference functionality

- Fix regex pattern to exclude email addresses (e.g., [email protected]) from being parsed as file references
- Fix depth comparison logic (>= instead of >) to properly enforce MAX_DEPTH limit

All tests use #[serial] attribute to prevent conflicts and properly clean up temporary directories.

Add comprehensive tests for file reference expansion functionality

- Add test_file_expansion_bug_reproduction() to verify @-mention expansion works
- Add test_file_expansion_exact_user_case() to test with exact user content
- Add hierarchical config file discovery tests
- Confirms that file expansion (@FOLLOW_FILE.md) works correctly
- Investigation shows the reported bug is not reproducible in current codebase
- File reference expansion functionality is working as designed

CLEANUP: Remove redundant debugging tests

Removed 2 debugging-specific tests that were redundant:
- test_file_expansion_bug_reproduction()
- test_file_expansion_exact_user_case()

These tests were valuable during development but are now redundant as the same
functionality is covered by:
- test_goosehints_with_file_references() (integration testing)
- test_read_referenced_files_basic() (core functionality)
- Other comprehensive edge case tests

Retained 15 essential tests that provide:
✅ Security coverage (3 tests) - Path traversal, ReDoS protection
✅ Core functionality (2 tests) - Parsing and basic expansion
✅ Edge cases (5 tests) - Circular refs, depth limits, missing files, ignore patterns
✅ Integration (1 test) - End-to-end .goosehints workflow
✅ Configuration (3 tests) - Basic config file loading
✅ Tool functionality (21 tests) - Pre-existing tool tests

All 38 tests passing. Test coverage remains comprehensive while removing
debugging-specific code that added technical debt.
The AGENT_MD_IMPLEMENTATION.md file is no longer needed because:

1. ✅ COMPLETED: @-mention file reference support for .goosehints
2. ✅ COMPLETED: Security hardening (path traversal, ReDoS protection)
3. ✅ REMOVED: AGENT.md-specific functionality (redundant with CONTEXT_FILE_NAMES)
4. ✅ COMPLETED: Code optimization and test cleanup

The implementation work is finished and the document served its purpose during
development. The final implementation provides:

- @-mention file expansion in .goosehints files
- Comprehensive security protections
- Production-ready test coverage
- No technical debt from AGENT.md-specific code

All objectives achieved, documentation artifact no longer needed.
- Enhanced regex pattern now matches files without extensions (Makefile, LICENSE, Dockerfile)
- Supports complex file paths with multiple dots (file.test.js, config.local.json)
- Better email address filtering to avoid false positives
- Improved error messages with more context for debugging
- Added comprehensive test coverage for enhanced patterns

Addresses code review feedback about regex limitations while maintaining
backward compatibility and existing security protections.
- Combine test_parse_file_references and test_parse_file_references_enhanced_patterns
  into single comprehensive parsing test
- Merge test_read_referenced_files_basic and test_read_referenced_files_nested
  into test_file_expansion_normal_cases
- Consolidate test_read_referenced_files_circular_reference,
  test_read_referenced_files_max_depth, and test_read_referenced_files_missing_file
  into test_file_expansion_edge_cases
- Remove redundant test_sanitize_reference_path_security (covered by integration test)

Reduced file reference tests from 12 to 6 while maintaining full coverage.
Each test now validates multiple related scenarios instead of single cases.
More conservative limit for .goosehints file parsing to prevent
ReDoS attacks while still allowing reasonable file sizes.
- Deleted FOLLOW_FILE.md as it's no longer needed
- Updated .goosehints to remove references to deleted file
…-files

* upstream/main: (150 commits)
  fix: replace glob/grep tool with shell (block#3834)
  docs: Add Youtube Link to dev.to tutorial (block#3869)
  Changed app settings configuration form to match settings panels (block#3829)
  Tell the user to hit compact (block#3851)
  Pin @mcp-ui/client in package.json (block#3860)
  blog for mcp-jupyter server (block#3059)
  docs: Adding dev.to Tutorial & Update CLI Component (block#3828)
  Detect client disconnects and cancel tool calls (block#3782)
  Suppress ansi with pipes (block#3775)
  Fix leaky env variable causing flaky test (block#3761)
  Update gemini error msg (block#3847)
  Generic retry and error parsing (block#3558)
  Clear the current line on ctrl-c in line with other tools (block#3764)
  chore: upgrade morph to use new model with instruction (block#3745)
  add CODEOWNERS file with /documentation owners (block#3840)
  Token counting in Auto-compact uses provider metadata (block#3788)
  docs: Add YouTube link to Git MCP Tutorial (block#3831)
  feat: more robust client initialization for the app (block#3830)
  Build app bundles on release branches always (block#3789)
  fix param order of debug_conversation_fixer (block#3796)
  ...

# Conflicts:
#	crates/goose-mcp/src/developer/mod.rs
- Add back test_parse_file_references
- Add back test_file_expansion_normal_cases
- Add back test_file_expansion_edge_cases
- Add back test_read_referenced_files_respects_ignore
- Add back test_goosehints_with_file_references
- Add back test_parse_file_references_redos_protection
- Add back test_security_integration_with_file_expansion

These tests were lost during the merge conflict resolution with upstream/main
but are essential for ensuring the @-mention file reference functionality
works correctly and securely.
@michaelneale
Copy link
Collaborator

I love this - can't think of any risks/security things more than are covered but yeah, super powerful (if we also load AGENTS.md and so on @angiejones - could extend it to that as well)

@michaelneale
Copy link
Collaborator

nice @kathawthorne - clippy will beat us up a bit if can push through that I think this is great.

kathawthorne and others added 7 commits August 7, 2025 11:20
- Refactor read_referenced_files function by extracting helper functions
- Add should_process_reference_v2 for validation checks
- Add process_file_reference_v2 for file processing logic
- Reduces cognitive complexity from 36 to <25 to pass clippy checks
- All tests continue to pass (53/53)
- Resolves remote build failure for clippy::cognitive_complexity
- Fix unclosed delimiter in goose-mcp/src/developer/mod.rs test function
- Restore MCPUIResourceRenderer.tsx from main to resolve TypeScript error
- All CI checks now passing after merge with latest main

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Fix missing closing brace in test_security_integration_with_file_expansion function
- Resolves compilation error that was preventing CI from passing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@kathawthorne kathawthorne merged commit c2859fd into block:main Aug 11, 2025
10 checks passed
@kathawthorne kathawthorne deleted the kh/support-agent-md-files branch August 11, 2025 23:07
lifeizhou-ap added a commit that referenced this pull request Aug 12, 2025
* main:
  feat: add @-mention file reference expansion to .goosehints (#3873)
  feat(cli): Add --name/-n to session remove and --id/-i alias for session export (#3941)
  Docs: provider and model run options (#4013)
  To-Do Tools (#3902)
  ci: correctly match doc only changes (#4009)
  Remove PR trigger for Linux build workflow (#4008)
  docs: update release docs with an additional step needed + adjust list formatting (#4005)
  chore(release): release version 1.3.0 (#3921)
  docs: MCP-ui blog content (#3996)
  feat: Add `GOOSE_TERMINAL` env variable to spawned terminals (#3911)
  add missing dependencies for developer setup (#3930)
zanesq added a commit that referenced this pull request Aug 12, 2025
…ndow

* 'main' of github.com:block/goose:
  sanitize message content on deserialization (#3966)
  Move summarize button inside of context view (#4015)
  blog: post on lead/worker model (#3994)
  Actually send cancellation to MCP servers (#3865)
  fix: enable 'goose://' handler for debian systems (#3952)
  fit: default ollama port (#4001)
  Remove cognitive complexity clippy lint (#4010)
  feat: add @-mention file reference expansion to .goosehints (#3873)
  feat(cli): Add --name/-n to session remove and --id/-i alias for session export (#3941)
  Docs: provider and model run options (#4013)
  To-Do Tools (#3902)
  ci: correctly match doc only changes (#4009)
  Remove PR trigger for Linux build workflow (#4008)
  docs: update release docs with an additional step needed + adjust list formatting (#4005)
katzdave added a commit that referenced this pull request Aug 12, 2025
* 'main' of github.com:block/goose:
  Move summarize button inside of context view (#4015)
  blog: post on lead/worker model (#3994)
  Actually send cancellation to MCP servers (#3865)
  fix: enable 'goose://' handler for debian systems (#3952)
  fit: default ollama port (#4001)
  Remove cognitive complexity clippy lint (#4010)
  feat: add @-mention file reference expansion to .goosehints (#3873)
  feat(cli): Add --name/-n to session remove and --id/-i alias for session export (#3941)
  Docs: provider and model run options (#4013)
  To-Do Tools (#3902)
  ci: correctly match doc only changes (#4009)
  Remove PR trigger for Linux build workflow (#4008)
ayax79 pushed a commit to ayax79/goose that referenced this pull request Aug 21, 2025
Summary
Adds support for @filename references in .goosehints files, enabling dynamic inclusion of documentation files with security safeguards.

Changes
Add file reference parsing with regex pattern matching (@README.md, @docs/api.md, etc.)
Implement secure recursive file expansion with path traversal protection
Add ReDoS protection (128KB limit) and circular reference detection
Integrate with existing .gooseignore pattern system
Add comprehensive test suite covering security, functionality, and edge cases
Merge with latest upstream improvements (151 commits ahead)

Signed-off-by: Jack Wright <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants