@alexhancock
Collaborator

@alexhancock alexhancock commented Oct 29, 2025

Two approaches to tests that will ensure MCP sampling continues to work (following the merge of #5367)

    1. mcp_integration_test.rs - Makes sure all code paths related to sampling work correctly, assuming the LLM responds predictably. There are no assertions on content and no integration with a real provider; the test only checks that the messages are handled properly.
    2. A new scripts/test_mcp.sh script, similar to scripts/test_providers.sh, which does two things:
        • Calls a tool that causes sampling to occur, across a range of models
        • Uses a separate fast model as a judge to determine whether the interaction went through and actually performed sampling.
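
For readers unfamiliar with the first approach, here is a minimal sketch of a mock-provider test. It is an illustration only: `Message`, `Provider`, `MockProvider`, and `handle_sampling_request` are hypothetical, simplified stand-ins and are not goose's actual types or the code in this PR. The point it shows is that the test checks the plumbing (a request goes in, an assistant message comes out) rather than the content of the reply.

```rust
// Hypothetical, simplified types: NOT goose's real provider or message
// types, only stand-ins to illustrate the mock-based approach.

/// A chat message passed to the provider during sampling.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub text: String,
}

/// Minimal provider abstraction (assumed shape).
pub trait Provider {
    fn complete(&self, system: &str, messages: &[Message]) -> Result<Message, String>;
}

/// Always answers with the same canned text, so tests stay deterministic
/// and never call a real API.
pub struct MockProvider {
    pub canned_reply: String,
}

impl Provider for MockProvider {
    fn complete(&self, _system: &str, messages: &[Message]) -> Result<Message, String> {
        if messages.is_empty() {
            return Err("sampling request contained no messages".to_string());
        }
        Ok(Message {
            role: "assistant".to_string(),
            text: self.canned_reply.clone(),
        })
    }
}

/// Stand-in for the code path that services a sampling request:
/// wrap the prompt in a user message and forward it to the provider.
pub fn handle_sampling_request(provider: &dyn Provider, prompt: &str) -> Result<Message, String> {
    let messages = vec![Message {
        role: "user".to_string(),
        text: prompt.to_string(),
    }];
    provider.complete("You are a helpful assistant.", &messages)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sampling_request_is_handled() {
        let provider = MockProvider { canned_reply: "mock reply".to_string() };
        let reply = handle_sampling_request(&provider, "any prompt").unwrap();
        // Only the message handling is checked, not what the model said.
        assert_eq!(reply.role, "assistant");
    }
}
```

The second approach cannot be reduced to a deterministic check like this, which is presumably why it relies on a judge model instead.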

@alexhancock alexhancock requested a review from jamadeo October 29, 2025 21:12
@alexhancock alexhancock marked this pull request as draft October 29, 2025 21:13
@alexhancock alexhancock removed the request for review from jamadeo October 29, 2025 21:13
@alexhancock alexhancock force-pushed the alexhancock/mcp-sampling-testing-approaches branch 5 times, most recently from f0d73d5 to dcb2259 Compare October 30, 2025 15:25
@alexhancock alexhancock requested a review from jamadeo October 30, 2025 15:28
@alexhancock alexhancock changed the title tests: testing approaches for mcp sampling chore(tests/mcp): testing for MCP sampling Oct 30, 2025
@alexhancock alexhancock requested a review from DOsinga October 30, 2025 15:29
@alexhancock alexhancock marked this pull request as ready for review October 30, 2025 15:38
@alexhancock alexhancock force-pushed the alexhancock/mcp-sampling-testing-approaches branch 2 times, most recently from 1cc48e1 to 5810586 Compare October 30, 2025 18:02
@alexhancock
Collaborator Author

@jamadeo @DOsinga testing is now set up here for sampling.

Douwe and I discussed how we should work to merge test_providers.sh, test_subrecipes.sh, and test_mcp.sh into a unified approach of tests matrixed across the providers/models, with some ability to choose what runs where, but that makes more sense in a follow-up.
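
As a rough sketch of what "matrixed with some ability to choose what runs where" could look like, the snippet below models each provider/model pair as opting into a set of suites. Everything here is hypothetical: `Suite`, `Target`, `targets_for`, and the provider and model names are invented for illustration and do not reflect an agreed design.

```rust
/// Hypothetical test suites, loosely mirroring the three scripts named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Suite {
    Providers,
    Subrecipes,
    Mcp,
}

/// A provider/model pair plus the suites it opts into.
pub struct Target {
    pub provider: &'static str,
    pub model: &'static str,
    pub suites: &'static [Suite],
}

/// Returns the (provider, model) pairs that should run the given suite.
pub fn targets_for(targets: &[Target], suite: Suite) -> Vec<(&'static str, &'static str)> {
    targets
        .iter()
        .filter(|t| t.suites.contains(&suite))
        .map(|t| (t.provider, t.model))
        .collect()
}

/// Placeholder matrix; provider and model names are invented.
pub fn example_matrix() -> Vec<Target> {
    vec![
        Target {
            provider: "provider-a",
            model: "model-small",
            suites: &[Suite::Providers, Suite::Mcp],
        },
        Target {
            provider: "provider-b",
            model: "model-large",
            suites: &[Suite::Providers, Suite::Subrecipes],
        },
    ]
}
```

With this shape, dropping a model from one suite is a one-line change to its `suites` list and leaves the other suites untouched.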

@alexhancock alexhancock force-pushed the alexhancock/mcp-sampling-testing-approaches branch from 5810586 to 47b8945 Compare November 3, 2025 16:31
Copilot AI review requested due to automatic review settings November 3, 2025 16:31
@alexhancock alexhancock force-pushed the alexhancock/mcp-sampling-testing-approaches branch from 47b8945 to ca1b761 Compare November 3, 2025 16:33
Contributor

Copilot AI left a comment

Pull Request Overview

This pull request adds MCP (Model Context Protocol) sampling test functionality to the Goose project. The changes enable testing of MCP sampling capabilities across multiple AI providers by validating that models can properly respond to sampling requests initiated through MCP tools.

Key changes include:

  • New test script test_mcp.sh that validates MCP sampling across multiple providers (Anthropic, Google, OpenRouter, OpenAI, Tetrate, Databricks)
  • MockProvider implementation for testing MCP sampling in integration tests without requiring actual API calls
  • Updated test replay files to reflect MCP sampling test results
  • Integration of the MCP test script into the CI/CD workflow

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| scripts/test_mcp.sh | New bash script that tests MCP sampling across multiple providers using a judge model to validate responses |
| crates/goose/tests/mcp_integration_test.rs | Adds MockProvider implementation and integrates it with ExtensionManager for MCP sampling tests |
| crates/goose/tests/mcp_replays/npx-y@modelcontextprotocol_server-everything | Updated replay file showing successful sampleLLM tool invocation with Great Gatsby quote |
| crates/goose/tests/mcp_replays/npx-y@modelcontextprotocol_server-everything.results.json | Updated results showing successful sampling response |
| crates/goose/tests/mcp_replays/uvxmcp-server-fetch | Updated replay file with cleaner output (removed verbose setup logs) |
| crates/goose/tests/mcp_replays/uvxmcp-server-fetch.results.json | Updated error message for fetch failure |
| crates/goose/tests/mcp_replays/cargorun--quiet-pgoose-server--bingoosed--mcpdeveloper | Updated replay file with newer timestamps and different user environment |
| crates/goose/tests/mcp_replays/cargorun--quiet-pgoose-server--bingoosed--mcpdeveloper.results.json | Updated list_windows output with more window items |
| .github/workflows/pr-smoke-test.yml | Adds MCP test step to CI workflow with required API keys and configuration |

@alexhancock alexhancock merged commit c1c1371 into main Nov 3, 2025
18 of 19 checks passed
katzdave added a commit that referenced this pull request Nov 3, 2025
…se into dkatz/manual-compact-fix-usage

* 'dkatz/manual-compact-fix-usage' of github.com:block/goose: (35 commits)
  Fix image processing (#5544)
  docs: AI attribution for PRs (#5547)
  chore(tests/mcp): testing for MCP sampling (#5456)
  docs: adding HOWTOAI.md (#5533)
  added configuration content, also added signoff, fix merging issue with another commit by creating a clean branch. removed and closed commits that caused signoff issues. (#5519)
  Fixes Gemini API parse issue by converting nullable type arrays to single types in tool schemas (#5530)
  Troubleshooting diagnostics doc (#5526)
  fix link to Ollama FAQ (#5531)
  docs: remove speech-mcp (#5514)
  fix: adds ProviderRetry to openai provider (#5518)
  docs: extensions directory minor updates (#5466)
  Docs/json recipe support (#5492)
  docs: recipe buttons (#5507)
  Improve system theme detection and fallback (#5427)
  [Autovisualiser] remove unnecessary content from mermaid HTML template (#5505)
  Improve subagents docs (#5484)
  FIX: prefer linux in WSL and add INSTALL_OS override for CLI (#5215)
  Propagate session ID in LLM and MCP requests (#5165)
  feat: YT Short for Canva MCP + goose (#5495)
  Change Recipes Test Script (#5457)
  ...
katzdave added a commit that referenced this pull request Nov 4, 2025
* 'main' of github.com:block/goose: (21 commits)
  Manual compaction counting fix + cli cleanup (#5480)
  chore(deps): bump prismjs and react-syntax-highlighter in /ui/desktop (#5549)
  fix: remove qwen3-coder from provider/mcp smoke tests (#5551)
  fix: do not build unsigned desktop app bundles on every PR in ci. add manual option. (#5550)
  fix: update Husky prepare script to v9 format (#5522)
  Fix 404 for responsible coding guide (#5543)
  fix hermit `text file busy` issues on linux (#5372)
  Fix image processing (#5544)
  docs: AI attribution for PRs (#5547)
  chore(tests/mcp): testing for MCP sampling (#5456)
  docs: adding HOWTOAI.md (#5533)
  added configuration content, also added signoff, fix merging issue with another commit by creating a clean branch. removed and closed commits that caused signoff issues. (#5519)
  Fixes Gemini API parse issue by converting nullable type arrays to single types in tool schemas (#5530)
  Troubleshooting diagnostics doc (#5526)
  fix link to Ollama FAQ (#5531)
  docs: remove speech-mcp (#5514)
  fix: adds ProviderRetry to openai provider (#5518)
  docs: extensions directory minor updates (#5466)
  Docs/json recipe support (#5492)
  docs: recipe buttons (#5507)
  ...
fbalicchia pushed a commit to fbalicchia/goose that referenced this pull request Nov 7, 2025
BlairAllan pushed a commit to BlairAllan/goose that referenced this pull request Nov 29, 2025