@alexhancock
Collaborator

I'm noticing that qwen3-coder often produces failing results in test_providers.sh and test_mcp.sh because it doesn't reliably call the tools we expect.

The most recent example: it simply didn't call the sampleLLM tool even though the test script directly told it to do so. https://github.com/block/goose/actions/runs/19044249135/job/54389311803?pr=5550
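
For readers unfamiliar with this kind of smoke test, the failure mode described above can be sketched roughly as follows. This is not the code in test_mcp.sh: the helper name `assert_tool_called` and the sample output strings are hypothetical, and the sketch assumes the test captures the agent's output as text and that the tool name shows up in that text whenever the tool is invoked.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the goose repo: succeed only if the
# captured agent output mentions the expected tool name.
assert_tool_called() {
  output="$1"
  tool="$2"
  # -F treats the tool name as a fixed string, -q suppresses grep's output.
  if printf '%s\n' "$output" | grep -qF -- "$tool"; then
    echo "ok: $tool was called"
  else
    echo "FAIL: $tool was not called"
    return 1
  fi
}

# Made-up outputs, for illustration only.
assert_tool_called "calling tool sampleLLM with a short prompt" "sampleLLM"
assert_tool_called "I can answer that without any tools." "sampleLLM" || true
```

A model that answers in plain text instead of issuing the tool call fails a check like this even when nothing in goose itself has changed, which is the behavior reported here.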

Contributor

Copilot AI left a comment

Pull Request Overview

This PR updates the test scripts to remove references to specific AI models that are no longer being tested. The changes remove qwen/qwen3-coder from the OpenRouter provider tests and replace it with gemini-2.5-flash in the MCP test script.

Key changes:

  • Removed qwen/qwen3-coder from OpenRouter provider tests in test_providers.sh
  • Replaced qwen/qwen3-coder with gemini-2.5-flash in OpenRouter provider tests in test_mcp.sh

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| scripts/test_providers.sh | Removed qwen/qwen3-coder from the OpenRouter provider test configuration |
| scripts/test_mcp.sh | Replaced qwen/qwen3-coder with gemini-2.5-flash for OpenRouter provider testing |

  "anthropic:claude-haiku-4-5-20251001"
  "google:gemini-2.5-flash"
- "openrouter:qwen/qwen3-coder"
+ "openrouter:gemini-2.5-flash"

Copilot AI Nov 3, 2025

The model identifier gemini-2.5-flash appears to be missing the provider prefix. Based on the pattern in line 18 of this file (google/gemini-2.5-flash) and line 18 of test_providers.sh (google/gemini-2.5-flash), OpenRouter model identifiers should include the provider namespace. This should likely be google/gemini-2.5-flash to be consistent with OpenRouter's model naming convention.

Suggested change
- "openrouter:gemini-2.5-flash"
+ "openrouter:google/gemini-2.5-flash"
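
The naming issue raised in this review comment could be caught mechanically. The sketch below is an assumption-laden illustration, not part of either test script: `check_model_id` is a hypothetical helper, and it assumes entries have the form `provider:model` and that OpenRouter models are expected to carry a `vendor/` namespace, as the comment suggests.

```shell
#!/bin/sh
# Hypothetical validator, not from the goose repo. Assumes "provider:model"
# entries and that OpenRouter model names look like "vendor/model".
check_model_id() {
  entry="$1"
  provider="${entry%%:*}"   # text before the first ':'
  model="${entry#*:}"       # text after the first ':'
  case "$provider" in
    openrouter)
      case "$model" in
        */*) echo "ok: $entry" ;;
        *)   echo "missing vendor namespace: $entry"; return 1 ;;
      esac
      ;;
    *) echo "ok: $entry" ;;
  esac
}

check_model_id "openrouter:google/gemini-2.5-flash"
check_model_id "openrouter:gemini-2.5-flash" || true
check_model_id "google:gemini-2.5-flash"
```

Only the `openrouter` provider is checked for a namespace here, since direct providers such as `google:` or `anthropic:` use bare model names in the entries shown above.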
@alexhancock alexhancock force-pushed the alexhancock/smoke-tests-remove-qwen-3 branch from 93e6879 to 956d22c Compare November 3, 2025 18:20
@alexhancock alexhancock force-pushed the alexhancock/smoke-tests-remove-qwen-3 branch from 956d22c to 0cb85b6 Compare November 3, 2025 18:24
Copilot AI review requested due to automatic review settings November 3, 2025 18:24
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@alexhancock alexhancock merged commit 38e7dc8 into main Nov 3, 2025
20 checks passed
@michaelneale
Collaborator

@alexhancock are you sure it isn't a regression? I thought it was working reliably. The aim of these tests is to catch drift in Goose, so if it adds something that breaks popular models then we have a regression.

(I think this is the https://openrouter.ai/qwen/qwen3-coder model?) If it doesn't work for this model, we have regressed something in goose.
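
One way to tell a flaky model apart from a regression is to run the same check several times and compare the pass rate on the current branch against a known-good commit. The sketch below only illustrates that idea; `pass_rate` is a hypothetical helper that is not part of the goose test scripts, and the `true` and `false` commands stand in for a real smoke-test invocation.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print "passed/total".
pass_rate() {
  runs="$1"
  shift
  passed=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Count the run as passed when the command exits with status 0.
    if "$@" >/dev/null 2>&1; then
      passed=$((passed + 1))
    fi
    i=$((i + 1))
  done
  echo "$passed/$runs"
}

pass_rate 5 true    # stand-in for a check that always passes
pass_rate 5 false   # stand-in for a check that always fails
```

If the pass rate for a model is similar on both commits, the model itself is the more likely cause; a clear drop on the newer commit points toward a regression in goose.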

katzdave added a commit that referenced this pull request Nov 4, 2025
* 'main' of github.com:block/goose: (21 commits)
  Manual compaction counting fix + cli cleanup (#5480)
  chore(deps): bump prismjs and react-syntax-highlighter in /ui/desktop (#5549)
  fix: remove qwen3-coder from provider/mcp smoke tests (#5551)
  fix: do not build unsigned desktop app bundles on every PR in ci. add manual option. (#5550)
  fix: update Husky prepare script to v9 format (#5522)
  Fix 404 for responsible coding guide (#5543)
  fix hermit `text file busy` issues on linux (#5372)
  Fix image processing (#5544)
  docs: AI attribution for PRs (#5547)
  chore(tests/mcp): testing for MCP sampling (#5456)
  docs: adding HOWTOAI.md (#5533)
  added configuration content, also added signoff, fix merging issue with another commit by creating a clean branch. removed and closed commits that caused signoff issues. (#5519)
  Fixes Gemini API parse issue by converting nullable type arrays to single types in tool schemas (#5530)
  Troubleshooting diagnostics doc (#5526)
  fix link to Ollama FAQ (#5531)
  docs: remove speech-mcp (#5514)
  fix: adds ProviderRetry to openai provider (#5518)
  docs: extensions directory minor updates (#5466)
  Docs/json recipe support (#5492)
  docs: recipe buttons (#5507)
  ...
wpfleger96 added a commit that referenced this pull request Nov 4, 2025
* main: (85 commits)
  improve linux tray icon support (#5425)
  feat: log rotation (#5561)
  use app.isPackaged instead of checking for node env development (#5465)
  disable RPM build-ID generation to prevent package conflicts (#5563)
  Add Diagnostics Info to Q&A and Bug Report Templates (#5565)
  fix: improve server error messages to include HTTP status code (#5532)
  improvement: add useful error message when attempting to use unauthenticated cursor-agent (#5300)
  fix: unblock acp via databricks (#5562)
  feat: add --output-format json flag to goose run command (#5525)
  Sessions required (#5548)
  feat: add grouped extension loading notification (#5529)
  we should run this on main and also test open models at least via ope… (#5556)
  info: print location of sessions.db via goose info (#5557)
  chore: remove yarn usage from documentation (#5555)
  cli: adjust default theme to address #1905 (#5552)
  Manual compaction counting fix + cli cleanup (#5480)
  chore(deps): bump prismjs and react-syntax-highlighter in /ui/desktop (#5549)
  fix: remove qwen3-coder from provider/mcp smoke tests (#5551)
  fix: do not build unsigned desktop app bundles on every PR in ci. add manual option. (#5550)
  fix: update Husky prepare script to v9 format (#5522)
  ...
wpfleger96 added a commit that referenced this pull request Nov 5, 2025
* main: (54 commits)
  add clippy warning for string_slice (#5422)
  improve linux tray icon support (#5425)
  feat: log rotation (#5561)
  use app.isPackaged instead of checking for node env development (#5465)
  disable RPM build-ID generation to prevent package conflicts (#5563)
  Add Diagnostics Info to Q&A and Bug Report Templates (#5565)
  fix: improve server error messages to include HTTP status code (#5532)
  improvement: add useful error message when attempting to use unauthenticated cursor-agent (#5300)
  fix: unblock acp via databricks (#5562)
  feat: add --output-format json flag to goose run command (#5525)
  Sessions required (#5548)
  feat: add grouped extension loading notification (#5529)
  we should run this on main and also test open models at least via ope… (#5556)
  info: print location of sessions.db via goose info (#5557)
  chore: remove yarn usage from documentation (#5555)
  cli: adjust default theme to address #1905 (#5552)
  Manual compaction counting fix + cli cleanup (#5480)
  chore(deps): bump prismjs and react-syntax-highlighter in /ui/desktop (#5549)
  fix: remove qwen3-coder from provider/mcp smoke tests (#5551)
  fix: do not build unsigned desktop app bundles on every PR in ci. add manual option. (#5550)
  ...
fbalicchia pushed a commit to fbalicchia/goose that referenced this pull request Nov 7, 2025
BlairAllan pushed a commit to BlairAllan/goose that referenced this pull request Nov 29, 2025
4 participants