Merged
498 changes: 0 additions & 498 deletions evals/open-model-gym/agent-gym-report-2026-02-03.html

This file was deleted.

18 changes: 16 additions & 2 deletions evals/open-model-gym/config.yaml
@@ -48,13 +48,18 @@ runners:
# stdio:
# - node mcp-harness/dist/index.js

- name: goose-full
- name: goose
type: goose
bin: goose
extensions: [developer, todo, skills, code_execution, extensionmanager]
stdio:
- node mcp-harness/dist/index.js

Copilot AI Feb 10, 2026

Renaming the runner from goose-full to goose leaves the repository docs inconsistent: the README still documents goose-full, so users following the Quick Start or config examples will likely get “unknown runner” errors. Either update the README in this PR or keep goose-full as an alias name in config.

Suggested change
- name: goose-full
type: goose
bin: goose
extensions: [developer, todo, skills, code_execution, extensionmanager]
stdio:
- node mcp-harness/dist/index.js
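
If the alias route is preferred over restoring the old name, one option is to resolve legacy names in the suite code rather than duplicating the runner entry. The following is a minimal sketch only: `RUNNER_ALIASES` and `resolveRunnerName` are hypothetical names that do not appear in this PR, and how the suite actually looks up runners is an assumption.

```typescript
// Hypothetical alias table: legacy runner name -> current runner name.
const RUNNER_ALIASES: Record<string, string> = {
  "goose-full": "goose",
};

// Resolve a requested runner name against the names defined in config.yaml.
// A configured name wins; otherwise a known alias is followed once.
function resolveRunnerName(requested: string, configured: string[]): string {
  if (configured.includes(requested)) {
    return requested;
  }
  const target = RUNNER_ALIASES[requested];
  if (target !== undefined && configured.includes(target)) {
    return target;
  }
  throw new Error(`unknown runner: ${requested}`);
}
```

With this in place, a README example that still says goose-full would resolve to goose instead of failing, while genuinely unknown names would still raise an error.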

- name: goose-diet
type: goose
bin: ~/Downloads/goose-diet
extensions: [developer]
Comment on lines +58 to +61
Copilot AI Feb 10, 2026

goose-diet sets bin: ~/Downloads/goose-diet, but the suite runner builds shell commands from this string and also runs which ${bin} for hashing. ~ expansion isn't guaranteed in those contexts, so the runner binary can be unresolvable on some shells/platforms. Prefer an absolute path (or add explicit ~ expansion in the runner code) to avoid flaky execution.
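
The explicit ~ expansion mentioned above could look roughly like the sketch below. It assumes the suite is Node/TypeScript and that `bin` is read from config as a plain string; `expandTilde` is a hypothetical helper, not a function from this repository. Only a bare `~` and a leading `~/` are handled; forms such as `~user` are returned unchanged.

```typescript
import * as os from "os";
import * as path from "path";

// Expand a leading "~" or "~/" to the given home directory so the result
// no longer depends on shell tilde expansion.
// `home` defaults to the current user's home directory.
function expandTilde(bin: string, home: string = os.homedir()): string {
  if (bin === "~") {
    return home;
  }
  if (bin.startsWith("~/")) {
    return path.join(home, bin.slice(2));
  }
  // Bare command names (e.g. "goose") and absolute paths pass through as-is.
  return bin;
}
```

Calling this once when the config is loaded would mean both the command construction and the which lookup see the same absolute path.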


- name: opencode
type: opencode
bin: opencode
@@ -69,6 +74,12 @@ runners:
stdio:
- node mcp-harness/dist/index.js

- name: pi-lean
type: pi
bin: pi
# Pi takes provider/model from the test matrix, not config
# MCP support via pi-mcp-adapter: `pi install npm:pi-mcp-adapter`

# =============================================================================
# Test Matrix
# =============================================================================
@@ -80,6 +91,9 @@ matrix:
- scenario: everyday-app-automation
- scenario: file-editing

# Feature removal: all runners
- scenario: remove-feature

# Multi-turn: goose and pi only (opencode doesn't support session continuation)
- scenario: multi-turn-edit
runners: [pi, goose-full]
runners: [pi, goose]
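
The matrix comments suggest that an entry without a `runners` key applies to all runners, while an entry with one is restricted to the listed names. A sketch of that selection, with a check that would catch a stale name such as goose-full after the rename, might look like this. `MatrixEntry` and `runnersFor` are hypothetical names, and the selection semantics are inferred from the config comments rather than from the suite code.

```typescript
// Shape of one matrix entry as it appears in config.yaml (assumed).
interface MatrixEntry {
  scenario: string;
  runners?: string[];
}

// Return the runner names a matrix entry applies to.
// No `runners` key means every configured runner; otherwise only the
// listed ones, all of which must exist in the configured set.
function runnersFor(entry: MatrixEntry, allRunners: string[]): string[] {
  if (entry.runners === undefined) {
    return [...allRunners];
  }
  const unknown = entry.runners.filter((name) => !allRunners.includes(name));
  if (unknown.length > 0) {
    throw new Error(
      `scenario ${entry.scenario} references unknown runners: ${unknown.join(", ")}`
    );
  }
  return entry.runners;
}
```

Running such a check at config load time would surface a mismatch between the matrix and the runners section before any scenario starts.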
3 changes: 3 additions & 0 deletions evals/open-model-gym/mcp-harness/package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions evals/open-model-gym/suite/package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
