fix(gym): isolate scenario Cargo projects from parent workspace #8640

Merged
jh-block merged 1 commit into aaif-goose:main from kyledef:fix/gym-scenarios-workspace-isolation on Apr 20, 2026
kyledef commented Apr 18, 2026

Summary

Two open-model-gym scenarios (file-editing and multi-turn-edit) scaffold a Cargo project without declaring [workspace]. When the gym is run from inside a checkout of goose itself — which is the most natural place to run it — the final cargo build validation step fails before goose's output is even evaluated:

error: current package believes it's in a workspace when it's not:
current:   .../evals/open-model-gym/suite/.workdir/file-editing_.../Cargo.toml
workspace: .../goose/Cargo.toml

This causes the scenario to be scored as failed regardless of how well the agent performed the edit.

Fix

Add an empty [workspace] table to the scaffolded Cargo.toml in each affected scenario. This makes the scenario project self-contained so cargo build works regardless of where the gym is executed.

  Cargo.toml: |
    [package]
    name = "user-service"
    version = "0.1.0"
    edition = "2021"
+
+   [workspace]

Repro

From inside a goose checkout, with a model configured:

cd evals/open-model-gym
just install
just run  # or: npx tsx suite/src/runner.ts

Before this change, file-editing fails at the cargo build validation rule even when the agent produces the correct edit. After the change, the scenario runs to completion.

Verified locally on macOS with databricks/goose-claude-4-7-opus — the file-editing scenario now passes end-to-end (1/1 tests passed in ~27s).

Scope

Minimal, two-line change. No behavior change for users who run the gym outside a Rust workspace.

Scaffolded Cargo.toml files for the file-editing and multi-turn-edit
scenarios do not declare [workspace], so when the gym is run from
inside a checkout of goose (or any other Rust workspace) the
scenario's final 'cargo build' validation fails with:

    error: current package believes it's in a workspace when it's not

Add an empty [workspace] table to each scaffolded Cargo.toml so the
scenario project is self-contained regardless of where the gym runs.

Signed-off-by: Kyle De Freitas <kdefreitas@squareup.com>
jh-block added this pull request to the merge queue Apr 20, 2026
Merged via the queue into aaif-goose:main with commit 5b43b5f Apr 20, 2026
18 checks passed
spikewang pushed a commit to spikewang/goose that referenced this pull request Apr 22, 2026
…-goose#8640)

Signed-off-by: Kyle De Freitas <kdefreitas@squareup.com>