Skip to content

Conversation

@michaelneale
Copy link
Collaborator

these tests seem to fail a bit - due to availability of an api, and also testing with microseconds for cache timing, so runs many iterations of latter to check caching is faster (is a bit of an odd test in the first place to test in CI)

@michaelneale michaelneale changed the title fix: trying more runs for cache and retries fix: pricing integrations tests -> trying more runs for cache and retries Jul 21, 2025
@michaelneale michaelneale changed the title fix: pricing integrations tests -> trying more runs for cache and retries fix: pricing integration tests -> trying more runs for cache and retries Jul 21, 2025
@jsibbison-square
Copy link
Contributor

Personally I think some these of tests don't add a lot of value and we should delete all except for 'model_in_open_router', 'model_not_in_open_router' and maybe the concurrent one.

  • test_pricing_cache_performance Checking performance in CI doesn't make sense as we are not comparing against historic time and we will write off failures as transient issues
  • test_model_not_in_openrouter Valid behavior test
  • test_pricing_refresh I think this test is just testing that the refresh_pricing method exists, no assertions of granuality
  • test_concurrent_access This is ensuring we are using a mutex in the code, no easy way to test it that I can think of.

@michaelneale
Copy link
Collaborator Author

@jsibbison-square agree - yeah I don't think they add a lot, could drop some of them?

@lifeizhou-ap
Copy link
Contributor

lifeizhou-ap commented Jul 21, 2025

Hi @michaelneale and @jsibbison-square ,

I had the look into the build failure reason here. I feel below is the the reason why the test is not stable.

In our tests, we use #[tokio::test]. By default, each test run on a multi-threaded runtime, which are running parallel with the other test cases.

let test_cache_dir = format!("/tmp/goose_test_cache_perf_{}", std::process::id());
std::env::set_var("GOOSE_CACHE_DIR", &test_cache_dir);

With the above code, each tests created same directory as a cache dir because they shared the same process id and the same environment variable. In each test cases, it set the cache_dir first and remove the dir. However, since they are using the same dir name, it causes each test case use a shared file which caused the failure when running in parallel.

i guess the options to fix instabilities and make sure test isolation are:

  • passing cache_dir in the PricingCache struct constructor to make it is easier to test
  • change the test to run in sequential (I don't think this is a good solution. each tests case should be isolated from each other)
  • using temp_env crate to override the environment variable in the test.

I also went through the test cases (as integration tests)

  • test_pricing_cache_performance:
    My understanding of the intention for this test case: when getting the price first, it fetches the prices and build the cache, and when getting the price again, it uses the cache. I feel this is a valid test scenario. can verify the data with the cache file instead of the time elapse

  • test_model_not_in_openrouter is valid scenario

  • test_pricing_refresh is also a valid scenario. It can check the timestamp in the cache file

  • test_concurrent_access, it could be a valid test case, but I am not sure how to verify (maybe make sure it can get the model price at the end of test again?)

I feel when we use external services in the test cases, it also brings instability. maybe some of them can be converted to unit tests, and we can have a single integration test.

@michaelneale
Copy link
Collaborator Author

michaelneale commented Jul 21, 2025

even some retries on external may be ok, and testing cache as one shot is not right, need to let it run some iterations (comparison at microsecond level does seem a bit meaningless for that test though)

@michaelneale
Copy link
Collaborator Author

thanks @lifeizhou-ap switched to using proper tempfile - that should help with some, still has the retries in there and gives the cache a good number of runs to ensure it gets faster to stick to intent of test (others are in there).

* main: (69 commits)
  Add inline python extension (#3107)
  fix: add maintainer, homepage and categories to DEB/RPM package config (#3096)
  blog: agent to agent convo (#3677)
  Possible to disable random thinking messages (#3304)
  Two VS code tutorials (#3603)
  small blog fixes (#3549)
  docs: fix installation command for YouTube Transcript MCP in servers.json (#3595)
  Docs for using Docker Model Runner as a local LLM provider.  (#3509)
  Docs: VS Code Extension move to tutorials (#3601)
  Fix working directory when session has no messages (#3513)
  goose docs MCP server (#3665)
  Remove confusing status output when testing sharing url connection and it shows 404 (#3659)
  chore: use typed notifications from rmcp (#3653)
  feat: convert GetPromptResult from mcp_core to rmcp version (#3650)
  feat: Replace usage of mcp_core Tools/ToolAnnotations in openapi schema (#3649)
  fix: ensure execution task result is shown (#3629)
  docs: Quick spotlight fix (#3633)
  alexhancock/rmcp-tools-annotations (#3617)
  fix: clean up subagent (#3565)
  Adds the `WaitingForUserInput` state (#3620)
  ...
@michaelneale michaelneale merged commit b5fb459 into main Jul 28, 2025
8 checks passed
@michaelneale michaelneale deleted the micn/fix-pricing-tests branch July 28, 2025 03:13
katzdave added a commit that referenced this pull request Jul 28, 2025
…cn/compact2-task-tracking

* 'dkatz/goose-compact2' of github.com:block/goose: (22 commits)
  rm stray files
  unused
  fmt
  fix threshold
  autocompact splice last message
  fmt
  Fix conversations before they hit the LLM (#3660)
  cli: add detailed instruction for WSL users (#3496)
  feat: recipe runs will now prompt for missing extension secrets (#3668)
  fix: pricing integration tests -> trying more runs for cache and retries (#3546)
  Add inline python extension (#3107)
  fix: add maintainer, homepage and categories to DEB/RPM package config (#3096)
  blog: agent to agent convo (#3677)
  Possible to disable random thinking messages (#3304)
  Two VS code tutorials (#3603)
  small blog fixes (#3549)
  docs: fix installation command for YouTube Transcript MCP in servers.json (#3595)
  Docs for using Docker Model Runner as a local LLM provider.  (#3509)
  Docs: VS Code Extension move to tutorials (#3601)
  Fix working directory when session has no messages (#3513)
  ...
michaelneale added a commit that referenced this pull request Jul 29, 2025
* dkatz/goose-compact2: (22 commits)
  rm stray files
  unused
  fmt
  fix threshold
  autocompact splice last message
  fmt
  Fix conversations before they hit the LLM (#3660)
  cli: add detailed instruction for WSL users (#3496)
  feat: recipe runs will now prompt for missing extension secrets (#3668)
  fix: pricing integration tests -> trying more runs for cache and retries (#3546)
  Add inline python extension (#3107)
  fix: add maintainer, homepage and categories to DEB/RPM package config (#3096)
  blog: agent to agent convo (#3677)
  Possible to disable random thinking messages (#3304)
  Two VS code tutorials (#3603)
  small blog fixes (#3549)
  docs: fix installation command for YouTube Transcript MCP in servers.json (#3595)
  Docs for using Docker Model Runner as a local LLM provider.  (#3509)
  Docs: VS Code Extension move to tutorials (#3601)
  Fix working directory when session has no messages (#3513)
  ...
michaelneale added a commit that referenced this pull request Jul 29, 2025
* main:
  blog: streamlining detection development w/ recipes (#3689)
  fix: have option for cli providers to use their configured or default model  (#3683)
  docs: new blog post and corrections to an old one on goosehints (#3657)
  Resolve sub recipe path relative to the parent recipe path (#3642)
  Speed up recipe loading from deeplinks and various fixes (#3662)
  fix cmd + , not opening settings (#3694)
  Add warning when JSON env parsing fails. (#3696)
  chore: refactor session naming into provider (#3678)
  feat (ui): File picker for scheduling recipes default to recipe dir (#3611)
  fix: address issue with streamable http interactions via mcp (#3693)
  Provider scenario tests (#3688)
  Fix conversations before they hit the LLM (#3660)
  cli: add detailed instruction for WSL users (#3496)
  feat: recipe runs will now prompt for missing extension secrets (#3668)
  fix: pricing integration tests -> trying more runs for cache and retries (#3546)
atarantino pushed a commit to atarantino/goose that referenced this pull request Aug 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants