fix: pricing integration tests -> trying more runs for cache and retries #3546

michaelneale · 2025-07-21T03:09:54Z

these tests seem to fail a bit - due to availability of an api, and also testing with microseconds for cache timing, so runs many iterations of latter to check caching is faster (is a bit of an odd test in the first place to test in CI)

jsibbison-square · 2025-07-21T04:15:12Z

Personally I think some these of tests don't add a lot of value and we should delete all except for 'model_in_open_router', 'model_not_in_open_router' and maybe the concurrent one.

test_pricing_cache_performance Checking performance in CI doesn't make sense as we are not comparing against historic time and we will write off failures as transient issues
test_model_not_in_openrouter Valid behavior test
test_pricing_refresh I think this test is just testing that the refresh_pricing method exists, no assertions of granuality
test_concurrent_access This is ensuring we are using a mutex in the code, no easy way to test it that I can think of.

michaelneale · 2025-07-21T05:32:34Z

@jsibbison-square agree - yeah I don't think they add a lot, could drop some of them?

lifeizhou-ap · 2025-07-21T05:50:04Z

Hi @michaelneale and @jsibbison-square ,

I had the look into the build failure reason here. I feel below is the the reason why the test is not stable.

In our tests, we use #[tokio::test]. By default, each test run on a multi-threaded runtime, which are running parallel with the other test cases.

let test_cache_dir = format!("/tmp/goose_test_cache_perf_{}", std::process::id());
std::env::set_var("GOOSE_CACHE_DIR", &test_cache_dir);

With the above code, each tests created same directory as a cache dir because they shared the same process id and the same environment variable. In each test cases, it set the cache_dir first and remove the dir. However, since they are using the same dir name, it causes each test case use a shared file which caused the failure when running in parallel.

i guess the options to fix instabilities and make sure test isolation are:

passing cache_dir in the PricingCache struct constructor to make it is easier to test
change the test to run in sequential (I don't think this is a good solution. each tests case should be isolated from each other)
using temp_env crate to override the environment variable in the test.

I also went through the test cases (as integration tests)

test_pricing_cache_performance:
My understanding of the intention for this test case: when getting the price first, it fetches the prices and build the cache, and when getting the price again, it uses the cache. I feel this is a valid test scenario. can verify the data with the cache file instead of the time elapse
test_model_not_in_openrouter is valid scenario
test_pricing_refresh is also a valid scenario. It can check the timestamp in the cache file
test_concurrent_access, it could be a valid test case, but I am not sure how to verify (maybe make sure it can get the model price at the end of test again?)

I feel when we use external services in the test cases, it also brings instability. maybe some of them can be converted to unit tests, and we can have a single integration test.

michaelneale · 2025-07-21T07:21:07Z

even some retries on external may be ok, and testing cache as one shot is not right, need to let it run some iterations (comparison at microsecond level does seem a bit meaningless for that test though)

michaelneale · 2025-07-21T07:41:15Z

thanks @lifeizhou-ap switched to using proper tempfile - that should help with some, still has the retries in there and gives the cache a good number of runs to ensure it gets faster to stick to intent of test (others are in there).

* main: (69 commits) Add inline python extension (#3107) fix: add maintainer, homepage and categories to DEB/RPM package config (#3096) blog: agent to agent convo (#3677) Possible to disable random thinking messages (#3304) Two VS code tutorials (#3603) small blog fixes (#3549) docs: fix installation command for YouTube Transcript MCP in servers.json (#3595) Docs for using Docker Model Runner as a local LLM provider. (#3509) Docs: VS Code Extension move to tutorials (#3601) Fix working directory when session has no messages (#3513) goose docs MCP server (#3665) Remove confusing status output when testing sharing url connection and it shows 404 (#3659) chore: use typed notifications from rmcp (#3653) feat: convert GetPromptResult from mcp_core to rmcp version (#3650) feat: Replace usage of mcp_core Tools/ToolAnnotations in openapi schema (#3649) fix: ensure execution task result is shown (#3629) docs: Quick spotlight fix (#3633) alexhancock/rmcp-tools-annotations (#3617) fix: clean up subagent (#3565) Adds the `WaitingForUserInput` state (#3620) ...

…cn/compact2-task-tracking * 'dkatz/goose-compact2' of github.com:block/goose: (22 commits) rm stray files unused fmt fix threshold autocompact splice last message fmt Fix conversations before they hit the LLM (#3660) cli: add detailed instruction for WSL users (#3496) feat: recipe runs will now prompt for missing extension secrets (#3668) fix: pricing integration tests -> trying more runs for cache and retries (#3546) Add inline python extension (#3107) fix: add maintainer, homepage and categories to DEB/RPM package config (#3096) blog: agent to agent convo (#3677) Possible to disable random thinking messages (#3304) Two VS code tutorials (#3603) small blog fixes (#3549) docs: fix installation command for YouTube Transcript MCP in servers.json (#3595) Docs for using Docker Model Runner as a local LLM provider. (#3509) Docs: VS Code Extension move to tutorials (#3601) Fix working directory when session has no messages (#3513) ...

* dkatz/goose-compact2: (22 commits) rm stray files unused fmt fix threshold autocompact splice last message fmt Fix conversations before they hit the LLM (#3660) cli: add detailed instruction for WSL users (#3496) feat: recipe runs will now prompt for missing extension secrets (#3668) fix: pricing integration tests -> trying more runs for cache and retries (#3546) Add inline python extension (#3107) fix: add maintainer, homepage and categories to DEB/RPM package config (#3096) blog: agent to agent convo (#3677) Possible to disable random thinking messages (#3304) Two VS code tutorials (#3603) small blog fixes (#3549) docs: fix installation command for YouTube Transcript MCP in servers.json (#3595) Docs for using Docker Model Runner as a local LLM provider. (#3509) Docs: VS Code Extension move to tutorials (#3601) Fix working directory when session has no messages (#3513) ...

* main: blog: streamlining detection development w/ recipes (#3689) fix: have option for cli providers to use their configured or default model (#3683) docs: new blog post and corrections to an old one on goosehints (#3657) Resolve sub recipe path relative to the parent recipe path (#3642) Speed up recipe loading from deeplinks and various fixes (#3662) fix cmd + , not opening settings (#3694) Add warning when JSON env parsing fails. (#3696) chore: refactor session naming into provider (#3678) feat (ui): File picker for scheduling recipes default to recipe dir (#3611) fix: address issue with streamable http interactions via mcp (#3693) Provider scenario tests (#3688) Fix conversations before they hit the LLM (#3660) cli: add detailed instruction for WSL users (#3496) feat: recipe runs will now prompt for missing extension secrets (#3668) fix: pricing integration tests -> trying more runs for cache and retries (#3546)

…ies (block#3546) Signed-off-by: Adam Tarantino <[email protected]>

trying more runs for cache and retries

3a26dc7

michaelneale changed the title ~~fix: trying more runs for cache and retries~~ fix: pricing integrations tests -> trying more runs for cache and retries Jul 21, 2025

michaelneale requested review from lifeizhou-ap and wendytang July 21, 2025 03:10

michaelneale changed the title ~~fix: pricing integrations tests -> trying more runs for cache and retries~~ fix: pricing integration tests -> trying more runs for cache and retries Jul 21, 2025

use temp_dir crate/tempfile to avoid race conditions

156de7e

zanesq approved these changes Jul 22, 2025

View reviewed changes

michaelneale merged commit b5fb459 into main Jul 28, 2025
8 checks passed

michaelneale deleted the micn/fix-pricing-tests branch July 28, 2025 03:13

atarantino pushed a commit to atarantino/goose that referenced this pull request Aug 5, 2025

fix: pricing integration tests -> trying more runs for cache and retr…

5ce6aec

…ies (block#3546) Signed-off-by: Adam Tarantino <[email protected]>

wpfleger96 mentioned this pull request Oct 16, 2025

delete flaky pricing integration tests #5207

Merged

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: pricing integration tests -> trying more runs for cache and retries #3546

fix: pricing integration tests -> trying more runs for cache and retries #3546

Uh oh!

michaelneale commented Jul 21, 2025

Uh oh!

jsibbison-square commented Jul 21, 2025

Uh oh!

michaelneale commented Jul 21, 2025

Uh oh!

lifeizhou-ap commented Jul 21, 2025 •

edited

Loading

Uh oh!

michaelneale commented Jul 21, 2025 •

edited

Loading

Uh oh!

michaelneale commented Jul 21, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

fix: pricing integration tests -> trying more runs for cache and retries #3546

fix: pricing integration tests -> trying more runs for cache and retries #3546

Uh oh!

Conversation

michaelneale commented Jul 21, 2025

Uh oh!

jsibbison-square commented Jul 21, 2025

Uh oh!

michaelneale commented Jul 21, 2025

Uh oh!

lifeizhou-ap commented Jul 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

michaelneale commented Jul 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

michaelneale commented Jul 21, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

lifeizhou-ap commented Jul 21, 2025 •

edited

Loading

michaelneale commented Jul 21, 2025 •

edited

Loading