fix: pricing integration tests -> trying more runs for cache and retries #3546
Personally I think some of these tests don't add a lot of value and we should delete all except for 'model_in_open_router', 'model_not_in_open_router' and maybe the concurrent one.
@jsibbison-square agree - yeah, I don't think they add a lot. Could we drop some of them?
Hi @michaelneale and @jsibbison-square, I had a look into the reason for the build failure here. I think the following is why the test is not stable.
In our tests, we use
With the above code, every test creates the same directory as its cache dir, because the tests share the same process ID and the same environment variable. Each test case sets the cache_dir first and removes the dir afterwards. However, since they all use the same dir name, the test cases end up working on shared files, which causes failures when they run in parallel.
I guess the options to fix the instability and ensure test isolation are:
I also went through the test cases (as integration tests)
I feel that using external services in the test cases also brings instability. Maybe some of them can be converted to unit tests, and we can keep a single integration test.
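To illustrate the isolation problem described above, here is a minimal sketch in Python. It is not the project's test code: the function names and the `pricing_cache_test_` prefix are hypothetical. It only shows why a directory name derived from the process ID is shared by every test in that process, while a temp-dir helper returns a fresh directory on each call.

```python
import os
import tempfile
from pathlib import Path


def shared_cache_dir() -> Path:
    # Hypothetical stand-in for the flaky pattern: the name depends only on
    # the process ID, so every test in the same process gets the same path.
    return Path(tempfile.gettempdir()) / f"pricing_cache_test_{os.getpid()}"


def isolated_cache_dir() -> Path:
    # mkdtemp creates a new, uniquely named directory on every call.
    return Path(tempfile.mkdtemp(prefix="pricing_cache_test_"))


# Two "tests" in one process resolve to the same shared path ...
assert shared_cache_dir() == shared_cache_dir()

# ... while the isolated variant never hands out the same directory twice.
first, second = isolated_cache_dir(), isolated_cache_dir()
assert first != second and first.is_dir() and second.is_dir()

# Clean up the directories created for this demonstration.
first.rmdir()
second.rmdir()
```

With per-test directories, one test removing its cache dir can no longer delete files that a parallel test is still reading.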
Even some retries on external calls may be OK. Testing the cache as a one-shot is not right; it needs to run for some iterations (though comparison at the microsecond level does seem a bit meaningless for that test).
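As a rough sketch of the "run some iterations" idea, the snippet below compares the median of several timed runs instead of a single one-shot measurement. Everything in it is an assumption for illustration: `slow_lookup` fakes an external call with a short sleep, `cached_lookup` is a toy dictionary cache, and the run counts are arbitrary.

```python
import statistics
import time


def median_seconds(fn, runs: int = 25) -> float:
    # Time fn several times and return the median, which is less sensitive
    # to a single slow or fast outlier than a one-shot measurement.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


_cache = {}


def slow_lookup(key):
    # Hypothetical stand-in for an external pricing request.
    time.sleep(0.002)
    return {"model": key}


def cached_lookup(key):
    # Toy cache: only the first call for a key pays the slow path.
    if key not in _cache:
        _cache[key] = slow_lookup(key)
    return _cache[key]


cached_lookup("example-model")  # warm the cache before measuring
uncached = median_seconds(lambda: slow_lookup("example-model"), runs=10)
cached = median_seconds(lambda: cached_lookup("example-model"), runs=10)
assert cached < uncached
```

The gap here is milliseconds against a dictionary lookup, so the assertion is stable; when the two paths differ by only microseconds, aggregating helps but the comparison can still be noisy on a shared CI runner.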
Thanks @lifeizhou-ap, switched to using a proper tempfile, which should help with some of it. It still has the retries in there and gives the cache a good number of runs to ensure it gets faster, to stick to the intent of the test (the others are still in there).
These tests seem to fail a bit, due to the availability of an API and also due to testing cache timing at the microsecond level. This change runs many iterations of the latter to check that caching is faster (it is a bit of an odd test to run in CI in the first place).
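For the API availability side, a retry wrapper along the following lines is one common approach. This is a sketch in Python under stated assumptions, not the code in this PR: `retry`, `flaky_fetch`, the attempt count and the delays are all hypothetical.

```python
import time


def retry(fn, attempts: int = 3, base_delay: float = 0.01):
    # Call fn up to `attempts` times, sleeping with exponential backoff
    # between failures, and re-raise the last error if all attempts fail.
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError as err:
            last_error = err
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)
    raise last_error


calls = {"count": 0}


def flaky_fetch():
    # Hypothetical external call that fails twice before succeeding.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("service unavailable")
    return {"status": "ok"}


result = retry(flaky_fetch)
assert result == {"status": "ok"}
assert calls["count"] == 3
```

Retrying only on connection-style errors keeps real assertion failures visible instead of masking them behind repeated attempts.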