Asynchronous test runs are sometimes not completed correctly #1147
Comments
@jmaczan I never encountered this issue; can you give us something to reproduce it? For example, the number of test cases, the metric you're using, etc.
@penguine-ip, with run_async=False it finished without any issue.
@penguine-ip I have about 20 test cases, and I use literally the same set of metrics as @rjiangnju.
I have the same issue; my metrics are GEval and Answer Relevancy. I run:
To add to this, I've been running the code in a Jupyter notebook on Windows 10, but when I transferred the code to a simple Python file and executed it, everything worked fine. However, since this bug occurs randomly, I may just have gotten lucky.
I have a similar experience: it fails for me in Azure Pipelines but seems to work fine locally.
@jmaczan @threeteck @rjiangnju Hey all, if it works fine locally but not in some environments, it might be because we write results to a file to cache them. When you run the evaluation, results are cached to a file, and in other environments the OS might handle file locking differently. Basically, when we run things async, we need to make sure the cache we're writing is always in its most updated state, so sometimes it causes different coroutines to deadlock each other while reading from and writing to the file.
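For anyone who wants to try the cache workaround, here is a minimal sketch of what it might look like. Only run_async and write_cache come from this thread; the metrics, thresholds, and test case contents are placeholders, and the exact evaluate() signature can differ between deepeval versions, so double-check it against the docs for your release.

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric, GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# Placeholder test case - swap in your own inputs and outputs.
test_cases = [
    LLMTestCase(
        input="What is the capital of France?",
        actual_output="Paris is the capital of France.",
        expected_output="Paris",
    )
]

# Metrics mentioned in this thread; thresholds and criteria are illustrative.
metrics = [
    AnswerRelevancyMetric(threshold=0.7),
    GEval(
        name="Correctness",
        criteria="Determine whether the actual output is factually correct.",
        evaluation_params=[
            LLMTestCaseParams.ACTUAL_OUTPUT,
            LLMTestCaseParams.EXPECTED_OUTPUT,
        ],
    ),
]

# write_cache=False skips writing the results cache to disk, which is the
# suspected source of the deadlock when run_async=True.
evaluate(
    test_cases=test_cases,
    metrics=metrics,
    run_async=True,
    write_cache=False,
)
```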
Hi @penguine-ip, just tried with write_cache=False and it seems to work on my side. The OS is Windows 11, but the code is running in a Docker container with Debian 12 via WSL; hope this can help solve the issue.
Unfortunately it doesn't work for me; I keep having the same issue. I'm also running the pipeline on Ubuntu and I'm using deepeval 1.5.0.
@rjiangnju could you share all the other parameters you use in evaluate(), along with write_cache=False?
@jmaczan, here is the command I use: It may also be related to the metrics; I changed them a little, removed Hallucination, and added two GEval metrics.
@jmaczan @rjiangnju Yup, you can find it here: https://docs.confident-ai.com/docs/evaluation-introduction#evaluating-without-pytest
Describe the bug
When running evaluate() with run_async=True, tests are sometimes not completed (so any job/task/pipeline that relies on the exit code will fail). Results are neither printed nor emitted; the evaluation is essentially stuck after running the last test case. It doesn't happen every time and shows no regularity, so it looks like a race condition. There is likely an issue with the async code in the evaluate.py file - a_execute_test_cases(), get_or_create_event_loop(), or loop.run_until_complete(). It might be that await asyncio.sleep(throttle_value) leads to a semaphore getting stuck or something similar. I haven't debugged it beyond a brief static code analysis, though.
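To make the suspected failure mode concrete, here is a generic asyncio sketch (not deepeval's actual code) of a throttled semaphore pattern: if a permit is acquired but never released, for example because an exception path skips the release, every remaining coroutine blocks on acquire() and the run never finishes, which matches the 'stuck after the last test case' symptom.

```python
import asyncio

# Generic illustration of throttling concurrent work with a semaphore.
# This is NOT deepeval's code; it only shows how a missed release hangs a run.
semaphore = asyncio.Semaphore(2)
throttle_value = 0.1  # seconds to wait inside each task, mimicking a throttle

async def run_one(i: int) -> str:
    await semaphore.acquire()
    try:
        await asyncio.sleep(throttle_value)
        if i == 1:
            raise RuntimeError("simulated metric failure")
        return f"case {i} done"
    finally:
        # Remove this release (or the finally block) and the failing task keeps
        # its permit, so the remaining tasks wait on acquire() forever.
        semaphore.release()

async def main() -> None:
    results = await asyncio.gather(
        *(run_one(i) for i in range(5)), return_exceptions=True
    )
    print(results)

asyncio.run(main())
```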
To Reproduce
Steps to reproduce the behavior:
Run evaluate() with run_async=True.
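A rough, self-contained reproduction script might look like the sketch below (an assumed shape, not the reporter's exact code): roughly 20 placeholder test cases and one of the metrics mentioned in the thread, run with run_async=True. The hang shows up as the call never returning or printing results.

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# Roughly 20 test cases, as mentioned in the thread; contents are placeholders.
test_cases = [
    LLMTestCase(
        input=f"Question {i}",
        actual_output=f"Answer {i}",
    )
    for i in range(20)
]

# With run_async=True the run occasionally hangs after the last test case
# instead of printing results or raising an error.
evaluate(
    test_cases=test_cases,
    metrics=[AnswerRelevancyMetric(threshold=0.7)],
    run_async=True,
)
```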
Expected behavior
Tests always end by printing either results or errors; they should never run indefinitely.
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):