
Releases: confident-ai/deepeval

Lots of new features

14 Dec 10:50

Lots of new features in this release:

  1. JudgementalGPT now supports different languages - useful for our APAC and European friends
  2. RAGAS metrics now support all OpenAI models - useful for those running into context length issues
  3. LLMEvalMetric now returns the reasoning behind its score
  4. deepeval test run now supports hooks that are called on test run completion
  5. evaluate now displays retrieval_context for RAG evaluation
  6. The RAGAS metric now displays a breakdown of all its distinct metrics

Continuous Evaluation

22 Nov 12:45
Pre-release

Automatically integrates with Confident AI for continuous evaluation throughout the lifetime of your LLM (app):

- log evaluation results and analyze metric passes / fails
- compare and pick the optimal hyperparameters (e.g. prompt templates, chunk size, models used) based on evaluation results
- debug evaluation results via LLM traces
- manage evaluation test cases / datasets in one place
- track events to identify live LLM responses in production
- add production events to existing evaluation datasets to strengthen evals over time

Continuous Evaluation

04 Dec 10:42

Automatically integrates with Confident AI for continuous evaluation throughout the lifetime of your LLM (app):

- log evaluation results and analyze metric passes / fails
- compare and pick the optimal hyperparameters (e.g. prompt templates, chunk size, models used) based on evaluation results
- debug evaluation results via LLM traces
- manage evaluation test cases / datasets in one place
- track events to identify live LLM responses in production
- add production events to existing evaluation datasets to strengthen evals over time

Evaluate entire datasets

16 Nov 07:20

Mid-week bug-fix release with an extra feature: the ability to evaluate entire datasets.

Judgemental GPT

14 Nov 05:12

In this release, deepeval has added support for:

  • JudgementalGPT, a dedicated LLM app developed by Confident AI to perform evaluations more robustly and accurately. JudgementalGPT provides a score and a reason for the score.
  • Parallel testing: execute test cases in parallel to speed up evaluation by up to 100x.
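The parallel-testing speedup can be pictured with Python's standard concurrent.futures. This is a generic sketch of concurrent test-case execution, not deepeval's internals; `evaluate_case` is a hypothetical stand-in for a real metric evaluation:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_case(test_case):
    # Stand-in for a real evaluation (e.g. an LLM call). Such calls are
    # I/O-bound, so threads let the network waits overlap.
    return {"case": test_case, "score": 1.0}

test_cases = [f"case-{i}" for i in range(8)]

# Run evaluations concurrently instead of one after another;
# map() preserves the input order of test_cases in its results.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(evaluate_case, test_cases))
```

Because LLM evaluations spend most of their time waiting on API responses, even a thread pool (rather than separate processes) yields near-linear speedups.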

v0.20.17

13 Nov 10:08
new release

v0.20.16

07 Nov 03:32
new release

v0.20.15

06 Nov 19:59
new release

v0.20.14

05 Nov 06:30
prepare for release

v0.20.13

05 Nov 06:27
release v0.20.13