From 987f0f35c3b6411a4995c35f7580fbb21c937dfd Mon Sep 17 00:00:00 2001 From: shrjain1312 <36454110+shrjain1312@users.noreply.github.com> Date: Mon, 11 Mar 2024 17:56:46 +0530 Subject: [PATCH] Readme fixes (#593) * fixed readme for evals tutorials * fix uptrain website links * Update README.md * Update README.md * Update README.md * minor fixes --------- Co-authored-by: Dhruv Chawla <43818888+Dominastorm@users.noreply.github.com> --- README.md | 41 ++++--- docs/dashboard/evaluations.mdx | 2 + docs/dashboard/getting_started.mdx | 3 + docs/dashboard/project.mdx | 2 + docs/dashboard/prompts.mdx | 2 + .../response-matching.mdx | 6 +- examples/checks/README.md | 105 ++++++++--------- examples/checks/code_eval/README.md | 25 ++++ .../checks/compare_ground_truth/README.md | 26 +++++ .../compare_ground_truth/matching.ipynb | 4 +- examples/checks/context_awareness/README.md | 36 ++---- examples/checks/conversation/README.md | 35 ++---- examples/checks/custom/README.md | 108 ++---------------- examples/checks/language_features/README.md | 38 ++---- examples/checks/response_quality/README.md | 42 +++---- examples/checks/safeguarding/README.md | 37 ++---- examples/checks/sub_query/README.md | 35 ++---- 17 files changed, 212 insertions(+), 335 deletions(-) create mode 100644 examples/checks/code_eval/README.md create mode 100644 examples/checks/compare_ground_truth/README.md diff --git a/README.md b/README.md index adbb89e5e..dda975984 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@

- + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

@@ -21,7 +21,7 @@ Demo of UpTrain's LLM evaluations with scores for hallucinations, retrieved-context quality, response tonality for a customer support chatbot -**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
@@ -55,14 +55,17 @@ UpTrain provides tons of ways to **customize evaluations**. You can customize ev Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact match, etc. +Interactive Dashboards + +UpTrain Dashboard is a web-based interface that runs on your **local machine**. You can use the dashboard to evaluate your LLM applications, view the results, and perform root cause analysis. + ### Coming Soon: -1. Experiment Dashboards -2. Collaborate with your team -3. Embedding visualization via UMAP and Clustering -4. Pattern recognition among failure cases -5. Prompt improvement suggestions +1. Collaborate with your team +2. Embedding visualization via UMAP and Clustering +3. Pattern recognition among failure cases +4. Prompt improvement suggestions
@@ -71,18 +74,19 @@ Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact | Eval | Description | | ---- | ----------- | -|[Reponse Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness) | Grades whether the response has answered all the aspects of the question specified. | -|[Reponse Conciseness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness) | Grades how concise the generated response is or if it has any additional irrelevant information for the question asked. | -|[Reponse Relevance](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance)| Grades how relevant the generated context was to the question specified.| -|[Reponse Validity](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-validity)| Grades if the response generated is valid or not. A response is considered to be valid if it contains any information.| -|[Reponse Consistency](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-consistency)| Grades how consistent the response is with the question asked as well as with the context provided.| +|[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness) | Grades whether the response has answered all the aspects of the question specified. | +|[Response Conciseness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness) | Grades how concise the generated response is or if it has any additional irrelevant information for the question asked. | +|[Response Relevance](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance)| Grades how relevant the generated context was to the question specified.| +|[Response Validity](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-validity)| Grades if the response generated is valid or not. 
A response is considered to be valid if it contains any information.| +|[Response Consistency](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-consistency)| Grades how consistent the response is with the question asked as well as with the context provided.| quality of retrieved context and response groundedness + | Eval | Description | | ---- | ----------- | |[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance) | Grades how relevant the context was to the question specified. | -|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified given the information provided in the context. | +|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified, given the information provided in the context. | |[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)| Grades whether the response generated is factually correct and grounded by the provided context.| |[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)| Evaluates the concise context cited from an original context for irrelevant information. |[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)| Evaluates how efficient the reranked context is compared to the original context.| @@ -91,7 +95,7 @@ Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact | Eval | Description | | ---- | ----------- | -|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence) | Grades whether the response has answered all the aspects of the question specified. 
| +|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence) | Grades the quality and effectiveness of language in a response, focusing on factors such as clarity, coherence, conciseness, and overall communication. | |[Tonality](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the generated response matches the required persona's tone | language quality of the response @@ -123,9 +127,15 @@ Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact | Eval | Description | | ---- | ----------- | -|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection) | Grades whether the generated response is leaking any system prompt. | +|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection) | Grades whether the user's prompt is an attempt to make the LLM reveal its system prompts. | |[Jailbreak Detection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/jailbreak) | Grades whether the user's prompt is an attempt to jailbreak (i.e. generate illegal or harmful responses). | +evaluate the clarity of user queries + +| Eval | Description | +| ---- | ----------- | +|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, cover all aspects of the user's query or not | +
# Get started 🙌 @@ -147,6 +157,7 @@ cd uptrain # Run UpTrain bash run_uptrain.sh ``` +> **_NOTE:_** UpTrain Dashboard is currently in **Beta version**. We would love your feedback to improve it. ## Using the UpTrain package diff --git a/docs/dashboard/evaluations.mdx b/docs/dashboard/evaluations.mdx index 8dddf580b..733902e9f 100644 --- a/docs/dashboard/evaluations.mdx +++ b/docs/dashboard/evaluations.mdx @@ -55,6 +55,8 @@ You can look at the complete list of UpTrain's supported metrics [here](/predefi +UpTrain Dashboard is currently in Beta version. We would love your feedback to improve it. + Before you start, ensure you have docker installed on your machine. If not, you can install it from [here](https://docs.docker.com/get-docker/). + ### How to install? The following commands will download the UpTrain dashboard and start it on your local machine: @@ -24,6 +25,8 @@ cd uptrain bash run_uptrain.sh ``` +UpTrain Dashboard is currently in Beta version. We would love your feedback to improve it. + UpTrain Dashboard is currently in Beta version. We would love your feedback to improve it. + +UpTrain Dashboard is currently in Beta version. We would love your feedback to improve it. + - Github banner 006 (1) + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications +

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- -# Pre-built Evaluations We Offer 📝 +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. -#### Evaluate the quality of your responses: +
-| Metrics | Usage | -|------------|----------| -| [Response Completeness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/completeness.ipynb) | Evaluate if the response completely resolves the given user query. | -| [Response Relevance](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/relevance.ipynb) | Evaluate whether the generated response for the given question, is relevant or not. | -| [Response Conciseness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/conciseness.ipynb) | Evaluate how concise the generated response is i.e. the extent of additional irrelevant information in the response. | -| [Response Matching ](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/matching.ipynb) | Compare the LLM-generated text with the gold (ideal) response using the defined score metric. | -| [Response Consistency](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/consistency.ipynb) | Evaluate how consistent the response is with the question asked as well as with the context provided. | +# Pre-built Evaluations We Offer 📝 +quality of your responses +| Eval | Description | +| ---- | ----------- | +|[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness) | Grades whether the response has answered all the aspects of the question specified. | +|[Response Conciseness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness) | Grades how concise the generated response is or if it has any additional irrelevant information for the question asked. 
| +|[Response Relevance](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance)| Grades how relevant the generated context was to the question specified.| +|[Response Validity](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-validity)| Grades if the response generated is valid or not. A response is considered to be valid if it contains any information.| +|[Response Consistency](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-consistency)| Grades how consistent the response is with the question asked as well as with the context provided.| -#### Evaluate the quality of retrieved context and response groundedness: +quality of retrieved context and response groundedness -| Metrics | Usage | -|------------|----------| +| Eval | Description | +| ---- | ----------- | |[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance) | Grades how relevant the context was to the question specified. | -|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified given the information provided in the context. | +|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified, given the information provided in the context. | |[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)| Grades whether the response generated is factually correct and grounded by the provided context.| |[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)| Evaluates the concise context cited from an original context for irrelevant information. 
|[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)| Evaluates how efficient the reranked context is compared to the original context.| +language quality of the response + +| Eval | Description | +| ---- | ----------- | +|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence) | Grades the quality and effectiveness of language in a response, focusing on factors such as clarity, coherence, conciseness, and overall communication. | +|[Tonality](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the generated response matches the required persona's tone | + +language quality of the response + +| Eval | Description | +| ---- | ----------- | +|[Code Hallucination](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the code present in the generated response is grounded by the context. | -#### Evaluations to safeguard system prompts and avoid LLM mis-use: +conversation as a whole -| Metrics | Usage | -|------------|----------| -| [Prompt Injection](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/safeguarding/system_prompt_injection.ipynb) | Identify prompt leakage attacks| -| [Jailbreak Detection](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/safeguarding/jailbreak_detection.ipynb) | Detect prompts with potentially harmful or illegal behaviour| +| Eval | Description | +| ---- | ----------- | +|[User Satisfaction](https://docs.uptrain.ai/predefined-evaluations/conversation-evals/user-satisfaction) | Grades how well the user's concerns are addressed and assesses their satisfaction based on provided conversation. 
| -#### Evaluate the language quality of the response: +custom evaluations and others -| Metrics | Usage | -|------------|----------| -| [Tone Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/tone_critique.ipynb) | Assess if the tone of machine-generated responses matches with the desired persona.| -| [Language Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/language_critique.ipynb) | Evaluate LLM generated responses on multiple aspects - fluence, politeness, grammar, and coherence.| -| [Rouge Score](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/rouge_score.ipynb) | Measure the similarity between two pieces of text. | + Eval | Description | +| ---- | ----------- | +|[Custom Guideline](https://docs.uptrain.ai/predefined-evaluations/custom-evals/custom-guideline) | Allows you to specify a guideline and grades how well the LLM adheres to the provided guideline when giving a response. | +|[Custom Prompts](https://docs.uptrain.ai/predefined-evaluations/custom-evals/custom-prompt-eval) | Allows you to create your own set of evaluations. | -#### Evaluate the conversation as a whole: +compare responses with ground truth -| Metrics | Usage | -|------------|----------| -| [Conversation Satisfaction](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/conversation/conversation_satisfaction.ipynb) | Measures the user’s satisfaction with the conversation with the AI assistant based on completeness and user acceptance.| +| Eval | Description | +| ---- | ----------- | +|[Response Matching](https://docs.uptrain.ai/predefined-evaluations/ground-truth-comparison/response-matching) | Compares and grades how well the response generated by the LLM aligns with the provided ground truth. 
| -#### Defining custom evaluations and others: +safeguard system prompts and avoid LLM mis-use -| Metrics | Usage | -|------------|----------| -| [Guideline Adherence](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/custom/guideline_adherence.ipynb) | Grade how well the LLM adheres to a given custom guideline.| -| [Custom Prompt Evaluation](https://github.com/uptrain-ai/uptrain/blob/main) | Evaluate by defining your custom grading prompt.| -| [Cosine Similarity](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/custom/cosine_similarity.ipynb) | Calculate cosine similarity between embeddings of two texts.| +| Eval | Description | +| ---- | ----------- | +|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection) | Grades whether the user's prompt is an attempt to make the LLM reveal its system prompts. | +|[Jailbreak Detection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/jailbreak) | Grades whether the user's prompt is an attempt to jailbreak (i.e. generate illegal or harmful responses). | -If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min). 
+evaluate the clarity of user queries -You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file +| Eval | Description | +| ---- | ----------- | +|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, cover all aspects of the user's query or not | diff --git a/examples/checks/code_eval/README.md b/examples/checks/code_eval/README.md new file mode 100644 index 000000000..c7b379ec5 --- /dev/null +++ b/examples/checks/code_eval/README.md @@ -0,0 +1,25 @@ +

+ + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications + +

+ +

+Try out Evaluations +- +Read Docs +- +Quickstart Tutorials +- +Slack Community +- +Feature Request +

+ +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. + +language quality of the response + +| Eval | Description | +| ---- | ----------- | +|[Code Hallucination](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the code present in the generated response is grounded by the context. | diff --git a/examples/checks/compare_ground_truth/README.md b/examples/checks/compare_ground_truth/README.md new file mode 100644 index 000000000..3fa7b7eb0 --- /dev/null +++ b/examples/checks/compare_ground_truth/README.md @@ -0,0 +1,26 @@ +

+ + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications + +

+ + +

+Try out Evaluations +- +Read Docs +- +Quickstart Tutorials +- +Slack Community +- +Feature Request +

+ +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. + +compare responses with ground truth + +| Eval | Description | +| ---- | ----------- | +|[Response Matching](https://docs.uptrain.ai/predefined-evaluations/ground-truth-comparison/response-matching) | Compares and grades how well the response generated by the LLM aligns with the provided ground truth. | \ No newline at end of file diff --git a/examples/checks/compare_ground_truth/matching.ipynb b/examples/checks/compare_ground_truth/matching.ipynb index f1e0342cb..07ecd6b31 100644 --- a/examples/checks/compare_ground_truth/matching.ipynb +++ b/examples/checks/compare_ground_truth/matching.ipynb @@ -33,9 +33,7 @@ "id": "2ef54d59-295e-4f15-a35f-33f4e86ecdd2", "metadata": {}, "source": [ - "**What is Response Matching?**: Response Completeness is a metric that determines how well the response generated by an LLM matches the ground truth. It comes in handy while checking for the overlap between an LLM generated response and ground truth.\n", - "\n", - "For example, if a user asks a question about the formula of chlorophyll, the ideal response could be: \"The formula of chlorophyll is C55H72MgN4O5\", rather if the response contains some other information about chlorophyll and not the formula: \"Chlorophyll is the pigmet used in photosynthesis, it helpes in generating oxygen.\" it might not really be ideal as it does not match with the ground truth resulting in a low matching score.\n", + "**What is Response Matching?**: Response Matching compares the LLM-generated text with the gold (ideal) response using the defined score metric. 
\n", "\n", "**Data schema**: The data schema required for this evaluation is as follows:\n", "\n", diff --git a/examples/checks/context_awareness/README.md b/examples/checks/context_awareness/README.md index 4e97cd06b..4199cec74 100644 --- a/examples/checks/context_awareness/README.md +++ b/examples/checks/context_awareness/README.md @@ -1,47 +1,31 @@

- Github banner 006 (1) + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

+

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- -# Context Awareness Evaluations 📝 +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. -#### Evaluate the quality of retrieved context and response groundedness: +quality of retrieved context and response groundedness -| Metrics | Usage | -|------------|----------| +| Eval | Description | +| ---- | ----------- | |[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance) | Grades how relevant the context was to the question specified. | -|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified given the information provided in the context. | +|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified, given the information provided in the context. | |[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)| Grades whether the response generated is factually correct and grounded by the provided context.| |[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)| Evaluates the concise context cited from an original context for irrelevant information. 
-|[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)| Evaluates how efficient the reranked context is compared to the original context.| - -If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min). - -You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file +|[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)| Evaluates how efficient the reranked context is compared to the original context.| \ No newline at end of file diff --git a/examples/checks/conversation/README.md b/examples/checks/conversation/README.md index 6fac2f4e1..8403ef1fd 100644 --- a/examples/checks/conversation/README.md +++ b/examples/checks/conversation/README.md @@ -1,43 +1,26 @@

- Github banner 006 (1) + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

+

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- - -# Conversation Satisfaction Evaluation 📝 - -#### Evaluate the conversation as a whole: - -| Metrics | Usage | -|------------|----------| -| [Conversation Satisfaction](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/conversation/conversation_satisfaction.ipynb) | Measures the user’s satisfaction with the conversation with the AI assistant based on completeness and user acceptance.| +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. -If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min). +conversation as a whole -You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file +| Eval | Description | +| ---- | ----------- | +|[User Satisfaction](https://docs.uptrain.ai/predefined-evaluations/conversation-evals/user-satisfaction) | Grades how well the user's concerns are addressed and assesses their satisfaction based on provided conversation. | \ No newline at end of file diff --git a/examples/checks/custom/README.md b/examples/checks/custom/README.md index 9d5c27698..c777a28fb 100644 --- a/examples/checks/custom/README.md +++ b/examples/checks/custom/README.md @@ -1,117 +1,27 @@

- Github banner 006 (1) + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

-

-Try out Evaluations -- -Read Docs -- -Slack Community -- -Feature Request -

- -

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- - -# Other Custom Evaluations 📝 - -#### Evaluate the quality of your responses: - -| Metrics | Usage | -|------------|----------| -| [Response Completeness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/completeness.ipynb) | Evaluate if the response completely resolves the given user query. | -| [Response Relevance](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/relevance.ipynb) | Evaluate whether the generated response for the given question, is relevant or not. | -| [Response Conciseness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/conciseness.ipynb) | Evaluate how concise the generated response is i.e. the extent of additional irrelevant information in the response. | -| [Response Matching ](https://github.com/uptrain-ai/uptrain/blob/main) | Compare the LLM-generated text with the gold (ideal) response using the defined score metric. | -| [Response Consistency](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/consistency.ipynb) | Evaluate how consistent the response is with the question asked as well as with the context provided. | - - -#### Evaluate based on langauge features: - -| Metrics | Usage | -|------------|----------| -| [Factual Accuracy](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/context_awareness/factual_accuracy.ipynb) | Evaluate if the facts present in the response can be verified by the retrieved context. 
| -| [Response Completeness wrt Context](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/context_awareness/response_completeness_wrt_context.ipynb) | Grades how complete the response was for the question specified concerning the information present in the context.| -| [Context Relevance](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/context_awareness/relevance.ipynb) | Evaluate if the retrieved context contains sufficient information to answer the given question. | - - -#### Evaluations to safeguard system prompts and avoid LLM mis-use: - -| Metrics | Usage | -|------------|----------| -| [Prompt Injection](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/safeguarding/system_prompt_injection.ipynb) | Identify prompt leakage attacks| - - -#### Evaluate the language quality of the response: - -| Metrics | Usage | -|------------|----------| -| [Tone Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/tone_critique.ipynb) | Assess if the tone of machine-generated responses matches with the desired persona.| -| [Language Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/language_critique.ipynb) | Evaluate LLM generated responses on multiple aspects - fluence, politeness, grammar, and coherence.| - -#### Evaluate the conversation as a whole: - -| Metrics | Usage | -|------------|----------| -| [Conversation Satisfaction](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/conversation/conversation_satisfaction.ipynb) | Measures the user’s satisfaction with the conversation with the AI assistant based on completeness and user acceptance.| - - Github banner 006 (1) - -

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- - -# Pre-built Evaluations We Offer 📝 - -#### Defining custom evaluations and others: - -| Metrics | Usage | -|------------|----------| -| [Guideline Adherence](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/custom/guideline_adherence.ipynb) | Grade how well the LLM adheres to a given custom guideline.| -| [Custom Prompt Evaluation](https://github.com/uptrain-ai/uptrain/blob/main) | Evaluate by defining your custom grading prompt.| -| [Cosine Similarity](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/custom/cosine_similarity.ipynb) | Calculate cosine similarity between embeddings of two texts.| +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. -If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min). +custom evaluations and others -You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file + Eval | Description | +| ---- | ----------- | +|[Custom Guideline](https://docs.uptrain.ai/predefined-evaluations/custom-evals/custom-guideline) | Allows you to specify a guideline and grades how well the LLM adheres to the provided guideline when giving a response. | +|[Custom Prompts](https://docs.uptrain.ai/predefined-evaluations/custom-evals/custom-prompt-eval) | Allows you to create your own set of evaluations. 
| \ No newline at end of file diff --git a/examples/checks/language_features/README.md b/examples/checks/language_features/README.md index b046842f9..ef3236477 100644 --- a/examples/checks/language_features/README.md +++ b/examples/checks/language_features/README.md @@ -1,45 +1,27 @@

- Github banner 006 (1) + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

+

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- - -# Evaluations based on language quality 📝 - -#### Evaluate the language quality of the response: - -| Metrics | Usage | -|------------|----------| -| [Tone Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/tone_critique.ipynb) | Assess if the tone of machine-generated responses matches with the desired persona.| -| [Language Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/language_critique.ipynb) | Evaluate LLM generated responses on multiple aspects - fluence, politeness, grammar, and coherence.| -| [Rouge Score](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/rouge_score.ipynb) | Measure the similarity between two pieces of text. | +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. -If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min). +language quality of the response -You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file +| Eval | Description | +| ---- | ----------- | +|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence) | Grades the quality and effectiveness of language in a response, focusing on factors such as clarity, coherence, conciseness, and overall communication.
| +|[Tonality](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the generated response matches the required persona's tone | \ No newline at end of file diff --git a/examples/checks/response_quality/README.md b/examples/checks/response_quality/README.md index 8a34bceaa..314e1c50c 100644 --- a/examples/checks/response_quality/README.md +++ b/examples/checks/response_quality/README.md @@ -1,47 +1,31 @@

- Github banner 006 (1) + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

+

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- - -# Evaluations based on response quality 📝 - -#### Evaluate the quality of your responses: +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. -| Metrics | Usage | -|------------|----------| -| [Response Completeness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/completeness.ipynb) | Evaluate if the response completely resolves the given user query. | -| [Response Relevance](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/relevance.ipynb) | Evaluate whether the generated response for the given question, is relevant or not. | -| [Response Conciseness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/conciseness.ipynb) | Evaluate how concise the generated response is i.e. the extent of additional irrelevant information in the response. | -| [Response Matching ](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/matching.ipynb) | Compare the LLM-generated text with the gold (ideal) response using the defined score metric. | -| [Response Consistency](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/consistency.ipynb) | Evaluate how consistent the response is with the question asked as well as with the context provided. | -If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).
+quality of your responses -You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file +| Eval | Description | +| ---- | ----------- | +|[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness) | Grades whether the response has answered all the aspects of the question specified. | +|[Response Conciseness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness) | Grades how concise the generated response is or if it has any additional irrelevant information for the question asked. | +|[Response Relevance](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance)| Grades how relevant the generated context was to the question specified.| +|[Response Validity](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-validity)| Grades if the response generated is valid or not. A response is considered to be valid if it contains any information.| +|[Response Consistency](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-consistency)| Grades how consistent the response is with the question asked as well as with the context provided.| \ No newline at end of file diff --git a/examples/checks/safeguarding/README.md b/examples/checks/safeguarding/README.md index ac177c1bb..a3a353ef9 100644 --- a/examples/checks/safeguarding/README.md +++ b/examples/checks/safeguarding/README.md @@ -1,44 +1,27 @@

- Github banner 006 (1) + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

+

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- - -# Evaluations to ensure better safety 📝 - -#### Evaluations to safeguard system prompts and avoid LLM mis-use: - -| Metrics | Usage | -|------------|----------| -| [Prompt Injection](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/safeguarding/system_prompt_injection.ipynb) | Identify prompt leakage attacks| -| [Jailbreak Detection](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/safeguarding/jailbreak_detection.ipynb) | Detect prompts with potentially harmful or illegal behaviour| +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. -If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min). +safeguard system prompts and avoid LLM mis-use -You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file +| Eval | Description | +| ---- | ----------- | +|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection) | Grades whether the user's prompt is an attempt to make the LLM reveal its system prompts. | +|[Jailbreak Detection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/jailbreak) | Grades whether the user's prompt is an attempt to jailbreak (i.e. generate illegal or harmful responses). | \ No newline at end of file diff --git a/examples/checks/sub_query/README.md b/examples/checks/sub_query/README.md index 08179307b..73abf6173 100644 --- a/examples/checks/sub_query/README.md +++ b/examples/checks/sub_query/README.md @@ -1,44 +1,27 @@

- Github banner 006 (1) + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

+

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- - -# Pre-built Evaluations We Offer 📝 - -#### Evaluate the quality of sub queries: - -| Metrics | Usage | -|------------|----------| -| [Sub-query Completeness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/sub_query/sub_query_completeness.ipynb) | Evaluate if the list of generated sub-questions comprehensively cover all aspects of the main question.| +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. -If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min). +evaluate the clarity of user queries -You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file +| Eval | Description | +| ---- | ----------- | +|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, cover all aspects of the user's query or not |