18 changes: 10 additions & 8 deletions docs/docs/examples/cookbooks/cleanlab_tlm_rag.ipynb
@@ -8,9 +8,11 @@
"# Trustworthy RAG with the Trustworthy Language Model\n",
"\n",
"This tutorial demonstrates how to use Cleanlab's [Trustworthy Language Model](https://cleanlab.ai/blog/trustworthy-language-model/) (TLM) in any RAG system, to score the trustworthiness of answers and improve overall reliability of the RAG system.\n",
"We recommend first completing the [TLM example tutorial](https://docs.llamaindex.ai/en/stable/examples/llm/cleanlab/).\n",
"We recommend first completing the [TLM example tutorial](https://docs.llamaindex.ai/en/stable/examples/llm/cleanlab/). <br />\n",
"If you're interested in using Cleanlab as a real-time Evaluator (which can also work as a Guardrail), check out [this tutorial](https://docs.llamaindex.ai/en/stable/examples/evaluation/Cleanlab/).\n",
"\n",
"**Retrieval-Augmented Generation (RAG)** has become popular for building LLM-based Question-Answer systems in domains where LLMs alone suffer from: hallucination, knowledge gaps, and factual inaccuracies. However, RAG systems often still produce unreliable responses, because they depend on LLMs that are fundamentally unreliable. Cleanlab's Trustworthy Language Model (TLM) offers a solution by providing trustworthiness scores to assess and improve response quality, **independent of your RAG architecture or retrieval and indexing processes**. \n",
"\n",
"**Retrieval-Augmented Generation (RAG)** has become popular for building LLM-based Question-Answer systems in domains where LLMs alone suffer from: hallucination, knowledge gaps, and factual inaccuracies. However, RAG systems often still produce unreliable responses, because they depend on LLMs that are fundamentally unreliable. Cleanlab’s Trustworthy Language Model scores the trustworthiness of every LLM response in real-time, using state-of-the-art uncertainty estimates for LLMs, **independent of your RAG architecture or retrieval and indexing processes**. \n",
"\n",
"To diagnose when RAG answers cannot be trusted, simply swap your existing LLM that is generating answers based on the retrieved context with TLM. This notebook showcases this for a standard RAG system, based off a tutorial in the popular [LlamaIndex](https://docs.llamaindex.ai/) framework. Here we merely replace the LLM used in the LlamaIndex tutorial with TLM, and showcase some of the benefits. TLM can be similarly inserted into *any* other RAG framework.\n",
"\n",
@@ -51,9 +53,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We then initialize Cleanlab's TLM. Here we initialize a CleanlabTLM object with default settings. \n",
"\n",
"You can get your Cleanlab API key here: https://app.cleanlab.ai/account after creating an account. For detailed instructions, refer to [this guide](https://help.cleanlab.ai/guide/quickstart/api/#api-key)."
"We then initialize Cleanlab's TLM. Here we initialize a CleanlabTLM object with default settings. "
]
},
{
@@ -65,6 +65,7 @@
"from llama_index.llms.cleanlab import CleanlabTLM\n",
"\n",
"# set api key in env or in llm\n",
"# get free API key from: https://cleanlab.ai/\n",
"# import os\n",
"# os.environ[\"CLEANLAB_API_KEY\"] = \"your api key\"\n",
"\n",
Expand All @@ -77,7 +78,7 @@
"source": [
"Note: If you encounter `ValidationError` during the above import, please upgrade your python version to >= 3.11\n",
"\n",
"You can achieve better results by playing with the TLM configurations outlined in this [advanced TLM tutorial](https://help.cleanlab.ai/tutorials/tlm_advanced/).\n",
"You can achieve better results by playing with the TLM configurations outlined in this [advanced TLM tutorial](https://help.cleanlab.ai/tlm/tutorials/tlm_advanced/).\n",
"\n",
"For example, if your application requires OpenAI's GPT-4 model and restrict the output tokens to 256, you can configure it using the `options` argument:\n",
"\n",
@@ -231,7 +232,7 @@
"In addition, you can just use TLM's trustworthiness score in an existing custom-built RAG pipeline (using any other LLM generator, streaming or not). <br>\n",
"To achieve this, you'd need to fetch the prompt sent to LLM (including system instructions, retrieved context, user query, etc.) and the returned response. TLM requires both to predict trustworthiness.\n",
"\n",
"Detailed information about this approach, along with example code, is available [here](https://help.cleanlab.ai/tlm/use-cases/tlm_rag/#alternate-low-latencystreaming-approach-use-tlm-to-assess-responses-from-an-existing-rag-system)."
"Detailed information about this approach, along with example code, is available [here](https://help.cleanlab.ai/tlm/tutorials/tlm/)."
]
},
{
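A hedged sketch of that alternate approach, assuming the standalone `cleanlab-tlm` Python client and its `get_trustworthiness_score` method; the prompt and response strings are placeholders for whatever your pipeline actually sends and returns:

```python
from cleanlab_tlm import TLM

tlm = TLM()  # reads CLEANLAB_API_KEY from the environment

# The full prompt your RAG system sent to its generator LLM:
# system instructions + retrieved context + user query.
prompt = "You are a helpful assistant.\nContext: ...\nQuestion: ..."

# The answer that generator LLM returned (streamed or not).
response = "..."

result = tlm.get_trustworthiness_score(prompt, response=response)
print(result["trustworthiness_score"])
```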
@@ -674,7 +675,8 @@
"\n",
"With TLM, you can easily increase trust in any RAG system! \n",
"\n",
"Feel free to check [TLM's performance benchmarks](https://cleanlab.ai/blog/trustworthy-language-model/) for more details."
"Feel free to check [TLM's performance benchmarks](https://cleanlab.ai/blog/trustworthy-language-model/) for more details. <br />\n",
"If you're interested in using Cleanlab as a real-time Evaluator (which can also work as a Guardrail), check out [this tutorial](https://docs.llamaindex.ai/en/stable/examples/evaluation/Cleanlab/)."
]
}
],
15 changes: 7 additions & 8 deletions docs/docs/examples/llm/cleanlab.ipynb
@@ -16,15 +16,13 @@
"source": [
"# Cleanlab Trustworthy Language Model\n",
"\n",
"This notebook shows how to use Cleanlab's Trustworthy Language Model (TLM) and Trustworthiness score.\n",
"Cleanlab’s Trustworthy Language Model scores the trustworthiness of every LLM response in real-time, using state-of-the-art uncertainty estimates for LLMs. Trust scoring is crucial for applications where unchecked hallucinations and other LLM errors are a show-stopper.\n",
"\n",
"TLM is a more reliable LLM that gives high-quality outputs and indicates when it is unsure of the answer to a question, making it suitable for applications where unchecked hallucinations are a show-stopper.<br />\n",
"Trustworthiness score quantifies how confident you can be that the response is good (higher values indicate greater trustworthiness). These scores combine estimates of both aleatoric and epistemic uncertainty to provide an overall gauge of trustworthiness.\n",
"This page demonstrates how to use TLM in place of your own LLM, to both generate responses and score their trustworthiness. That’s not the only way to use TLM though. <br />\n",
"To add trust scoring to a RAG application, you can instead see [this tutorial](https://docs.llamaindex.ai/en/stable/examples/evaluation/Cleanlab/) which walkthroughs building Trustworthy RAG with Cleanlab. \n",
"Beyond RAG applications, you can also score the trustworthiness of responses already generated from any LLM via [this tutorial](https://help.cleanlab.ai/tlm/tutorials/tlm/).\n",
"\n",
"\n",
"Read more about TLM API on [Cleanlab Studio's docs](https://help.cleanlab.ai/reference/python/trustworthy_language_model/). For more advanced usage, feel free to refer to the [quickstart tutorial](https://help.cleanlab.ai/tutorials/tlm/).\n",
"\n",
"Visit https://cleanlab.ai and sign up to get a free API key."
"Learn more about TLM in the Cleanlab [documentation](https://help.cleanlab.ai/tlm/)."
]
},
{
@@ -82,6 +80,7 @@
"outputs": [],
"source": [
"# set api key in env or in llm\n",
"# get free API key from: https://cleanlab.ai/\n",
"# import os\n",
"# os.environ[\"CLEANLAB_API_KEY\"] = \"your api key\"\n",
"\n",
@@ -244,7 +243,7 @@
"- **log**: specify additional metadata to return. include “explanation” here to get explanations of why a response is scored with low trustworthiness\n",
"\n",
"These configurations are passed as a dictionary to the `CleanlabTLM` object during initialization. <br />\n",
"More details about these options can be referred from [Cleanlab's API documentation](https://help.cleanlab.ai/reference/python/trustworthy_language_model/#class-tlmoptions) and a few use-cases of these options are explored in [this notebook](https://help.cleanlab.ai/tlm/tutorials/tlm_advanced/).\n",
"More details about these options can be referred from [Cleanlab's API documentation](https://help.cleanlab.ai/tlm/api/python/tlm/#class-tlmoptions) and a few use-cases of these options are explored in [this notebook](https://help.cleanlab.ai/tlm/tutorials/tlm_advanced/).\n",
"\n",
"Let's consider an example where the application requires `gpt-4` model with `128` output tokens."
]