18 changes: 10 additions & 8 deletions docs/docs/examples/cookbooks/cleanlab_tlm_rag.ipynb
@@ -8,9 +8,11 @@
"# Trustworthy RAG with the Trustworthy Language Model\n",
"\n",
"This tutorial demonstrates how to use Cleanlab's [Trustworthy Language Model](https://cleanlab.ai/blog/trustworthy-language-model/) (TLM) in any RAG system, to score the trustworthiness of answers and improve overall reliability of the RAG system.\n",
"We recommend first completing the [TLM example tutorial](https://docs.llamaindex.ai/en/stable/examples/llm/cleanlab/).\n",
"We recommend first completing the [TLM example tutorial](https://docs.llamaindex.ai/en/stable/examples/llm/cleanlab/). <br />\n",
"If you're interested in using Cleanlab as a real-time Evaluator (which can also work as a Guardrail), check out [this tutorial](https://docs.llamaindex.ai/en/stable/examples/evaluation/Cleanlab/).\n",
"\n",
"**Retrieval-Augmented Generation (RAG)** has become popular for building LLM-based Question-Answer systems in domains where LLMs alone suffer from: hallucination, knowledge gaps, and factual inaccuracies. However, RAG systems often still produce unreliable responses, because they depend on LLMs that are fundamentally unreliable. Cleanlab's Trustworthy Language Model (TLM) offers a solution by providing trustworthiness scores to assess and improve response quality, **independent of your RAG architecture or retrieval and indexing processes**. \n",
"\n",
"**Retrieval-Augmented Generation (RAG)** has become popular for building LLM-based Question-Answer systems in domains where LLMs alone suffer from: hallucination, knowledge gaps, and factual inaccuracies. However, RAG systems often still produce unreliable responses, because they depend on LLMs that are fundamentally unreliable. Cleanlab’s Trustworthy Language Model scores the trustworthiness of every LLM response in real-time, using state-of-the-art uncertainty estimates for LLMs, **independent of your RAG architecture or retrieval and indexing processes**. \n",
"\n",
"To diagnose when RAG answers cannot be trusted, simply swap your existing LLM that is generating answers based on the retrieved context with TLM. This notebook showcases this for a standard RAG system, based off a tutorial in the popular [LlamaIndex](https://docs.llamaindex.ai/) framework. Here we merely replace the LLM used in the LlamaIndex tutorial with TLM, and showcase some of the benefits. TLM can be similarly inserted into *any* other RAG framework.\n",
"\n",
@@ -51,9 +53,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We then initialize Cleanlab's TLM. Here we initialize a CleanlabTLM object with default settings. \n",
"\n",
"You can get your Cleanlab API key here: https://app.cleanlab.ai/account after creating an account. For detailed instructions, refer to [this guide](https://help.cleanlab.ai/guide/quickstart/api/#api-key)."
"We then initialize Cleanlab's TLM. Here we initialize a CleanlabTLM object with default settings. "
]
},
{
@@ -65,6 +65,7 @@
"from llama_index.llms.cleanlab import CleanlabTLM\n",
"\n",
"# set api key in env or in llm\n",
"# get free API key from: https://cleanlab.ai/\n",
"# import os\n",
"# os.environ[\"CLEANLAB_API_KEY\"] = \"your api key\"\n",
"\n",
Expand All @@ -77,7 +78,7 @@
"source": [
"Note: If you encounter `ValidationError` during the above import, please upgrade your python version to >= 3.11\n",
"\n",
"You can achieve better results by playing with the TLM configurations outlined in this [advanced TLM tutorial](https://help.cleanlab.ai/tutorials/tlm_advanced/).\n",
"You can achieve better results by playing with the TLM configurations outlined in this [advanced TLM tutorial](https://help.cleanlab.ai/tlm/tutorials/tlm_advanced/).\n",
"\n",
"For example, if your application requires OpenAI's GPT-4 model and restrict the output tokens to 256, you can configure it using the `options` argument:\n",
"\n",
@@ -231,7 +232,7 @@
"In addition, you can just use TLM's trustworthiness score in an existing custom-built RAG pipeline (using any other LLM generator, streaming or not). <br>\n",
"To achieve this, you'd need to fetch the prompt sent to LLM (including system instructions, retrieved context, user query, etc.) and the returned response. TLM requires both to predict trustworthiness.\n",
"\n",
"Detailed information about this approach, along with example code, is available [here](https://help.cleanlab.ai/tlm/use-cases/tlm_rag/#alternate-low-latencystreaming-approach-use-tlm-to-assess-responses-from-an-existing-rag-system)."
"Detailed information about this approach, along with example code, is available [here](https://help.cleanlab.ai/tlm/tutorials/tlm/)."
]
},
{
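A hedged sketch of that alternate approach, assuming the standalone `cleanlab-tlm` Python client and its `get_trustworthiness_score` method; the prompt and response strings are placeholders for whatever your pipeline actually sends and returns:

```python
from cleanlab_tlm import TLM

tlm = TLM()  # reads CLEANLAB_API_KEY from the environment

# The full prompt your RAG system sent to its generator LLM:
# system instructions + retrieved context + user query.
prompt = "You are a helpful assistant.\nContext: ...\nQuestion: ..."

# The answer that generator LLM returned (streamed or not).
response = "..."

result = tlm.get_trustworthiness_score(prompt, response=response)
print(result["trustworthiness_score"])
```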
@@ -674,7 +675,8 @@
"\n",
"With TLM, you can easily increase trust in any RAG system! \n",
"\n",
"Feel free to check [TLM's performance benchmarks](https://cleanlab.ai/blog/trustworthy-language-model/) for more details."
"Feel free to check [TLM's performance benchmarks](https://cleanlab.ai/blog/trustworthy-language-model/) for more details. <br />\n",
"If you're interested in using Cleanlab as a real-time Evaluator (which can also work as a Guardrail), check out [this tutorial](https://docs.llamaindex.ai/en/stable/examples/evaluation/Cleanlab/)."
]
}
],
15 changes: 7 additions & 8 deletions docs/docs/examples/llm/cleanlab.ipynb
@@ -16,15 +16,13 @@
"source": [
"# Cleanlab Trustworthy Language Model\n",
"\n",
"This notebook shows how to use Cleanlab's Trustworthy Language Model (TLM) and Trustworthiness score.\n",
"Cleanlab’s Trustworthy Language Model scores the trustworthiness of every LLM response in real-time, using state-of-the-art uncertainty estimates for LLMs. Trust scoring is crucial for applications where unchecked hallucinations and other LLM errors are a show-stopper.\n",
"\n",
"TLM is a more reliable LLM that gives high-quality outputs and indicates when it is unsure of the answer to a question, making it suitable for applications where unchecked hallucinations are a show-stopper.<br />\n",
"Trustworthiness score quantifies how confident you can be that the response is good (higher values indicate greater trustworthiness). These scores combine estimates of both aleatoric and epistemic uncertainty to provide an overall gauge of trustworthiness.\n",
"This page demonstrates how to use TLM in place of your own LLM, to both generate responses and score their trustworthiness. That’s not the only way to use TLM though. <br />\n",
"To add trust scoring to a RAG application, you can instead see [this tutorial](https://docs.llamaindex.ai/en/stable/examples/evaluation/Cleanlab/) which walkthroughs building Trustworthy RAG with Cleanlab. \n",
"Beyond RAG applications, you can also score the trustworthiness of responses already generated from any LLM via [this tutorial](https://help.cleanlab.ai/tlm/tutorials/tlm/).\n",
"\n",
"\n",
"Read more about TLM API on [Cleanlab Studio's docs](https://help.cleanlab.ai/reference/python/trustworthy_language_model/). For more advanced usage, feel free to refer to the [quickstart tutorial](https://help.cleanlab.ai/tutorials/tlm/).\n",
"\n",
"Visit https://cleanlab.ai and sign up to get a free API key."
"Learn more about TLM in the Cleanlab [documentation](https://help.cleanlab.ai/tlm/)."
]
},
{
@@ -82,6 +80,7 @@
"outputs": [],
"source": [
"# set api key in env or in llm\n",
"# get free API key from: https://cleanlab.ai/\n",
"# import os\n",
"# os.environ[\"CLEANLAB_API_KEY\"] = \"your api key\"\n",
"\n",
@@ -244,7 +243,7 @@
"- **log**: specify additional metadata to return. include “explanation” here to get explanations of why a response is scored with low trustworthiness\n",
"\n",
"These configurations are passed as a dictionary to the `CleanlabTLM` object during initialization. <br />\n",
"More details about these options can be referred from [Cleanlab's API documentation](https://help.cleanlab.ai/reference/python/trustworthy_language_model/#class-tlmoptions) and a few use-cases of these options are explored in [this notebook](https://help.cleanlab.ai/tlm/tutorials/tlm_advanced/).\n",
"More details about these options can be referred from [Cleanlab's API documentation](https://help.cleanlab.ai/tlm/api/python/tlm/#class-tlmoptions) and a few use-cases of these options are explored in [this notebook](https://help.cleanlab.ai/tlm/tutorials/tlm_advanced/).\n",
"\n",
"Let's consider an example where the application requires `gpt-4` model with `128` output tokens."
]