
Commit 25143b9

Merge pull request #2 from cvs-health/develop
Develop -> gh-pages
2 parents f6da024 + cb81c46, commit 25143b9

File tree

86 files changed: +406 -440 lines changed

Some content is hidden: large commits have some content hidden by default.


README.md

+16-16
@@ -1,12 +1,12 @@
 <p align="center">
-<img src="./assets/images/llambda-logo.png" />
+<img src="./assets/images/langfair-logo.png" />
 </p>

 # Library for Assessing Bias and Fairness in LLMs

-LLaMBDA (Large Language Model Bias Detection and Auditing) is a Python library for conducting bias and fairness assessments of LLM use cases. This repository includes a framework for [choosing bias and fairness metrics](#choosing-bias-and-fairness-metrics-for-an-llm-use-case), [demo notebooks](./examples), and a LLM bias and fairness [technical playbook](https://arxiv.org/pdf/2407.10853) containing a thorough discussion of LLM bias and fairness risks, evaluation metrics, and best practices. Please refer to our [documentation site](https://cvs-health.github.io/llambda/) for more details on how to use LLaMBDA.
+LangFair is a Python library for conducting bias and fairness assessments of LLM use cases. This repository includes a framework for [choosing bias and fairness metrics](#choosing-bias-and-fairness-metrics-for-an-llm-use-case), [demo notebooks](./examples), and a LLM bias and fairness [technical playbook](https://arxiv.org/pdf/2407.10853) containing a thorough discussion of LLM bias and fairness risks, evaluation metrics, and best practices. Please refer to our [documentation site](https://cvs-health.github.io/langfair/) for more details on how to use LangFair.

-Bias and fairness metrics offered by LLaMBDA fall into one of several categories. The full suite of metrics is displayed below.
+Bias and fairness metrics offered by LangFair fall into one of several categories. The full suite of metrics is displayed below.

 ##### Counterfactual Fairness Metrics
 * Strict Counterfactual Sentiment Parity ([Huang et al., 2020](https://arxiv.org/pdf/1911.03064))
@@ -38,18 +38,18 @@ Bias and fairness metrics offered by LLaMBDA fall into one of several categories
 * False Discovery Rate Disparity ([Bellamy et al., 2018](https://arxiv.org/abs/1810.01943); [Saleiro et al., 2019](https://arxiv.org/abs/1811.05577))

 ## Quickstart
-### (Optional) Create a virtual environment for using LLaMBDA
-We recommend creating a new virtual environment using venv before installing LLaMBDA. To do so, please follow instructions [here](https://docs.python.org/3/library/venv.html).
+### (Optional) Create a virtual environment for using LangFair
+We recommend creating a new virtual environment using venv before installing LangFair. To do so, please follow instructions [here](https://docs.python.org/3/library/venv.html).

-### Installing LLaMBDA
-The latest version can be installed from PyPI:
+### Installing LangFair
+The latest version can be installed from the github URL:

 ```bash
-pip install llambda
+pip install git+https://github.com/cvs-health/langfair.git
 ```

 ### Usage
-Below is a sample of code illustrating how to use LLaMBDA's `AutoEval` class for text generation and summarization use cases. The below example assumes the user has already defined parameters `DEPLOYMENT_NAME`, `API_KEY`, `API_BASE`, `API_TYPE`, `API_VERSION`, and a list of prompts from their use case `prompts`.
+Below is a sample of code illustrating how to use LangFair's `AutoEval` class for text generation and summarization use cases. The below example assumes the user has already defined parameters `DEPLOYMENT_NAME`, `API_KEY`, `API_BASE`, `API_TYPE`, `API_VERSION`, and a list of prompts from their use case `prompts`.

 Create `langchain` LLM object.
 ```python
@@ -60,13 +60,13 @@ llm = AzureChatOpenAI(
 azure_endpoint=API_BASE,
 openai_api_type=API_TYPE,
 openai_api_version=API_VERSION,
-temperature=1 # User to set temperature
+temperature=0.4 # User to set temperature
 )
 ```

 Run the `AutoEval` method for automated bias / fairness evaluation
 ```python
-from llambda.auto import AutoEval
+from langfair.auto import AutoEval
 auto_object = AutoEval(
 prompts=prompts,
 langchain_llm=llm
@@ -92,7 +92,7 @@ auto_object.print_results()
 </p>

 ## Example Notebooks
-See **[Demo Notebooks](./examples)** for notebooks illustrating how to use LLaMBDA for various bias and fairness evaluation metrics.
+See **[Demo Notebooks](./examples)** for notebooks illustrating how to use LangFair for various bias and fairness evaluation metrics.

 ## Choosing Bias and Fairness Metrics for an LLM Use Case
 In general, bias and fairness assessments of LLM use cases do not require satisfying all possible evaluation metrics. Instead, practitioners should prioritize and concentrate on a relevant subset of metrics. To demystify metric choice for bias and fairness assessments of LLM use cases, we introduce a decision framework for selecting the appropriate evaluation metrics, as depicted in the diagram below. Leveraging the use case taxonomy outlined in the [technical playbook](https://arxiv.org/abs/2407.10853), we determine suitable choices of bias and fairness metrics for a given use case based on its relevant characteristics.
@@ -114,7 +114,7 @@ Lastly, we classify the remaining subset of focused use cases as having minimal


 ## Associated Research
-A technical description of LLaMBDA's evaluation metrics and a practitioner's guide for selecting evaluation metrics is contained in **[this paper](https://arxiv.org/pdf/2407.10853)**. Below is the bibtex entry for this paper:
+A technical description of LangFair's evaluation metrics and a practitioner's guide for selecting evaluation metrics is contained in **[this paper](https://arxiv.org/pdf/2407.10853)**. Below is the bibtex entry for this paper:

 @misc{bouchard2024actionableframeworkassessingbias,
 title={An Actionable Framework for Assessing Bias and Fairness in Large Language Model Use Cases},
@@ -127,10 +127,10 @@ A technical description of LLaMBDA's evaluation metrics and a practitioner's gui
 }

 ## Code Documentation
-Please refer to our [documentation site](https://cvs-health.github.io/llambda/) for more details on how to use LLaMBDA.
+Please refer to our [documentation site](https://cvs-health.github.io/langfair/) for more details on how to use LangFair.

 ## Development Team
-The open-source version of LLaMBDA is the culmination of extensive work carried out by a dedicated team of developers. While the internal commit history will not be made public, we believe it's essential to acknowledge the significant contributions of our development team who were instrumental in bringing this project to fruition:
+The open-source version of LangFair is the culmination of extensive work carried out by a dedicated team of developers. While the internal commit history will not be made public, we believe it's essential to acknowledge the significant contributions of our development team who were instrumental in bringing this project to fruition:

 - [Dylan Bouchard](https://github.com/dylanbouchard)
 - [Mohit Singh Chauhan](https://github.com/mohitcek)
@@ -139,4 +139,4 @@ The open-source version of LLaMBDA is the culmination of extensive work carried
 - [Zeya Ahmad](https://github.com/zeya30)

 ## Contributing
-Contributions are welcome. Please refer [here](./CONTRIBUTING.md) for instructions on how to contribute to LLaMBDA.
+Contributions are welcome. Please refer [here](./CONTRIBUTING.md) for instructions on how to contribute to LangFair.
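
The quickstart fragments in the hunks above only show parts of the README example. A minimal sketch assembling them end to end is given below; it is not part of the commit. The `AzureChatOpenAI` keyword arguments for the deployment name and API key, and the `evaluate()` call, are assumptions here; the hunks themselves only show the endpoint/type/version/temperature arguments, the `AutoEval(prompts=..., langchain_llm=...)` constructor, and `auto_object.print_results()`.

```python
# Sketch only: assembles the README quickstart fragments shown in the diff above.
# Assumes DEPLOYMENT_NAME, API_KEY, API_BASE, API_TYPE, API_VERSION, and `prompts`
# (a list of strings) are already defined, as the README states.
import asyncio

from langchain_openai import AzureChatOpenAI
from langfair.auto import AutoEval

llm = AzureChatOpenAI(
    deployment_name=DEPLOYMENT_NAME,  # assumed kwarg; not shown in the hunks
    openai_api_key=API_KEY,           # assumed kwarg; not shown in the hunks
    azure_endpoint=API_BASE,
    openai_api_type=API_TYPE,
    openai_api_version=API_VERSION,
    temperature=0.4,                  # user to set temperature
)

auto_object = AutoEval(
    prompts=prompts,
    langchain_llm=llm,
)

# Assumed entry point: an async evaluate() method (the hunk header above only
# shows auto_object.print_results()). In a notebook, use `await auto_object.evaluate()`.
asyncio.run(auto_object.evaluate())
auto_object.print_results()
```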
Binary file not shown (-33.8 KB).

assets/images/archive/LLaMBDA.png
Binary file not shown (-17.6 KB).

Binary file not shown (-14.4 KB).

Binary file not shown (-7.98 KB).

Binary file not shown (-8.7 KB).

assets/images/langfair-logo.png
18.7 KB

assets/images/langfair-logo2.png
18 KB

assets/images/langfair-logo3.png
17.2 KB

Binary file not shown (-21.4 KB).

Binary file not shown (-6.51 KB).

assets/images/llambda-logo-only.png
Binary file not shown (-6.96 KB).

assets/images/llambda-logo.png
Binary file not shown (-23.2 KB).

assets/images/llambda-logo2.png
Binary file not shown (-22.1 KB).

data/DATA_COPYRIGHT.md

+1-1
@@ -1,4 +1,4 @@
-Please refer to below for copyright information for the two files contained in `llambda/data`
+Please refer to below for copyright information for the two files contained in `langfair/data`

 #### Copyright information for [RealToxicityPrompts.jsonl](https://huggingface.co/datasets/allenai/real-toxicity-prompts)
 ***

examples/evaluations/classification/classification_metrics_demo.ipynb

+6-6
@@ -13,15 +13,15 @@
 "\n",
 "import numpy as np\n",
 "from IPython.display import Image\n",
-"from llambda.metrics.classification import ClassificationMetrics"
+"from langfair.metrics.classification import ClassificationMetrics"
 ]
 },
 {
 "cell_type": "markdown",
 "id": "b9290443-ce88-4d54-beea-1e1888500b36",
 "metadata": {},
 "source": [
-"Bias and fairness metrics offered by `llambda` fall into various categories: counterfactual discrimination metrics, stereotype metrics, toxicity mtrics, recommendation fairness metrics, and classification fairness metrics. The full suite of metrics is displayed below.\n",
+"Bias and fairness metrics offered by `langfair` fall into various categories: counterfactual discrimination metrics, stereotype metrics, toxicity mtrics, recommendation fairness metrics, and classification fairness metrics. The full suite of metrics is displayed below.\n",
 "\n",
 "##### Counterfactual Discrimination Metrics\n",
 "* Strict Counterfactual Sentiment Parity ([Huang et al., 2020](https://arxiv.org/pdf/1911.03064))\n",
@@ -210,15 +210,15 @@
 ],
 "metadata": {
 "environment": {
-"kernel": "llambda-env",
+"kernel": "langfair",
 "name": "workbench-notebooks.m121",
 "type": "gcloud",
 "uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m121"
 },
 "kernelspec": {
-"display_name": "llambda-env",
+"display_name": "langfair",
 "language": "python",
-"name": "llambda-env"
+"name": "langfair"
 },
 "language_info": {
 "codemirror_mode": {
@@ -230,7 +230,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.19"
+"version": "3.9.20"
 }
 },
 "nbformat": 4,

examples/evaluations/recommendation/recommendation_metrics_demo.ipynb

+7-7
@@ -2,7 +2,7 @@
 "cells": [
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": 1,
 "id": "f694ef3c-96cb-472c-80c4-0409222fc4ac",
 "metadata": {
 "tags": []
@@ -13,15 +13,15 @@
 "\n",
 "from IPython.display import Image\n",
 "\n",
-"from llambda.metrics.recommendation import RecommendationMetrics\n"
+"from langfair.metrics.recommendation import RecommendationMetrics"
 ]
 },
 {
 "cell_type": "markdown",
 "id": "b9290443-ce88-4d54-beea-1e1888500b36",
 "metadata": {},
 "source": [
-"Bias and fairness metrics offered by `llambda` fall into various categories: counterfactual discrimination metrics, stereotype metrics, toxicity mtrics, recommendation fairness metrics, and classification fairness metrics. The full suite of metrics is displayed below.\n",
+"Bias and fairness metrics offered by `langfair` fall into various categories: counterfactual discrimination metrics, stereotype metrics, toxicity mtrics, recommendation fairness metrics, and classification fairness metrics. The full suite of metrics is displayed below.\n",
 "\n",
 "##### Counterfactual Discrimination Metrics\n",
 "* Strict Counterfactual Sentiment Parity ([Huang et al., 2020](https://arxiv.org/pdf/1911.03064))\n",
@@ -394,15 +394,15 @@
 ],
 "metadata": {
 "environment": {
-"kernel": "llambda-env",
+"kernel": "langfair",
 "name": "workbench-notebooks.m121",
 "type": "gcloud",
 "uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m121"
 },
 "kernelspec": {
-"display_name": "llambda-env",
+"display_name": "langfair",
 "language": "python",
-"name": "llambda-env"
+"name": "langfair"
 },
 "language_info": {
 "codemirror_mode": {
@@ -414,7 +414,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.19"
+"version": "3.9.20"
 }
 },
 "nbformat": 4,

examples/evaluations/text_generation/auto_eval_demo.ipynb

+27-27
@@ -30,7 +30,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 8,
+"execution_count": 2,
 "metadata": {
 "tags": []
 },
@@ -43,14 +43,14 @@
 "from dotenv import find_dotenv, load_dotenv\n",
 "from langchain_openai import AzureChatOpenAI\n",
 "\n",
-"from llambda.auto import AutoEval\n",
+"from langfair.auto import AutoEval\n",
 "\n",
 "warnings.filterwarnings(\"ignore\")"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": 3,
 "metadata": {
 "tags": []
 },
@@ -77,7 +77,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": 4,
 "metadata": {
 "tags": []
 },
@@ -99,7 +99,7 @@
 " \"#Person1#: Watsup, ladies! Y'll looking'fine tonight. May I have this dance?\\\\n#Person2#: He's cute! He looks like Tiger Woods! But, I can't dance. . .\\\\n#Person1#: It's all good. I'll show you all the right moves. My name's Malik.\\\\n#Person2#: Nice to meet you. I'm Wen, and this is Nikki.\\\\n#Person1#: How you feeling', vista? Mind if I take your friend'round the dance floor?\\\\n#Person2#: She doesn't mind if you don't mind getting your feet stepped on.\\\\n#Person1#: Right. Cool! Let's go!\\n\"]"
 ]
 },
-"execution_count": 3,
+"execution_count": 4,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -118,7 +118,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": 5,
 "metadata": {
 "tags": []
 },
@@ -132,7 +132,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#### `AutoEval()` - For calculating all toxicity, stereotype, and counterfactual metrics supported by LLaMBDA\n",
+"#### `AutoEval()` - For calculating all toxicity, stereotype, and counterfactual metrics supported by LangFair\n",
 "\n",
 "**Class Attributes:**\n",
 "- `prompts` - (**list of strings**)\n",
@@ -173,7 +173,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": 6,
 "metadata": {
 "tags": []
 },
@@ -191,7 +191,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 6,
+"execution_count": 7,
 "metadata": {
 "tags": []
 },
@@ -216,7 +216,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 7,
+"execution_count": 8,
 "metadata": {
 "tags": []
 },
@@ -227,35 +227,35 @@
 "text": [
 "\u001b[1mStep 1: Fairness Through Unawareness\u001b[0m\n",
 "------------------------------------\n",
-"LLaMBDA: Number of prompts containing race words: 0\n",
-"- LLaMBDA: The prompts satisfy fairness through unawareness for race words, the recommended risk assessment only include Toxicity\n",
-"LLaMBDA: Number of prompts containing gender words: 31\n",
-"- LLaMBDA: The prompts do not satisfy fairness through unawareness for gender words, the recommended risk assessments include Toxicity, Stereotype, and Counterfactual Discrimination.\n",
+"langfair: Number of prompts containing race words: 0\n",
+"- langfair: The prompts satisfy fairness through unawareness for race words, the recommended risk assessment only include Toxicity\n",
+"langfair: Number of prompts containing gender words: 31\n",
+"- langfair: The prompts do not satisfy fairness through unawareness for gender words, the recommended risk assessments include Toxicity, Stereotype, and Counterfactual Discrimination.\n",
 "\n",
 "\u001b[1mStep 2: Generate Counterfactual Dataset\u001b[0m\n",
 "---------------------------------------\n",
-"LLaMBDA: gender words found in 31 prompts.\n",
+"langfair: gender words found in 31 prompts.\n",
 "Generating 25 responses for each gender prompt...\n",
-"LLaMBDA: Responses successfully generated!\n",
+"langfair: Responses successfully generated!\n",
 "\n",
 "\u001b[1mStep 3: Generating Model Responses\u001b[0m\n",
 "----------------------------------\n",
-"LLaMBDA: Generating 25 responses per prompt...\n",
-"LLaMBDA: Responses successfully generated!\n",
+"langfair: Generating 25 responses per prompt...\n",
+"langfair: Responses successfully generated!\n",
 "\n",
 "\u001b[1mStep 4: Evaluate Toxicity Metrics\u001b[0m\n",
 "---------------------------------\n",
-"LLaMBDA: Computing toxicity scores...\n",
-"LLaMBDA: Evaluating metrics...\n",
+"langfair: Computing toxicity scores...\n",
+"langfair: Evaluating metrics...\n",
 "\n",
 "\u001b[1mStep 5: Evaluate Stereotype Metrics\u001b[0m\n",
 "-----------------------------------\n",
-"LLaMBDA: Computing stereotype scores...\n",
-"LLaMBDA: Evaluating metrics...\n",
+"langfair: Computing stereotype scores...\n",
+"langfair: Evaluating metrics...\n",
 "\n",
 "\u001b[1mStep 6: Evaluate Counterfactual Metrics\u001b[0m\n",
 "---------------------------------------\n",
-"LLaMBDA: Evaluating metrics...\n"
+"langfair: Evaluating metrics...\n"
 ]
 }
 ],
@@ -620,15 +620,15 @@
 ],
 "metadata": {
 "environment": {
-"kernel": "llambda-env",
+"kernel": "langfair",
 "name": "workbench-notebooks.m121",
 "type": "gcloud",
 "uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m121"
 },
 "kernelspec": {
-"display_name": "llambda-env",
+"display_name": "langfair",
 "language": "python",
-"name": "llambda-env"
+"name": "langfair"
 },
 "language_info": {
 "codemirror_mode": {
@@ -640,7 +640,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.19"
+"version": "3.9.20"
 }
 },
 "nbformat": 4,
