Merge pull request #104 from cvs-health/main
v0.3.2 updates
dylanbouchard authored Jan 15, 2025
2 parents ac731db + 956e6d3 commit 34876f6
Showing 5 changed files with 45 additions and 8 deletions.
24 changes: 24 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,24 @@
## Description
<!--- Provide a general summary of your changes. -->
<!--- Mention related issues, pull requests, or discussions with #<issue/PR/discussion ID>. -->
<!--- Tag people for whom this PR may be of interest using @<username>. -->

## Contributor License Agreement
<!--- Select all that apply by putting an x between the brackets: [x] -->
- [ ] confirm you have signed the [LangFair CLA](https://forms.office.com/pages/responsepage.aspx?id=uGG7-v46dU65NKR_eCuM1xbiih2MIwxBuRvO0D_wqVFUMlFIVFdYVFozN1BJVjVBRUdMUUY5UU9QRS4u&route=shorturl)

## Tests
<!--- Select all that apply by putting an x between the brackets: [x] -->
- [ ] no new tests required
- [ ] new tests added
- [ ] existing tests adjusted

## Documentation
<!--- Select all that apply by putting an x between the brackets: [x] -->
- [ ] no documentation changes needed
- [ ] README updated
- [ ] API docs added or updated
- [ ] example notebook added or updated

## Screenshots
<!--- If applicable, please add screenshots. -->
18 changes: 16 additions & 2 deletions README.md
@@ -128,7 +128,7 @@ auto_object = AutoEval(
)
results = await auto_object.evaluate()
results['metrics']
-# Output is below
+# # Output is below
# {'Toxicity': {'Toxic Fraction': 0.0004,
# 'Expected Maximum Toxicity': 0.013845130120171235,
# 'Toxicity Probability': 0.01},
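
> The hunk above shows only the tail of the README's `AutoEval` example. For orientation, a minimal sketch of the full call is below; it assumes the `AutoEval(prompts=..., langchain_llm=...)` signature used elsewhere in the README and in the notebook later in this diff. The `ChatOpenAI` model name and the prompts are illustrative placeholders, not part of this commit.

```python
# Minimal sketch (not part of the commit): assumes AutoEval(prompts=..., langchain_llm=...).
# Requires OPENAI_API_KEY in the environment; model name and prompts are placeholders.
import asyncio

from langchain_openai import ChatOpenAI
from langfair.auto import AutoEval

llm = ChatOpenAI(model="gpt-4o-mini", temperature=1)  # any LangChain chat model should work
prompts = [
    "Describe a typical day for a nurse.",
    "Describe a typical day for a software engineer.",
]

auto_object = AutoEval(prompts=prompts, langchain_llm=llm)

async def main() -> None:
    results = await auto_object.evaluate()   # runs toxicity, stereotype, and counterfactual checks
    print(results["metrics"])                # dict keyed by metric group, as in the output above

asyncio.run(main())
```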
@@ -199,7 +199,7 @@ Bias and fairness metrics offered by LangFair are grouped into several categorie


## 📖 Associated Research
-A technical description of LangFair's evaluation metrics and a practitioner's guide for selecting evaluation metrics is contained in **[this paper](https://arxiv.org/abs/2407.10853)**. If you use our framework for selecting evaluation metrics, we would appreciate citations to the following paper:
+A technical description and a practitioner's guide for selecting evaluation metrics are contained in **[this paper](https://arxiv.org/abs/2407.10853)**. If you use our evaluation approach, we would appreciate citations to the following paper:

```bibtex
@misc{bouchard2024actionableframeworkassessingbias,
@@ -213,6 +213,20 @@ A technical description of LangFair's evaluation metrics and a practitioner's gu
}
```

A high-level description of LangFair's functionality is contained in **[this paper](https://arxiv.org/abs/2501.03112)**. If you use LangFair, we would appreciate citations to the following paper:

```bibtex
@misc{bouchard2025langfairpythonpackageassessing,
title={LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases},
author={Dylan Bouchard and Mohit Singh Chauhan and David Skarbrevik and Viren Bajaj and Zeya Ahmad},
year={2025},
eprint={2501.03112},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.03112},
}
```

## 📄 Code Documentation
Please refer to our [documentation site](https://cvs-health.github.io/langfair/) for more details on how to use LangFair.

3 changes: 1 addition & 2 deletions examples/evaluations/text_generation/auto_eval_demo.ipynb
@@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook demonstrate the implementation of `AutoEval` class. This class provides an user-friendly way to compute toxicity, stereotype, and counterfactual assessment for an LLM model. The user needs to provide the input prompts and model responses (optional) and the `AutoEval` class implement following steps.\n",
"This notebook demonstrate the implementation of `AutoEval` class. This class provides an user-friendly way to compute toxicity, stereotype, and counterfactual assessment for an LLM use case. The user needs to provide the input prompts and a `langchain` LLM, and the `AutoEval` class implements following steps.\n",
"\n",
"1. Check Fairness Through Awareness (FTU)\n",
"2. If FTU is not satisfied, generate dataset for Counterfactual assessment \n",
@@ -61,7 +61,6 @@
"outputs": [],
"source": [
"# User to populate .env file with API credentials\n",
"repo_path = '/'.join(os.getcwd().split('/')[:-3])\n",
"load_dotenv(find_dotenv())\n",
"\n",
"API_KEY = os.getenv('API_KEY')\n",
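
> The diff above omits this cell's imports and drops the `repo_path` line; a hedged completion of the credential setup is sketched below. It assumes `python-dotenv` and an OpenAI-compatible chat model via `langchain_openai`; the model name is an illustrative placeholder, not what the notebook actually uses.

```python
# Hypothetical completion of the credentials cell (assumptions: python-dotenv installed,
# a .env file defining API_KEY, and an OpenAI-compatible endpoint; model name is a placeholder).
import os

from dotenv import find_dotenv, load_dotenv
from langchain_openai import ChatOpenAI

load_dotenv(find_dotenv())            # read API credentials from the repository's .env file
API_KEY = os.getenv("API_KEY")

llm = ChatOpenAI(model="gpt-4o-mini", api_key=API_KEY, temperature=1)  # passed to AutoEval as langchain_llm
```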
6 changes: 3 additions & 3 deletions poetry.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "langfair"
version = "0.3.1"
version = "0.3.2"
description = "LangFair is a Python library for conducting use-case level LLM bias and fairness assessments"
readme = "README.md"
authors = ["Dylan Bouchard <[email protected]>",
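
> Since the only change here is the version bump, a quick way to confirm which release is installed after upgrading (e.g., `pip install -U langfair`) is sketched below. This uses the standard-library `importlib.metadata` and is an illustration, not part of the commit.

```python
# Verify the installed LangFair release.
from importlib.metadata import version

print(version("langfair"))  # expected to print "0.3.2" once this release is installed
```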
