|
20 | 20 | "source": [
|
21 | 21 | "Content\n",
|
22 | 22 | "1. [Introduction](#section1')\n",
|
23 |
| - "2. [Generate Demo Dataset](#section2')\n", |
| 23 | + "2. [Generate Counterfactual Dataset](#section2')<br>\n", |
| 24 | + " 2.1 [Check fairness through unawareness](#section2-1')<br>\n", |
| 25 | + " 2.2 [Generate counterfactual responses](#section2-2')\n", |
24 | 26 | "3. [Assessment](#section3')<br>\n",
|
25 | 27 | " 3.1 [Lazy Implementation](#section3-1')<br>\n",
|
26 | 28 | " 3.2 [Separate Implementation](#section3-2')\n",
|
|
120 | 122 | "metadata": {},
|
121 | 123 | "source": [
|
122 | 124 | "<a id='section2'></a>\n",
|
123 |
| - "## 2. Generate Demo Dataset" |
| 125 | + "## 2. Generate Counterfactual Dataset" |
124 | 126 | ]
|
125 | 127 | },
|
126 | 128 | {
|
|
163 | 165 | "tags": []
|
164 | 166 | },
|
165 | 167 | "source": [
|
166 |
| - "### Counterfactual Dataset Generator\n", |
| 168 | + "#### Counterfactual Dataset Generator\n", |
167 | 169 | "***\n",
|
168 |
| - "##### `CounterfactualGenerator()` - Class for generating data for counterfactual discrimination assessment (class)\n", |
| 170 | + "##### `CounterfactualGenerator()` - Used for generating data for counterfactual fairness assessment (class)\n", |
169 | 171 | "\n",
|
170 | 172 | "**Class Attributes:**\n",
|
171 | 173 | "\n",
|
172 |
| - "- `langchain_llm` (**langchain llm (Runnable), default=None**) A langchain llm object to get passed to LLMChain `llm` argument. \n", |
| 174 | + "- `langchain_llm` (**langchain llm (Runnable), default=None**) A LangChain llm object to get passed to LangChain `RunnableSequence`. \n", |
173 | 175 | "- `suppressed_exceptions` (**tuple, default=None**) Specifies which exceptions to handle as 'Unable to get response' rather than raising the exception\n",
|
174 |
| - "- `max_calls_per_min` (**deprecated as of 0.2.0**) Use LangChain's InMemoryRateLimiter instead.\n", |
175 |
| - "\n", |
176 |
| - "**Methods:**\n", |
177 |
| - "\n", |
178 |
| - "1. `parse_texts()` - Parses a list of texts for protected attribute words and names\n", |
179 |
| - "\n", |
180 |
| - " **Method Parameters:**\n", |
181 |
| - "\n", |
182 |
| - " - `text` - (**string**) A text corpus to be parsed for protected attribute words and names\n", |
183 |
| - " - `attribute` - (**{'race','gender','name'}**) Specifies what to parse for among race words, gender words, and names\n", |
184 |
| - " - `custom_list` - (**List[str], default=None**) Custom list of tokens to use for parsing prompts. Must be provided if attribute is None.\n", |
185 |
| - " \n", |
186 |
| - " **Returns:**\n", |
187 |
| - " - list of results containing protected attribute words found (**list**)\n", |
188 |
| - "\n", |
189 |
| - "2. `create_prompts()` - Creates counterfactual prompts by counterfactual substitution\n", |
190 |
| - "\n", |
191 |
| - " **Method Parameters:**\n", |
192 |
| - "\n", |
193 |
| - " - `prompts` - (**List of strings**) A list of prompts on which counterfactual substitution and response generation will be done\n", |
194 |
| - " - `attribute` - (**{'gender', 'race'}, default=None**) Specifies what to parse for among race words and gender words. Must be specified if custom_list is None.\n", |
195 |
| - " - `custom_dict` - (**Dict[str, List[str]], default=None**) A dictionary containing corresponding lists of tokens for counterfactual substitution. Keys should correspond to groups. Must be provided if attribute is None. For example: {'male': ['he', 'him', 'woman'], 'female': ['she', 'her', 'man']}\n", |
196 |
| - " subset_prompts : bool, default=True\n", |
197 |
| - " \n", |
198 |
| - " **Returns:**\n", |
199 |
| - " - list of prompts on which counterfactual substitution was completed (**list**)\n", |
200 |
| - " \n", |
201 |
| - "3. `neutralize_tokens()` - Neutralize gender and race words contained in a list of texts. Replaces gender words with a gender-neutral equivalent and race words with \"[MASK]\".\n", |
202 |
| - "\n", |
203 |
| - " **Method Parameters:**\n", |
204 |
| - "\n", |
205 |
| - " - `text_list` - (**List of strings**) A list of texts on which gender or race neutralization will occur\n", |
206 |
| - " - `attribute` - (**{'gender', 'race'}, default='gender'**) Specifies whether to use race or gender for for neutralization\n", |
207 |
| - "\n", |
208 |
| - " **Returns:**\n", |
209 |
| - " - list of texts neutralized with respect to race or gender (**list**)\n", |
210 |
| - "\n", |
211 |
| - "4. `generate_responses()` - Creates counterfactual prompts obtained by counterfactual substitution and generates responses asynchronously. \n", |
212 |
| - "\n", |
213 |
| - " **Method Parameters:**\n", |
214 |
| - "\n", |
215 |
| - " - `prompts` - (**List of strings**) A list of prompts on which counterfactual substitution and response generation will be done\n", |
216 |
| - " - `attribute` - (**{'gender', 'race'}, default='gender'**) Specifies whether to use race or gender for counterfactual substitution\n", |
217 |
| - " - `system_prompt` - (**str, default=\"You are a helpful assistant.\"**) Specifies system prompt for generation \n", |
218 |
| - " - `count` - (**int, default=25**) Specifies number of responses to generate for each prompt.\n", |
219 |
| - " - `custom_dict` - (**Dict[str, List[str]], default=None**) A dictionary containing corresponding lists of tokens for counterfactual substitution. Keys should correspond to groups. Must be provided if attribute is None. For example: {'male': ['he', 'him', 'woman'], 'female': ['she', 'her', 'man']}\n", |
220 |
| - "\n", |
221 |
| - " **Returns:** A dictionary with two keys: `data` and `metadata`.\n", |
222 |
| - " - `data` (**dict**) A dictionary containing the prompts and responses.\n", |
223 |
| - " - `metadata` (**dict**) A dictionary containing metadata about the generation process, including non-completion rate, temperature, count, original prompts, and identified proctected attribute words." |
| 176 | + "- `max_calls_per_min` (**deprecated as of 0.2.0**) Use LangChain's InMemoryRateLimiter instead." |
224 | 177 | ]
|
225 | 178 | },
|
226 | 179 | {
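The class attributes above map directly onto constructor arguments. Below is a minimal sketch of instantiating the generator with a rate-limited LangChain chat model, reflecting the deprecation of `max_calls_per_min` in favor of `InMemoryRateLimiter`; the import paths (`langfair.generator`, `langchain_openai`, `openai`), the model name, and the suppressed exception type are illustrative assumptions, not taken from this notebook.

```python
# Minimal sketch: wiring a rate-limited LangChain chat model into
# CounterfactualGenerator. Import paths, model name, and exception type are
# assumptions; adjust to your provider.
from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_openai import ChatOpenAI
from openai import RateLimitError

from langfair.generator import CounterfactualGenerator

# max_calls_per_min is deprecated; throttle on the LangChain side instead.
rate_limiter = InMemoryRateLimiter(
    requests_per_second=5,
    check_every_n_seconds=5,
    max_bucket_size=500,
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=1, rate_limiter=rate_limiter)

cdg = CounterfactualGenerator(
    langchain_llm=llm,
    suppressed_exceptions=(RateLimitError,),  # handled as 'Unable to get response'
)
```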
|
|
366 | 319 | "cell_type": "markdown",
|
367 | 320 | "metadata": {},
|
368 | 321 | "source": [
|
369 |
| - "For illustration, this notebook assesses with 'race' as the protected attribute, but metrics can be evaluated for 'gender' or other custom protected attributes in the same way. First, the above mentioned `parse_texts` method is used to identify the input prompts that contain protected attribute words. \n", |
| 322 | + "<a id='section2-1'></a>\n", |
| 323 | + "### 2.1 Check fairness through unawareness" |
| 324 | + ] |
| 325 | + }, |
| 326 | + { |
| 327 | + "cell_type": "markdown", |
| 328 | + "metadata": {}, |
| 329 | + "source": [ |
| 330 | + "#### `CounterfactualGenerator.check_ftu()` - Parses prompts to check for fairness through unawareness. Returns dictionary with prompts, corresponding attribute words found, and applicable metadata. \n", |
| 331 | + "\n", |
| 332 | + "**Method Parameters:**\n", |
| 333 | + "\n", |
| 334 | + "- `text` - (**string**) A text corpus to be parsed for protected attribute words and names\n", |
| 335 | + "- `attribute` - (**{'race','gender','name'}**) Specifies what to parse for among race words, gender words, and names\n", |
| 336 | + "- `custom_list` - (**List[str], default=None**) Custom list of tokens to use for parsing prompts. Must be provided if attribute is None.\n", |
| 337 | + "- `subset_prompts` - (**bool, default=True**) Indicates whether to return all prompts or only those containing attribute words\n", |
| 338 | + "\n", |
| 339 | + "**Returns:**\n", |
| 340 | + "- dictionary with prompts, corresponding attribute words found, and applicable metadata (**dict**)" |
| 341 | + ] |
| 342 | + }, |
| 343 | + { |
| 344 | + "cell_type": "markdown", |
| 345 | + "metadata": {}, |
| 346 | + "source": [ |
| 347 | + "For illustration, this notebook assesses with 'race' as the protected attribute, but metrics can be evaluated for 'gender' or other custom protected attributes in the same way. First, the above mentioned `check_ftu` method is used to check for fairness through unawareness, i.e. whether prompts contain mentions of protected attribute words. In the returned object, prompts are subset to retain only those that contain protected attribute words. \n", |
370 | 348 | "\n",
|
371 | 349 | "Note: We recommend using atleast 1000 prompts that contain protected attribute words for better estimates. Otherwise, increase `count` attribute of `CounterfactualGenerator` class generate more responses."
|
372 | 350 | ]
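The next cell runs `check_ftu` with the built-in `'race'` attribute. For prompts that need screening against a user-defined vocabulary, the `custom_list` parameter documented above can be used instead; the sketch below is illustrative only, with assumed tokens, and carries over the `cdg` and `prompts` objects from earlier cells.

```python
# Sketch: fairness-through-unawareness check against a custom token list
# (used when attribute is None). The tokens are illustrative; `cdg` and
# `prompts` are assumed to exist from earlier cells.
import pandas as pd

custom_tokens = ["veteran", "immigrant", "retiree"]

ftu_custom = cdg.check_ftu(
    prompts=prompts,
    custom_list=custom_tokens,
    subset_prompts=True,  # keep only prompts containing a listed token
)
pd.DataFrame(ftu_custom["data"]).head()
```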
|
|
456 | 434 | ],
|
457 | 435 | "source": [
|
458 | 436 | "# Check for fairness through unawareness\n",
|
459 |
| - "attribute = 'race'\n", |
460 |
| - "df = pd.DataFrame({'prompt': prompts})\n", |
461 |
| - "df[attribute + '_words'] = cdg.parse_texts(texts=prompts, attribute=attribute)\n", |
462 |
| - "\n", |
463 |
| - "# Remove input prompts that doesn't include a race word\n", |
464 |
| - "race_prompts = df[df['race_words'].apply(lambda x: len(x) > 0)][['prompt','race_words']]\n", |
465 |
| - "print(f\"Race words found in {len(race_prompts)} prompts\")\n", |
| 437 | + "ftu_result = cdg.check_ftu(\n", |
| 438 | + " prompts=prompts,\n", |
| 439 | + " attribute='race',\n", |
| 440 | + " subset_prompts=True\n", |
| 441 | + ")\n", |
| 442 | + "race_prompts = pd.DataFrame(ftu_result[\"data\"]).rename(columns={'attribute_words': 'race_words'})\n", |
466 | 443 | "race_prompts.tail(5)"
|
467 | 444 | ]
|
468 | 445 | },
|
469 | 446 | {
|
470 | 447 | "cell_type": "markdown",
|
471 | 448 | "metadata": {},
|
472 | 449 | "source": [
|
473 |
| - "Generate the model response on the input prompts using `generate_responses` method." |
| 450 | + "As seen above, this use case does not satisfy fairness through unawareness, since 246 prompts contain mentions of race words." |
| 451 | + ] |
| 452 | + }, |
| 453 | + { |
| 454 | + "cell_type": "markdown", |
| 455 | + "metadata": {}, |
| 456 | + "source": [ |
| 457 | + "<a id='section2-2'></a>\n", |
| 458 | + "### 2.2 Generate counterfactual responses" |
| 459 | + ] |
| 460 | + }, |
| 461 | + { |
| 462 | + "cell_type": "markdown", |
| 463 | + "metadata": {}, |
| 464 | + "source": [ |
| 465 | + "#### `CounterfactualGenerator.generate_responses()` - Creates counterfactual prompts obtained by counterfactual substitution and generates responses asynchronously. \n", |
| 466 | + "\n", |
| 467 | + "**Method Parameters:**\n", |
| 468 | + "\n", |
| 469 | + "- `prompts` - (**List of strings**) A list of prompts on which counterfactual substitution and response generation will be done\n", |
| 470 | + "- `attribute` - (**{'gender', 'race'}, default='gender'**) Specifies whether to use race or gender for counterfactual substitution\n", |
| 471 | + "- `system_prompt` - (**str, default=\"You are a helpful assistant.\"**) Specifies system prompt for generation \n", |
| 472 | + "- `count` - (**int, default=25**) Specifies number of responses to generate for each prompt.\n", |
| 473 | + "- `custom_dict` - (**Dict[str, List[str]], default=None**) A dictionary containing corresponding lists of tokens for counterfactual substitution. Keys should correspond to groups. Must be provided if attribute is None. For example: {'male': ['he', 'him', 'woman'], 'female': ['she', 'her', 'man']}\n", |
| 474 | + "\n", |
| 475 | + "**Returns:** A dictionary with two keys: `data` and `metadata`.\n", |
| 476 | + "- `data` (**dict**) A dictionary containing the prompts and responses.\n", |
| 477 | + "- `metadata` (**dict**) A dictionary containing metadata about the generation process, including non-completion rate, temperature, count, original prompts, and identified proctected attribute words." |
| 478 | + ] |
| 479 | + }, |
| 480 | + { |
| 481 | + "cell_type": "markdown", |
| 482 | + "metadata": {}, |
| 483 | + "source": [ |
| 484 | + "Create counterfactual input prompts and generate corresponding LLM responses using `generate_responses` method." |
474 | 485 | ]
|
475 | 486 | },
|
476 | 487 | {
|
|
566 | 577 | ],
|
567 | 578 | "source": [
|
568 | 579 | "generations = await cdg.generate_responses(\n",
|
569 |
| - " prompts=df['prompt'], attribute='race', count=1\n", |
| 580 | + " prompts=race_prompts['prompt'], attribute='race', count=1\n", |
570 | 581 | ")\n",
|
571 | 582 | "output_df = pd.DataFrame(generations['data'])\n",
|
572 | 583 | "output_df.head(1)"
|
|
617 | 628 | "cell_type": "markdown",
|
618 | 629 | "metadata": {},
|
619 | 630 | "source": [
|
620 |
| - "### `CounterfactualMetrics()` - Calculate all the counterfactual metrics (class)\n", |
| 631 | + "#### `CounterfactualMetrics()` - Calculate all the counterfactual metrics (class)\n", |
621 | 632 | "**Class Attributes:**\n",
|
622 | 633 | "- `metrics` - (**List of strings/Metric objects**) Specifies which metrics to use.\n",
|
623 | 634 | "Default option is a list if strings (`metrics` = [\"Cosine\", \"Rougel\", \"Bleu\", \"Sentiment Bias\"]).\n",
|
|
1206 | 1217 | "uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m125"
|
1207 | 1218 | },
|
1208 | 1219 | "kernelspec": {
|
1209 |
| - "display_name": "langchain", |
| 1220 | + "display_name": ".venv", |
1210 | 1221 | "language": "python",
|
1211 |
| - "name": "langchain" |
| 1222 | + "name": "python3" |
1212 | 1223 | },
|
1213 | 1224 | "language_info": {
|
1214 | 1225 | "codemirror_mode": {
|
|
1220 | 1231 | "name": "python",
|
1221 | 1232 | "nbconvert_exporter": "python",
|
1222 | 1233 | "pygments_lexer": "ipython3",
|
1223 |
| - "version": "3.11.10" |
| 1234 | + "version": "3.9.6" |
1224 | 1235 | }
|
1225 | 1236 | },
|
1226 | 1237 | "nbformat": 4,
|
|