122 | 122 | "\n",
123 | 123 | "Ragas measures your pipeline's performance against two dimensions:\n",
124 | 124 | "\n",
125 |     | - "1. Factuality: measures the factual consistency of the generated answer against the given context.\n",
    | 125 | + "1. Faithfulness: measures the factual consistency of the generated answer against the given context.\n",
126 | 126 | "2. Relevancy: measures how relevant the retrieved contexts and the generated answer are to the question.\n",
127 | 127 | "\n",
128 | 128 | "Through repeated experiments, we have found that the quality of a RAG pipeline is highly dependent on these two dimensions. The final `ragas_score` is the harmonic mean of these two factors.\n",

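As a sanity check on the harmonic-mean claim, the sketch below (not part of the notebook) recombines the three metric scores reported in the evaluation output further down in this diff; the helper function is illustrative:

```python
# Illustrative sketch: the reported ragas_score is consistent with the
# harmonic mean taken over the individual metric scores shown later
# in this notebook (context_relavency, faithfulness, answer_relevancy).
def harmonic_mean(scores):
    return len(scores) / sum(1 / s for s in scores)

print(round(harmonic_mean([0.817, 0.892, 0.874]), 3))  # 0.86
```
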
137 | 137 | "metadata": {},
138 | 138 | "outputs": [],
139 | 139 | "source": [
140 |     | - "from ragas.metrics import context_relevancy, answer_relevancy, factuality"
    | 140 | + "from ragas.metrics import context_relevancy, answer_relevancy, faithfulness"
141 | 141 | ]
142 | 142 | },
143 | 143 | {

149 | 149 | "\n",
150 | 150 | "1. context_relevancy - a measure of how relevant the retrieved context is to the question. Conveys the quality of the retrieval pipeline.\n",
151 | 151 | "2. answer_relevancy - a measure of how relevant the answer is to the question.\n",
152 |     | - "3. factuality - the factual consistancy of the answer to the context base on the question.\n",
    | 152 | + "3. faithfulness - the factual consistency of the answer to the context, based on the question.\n",
153 | 153 | "\n",
154 |     | - "**Note:** *`factuality` using OpenAI's API to compute the score. If you using this metric make sure you set the environment key `OPENAI_API_KEY` with your API key.*\n",
    | 154 | + "**Note:** *`faithfulness` uses OpenAI's API to compute the score. If you are using this metric, make sure you set the environment variable `OPENAI_API_KEY` with your API key.*\n",
155 | 155 | "\n",
156 | 156 | "**Note:** *`context_relevancy` and `answer_relevancy` use very small LLMs to compute the score. They will run on CPU, but having a GPU is recommended.*\n",
157 | 157 | "\n",

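A minimal sketch of setting that variable from inside the notebook (the key value is a placeholder; in practice, prefer exporting it in your shell rather than hard-coding it):

```python
# Sketch: make the OpenAI key visible to ragas before evaluating.
# "sk-..." is a placeholder; substitute your own key.
import os

os.environ["OPENAI_API_KEY"] = "sk-..."
```
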
188 | 188 | {
189 | 189 | "data": {
190 | 190 | "text/plain": [
191 |     | - "{'ragas_score': 0.860, 'context_relavency': 0.817, 'factuality': 0.892, 'answer_relevancy': 0.874}"
    | 191 | + "{'ragas_score': 0.860, 'context_relavency': 0.817, 'faithfulness': 0.892, 'answer_relevancy': 0.874}"
192 | 192 | ]
193 | 193 | },
194 | 194 | "execution_count": 8,

200 | 200 | "from ragas import evaluate\n",
201 | 201 | "\n",
202 | 202 | "result = evaluate(\n",
203 |     | - "    fiqa_eval[\"baseline\"], metrics=[context_relevancy, factuality, answer_relevancy]\n",
    | 203 | + "    fiqa_eval[\"baseline\"], metrics=[context_relevancy, faithfulness, answer_relevancy]\n",
204 | 204 | ")\n",
205 | 205 | "\n",
206 | 206 | "result"

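The per-question table in the next hunks appears to be a DataFrame view of this result; a hedged sketch of producing it, assuming the returned object exposes `to_pandas()` as in early ragas releases (column names copied from the output below, including the library's own `context_relavency` spelling):

```python
# Sketch (assumes result.to_pandas() exists on the returned object,
# as in early ragas releases): inspect per-row metric scores.
df = result.to_pandas()
print(df[["context_relavency", "faithfulness", "answer_relevancy"]].head())
```
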
248 | 248 | " <th>answer</th>\n",
249 | 249 | " <th>contexts</th>\n",
250 | 250 | " <th>context_relavency</th>\n",
251 |     | - " <th>factuality</th>\n",
    | 251 | + " <th>faithfulness</th>\n",
252 | 252 | " <th>answer_relevancy</th>\n",
253 | 253 | " </tr>\n",
254 | 254 | " </thead>\n",

336 | 336 | "3 [Set up a meeting with the bank that handles y... 0.781 \n",
337 | 337 | "4 [The time horizon for your 401K/IRA is essenti... 0.737 \n",
338 | 338 | "\n",
339 |     | - " factuality answer_relevancy \n",
    | 339 | + " faithfulness answer_relevancy \n",
340 | 340 | "0 1.0 0.922 \n",
341 | 341 | "1 1.0 0.923 \n",
342 | 342 | "2 1.0 0.824 \n",