Commit

mf
mam10eks committed Feb 26, 2023
1 parent dcb9455 commit a7eb190
Showing 3 changed files with 1,402 additions and 0 deletions.
344 changes: 344 additions & 0 deletions reproducibility-experiments/full-rank-retriever-reproducibility.ipynb
@@ -0,0 +1,344 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "74212152-7d6a-42af-a431-2b972f30ed54",
"metadata": {
"tags": []
},
"source": [
"# Tutorial with Full-Rank Retrievers\n",
"\n",
"This notebook shows how post-hoc experiments of the IR Experiment Platform can be conducted.\n",
"\n",
"To start the notebook, please clone the archived shared task repository:\n",
"\n",
"```\n",
"git clone git@github.com:tira-io/ir-experiment-platform-benchmarks.git\n",
"```\n",
"\n",
"Inside the cloned repository, start JupyterLab with the following command, which automatically installs a minimal virtual environment:\n",
"```\n",
"make jupyterlab\n",
"```\n",
"\n",
"The notebook covers how to run full-rank approaches submitted to TIRA in reproducibility/replicability experiments on the same or new data.\n",
"\n",
"For each software submitted to TIRA, the `tira` integration into PyTerrier loads the submitted Docker image and executes it in PyTerrier pipelines (i.e., a first execution could take slightly longer).\n"
]
},
{
"cell_type": "markdown",
"id": "6c4c7d74-ae9f-44e6-9d76-970425673879",
"metadata": {},
"source": [
"## Import Dependencies"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "6fe2ebee-9626-4858-bd0b-246a66b286e6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import pandas as pd\n",
"pd.set_option('display.max_colwidth', 0)\n",
"\n",
"from tira.local_client import Client\n",
"tira = Client()\n",
"\n",
"import pyterrier as pt\n",
"if not pt.started():\n",
"    pt.init()\n"
]
},
{
"cell_type": "markdown",
"id": "60d5e570-a7a6-4461-b796-b0f2505dada0",
"metadata": {},
"source": [
"### Initialize A Full-Rank Retriever\n",
"\n",
"We create a PyTerrier retriever called `submitted_baseline` that wraps an approach submitted to a shared task in TIRA.\n",
"The approach is identified by the name `ir-benchmarks/tira-ir-starter/BM25 (tira-ir-starter-pyterrier)`, i.e., a software `BM25 (tira-ir-starter-pyterrier)` submitted to `ir-benchmarks` by the team `tira-ir-starter` (which hosts baselines).\n",
"This software consists of two stages: the first software component builds a PyTerrier index, and the second performs the actual retrieval with BM25.\n",
"\n",
"With this API, any full-rank approach submitted to TIRA can be executed and re-executed, e.g., on new data.\n",
"\n",
"We can run the retriever on any dataset integrated in `ir_datasets`.\n",
"Here, we use `vaswani` to show the overall functionality with a fast example."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c6784eea-1c8a-4b53-89f1-4ccfdb2e91a4",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"submitted_baseline = tira.pt.retriever(\n",
"    'ir-benchmarks/tira-ir-starter/BM25 (tira-ir-starter-pyterrier)',\n",
"    dataset='vaswani',\n",
")\n"
]
},
{
"cell_type": "markdown",
"id": "fa1e37f3-6341-4ecf-9e96-c5f92ea53d9b",
"metadata": {},
"source": [
"Next, we run the actual retrieval, here on only two topics to keep the result set small."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "b107a2de-f0c0-4807-a172-5f91487dcf35",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>qid</th>\n",
" <th>query</th>\n",
" <th>q0</th>\n",
" <th>docno</th>\n",
" <th>rank</th>\n",
" <th>score</th>\n",
" <th>system</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES</td>\n",
" <td>Q0</td>\n",
" <td>8172</td>\n",
" <td>1</td>\n",
" <td>24.566031</td>\n",
" <td>pyterrier.default_pipelines.wmodel_batch_retrieve</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES</td>\n",
" <td>Q0</td>\n",
" <td>9881</td>\n",
" <td>2</td>\n",
" <td>22.110514</td>\n",
" <td>pyterrier.default_pipelines.wmodel_batch_retrieve</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES</td>\n",
" <td>Q0</td>\n",
" <td>5502</td>\n",
" <td>3</td>\n",
" <td>21.717148</td>\n",
" <td>pyterrier.default_pipelines.wmodel_batch_retrieve</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES</td>\n",
" <td>Q0</td>\n",
" <td>1502</td>\n",
" <td>4</td>\n",
" <td>19.478355</td>\n",
" <td>pyterrier.default_pipelines.wmodel_batch_retrieve</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES</td>\n",
" <td>Q0</td>\n",
" <td>9859</td>\n",
" <td>5</td>\n",
" <td>18.626342</td>\n",
" <td>pyterrier.default_pipelines.wmodel_batch_retrieve</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1995</th>\n",
" <td>2</td>\n",
" <td>MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS</td>\n",
" <td>Q0</td>\n",
" <td>4833</td>\n",
" <td>996</td>\n",
" <td>5.161525</td>\n",
" <td>pyterrier.default_pipelines.wmodel_batch_retrieve</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1996</th>\n",
" <td>2</td>\n",
" <td>MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS</td>\n",
" <td>Q0</td>\n",
" <td>3529</td>\n",
" <td>997</td>\n",
" <td>5.161525</td>\n",
" <td>pyterrier.default_pipelines.wmodel_batch_retrieve</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1997</th>\n",
" <td>2</td>\n",
" <td>MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS</td>\n",
" <td>Q0</td>\n",
" <td>271</td>\n",
" <td>998</td>\n",
" <td>5.161525</td>\n",
" <td>pyterrier.default_pipelines.wmodel_batch_retrieve</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1998</th>\n",
" <td>2</td>\n",
" <td>MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS</td>\n",
" <td>Q0</td>\n",
" <td>2429</td>\n",
" <td>999</td>\n",
" <td>5.161525</td>\n",
" <td>pyterrier.default_pipelines.wmodel_batch_retrieve</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999</th>\n",
" <td>2</td>\n",
" <td>MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS</td>\n",
" <td>Q0</td>\n",
" <td>17</td>\n",
" <td>1000</td>\n",
" <td>5.161525</td>\n",
" <td>pyterrier.default_pipelines.wmodel_batch_retrieve</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>2000 rows × 7 columns</p>\n",
"</div>"
],
"text/plain": [
" qid \\\n",
"0 1 \n",
"1 1 \n",
"2 1 \n",
"3 1 \n",
"4 1 \n",
"... .. \n",
"1995 2 \n",
"1996 2 \n",
"1997 2 \n",
"1998 2 \n",
"1999 2 \n",
"\n",
" query \\\n",
"0 MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES \n",
"1 MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES \n",
"2 MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES \n",
"3 MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES \n",
"4 MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES \n",
"... ... \n",
"1995 MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS \n",
"1996 MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS \n",
"1997 MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS \n",
"1998 MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS \n",
"1999 MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS \n",
"\n",
" q0 docno rank score \\\n",
"0 Q0 8172 1 24.566031 \n",
"1 Q0 9881 2 22.110514 \n",
"2 Q0 5502 3 21.717148 \n",
"3 Q0 1502 4 19.478355 \n",
"4 Q0 9859 5 18.626342 \n",
"... .. ... .. ... \n",
"1995 Q0 4833 996 5.161525 \n",
"1996 Q0 3529 997 5.161525 \n",
"1997 Q0 271 998 5.161525 \n",
"1998 Q0 2429 999 5.161525 \n",
"1999 Q0 17 1000 5.161525 \n",
"\n",
" system \n",
"0 pyterrier.default_pipelines.wmodel_batch_retrieve \n",
"1 pyterrier.default_pipelines.wmodel_batch_retrieve \n",
"2 pyterrier.default_pipelines.wmodel_batch_retrieve \n",
"3 pyterrier.default_pipelines.wmodel_batch_retrieve \n",
"4 pyterrier.default_pipelines.wmodel_batch_retrieve \n",
"... ... \n",
"1995 pyterrier.default_pipelines.wmodel_batch_retrieve \n",
"1996 pyterrier.default_pipelines.wmodel_batch_retrieve \n",
"1997 pyterrier.default_pipelines.wmodel_batch_retrieve \n",
"1998 pyterrier.default_pipelines.wmodel_batch_retrieve \n",
"1999 pyterrier.default_pipelines.wmodel_batch_retrieve \n",
"\n",
"[2000 rows x 7 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"topics = pd.DataFrame([\n",
"    {'qid': 1, 'query': 'MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES'},\n",
"    {'qid': 2, 'query': 'MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS'},\n",
"])\n",
"\n",
"submitted_baseline(topics)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
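
The notebook describes the approach identifier `ir-benchmarks/tira-ir-starter/BM25 (tira-ir-starter-pyterrier)` as a software submitted to a task by a team. As a minimal sketch, assuming every identifier follows this three-part `<task>/<team>/<software>` pattern (the notebook shows only this one example), the following splits such a string into its parts. The names `ApproachId` and `parse_approach_id` are hypothetical helpers for illustration and are not part of the `tira` client.

```python
from typing import NamedTuple


class ApproachId(NamedTuple):
    """Hypothetical container for the three parts of an approach identifier."""
    task: str
    team: str
    software: str


def parse_approach_id(identifier: str) -> ApproachId:
    """Split '<task>/<team>/<software>' into its three parts.

    Only the first two slashes act as separators, so a software name that
    itself contains a slash stays intact (an assumption of this sketch).
    """
    parts = [part.strip() for part in identifier.split('/', 2)]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected '<task>/<team>/<software>', got {identifier!r}")
    return ApproachId(*parts)


approach = parse_approach_id(
    'ir-benchmarks/tira-ir-starter/BM25 (tira-ir-starter-pyterrier)'
)
print(approach.task)      # ir-benchmarks
print(approach.team)      # tira-ir-starter
print(approach.software)  # BM25 (tira-ir-starter-pyterrier)
```

Validating the identifier before passing it to `tira.pt.retriever` could surface a malformed name early, before any Docker image is loaded.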