The IR Experiment Platform

The Information Retrieval Experiment Platform integrates ir_datasets, ir_measures, and PyTerrier with TIRA to promote more standardized, reproducible, and scalable retrieval experiments, and ultimately blinded experiments in IR. Standardization is achieved when the input and output of an experiment are compatible with ir_datasets and ir_measures, and the retrieval approach implements PyTerrier's interfaces. However, none of this is strictly required for reproducibility and scalability, as TIRA can run any dockerized software locally or remotely in a cloud-native execution environment. Version control and caching ensure efficient (re-)execution. TIRA allows for blind evaluation when an experiment runs on a remote server/cloud not under the control of the experimenter. The test data and ground truth are then hidden from view, and the retrieval software has to process them in a sandbox that prevents data leaks.
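
For illustration, a retrieval approach that plugs into this setup only needs to behave like a PyTerrier transformer: it receives a dataframe of topics and returns a ranked run dataframe with qid, docno, and score columns. The following is a minimal sketch under that assumption; the class name, toy corpus, and scoring logic are made up for illustration and are not part of the platform:

import pandas as pd
import pyterrier as pt

class ToyTermCountRetriever(pt.Transformer):
    """Illustrative retriever: only the interface matters here
    (topics in, a ranked run dataframe out)."""

    def __init__(self, docs):
        # docs: iterable of dicts with 'docno' and 'text' (hypothetical toy corpus)
        self.docs = list(docs)

    def transform(self, topics: pd.DataFrame) -> pd.DataFrame:
        rows = []
        for _, topic in topics.iterrows():
            terms = topic["query"].lower().split()
            for doc in self.docs:
                # toy score: count how often the query terms occur in the document
                score = sum(doc["text"].lower().count(term) for term in terms)
                rows.append({"qid": topic["qid"], "docno": doc["docno"], "score": float(score)})
        run = pd.DataFrame(rows)
        # assign 0-based ranks per query, best score first
        run["rank"] = run.groupby("qid")["score"].rank(ascending=False, method="first").astype(int) - 1
        return run

Any approach exposing this interface can then be evaluated with the same ir_datasets topics and ir_measures metrics as the other submissions.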

The platform currently includes 15 corpora (1.9 billion documents) on which 32 well-known shared tasks are based, as well as Docker images of 50 standard retrieval approaches. Within this setup, we were able to automatically run and evaluate the 50 approaches on the 32 tasks (1600 runs) in less than a week.

The hosted version of the IR Experiment Platform is open for submissions at https://www.tira.io/task-overview/ir-benchmarks.

Experiments

All evaluations and analyses (including those reported in the paper) are located in analysis-of-submissions.

Up-To-Date Leaderboards

Comparing the leaderboards across different tasks is quite interesting (we report a large-scale evaluation on this in the paper). For example, compare MS MARCO DL 2019 with Antique or Args.me: on MS MARCO, all kinds of deep learning models are at the top, which completely reverses for other corpora such as Args.me or Antique.

The current leaderboards can be viewed on tira.io.

Import new Datasets

All datasets from the main branch of ir_datasets are supported by default. A tutorial showing how new, potentially work-in-progress datasets can be imported is available at ir-datasets/tutorial.
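
For example, once a dataset is available in ir_datasets, its topics, documents, and relevance judgments can be accessed with a few lines of Python; the dataset ID antique/test below is just one of the built-in examples:

import ir_datasets

# load a built-in dataset by its ir_datasets ID
dataset = ir_datasets.load("antique/test")

# iterate over the topics (queries) of the dataset
for query in dataset.queries_iter():
    print(query.query_id, query.text)

# documents and relevance judgments are available analogously
# via dataset.docs_iter() and dataset.qrels_iter()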

Submission

The hosted version of the IR Experiment Platform is open for submissions at https://www.tira.io/task-overview/ir-benchmarks. To simplify submissions, we provide several starters (covering 50 different retrieval models) that you can use as a starting point.

After a run has been unblinded and published by an organizer, it becomes visible on the leaderboard (here, as an example, the top entries by nDCG@10 for the ClueWeb09):

leaderboard-example
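
The leaderboard measures can also be recomputed locally with ir_measures. A minimal sketch, assuming a run in TREC format in run.txt and the corresponding relevance judgments in qrels.txt (both file names are placeholders):

import ir_measures
from ir_measures import nDCG

# read relevance judgments and a run in TREC format (placeholder file names)
qrels = ir_measures.read_trec_qrels("qrels.txt")
run = ir_measures.read_trec_run("run.txt")

# aggregate nDCG@10 over all topics, as reported on the leaderboard
print(ir_measures.calc_aggregate([nDCG @ 10], qrels, run))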

Reproducibility

Examples of reproducibility experiments are available in the directory reproducibility-experiments. The main advantage of the IR Experiment Platform is that, after a shared task has concluded, the complete shared task repository can be archived in a fully self-contained archive (including all software, runs, etc.). The repository https://github.com/tira-io/ir-experiment-platform-benchmarks contains such an archived shared task repository, covering more than 50 retrieval systems on more than 32 benchmarks, with over 2,000 software executions overall.

IR Starters

We provide starters for 4 frequently used IR research frameworks that can be used as the basis for software submissions to the Information Retrieval Experiment Platform. Retrieval systems submitted to the IR Experiment Platform have to be implemented as fully self-contained Docker images, i.e., the software must be able to run without an internet connection to improve reproducibility (e.g., preventing cases where an external dependency or API is no longer available in a few years). Our existing starters can be directly submitted to TIRA, as all of them have been extensively tested on 32 benchmarks in TIRA, and they might also serve as a starting point for custom development.

The starters are available and documented in the directory tira-ir-starters.

Starter for PyTerrier in Jupyter

The simplest starter implements BM25 retrieval using a few lines of declarative PyTerrier code in a Jupyter notebook.
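
The core of that starter looks roughly like the following sketch (not the exact notebook; the dataset ID and index path are placeholders):

import pyterrier as pt

if not pt.started():
    pt.init()

# load a benchmark via PyTerrier's ir_datasets integration (placeholder dataset ID)
dataset = pt.get_dataset("irds:antique/test")

# build a Terrier index over the documents
indexer = pt.IterDictIndexer("./index")
index_ref = indexer.index(dataset.get_corpus_iter(), fields=["text"])

# declarative BM25 retrieval pipeline
bm25 = pt.BatchRetrieve(index_ref, wmodel="BM25")

# Terrier's query parser is sensitive to punctuation, so strip it first
topics = dataset.get_topics()
topics["query"] = topics["query"].str.replace(r"[^\w\s]", " ", regex=True)

# produce a run for the dataset's topics
run = bm25(topics)
print(run.head())

In the actual starter, the input dataset and the output location for the run are provided by TIRA, but the retrieval logic itself stays this compact.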

Paper

If you use TIRA/TIREx in your research (or its cached run files, cached indices, or other outputs), please cite the TIRA and TIREx papers:

@InProceedings{froebe:2023e,
  author =                   {Maik Fr{\"o}be and Jan Heinrich Reimer and Sean MacAvaney and Niklas Deckers and Simon Reich and Janek Bevendorff and Benno Stein and Matthias Hagen and Martin Potthast},
  booktitle =                {46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023)},
  doi =                      {10.1145/3539618.3591888},
  editor =                   {Hsin-Hsi Chen and {Wei-Jou (Edward)} Duh and Hen-Hsen Huang and {Makoto P.} Kato and Josiane Mothe and Barbara Poblete},
  isbn =                     {9781450394086},
  month =                    jul,
  numpages =                 11,
  pages =                    {2826--2836},
  publisher =                {ACM},
  site =                     {Taipei, Taiwan},
  title =                    {{The Information Retrieval Experiment Platform}},
  year =                     2023
}

@InProceedings{froebe:2023b,
  address =                  {Berlin Heidelberg New York},
  author =                   {Maik Fr{\"o}be and Matti Wiegmann and Nikolay Kolyada and Bastian Grahm and Theresa Elstner and Frank Loebe and Matthias Hagen and Benno Stein and Martin Potthast},
  booktitle =                {Advances in Information Retrieval. 45th European Conference on {IR} Research ({ECIR} 2023)},
  month =                    apr,
  publisher =                {Springer},
  series =                   {Lecture Notes in Computer Science},
  site =                     {Dublin, Ireland},
  title =                    {{Continuous Integration for Reproducible Shared Tasks with TIRA.io}},
  year =                     2023
}
