diff --git a/README.md b/README.md index 25101fc902..c9909e0542 100644 --- a/README.md +++ b/README.md @@ -207,11 +207,13 @@ See individual pages for details! Key: + F1 = "flat" baseline (Lucene analyzer) -+ F2 = "flat" baselinse (pre-tokenized with `bert-base-uncased` tokenizer) ++ F2 = "flat" baseline (pre-tokenized with `bert-base-uncased` tokenizer) + MF = "multifield" baseline (Lucene analyzer) + U1 = uniCOIL (noexp) + S1 = SPLADE++ CoCondenser-EnsembleDistil +See instructions below the table for how to reproduce results for a model on all BEIR corpora "in one go". + | Corpus | F1 | F2 | MF | U1 | S1 | |-------------------------|:-----------------------------------------------------------------------------:|:--------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------:| | TREC-COVID | [+](docs/regressions/regressions-beir-v1.0.0-trec-covid-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-trec-covid-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-trec-covid-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-trec-covid-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-trec-covid-splade-pp-ed.md) | @@ -244,6 +246,36 @@ Key: | Climate-FEVER | [+](docs/regressions/regressions-beir-v1.0.0-climate-fever-flat.md) | [+](docs/regressions/regressions-beir-v1.0.0-climate-fever-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-climate-fever-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-climate-fever-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-climate-fever-splade-pp-ed.md) | | SciFact | [+](docs/regressions/regressions-beir-v1.0.0-scifact-flat.md) | 
[+](docs/regressions/regressions-beir-v1.0.0-scifact-flat-wp.md) | [+](docs/regressions/regressions-beir-v1.0.0-scifact-multifield.md) | [+](docs/regressions/regressions-beir-v1.0.0-scifact-unicoil-noexp.md) | [+](docs/regressions/regressions-beir-v1.0.0-scifact-splade-pp-ed.md) | +To reproduce the SPLADE++ CoCondenser-EnsembleDistil results, start by downloading the collection: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +Once you've unpacked the data, the following commands will loop over all BEIR corpora and run the regressions: + +```bash +MODEL="splade-pp-ed"; CORPORA=(trec-covid bioasq nfcorpus nq hotpotqa fiqa signal1m trec-news robust04 arguana webis-touche2020 cqadupstack-android cqadupstack-english cqadupstack-gaming cqadupstack-gis cqadupstack-mathematica cqadupstack-physics cqadupstack-programmers cqadupstack-stats cqadupstack-tex cqadupstack-unix cqadupstack-webmasters cqadupstack-wordpress quora dbpedia-entity scidocs fever climate-fever scifact); for c in "${CORPORA[@]}" +do + echo "Running $c..." + python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-${c}-${MODEL} > logs/log.beir-v1.0.0-${c}-${MODEL} 2>&1 +done +``` + +You can verify the results by examining the log files in `logs/`. 
+ +For the other models, modify the above commands as follows: + +| Key | Corpus | Checksum | `MODEL` | +|:----|:--------------------------------|:-----------------------------------|:----------------| +| F1 | `beir-v1.0.0-corpus.tar` | `faefd5281b662c72ce03d22021e4ff6b` | `flat` | +| F2 | `beir-v1.0.0-corpus-wp.tar` | `3cf8f3dcdcadd49362965dd4466e6ff2` | `flat-wp` | +| MF | `beir-v1.0.0-corpus.tar` | `faefd5281b662c72ce03d22021e4ff6b` | `multifield` | +| U1 | `beir-v1.0.0-unicoil-noexp.tar` | `4fd04d2af816a6637fc12922cccc8a83` | `unicoil-noexp` | +| S1 | `beir-v1.0.0-splade-pp-ed.tar` | `9c7de5b444a788c9e74c340bf833173b` | `splade-pp-ed` | +
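The table above pairs each `MODEL` value with a tarball and its MD5 checksum. A minimal sketch of how the download could be checked before running the regressions follows. It assumes GNU `md5sum` is on the `PATH` (macOS ships `md5` instead) and that the tarball was saved under `collections/` as in the `wget` commands. The function names `tarball_for_model` and `verify_md5` are hypothetical helpers, not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch only: tarball names and checksums are copied from the table above;
# the helper names are made up for illustration.
set -euo pipefail

# Prints "<tarball> <md5>" for a given MODEL, or fails for an unknown value.
tarball_for_model() {
  case "$1" in
    flat|multifield) echo "beir-v1.0.0-corpus.tar faefd5281b662c72ce03d22021e4ff6b" ;;
    flat-wp)         echo "beir-v1.0.0-corpus-wp.tar 3cf8f3dcdcadd49362965dd4466e6ff2" ;;
    unicoil-noexp)   echo "beir-v1.0.0-unicoil-noexp.tar 4fd04d2af816a6637fc12922cccc8a83" ;;
    splade-pp-ed)    echo "beir-v1.0.0-splade-pp-ed.tar 9c7de5b444a788c9e74c340bf833173b" ;;
    *) echo "unknown MODEL: $1" >&2; return 1 ;;
  esac
}

# Succeeds only if the MD5 of the file equals the expected value
# (assumes GNU md5sum, which prints "<hash>  <file>").
verify_md5() {
  local file="$1" expected="$2" actual
  actual=$(md5sum "$file" | awk '{print $1}')
  [[ "$actual" == "$expected" ]]
}

# Runs only when a MODEL is passed as the first argument.
if [[ $# -ge 1 ]]; then
  entry=$(tarball_for_model "$1")        # aborts under `set -e` if MODEL is unknown
  read -r TARBALL EXPECTED <<< "$entry"
  if verify_md5 "collections/${TARBALL}" "$EXPECTED"; then
    echo "OK: ${TARBALL}"
  else
    echo "MD5 mismatch: ${TARBALL}" >&2
    exit 1
  fi
fi
```

Saved under a name of your choosing, the script would be invoked with the `MODEL` value as its only argument (for example `splade-pp-ed`). Note that `flat` and `multifield` resolve to the same tarball, so that download only needs to be checked once.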
Cross-lingual and Multi-lingual Regressions diff --git a/docs/regressions/regressions-beir-v1.0.0-arguana-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-arguana-flat-wp.md index 0dfe24914f..affd3390a5 100644 --- a/docs/regressions/regressions-beir-v1.0.0-arguana-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-arguana-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-arguana-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-arguana-flat.md b/docs/regressions/regressions-beir-v1.0.0-arguana-flat.md index f8fd5b0b72..6a2edb1202 100644 --- a/docs/regressions/regressions-beir-v1.0.0-arguana-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-arguana-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-arguana-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-arguana-multifield.md b/docs/regressions/regressions-beir-v1.0.0-arguana-multifield.md index e891a8ba1e..00d432319a 100644 --- a/docs/regressions/regressions-beir-v1.0.0-arguana-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-arguana-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-arguana-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-arguana-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-arguana-splade-pp-ed.md index 82a55ae17a..8e012e5762 100644 --- a/docs/regressions/regressions-beir-v1.0.0-arguana-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-arguana-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-arguana-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-arguana-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar 
-P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): ArguAna | 0.9744 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): ArguAna | 0.9950 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-arguana-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions/regressions-beir-v1.0.0-arguana-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-arguana-unicoil-noexp.md index 411d24a587..dfbbf17dc1 100644 --- a/docs/regressions/regressions-beir-v1.0.0-arguana-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-arguana-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-arguana-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-bioasq-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-bioasq-flat-wp.md index 4a5ad1e473..2f50940160 100644 --- a/docs/regressions/regressions-beir-v1.0.0-bioasq-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-bioasq-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-bioasq-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-bioasq-flat.md b/docs/regressions/regressions-beir-v1.0.0-bioasq-flat.md index 136406368d..47c8772283 100644 --- a/docs/regressions/regressions-beir-v1.0.0-bioasq-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-bioasq-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-bioasq-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-bioasq-multifield.md b/docs/regressions/regressions-beir-v1.0.0-bioasq-multifield.md index b716bf0d4d..f4c4e4732f 100644 --- a/docs/regressions/regressions-beir-v1.0.0-bioasq-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-bioasq-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-bioasq-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-bioasq-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-bioasq-splade-pp-ed.md index ff7f9fda9c..a34fa07dde 100644 --- a/docs/regressions/regressions-beir-v1.0.0-bioasq-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-bioasq-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-bioasq-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-bioasq-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P 
collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): BioASQ | 0.7385 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): BioASQ | 0.8757 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-bioasq-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions/regressions-beir-v1.0.0-bioasq-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-bioasq-unicoil-noexp.md index a32934cef5..6ade7df20b 100644 --- a/docs/regressions/regressions-beir-v1.0.0-bioasq-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-bioasq-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-bioasq-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-climate-fever-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-climate-fever-flat-wp.md index e08f6d6cc3..88c506582c 100644 --- a/docs/regressions/regressions-beir-v1.0.0-climate-fever-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-climate-fever-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-climate-fever-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-climate-fever-flat.md b/docs/regressions/regressions-beir-v1.0.0-climate-fever-flat.md index fb4dcd8880..be8aec2002 100644 --- a/docs/regressions/regressions-beir-v1.0.0-climate-fever-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-climate-fever-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-climate-fever-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-climate-fever-multifield.md b/docs/regressions/regressions-beir-v1.0.0-climate-fever-multifield.md index e52bafc6a2..804fba6942 100644 --- a/docs/regressions/regressions-beir-v1.0.0-climate-fever-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-climate-fever-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-climate-fever-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-climate-fever-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-climate-fever-splade-pp-ed.md index 61d0d283ec..38ca30c81c 100644 --- a/docs/regressions/regressions-beir-v1.0.0-climate-fever-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-climate-fever-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-climate-fever-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-climate-fever-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): Climate-FEVER | 0.5211 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): Climate-FEVER | 0.7183 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-climate-fever-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-climate-fever-unicoil-noexp.md index a35178271f..f59b69a8ef 100644 --- a/docs/regressions/regressions-beir-v1.0.0-climate-fever-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-climate-fever-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-climate-fever-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat-wp.md index 58e0839c57..4ad4812a89 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-android-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat.md index 717a561d08..da53dc2b97 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-android-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-multifield.md index 66457e47a5..d530228937 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-android-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-splade-pp-ed.md index 767f0257d8..1999b35ce8 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-cqadupstack-android-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-android-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-android | 0.7404 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): CQADupStack-android | 0.9064 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-unicoil-noexp.md index b0b0447351..e3d7e9a529 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-android-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-android-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat-wp.md index d522c71826..b7d00b8df9 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-english-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat.md index e57a3dab8a..ef4438aef4 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-english-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-multifield.md index 4b7bf19f54..510546e712 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-english-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-splade-pp-ed.md index 301e8dfb96..94c1d6ac6c 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-cqadupstack-english-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-english-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-english | 0.6946 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): CQADupStack-english | 0.8454 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-unicoil-noexp.md index d1ffc9c3e0..5d5a2af593 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-english-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-english-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat-wp.md index 4637b8ae97..6afb68cce4 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-gaming-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat.md index 7d573f285a..1310d3ee7b 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-gaming-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-multifield.md index 542835339c..3f8638e1b7 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-gaming-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-splade-pp-ed.md index 72dacd9f05..d8ad72ac06 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-cqadupstack-gaming-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-gaming-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-gaming | 0.8131 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): CQADupStack-gaming | 0.9221 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.md index 9b22d9151f..52875fc3a9 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-gaming-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat-wp.md index e35368e6d2..ea74d211ee 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-gis-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat.md index b260b147f1..48b41caabf 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-gis-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-multifield.md index d5df1020c4..1dc6047e73 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-gis-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-splade-pp-ed.md index c69ddf6cc7..74046e7499 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-cqadupstack-gis-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-gis-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-gis | 0.6320 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): CQADupStack-gis | 0.8325 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-unicoil-noexp.md index 1fc6fedd7e..18690233ab 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-gis-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-gis-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat-wp.md index 5a9a3b9743..2feae1eba0 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-mathematica-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat.md index 7985f03a6b..1973a30738 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-mathematica-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-multifield.md index 5372c3be8a..4d9725000f 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-mathematica-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-splade-pp-ed.md index 358ae79188..7ba58897f1 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-cqadupstack-mathematica-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-mathematica-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-mathematica | 0.5797 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): CQADupStack-mathematica | 0.8007 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.md index 5d15dda9da..6fc51ab862 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat-wp.md index 5eb902d4a2..beba7ba07b 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-physics-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat.md index b7e12c933e..d5a4ec958d 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-physics-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-multifield.md index 02f551950d..bf67559d35 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-physics-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-splade-pp-ed.md index c59291a5cb..1a90ccbf6f 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-cqadupstack-physics-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-physics-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-physics | 0.7196 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): CQADupStack-physics | 0.9010 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-unicoil-noexp.md index e8685fbc26..d44e63867b 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-physics-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-physics-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat-wp.md index 39a155b0cf..e9aa896d19 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-programmers-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat.md index 8592142113..651fe0021e 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-programmers-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-multifield.md index 14ae76b81b..ef52ce1f01 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-programmers-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-splade-pp-ed.md index 3720a4c1de..5701f8787f 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-cqadupstack-programmers-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-programmers-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-programmers | 0.6585 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): CQADupStack-programmers | 0.8603 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.md index 172259340c..24ad3041ba 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-programmers-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat-wp.md index 6322d2578e..62eafe2259 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-stats-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat.md index 7eb589102d..d27c9829ee 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-stats-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-multifield.md index 60b29d5e14..2d7d6b2129 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-stats-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-splade-pp-ed.md index f215c94a7c..956da5af05 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-cqadupstack-stats-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-stats-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-stats | 0.5894 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): CQADupStack-stats | 0.7776 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-unicoil-noexp.md index 07eba9eeca..25c0e8edb9 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-stats-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-stats-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat-wp.md index ef00d72255..f34e4117a4 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-tex-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat.md index 9b1d9d7917..707832fd00 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-tex-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-multifield.md index 7bbc0b3b20..f4bcca6703 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-tex-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-splade-pp-ed.md index 2ffe2439d9..4d47aaa407 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-cqadupstack-tex-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-tex-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-tex | 0.5161 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): CQADupStack-tex | 0.7341 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-unicoil-noexp.md index 8f208451af..0265be4812 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-tex-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-tex-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat-wp.md index 040f01ac6d..be0907310a 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-unix-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat.md index a5a7781672..6736e12390 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-unix-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-multifield.md index fbb47a1a94..64ffc5eba8 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-unix-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-splade-pp-ed.md index 56cf1e1436..4627fdb237 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-cqadupstack-unix-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-unix-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-unix | 0.6214 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): CQADupStack-unix | 0.8257 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-unicoil-noexp.md index 603be401eb..db9229e340 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-unix-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-unix-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat-wp.md index 5f1cd14936..b3178f8fb7 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-webmasters-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat.md index 6d7b5258f6..481d12fd88 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-webmasters-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-multifield.md index f7dc6da9fb..1e56166a21 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-webmasters-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-splade-pp-ed.md index 0c77f851cc..ef6b3b5984 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-cqadupstack-webmasters-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-webmasters-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-webmasters | 0.6360 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): CQADupStack-webmasters | 0.8710 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.md index 1cb4752433..7982fe2bad 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat-wp.md index 8b7d01774c..573a86fb79 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-wordpress-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat.md index 4f885d52f7..12388a49ab 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-wordpress-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-multifield.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-multifield.md index 33a8d385d1..d31505fee9 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-wordpress-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-splade-pp-ed.md index c81dfa6e10..ed2b0bcc6a 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-cqadupstack-wordpress-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-wordpress-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): CQADupStack-wordpress | 0.5945 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): CQADupStack-wordpress | 0.7924 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.md index f207bd511d..de5579a1c7 100644 --- a/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat-wp.md index eb2ca0f7a4..4d39ca5807 100644 --- a/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-dbpedia-entity-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat.md b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat.md index dcda4101aa..49f3ed6760 100644 --- a/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-dbpedia-entity-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-multifield.md b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-multifield.md index a07bf17828..07f2816b62 100644 --- a/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-dbpedia-entity-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-splade-pp-ed.md index 6506a631dd..4e87a0890a 100644 --- a/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-dbpedia-entity-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-dbpedia-entity-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): DBPedia | 0.5624 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): DBPedia | 0.7838 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-unicoil-noexp.md index b3ced6a900..bba9350034 100644 --- a/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-dbpedia-entity-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-dbpedia-entity-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-fever-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-fever-flat-wp.md index b0be01a272..f4c961d64f 100644 --- a/docs/regressions/regressions-beir-v1.0.0-fever-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-fever-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-fever-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-fever-flat.md b/docs/regressions/regressions-beir-v1.0.0-fever-flat.md index e4b6ea972e..fb63549eec 100644 --- a/docs/regressions/regressions-beir-v1.0.0-fever-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-fever-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-fever-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-fever-multifield.md b/docs/regressions/regressions-beir-v1.0.0-fever-multifield.md index e79419449c..e1cc0bf503 100644 --- a/docs/regressions/regressions-beir-v1.0.0-fever-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-fever-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-fever-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-fever-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-fever-splade-pp-ed.md index c270e9b95f..ba853dc4f7 100644 --- a/docs/regressions/regressions-beir-v1.0.0-fever-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-fever-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-fever-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-fever-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): FEVER | 0.9459 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): FEVER | 0.9660 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-fever-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-fever-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-fever-unicoil-noexp.md index 569f5088fa..28e8872c73 100644 --- a/docs/regressions/regressions-beir-v1.0.0-fever-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-fever-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-fever-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-fiqa-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-fiqa-flat-wp.md index 95f78592fc..6bd4b4c278 100644 --- a/docs/regressions/regressions-beir-v1.0.0-fiqa-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-fiqa-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-fiqa-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-fiqa-flat.md b/docs/regressions/regressions-beir-v1.0.0-fiqa-flat.md index da05949d0d..8326ea356a 100644 --- a/docs/regressions/regressions-beir-v1.0.0-fiqa-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-fiqa-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-fiqa-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-fiqa-multifield.md b/docs/regressions/regressions-beir-v1.0.0-fiqa-multifield.md index 0dddb0a453..2c37269d4d 100644 --- a/docs/regressions/regressions-beir-v1.0.0-fiqa-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-fiqa-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-fiqa-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-fiqa-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-fiqa-splade-pp-ed.md index 816d26dc2d..9a09531087 100644 --- a/docs/regressions/regressions-beir-v1.0.0-fiqa-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-fiqa-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-fiqa-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-fiqa-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): FiQA-2018 | 0.6314 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): FiQA-2018 | 0.8392 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-fiqa-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-fiqa-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-fiqa-unicoil-noexp.md index 61d6da5f73..a8d58b5942 100644 --- a/docs/regressions/regressions-beir-v1.0.0-fiqa-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-fiqa-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-fiqa-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat-wp.md index a4d63139ab..7119df6fc2 100644 --- a/docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-hotpotqa-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat.md b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat.md index 134d5d8921..69c17458b6 100644 --- a/docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-hotpotqa-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-hotpotqa-multifield.md b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-multifield.md index 814440c441..4d43571a13 100644 --- a/docs/regressions/regressions-beir-v1.0.0-hotpotqa-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-hotpotqa-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-hotpotqa-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-splade-pp-ed.md index 3dc9999350..7bdc2828d7 100644 --- a/docs/regressions/regressions-beir-v1.0.0-hotpotqa-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-hotpotqa-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-hotpotqa-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): HotpotQA | 0.8177 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): HotpotQA | 0.8952 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-hotpotqa-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-unicoil-noexp.md index aca3f7fcb7..f840075a68 100644 --- a/docs/regressions/regressions-beir-v1.0.0-hotpotqa-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-hotpotqa-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-hotpotqa-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat-wp.md index 2bf634592a..5fd256a795 100644 --- a/docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-nfcorpus-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat.md b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat.md index 6494fb39a6..7592c1cc74 100644 --- a/docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-nfcorpus-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-nfcorpus-multifield.md b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-multifield.md index 0ef5494b80..73f9e9af84 100644 --- a/docs/regressions/regressions-beir-v1.0.0-nfcorpus-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-nfcorpus-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-nfcorpus-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-splade-pp-ed.md index 20df849190..4c85acec7b 100644 --- a/docs/regressions/regressions-beir-v1.0.0-nfcorpus-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-nfcorpus-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-nfcorpus-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): NFCorpus | 0.2844 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): NFCorpus | 0.5925 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-nfcorpus-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-unicoil-noexp.md index fdae8fd957..50cebd6845 100644 --- a/docs/regressions/regressions-beir-v1.0.0-nfcorpus-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-nfcorpus-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-nfcorpus-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-nq-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-nq-flat-wp.md index 8a03583464..b8333ee9d7 100644 --- a/docs/regressions/regressions-beir-v1.0.0-nq-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-nq-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-nq-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-nq-flat.md b/docs/regressions/regressions-beir-v1.0.0-nq-flat.md index 7a634feb25..5fe9011628 100644 --- a/docs/regressions/regressions-beir-v1.0.0-nq-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-nq-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-nq-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-nq-multifield.md b/docs/regressions/regressions-beir-v1.0.0-nq-multifield.md index bebb6feda9..038c4525ef 100644 --- a/docs/regressions/regressions-beir-v1.0.0-nq-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-nq-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-nq-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-nq-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-nq-splade-pp-ed.md index 2818302d27..3223877cdd 100644 --- a/docs/regressions/regressions-beir-v1.0.0-nq-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-nq-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-nq-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-nq-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): NQ | 0.9296 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): NQ | 0.9839 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-nq-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-nq-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-nq-unicoil-noexp.md index 4251e9cbc3..00291806fa 100644 --- a/docs/regressions/regressions-beir-v1.0.0-nq-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-nq-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-nq-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-quora-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-quora-flat-wp.md index 47985aeaa5..e7e0948f81 100644 --- a/docs/regressions/regressions-beir-v1.0.0-quora-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-quora-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-quora-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-quora-flat.md b/docs/regressions/regressions-beir-v1.0.0-quora-flat.md index 47ee59a2f6..5396106854 100644 --- a/docs/regressions/regressions-beir-v1.0.0-quora-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-quora-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-quora-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-quora-multifield.md b/docs/regressions/regressions-beir-v1.0.0-quora-multifield.md index 250ea59d9e..ffd55b767f 100644 --- a/docs/regressions/regressions-beir-v1.0.0-quora-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-quora-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-quora-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-quora-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-quora-splade-pp-ed.md index cef0cb2665..e1a3ba59fb 100644 --- a/docs/regressions/regressions-beir-v1.0.0-quora-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-quora-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-quora-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-quora-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): Quora | 0.9863 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): Quora | 0.9989 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-quora-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-quora-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-quora-unicoil-noexp.md index a6f5e15cc0..4d0ed2772d 100644 --- a/docs/regressions/regressions-beir-v1.0.0-quora-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-quora-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-quora-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-robust04-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-robust04-flat-wp.md index 530dc749b4..63b73dd4a8 100644 --- a/docs/regressions/regressions-beir-v1.0.0-robust04-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-robust04-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-robust04-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-robust04-flat.md b/docs/regressions/regressions-beir-v1.0.0-robust04-flat.md index 0373a9ab97..90623d1396 100644 --- a/docs/regressions/regressions-beir-v1.0.0-robust04-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-robust04-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-robust04-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-robust04-multifield.md b/docs/regressions/regressions-beir-v1.0.0-robust04-multifield.md index 16f664515b..9df507eb10 100644 --- a/docs/regressions/regressions-beir-v1.0.0-robust04-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-robust04-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-robust04-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-robust04-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-robust04-splade-pp-ed.md index 5b2959de82..df5db26520 100644 --- a/docs/regressions/regressions-beir-v1.0.0-robust04-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-robust04-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-robust04-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-robust04-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): Robust04 | 0.3850 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): Robust04 | 0.6228 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-robust04-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-robust04-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-robust04-unicoil-noexp.md index 42992a72fc..ce49117acd 100644 --- a/docs/regressions/regressions-beir-v1.0.0-robust04-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-robust04-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-robust04-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-scidocs-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-scidocs-flat-wp.md index 0859833008..0d5d4e5590 100644 --- a/docs/regressions/regressions-beir-v1.0.0-scidocs-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-scidocs-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-scidocs-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-scidocs-flat.md b/docs/regressions/regressions-beir-v1.0.0-scidocs-flat.md index 20bc2554b1..99aa6c412a 100644 --- a/docs/regressions/regressions-beir-v1.0.0-scidocs-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-scidocs-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-scidocs-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-scidocs-multifield.md b/docs/regressions/regressions-beir-v1.0.0-scidocs-multifield.md index 88cd082b84..f018423b40 100644 --- a/docs/regressions/regressions-beir-v1.0.0-scidocs-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-scidocs-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-scidocs-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-scidocs-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-scidocs-splade-pp-ed.md index cbd79412b4..990fc8ae03 100644 --- a/docs/regressions/regressions-beir-v1.0.0-scidocs-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-scidocs-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-scidocs-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-scidocs-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): SCIDOCS | 0.3730 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): SCIDOCS | 0.6016 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-scidocs-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-scidocs-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-scidocs-unicoil-noexp.md index a3f97ae7bb..daaa7c3dbc 100644 --- a/docs/regressions/regressions-beir-v1.0.0-scidocs-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-scidocs-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-scidocs-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-scifact-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-scifact-flat-wp.md index 6030501af8..90279d5234 100644 --- a/docs/regressions/regressions-beir-v1.0.0-scifact-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-scifact-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-scifact-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-scifact-flat.md b/docs/regressions/regressions-beir-v1.0.0-scifact-flat.md index 2ce7259fbf..2c04ee5592 100644 --- a/docs/regressions/regressions-beir-v1.0.0-scifact-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-scifact-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-scifact-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-scifact-multifield.md b/docs/regressions/regressions-beir-v1.0.0-scifact-multifield.md index 413d25a222..27127bb8e8 100644 --- a/docs/regressions/regressions-beir-v1.0.0-scifact-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-scifact-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-scifact-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-scifact-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-scifact-splade-pp-ed.md index 165aae2674..a6b371a913 100644 --- a/docs/regressions/regressions-beir-v1.0.0-scifact-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-scifact-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-scifact-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-scifact-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): SciFact | 0.9353 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): SciFact | 0.9867 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-scifact-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-scifact-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-scifact-unicoil-noexp.md index ecc6a5f6f8..5538d4cc40 100644 --- a/docs/regressions/regressions-beir-v1.0.0-scifact-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-scifact-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-scifact-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-signal1m-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-signal1m-flat-wp.md index 5dbc780a08..466438dfca 100644 --- a/docs/regressions/regressions-beir-v1.0.0-signal1m-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-signal1m-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-signal1m-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-signal1m-flat.md b/docs/regressions/regressions-beir-v1.0.0-signal1m-flat.md index e424cb0ed2..498fecaccc 100644 --- a/docs/regressions/regressions-beir-v1.0.0-signal1m-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-signal1m-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-signal1m-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-signal1m-multifield.md b/docs/regressions/regressions-beir-v1.0.0-signal1m-multifield.md index 26d44ac19d..4d333f5826 100644 --- a/docs/regressions/regressions-beir-v1.0.0-signal1m-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-signal1m-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-signal1m-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-signal1m-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-signal1m-splade-pp-ed.md index 9e438040f4..f5b479ee0b 100644 --- a/docs/regressions/regressions-beir-v1.0.0-signal1m-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-signal1m-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-signal1m-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-signal1m-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): Signal-1M | 0.3398 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): Signal-1M | 0.5492 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-signal1m-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-signal1m-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-signal1m-unicoil-noexp.md index 57437d3153..f44c07dad2 100644 --- a/docs/regressions/regressions-beir-v1.0.0-signal1m-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-signal1m-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-signal1m-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-trec-covid-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-trec-covid-flat-wp.md index 0db8e724b6..18f7ae31fc 100644 --- a/docs/regressions/regressions-beir-v1.0.0-trec-covid-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-covid-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-trec-covid-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-trec-covid-flat.md b/docs/regressions/regressions-beir-v1.0.0-trec-covid-flat.md index 1c33d5fe9c..5c149268d2 100644 --- a/docs/regressions/regressions-beir-v1.0.0-trec-covid-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-covid-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-trec-covid-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-trec-covid-multifield.md b/docs/regressions/regressions-beir-v1.0.0-trec-covid-multifield.md index 76576cdbf8..777e18e5f5 100644 --- a/docs/regressions/regressions-beir-v1.0.0-trec-covid-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-covid-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-trec-covid-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-trec-covid-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-trec-covid-splade-pp-ed.md index df59c1a48e..2193963b39 100644 --- a/docs/regressions/regressions-beir-v1.0.0-trec-covid-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-covid-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-trec-covid-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-trec-covid-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): TREC-COVID | 0.1282 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): TREC-COVID | 0.4441 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-trec-covid-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-trec-covid-unicoil-noexp.md index ce3e30c228..f8b0297b8e 100644 --- a/docs/regressions/regressions-beir-v1.0.0-trec-covid-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-covid-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-trec-covid-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-trec-news-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-trec-news-flat-wp.md index 39eee061d1..26f210b4e8 100644 --- a/docs/regressions/regressions-beir-v1.0.0-trec-news-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-news-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-trec-news-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-trec-news-flat.md b/docs/regressions/regressions-beir-v1.0.0-trec-news-flat.md index 437caa2027..15633cebce 100644 --- a/docs/regressions/regressions-beir-v1.0.0-trec-news-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-news-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-trec-news-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-trec-news-multifield.md b/docs/regressions/regressions-beir-v1.0.0-trec-news-multifield.md index 0c7448acb6..8a32534966 100644 --- a/docs/regressions/regressions-beir-v1.0.0-trec-news-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-news-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-trec-news-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-trec-news-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-trec-news-splade-pp-ed.md index 57667eea01..49602ac0f6 100644 --- a/docs/regressions/regressions-beir-v1.0.0-trec-news-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-news-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-trec-news-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-trec-news-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): TREC-NEWS | 0.4414 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): TREC-NEWS | 0.7060 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-trec-news-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-trec-news-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-trec-news-unicoil-noexp.md index b76c0c8258..ea77fe4fdd 100644 --- a/docs/regressions/regressions-beir-v1.0.0-trec-news-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-trec-news-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-trec-news-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat-wp.md b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat-wp.md index 007e3cdd6c..1827d09950 100644 --- a/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat-wp.md +++ b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat-wp.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-webis-touche2020-flat-wp ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat.md b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat.md index 10c3482eb7..ef35f5460e 100644 --- a/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat.md +++ b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-flat.md @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-webis-touche2020-flat ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-multifield.md b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-multifield.md index 664996b530..3b4592421e 100644 --- a/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-multifield.md +++ b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-multifield.md @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-webis-touche2020-multifield ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-splade-pp-ed.md b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-splade-pp-ed.md index 5d76e9c4ea..c639bbe9e4 100644 --- a/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-splade-pp-ed.md +++ b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-splade-pp-ed.md @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](../../src/ma From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression beir-v1.0.0-webis-touche2020-splade-pp-ed +python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-webis-touche2020-splade-pp-ed ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -66,8 +75,3 @@ With the above commands, you should be able to reproduce the following results: | BEIR (v1.0.0): Webis-Touche2020 | 0.4715 | | **R@1000** | **SPLADE++ (CoCondenser-EnsembleDistil)**| | BEIR (v1.0.0): Webis-Touche2020 | 0.8191 | - - -## Reproduction Log[*](../../docs/reproducibility.md) - -To add to this reproduction log, modify [this template](../../src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-splade-pp-ed.template) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-unicoil-noexp.md b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-unicoil-noexp.md index 9885e34e61..398c6bc769 100644 --- a/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-unicoil-noexp.md +++ b/docs/regressions/regressions-beir-v1.0.0-webis-touche2020-unicoil-noexp.md @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression beir-v1.0.0-webis-touche2020-unicoil-noexp ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat-wp.template index 883051e316..f1dc159dad 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat.template index f3aaa385c1..82166680f9 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-multifield.template index 39960620bb..22a8ec7a28 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-splade-pp-ed.template index cbcdd5090c..78a2d43ac0 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-unicoil-noexp.template index 7037b1f942..9a49bf09b4 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-arguana-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-arguana-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat-wp.template index 39ddd86af9..2ecce71e0e 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat.template index b9f006423f..9f50d6615c 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-multifield.template index 5fef099fd1..c589d717ac 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-splade-pp-ed.template index 663c6a3280..8ec35bfd16 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-unicoil-noexp.template index 0c94dd1119..6af96dd5d6 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-bioasq-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat-wp.template index 20dd483e87..ab3df62d18 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat.template index be8a6b93a5..4a33ebf35b 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-multifield.template index f63334f946..9809054c15 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-splade-pp-ed.template index bc4e4d80ad..8f237c634e 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-unicoil-noexp.template index 63392a37c4..3b10d56eb8 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-climate-fever-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat-wp.template index 7afd0c0997..571f09e94d 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat.template index 1bcc0b4d5b..5bde8dae71 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-multifield.template index f7c9f91a46..45d721aaee 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-splade-pp-ed.template index e072c57f09..d0ef5507fd 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-unicoil-noexp.template index 06c911ee81..82aa18004e 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-android-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat-wp.template index 17ce96ceac..db0c692912 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat.template index b5d668bd79..6626091a26 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-multifield.template index 48a5cec922..b79a777d89 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-splade-pp-ed.template index 3e31476324..8386ba5113 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-unicoil-noexp.template index 9e7e8a10d3..465e45cf78 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-english-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat-wp.template index 9da82eca4e..5c8cc2debe 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat.template index 62f3571740..e5fd9664ac 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-multifield.template index eadd63ed0a..7672ed22f9 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-splade-pp-ed.template index 966a20f575..585933100f 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.template index 7a2c93a5dc..752940fbae 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gaming-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat-wp.template index 74a064e7ef..95ef239613 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat.template index 4af81819a5..073dd903fb 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-multifield.template index 0da6842d13..29b477bba8 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-splade-pp-ed.template index b422320af5..28d54232b1 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-unicoil-noexp.template index 914a75d34e..14a9d91633 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-gis-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat-wp.template index 18bdc0ee66..e6c4c519bc 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat.template index c8a339245a..6095067feb 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-multifield.template index 877dedffa3..1dc2965b4f 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-splade-pp-ed.template index 8da983b875..924f21e675 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.template index d05e3564b2..cef0268ccb 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-mathematica-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat-wp.template index 3947ea722f..07279a6205 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat.template index f9748e622d..dbf1c8c1d1 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-multifield.template index 9666a36cd5..92f8cd67f2 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-splade-pp-ed.template index b0ecfd8f26..c6d1378193 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-unicoil-noexp.template index 26162ad1e8..0a1b94d143 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-physics-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat-wp.template index f62e6a081e..1dca0336b3 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat.template index 3009b3f7d1..c48c4f2abc 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-multifield.template index e32ac8549d..478cdd73f3 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-splade-pp-ed.template index 448c9bd438..9df8c74b42 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.template index ce1afbbe08..33c39a8c18 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-programmers-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat-wp.template index f0e174111c..0cef6d1b25 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat.template index 5c573679c8..1174e4ef72 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-multifield.template index 7dc9744562..22ecce5e42 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-splade-pp-ed.template index 6ef242875b..cda65ad3d1 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-unicoil-noexp.template index 778e00ca96..2349351ff6 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-stats-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat-wp.template index a68904b738..99534a50b8 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat.template index 4dcf8ba60b..3553efd632 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-multifield.template index 19ab0d4e30..462e49c1bf 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-splade-pp-ed.template index 13b8d74016..6d14e0d18e 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-unicoil-noexp.template index 9c5028d31b..39a392e655 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-tex-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat-wp.template index c66ef9dce0..ab464b5e58 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat.template index 1ce06d3a4a..94895833a6 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-multifield.template index 6e7cf0159f..1c78663c21 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-splade-pp-ed.template index 2722294720..9bad836259 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-unicoil-noexp.template index 634dcf61b1..40a8ce96ad 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-unix-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat-wp.template index d6c7758792..bb9a1ef459 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat.template index 2b7314d8ac..c7afe67484 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-multifield.template index 837c404983..d1fb431575 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-splade-pp-ed.template index a2403fbd35..7851a507e8 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.template index 71f2ae3f82..67da932348 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-webmasters-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat-wp.template index 994dbec854..a8399fcebe 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat.template index 958e6a7924..cc5ec4575e 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-multifield.template index b8441a9645..498c770bf8 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-splade-pp-ed.template index 21b613d30e..5b2f5e7de0 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.template index efab4a17b6..8e6264e4c1 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-cqadupstack-wordpress-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat-wp.template index aad2b01b0b..99be20d299 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat.template index 6d9205e9bb..c639629e6c 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-multifield.template index 31d34ae37e..00ad3c9452 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-splade-pp-ed.template index 1fe68b179e..31707ee422 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-unicoil-noexp.template index dfc1d3a4b6..289c0c5900 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-dbpedia-entity-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat-wp.template index d100b296d8..7801ea36c3 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat.template index acf1e67ba9..8fcbd886bf 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fever-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fever-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-fever-multifield.template index a992c4c57b..66e3e04058 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fever-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fever-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fever-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-fever-splade-pp-ed.template index e8849567d8..48eee02dfc 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fever-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fever-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fever-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-fever-unicoil-noexp.template index 66c4f4aa1d..62fab6fec7 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fever-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fever-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat-wp.template index 107c7e3098..ec07ab75f6 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat.template index feabcf7b4b..6055c3e5f0 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-multifield.template index 4de7b44928..d9999a56cf 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-splade-pp-ed.template index f3b584ac7d..46bd64363b 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-unicoil-noexp.template index b229c52abe..9f29b35ca5 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-fiqa-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat-wp.template index e0212b8744..e3f5044af9 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat.template index e472e9505d..366af01a6a 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-multifield.template index 26dbb9f239..71433ad3f2 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-splade-pp-ed.template index bf19978b1f..cae099240a 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-unicoil-noexp.template index be6255d1c8..9bac77a7ac 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-hotpotqa-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat-wp.template index 11c28bef39..e9f2d29d24 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat.template index 66691933d8..9ba717ef35 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-multifield.template index 4be8ecc62f..ef15434ca8 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-splade-pp-ed.template index e59bf5e569..3afb342def 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-unicoil-noexp.template index 20df61e2dc..02ac462121 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nfcorpus-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat-wp.template index dc717315b1..c8b484e3a4 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat.template index 04931a531c..c6545996ea 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nq-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nq-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-nq-multifield.template index c4f216ccab..3f651e39de 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nq-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nq-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nq-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-nq-splade-pp-ed.template index 587ff797ec..4eef68e481 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nq-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nq-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-nq-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-nq-unicoil-noexp.template index 380c2ef906..1e069ab9b2 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-nq-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-nq-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat-wp.template index 9d6596be37..b9dec4a41f 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat.template index 63c47a6c4e..a35cbbf57b 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-quora-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-quora-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-quora-multifield.template index 2e15e24b53..e0faad90c7 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-quora-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-quora-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-quora-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-quora-splade-pp-ed.template index 4a180cda01..97f7c6cba8 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-quora-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-quora-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-quora-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-quora-unicoil-noexp.template index 8265204a85..8fd04ae393 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-quora-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-quora-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat-wp.template index abb4a63490..54b83d1e45 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat.template index cfbc174bf8..254148e792 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-multifield.template index a88056a5f8..4e2e19943d 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-splade-pp-ed.template index 0337c1c47c..1cfddd717f 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-unicoil-noexp.template index b2dfd4d70d..39daec88b4 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-robust04-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-robust04-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat-wp.template index 95654f7668..f426ad331f 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat.template index 8ed5127328..729804709d 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-multifield.template index 49b9c74755..ac6f0ffbb0 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-splade-pp-ed.template index b4026c2751..7899d8a1cf 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-unicoil-noexp.template index b9088751bb..2d2a4502a3 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scidocs-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat-wp.template index 081e5612b5..9319f59673 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat.template index 6e305bb225..3612e1eb47 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-multifield.template index 4091cb6f6b..a231f064f7 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-splade-pp-ed.template index 7ab7187b54..74ace11ce4 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-unicoil-noexp.template index c969cc5f61..b5c1ba0cc0 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-scifact-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-scifact-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat-wp.template index f50084cd57..e006c8f655 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat.template index 9fbc26ae08..b33e1da162 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-flat.template @@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-multifield.template index 6e573b0182..4d86532ca0 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-multifield.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-multifield.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus.tar -C collections/ +``` + +The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. 
+ ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-splade-pp-ed.template index 656b536fbd..ee70e55aec 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-splade-pp-ed.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-splade-pp-ed.template @@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template}) From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end: ``` -python src/main/python/run_regression.py --index --verify --search \ - --regression ${test_name} +python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/ +tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/ +``` + +The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Sample indexing command: @@ -47,8 +56,3 @@ ${eval_cmds} With the above commands, you should be able to reproduce the following results: ${effectiveness} - - -## Reproduction Log[*](${root_path}/docs/reproducibility.md) - -To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation. 
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-unicoil-noexp.template index 3608965d09..1069b452c8 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-unicoil-noexp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-signal1m-unicoil-noexp.template @@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/ +tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/ +``` + +The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`. +After download and unpacking the corpora, the `run_regression.py` command above should work without any issue. + ## Indexing Typical indexing command: diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat-wp.template index c185080d4e..b2b4893608 100644 --- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat-wp.template +++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat-wp.template @@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf python src/main/python/run_regression.py --index --verify --search --regression ${test_name} ``` +All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download: + +```bash +wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/ +tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/ +``` + +The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`. 
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue.
+
 ## Indexing
 
 Typical indexing command:
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat.template
index 468bf8fcbb..9b206f7775 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-flat.template
@@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
 python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
 ```
 
+All the BEIR corpora are available for download:
+
+```bash
+wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/
+tar xvf collections/beir-v1.0.0-corpus.tar -C collections/
+```
+
+The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`.
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue.
+
 ## Indexing
 
 Typical indexing command:
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-multifield.template
index 1ccb937725..5044c46cb8 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-multifield.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-multifield.template
@@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
 python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
 ```
 
+All the BEIR corpora are available for download:
+
+```bash
+wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/
+tar xvf collections/beir-v1.0.0-corpus.tar -C collections/
+```
+
+The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`.
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue.
+
 ## Indexing
 
 Typical indexing command:
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-splade-pp-ed.template
index 0bc71d0338..d998ac7417 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-splade-pp-ed.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-splade-pp-ed.template
@@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template})
 From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
 
 ```
-python src/main/python/run_regression.py --index --verify --search \
-  --regression ${test_name}
+python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
 ```
 
+All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download:
+
+```bash
+wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/
+tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/
+```
+
+The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`.
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue.
+
 ## Indexing
 
 Sample indexing command:
@@ -47,8 +56,3 @@ ${eval_cmds}
 With the above commands, you should be able to reproduce the following results:
 
 ${effectiveness}
-
-
-## Reproduction Log[*](${root_path}/docs/reproducibility.md)
-
-To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-unicoil-noexp.template
index 3e98f0f3cf..01e8f042c7 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-unicoil-noexp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-covid-unicoil-noexp.template
@@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
 python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
 ```
 
+All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download:
+
+```bash
+wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/
+tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/
+```
+
+The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`.
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue.
+
 ## Indexing
 
 Typical indexing command:
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat-wp.template
index 89f3318e28..ca0abc3025 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat-wp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat-wp.template
@@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
 python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
 ```
 
+All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download:
+
+```bash
+wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/
+tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/
+```
+
+The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`.
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue.
+
 ## Indexing
 
 Typical indexing command:
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat.template
index d57cd78a5b..534c55d0cd 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-flat.template
@@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
 python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
 ```
 
+All the BEIR corpora are available for download:
+
+```bash
+wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/
+tar xvf collections/beir-v1.0.0-corpus.tar -C collections/
+```
+
+The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`.
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue.
+
 ## Indexing
 
 Typical indexing command:
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-multifield.template
index 9493d27c20..a43cb93513 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-multifield.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-multifield.template
@@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
 python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
 ```
 
+All the BEIR corpora are available for download:
+
+```bash
+wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/
+tar xvf collections/beir-v1.0.0-corpus.tar -C collections/
+```
+
+The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`.
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue.
+
 ## Indexing
 
 Typical indexing command:
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-splade-pp-ed.template
index 56c908e657..88e7cf94c8 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-splade-pp-ed.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-splade-pp-ed.template
@@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template})
 From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
 
 ```
-python src/main/python/run_regression.py --index --verify --search \
-  --regression ${test_name}
+python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
 ```
 
+All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download:
+
+```bash
+wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/
+tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/
+```
+
+The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`.
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue.
+
 ## Indexing
 
 Sample indexing command:
@@ -47,8 +56,3 @@ ${eval_cmds}
 With the above commands, you should be able to reproduce the following results:
 
 ${effectiveness}
-
-
-## Reproduction Log[*](${root_path}/docs/reproducibility.md)
-
-To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-unicoil-noexp.template
index 20fd128a0e..7a3f983d2b 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-unicoil-noexp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-trec-news-unicoil-noexp.template
@@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
 python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
 ```
 
+All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download:
+
+```bash
+wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/
+tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/
+```
+
+The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`.
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue.
+
 ## Indexing
 
 Typical indexing command:
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat-wp.template b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat-wp.template
index 83f73aa5f0..7b03c1a309 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat-wp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat-wp.template
@@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
 python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
 ```
 
+All the BEIR corpora, pre-tokenized with the `bert-base-uncased` tokenizer, are available for download:
+
+```bash
+wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus-wp.tar -P collections/
+tar xvf collections/beir-v1.0.0-corpus-wp.tar -C collections/
+```
+
+The tarball is 13 GB and has MD5 checksum `3cf8f3dcdcadd49362965dd4466e6ff2`.
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue.
+
 ## Indexing
 
 Typical indexing command:
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat.template b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat.template
index 0113063702..cac0cb7539 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-flat.template
@@ -12,6 +12,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
 python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
 ```
 
+All the BEIR corpora are available for download:
+
+```bash
+wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/
+tar xvf collections/beir-v1.0.0-corpus.tar -C collections/
+```
+
+The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`.
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue.
+
 ## Indexing
 
 Typical indexing command:
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-multifield.template b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-multifield.template
index a45f59cf63..ae75999bb2 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-multifield.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-multifield.template
@@ -13,6 +13,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
 python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
 ```
 
+All the BEIR corpora are available for download:
+
+```bash
+wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-corpus.tar -P collections/
+tar xvf collections/beir-v1.0.0-corpus.tar -C collections/
+```
+
+The tarball is 14 GB and has MD5 checksum `faefd5281b662c72ce03d22021e4ff6b`.
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue.
+
 ## Indexing
 
 Typical indexing command:
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-splade-pp-ed.template b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-splade-pp-ed.template
index 477fc85a7e..7743f6b880 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-splade-pp-ed.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-splade-pp-ed.template
@@ -11,10 +11,19 @@ Note that this page is automatically generated from [this template](${template})
 From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:
 
 ```
-python src/main/python/run_regression.py --index --verify --search \
-  --regression ${test_name}
+python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
 ```
 
+All the BEIR corpora, encoded by the SPLADE++ CoCondenser-EnsembleDistil model, are available for download:
+
+```bash
+wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-splade-pp-ed.tar -P collections/
+tar xvf collections/beir-v1.0.0-splade-pp-ed.tar -C collections/
+```
+
+The tarball is 42 GB and has MD5 checksum `9c7de5b444a788c9e74c340bf833173b`.
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue.
+
 ## Indexing
 
 Sample indexing command:
@@ -47,8 +56,3 @@ ${eval_cmds}
 With the above commands, you should be able to reproduce the following results:
 
 ${effectiveness}
-
-
-## Reproduction Log[*](${root_path}/docs/reproducibility.md)
-
-To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.
diff --git a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-unicoil-noexp.template b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-unicoil-noexp.template
index b7e549b52a..26ed3bfd6a 100644
--- a/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-unicoil-noexp.template
+++ b/src/main/resources/docgen/templates/beir-v1.0.0-webis-touche2020-unicoil-noexp.template
@@ -16,6 +16,16 @@ From one of our Waterloo servers (e.g., `orca`), the following command will perf
 python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
 ```
 
+All the BEIR corpora, encoded by the uniCOIL-noexp model, are available for download:
+
+```bash
+wget https://rgw.cs.uwaterloo.ca/pyserini/data/beir-v1.0.0-unicoil-noexp.tar -P collections/
+tar xvf collections/beir-v1.0.0-unicoil-noexp.tar -C collections/
+```
+
+The tarball is 30 GB and has MD5 checksum `4fd04d2af816a6637fc12922cccc8a83`.
+After download and unpacking the corpora, the `run_regression.py` command above should work without any issue.
+
 ## Indexing
 
 Typical indexing command: