# TCT-ColBERTv2 update on MS MARCO V2 - trained models (castorini#737)
The model is described in the following paper:

> Sheng-Chieh Lin, Jheng-Hong Yang, and Jimmy Lin. [In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval.](https://aclanthology.org/2021.repl4nlp-1.17/) _Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)_, pages 163-173, August 2021.
At present, all indexes are referenced as absolute paths on our Waterloo machine `orca`, so these results are not broadly reproducible.
We are working on ways to distribute the indexes.

For the TREC 2021 Deep Learning Track, we tried two different approaches:

1. We applied our TCT-ColBERTv2 model trained on MS MARCO (V1) in a zero-shot manner.
2. We started with the above TCT-ColBERTv2 model and further fine-tuned it on the MS MARCO (V2) passage data.

In both cases, we applied inference over the MS MARCO V2 [passage corpus](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-v2.md#passage-collection) and [segmented document corpus](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-v2.md#document-collection-segmented) to obtain the dense vectors.
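
As a sketch of how such corpus-level inference can be performed, recent versions of Pyserini ship a `pyserini.encode` utility; the corpus path and output directory below are hypothetical, and this exact invocation is not necessarily how the indexes referenced in this guide were built:

```bash
# Encode a (hypothetical) corpus JSONL with the zero-shot TCT-ColBERTv2 encoder
# and write the dense vectors directly into a FAISS flat index.
$ python -m pyserini.encode \
    input   --corpus collections/msmarco_v2_passage_augmented.jsonl \
            --fields text \
    output  --embeddings indexes/faiss-flat.tct_colbert-v2-hnp.msmarco-passage-v2-augmented \
            --to-faiss \
    encoder --encoder castorini/tct_colbert-v2-hnp-msmarco \
            --fields text \
            --batch 32
```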

These are the indexes and the encoder for the zero-shot (V1) models:

```bash
export PASSAGE_INDEX0="/store/scratch/indexes/trec2021/faiss-flat.tct_colbert-v2-hnp.0shot.msmarco-passage-v2-augmented"
export DOC_INDEX0="/store/scratch/indexes/trec2021/faiss-flat.tct_colbert-v2-hnp.0shot.msmarco-doc-v2-segmented"
export ENCODER0="castorini/tct_colbert-v2-hnp-msmarco"
```
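
Before launching the full dev-set runs below, a quick single-query smoke test can confirm that the index and encoder load correctly (the query and file names here are made up for illustration):

```bash
# One query in the simple TSV topics format: <qid><TAB><query text>
$ printf '1\thow do bacteria become resistant to antibiotics\n' > runs/smoke-query.tsv

$ python -m pyserini.dsearch --topics runs/smoke-query.tsv \
    --index ${PASSAGE_INDEX0} \
    --encoder ${ENCODER0} \
    --output runs/run.smoke.trec \
    --output-format trec

# Inspect the top of the TREC-format run file.
$ head runs/run.smoke.trec
```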

These are the indexes and the encoder for the fine-tuned (V2) models:

```bash
export PASSAGE_INDEX1="/store/scratch/indexes/trec2021/faiss-flat.tct_colbert-v2-hnp.psg_v2_ft.msmarco-passage-v2-augmented"
export DOC_INDEX1="/store/scratch/indexes/trec2021/faiss-flat.tct_colbert-v2-hnp.psg_v2_ft.msmarco-doc-v2-segmented"
export ENCODER1="castorini/tct_colbert-v2-hnp-msmarco-r2"
```
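
Since these are absolute paths on `orca` (see the caveat at the top of this guide), it is worth confirming that they resolve before kicking off any runs:

```bash
# Each of these should list index files rather than erroring out.
$ ls ${PASSAGE_INDEX0} ${DOC_INDEX0} ${PASSAGE_INDEX1} ${DOC_INDEX1}
```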

## Passage V2 (Zero Shot)

Dense retrieval with TCT-ColBERTv2 model trained on MS MARCO (V1), with FAISS brute-force index (i.e., zero shot):

```bash
$ python -m pyserini.dsearch --topics collections/passv2_dev_queries.tsv \
--index ${PASSAGE_INDEX0} \
--encoder ${ENCODER0} \
--batch-size 144 \
--threads 36 \
--output runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.dev1.trec \
--output-format trec
```

To evaluate using `trec_eval`:

```bash
$ python -m pyserini.eval.trec_eval -c -M 100 -m map -m recip_rank collections/passv2_dev_qrels.tsv runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.dev1.trec

$ python -m pyserini.eval.trec_eval -c -m recall.100,1000 collections/passv2_dev_qrels.tsv runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.dev1.trec
```

Note that the official evaluation uses a cutoff of 100 hits per query, hence `-M 100` above.
However, we measure recall at both 100 and 1000 hits; the latter is a common setting.

Because there are duplicate passages in MS MARCO V2 collections, score differences might be observed due to tie-breaking effects.
For example, if we output in MS MARCO format `--output-format msmarco` and then convert to TREC format with `pyserini.eval.convert_msmarco_run_to_trec_run`, the scores will be different.
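
Concretely, the round trip looks something like this (the run-file names are illustrative):

```bash
# Retrieve again, but write the run in MS MARCO format.
$ python -m pyserini.dsearch --topics collections/passv2_dev_queries.tsv \
    --index ${PASSAGE_INDEX0} \
    --encoder ${ENCODER0} \
    --batch-size 144 \
    --threads 36 \
    --output runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.dev1.txt \
    --output-format msmarco

# Convert the MS MARCO-format run to TREC format; ties among duplicate passages
# may be broken differently than in the TREC-format run produced directly.
$ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
    --input runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.dev1.txt \
    --output runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.dev1.converted.trec
```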

## Passage V2 (Fine Tuned)

Dense retrieval with TCT-ColBERTv2 model fine-tuned on MS MARCO (V2) passage data, with FAISS brute-force index:

```bash
$ python -m pyserini.dsearch --topics collections/passv2_dev_queries.tsv \
--index ${PASSAGE_INDEX1} \
--encoder ${ENCODER1} \
--batch-size 144 \
--threads 36 \
--output runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.psg_v2_ft.dev1.trec \
--output-format trec
```

To evaluate using `trec_eval`:

```bash
$ python -m pyserini.eval.trec_eval -c -M 100 -m map -m recip_rank collections/passv2_dev_qrels.tsv runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.psg_v2_ft.dev1.trec
Results:
map all 0.1981
recip_rank all 0.2000

$ python -m pyserini.eval.trec_eval -c -m recall.100,1000 collections/passv2_dev_qrels.tsv runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.psg_v2_ft.dev1.trec
Results:
recall_100 all 0.6403
recall_1000 all 0.8452
```

## Document V2 (Zero Shot)

Dense retrieval with TCT-ColBERTv2 model trained on MS MARCO (V1), with FAISS brute-force index (i.e., zero shot):

```bash
$ python -m pyserini.dsearch --topics collections/docv2_dev_queries.tsv \
--index ${DOC_INDEX0} \
--encoder ${ENCODER0} \
--batch-size 144 \
--threads 36 \
--hits 10000 \
--max-passage-hits 1000 \
--max-passage \
--output runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.0shot.dev1.trec \
--output-format trec
```

Here, retrieval is over passage segments: we retrieve 10,000 segments per query, then `--max-passage` aggregates segments by document (keeping the maximum segment score per document) and `--max-passage-hits` caps the final ranking at 1000 documents.

To evaluate using `trec_eval`:

```bash
$ python -m pyserini.eval.trec_eval -c -M 100 -m map -m recip_rank collections/docv2_dev_qrels.tsv runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.0shot.dev1.trec

$ python -m pyserini.eval.trec_eval -c -m recall.100,1000 collections/docv2_dev_qrels.tsv runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.0shot.dev1.trec
```

Same comment about duplicate passages and score ties applies here as well.

## Document V2 (Fine Tuned)

Dense retrieval with TCT-ColBERTv2 model fine-tuned on MS MARCO (V2) passage data, with FAISS brute-force index:

```bash
$ python -m pyserini.dsearch --topics collections/docv2_dev_queries.tsv \
--index ${DOC_INDEX1} \
--encoder ${ENCODER1} \
--batch-size 144 \
--threads 36 \
--hits 10000 \
--max-passage-hits 1000 \
--max-passage \
--output runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.psg_v2_ft.dev1.trec \
--output-format trec
```

To evaluate using `trec_eval`:

```bash
$ python -m pyserini.eval.trec_eval -c -M 100 -m map -m recip_rank collections/docv2_dev_qrels.tsv runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.psg_v2_ft.dev1.trec
Results:
map all 0.2719
recip_rank all 0.2745

$ python -m pyserini.eval.trec_eval -c -m recall.100,1000 collections/docv2_dev_qrels.tsv runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.psg_v2_ft.dev1.trec
Results:
recall_100 all 0.7778
recall_1000 all 0.8974
```

## Reproduction Log[*](reproducibility.md)
