# TCT-ColBERTv2 update on MS MARCO V2 - trained models (castorini#737)
The model is described in the following paper:

> Sheng-Chieh Lin, Jheng-Hong Yang, and Jimmy Lin. [In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval.](https://aclanthology.org/2021.repl4nlp-1.17/) _Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)_, pages 163-173, August 2021.
At present, all indexes are referenced as absolute paths on our Waterloo machine `orca`, so these results are not broadly reproducible.
We are working on ways to distribute the indexes.

For the TREC 2021 Deep Learning Track, we tried two different approaches:

1. We applied our TCT-ColBERTv2 model trained on MS MARCO (V1) in a zero-shot manner.
2. We started with the above TCT-ColBERTv2 model and further fine-tuned it on the MS MARCO (V2) passage data.

In both cases, we applied inference over the MS MARCO V2 [passage corpus](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-v2.md#passage-collection) and [segmented document corpus](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-v2.md#document-collection-segmented) to obtain the dense vectors.
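
As a sketch of how such corpus-level inference can be performed, recent versions of Pyserini ship a `pyserini.encode` utility; the corpus path and output directory below are hypothetical, and this exact invocation is not necessarily how the indexes referenced in this guide were built:

```bash
# Encode a (hypothetical) corpus JSONL with the zero-shot TCT-ColBERTv2 encoder
# and write the dense vectors directly into a FAISS flat index.
$ python -m pyserini.encode \
    input   --corpus collections/msmarco_v2_passage_augmented.jsonl \
            --fields text \
    output  --embeddings indexes/faiss-flat.tct_colbert-v2-hnp.msmarco-passage-v2-augmented \
            --to-faiss \
    encoder --encoder castorini/tct_colbert-v2-hnp-msmarco \
            --fields text \
            --batch 32
```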

These are the indexes and the encoder for the zero-shot (V1) models:

```bash
export PASSAGE_INDEX0="/store/scratch/indexes/trec2021/faiss-flat.tct_colbert-v2-hnp.0shot.msmarco-passage-v2-augmented"
export DOC_INDEX0="/store/scratch/indexes/trec2021/faiss-flat.tct_colbert-v2-hnp.0shot.msmarco-doc-v2-segmented"
export ENCODER0="castorini/tct_colbert-v2-hnp-msmarco"
```
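
Before launching the full dev-set runs below, a quick single-query smoke test can confirm that the index and encoder load correctly (the query and file names here are made up for illustration):

```bash
# One query in the simple TSV topics format: <qid><TAB><query text>
$ printf '1\thow do bacteria become resistant to antibiotics\n' > runs/smoke-query.tsv

$ python -m pyserini.dsearch --topics runs/smoke-query.tsv \
    --index ${PASSAGE_INDEX0} \
    --encoder ${ENCODER0} \
    --output runs/run.smoke.trec \
    --output-format trec

# Inspect the top of the TREC-format run file.
$ head runs/run.smoke.trec
```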

These are the indexes and the encoder for the fine-tuned (V2) models:

```bash
export PASSAGE_INDEX1="/store/scratch/indexes/trec2021/faiss-flat.tct_colbert-v2-hnp.psg_v2_ft.msmarco-passage-v2-augmented"
export DOC_INDEX1="/store/scratch/indexes/trec2021/faiss-flat.tct_colbert-v2-hnp.psg_v2_ft.msmarco-doc-v2-segmented"
export ENCODER1="castorini/tct_colbert-v2-hnp-msmarco-r2"
```
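
Since these are absolute paths on `orca` (see the caveat at the top of this guide), it is worth confirming that they resolve before kicking off any runs:

```bash
# Each of these should list index files rather than erroring out.
$ ls ${PASSAGE_INDEX0} ${DOC_INDEX0} ${PASSAGE_INDEX1} ${DOC_INDEX1}
```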

## Passage V2 (Zero Shot)

Dense retrieval with TCT-ColBERTv2 model trained on MS MARCO (V1), with FAISS brute-force index (i.e., zero shot):

```bash
$ python -m pyserini.dsearch --topics collections/passv2_dev_queries.tsv \
--index ${PASSAGE_INDEX0} \
--encoder ${ENCODER0} \
--batch-size 144 \
--threads 36 \
--output runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.dev1.trec \
--output-format trec
```

To evaluate using `trec_eval`:

```bash
$ python -m pyserini.eval.trec_eval -c -M 100 -m map -m recip_rank collections/passv2_dev_qrels.tsv runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.dev1.trec

$ python -m pyserini.eval.trec_eval -c -m recall.100,1000 collections/passv2_dev_qrels.tsv runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.dev1.trec
```

Note that the official evaluation uses a cutoff of 100 hits per query, hence `-M 100` above.
However, we measure recall at both 100 and 1000 hits; the latter is a common setting.

Because there are duplicate passages in MS MARCO V2 collections, score differences might be observed due to tie-breaking effects.
For example, if we output in MS MARCO format `--output-format msmarco` and then convert to TREC format with `pyserini.eval.convert_msmarco_run_to_trec_run`, the scores will be different.
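
Concretely, the round trip looks something like this (the run-file names are illustrative):

```bash
# Retrieve again, but write the run in MS MARCO format.
$ python -m pyserini.dsearch --topics collections/passv2_dev_queries.tsv \
    --index ${PASSAGE_INDEX0} \
    --encoder ${ENCODER0} \
    --batch-size 144 \
    --threads 36 \
    --output runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.dev1.txt \
    --output-format msmarco

# Convert the MS MARCO-format run to TREC format; ties among duplicate passages
# may be broken differently than in the TREC-format run produced directly.
$ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
    --input runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.dev1.txt \
    --output runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.dev1.converted.trec
```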

## Passage V2 (Fine Tuned)

Dense retrieval with TCT-ColBERTv2 model fine-tuned on MS MARCO (V2) passage data, with FAISS brute-force index:

```bash
$ python -m pyserini.dsearch --topics collections/passv2_dev_queries.tsv \
--index ${PASSAGE_INDEX1} \
--encoder ${ENCODER1} \
--batch-size 144 \
--threads 36 \
--output runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.psg_v2_ft.dev1.trec \
--output-format trec
```

To evaluate using `trec_eval`:

```bash
$ python -m pyserini.eval.trec_eval -c -M 100 -m map -m recip_rank collections/passv2_dev_qrels.tsv runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.psg_v2_ft.dev1.trec
Results:
map all 0.1981
recip_rank all 0.2000

$ python -m pyserini.eval.trec_eval -c -m recall.100,1000 collections/passv2_dev_qrels.tsv runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.psg_v2_ft.dev1.trec
Results:
recall_100 all 0.6403
recall_1000 all 0.8452
```

## Document V2 (Zero Shot)

Dense retrieval with TCT-ColBERTv2 model trained on MS MARCO (V1), with FAISS brute-force index (i.e., zero shot):

```bash
$ python -m pyserini.dsearch --topics collections/docv2_dev_queries.tsv \
--index ${DOC_INDEX0} \
--encoder ${ENCODER0} \
--batch-size 144 \
--threads 36 \
--hits 10000 \
--max-passage-hits 1000 \
--max-passage \
--output runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.0shot.dev1.trec \
--output-format trec
```

Here, retrieval is over passage segments: we retrieve 10,000 segments per query, then `--max-passage` aggregates segments by document (keeping the maximum segment score per document) and `--max-passage-hits` caps the final ranking at 1000 documents.

To evaluate using `trec_eval`:

```bash
$ python -m pyserini.eval.trec_eval -c -M 100 -m map -m recip_rank collections/docv2_dev_qrels.tsv runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.0shot.dev1.trec

$ python -m pyserini.eval.trec_eval -c -m recall.100,1000 collections/docv2_dev_qrels.tsv runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.0shot.dev1.trec
```

Same comment about duplicate passages and score ties applies here as well.

## Document V2 (Fine Tuned)

Dense retrieval with TCT-ColBERTv2 model fine-tuned on MS MARCO (V2) passage data, with FAISS brute-force index:

```bash
$ python -m pyserini.dsearch --topics collections/docv2_dev_queries.tsv \
--index ${DOC_INDEX1} \
--encoder ${ENCODER1} \
--batch-size 144 \
--threads 36 \
--hits 10000 \
--max-passage-hits 1000 \
--max-passage \
--output runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.psg_v2_ft.dev1.trec \
--output-format trec
```

To evaluate using `trec_eval`:

```bash
$ python -m pyserini.eval.trec_eval -c -M 100 -m map -m recip_rank collections/docv2_dev_qrels.tsv runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.psg_v2_ft.dev1.trec
Results:
map all 0.2719
recip_rank all 0.2745

$ python -m pyserini.eval.trec_eval -c -m recall.100,1000 collections/docv2_dev_qrels.tsv runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.psg_v2_ft.dev1.trec
Results:
recall_100 all 0.7778
recall_1000 all 0.8974
```

## Reproduction Log[*](reproducibility.md)
