docs/experiments-ance.md (+5 -8)
@@ -4,11 +4,7 @@ This guide provides instructions to reproduce the following dense retrieval work
 
 > Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, Arnold Overwijk. [Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval](https://arxiv.org/pdf/2007.00808.pdf)
 
-Starting with v0.12.0, you can reproduce these results directly from the [Pyserini PyPI package](https://pypi.org/project/pyserini/).
-Since dense retrieval depends on neural networks, Pyserini requires a more complex set of dependencies to use this feature.
-See [package installation notes](../README.md#installation) for more details.
-
-Note that we have observed minor differences in scores between different computing environments (e.g., Linux vs. macOS).
+Note that we often observe minor differences in scores between different computing environments (e.g., Linux vs. macOS).
 However, the differences usually appear in the fifth digit after the decimal point, and do not appear to be a cause for concern from a reproducibility perspective.
 Thus, while the scoring script provides results to much higher precision, we have intentionally rounded to four digits after the decimal point.
 
@@ -168,7 +164,8 @@ Top100 accuracy: 0.8522
 ## Reproduction Log[*](reproducibility.md)
 
 + Results reproduced by [@lintool](https://github.com/lintool) on 2021-04-25 (commit [`854c19`](https://github.com/castorini/pyserini/commit/854c1930ba00819245c0a9fbcf2090ce14db4db0))
-+ Results reproduced by [@jingtaozhan](https://github.com/jingtaozhan) on 2021-05-15 (commit [`53d8d3c`](https://github.com/castorini/pyserini/commit/53d8d3cbb78c88a23ce132a42b0396caad7d2e0f))
++ Results reproduced by [@jingtaozhan](https://github.com/jingtaozhan) on 2021-05-15 (commit [`53d8d3`](https://github.com/castorini/pyserini/commit/53d8d3cbb78c88a23ce132a42b0396caad7d2e0f))
 + Results reproduced by [@jmmackenzie](https://github.com/jmmackenzie) on 2021-05-17 (PyPI [`0.12.0`](https://pypi.org/project/pyserini/0.12.0/))
-+ Results reproduced by [@yuki617](https://github.com/yuki617) on 2021-06-7 (commit [`c7b37d6`](https://github.com/castorini/pyserini/commit/c7b37d6073cda62685f64d6d0b99dc46f0718346))
-+ Results reproduced by [@ArthurChen189](https://github.com/ArthurChen189) on 2021-07-06 (commit [`c9f44b2`](https://github.com/castorini/pyserini/commit/c9f44b2a24103fff4887cade831f9b7c2472b190))
++ Results reproduced by [@yuki617](https://github.com/yuki617) on 2021-06-07 (commit [`c7b37d`](https://github.com/castorini/pyserini/commit/c7b37d6073cda62685f64d6d0b99dc46f0718346))
++ Results reproduced by [@ArthurChen189](https://github.com/ArthurChen189) on 2021-07-06 (commit [`c9f44b`](https://github.com/castorini/pyserini/commit/c9f44b2a24103fff4887cade831f9b7c2472b190))
++ Results reproduced by [@lintool](https://github.com/lintool) on 2022-12-23 (commit [`0c495c`](https://github.com/castorini/pyserini/commit/0c495cf2999dda980eb1f85efa30a4323cef5855))
docs/experiments-distilbert_kd.md (+2 -5)
@@ -4,11 +4,7 @@ This guide provides instructions to reproduce the DistilBERT KD dense retrieval
 
 > Sebastian Hofstätter, Sophia Althammer, Michael Schröder, Mete Sertkan, and Allan Hanbury. [Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation.](https://arxiv.org/abs/2010.02666) arXiv:2010.02666, October 2020.
 
-Starting with v0.12.0, you can reproduce these results directly from the [Pyserini PyPI package](https://pypi.org/project/pyserini/).
-Since dense retrieval depends on neural networks, Pyserini requires a more complex set of dependencies to use this feature.
-See [package installation notes](../README.md#installation) for more details.
-
-Note that we have observed minor differences in scores between different computing environments (e.g., Linux vs. macOS).
+Note that we often observe minor differences in scores between different computing environments (e.g., Linux vs. macOS).
 However, the differences usually appear in the fifth digit after the decimal point, and do not appear to be a cause for concern from a reproducibility perspective.
 Thus, while the scoring script provides results to much higher precision, we have intentionally rounded to four digits after the decimal point.
 
@@ -56,3 +52,4 @@ recall_1000 all 0.9553
 ## Reproduction Log[*](reproducibility.md)
 
 + Results reproduced by [@lintool](https://github.com/lintool) on 2021-04-26 (commit [`854c19`](https://github.com/castorini/pyserini/commit/854c1930ba00819245c0a9fbcf2090ce14db4db0))
++ Results reproduced by [@lintool](https://github.com/lintool) on 2022-12-23 (commit [`0c495c`](https://github.com/castorini/pyserini/commit/0c495cf2999dda980eb1f85efa30a4323cef5855))
docs/experiments-distilbert_tasb.md (+2 -4)
@@ -4,10 +4,7 @@ This guide provides instructions to reproduce the DistilBERT KD TASB dense retri
 
 > Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, Allan Hanbury. [Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling.](https://arxiv.org/abs/2104.06967) _SIGIR 2021_.
 
-Since dense retrieval depends on neural networks, Pyserini requires a more complex set of dependencies to use this feature.
-See [package installation notes](../README.md#installation) for more details.
-
-Note that we have observed minor differences in scores between different computing environments (e.g., Linux vs. macOS).
+Note that we often observe minor differences in scores between different computing environments (e.g., Linux vs. macOS).
 However, the differences usually appear in the fifth digit after the decimal point, and do not appear to be a cause for concern from a reproducibility perspective.
 Thus, while the scoring script provides results to much higher precision, we have intentionally rounded to four digits after the decimal point.
 
@@ -56,3 +53,4 @@ recall_1000 all 0.9771
 ## Reproduction Log[*](reproducibility.md)
 
 + Results reproduced by [@lintool](https://github.com/lintool) on 2021-05-28 (commit [`102ed2`](https://github.com/castorini/pyserini/commit/102ed2b2e8770978e4b3e09804913dcffb63c4a7))
++ Results reproduced by [@lintool](https://github.com/lintool) on 2022-12-23 (commit [`0c495c`](https://github.com/castorini/pyserini/commit/0c495cf2999dda980eb1f85efa30a4323cef5855))
docs/experiments-dkrr.md (+3 -2)
@@ -71,7 +71,7 @@ The expected results are as follows, shown in the "ours" column:
 | Top-500 | 90.37 || 92.24 |
 | Top-1000 | 91.30 || 93.43 |
 
-For reference, reported results from the paper (Table 7) are shown in the "orig" column.
+For reference, reported results from the paper (Table 8) are shown in the "orig" column.
 
 ## TriviaQA (TQA)
 
@@ -134,7 +134,7 @@ The expected results are as follows, shown in the "ours" column:
 | Top-500 | 89.77 || 89.87 |
 | Top-1000 | 90.35 || 90.63 |
 
-For reference, reported results from the paper (Table 7) are shown in the "orig" column.
+For reference, reported results from the paper (Table 8) are shown in the "orig" column.
 
 ## Hybrid sparse-dense retrieval with GAR-T5
 
@@ -143,3 +143,4 @@ Running hybrid sparse-dense retrieval with DKKR and [GAR-T5](https://github.com/
 ## Reproduction Log[*](reproducibility.md)
 
 + Results reproduced by [@lintool](https://github.com/lintool) on 2021-02-12 (commit [`52a1e7`](https://github.com/castorini/pyserini/commit/52a1e7f241b7b833a3ec1d739e629c08417a324c))
++ Results reproduced by [@lintool](https://github.com/lintool) on 2022-12-23 (commit [`90676b`](https://github.com/castorini/pyserini/commit/90676b351b47585084aa8136265d02a67ced3803))
docs/experiments-sbert.md (+3 -6)
@@ -2,11 +2,7 @@
 
 This guide provides instructions to reproduce the SBERT dense retrieval models for MS MARCO passage ranking (v3) described [here](https://github.com/UKPLab/sentence-transformers/blob/master/docs/pretrained-models/msmarco-v3.md).
 
-Starting with v0.12.0, you can reproduce these results directly from the [Pyserini PyPI package](https://pypi.org/project/pyserini/).
-Since dense retrieval depends on neural networks, Pyserini requires a more complex set of dependencies to use this feature.
-See [package installation notes](../README.md#installation) for more details.
-
-Note that we have observed minor differences in scores between different computing environments (e.g., Linux vs. macOS).
+Note that we often observe minor differences in scores between different computing environments (e.g., Linux vs. macOS).
 However, the differences usually appear in the fifth digit after the decimal point, and do not appear to be a cause for concern from a reproducibility perspective.
 Thus, while the scoring script provides results to much higher precision, we have intentionally rounded to four digits after the decimal point.
+ Results reproduced by [@lintool](https://github.com/lintool) on 2021-04-02 (commit [`8dcf99`](https://github.com/castorini/pyserini/commit/8dcf99982a7bfd447ce9182ff219a9dad2ddd1f2))
 + Results reproduced by [@lintool](https://github.com/lintool) on 2021-04-26 (commit [`854c19`](https://github.com/castorini/pyserini/commit/854c1930ba00819245c0a9fbcf2090ce14db4db0))
++ Results reproduced by [@lintool](https://github.com/lintool) on 2022-12-23 (commit [`0c495c`](https://github.com/castorini/pyserini/commit/0c495cf2999dda980eb1f85efa30a4323cef5855))