Commit: document pt.Evaluate #494
cmacdonald committed Dec 4, 2024
1 parent 3168936 commit 7dcf4a4
Showing 2 changed files with 6 additions and 4 deletions.
7 changes: 4 additions & 3 deletions docs/experiments.rst
@@ -11,13 +11,15 @@ which includes implementations of many standard metrics. By default, to calculat
the `pytrec_eval <https://github.com/cvangysel/pytrec_eval>`_ library, which itself is a Python wrapper around
the widely-used `trec_eval evaluation tool <https://github.com/usnistgov/trec_eval>`_.

-The main way to achieve this is using `pt.Experiment()`.
+The main way to achieve this is using ``pt.Experiment()``. If you have an existing results dataframe, you can use
+``pt.Evaluate()``.
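For orientation (not part of the diff), a minimal sketch of the two entry points; ``bm25``, ``topics`` and ``qrels`` are placeholders for whatever transformer and test collection you already have:

.. code-block:: python

    import pyterrier as pt

    # end-to-end: retrieve with each pipeline, then evaluate against the qrels
    pt.Experiment([bm25], topics, qrels, eval_metrics=["map", "ndcg"])

    # if a results dataframe already exists (columns qid, docno, score),
    # compute the measures for it directly
    res = bm25.transform(topics)
    print(pt.Evaluate(res, qrels, metrics=["map", "ndcg"]))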

API
========

.. autofunction:: pyterrier.Experiment()

+.. autofunction:: pyterrier.Evaluate()

Examples
========
@@ -252,7 +254,7 @@ Often used measures, including the name that must be used, are as follows (a usage sketch appears after the example output below):
- Interpolated recall precision curves (`iprec_at_recall`). This is a family of measures, so requesting `iprec_at_recall` will output measurements for `iprec_at_recall_0.00`, `iprec_at_recall_0.10`, etc.
- Precision at rank cutoff (e.g. `P_5`).
- Recall (`recall`) will generate recall at different cutoffs, such as `recall_5`, etc.
-- Mean response time (`mrt`) will report the average number of milliseconds to conduct a query (this is calculated by `pt.Experiment()` directly, not pytrec_eval).
+- Mean response time (`mrt`) will report the average number of milliseconds to conduct a query (this is calculated by ``pt.Experiment()`` directly, not pytrec_eval).
- trec_eval measure *families* such as `official`, `set` and `all_trec` will be expanded. These result in many measures being returned. For instance, asking for `official` results in the following (very wide) output reporting the usual default metrics of trec_eval:

.. include:: ./_includes/experiment-official.rst
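The string names above are simply passed through ``eval_metrics``; a hedged sketch, again assuming a ``bm25`` pipeline plus ``topics`` and ``qrels`` dataframes:

.. code-block:: python

    pt.Experiment(
        [bm25],
        topics,
        qrels,
        eval_metrics=["map", "recip_rank", "P_5", "recall_5", "iprec_at_recall", "mrt"],
    )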
@@ -290,7 +292,6 @@ More specifically, let's consider the TREC Deep Learning track passage ranking task

The available evaluation measure objects are listed below.
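For example, these objects can be parameterised with rank cutoffs and relevance thresholds. A hedged sketch for that task; the dataset identifier and variant names here are my assumptions, not taken from this commit:

.. code-block:: python

    from pyterrier.measures import AP, RR, nDCG

    dataset = pt.get_dataset("trec-deep-learning-passages")
    pt.Experiment(
        [bm25],
        dataset.get_topics("test-2019"),
        dataset.get_qrels("test-2019"),
        eval_metrics=[RR(rel=2), nDCG @ 10, AP(rel=2)],
    )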


.. autofunction:: pyterrier.measures.P

.. autofunction:: pyterrier.measures.R
3 changes: 2 additions & 1 deletion pyterrier/pipelines.py
@@ -610,7 +610,8 @@ def _restore_state(param_state):

def Evaluate(res : pd.DataFrame, qrels : pd.DataFrame, metrics=['map', 'ndcg'], perquery=False) -> Dict:
"""
-Evaluate the result dataframe with the given qrels
+Evaluate a single result dataframe with the given qrels. This method may be used as an alternative to
+``pt.Experiment()`` for getting only the evaluation measurements given a single set of existing results.
Args:
res: Either a dataframe with columns=['qid', 'docno', 'score'] or a dict {qid:{docno:score,},}
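As a usage note (not part of the commit), a minimal sketch of calling the function on a toy results dataframe; the qid/docno/label layout for the qrels is assumed here:

.. code-block:: python

    import pandas as pd
    import pyterrier as pt

    res = pd.DataFrame(
        [["q1", "d1", 10.0], ["q1", "d2", 9.5]],
        columns=["qid", "docno", "score"])
    qrels = pd.DataFrame(
        [["q1", "d1", 1], ["q1", "d2", 0]],
        columns=["qid", "docno", "label"])

    # averaged measurements as a dict, e.g. {"map": ..., "ndcg": ...}
    print(pt.Evaluate(res, qrels, metrics=["map", "ndcg"]))

    # perquery=True keys the measurements by query id instead
    print(pt.Evaluate(res, qrels, metrics=["map"], perquery=True))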
