Commit: document pt.Evaluate #494
cmacdonald committed Dec 4, 2024
1 parent 3168936 commit 7dcf4a4
Showing 2 changed files with 6 additions and 4 deletions.
7 changes: 4 additions & 3 deletions docs/experiments.rst
@@ -11,13 +11,15 @@ which includes implementations of many standard metrics. By default, to calculat
the `pytrec_eval <https://github.com/cvangysel/pytrec_eval>`_ library, which itself is a Python wrapper around
the widely-used `trec_eval evaluation tool <https://github.com/usnistgov/trec_eval>`_.

-The main way to achieve this is using `pt.Experiment()`.
+The main way to achieve this is using ``pt.Experiment()``. If you have an existing results dataframe, you can use
+``pt.Evaluate()``.
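For orientation (not part of the diff), a minimal sketch of the two entry points; ``bm25``, ``topics`` and ``qrels`` are placeholders for whatever transformer and test collection you already have:

.. code-block:: python

    import pyterrier as pt

    # end-to-end: retrieve with each pipeline, then evaluate against the qrels
    pt.Experiment([bm25], topics, qrels, eval_metrics=["map", "ndcg"])

    # if a results dataframe already exists (columns qid, docno, score),
    # compute the measures for it directly
    res = bm25.transform(topics)
    print(pt.Evaluate(res, qrels, metrics=["map", "ndcg"]))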

API
========

.. autofunction:: pyterrier.Experiment()

+.. autofunction:: pyterrier.Evaluate()

Examples
========
@@ -252,7 +254,7 @@ Often used measures, including the name that must be used, are as follows (a usage sketch appears after the example output below):
- Interpolated recall precision curves (`iprec_at_recall`). This is a family of measures, so requesting `iprec_at_recall` will output measurements for `iprec_at_recall_0.00`, `iprec_at_recall_0.10`, etc.
- Precision at rank cutoff (e.g. `P_5`).
- Recall (`recall`) will generate recall at different cutoffs, such as `recall_5`, etc.
-- Mean response time (`mrt`) will report the average number of milliseconds to conduct a query (this is calculated by `pt.Experiment()` directly, not pytrec_eval).
+- Mean response time (`mrt`) will report the average number of milliseconds to conduct a query (this is calculated by ``pt.Experiment()`` directly, not pytrec_eval).
- trec_eval measure *families* such as `official`, `set` and `all_trec` will be expanded. These result in many measures being returned. For instance, asking for `official` results in the following (very wide) output reporting the usual default metrics of trec_eval:

.. include:: ./_includes/experiment-official.rst
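The string names above are simply passed through ``eval_metrics``; a hedged sketch, again assuming a ``bm25`` pipeline plus ``topics`` and ``qrels`` dataframes:

.. code-block:: python

    pt.Experiment(
        [bm25],
        topics,
        qrels,
        eval_metrics=["map", "recip_rank", "P_5", "recall_5", "iprec_at_recall", "mrt"],
    )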
@@ -290,7 +292,6 @@ More specifically, let's consider the TREC Deep Learning track passage ranking task

The available evaluation measure objects are listed below.
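For example, these objects can be parameterised with rank cutoffs and relevance thresholds. A hedged sketch for that task; the dataset identifier and variant names here are my assumptions, not taken from this commit:

.. code-block:: python

    from pyterrier.measures import AP, RR, nDCG

    dataset = pt.get_dataset("trec-deep-learning-passages")
    pt.Experiment(
        [bm25],
        dataset.get_topics("test-2019"),
        dataset.get_qrels("test-2019"),
        eval_metrics=[RR(rel=2), nDCG @ 10, AP(rel=2)],
    )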


.. autofunction:: pyterrier.measures.P

.. autofunction:: pyterrier.measures.R
3 changes: 2 additions & 1 deletion pyterrier/pipelines.py
@@ -610,7 +610,8 @@ def _restore_state(param_state):

def Evaluate(res : pd.DataFrame, qrels : pd.DataFrame, metrics=['map', 'ndcg'], perquery=False) -> Dict:
"""
-Evaluate the result dataframe with the given qrels
+Evaluate a single result dataframe with the given qrels. This method may be used as an alternative to
+``pt.Experiment()`` for getting only the evaluation measurements given a single set of existing results.
Args:
res: Either a dataframe with columns=['qid', 'docno', 'score'] or a dict {qid:{docno:score,},}
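As a usage note (not part of the commit), a minimal sketch of calling the function on a toy results dataframe; the qid/docno/label layout for the qrels is assumed here:

.. code-block:: python

    import pandas as pd
    import pyterrier as pt

    res = pd.DataFrame(
        [["q1", "d1", 10.0], ["q1", "d2", 9.5]],
        columns=["qid", "docno", "score"])
    qrels = pd.DataFrame(
        [["q1", "d1", 1], ["q1", "d2", 0]],
        columns=["qid", "docno", "label"])

    # averaged measurements as a dict, e.g. {"map": ..., "ndcg": ...}
    print(pt.Evaluate(res, qrels, metrics=["map", "ndcg"]))

    # perquery=True keys the measurements by query id instead
    print(pt.Evaluate(res, qrels, metrics=["map"], perquery=True))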
