
Commit

update EVALUATION.md
lovodkin93 authored Sep 19, 2020
1 parent 48c48fe commit 2691964
Showing 1 changed file with 6 additions and 4 deletions.
The evaluation process is done through the `evaluate` function that is located in the [evaluation.py](ucca/evaluation.py) script.
A wrapper script for `evaluation.py` is [evaluate_standard.py](scripts/evaluate_standard.py). For more details on how the script receives its arguments, run `evaluate_standard --help` at the command line.
The evaluation process compares the gold-standard annotation of a specific passage with the calculated annotation of that same passage.
Both annotations are of the `Passage` object type, which contains the connected graph that represents the annotation of the passage.
The evaluation includes recall, precision and F1 scores. These scores are calculated by comparing each edge's labels and yield, i.e. the terminals (tokens) under the edge's child node (if we look at the annotation as a tree).
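
As a rough sketch of how this might look in code (the file paths and the `file2passage` helper from `ucca.ioutil` are assumptions for illustration, not part of the description above):

```python
# A minimal sketch, not an official example: load a gold-standard passage and a
# predicted passage from UCCA XML files and compare them with `evaluate`.
# The file paths and the `file2passage` helper are assumptions for illustration.
from ucca.ioutil import file2passage
from ucca.evaluation import evaluate

ref = file2passage("gold/passage120.xml")           # gold-standard annotation (hypothetical path)
guessed = file2passage("predicted/passage120.xml")  # predicted annotation (hypothetical path)

# Compare the two Passage objects edge by edge (labels and yields) and return
# an object holding the recall, precision and F1 scores described above.
scores = evaluate(guessed, ref)
```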
If the evaluation needs to be broken down by specific labels, a useful flag is `--constructions=categories`, which shows evaluation results per edge label (category).
6. units: whether to evaluate common units
7. fscore: whether to compute precision, recall and f1 score
8. errors: whether to print the mistakes (prints something similar to a confusion matrix). It is worth mentioning the `--as-table` option of the [evaluate_standard](https://github.com/lovodkin93/ucca/blob/master/scripts/evaluate_standard.py) script, which prints the confusion matrix as a table.
9. normalize: flatten centers and move common functions to the root before evaluation (this modifies the passages). There is an option to normalize the passages jointly; to normalize them separately, do so before calling `evaluate`.
10. eval_type: specific evaluation type(s) to limit to. One can choose one of the following evaluation types:
- labeled - both the labels of the edges and their yields are compared.
11. ref_yield_tags: reference passage for fine-grained evaluation. In other words, it enables evaluating edges whose labels are of different types (not part of the UCCA label set), such as subject, object and so on. Nevertheless, the recall, precision and F1 scores are still calculated based on the UCCA parse. A sketch of how these keyword arguments might be passed to `evaluate` appears right after this list.
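
A hedged sketch of passing some of these keyword arguments, continuing the example above (the exact accepted values, e.g. for `eval_type`, are assumptions based on the descriptions in this list and should be checked against the API reference linked at the end of this document):

```python
from ucca.evaluation import evaluate

# `guessed` and `ref` are the predicted and gold-standard Passage objects from
# the earlier sketch. The keyword arguments mirror the parameter list above;
# the exact accepted values are assumptions, so verify them against the docs.
scores = evaluate(
    guessed, ref,
    units=False,          # do not report common units
    fscore=True,          # compute precision, recall and F1
    errors=True,          # print a confusion-matrix-like summary of mistakes
    normalize=True,       # flatten centers / move common functions to root first
    eval_type="labeled",  # limit to labeled evaluation (assumed string form)
)
```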

The `evaluate` function returns a `Scores` object, which contains the recall, precision and F1 scores of the generated annotation.
For example, when running [test_evaluation.py](ucca/tests/test_evaluation.py), [this line](https://github.com/lovodkin93/ucca/blob/master/ucca/tests/test_evaluation.py#L331) generates a `Scores` object. One of its attributes, `evaluators`, comprises three `EvaluatorResults` objects:
- 'labeled'
- 'unlabeled'
- 'weak_labeled'
The results for each such edge type comprise:
- r (recall)
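
A short, hedged sketch of inspecting the returned object (the layout of `evaluators` and the `average_f1` helper are assumptions based on the description above and on the `Scores` API reference linked below):

```python
# Continuing the earlier sketches: `scores` is the object returned by `evaluate`.
# `evaluators` holds the 'labeled', 'unlabeled' and 'weak_labeled' results;
# `average_f1()` is assumed from the Scores API reference linked below.
print(scores.evaluators)    # the three EvaluatorResults described above
print(scores.average_f1())  # a single summary F1 for the comparison
```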

For more details on the `evaluate` function, please see the following links:

- [evaluate](https://ucca.readthedocs.io/en/latest/api/ucca.evaluation.evaluate.html#ucca.evaluation.evaluate)
- [Scores](https://ucca.readthedocs.io/en/latest/api/ucca.evaluation.Scores.html#ucca.evaluation.Scores)
