Is your feature request related to a problem? Please describe.
Have a standard pipeline for evaluating annotation sets (against a human annotator or against a gold standard).
Describe the solution you'd like
Where to store the evaluations

Two types of evaluation:
- benchmarking (human vs gold standard, vtc vs human)
- agreement (human vs human, vtc vs another automated system)

Benchmarking:
- where: extra/benchmarking
- Given that it describes the "quality" of the whole set, it would make sense to include benchmarking results within the set folder; however, that sounds messy, so we decided to put them in extra and to signal this information in the metadata for the sets.
What format:
- a visual format for a human who wants a general overview: a PDF containing the confusion matrices, precision, recall, and f-scores
- a YAML file with the parameters (e.g. which files were compared, dataset versions); see the sketch below
- csv1: the non-normalized confusion matrix (raw counts also give an idea of how much data was annotated); see the generation sketch below
- csv2: the f-scores
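A minimal sketch of what the parameter YAML could look like; the file name and every field name below are hypothetical, not an agreed schema:

```yaml
# extra/benchmarking/vtc_vs_gold.yaml -- illustrative example only, schema to be decided
reference: gold_standard          # set used as the reference annotation
hypothesis: vtc                   # set being evaluated
dataset_version: "2020.1"         # e.g. a git tag or commit of the dataset
recordings:                       # which files were compared
  - rec_001.wav
  - rec_002.wav
outputs:
  confusion_matrix: vtc_vs_gold_confusion.csv   # csv1
  fscores: vtc_vs_gold_fscores.csv              # csv2
  report: vtc_vs_gold_report.pdf
```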
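And a rough sketch of how csv1 and csv2 could be produced from two aligned label sequences; the function name, label set, and output file names are assumptions:

```python
# Hypothetical generation of csv1 (raw confusion matrix) and csv2 (per-class scores).
import pandas as pd
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

def benchmark(reference, hypothesis, labels, out_prefix):
    """Write the non-normalized confusion matrix and precision/recall/f-score to CSV."""
    # csv1: raw counts, so the amount of data that was compared stays visible
    cm = confusion_matrix(reference, hypothesis, labels=labels)
    pd.DataFrame(cm, index=labels, columns=labels).to_csv(f"{out_prefix}_confusion.csv")

    # csv2: precision, recall and f-score per class
    precision, recall, fscore, support = precision_recall_fscore_support(
        reference, hypothesis, labels=labels, zero_division=0
    )
    pd.DataFrame(
        {"precision": precision, "recall": recall, "fscore": fscore, "support": support},
        index=labels,
    ).to_csv(f"{out_prefix}_fscores.csv")

# e.g. benchmark(gold_labels, vtc_labels, labels=["CHI", "OCH", "FEM", "MAL"], out_prefix="vtc_vs_gold")
```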
How these could be used:
- get f-scores for all annotators by grabbing them from all sets
- get f-scores per child, which would mean grabbing them from all sets based on the information in the YAML; see the aggregation sketch below
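A rough sketch of the aggregation side, assuming the hypothetical layout and YAML keys used above (extra/benchmarking/, outputs.fscores, hypothesis/reference):

```python
# Hypothetical aggregation of per-set f-scores; all paths and keys are assumptions.
from pathlib import Path

import pandas as pd
import yaml

def collect_fscores(dataset_root):
    """Gather the f-score tables referenced by every YAML in extra/benchmarking/."""
    tables = []
    for params_file in Path(dataset_root, "extra", "benchmarking").glob("*.yaml"):
        params = yaml.safe_load(params_file.read_text())
        fscores = pd.read_csv(params_file.parent / params["outputs"]["fscores"])
        fscores["hypothesis"] = params["hypothesis"]
        fscores["reference"] = params["reference"]
        tables.append(fscores)
    return pd.concat(tables, ignore_index=True)

# e.g. f-scores per annotator, pooled over all sets of a dataset:
# collect_fscores(".").groupby("hypothesis")["fscore"].mean()
```

Per-child f-scores would additionally require joining the recordings listed in the YAML against the dataset's children metadata.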
!! This feature should probably come after #454