Add an 'evaluation pipeline' for benchmarks and agreements between sets #482

Open
LoannPeurey opened this issue Aug 7, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@LoannPeurey
Contributor

Is your feature request related to a problem? Please describe.
Have a standard pipeline for evaluating annotation sets (against a human reference or against a gold standard).

Describe the solution you'd like
Where to store evaluations.

Two types of evaluation:
- benchmarking (human vs gold standard, vtc vs human)
- agreement (human vs human, vtc vs another automated system)

Benchmarking:
- where: extra/benchmarking
  - given that it describes the "quality" of the whole set, it would make sense to include benchmarking results within the set folder -- however, this gets messy, so we decided to put them in extra and to signal this in the metadata of the sets
- what format (see the sketch after this list):
  - a visual format for a human who wants a general overview: a pdf containing the confusion matrices, precision, recall and f-score
  - a yaml with the parameters (e.g. which files were compared, dataset versions)
  - csv1: non-normalized confusion matrix (since it also gives an idea of how much data the comparison was computed on)
  - csv2: f-scores
- how they could be used (see the aggregation sketch below):
  - get f-scores for all annotators by grabbing this file from all sets
  - get f-scores per child, which would mean grabbing from all sets based on the information in the yaml
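
To make the proposed format concrete, here is a minimal sketch of how the three benchmarking outputs could be produced for one comparison. The file names (`parameters.yml`, `confusion_matrix.csv`, `fscores.csv`) and the `write_benchmark` helper are assumptions for illustration, not an existing ChildProject API; the pdf overview is left out.

```python
# Hypothetical sketch of producing the benchmarking outputs described above.
# File names and layout are assumptions, not an existing ChildProject API.
import yaml
import pandas as pd
from pathlib import Path
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

def write_benchmark(reference: pd.Series, hypothesis: pd.Series,
                    params: dict, output: Path) -> None:
    output.mkdir(parents=True, exist_ok=True)
    labels = sorted(set(reference) | set(hypothesis))

    # yaml with the comparison parameters (which files were compared, dataset version...)
    with open(output / "parameters.yml", "w") as f:
        yaml.safe_dump(params, f)

    # csv1: non-normalized confusion matrix (rows = reference, columns = hypothesis)
    cm = confusion_matrix(reference, hypothesis, labels=labels)
    pd.DataFrame(cm, index=labels, columns=labels).to_csv(output / "confusion_matrix.csv")

    # csv2: precision / recall / f-score per label, plus support
    prec, rec, f1, support = precision_recall_fscore_support(
        reference, hypothesis, labels=labels, zero_division=0)
    pd.DataFrame({"label": labels, "precision": prec, "recall": rec,
                  "fscore": f1, "support": support}).to_csv(
        output / "fscores.csv", index=False)
```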
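And a hedged sketch of the exploitation side: walking every benchmarking folder, reading `fscores.csv` together with `parameters.yml`, and aggregating across sets. The `extra/benchmarking/<comparison>` layout and the `hypothesis_set` / `children` keys in the yaml are hypothetical placeholders.

```python
# Hypothetical sketch of aggregating f-scores across all benchmarking outputs.
import yaml
import pandas as pd
from pathlib import Path

def collect_fscores(dataset_root: Path) -> pd.DataFrame:
    rows = []
    for bench in (dataset_root / "extra" / "benchmarking").glob("*"):
        params_file = bench / "parameters.yml"
        fscores_file = bench / "fscores.csv"
        if not (params_file.exists() and fscores_file.exists()):
            continue
        params = yaml.safe_load(params_file.read_text())
        fscores = pd.read_csv(fscores_file)
        # attach the yaml parameters so results can be grouped per set or per child
        fscores["set"] = params.get("hypothesis_set")
        fscores["children"] = str(params.get("children"))
        rows.append(fscores)
    return pd.concat(rows, ignore_index=True)

# e.g. average f-score per annotator/set across all benchmarks:
# collect_fscores(Path("path/to/dataset")).groupby("set")["fscore"].mean()
```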

!! This feature should probably come after #454

LoannPeurey added the enhancement (New feature or request) label Aug 7, 2024