feat: Spellcheck benchmark dataset and evaluation algorithm #340

jeremyarancio · 2024-04-11T16:39:09Z

What

Creation of a benchmark and an evaluation algorithm to assess the spellcheck.

Benchmark

The benchmark is composed of 247 lists of ingredients from 3 data sources:

30% of the old dataset, which consists of manually corrected lists of ingredients in French from the previous work by Lucain W. Lists of ingredients that were left unmodified are removed.
15 manually corrected lists of ingredients in different languages (used for prompt engineering with OpenAI on the spellcheck task).
100 lists of ingredients with the tag 50-percent-unknown, corrected with GPT-3.5 following the correction guidelines.

Argilla to validate benchmark

Lists of ingredients corrected with GPT-3.5 are checked in Argilla and modified where needed to respect the spellcheck guidelines.

Evaluation algorithm

An evaluation algorithm is created to estimate the performance of the spellcheck.
It computes the precision and recall of the corrections from three text sequences (original, reference, prediction), using tokenization and a sequence alignment algorithm.
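
To illustrate the idea, here is a minimal sketch of how precision and recall could be derived from the original, reference, and prediction texts. It is not the implementation from this PR: the whitespace tokenizer, the use of `difflib.SequenceMatcher` for alignment, and the function names `tokenize`, `corrections`, and `precision_recall` are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def tokenize(text):
    # Assumption: naive whitespace tokenization, for illustration only.
    return text.split()


def corrections(original, corrected):
    """Align original and corrected tokens, then return the set of edits.

    Each edit is (start, end, replacement_tokens), where start/end index
    the span of original tokens that was changed.
    """
    orig_tokens = tokenize(original)
    corr_tokens = tokenize(corrected)
    matcher = SequenceMatcher(a=orig_tokens, b=corr_tokens, autojunk=False)
    edits = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.add((i1, i2, tuple(corr_tokens[j1:j2])))
    return edits


def precision_recall(original, reference, prediction):
    """Compare the edits made by the prediction with those in the reference."""
    expected = corrections(original, reference)
    predicted = corrections(original, prediction)
    true_positives = len(expected & predicted)
    # Convention chosen for this sketch: an empty denominator counts as 1.0.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(expected) if expected else 1.0
    return precision, recall


original = "farine de blé, sucre, huille de palme, sel"
reference = "farine de blé, sucre, huile de palme, sel"
prediction = "farine de blé, sucre, huile de palm, sel"
print(precision_recall(original, reference, prediction))
```

In this example the prediction makes the expected correction ("huille" to "huile") but also alters a token that was already correct, so recall stays high while precision drops. Anchoring every edit to the original token positions is what makes the reference and the prediction comparable.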

spellcheck/README.md

spellcheck/scripts/old_to_new/0_convert_old_data.py

spellcheck/README.md

…ents lists

…ents lists (fix path error)

… Recall metric added

… HF (v2)

…orking

- Evaluation is performed on the SageMaker instance
- Evaluation data are sent to S3
- The Metaflow human evaluation step gets the data and pushes it to Argilla

Bug when preparing Argilla dataset

…ementation into code

jeremyarancio added 11 commits March 27, 2024 18:17

Spellcheck - Previous dataset processed

3f286d2

Fix typos in Readme.md

dfdfecf

feat(spellcheck): ✨ Spellcheck architecture + Creation benchmark (WIP)

bc56183

feat(spellcheck): ✨ Prompt engineering + Benchmark creation

b048c79

Generate benchmark

e1fdfda

feat(spellcheck): ✨ Set up Argilla for benchmark correction + is_trun…

0f6753c

…cated

feat(spellcheck): ✨ Validation benchmark created and verified with Ar…

cfcb5dd

…gilla

feat(spellcheck): ✨ Spellcheck evaluation algorithm developed

59985b5

Evaluation script works (save)

4f8a45f

Evaluation algorithm is working!

9f09e20

docs(spellcheck): 📝 Finalize documentation evaluation algorithm

1911461

github-actions bot assigned jeremyarancio Apr 11, 2024

Merge branch 'develop' into spellcheck_validation_data

4154d23

jeremyarancio requested a review from raphael0202 April 11, 2024 17:02

jeremyarancio added NLP dataset spellcheck labels Apr 11, 2024

jeremyarancio linked an issue Apr 11, 2024 that may be closed by this pull request

Use large language models (LLM) to perform ingredient list spellcheck #314

Open

jeremyarancio removed a link to an issue Apr 11, 2024

Use large language models (LLM) to perform ingredient list spellcheck #314

Open

jeremyarancio added 5 commits April 12, 2024 11:11

docs(spellcheck): 📝 Update README.md & correct code

89bd698

Fix README

cd88c5b

docs(spellcheck): 📝 Code documentation

26239e3

docs(spellcheck): 📝 Document evaluation algorithm & fix typos

cd79d48

docs(spellcheck): 📝 Fix typos in readme

d837b98