
NotDachun/french-politician-ner

 
 


Finding French Politicians at Scale - Utilizing Vector Search to Optimize Synthetic Data Generation

We use Mistral-Large to generate labels for web data for fine-grained named entity recognition, a low-resource task where labels are expensive to obtain. We then fine-tune a Mistral 7B model on this synthetic dataset.
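As a rough illustration of how LLM-generated labels might be turned into training data, the sketch below converts a list of entity mentions into token-level BIO tags. The README does not describe the label format, so the input shape (a list of mention strings), the `bio_tags` helper, and the `POLITICIAN` label name are all assumptions made for this example, not the repository's actual code.

```python
import re

def bio_tags(text, mentions, label):
    """Tag whitespace-separated tokens with B-/I-/O labels.

    `mentions` is assumed to be a list of entity strings returned by the
    labeling model; this format is hypothetical.
    """
    # Character spans of every occurrence of every mention.
    spans = []
    for mention in mentions:
        for m in re.finditer(re.escape(mention), text):
            spans.append((m.start(), m.end()))

    tags = []
    for tok in re.finditer(r"\S+", text):
        tag = "O"
        for start, end in spans:
            # A token belongs to a span if it starts inside it.
            if start <= tok.start() < end:
                tag = ("B-" if tok.start() == start else "I-") + label
                break
        tags.append((tok.group(), tag))
    return tags

example = bio_tags(
    "Emmanuel Macron met reporters in Lyon.",
    ["Emmanuel Macron"],
    "POLITICIAN",  # hypothetical label name
)
for token, tag in example:
    print(f"{token}\t{tag}")
```

Token-level tags like these are one common input format for fine-tuning a sequence labeling model; a generative model such as Mistral 7B could equally be fine-tuned on prompt and completion pairs built from the same mentions.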

To extend this problem, we define a new fine-grained entity type, French politicians, and use our framework to rapidly generate synthetic data and train a downstream model. Scanning an unlabeled dataset, we identify 2174 mentions of French politicians, compared to just 7 mentions in the labeled dataset.
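The title refers to vector search as a way to optimize synthetic data generation. One plausible reading is that similarity search narrows a large unlabeled corpus down to passages likely to mention the target entity type before they are sent to the labeling model. The sketch below shows that idea with a toy bag-of-words embedding and cosine similarity. The `embed`, `cosine`, and `top_k` helpers, the query string, and the sample passages are all hypothetical; a real pipeline would use a learned embedding model and a vector index.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, passages, k=2):
    """Return the k passages most similar to the query."""
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

# Hypothetical unlabeled passages.
passages = [
    "The French president addressed the National Assembly in Paris.",
    "The recipe calls for two cups of flour and a pinch of salt.",
    "A French minister and senator debated the pension reform.",
    "The football match ended in a draw after extra time.",
]

# Hypothetical query describing the entity type of interest.
query = "French politician minister president senator assembly"
selected = top_k(query, passages, k=2)
for passage in selected:
    print(passage)
```

Only the selected passages would then be passed to the labeling model, which keeps labeling cost proportional to the number of promising candidates rather than to the size of the corpus.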


About

Project for SF Mistral AI Hackathon 2024


Languages

  • Jupyter Notebook 98.5%
  • Python 1.5%