Dataset Error Reduction

This project reduces errors in NLP datasets using LLMs.

The datasets currently used are SQuAD, an extractive question answering dataset, and RACE, a multiple-choice reading comprehension dataset.

Use on your system

Fork and clone this repository.

  1. If you are using an OpenAI API key, create a .env file in the repository root and add your key (see the first sketch after this list):

OPENAI_API_KEY=YOUR_API_KEY

  2. If you are running an LLM locally through a local server, use the files inside the LocalLLM folder instead of the default files (see the second sketch after this list). The /ollama folder contains my notebooks that use Ollama to run LLMs locally.

PS: I use LM Studio and Ollama to run the LLMs locally.
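
A minimal sketch of how the OpenAI API key from the .env file can be picked up in Python. The model name and prompt here are placeholders for illustration, not the repo's actual settings:

```python
# Load the key from .env and send a test request to the OpenAI API.
from dotenv import load_dotenv  # pip install python-dotenv
from openai import OpenAI       # pip install openai

load_dotenv()       # reads OPENAI_API_KEY from the .env file into the environment
client = OpenAI()   # the client picks up OPENAI_API_KEY automatically

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Does this answer span match the passage?"}],
)
print(response.choices[0].message.content)
```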
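
For the local-server setup, a sketch of pointing the same OpenAI client at an OpenAI-compatible local endpoint. The base URLs and model name below are the usual Ollama / LM Studio defaults and are assumptions; adjust them to your own setup:

```python
# Query a locally hosted LLM through its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama default; LM Studio is typically http://localhost:1234/v1
    api_key="not-needed",                  # local servers ignore the key, but the client requires a value
)

response = client.chat.completions.create(
    model="llama3",  # placeholder; use whichever model you have pulled locally
    messages=[{"role": "user", "content": "Check this QA pair for annotation errors."}],
)
print(response.choices[0].message.content)
```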

References:

  1. SQuAD: 100,000+ Questions for Machine Comprehension of Text (Rajpurkar et al., EMNLP 2016)
  2. RACE: Large-scale ReAding Comprehension Dataset From Examinations (Lai et al., EMNLP 2017)
