---
title: LLMEval Dataset Parser
emoji: ⚡
colorFrom: green
colorTo: gray
sdk: docker
pinned: false
license: mit
short_description: A collection of parsers for LLM benchmark datasets
---
LLMDataParser is a Python library that provides parsers for benchmark datasets used in evaluating Large Language Models (LLMs). It offers a unified interface for loading and parsing datasets like MMLU, GSM8k, and others, streamlining dataset preparation for LLM evaluation. The library aims to simplify the process of working with common LLM benchmark datasets through a consistent API.
You can also try out the online demo on Hugging Face Spaces: LLMEval Dataset Parser Demo.
- Unified Interface: A consistent `DatasetParser` interface for all datasets.
- Easy to Use: Simple methods and built-in Python types.
- Extensible: Easily add support for new datasets.
- Gradio Demo: Built-in Gradio interface for interactive dataset exploration and testing.
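The unified interface is built around a parser registry that maps dataset names to parser classes. The sketch below illustrates the general shape of that design; it is a minimal standalone example, not the library's actual implementation, and the decorator-based registration shown here is an assumption:

```python
# Minimal sketch of a parser-registry design (hypothetical,
# not llmdataparser's actual implementation).


class DatasetParser:
    """Base interface every dataset parser implements."""

    def load(self):
        raise NotImplementedError

    def parse(self):
        raise NotImplementedError


class ParserRegistry:
    """Maps lowercase dataset names to parser classes."""

    _parsers = {}

    @classmethod
    def register(cls, name):
        def decorator(parser_cls):
            cls._parsers[name.lower()] = parser_cls
            return parser_cls
        return decorator

    @classmethod
    def list_parsers(cls):
        return sorted(cls._parsers)

    @classmethod
    def get_parser(cls, name):
        return cls._parsers[name.lower()]()


@ParserRegistry.register("mmlu")
class MMLUDatasetParser(DatasetParser):
    def load(self):
        # A real parser would fetch the dataset here.
        self.rows = [{"question": "2+2?", "answer": "4"}]

    def parse(self):
        return self.rows


parser = ParserRegistry.get_parser("mmlu")
parser.load()
print(parser.parse())  # → [{'question': '2+2?', 'answer': '4'}]
```

Registering parsers by name keeps callers decoupled from concrete classes: adding a dataset means adding one class, with no changes to calling code.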
You can install the package directly using `pip`; even with only a `pyproject.toml` file, this method works for standard installations.

1. Clone the repository:

   ```bash
   git clone https://github.com/jeff52415/LLMDataParser.git
   cd LLMDataParser
   ```

2. Install with pip:

   ```bash
   pip install .
   ```

Alternatively, you can use Poetry, which manages the virtual environment and dependencies automatically, so you don't need to create a conda environment first:

1. Install dependencies:

   ```bash
   poetry install
   ```

2. Activate the virtual environment:

   ```bash
   poetry shell
   ```
- MMLUDatasetParser
- MMLUProDatasetParser
- MMLUReduxDatasetParser
- TMMLUPlusDatasetParser
- GSM8KDatasetParser
- MATHDatasetParser
- MGSMDatasetParser
- HumanEvalDatasetParser
- HumanEvalDatasetPlusParser
- BBHDatasetParser
- MBPPDatasetParser
- IFEvalDatasetParser
- TWLegalDatasetParser
- TMLUDatasetParser
Here's a simple example demonstrating how to use the library:
```python
from llmdataparser import ParserRegistry

# List all available parsers
ParserRegistry.list_parsers()

# Get a parser
parser = ParserRegistry.get_parser("mmlu")

# Load the dataset
parser.load()  # optional: task_name, split

# Parse the loaded data
parser.parse()  # optional: split_names

# Inspect metadata
print(parser.task_names)
print(parser.split_names)
print(parser.get_dataset_description)
print(parser.get_huggingface_link)
print(parser.total_tasks)

# Access the parsed data
data = parser.get_parsed_data
```
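Once parsed, the entries can be fed into an evaluation loop. The sketch below shows one way to turn parsed entries into prompts; the `question`/`answer` field names and the sample data are assumptions for illustration, not the library's documented schema:

```python
# Hypothetical parsed entries; the real schema may differ.
parsed_data = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "What is the capital of France?", "answer": "Paris"},
]


def build_prompt(entry):
    """Format one parsed entry as an evaluation prompt."""
    return f"Question: {entry['question']}\nAnswer:"


prompts = [build_prompt(e) for e in parsed_data]
print(prompts[0])  # prints the first formatted prompt
```

Because every parser emits the same shape of data, helpers like `build_prompt` can be written once and reused across benchmarks.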
We also provide a Gradio demo for interactive testing:
```bash
python app.py
```
To add support for a new dataset, please refer to our detailed guide in docs/adding_new_parser.md. The guide includes:
- Step-by-step instructions for creating a new parser
- Code examples and templates
- Best practices and common patterns
- Testing guidelines
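As a rough illustration of what a new parser might look like, the standalone sketch below follows the load-then-parse pattern from the quick-start example. The base class, parser name, and raw-data schema here are all hypothetical stand-ins, not the template from the guide:

```python
# Standalone sketch of adding a new parser; the base class and
# field names are stand-ins, not llmdataparser's actual API.


class DatasetParser:
    """Stand-in base class defining the parser interface."""

    def load(self):
        raise NotImplementedError

    def parse(self):
        raise NotImplementedError


class MyBenchmarkDatasetParser(DatasetParser):
    """Hypothetical parser for a new benchmark dataset."""

    def load(self):
        # A real parser would fetch the dataset here,
        # e.g. from the Hugging Face Hub.
        self._raw = [{"q": "1+1?", "a": "2"}]

    def parse(self):
        # Normalize raw rows into a consistent schema so all
        # parsers expose the same shape of data.
        self.parsed = [
            {"question": r["q"], "answer": r["a"]} for r in self._raw
        ]
        return self.parsed


parser = MyBenchmarkDatasetParser()
parser.load()
print(parser.parse())  # → [{'question': '1+1?', 'answer': '2'}]
```

The key step is the normalization in `parse()`: whatever the upstream dataset looks like, the parser maps it into the shared schema that downstream evaluation code expects.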
This project is licensed under the MIT License. See the LICENSE file for details.
For questions or support, please open an issue on GitHub or contact [email protected].