Pythonic dataset processing for fine-tuning LLMs. Used for an award-winning CalHacks 2023 project.

TanGentleman/EduScribe-LLM-Backend

EduScribe-LLM-Backend

EduScribe Devpost: https://devpost.com/software/eduscribe
EduScribe repository: https://github.com/VinnyXP/EduScribe

Procedure

  1. Download a dataset from Hugging Face. For this project, I chose https://huggingface.co/datasets/vgoldberg/longform_article_summarization
  2. Set file paths and configuration constants in config.py
  3. Run python parse_parquet.py
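
As a rough illustration of steps 2 and 3, the sketch below shows one way configuration constants and a Parquet loader could fit together. It is not the repository's actual config.py or parse_parquet.py: the `Config` class, its field names, the default paths, and `load_rows` are all hypothetical, and the loader assumes pandas with a Parquet engine (pyarrow or fastparquet) is installed.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class Config:
    # Hypothetical settings; the real config.py may use different names and values.
    parquet_path: Path = Path("data/train.parquet")  # downloaded dataset file
    output_path: Path = Path("data/train.jsonl")     # where parsed rows are written
    max_rows: Optional[int] = 1000                   # None means "use every row"


def load_rows(cfg: Config) -> list:
    """Read the Parquet file and return up to cfg.max_rows rows as dicts."""
    # Imported lazily so the config can be used without pandas installed.
    import pandas as pd

    df = pd.read_parquet(cfg.parquet_path)
    if cfg.max_rows is not None:
        df = df.head(cfg.max_rows)
    return df.to_dict(orient="records")
```

Keeping the paths and limits in one frozen dataclass means a run is fully described by a single object, which makes it easy to try a small `max_rows` first before processing the whole dataset.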

Features

  1. Functions that parse Parquet files into a format usable for fine-tuning LLMs through the Together.ai API.
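
To make the target format more concrete, here is a minimal sketch of turning parsed rows into a JSON Lines file. It rests on several assumptions that are not confirmed by this README: that the fine-tuning file holds one JSON object per line with a single `text` field, that the dataset rows have `text` and `summary` columns, and that a simple prompt template is acceptable. `PROMPT_TEMPLATE`, `to_training_record`, and `write_jsonl` are hypothetical names, not functions from this repository.

```python
import json

# Hypothetical prompt layout; adjust to whatever the fine-tuned model should see.
PROMPT_TEMPLATE = (
    "Summarize the following article.\n\n"
    "Article:\n{article}\n\n"
    "Summary:\n{summary}"
)


def to_training_record(row, article_key="text", summary_key="summary"):
    """Build one training example; the column names are assumptions."""
    article = str(row[article_key]).strip()
    summary = str(row[summary_key]).strip()
    return {"text": PROMPT_TEMPLATE.format(article=article, summary=summary)}


def write_jsonl(rows, path):
    """Write one JSON object per line and return the number of rows written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            record = to_training_record(row)
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, `write_jsonl(rows, "train.jsonl")` would accept any iterable of dict-like rows, such as the records produced by `DataFrame.to_dict(orient="records")`. Check the current Together.ai documentation for the exact schema it expects before uploading.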
