indhub/loss-curve

loss-curve

Dev environment

Use the conda env named curve, in trn.

Download dataset

mkdir data
cd data
wget https://huggingface.co/datasets/roneneldan/TinyStories/resolve/main/TinyStoriesV2-GPT4-train.txt
wget https://huggingface.co/datasets/roneneldan/TinyStories/resolve/main/TinyStoriesV2-GPT4-valid.txt
cd ..

Convert to tfrecords

python tinystories/make_tfrecord.py \
    data/TinyStoriesV2-GPT4-train.txt \
    data/TinyStoriesV2-GPT4-valid.txt \
    data/tinystories-train.tfrecord \
    data/tinystories-valid.tfrecord

There are ~2717000 training samples and ~27000 validation samples.
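
As a rough cross-check on those counts, the raw text files can be split into stories before conversion. The sketch below is not taken from tinystories/make_tfrecord.py. It assumes that stories in the TinyStoriesV2 text files are separated by the marker <|endoftext|>, and that one sample corresponds to one story. The names DELIMITER, iter_stories and count_stories are hypothetical helpers introduced for illustration.

```python
from pathlib import Path
from typing import Iterable, Iterator

# Assumption: stories in the TinyStoriesV2 text files are separated by this marker.
DELIMITER = "<|endoftext|>"


def iter_stories(lines: Iterable[str]) -> Iterator[str]:
    """Yield one non-empty story at a time from an iterable of text lines."""
    buf = []
    for line in lines:
        parts = line.split(DELIMITER)
        # Every part except the last one ends at a delimiter, so it closes a story.
        for part in parts[:-1]:
            buf.append(part)
            story = "".join(buf).strip()
            if story:
                yield story
            buf = []
        buf.append(parts[-1])
    # Emit whatever follows the final delimiter, if anything.
    tail = "".join(buf).strip()
    if tail:
        yield tail


def count_stories(path) -> int:
    """Count stories in a text file, streaming it line by line."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in iter_stories(f))


if __name__ == "__main__":
    for split in ("train", "valid"):
        p = Path("data") / f"TinyStoriesV2-GPT4-{split}.txt"
        if p.exists():
            print(split, count_stories(p))
```

Streaming line by line keeps memory use low on the training file. If the converter defines a sample differently (for example, fixed-length token windows), the numbers printed here will not match the counts above.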

Get sentencepiece

mkdir sentencepiece
cd sentencepiece
curl -L https://huggingface.co/t5-base/resolve/main/spiece.model -o t5-base
cd ..
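
To confirm the downloaded model is usable, it can be loaded with the sentencepiece Python package. This is a minimal sketch, assuming the package is installed (pip install sentencepiece) and that the file was saved as sentencepiece/t5-base as in the commands above. load_tokenizer is a hypothetical helper, not part of this repository.

```python
from pathlib import Path


def load_tokenizer(model_path="sentencepiece/t5-base"):
    """Load a SentencePiece model, failing early if the file is missing."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"SentencePiece model not found: {path}")
    # Imported lazily so the path check works even without the package installed.
    import sentencepiece as spm  # third-party dependency

    return spm.SentencePieceProcessor(model_file=str(path))


if __name__ == "__main__":
    sp = load_tokenizer()
    ids = sp.encode("Once upon a time, there was a little dog.", out_type=int)
    print("vocab size:", sp.get_piece_size())
    print("token ids:", ids)
    print("round trip:", sp.decode(ids))
```

If loading fails with a parse error, the saved file is probably not the actual model (for example, an error page saved in its place), and the download should be repeated.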
