random text sample from all wikipedia articles (March edition)

annapovey/rand-txt

python3 -m venv wiki_env
source wiki_env/bin/activate
which pip
pip install apache_beam mwparserfromhell
pip3 install datasets

There are 6,458,670 articles in the March 2022 Wikipedia dump. 36K random lines were selected from these 6M articles (wikipedia_20220301_en_train): 1_sentence_per_line.txt
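The README does not say how the random lines were drawn. One possible approach, shown here only as a sketch, is single-pass reservoir sampling, which avoids loading the whole file into memory. The function name `sample_lines` and its `seed` parameter are hypothetical and not part of this repository.

```python
import random
from typing import Iterable, List, Optional


def sample_lines(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Pick up to k lines uniformly at random in one pass (reservoir sampling)."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = line
    return reservoir
```

Passing an open file object as `lines` and `k=36_000` would give a sample of the size mentioned above. If the input has fewer than `k` lines, all of them are returned in their original order.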

Wanted 2 sentences per line:

paste -d " " - - < wikipedia_20220301_en_train.txt > 2_sentences_per_line.txt
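For readers without `paste`, the same pairing of consecutive lines can be sketched in Python. This is an illustrative equivalent, not code from this repository. The name `pair_lines` is hypothetical, and writing an unpaired final line on its own is a choice made here.

```python
from typing import Iterable, Iterator


def pair_lines(lines: Iterable[str], sep: str = " ") -> Iterator[str]:
    """Join each pair of consecutive lines with sep, yielding one string per pair."""
    stripped = (line.rstrip("\n") for line in lines)
    for first in stripped:
        # Pull the partner line from the same iterator; None means the input ended.
        second = next(stripped, None)
        yield first if second is None else first + sep + second
```

To reproduce the command above, iterate over `wikipedia_20220301_en_train.txt` and write each yielded string, followed by a newline, to `2_sentences_per_line.txt`.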
