Scripts used to download the entire Urban Dictionary dataset. The full dataset is pretty large, so I've split it into four Google Fusion Tables:
If you want to collect your own sample from Urban Dictionary, this repo includes a few scripts that can help you do just that.
Main entry downloader. Requires a word list to download entries for. Try grabbing the one from here.
$ npm install
# Pass in a word list file
$ node download.js data/a.txt
This will attempt to download the first 10 definitions for each word in the list into a file data/a.txt. Data is stored in NeDB databases, but you should be able to easily update download.js to output whatever format you need.
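If you would rather post-process the NeDB output than change download.js, the datafile can be read directly. The following is a minimal sketch, not part of this repo: it assumes the standard NeDB on-disk layout (one JSON document per line, later lines overriding earlier ones with the same `_id`, plus `$$deleted` and `$$indexCreated` bookkeeping lines). The function name `load_nedb` is made up for illustration.

```python
import json


def load_nedb(path):
    """Return the live documents stored in an NeDB datafile (hypothetical helper)."""
    docs = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                doc = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip a corrupt or partially written line
            if not isinstance(doc, dict):
                continue
            # Index bookkeeping lines are not documents.
            if "$$indexCreated" in doc or "$$indexRemoved" in doc:
                continue
            doc_id = doc.get("_id")
            if doc.get("$$deleted"):
                docs.pop(doc_id, None)  # a later line deleted this document
            else:
                docs[doc_id] = doc  # the last write for an _id wins
    return list(docs.values())
```

From there the resulting list of dictionaries can be written out in whatever format you need.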
Simple Python script used to turn the NeDB dataset from download.js into CSV:
$ python3 gen_csv.py data.db out.csv
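For reference, the CSV step can be sketched roughly as below. This is not the code in gen_csv.py, just an illustration of the idea under the assumption that each entry is a flat dictionary. The helper name `write_csv` and the sample field names are hypothetical. Definitions often contain commas and line breaks, so the sketch relies on Python's `csv` module to quote them.

```python
import csv


def write_csv(records, out_path):
    """Write a list of dicts to CSV, using the union of their keys as columns."""
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order
    # newline="" lets the csv module control line endings and quoting.
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
    return fieldnames
```

Records that lack a column get an empty cell rather than raising an error.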
Simple JavaScript script used to generate Markdown for entries. Used for character-level machine learning on Urban Dictionary entries.
$ node gen_md.js data.db urban.md
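To give a feel for what one entry might look like as Markdown, here is a small sketch in Python. The layout (heading, definition, quoted example) and the field names `word`, `definition`, and `example` are assumptions for illustration and may not match what gen_md.js actually emits.

```python
def entry_to_markdown(entry):
    """Render one entry dict as a small Markdown snippet (assumed layout)."""
    word = entry.get("word", "").strip()
    definition = entry.get("definition", "").strip()
    example = entry.get("example", "").strip()

    parts = [f"# {word}", "", definition]
    if example:
        # Prefix every line of the example so it renders as a block quote.
        parts += ["", "> " + example.replace("\n", "\n> ")]
    return "\n".join(parts) + "\n"
```

Concatenating the snippets for all entries gives a single text file suitable for character-level training.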
This is for research purposes. I'm not affiliated with Urban Dictionary.