The good news: The United States Sentencing Commission makes very detailed files available about sentencing in the US. 🎉
The bad news: They are in a crazy fixed-width format and include SAS and SPSS scripts to read them into those programs and those programs alone. 😱
So what can we do about it? Well, we can write a little converter that converts them all! These files will do that for you.
It turns out the data is small enough that you can upload it to GitHub! However, it's lzma compressed. Here's how you can decompress them.
These files are filled with tons of nulls. The typical file size compressed is around 10MB and uncompressed aroung 1.5GB. So so so many blank fields. Loading this directly into pandas on a small box will probably make your box sad. Instead, you should really look at the usecols
kwarg of pd.read_csv.
If you're using homebrew, just do
$ brew install xz
Then you can open the files by doing
$ unxz [FILENAME].xz
First install xz utilties
$ sudo apt update && sudo apt install xz-utils
Then you should be able to open files thus
$ xz -d [FILENAME].xz
Both 7zip and WinZip will open these files for you. Download and install them at your leisure.
This is script has only been tested with Python 3 and it assumes you have click
installed. But this is just for progress bars, so you can comment out those lines if you want.
First you'll need to get the data from the Sentencing Commission. The script getdata.sh
gives examples, and will itself download FY08-20's data files.
Next you'll need to point the script convert.py
at the file. For instance,
$ python3 convert.py data/opafy14nid.zip
This will leave you a file called data/opafy14nid.csv
in that folder.
Be warned, these files end up being quite large, so you may want to gzip or xzip them.
MIT