COVID-19 Decarceration and Public Health

CAPP 30254 Machine Learning Final Project

Overview

The problem

According to data collected by the New York Times, many of the largest coronavirus outbreaks are in carceral sites: correctional institutions, prison systems, and jails. The close quarters make it impossible for inmates and staff alike to follow physical distancing guidelines. Advocates across the country have rallied around a decarceration campaign, pressuring attorneys general and sheriffs’ offices to release prisoners for the sake of public health. At the same time, skeptics have raised questions about public safety and crime rates.

The question(s):

Can we predict the death rate and the rate of infection in the prison population without decarceration? Can we predict recidivism rates, with respect to violent crime, if people are decarcerated? And what qualitative analysis can we contribute to this urgent conversation?

Installation:

To install required packages, run the following command in your command-line interface:

pip install -r requirements.txt

To view the Jupyter notebooks for data preprocessing and analysis, run the following commands from the main project folder. They open Jupyter in your browser:

cd analysis
jupyter notebook

Directory:

Additional information on the files in each subfolder is listed below by policy area.

data: Raw and processed data (Public Safety and Public Health)
exploratory_analysis: Exploratory analysis on data (Public Safety and Public Health)
files: Database configuration and analysis (Public Safety)
analysis: Data cleaning, preprocessing, and analysis (Public Health)

North Carolina's Department of Public Safety (NCDPS) releases "all public information on all NC Department of Public Safety offenders convicted since 1972." Before running the following modules or notebooks, download all tables and store them as CSVs. (Note: this requires around 5 GB of storage.) Run ./ncdoc_parallel.sh to store the data in the preprocessed/ directory. For more information, see the ncdoc_data project by jtwalsh0.

Files (Public Safety)

config.py: contains CSV locations and constants (e.g., seed, CSV names)
main.py: builds, populates, and queries a SQLite3 database by calling the following modules
- create_db.py: establishes a connection and creates tables
- populate_db.py: inserts records into database tables
build_dataset.py: queries tables in the database, constructs flags and additional features, and outputs a CSV. Also contains functions to prepare data for analysis and to conduct it. Calls the following modules:
- query_db.py: executes SQL queries on the database
- pipeline.py: contains functions to perform imputation, one-hot encoding, etc.
classification.py: runs classification models and outputs precision-recall curves and the most important features. Also contains a function to predict on active sentences using the best model.
query_and_build.ipynb: calls functions in build_dataset.py to output datasets as CSVs.
models_1994.ipynb and models_2008.ipynb: find the best model(s) and return evaluation metrics for data trimmed to start in 1994 and in 2008, respectively
predict_active.ipynb: applies all of our models to all of our datasets (1994 and 2008, different target outcomes).
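
The build, populate, and query steps described for main.py can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not the project's code: the table name sentences, its columns, and the sample rows are hypothetical, the three function names only mirror the module names listed above, and an in-memory database stands in for the real one.

```python
import sqlite3


def create_db(conn):
    """Create a hypothetical sentences table (schema is illustrative only)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS sentences (
            offender_id TEXT NOT NULL,
            start_date  TEXT NOT NULL,
            end_date    TEXT
        )
        """
    )


def populate_db(conn, rows):
    """Insert records with a parameterized statement, then commit."""
    conn.executemany(
        "INSERT INTO sentences (offender_id, start_date, end_date) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def query_db(conn, sql, params=()):
    """Run a query and return all matching rows."""
    return conn.execute(sql, params).fetchall()


# Made-up sample records; the real data comes from the NCDPS CSVs.
sample_rows = [
    ("A1", "2009-01-15", "2010-06-30"),
    ("A1", "2012-03-01", None),  # no end date: sentence still active
    ("B2", "2011-05-20", "2013-05-19"),
]

conn = sqlite3.connect(":memory:")
create_db(conn)
populate_db(conn, sample_rows)
counts = query_db(
    conn,
    "SELECT offender_id, COUNT(*) FROM sentences "
    "GROUP BY offender_id ORDER BY offender_id",
)
print(counts)  # [('A1', 2), ('B2', 1)]
```

Parameterized statements (the ? placeholders) keep the inserts safe when values come straight from CSV files.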

Data (Public Safety)

coding_offenses.xls: categorizes offense labels from the NCDPS based on extent of harm on a scale from 1 to 5, where 1 is the least likely to cause harm and 5 is the most likely
dataset_main_active3.csv: pre-processed output from build_all() in build_dataset.py, where recidivism is defined as reincarceration within three years of release. The file is too large to be pushed to GitHub but can be recreated using the information above
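
The recidivism definition above (reincarceration within three years of release) can be sketched as a flag computed over one person's sentence history. This is only an illustration of the definition, not the logic in build_dataset.py: the function names, the tuple layout, and the sample dates are assumptions made for the example.

```python
from datetime import date


def add_years(d, years):
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def recidivism_flags(sentences, window_years=3):
    """Flag each completed sentence by whether the next one starts within the window.

    `sentences` is a list of (start_date, release_date) tuples for one person;
    release_date is None for an active sentence.
    """
    ordered = sorted(sentences, key=lambda s: s[0])
    flags = []
    for i, (_start, release) in enumerate(ordered):
        if release is None:
            continue  # still incarcerated: no release to measure from
        if i + 1 == len(ordered):
            flags.append(False)  # no later sentence on record
            continue
        next_start = ordered[i + 1][0]
        flags.append(release <= next_start <= add_years(release, window_years))
    return flags


# Hypothetical history for one person.
history = [
    (date(2005, 1, 10), date(2007, 1, 10)),
    (date(2009, 6, 1), date(2010, 6, 1)),  # begins under 3 years after the 2007 release
    (date(2015, 1, 1), None),              # begins over 3 years after the 2010 release
]
flags = recidivism_flags(history)
print(flags)  # [True, False]
```

The sketch treats a release with no later sentence as non-recidivism; it does not account for releases that fall less than three years before the end of the data.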

Analysis (Public Health)

clean_data.py: functions to transform data for machine learning, including one-hot encoding and normalization.
prison_conditions_wrangle.py: functions to clean and wrangle data from the UCLA COVID in Prisons dataset and the Bureau of Justice Statistics
build_prison_conditions_df.py: functions to build dataframes on prison capacity, prison population numbers, COVID-19 related social distancing policies in prisons, and mitigation policies to address the adverse effects of isolation on prisoners.
ph_analysis.py: functions to run a series of ML models on the COVID in Prisons dataset. Functions include temporally splitting the data, running a temporal cross-validation grid search to tune hyperparameters, training and testing several models, and selecting and evaluating the best predictors of COVID-19 cases in prisons.
prison_data_processing.ipynb: a Jupyter notebook that walks through building the COVID in Prisons dataset and running the machine learning analysis.
ph_plotting.py: functions to plot related public health data and cross-validation
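
The temporal splitting mentioned for ph_analysis.py can be sketched as an expanding-window split, in which each fold trains only on dates that precede its test window. This is a minimal sketch under assumptions, not the project's implementation: the function name temporal_splits, its parameters, and the four weeks of daily dates are hypothetical.

```python
from datetime import date, timedelta


def temporal_splits(dates, n_splits, test_days):
    """Yield (train_idx, test_idx) pairs with all training dates before the test window.

    Test windows are consecutive blocks of `test_days` days ending at the
    latest date; the training set grows with each fold.
    """
    last = max(dates)
    for k in range(n_splits, 0, -1):
        test_end = last - timedelta(days=(k - 1) * test_days)
        test_start = test_end - timedelta(days=test_days - 1)
        train_idx = [i for i, d in enumerate(dates) if d < test_start]
        test_idx = [i for i, d in enumerate(dates) if test_start <= d <= test_end]
        if train_idx and test_idx:  # skip folds with nothing to train or test on
            yield train_idx, test_idx


# Four weeks of made-up daily observations.
dates = [date(2020, 4, 1) + timedelta(days=i) for i in range(28)]
splits = list(temporal_splits(dates, n_splits=3, test_days=7))
sizes = [(len(train), len(test)) for train, test in splits]
print(sizes)  # [(7, 7), (14, 7), (21, 7)]
```

Splitting by date rather than at random keeps later observations out of the training data, so each fold resembles forecasting forward in time. A hyperparameter grid search could then score each candidate on these folds.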

Data (Public Health)

marshall_covid_cases.csv: COVID-19 cases downloaded from the Marshall Project's COVID Tracker
may_19:
- ucla_0519_COVID19_related_prison_releases.csv: from the UCLA Law COVID-19 Behind Bars Data Project; tracks the number of residents released in prison population reduction efforts
- ucla_0519_jail_prison_condition_policies.csv: from the UCLA Law COVID-19 Behind Bars Data Project; descriptive summaries of ongoing policies affecting carceral conditions
- ucla_0519_jail_prison_confirmed_cases_deaths.csv: from the UCLA Law COVID-19 Behind Bars Data Project; tracks viral spread, screening procedures, and testing
- ucla_0519_visitation_policy_by_state.csv: from the UCLA Law COVID-19 Behind Bars Data Project; tracks visitation suspension policies and offerings of compensatory remote access

Team:

Authors

We also want to acknowledge and thank the course staff of CAPP 30254 (Nick Feamster, Felipe Alamos, Tammy Glazer, Alec Macmillen, Erika Tyagi, and Jonathan Tan) for their feedback and encouragement.
