
AIHack18Challenge

My description and mock solution of a challenge for AIHack 2018, a student hackathon with 300 expected attendees.

Challenge

The goal of this challenge is to look for interesting correlations within the dataset. With features describing everything from average income to education, age distribution and living arrangements, there are a lot of opportunities, and we want you guys to find insights that you think could be used to improve the lives of the people living in the area. Specifically, we want you to pick one or two variables from the dataset that you think would be valuable to predict, and then train a model to predict those variables using either the entire dataset or subsets of it.

Dataset

The dataset is 23123 × 7730: each of the 23123 instances is a block group, a statistical unit used by the US Census Bureau, labeled with 7730 features. The features describe various socioeconomic traits of each block group and are rooted in a 10-question questionnaire, issued by the US Census Bureau, that every American citizen should have answered. The questions ask for sex, age, gender, annual income, civil status, education and employment status, and the dataset restructures these answers into anonymous features: averages of some answers, as well as counts of people with certain characteristics.

One variable could be "PER CAPITA INCOME IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS): Total: Total population -- (Estimate)". It has a corresponding variable with the same name, except that it ends in "(Margin of Error)"; as the name suggests, this gives the margin of error for that specific feature. So for every actual feature there are two variables: an estimate and a margin of error. (It might be a good idea to remove all margin-of-error columns before you try to fit your models.)
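One way to drop the margin-of-error columns, as suggested above, is to filter on the column-name suffix. This is a minimal sketch assuming every margin-of-error column name ends exactly in "(Margin of Error)" and its partner ends in "(Estimate)" with an otherwise identical name, as in the example variable; the helper names are hypothetical and only the standard library is used.

```python
import csv
import io

MOE_SUFFIX = "(Margin of Error)"
EST_SUFFIX = "(Estimate)"


def estimate_columns(header):
    """Keep every column except margin-of-error ones (ID columns are kept too)."""
    return [name for name in header if not name.strip().endswith(MOE_SUFFIX)]


def pair_estimates_with_moe(header):
    """Map each estimate column to its margin-of-error partner (None if missing).

    Assumes partners share everything before the trailing suffix.
    """
    moe_by_stem = {
        name.strip()[: -len(MOE_SUFFIX)]: name
        for name in header
        if name.strip().endswith(MOE_SUFFIX)
    }
    pairs = {}
    for name in header:
        stripped = name.strip()
        if stripped.endswith(EST_SUFFIX):
            stem = stripped[: -len(EST_SUFFIX)]
            pairs[name] = moe_by_stem.get(stem)
    return pairs


def drop_moe_columns(csv_text):
    """Read CSV text and return rows (dicts) without margin-of-error columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    keep = estimate_columns(reader.fieldnames)
    return [{name: row[name] for name in keep} for row in reader]
```

If the estimate/margin-of-error pairing holds for every feature, this keeps roughly half of the 7730 columns. With pandas, a similar filter would be `df.loc[:, ~df.columns.str.endswith("(Margin of Error)")]`.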