- Review the full story of building an ML model for classification or regression.
- Understand how data is formatted and downloaded including CSV and JSON.
- Review Neural Network architecture
- A Single Perceptron
- Activation Functions
- Diagram the components of a simple multi-layered perceptron (XOR)
- Review the terminology of the training process
- Training
- Learning Rate
- Epochs
- Batch size
- Loss
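To make the terms above concrete, here is a minimal sketch in plain JavaScript (not ml5.js) of a single neuron with a sigmoid activation learning the logical AND function. The names (`sigmoid`, `predict`, `meanLoss`, `train`), the toy data, and the hyperparameter values are all assumptions chosen for illustration; they are not taken from the course materials.

```javascript
// Toy data (an assumption for illustration): the logical AND function.
const inputs = [[0, 0], [0, 1], [1, 0], [1, 1]];
const targets = [0, 0, 0, 1];

// Activation function: squashes any number into the range (0, 1).
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// A single neuron: weighted sum of the inputs plus a bias, then activation.
function predict(model, x) {
  const sum = x.reduce((acc, xi, i) => acc + xi * model.weights[i], model.bias);
  return sigmoid(sum);
}

// Loss: mean binary cross-entropy over the whole dataset (lower is better).
function meanLoss(model) {
  let total = 0;
  inputs.forEach((x, i) => {
    const p = predict(model, x);
    total -= targets[i] * Math.log(p) + (1 - targets[i]) * Math.log(1 - p);
  });
  return total / inputs.length;
}

function train(model, { learningRate, epochs, batchSize }) {
  const history = [];
  for (let epoch = 0; epoch < epochs; epoch++) {
    // One epoch = one full pass over the data, one batch at a time.
    for (let start = 0; start < inputs.length; start += batchSize) {
      const end = Math.min(start + batchSize, inputs.length);
      const gradW = model.weights.map(() => 0);
      let gradB = 0;
      for (let i = start; i < end; i++) {
        const error = predict(model, inputs[i]) - targets[i];
        inputs[i].forEach((xi, j) => { gradW[j] += error * xi; });
        gradB += error;
      }
      const n = end - start;
      // The learning rate scales how far each update moves the weights.
      model.weights = model.weights.map((w, j) => w - (learningRate * gradW[j]) / n);
      model.bias -= (learningRate * gradB) / n;
    }
    history.push(meanLoss(model)); // record the loss once per epoch
  }
  return history;
}

const model = { weights: [0, 0], bias: 0 };
const history = train(model, { learningRate: 0.5, epochs: 2000, batchSize: 2 });
const rounded = inputs.map((x) => Math.round(predict(model, x)));
console.log('loss after first and last epoch:', history[0], history[history.length - 1]);
console.log('rounded predictions:', rounded);
```

The loss should fall over the epochs, and the rounded predictions should match the AND targets. A single neuron like this cannot learn XOR, which is why the XOR example needs a hidden layer.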
- Tabular Data (CSV)
- Tabular Data from Coding Train "Data + APIs" tutorial (covers a lot of extra material; the first few minutes are probably the most relevant)
- Tabular Data Coding Train Processing tutorial (code is not JS!)
- JSON Data
- What is JSON Part 1 - Coding Train p5.js tutorial
- What is JSON Part 2 - Coding Train p5.js tutorial
- JSON Data from Coding Train "Data + APIs" tutorial (same as above, lots of extra unrelated stuff here).
- Perceptron Video, Perceptron p5.js code
- Perceptron Slides
- Neural Networks (Nature of Code Chapter 10)
- Full Coding Train Video Series using TensorFlow.js; videos 7.1-7.5 cover how the data was collected and cleaned
- Crowdsourcing Colors website, Crowdsourcing Colors source code
- Consider how to frame the problem and collect data.
- Understand critical questions to ask (e.g. Who is this for? What’s the context?)
- Understand the questions to ask about sourcing and collecting data.
- Learn how to prepare a data set, including how to normalize and properly format it.
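As one possible illustration of the normalization step, the sketch below applies min-max scaling, which maps every value in a column to the range 0 to 1. The rows, the column names, and the helper functions (`normalizeColumn`, `normalizeRows`) are hypothetical and invented for this example.

```javascript
// Hypothetical rows (invented for illustration): daily steps and hours of sleep.
const healthRows = [
  { steps: 2000, sleep: 6 },
  { steps: 6000, sleep: 7 },
  { steps: 10000, sleep: 8 },
];

// Min-max normalization: map each value in a column to the range 0..1.
function normalizeColumn(data, key) {
  const values = data.map((row) => row[key]);
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min;
  // If every value is identical there is nothing to scale, so use 0.
  return values.map((v) => (range === 0 ? 0 : (v - min) / range));
}

// Normalize several columns and return new row objects (the input is not modified).
function normalizeRows(data, keys) {
  const columns = keys.map((key) => normalizeColumn(data, key));
  return data.map((row, i) => {
    const out = {};
    keys.forEach((key, k) => { out[key] = columns[k][i]; });
    return out;
  });
}

const normalized = normalizeRows(healthRows, ['steps', 'sleep']);
console.log(normalized);
```

Scaling like this keeps a column with large numbers (steps) from dominating a column with small numbers (hours of sleep) during training.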
- Feminist Data Set by Caroline Sinders
- Gender Shades: How well do IBM, Microsoft, and Face++ AI services guess the gender of a face? by Joy Buolamwini and Timnit Gebru
- Find a dataset that interests you and link to it from the Assignment 5 wiki. Some ideas:
- Something you find online. For example, take a look at Kaggle or this list of datasets compiled last year.
- A dataset that you collect yourself or that is already being collected about you. For example, personal data like steps taken per day, browser history, minutes spent on your mobile device, sensor readings, and more.
- A dataset that you collect by crowdsourcing. One way to approach this is to create a Google Form with questions and ask friends and fellow students to complete it.
- Consider the following questions related to your dataset:
- Who collected and compiled it?
- Why was it collected?
- How was it collected?
- Describe the data: What are the dimensions? What are the variables and their data types? What can the first 5-20 rows tell us?
- Is there missing, incorrect, or otherwise problematic data?
- For whom is this data accurate or useful? What is this data unrepresentative of? (Who is missing and left out of the data?)
- Knowing what you know now about machine learning, what will a model trained on this data help you do? Are there alternative (non-machine learning) methods you could use instead?
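A few of the descriptive questions above (dimensions, variables and data types, missing values, the first rows) can be answered with a short script. The sketch below is one rough way to do it in plain JavaScript. The CSV text, the column names, and the helpers `parseCSV` and `describe` are hypothetical, and the parser assumes a simple file with no quoted fields or commas inside values.

```javascript
// Hypothetical CSV text (invented for illustration); note the empty cell in the second data row.
const csvText = `day,steps,mood
Mon,4200,good
Tue,,tired
Wed,8100,good`;

// Naive CSV parsing: split on newlines and commas.
// Assumption: no quoted fields and no commas inside values.
function parseCSV(text) {
  const [headerLine, ...lines] = text.trim().split('\n');
  const headers = headerLine.split(',');
  return lines.map((line) => {
    const cells = line.split(',');
    const row = {};
    headers.forEach((h, i) => { row[h] = cells[i] === undefined ? '' : cells[i]; });
    return row;
  });
}

// Summarize dimensions, a guessed type per column, and missing values per column.
function describe(rows) {
  const columns = Object.keys(rows[0]);
  const summary = { rowCount: rows.length, columnCount: columns.length, columns: {} };
  for (const col of columns) {
    const present = rows.map((r) => r[col]).filter((v) => v !== '');
    const allNumeric = present.length > 0 && present.every((v) => !Number.isNaN(Number(v)));
    summary.columns[col] = {
      type: allNumeric ? 'number' : 'string',
      missing: rows.length - present.length,
    };
  }
  return summary;
}

const records = parseCSV(csvText);
const summary = describe(records);
console.log(JSON.stringify(records.slice(0, 5), null, 2)); // peek at the first rows
console.log(JSON.stringify(summary, null, 2));
```

The parsed `records` array is the same table expressed as a list of objects, which is the shape the data would take in a JSON file.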
- Pick one of the following three "coding exercise" options:
- Option #1: Augment Lydia Jessup's 311 Calls ml5.js example. You could add an additional input field, customize the interface, or change other parameters of `ml5.neuralNetwork()`.
- Option #2: Train a machine learning model in ml5.js with the dataset you picked for part 1 of the assignment.
- Option #3: Continue working on your sketch from Assignment 4. Document improvements or changes you made.
- Complete a blog post with a description of your dataset and documentation of your coding exercise. Link to it from the homework wiki.