This project can be developed locally in two main ways:
- By installing all code, tools, data, etc. locally on your machine (this is currently the most common workflow).
- By installing Docker, then installing all code, tools, etc. into a Docker container and treating that container as your isolated development environment.
Why do we need two ways of doing things? The short answer is that we don't, but each approach has pros and cons. This document by IBM covers containerization and why someone would consider using it (it's a bit long, but the following table hits on some of the differences). The introductory sections of this video are quite helpful as well: https://www.youtube.com/watch?v=KFyRLxiRKAc
Aspect | Without Containers | With Containers |
---|---|---|
Largely Avoids "Works On My Machine" | No | Yes |
Complexity | Low | Medium |
Getting Started | Fast | Medium-Slow then becomes Fast |
Portability | More Work | Low Work |
Consistency | High Work | Low Work |
Agility | Low | High |
Isolation | Low | High |
For me (Collin), I chose dockerized development because I can run different versions of software in isolation.
Once you've chosen which style of development you would like to pursue, go to Getting Started - Without Docker or Getting Started - With Docker, whichever matches your needs.
Clone this repo and then install the project package:
cd aafm
pip install -e .
This section assumes you're using VS Code for development.
- Go through this document or video to familiarize yourself with containerized development: https://code.visualstudio.com/docs/remote/containers or https://www.youtube.com/watch?v=KFyRLxiRKAc
- Add the Remote - Containers extension to VS Code (https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers)
- Clone this repository
- Open the root folder (aafm) in a container. See: https://code.visualstudio.com/docs/remote/containers#_quick-start-open-an-existing-folder-in-a-container or the video in step 1.
- The container has been configured to include Python 3.8 and Jupyter, so both will work out of the box without further configuration.
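Once the folder is open in the container, a quick sanity check from a terminal or notebook cell can confirm the environment. The sketch below uses only the standard library; the function name `check_environment` and the choice of `jupyter_core` and `ipykernel` as modules to probe are assumptions made for illustration, not part of this project.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 8), modules=("jupyter_core", "ipykernel")):
    """Return a dict describing whether the interpreter and modules look usable."""
    report = {"python_ok": sys.version_info[:2] >= tuple(min_version)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

If anything is reported as missing, rebuilding the container is usually a reasonable first step before installing packages by hand.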
ETL (Extract Transform Load) basically just means data prep. Load some data from somewhere, transform it into a useful shape, then store it somewhere else.
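As a rough illustration of those three stages, here is a minimal, self-contained sketch using only the standard library. The column names (`date`, `value`) and the sample rows are invented for the example; they do not reflect this project's data or the logic in the notebooks.

```python
import csv
import io


def extract(text):
    """Extract: parse raw CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing value and convert the rest to float."""
    cleaned = []
    for row in rows:
        if row["value"].strip() == "":
            continue
        cleaned.append({"date": row["date"], "value": float(row["value"])})
    return cleaned


def load(rows):
    """Load: write the cleaned rows out as CSV text (a stand-in for a file or database)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["date", "value"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical input: the second row has no value and is dropped by transform().
RAW = "date,value\n2020-01-01,1.5\n2020-01-02,\n2020-01-03,2\n"

if __name__ == "__main__":
    print(load(transform(extract(RAW))))
```

Keeping each stage as its own function makes it easy to run and inspect them one at a time, much like stepping through notebook cells.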
- From Microsoft Teams, navigate to Files, then the Data Folder, then download data.7z
  - You will need a utility (or library) such as 7-Zip to decompress the file.
  - Why 7z? Because it has much better compression (LZMA2) than zip (DEFLATE).
- Extract the contents of data.7z into this project at ./data/raw/daily/
- Open ./notebooks/extract.ipynb using Jupyter Notebooks or VS Code Notebooks.
  - If you're using our Docker Container for development, the right tools have already been added. Simply open extract.ipynb in VS Code and wait for the proper UI to load.
- Run each notebook cell on its own to see how they work, or run them all.
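On the 7z question raised in the steps above: the difference between the two algorithms can be observed with Python's standard library, where `zlib` implements DEFLATE and `lzma` (in its default `.xz` container) uses an LZMA2 filter. This is only a sketch on synthetic rows invented for the example. It does not read data.7z, and the actual ratio depends on the data.

```python
import lzma
import random
import zlib


def make_sample(rows=20000, seed=0):
    """Build synthetic CSV-like bytes (hypothetical columns, not project data)."""
    rng = random.Random(seed)
    lines = ["day,station,reading"]
    for i in range(rows):
        lines.append(f"{i % 365},station_{rng.randint(1, 50)},{rng.uniform(0, 100):.2f}")
    return "\n".join(lines).encode("utf-8")


def compare(data):
    """Return the raw size and the compressed sizes for DEFLATE and LZMA2."""
    deflated = zlib.compress(data, 9)
    xz = lzma.compress(data)
    # Both round trips must reproduce the input exactly.
    assert zlib.decompress(deflated) == data
    assert lzma.decompress(xz) == data
    return {"raw": len(data), "deflate": len(deflated), "lzma2": len(xz)}


if __name__ == "__main__":
    for name, size in compare(make_sample()).items():
        print(f"{name}: {size} bytes")
```

Running it prints the three sizes so you can compare them on your own machine; swapping in a sample of real rows would give a more representative picture.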