NOTE: The suggested commands are for macOS.
Install Visual Studio Code here: https://code.visualstudio.com/
To add VS Code to your PATH so it can be opened from the terminal, follow the instructions here: https://www.freecodecamp.org/news/how-to-open-visual-studio-code-from-your-terminal/
Use the Homebrew package manager to install the requirements before setting up the repo. Visit https://brew.sh for installation instructions, then enter the following command:
brew install python@3.10
Python can also be installed through pyenv, which is itself installed through brew.
brew install pyenv
# Replace x with the 3.10 patch version you want to use. You can list all available 3.10 versions by executing
# pyenv install --list | grep 3.10
pyenv install 3.10.x
pyenv global 3.10.x
pyenv local 3.10.x
If you are using zsh, execute the following to make sure the python command will link properly to the version installed through pyenv:
echo 'eval "$(pyenv init --path)"' >> ~/.zprofile
echo 'eval "$(pyenv init -)"' >> ~/.zshrc
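After restarting the shell, a quick check from Python can confirm that the python command now resolves to the intended interpreter. The following is a minimal sketch rather than part of the repo: the helper name `is_expected_version` is hypothetical, and the 3.10 target is taken from the install commands above.

```python
import sys


def is_expected_version(version_info=None, expected=(3, 10)):
    """Return True if the major.minor version matches `expected`."""
    # Default to the interpreter that is running this script.
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    # Show which interpreter binary is being run and its version.
    print("Interpreter:", sys.executable)
    print("Version:", sys.version.split()[0])
    print("Matches 3.10:", is_expected_version())
```

If pyenv is wired up correctly, the interpreter path printed should sit under the pyenv directory rather than pointing at the system Python.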
git clone [email protected]:connor-hilll/short_stack_ml.git
Follow these instructions and run the commands from the root of the repo.
- Create Virtualenv:
  python3 -m venv venv
- Activate venv:
  . venv/bin/activate
- Install requirements:
  pip3 install -r requirements.txt
- Install new kernel:
  ipython kernel install --user --name=short_stack_ml
- Open in VS Code:
  code .
  - Note: VS Code must be added to your PATH. Reference the "Add VS Code to your PATH" section above.
- Ensure the IPython Notebook is using the virtual environment:
  - In the energy_analysis.ipynb file, in the top right corner, it should state that the kernel is venv (Python 3.x.x)
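To double-check from inside the notebook that the selected kernel is running out of the virtual environment, a cell along the following lines can help. This is a sketch, not part of the repo: the helper name `in_virtualenv` is hypothetical, and it relies on the standard behaviour that `sys.prefix` differs from `sys.base_prefix` inside an environment created with python3 -m venv.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


# Print the interpreter path so it can be compared with the repo's venv folder.
print("Interpreter:", sys.executable)
print("Running inside a venv:", in_virtualenv())
```

If the kernel is set up as described, the interpreter path should point inside the venv directory at the root of the repo.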
The main packages used are as follows:
- pandas - data analysis and manipulation tool
- NumPy - fundamental package for scientific computing
- statsmodels - provides classes and functions for the estimation of many different statistical models
- scikit-learn - simple and efficient tools for predictive data analysis
- Matplotlib - plotting library
- IPython - provides a rich toolkit to help you make the most of using Python interactively
Some of the libraries overlap, such as scikit-learn and statsmodels. However, they each serve a specific purpose.
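After installing the requirements, it can be useful to confirm that these packages are actually available in the active environment. The sketch below uses the standard library's `importlib.metadata`. The helper name `installed_versions` is hypothetical, and the distribution names listed are the usual PyPI names for these packages, which is an assumption about what requirements.txt contains.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the main packages listed above.
MAIN_PACKAGES = [
    "pandas",
    "numpy",
    "statsmodels",
    "scikit-learn",
    "matplotlib",
    "ipython",
]


def installed_versions(packages=MAIN_PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment.
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```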
The weather data is stored in ./data/weather_data.csv
NOTE: We are specifically interested in the Philadelphia Airport data
The weather data is free from NOAA's National Centers for Environmental Information (NCEI) and can be found here: https://www.ncei.noaa.gov/
The energy usage data is stored in ./data/residential_usage.csv
This is energy data from a single-family residential property in Southern New Jersey. Unfortunately, it was quite difficult to find free, anonymous multifamily data.
The energy data can be found here: https://data.mendeley.com/datasets/rfnp2d3kjp
The cleaned data is a combined version of the two datasets. CDD and HDD stand for Cooling Degree Days and Heating Degree Days respectively. These values are commonly used in energy calculations. Please reference this weather.gov link for more information.
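For readers new to degree days, the calculation can be sketched as follows. This assumes the common US convention of a 65 °F base temperature and a daily mean taken as the average of the daily high and low. The source does not state which base or averaging method the cleaned data uses, and the function names here are hypothetical.

```python
BASE_TEMP_F = 65.0  # assumed base temperature (common US convention)


def daily_mean(t_max, t_min):
    """Approximate the daily mean temperature from the daily high and low."""
    return (t_max + t_min) / 2.0


def heating_degree_days(t_mean, base=BASE_TEMP_F):
    """Degrees by which the daily mean falls below the base (0 if warmer)."""
    return max(0.0, base - t_mean)


def cooling_degree_days(t_mean, base=BASE_TEMP_F):
    """Degrees by which the daily mean rises above the base (0 if cooler)."""
    return max(0.0, t_mean - base)


# A cold day: high 40 °F, low 30 °F gives a mean of 35 °F.
cold = daily_mean(40, 30)
print(heating_degree_days(cold), cooling_degree_days(cold))  # 30.0 0.0

# A hot day: high 90 °F, low 70 °F gives a mean of 80 °F.
hot = daily_mean(90, 70)
print(heating_degree_days(hot), cooling_degree_days(hot))  # 0.0 15.0
```

On any given day at most one of the two values is non-zero, which is why HDD tends to track heating load and CDD tends to track cooling load.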