This project utilises open data from Data.gov.sg to build several machine learning (ML) models that predict HDB resale prices.
Live project
The main focus of this project is to complete a full cycle of:
- Extract, Transform, Load (ETL)
- ML model building
- Deployment
- Live dashboarding
The project uses a large dataset (>40k data points) covering all Singapore HDB resale transactions in 2022 and 2023, together with their geodata.
The following steps were taken in the project (all steps can be found in the Jupyter Notebook .ipynb files):
- Data was obtained through REST API calls to Data.gov.sg, followed by data wrangling
- Feature creation and selection (using SelectKBest with mutual information, and L1 regularisation)
- Hyperparameter tuning (randomised search cross-validation)
- Model selection, testing both standard and ensemble models (gradient boosting, random forest)
- Front-end web application development with Flask and Bootstrap 5
- Dashboarding (Tableau & Streamlit)
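The first step, pulling records over the REST API, can be sketched as a paging loop. This is a minimal sketch under assumptions, not the project's notebook code: it assumes the CKAN-style `datastore_search` endpoint with `limit`/`offset` paging and a response shaped like `{"result": {"records": [...], "total": N}}`. `RESOURCE_ID` is a hypothetical placeholder and `fetch_all_records` is a hypothetical helper. The HTTP call is injectable so the paging logic can be checked offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

# Assumption: CKAN-style datastore endpoint; confirm against the Data.gov.sg API docs.
BASE_URL = "https://data.gov.sg/api/action/datastore_search"
RESOURCE_ID = "<resale-flat-prices-resource-id>"  # hypothetical placeholder


def http_get_json(url: str) -> Dict:
    """Perform a GET request and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_all_records(
    resource_id: str,
    page_size: int = 1000,
    get_json: Callable[[str], Dict] = http_get_json,
) -> List[Dict]:
    """Page through the datastore with limit/offset until every record is read."""
    records: List[Dict] = []
    offset = 0
    while True:
        query = urllib.parse.urlencode(
            {"resource_id": resource_id, "limit": page_size, "offset": offset}
        )
        result = get_json(f"{BASE_URL}?{query}")["result"]
        page = result["records"]
        records.extend(page)
        offset += len(page)
        # Stop on an empty page, or once the reported total (if any) is reached.
        total = result.get("total")
        if not page or (total is not None and offset >= total):
            break
    return records
```

In a notebook, `pandas.DataFrame(fetch_all_records(RESOURCE_ID))` would turn the list of record dicts into a table ready for wrangling.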
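The feature selection, tuning and model selection steps could be wired together roughly as follows with scikit-learn. This sketch runs on synthetic data from `make_regression`, not on the HDB dataset. It reads "KBest" as `SelectKBest` and "Random Cross Validation" as `RandomizedSearchCV`, which are assumptions. The candidate `k` values, hyperparameter ranges, Lasso `alpha` and the helper `tune` are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_selection import SelectFromModel, SelectKBest, mutual_info_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the engineered features (hypothetical data).
X, y = make_regression(n_samples=500, n_features=20, n_informative=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# L1 regularisation: features whose Lasso coefficient shrinks to zero are dropped.
# (Real features would normally be standardised first; these are already unit-scale.)
l1_selector = SelectFromModel(Lasso(alpha=1.0, max_iter=10_000)).fit(X_train, y_train)
l1_kept = np.flatnonzero(l1_selector.get_support())


def tune(model, param_distributions):
    """Randomised search over SelectKBest's k plus the model's hyperparameters."""
    pipe = Pipeline([
        ("select", SelectKBest(score_func=mutual_info_regression)),
        ("model", model),
    ])
    search = RandomizedSearchCV(
        pipe,
        param_distributions={"select__k": [5, 10, 15, 20], **param_distributions},
        n_iter=5,
        cv=3,
        scoring="neg_mean_absolute_error",
        random_state=0,
    )
    return search.fit(X_train, y_train)


candidates = {
    "random_forest": tune(
        RandomForestRegressor(random_state=0),
        {"model__n_estimators": [50, 100], "model__max_depth": [None, 10]},
    ),
    "gradient_boosting": tune(
        GradientBoostingRegressor(random_state=0),
        {"model__n_estimators": [50, 100], "model__learning_rate": [0.05, 0.1]},
    ),
}

# Model selection: keep the search with the best cross-validated score
# (neg MAE, so higher is better), then measure MAE on the held-out split.
best_name = max(candidates, key=lambda name: candidates[name].best_score_)
best_search = candidates[best_name]
test_mae = float(np.mean(np.abs(best_search.predict(X_test) - y_test)))
```

Placing `SelectKBest` inside the `Pipeline` means the mutual information scores are recomputed on each training fold, so the validation fold never leaks into feature selection.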
Updates: 20240306
- Refactored the code into modules
- The model and other objects are now loaded once at app startup, instead of on every prediction
- Rounded up predictions
- Configured logging for individual modules
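The load-once and per-module logging updates might look like the module below. This is a stdlib-only sketch, not the project's code. `load_artifacts`, `get_artifacts`, `configure_logging` and the idea of a pickled "bundle" of model plus related objects are all hypothetical names and assumptions.

```python
import logging
import pickle
from pathlib import Path
from typing import Any, Dict, Optional

# One named logger per module, so each log line shows which module emitted it.
logger = logging.getLogger(__name__)

# Hypothetical in-memory cache for the model and related objects.
_ARTIFACTS: Optional[Dict[str, Any]] = None


def load_artifacts(path: Path) -> Dict[str, Any]:
    """Unpickle the bundle on the first call only; later calls reuse it."""
    global _ARTIFACTS
    if _ARTIFACTS is None:
        logger.info("Loading artifacts from %s", path)
        with open(path, "rb") as fh:
            _ARTIFACTS = pickle.load(fh)
    return _ARTIFACTS


def get_artifacts() -> Dict[str, Any]:
    """Return the already-loaded bundle; prediction code never reads the disk."""
    if _ARTIFACTS is None:
        raise RuntimeError("load_artifacts() must be called at app startup")
    return _ARTIFACTS


def configure_logging(level: int = logging.INFO) -> None:
    """Set up the root handler; modules keep logging through their own named loggers."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
```

In a Flask app, `configure_logging()` and `load_artifacts(...)` would be called once when the app object is created, and each request handler would call `get_artifacts()` instead of reopening the file.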