
Math for Machine Learning Final Project: Concept of distribution drift in Divvy bike ridership data


klinkoberstar/pedal_pals


Pedal Pals

Contributors: Jackie Glasheen, Kathryn Link-Oberstar, Jennifer Yeaton

We explore the concept of distribution drift in Divvy bike ridership data spanning 2014 through 2019. Using a random sample of one million trips (approximately 5% of the dataset), we examine trip duration trends, revealing a notable shift in distribution patterns between 2014-2017 and 2018-2019, especially during summer months. Our analysis incorporates several machine learning models, including K-Nearest Neighbors (KNN), Random Forest, and Multi-Layer Perceptrons (MLP), to predict trip durations. The models are evaluated using Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE), with adjustments for recent trends to address the observed distribution drift. The MLP model demonstrates superior performance, suggesting its effectiveness in handling high-dimensional data and adapting to non-linear patterns in the presence of distribution drift. Our findings highlight the importance of accounting for temporal changes in data distributions when developing predictive models.
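As a minimal sketch of the three evaluation metrics named above (MSE, MAE, and RMSE), the following pure-Python snippet computes them on hypothetical trip durations in minutes; the data and function names are illustrative, not taken from the project code:

```python
import math

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared residuals
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean Absolute Error: average absolute residual
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root Mean Squared Error: square root of MSE, in the target's units (minutes)
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical actual and predicted trip durations (minutes)
actual    = [12.0, 25.0, 8.0, 40.0]
predicted = [10.0, 27.0, 9.0, 35.0]

print(mse(actual, predicted))   # 8.5
print(mae(actual, predicted))   # 2.5
print(round(rmse(actual, predicted), 2))
```

RMSE is often preferred for reporting because, unlike MSE, it is expressed in the same units as the target variable (here, minutes of trip duration).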

Read our final paper HERE

This project was completed as part of coursework for Mathematical Foundations of Machine Learning (Computer Science 35300) at the University of Chicago.
