andreweryan/Microsoft-GeoLife

Microsoft-GeoLife

This repo revisits and significantly refreshes some work from my M.S. at GMU on trajectory mining and urban analytics with the Microsoft GeoLife dataset.

The dataset consists of ~25M GPS points across 18,670 PLT/CSV files for 182 anonymized individuals.

Merging all the data into a single Polars DataFrame with minimal preprocessing takes approximately 30 seconds with multiprocessing on a 12-core (24-thread) AMD Ryzen 9 3900X with 32 GB RAM.
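As a rough illustration of the parse-and-merge step, here is a minimal standard-library sketch, not the code used in this repo. It assumes the usual GeoLife layout (`Data/<user>/Trajectory/<file>.plt`, six header lines, then `lat,lon,0,altitude,days,date,time` records). The names `parse_plt` and `load_all` are hypothetical.

```python
import csv
from concurrent.futures import ProcessPoolExecutor
from datetime import datetime
from pathlib import Path

PLT_HEADER_LINES = 6  # assumption: each PLT file starts with six header lines


def parse_plt(path):
    """Parse one PLT file into (user_id, trajectory_id, timestamp, lat, lon) tuples."""
    path = Path(path)
    # Assumption: files live under Data/<user>/Trajectory/<file>.plt
    user_id = path.parent.parent.name
    rows = []
    with path.open(newline="") as fh:
        for i, rec in enumerate(csv.reader(fh)):
            # Skip the header and any short or blank records
            if i < PLT_HEADER_LINES or len(rec) < 7:
                continue
            ts = datetime.strptime(f"{rec[5]} {rec[6]}", "%Y-%m-%d %H:%M:%S")
            rows.append((user_id, path.stem, ts, float(rec[0]), float(rec[1])))
    return rows


def load_all(root, workers=None):
    """Parse every PLT file under root in parallel and flatten the results."""
    paths = sorted(Path(root).rglob("*.plt"))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(parse_plt, paths, chunksize=64)
        return [row for chunk in chunks for row in chunk]
```

The flattened rows could then be handed to Polars, for example with `pl.DataFrame(rows, schema=["user", "trajectory", "timestamp", "lat", "lon"], orient="row")`. On platforms that spawn worker processes, call `load_all` from under an `if __name__ == "__main__":` guard.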

Dumping the full dataset of points and lines to a GeoPandas GeoDataFrame for export to Parquet/GeoParquet takes significantly longer, approximately 15 minutes total, with each file being ~1.7-1.8 GB. This was done once for testing purposes. Subsequent reprocessing of the data with additional filtering/preprocessing (e.g., temporal resampling, trajectory/trip heuristics, etc.) is significantly faster, about 1 minute total, as the data is drastically reduced depending on the filters.
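To give a sense of what such filtering can look like, here is a minimal sketch of one possible temporal resampling and one possible trip-splitting heuristic. These are illustrative assumptions, not the filters used in this repo: the function names, the 30-second interval, and the 20-minute gap threshold are all hypothetical. Points are assumed to be `(timestamp, lat, lon)` tuples for a single user, sorted by time.

```python
from datetime import timedelta


def resample(points, min_interval=timedelta(seconds=30)):
    """Keep a point only if at least min_interval has passed since the last kept point."""
    kept = []
    for p in points:
        if not kept or p[0] - kept[-1][0] >= min_interval:
            kept.append(p)
    return kept


def split_trips(points, max_gap=timedelta(minutes=20)):
    """Start a new trip whenever the gap between consecutive points exceeds max_gap."""
    trips = []
    for p in points:
        if trips and p[0] - trips[-1][-1][0] <= max_gap:
            trips[-1].append(p)  # still within the current trip
        else:
            trips.append([p])  # first point, or gap too large: new trip
    return trips
```

Both passes only drop or group points, which is consistent with the reprocessed outputs being much smaller than the full export.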

Full points around Beijing.

Full trips/trajectories around Beijing.

Points from some initial preprocessing/filtering.

Trips/trajectories from some initial preprocessing/filtering.
