Computing bikeability scores in Sydney for each local district. Assignment for DATA 2901.
The score is modelled on WalkScore and is computed from density, diversity, design, destination accessibility and distance to cycling infrastructure or transport. The result can be used to check whether bikeability correlates with property values.
Availability of cafes, restaurants, shopping centres, road network, cycling infrastructure, connectivity to public transport, public parks and trees can be integrated.
Correlation with median income and average monthly rent is calculated from Census data from the Australian Bureau of Statistics (ABS).
SA2 data from ABS and bike sharing data:
StatisticalAreas.csv: area id, area name, parent area id
Neighbourhoods.csv: area id, area name, land area, population, number of dwellings, number of businesses
CensusStats.csv: area id, median household income, avg monthly rent
BusinessStats.csv: area id, num businesses, retail trade, accommodation and food, health care, ...
BikeSharingPods.csv: station id, name, num bikes, num scooters, latitude, longitude, description
Build the database using PostgreSQL (access to the database needs to be provided), integrating data from the sources below plus at least one additional data set obtained from a web source using web scraping or a web API.
- Sydney neighbourhood data (from CSV)
- Census Data from neighbourhoods (population count + no of dwellings)
- Cycling options in terms of neighbourhoods (spatial join)
- Formula used:
cyclability = z(population density) + z(dwelling density) + z(service balance) + z(bikepod density), where z is the z-score of each measure, assuming a normal distribution
Measure | Definition | Data Source |
---|---|---|
population density | population divided by neighbourhood’s land area | Neighbourhoods.csv |
dwelling density | number of dwellings divided by neighbourhood land area | Neighbourhoods.csv |
service balance | balance of selected business types in neighbourhood | BusinessStats.csv |
bikepod density | number of bike-sharing pods per neighbourhood divided by neighbourhood land area | BikeSharingPods.csv, Neighbourhoods.csv |
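The bikepod density in the last table row depends on the spatial join that assigns each pod to a neighbourhood. The following is a minimal sketch of that step in plain Python, assuming neighbourhood boundaries are available as lists of (longitude, latitude) vertices. The boundary polygons, the dictionary layout and the function names are hypothetical and only illustrate the idea; the coordinates are made-up squares, not Sydney data.

```python
from collections import Counter


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    polygon is a list of (lon, lat) vertices; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def bikepod_density(pods, neighbourhoods):
    """Return {area_id: number of pods divided by land area}."""
    counts = Counter()
    for pod in pods:
        for area_id, info in neighbourhoods.items():
            if point_in_polygon(pod["longitude"], pod["latitude"], info["boundary"]):
                counts[area_id] += 1
                break  # assign each pod to at most one neighbourhood
    return {
        area_id: counts[area_id] / info["land_area"]
        for area_id, info in neighbourhoods.items()
    }


# Hypothetical data: two adjacent square neighbourhoods and four pods,
# one of which falls outside both areas.
neighbourhoods = {
    1: {"boundary": [(0, 0), (2, 0), (2, 2), (0, 2)], "land_area": 4.0},
    2: {"boundary": [(2, 0), (4, 0), (4, 2), (2, 2)], "land_area": 4.0},
}
pods = [
    {"longitude": 1.0, "latitude": 1.0},
    {"longitude": 0.5, "latitude": 1.5},
    {"longitude": 3.0, "latitude": 1.0},
    {"longitude": 5.0, "latitude": 5.0},
]
density = bikepod_density(pods, neighbourhoods)
print(density)
```

Inside the PostgreSQL database the same join would more likely be expressed in SQL, for example with a PostGIS function such as `ST_Contains` if that extension is available on the lab server.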
- Compute each measure and the score for every neighbourhood in the database. Create at least one index that helps data integration and the cyclability score computation
- Correlation between score and median annual household income or average weekly rent per neighbourhood
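As an illustration of the cyclability formula, the sketch below standardises each measure across neighbourhoods and sums the four z-scores. It assumes the population standard deviation is used and that a service balance value has already been derived per neighbourhood; how that balance is computed is not specified here, so it is taken as a given input. The row layout, field names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All neighbourhoods share the same value, so none stands out.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def cyclability(rows):
    """Return {area_id: sum of the four z-scored measures}."""
    measures = [
        [r["population"] / r["land_area"] for r in rows],  # population density
        [r["dwellings"] / r["land_area"] for r in rows],   # dwelling density
        [r["service_balance"] for r in rows],              # assumed precomputed
        [r["bikepods"] / r["land_area"] for r in rows],    # bikepod density
    ]
    standardised = [z_scores(m) for m in measures]
    # zip(*standardised) regroups the z-scores per neighbourhood.
    return {
        r["area_id"]: sum(per_area)
        for r, per_area in zip(rows, zip(*standardised))
    }


# Hypothetical neighbourhood rows, not real Sydney data.
rows = [
    {"area_id": 1, "population": 1000, "dwellings": 400,
     "land_area": 2.0, "service_balance": 0.2, "bikepods": 2},
    {"area_id": 2, "population": 3000, "dwellings": 1200,
     "land_area": 2.0, "service_balance": 0.5, "bikepods": 6},
    {"area_id": 3, "population": 5000, "dwellings": 2000,
     "land_area": 2.0, "service_balance": 0.8, "bikepods": 10},
]
scores = cyclability(rows)
for area_id, score in scores.items():
    print(area_id, round(score, 3))
```

Because every measure is standardised before summing, each contributes on the same scale, and the scores across all neighbourhoods sum to approximately zero.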
Include in the calculation data inferred using machine learning or natural language processing (e.g. count named entities about planned cycling infrastructure and geolocation).
4 pages + Appendix, covering data integration and outcome.
- Dataset Description: identify the data sources and how the data was obtained and pre-processed
- Database Description: the database schema into which the data was integrated (diagram)
- Cyclability Analysis: formula applied, overview of results. Can use text, highlights or graphical representation
- Correlation Analysis: how well the score correlates with median household income, and whether it correlates with average weekly rent per neighbourhood
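For the correlation analysis, one option is the Pearson correlation coefficient between the cyclability score and median household income (or rent). The sketch below is a plain-Python version under that assumption; the choice of Pearson over a rank-based coefficient, the function name and the sample values are illustrative only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-neighbourhood values, not real results.
scores = [-2.0, -0.5, 0.0, 1.0, 1.5]
incomes = [1400, 1900, 1800, 2300, 2600]
r = pearson_r(scores, incomes)
print(round(r, 3))
```

A value of r near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship. Neighbourhoods with missing income or rent values would need to be excluded before the calculation.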
Jupyter notebook in Python and SQL. Use the Jupyter and PostgreSQL servers in the labs. Any extra libraries not available in the labs need to be disclosed in the documentation.
- Source code for integration and analysis
- Report/documentation (up to 4 pages)
- Demo
- Access to database with schema and processed data