
Bug: methodology of top bike/ped crash intersections is flawed, results inaccurate #3

Open
markstos opened this issue Dec 28, 2023 · 0 comments


markstos commented Dec 28, 2023

This bug report relates to this published map of the most dangerous intersections for pedestrians and cyclists:

[Image: published map of the most dangerous intersections for pedestrians and cyclists]

The same methodology was used for the overall "most dangerous" intersections report. The overall map may also have flawed results but was not checked.

I tracked down the code used to generate both and found these:

Query for most dangerous intersections:

    crash_df[injury_crashes|fatal_crashes].groupby(by=['Roadway Id','Intersecting Road']).size().sort_values(ascending=False)[:20]

Query for most dangerous bike/ped intersections:

    bikeped_crashes[ped_involved | bike_involved].groupby(by=['Roadway Id','Intersecting Road', 'Year']).size().sort_values(ascending=False)[:20]

IDS methodology explained

Both queries use the same approach: they rely on exact matches of street names. This would produce correct results only if the data were perfectly clean and consistent.
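To make the failure mode concrete, here is a minimal sketch that mimics `groupby(...).size()` using Python's standard `collections.Counter` instead of pandas. The records are hypothetical (roadway, intersecting road) pairs built from the 7th and Walnut name variants listed further down in this issue; they are not rows from the IDS data.

```python
from collections import Counter

# Hypothetical crash records as (roadway, intersecting road) name pairs.
# All four rows describe the same physical corner, spelled differently.
crashes = [
    ("7TH ST", "WALNUT"),
    ("7TH", "WALNUT ST"),
    ("E 7TH ST", "N WALNUT ST"),
    ("W 7TH ST", "N WALNUT ST"),
]

# Similar in spirit to groupby(by=['Roadway Id', 'Intersecting Road']).size():
# rows only land in the same group when both strings match exactly.
counts = Counter(crashes)

for (road, cross), n in counts.most_common():
    print(f"{road} & {cross}: {n}")
```

Four spellings of one corner become four groups of one crash each, so none of them rises toward the top of a sorted list.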

To check whether that was the case, I ran my own analysis of the most dangerous bike/ped intersections, using a purely spatial analysis that ignores the quality and consistency of the street names.

The IDS found 8 locations that had at least 3 bike or ped crashes over the 20-year period studied. My alternate analysis found 167 such locations. To narrow the list to fewer than 10 intersections, I had to filter to locations with at least 12 bike/ped crashes.
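The threshold search just described (raising the minimum crash count until fewer than 10 locations remain) can be sketched in a few lines of Python. The per-location counts below are made-up placeholders, not the IDS or hex-grid results, and both helper names are hypothetical.

```python
def locations_at_least(counts, k):
    """Return (location, count) pairs with count >= k, highest count first."""
    hits = [(loc, n) for loc, n in counts.items() if n >= k]
    return sorted(hits, key=lambda item: item[1], reverse=True)


def smallest_threshold(counts, max_locations):
    """Smallest k for which fewer than max_locations locations have >= k crashes.

    Assumes max_locations >= 1, so the loop ends at the latest once k exceeds
    the largest count (at that point no location qualifies).
    """
    k = 1
    while len(locations_at_least(counts, k)) >= max_locations:
        k += 1
    return k


# Placeholder crash counts per location (e.g. per hex); not real data.
counts = {"A": 15, "B": 13, "C": 12, "D": 5, "E": 3, "F": 3, "G": 1}
k = smallest_threshold(counts, max_locations=4)
print(k, locations_at_least(counts, k))
```

With the real data, the same search run with `max_locations=10` is what led to the 12-crash cutoff used below.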

To check my work, I reviewed the intersections that the spatial analysis placed in the top 10 but the IDS did not. One of those was 7th and Walnut. I then used the IDS interactive crash dashboard to confirm whether there were in fact many bike/ped crashes there, and there were. In this zoomed-in screenshot of that intersection, you can see the 12 crashes:

[Image: zoomed-in screenshot of the IDS crash dashboard at 7th and Walnut, showing 12 bike/ped crashes]

To understand why the results differ, you can check out the tooltips for each crash and look at the street names. Here is a sampling of the intersection names you will find:

  • 7TH ST & WALNUT
  • 7TH & WALNUT ST
  • E 7TH ST & N WALNUT ST
  • W 7TH ST & N WALNUT ST

The variations continue, but that's enough to illustrate the problem: the IDS method treats each of those as a distinct intersection, so their crash counts were never combined.
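For comparison, one could try to patch the name-based approach by normalizing names before grouping. The sketch below is hypothetical: it is not what the IDS code or this report did (the hex grid method below avoids street names entirely), and the direction and suffix lists are assumptions that would need to grow for real data.

```python
from collections import Counter

# Hypothetical, deliberately incomplete normalization rules.
DIRECTIONS = {"N", "S", "E", "W"}
SUFFIXES = {"ST", "STREET", "AVE", "AVENUE", "RD", "ROAD"}


def normalize_street(name):
    """Uppercase, split on whitespace, and drop directional and suffix words."""
    tokens = name.upper().split()
    kept = [t for t in tokens if t not in DIRECTIONS and t not in SUFFIXES]
    return " ".join(kept)


def intersection_key(road, cross):
    """Order-independent key, so 'A & B' and 'B & A' group together."""
    return tuple(sorted((normalize_street(road), normalize_street(cross))))


# The same hypothetical variants of 7th and Walnut listed above.
crashes = [
    ("7TH ST", "WALNUT"),
    ("7TH", "WALNUT ST"),
    ("E 7TH ST", "N WALNUT ST"),
    ("W 7TH ST", "N WALNUT ST"),
]
counts = Counter(intersection_key(road, cross) for road, cross in crashes)
print(counts)
```

Even this toy version shows the weakness of the approach: every new spelling, abbreviation, or typo needs another rule, and dropping directional prefixes can merge genuinely distinct locations elsewhere in a city.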

Hex grid method

I've published the code I used for a hex-grid based analysis here:

https://gitlab.com/markstos/sidewalk-priority-map/-/blob/main/filter-add-ped-crashes.mjs?ref_type=heads

To start with, I generated a hex grid that covered the city of Bloomington with a cell side of 0.05 km.
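As an illustration only (not the tooling used for this report), the geometry of a hex grid with a 0.05 km (50 m) side can be sketched in plain Python. This assumes coordinates already projected to meters and pointy-topped hexagons; the bounding box in the usage line is a placeholder, not Bloomington's actual extent.

```python
import math

SIDE_M = 50.0  # 0.05 km cell side, as stated above


def hex_centers(min_x, min_y, max_x, max_y, side=SIDE_M):
    """Yield centers of pointy-topped hexagons tiling a bounding box (meters)."""
    dx = math.sqrt(3) * side  # distance between neighboring centers in a row
    dy = 1.5 * side           # vertical distance between rows
    row = 0
    y = min_y
    while y <= max_y + dy:
        offset = dx / 2 if row % 2 else 0.0  # odd rows shift by half a cell
        x = min_x + offset
        while x <= max_x + dx:
            yield (x, y)
            x += dx
        y += dy
        row += 1


def hex_vertices(cx, cy, side=SIDE_M):
    """Six corners of a pointy-topped hexagon centered at (cx, cy)."""
    return [
        (cx + side * math.cos(math.radians(60 * i - 30)),
         cy + side * math.sin(math.radians(60 * i - 30)))
        for i in range(6)
    ]


# Area of one cell: (3 * sqrt(3) / 2) * side^2, about 6,495 m^2 for a 50 m side.
CELL_AREA_M2 = 3 * math.sqrt(3) / 2 * SIDE_M ** 2

# Usage with a placeholder 1 km x 1 km box.
grid = [hex_vertices(cx, cy) for cx, cy in hex_centers(0, 0, 1000, 1000)]
print(len(grid), round(CELL_AREA_M2))
```

Cells this small (roughly 87 m across between parallel sides) are meant to be fine enough that each cell holds at most one intersection.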

Next, I used the code above to count the cyclist- and pedestrian-involved crashes contained within each hex, based on the GeoJSON crash data the IDS provided.
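A minimal sketch of the counting step, under stated assumptions: crash points are already projected to meters (real GeoJSON stores longitude/latitude, so a projection step would come first), and the 50 m pointy-topped grid is anchored at the origin. Instead of testing each point against each hex polygon, it converts a point to fractional axial hex coordinates and rounds to the nearest hex using the standard cube-coordinate rounding technique. The linked `filter-add-ped-crashes.mjs` may work differently, and the record keys `x`, `y`, `ped_involved`, and `bike_involved` are hypothetical (the flag names only mirror the variables in the IDS query above).

```python
import math
from collections import Counter

SIDE_M = 50.0  # hex side length in meters (0.05 km)


def point_to_hex(x, y, side=SIDE_M):
    """Return axial (q, r) of the pointy-topped hex containing point (x, y)."""
    # Fractional axial coordinates for a pointy-topped layout.
    q = (math.sqrt(3) / 3 * x - y / 3) / side
    r = (2 / 3 * y) / side
    s = -q - r
    # Cube rounding: round all three coordinates, then recompute the one with
    # the largest rounding error so that q + r + s == 0 still holds.
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return (rq, rr)


def count_bike_ped_by_hex(crashes, side=SIDE_M):
    """Count crashes involving a pedestrian or cyclist in each hex."""
    counts = Counter()
    for crash in crashes:
        if crash["ped_involved"] or crash["bike_involved"]:
            counts[point_to_hex(crash["x"], crash["y"], side)] += 1
    return counts


# Hypothetical projected crash points (meters), not real IDS data.
crashes = [
    {"x": 0.0, "y": 0.0, "ped_involved": True, "bike_involved": False},
    {"x": 10.0, "y": -5.0, "ped_involved": False, "bike_involved": True},
    {"x": 20.0, "y": 15.0, "ped_involved": False, "bike_involved": False},
    {"x": 300.0, "y": 300.0, "ped_involved": True, "bike_involved": False},
]
hex_counts = count_bike_ped_by_hex(crashes)
print(hex_counts.most_common())
```

Because the hex a crash falls in depends only on its coordinates, spelling variations in the street names cannot split one intersection's crashes across several groups.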

Using QGIS, I filtered the layer to only the hexes that had 12 or more crashes, to find the top bike/ped crash hexes.

Here's the result, which shows the top 9 hexes that had 12 or more bike/ped crashes. The white triangles on top of them are the actual crash locations:

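[Image: map of the top 9 hexes with 12 or more bike/ped crashes, with crash locations shown as white triangles]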

By comparing the locations with the hexes, we can see that no hex captured more than one crash-prone intersection.

While checking my own work here, I found an additional bug with the accuracy of the interactive crash web map, but I'll open a separate issue about that...

Disclaimer: I peer-reviewed the original coding work for this project, but missed this issue pre-publication.
