Validation metrics #52

Closed


@sdrewc (Contributor) commented Sep 15, 2017

Adds a new file, `route_stats_ft.txt`, to hold route-level statistics for specified time windows, and adds fields to `stop_times_ft.txt` and `trips_ft.txt` to support calculating those statistics.

ddorinson and others added 30 commits March 21, 2017 19:08
In the table of optional files, the link labeled "fare_rules_ft.txt" returns a 404. The file has been renamed to fare_periods_ft in the standard, so change it here as well.
Start to add TCQSM variables
and remove number_loading_areas, which belongs at a station/stop in stops_ft

Optional Attributes | Description
---------- | -------------
`schedule_time` | Integer, mean number of minutes from scheduled `arrival_time` at first stop to scheduled `departure_time` at last stop.
@e-lo (Contributor), Sep 20, 2017


Suggest `avg_scheduled_runtime`?

`actual_time` | Integer, mean number of minutes from actual `arrival_time` at first stop to actual `departure_time` at last stop.
@e-lo (Contributor), Sep 20, 2017


Suggest `avg_observed_runtime`.
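Both `schedule_time` and `actual_time` (suggested names `avg_scheduled_runtime` and `avg_observed_runtime`) reduce to the same calculation on different inputs: per trip, take the departure time at the last stop minus the arrival time at the first stop, then average over the trips in the time window. A minimal Python sketch, assuming a hypothetical in-memory list of `(stop_sequence, arrival_min, departure_min)` tuples per trip rather than any file layout defined in this PR:

```python
from statistics import mean

# Hypothetical per-trip input: (stop_sequence, arrival_min, departure_min) tuples,
# times in minutes after midnight. This layout is an assumption, not part of the PR.
StopTime = tuple[int, float, float]


def trip_runtime(stop_times: list[StopTime]) -> float:
    """Minutes from arrival at the first stop to departure from the last stop."""
    ordered = sorted(stop_times, key=lambda st: st[0])  # order by stop_sequence
    first_arrival = ordered[0][1]
    last_departure = ordered[-1][2]
    return last_departure - first_arrival


def mean_runtime(trips: list[list[StopTime]]) -> int:
    """Mean runtime over a group of trips, rounded to whole minutes (the fields are Integer)."""
    return round(mean(trip_runtime(t) for t in trips))


# Feed scheduled times for `schedule_time`, observed times for `actual_time`.
scheduled = [
    [(1, 480, 480), (2, 490, 491), (3, 500, 502)],
    [(1, 540, 541), (2, 552, 552), (3, 565, 566)],
]
print(mean_runtime(scheduled))  # prints 24
```

Sorting by `stop_sequence` keeps the result correct even if the records arrive out of order.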

`std_dev` | Float, standard deviation of `actual_time`.
@e-lo (Contributor), Sep 20, 2017


Suggest `stdev_observed_runtime`.

Should we have one for the schedule too? There is certainly variation there.
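A scheduled counterpart would use the same sample standard deviation, just over scheduled rather than observed runtimes. A minimal sketch, assuming per-trip runtimes in minutes have already been computed for one route and time window; `runtime_stdev` is a hypothetical helper, and returning NaN for fewer than two trips is a design choice, not something the PR specifies:

```python
import math
from statistics import stdev


def runtime_stdev(runtimes_min: list[float]) -> float:
    """Sample standard deviation of per-trip runtimes, in minutes.

    Returns NaN for fewer than two trips, where a sample standard
    deviation is undefined.
    """
    if len(runtimes_min) < 2:
        return math.nan
    return stdev(runtimes_min)


# Hypothetical runtimes for one route and time window.
observed = [22, 25, 24, 29]   # -> stdev_observed_runtime
scheduled = [22, 22, 24, 24]  # -> a matching field for scheduled runtime
print(round(runtime_stdev(observed), 2), round(runtime_stdev(scheduled), 2))  # prints 2.94 1.15
```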

`semi_std_dev` | Float, semi-standard deviation between scheduled and actual route run time.
Contributor


Suggest `semi_stdev_observed_runtime`.
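The PR describes `semi_std_dev` only as a "semi-standard deviation between scheduled and actual route run time". One reading, assumed in this sketch, is the spread of observed runtimes about the *scheduled* runtime rather than about their own mean, with an n - 1 divisor. Both the reference point and the divisor are assumptions to check against the TCQSM definition before adopting them:

```python
import math


def semi_stdev(pairs: list[tuple[float, float]]) -> float:
    """Spread of observed runtimes around their scheduled runtimes, in minutes.

    pairs: (scheduled_min, observed_min) for each trip in the window.
    ASSUMPTION: deviations are measured from the schedule (not the observed
    mean), and the sum of squares is divided by n - 1.
    """
    n = len(pairs)
    if n < 2:
        return math.nan
    squared = sum((observed - scheduled) ** 2 for scheduled, observed in pairs)
    return math.sqrt(squared / (n - 1))


# Every trip runs exactly 5 minutes late: the ordinary standard deviation of
# observed runtimes is 0, but this measure is not.
always_late = [(20, 25), (20, 25), (20, 25)]
print(round(semi_stdev(always_late), 2))  # prints 6.12
```

The example shows why the measure differs from `std_dev`: a route that is consistently late scores as unreliable against its schedule even though its runtimes never vary.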

`schedule_stop_time`| Integer, mean number of minutes scheduled stop time.
Contributor


Does this mean time spent at the stop, time spent serving passengers, or time with the doors open? Just be specific.

You might also want to use "stopped" rather than "stop".

`actual_stop_time` | Integer, mean number of minutes actual stop time.
`stop_delay` | Integer, mean number of minutes of stop delay.
Contributor


I'm not sure what delay this refers to, or how we would get it from the data.

Contributor


Is this the deviation between the scheduled dwell time in GTFS and the actual dwell time? The GTFS values might be hard to trust... hmmm.

Contributor Author


I meant `stop_delay` to be simply the difference between scheduled and observed stopped time. I agree we may not be able to get this from GTFS, since often `arrival_time == departure_time`, but that's why it's optional.
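Under that definition, the delay for one stop visit is (observed departure - observed arrival) - (scheduled departure - scheduled arrival), averaged over visits. A minimal sketch using a hypothetical `StopEvent` record; when the GTFS schedule has `arrival_time == departure_time`, the scheduled stopped time is zero and the delay equals the whole observed stopped time:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StopEvent:
    """One vehicle visit to one stop (hypothetical record); minutes after midnight."""
    scheduled_arrival: float
    scheduled_departure: float
    observed_arrival: float
    observed_departure: float

    def scheduled_stopped(self) -> float:
        return self.scheduled_departure - self.scheduled_arrival

    def observed_stopped(self) -> float:
        return self.observed_departure - self.observed_arrival


def mean_stop_delay(events: list[StopEvent]) -> float:
    """Mean of (observed stopped time - scheduled stopped time), in minutes."""
    return mean(e.observed_stopped() - e.scheduled_stopped() for e in events)


events = [
    StopEvent(480, 480, 481, 482.5),  # schedule has arrival == departure
    StopEvent(490, 491, 492, 493),    # scheduled 1 min stopped, observed 1 min
]
print(mean_stop_delay(events))  # prints 0.75
```

The proposed field is an Integer number of minutes, so a producer would still need to round this result.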

@@ -17,6 +17,8 @@ File MAY contain the following attributes:

Optional Attributes | Description
---------- | -------------
`actual_arrival_time` | Actual arrival time at a specific stop for a specific trip on a route, in HH:MM:SS format measured from midnight. For trips that span multiple dates, the time should be entered as a value greater than 24:00:00.
Contributor


I wonder if this needs to be a separate file, with fields for matching dates and times similar to the ones you have in `route_stats`?
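Because `actual_arrival_time` uses HH:MM:SS with hours allowed past 24 for trips that span midnight, it cannot be read with an ordinary clock-time parser. A minimal sketch converting such values to seconds after midnight, plus a hypothetical `schedule_deviation_s` helper comparing an actual time with a scheduled one:

```python
def hms_to_seconds(value: str) -> int:
    """Convert 'HH:MM:SS' to seconds after midnight.

    Hours may be 24 or more for trips running past midnight (e.g. '25:10:00'),
    which is why datetime.strptime, whose %H only accepts 00-23, is not used.
    """
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    if hours < 0 or not (0 <= minutes < 60) or not (0 <= seconds < 60):
        raise ValueError(f"invalid HH:MM:SS value: {value!r}")
    return hours * 3600 + minutes * 60 + seconds


def schedule_deviation_s(scheduled: str, actual: str) -> int:
    """Actual minus scheduled time in seconds; positive means late."""
    return hms_to_seconds(actual) - hms_to_seconds(scheduled)


print(hms_to_seconds("25:10:00"))                    # prints 90600
print(schedule_deviation_s("23:58:00", "24:01:30"))  # prints 210
```

Working in seconds after the service-day midnight keeps a late-night deviation like the one above from wrapping around to a large negative number.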

@@ -13,3 +13,15 @@ Required Attributes | Description
`trip_id` | ID that uniquely identifies a vehicle trip
`vehicle_name` | Name of vehicle type, which is to match a description in [`vehicles_ft.txt`](vehicles_ft.md)

File MAY contain the following attributes:
Contributor


Is this for individual trips, whereas `route_stats` is for groups of trips? If so, I suggest a new file.

@e-lo (Contributor) left a comment


  1. I made specific line comments, but I'm wondering whether you need separate files for performance data rather than lumping it in with the scheduled data.

  2. See line-level notes about naming conventions. I am not wedded to these specific names, but I think they should be a little more explicit.

  3. What about what you originally termed "virtual routes" or corridor segments? I think it would be good to define and have a file to summarize them.

@e-lo (Contributor) commented Sep 20, 2017

We decided to move the contents of this pull request to a new, yet-to-be-named standard: https://github.com/osplanning-data-standards/GTFS-RPT

@e-lo closed this Sep 20, 2017

@barbeau commented Sep 21, 2017

Just curious: have you all been following GTFS-ride?

https://groups.google.com/forum/#!topic/transit-developers/cPTGF-rxtMo

Do you see GTFS-RPT you mention above overlapping with GTFS-ride at all, or will it stick primarily to vehicle performance (and not ridership)?

@e-lo (Contributor) commented Sep 21, 2017

Howdy! We have been following, contributing comments to, and excited about using GTFS-ride. We will be using it as the standard ridership output from our Fast-Trips transit assignment software. However, there are several other aspects of transit travel that we need to move data around for, so we had to extend GTFS into the following:

  • GTFS-PLUS (we are open to a better name), which brings in more aspects of the actual transit service, such as vehicles and capacities.
  • dyno-path, which shows individual passenger trajectories.
  • dyno-demand, which lists individual passenger demand.

...and then, after talking amongst our team, we decided to create GTFS-RPT (or whatever we decide to call it; @sdrewc gets naming rights) to capture the other aspects of performance (travel times, reliability), summarized across a few dimensions, similar to how GTFS-RIDE does for ridership.

If there is another standard or effort underway to summarize performance in this way, we would be very open to adopting it, or to morphing this one into something more universally helpful to the community. We are all ears!

@barbeau commented Sep 22, 2017

@e-lo Awesome, thanks for the summary! I'll keep tabs on these. The one we'd most likely be immediately interested in is GTFS-RPT, which, as I understand it, would capture things like schedule deviation and on-time performance at stops. We've been archiving GTFS-realtime data from a few places, originally for this project: https://www.nctr.usf.edu/wp-content/uploads/2017/05/NCTR-79050-17-Transit-Service-Reliability.pdf. As part of this, we developed a proof-of-concept on-time performance calculation tool: https://github.com/CUTR-at-USF/ontime-performance-calculator. My main interest is producing performance metrics that can be used for better real-time predictions using machine learning, but they could serve a lot of other purposes as well. There is definitely a need for a format to exchange this type of data, and we haven't really dived into that yet.
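On-time performance at stops is typically summarized as the share of stop events whose schedule deviation falls inside a tolerance window. A minimal sketch, assuming deviations are already computed in seconds (observed minus scheduled). The default window of 1 minute early to 5 minutes late is an assumed convention, since agencies define it differently, and this is not necessarily how the linked ontime-performance-calculator works:

```python
def on_time_share(deviations_s: list[int], early_s: int = 60, late_s: int = 300) -> float:
    """Fraction of stop events whose deviation lies within [-early_s, +late_s].

    deviations_s: observed minus scheduled time in seconds (positive = late).
    The default window, 1 minute early to 5 minutes late, is an assumption.
    """
    if not deviations_s:
        raise ValueError("need at least one observation")
    on_time = sum(1 for d in deviations_s if -early_s <= d <= late_s)
    return on_time / len(deviations_s)


# 4 of these 6 hypothetical events fall inside the default window.
print(on_time_share([-90, -30, 0, 120, 300, 301]))
```

Keeping the window as parameters lets the same function report each agency's own definition.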

5 participants