GTFS-ride aggregation and GTFS versioning #25

Open
antrim opened this issue Oct 6, 2018 · 8 comments

Comments

@antrim

antrim commented Oct 6, 2018

More documentation and consideration of how this data should/would be managed, stored, and amended over the course of time would be useful.

  • Should GTFS-ride consuming software be able to ingest multiple GTFS-ride feeds and show all data at the same time?
  • Is the best practice to collect a growing feed representing all past calendars? Per service period? How does the management of a GTFS-ride feed link up with the ongoing history of a live GTFS feed, with merged calendars, edited calendars, etc.? If GTFS datasets are merged, IDs for calendars and trips may need to be changed.
  • What are the implications for managing and publishing GTFS data? Should we have a way of differentiating between when a new dataset (aka feed version) corrects an error in previous data vs. describes an upcoming service change? For some thinking on this, see:
@johanricher

Why not just use git? WRI and AFD are experimenting with this by proposing a GitLab instance as a data infrastructure for GTFS-producing projects in developing countries:
http://git.digitaltransport4africa.org/

Versioning of the GTFS is built-in! (Example for Accra)

@e-lo

e-lo commented Oct 25, 2018

Re Versioning:

It seems to me that versioning of "history" (which is GTFS-RIDE's main function) is a problem with fewer dimensions than versioning different potential futures. Just two that I can think of, in fact:

  1. correcting an error (e.g. "oops, it should be 500, not 5000")
  2. updating a format/specification (e.g. we are going to add a column about cats on buses called "cats_onboard")

In the instance of a specification change, I would say that it should be up to the interpreting software to be able to verify and accept different specification versions; AND, per GTFS theory, any new specification should be backward compatible. Therefore I see less of a reason to maintain previous specification versions.

In the case of correcting an error, I do believe it is very important to keep old versions because people will have referenced them.

Therefore, it is fairly straightforward to keep a file in a single git repo with a single branch, advancing (and appropriately tagging) for whichever type of commit you are making (error vs format).

BUT.....

GTFS-RIDE is also a format that can be used for "the future" and here is where I think it gets really dicey (as you can see in the preso that @antrim referenced). SO many potential dimensions. Le sigh.

@e-lo

e-lo commented Oct 25, 2018

W.r.t. "best practice" around file management, I think we should consider that smaller file sizes are better in general because:

  1. git doesn't like big files.
  2. opening a file once it is written should only be done if you are correcting an error; otherwise you might create a new error.
  3. it is easier to spot "diffs"
  4. they are easier to move around

AKA: we should store files at the granularity at which they are created (likely one per day, IMHO) and then aggregate them at regular intervals.
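A minimal sketch of that store-daily, aggregate-later workflow, assuming each daily file is a small CSV carrying a subset of `board_alight.txt` columns (`trip_id`, `stop_id`, `boardings`, `alightings`). The function name and the in-memory file contents are hypothetical and only for illustration:

```python
import csv
import io
from collections import defaultdict


def aggregate_boardings(daily_files):
    """Sum boardings/alightings per (trip_id, stop_id) across daily CSV texts.

    `daily_files` is an iterable of CSV strings, one per day (an assumption;
    in practice these would be read from the per-day files on disk).
    """
    totals = defaultdict(lambda: [0, 0])
    for text in daily_files:
        for row in csv.DictReader(io.StringIO(text)):
            key = (row["trip_id"], row["stop_id"])
            totals[key][0] += int(row["boardings"])
            totals[key][1] += int(row["alightings"])
    return {key: tuple(counts) for key, counts in totals.items()}


# Two hypothetical daily files; the originals are never rewritten.
day1 = "trip_id,stop_id,boardings,alightings\nt1,s1,3,1\nt1,s2,0,2\n"
day2 = "trip_id,stop_id,boardings,alightings\nt1,s1,5,0\n"
weekly = aggregate_boardings([day1, day2])
```

The daily files stay untouched once written, so a correction shows up as a small diff to one file, and the aggregate can simply be regenerated.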

@carletop
Contributor

As in the presentation from @e-lo, this is an issue for which GTFS-ride doesn't have a solution. The current best practice, which the project team has been using to create pilot GTFS-ride feeds, follows the process in the comment from @ODOT-RPTD-mb referenced above. The most glaring issue arises when a new feed is published to correct an error in a previous feed. There is currently no mechanism to indicate which feed the ridership data should be associated with when dates overlap. The most recently published feed is assumed "active" from its date until a subsequently published feed supersedes it.

It seems this issue stems from the fact that GTFS is intended to be a forward-looking plan for anticipated services, whereas GTFS-ride needs a historical account of the services that were actually offered. The frequent publishing of new GTFS feeds compounds the problem, because it makes handling many large GTFS-ride feeds cumbersome. It seems a merged, corrected "GTFS-retro" feed is what is desired. Using GTFS-realtime together with GTFS to create such a dataset is an intriguing idea, but probably still far off.

I like the git idea as well, but this sounds like a broader issue with GTFS practices than one that can be solved here. @antrim, should this issue be closed, or do you feel that more action is needed here?
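To make the "most recently published feed is active" rule concrete, here is a rough sketch. The publication log, feed ids, and function name are all hypothetical; GTFS-ride itself defines no such lookup:

```python
from bisect import bisect_right
from datetime import date

# Hypothetical publication log: (publish_date, feed_id), sorted by publish_date.
FEEDS = [
    (date(2018, 1, 1), "feed_2018_01"),
    (date(2018, 6, 1), "feed_2018_06"),
    (date(2018, 6, 15), "feed_2018_06_fix"),
]


def active_feed(service_date, feeds=FEEDS):
    """Return the id of the latest feed published on or before service_date."""
    publish_dates = [published for published, _ in feeds]
    idx = bisect_right(publish_dates, service_date)
    if idx == 0:
        return None  # no feed had been published yet
    return feeds[idx - 1][1]


# Ridership recorded on June 10 is tied to the uncorrected June 1 feed.
feed_for_june_10 = active_feed(date(2018, 6, 10))
```

Under this rule, ridership from June 1 to June 14 stays attached to `feed_2018_06` even if `feed_2018_06_fix` was published to correct it, which is the gap described above.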

@scrudden

@carletop I have been thinking about assessing demand based on GTFS-ride data. One thing that would help to estimate demand is an update on the demand for the previous vehicle to pass the same stop.

This led me to think about the issue you describe here, and in particular your comments about GTFS being forward-looking and GTFS-ride looking back.

I have in the past used CapMetrics as a source of data for working on predictions. The way they gathered the data from GTFS-realtime vehicle locations and posted it to Git was very useful.

It would be very useful if there were a standard way of providing the data corresponding to a row of board_alight.txt in real time (on doors close). This could then be archived in Git along with the currently active GTFS, and the Git repository and the real-time feed could in turn be used to further inform demand predictions.

Is this something that would be possible using current APC systems?
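As a rough illustration of what such a per-stop record might look like, here is a sketch. The `DoorCloseEvent` class and `to_csv` helper are hypothetical, and the fields are an assumed subset borrowed from board_alight.txt, not part of any published real-time spec:

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DoorCloseEvent:
    """Hypothetical record emitted when the doors close at a stop."""
    trip_id: str
    stop_id: str
    stop_sequence: int
    boardings: int
    alightings: int
    service_date: str  # YYYYMMDD


def to_csv(events):
    """Serialize events as CSV text with a header row, ready for archiving."""
    buffer = io.StringIO()
    columns = [f.name for f in fields(DoorCloseEvent)]
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for event in events:
        writer.writerow(asdict(event))
    return buffer.getvalue()


archive_text = to_csv([DoorCloseEvent("t1", "s1", 1, 4, 0, "20190228")])
```

Each event maps to one archived row, so the archive could be committed alongside the GTFS that was active at the time.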

@barbeau

barbeau commented Feb 28, 2019

@scrudden One key challenge for archiving occupancy from GTFS-realtime today is that GTFS-rt only supports a high-level enumeration of occupancy, with values like "MANY_SEATS_AVAILABLE", "FEW_SEATS_AVAILABLE", etc.:
https://github.com/google/transit/blob/master/gtfs-realtime/spec/en/reference.md#enum-occupancystatus

There is currently a proposal being drafted that would allow more details about a vehicle, including more granular quantitative occupancy, to be expressed in GTFS and GTFS-rt. I'd welcome comments and ideas from everyone on the current draft spec:
http://bit.ly/gtfs-vehicles

@scrudden

@barbeau Where is the best place to comment on gtfs-vehicles? Directly in the Google Doc? From what I have read so far, the proposal seems to capture occupancy well, but for my intended purpose I would like to know the number of passengers boarding and alighting at each stop.

@barbeau

barbeau commented Feb 28, 2019

Yes, just comment in the Google Doc right now.
