
Add file linked_datasets.txt containing the Trip Updates, Vehicle Positions and Service Alerts URLs #93

Closed

Conversation

LeoFrachet
Contributor

@LeoFrachet LeoFrachet commented Aug 9, 2018

Current Issue

As a data consumer, knowing which producers have real-time data and tracking which producers add it is time consuming and requires manual searches. Keeping track of changes to the different real-time URLs (Trip Updates, Vehicle Positions and Service Alerts) is also time consuming and can lead to silent bugs or downtime.

[Updated on 2018-08-09 to add discovery]
[Updated on 2018-08-15 14:55 UTC to add the example]
[Updated on 2018-08-27 12:37 UTC to add the fields authentication_type, authentication_info_url & api_key_parameter_name]

Proposal

We can add those real-time URLs in an extra file of the GTFS.

This proposal adds a new file called linked_datasets.txt, containing the fields:

  • url (String, required): A full URL linking to the other dataset.
  • trip_updates (Boolean, required): Whether the dataset at this URL may contain a TripUpdate entity.
  • vehicle_positions (Boolean, required): Whether the dataset at this URL may contain a VehiclePosition entity.
  • service_alerts (Boolean, required): Whether the dataset at this URL may contain an Alert entity.
  • authentication_type (Integer, required): Defines the type of authentication required to access the URL. The allowed values are:
    • 0 or empty: No authentication required.
    • 1: Ad-hoc authentication required; check the agency website for more information.
    • 2: The authentication requires an API key, which should be passed as value of the parameter api_key_parameter_name in the URL.
  • authentication_info_url (String, optional): If authentication is required, this field contains a URL to a human-readable page describing how the authentication should be performed and how credentials can be created. Required if authentication_type is 1 or greater.
  • api_key_parameter_name (String, optional): Name of the parameter to pass in the URL to provide the API key. Required if authentication_type is 2.

Please note: those datasets are only linked to the core GTFS, they are not owned by it, and therefore the data in feed_info.txt doesn't apply to them (e.g. start & end date).
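To make the intended consumer behavior concrete, here is a minimal Python sketch of reading this file and fetching the declared feeds. The field names and authentication_type codes come from the proposal above; the helper name and the placeholder API key are my own, so treat this as illustration rather than normative behavior:

```python
import csv
import urllib.parse
import urllib.request

def fetch_linked_dataset(row, api_key=None):
    """Fetch the dataset described by one row of linked_datasets.txt."""
    url = row["url"]
    auth_type = row.get("authentication_type") or "0"
    if auth_type == "1":
        # Ad-hoc authentication: a human has to read authentication_info_url first.
        raise RuntimeError("ad-hoc auth required, see " + row["authentication_info_url"])
    if auth_type == "2":
        # API key passed as the URL parameter named api_key_parameter_name.
        separator = "&" if urllib.parse.urlparse(url).query else "?"
        url = url + separator + row["api_key_parameter_name"] + "=" + api_key
    with urllib.request.urlopen(url) as response:
        return response.read()  # raw GTFS-realtime protocol buffer bytes

# Usage: fetch every Trip Updates feed declared in the file.
with open("linked_datasets.txt", newline="") as f:
    for row in csv.DictReader(f):
        if row["trip_updates"] == "1":
            data = fetch_linked_dataset(row, api_key="YOUR_KEY")  # hypothetical key
```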

Discussion around the naming of the file

TriMet (Ping @mgilligan & @fpurcell) currently uses the extra file realtime_feeds.txt with the fields url, trip_updates, service_alerts and vehicle_positions.

My current proposal is only different in the naming of the file (linked_datasets.txt vs realtime_feeds.txt), for two reasons:

  • “realtime”: In the future, other URLs will likely be added, and they may not point to “real-time” datasets. Therefore I would avoid setting in stone a limitation which isn’t needed (e.g. the ServiceChanges proposal, which allows changes up to 24h ahead).
  • “Feed” is a misleading word (IMHO) which is currently used in the specification with different meanings depending on the context. It is either a specific dataset (see feed_info.feed_start_date) or the set of the different versions of the dataset (see the feed_version definition). Some open proposals even define “feed” as the unique and constant source of the different datasets (see the feed_id proposal).

For those reasons, I think linked_datasets.txt unambiguously captures the content of the new file, without adding any needless limitations for the future.

Background

This has already been proposed in 2014 in the GTFS-changes Google Group (see here and there). TriMet is currently using the 2014 proposal in their production feed.

Example

For example, the linked_datasets.txt file for Madison Metro Transit (which does not require authentication) would be:

url,trip_updates,vehicle_positions,service_alerts,authentication_type
http://transitdata.cityofmadison.com/TripUpdate/TripUpdates.pb,1,0,0,0
http://transitdata.cityofmadison.com/Vehicle/VehiclePositions.pb,0,1,0,0
http://transitdata.cityofmadison.com/Alert/Alerts.pb,0,0,1,0

... and for TriMet (which requires an API key as a URL parameter called appID for authentication) it would be:

url,trip_updates,vehicle_positions,service_alerts,authentication_type,authentication_info_url,api_key_parameter_name
http://developer.trimet.org/ws/V1/TripUpdate/,1,0,0,2,https://developer.trimet.org/GTFS.shtml,appID
http://developer.trimet.org/ws/gtfs/VehiclePositions,0,1,0,2,https://developer.trimet.org/GTFS.shtml,appID
http://developer.trimet.org/ws/V1/FeedSpecAlerts/,0,0,1,2,https://developer.trimet.org/GTFS.shtml,appID

... and for Metra in Chicago (which requires ad-hoc authentication not enumerated in our options, in this case via HTTP basic authentication) it would be:

url,trip_updates,vehicle_positions,service_alerts,authentication_type,authentication_info_url
https://gtfsapi.metrarail.com/gtfs/raw/tripUpdates.dat,1,0,0,1,https://metrarail.com/developers/metra-gtfs-api
https://gtfsapi.metrarail.com/gtfs/raw/positionUpdates.dat,0,1,0,1,https://metrarail.com/developers/metra-gtfs-api
https://gtfsapi.metrarail.com/gtfs/raw/alerts.dat,0,0,1,1,https://metrarail.com/developers/metra-gtfs-api

@googlebot
Collaborator

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

@abyrd

abyrd commented Aug 9, 2018

Hi @LeoFrachet, I remember there being a significant amount of discussion about this a few years ago on one of the mailing lists, but I don't remember the conclusions. It would be helpful to track down that conversation and see what issues have already been raised on this subject.

A few initial observations:

  • Ideally the problem of changing realtime URLs should be a small one - producers should avoid changing these URLs
  • We might not see widespread adoption of this new file, and I suspect the producers careful enough to add the new file would not be the ones causing you problems by unexpectedly changing their realtime URLs

I agree that inconsistent use of the word "feed" is problematic and it may make sense to prefer other terms. However I would argue that the word "feed" itself is not ambiguous - it evokes the image of an endless stream of new data being regularly updated, as in "RSS feed". I believe the original intended meaning of "feed" in GTFS is that there's a stable URL where new data appears from time to time, with the feed being the ongoing series of files published at that URL.

@LeoFrachet
Contributor Author

Hi Andrew, thanks for your answer!

It would be helpful to track down that conversation and see what issues have already been raised on this subject.

This is the goal of the "Background" section of my first message. You'll find in it the links to the two Google Group conversations discussing it. As for the conclusions of those discussions, the consensus seemed pretty broad on the solution that I'm describing above, which is the one used by TriMet. The only difference is in the naming of the file, as I point out.

[...] causing you problems [...]

Just to be sure we are on the same page, please note I'm not working at Transit anymore. I'm working for RMI on the @MobilityData program, which is the continuation of the ITD program you knew, which created the Best Practices.

I suspect the producers careful enough to add the new file would not be the ones causing you problems by unexpectedly changing their realtime URLs

Using linked_datasets.txt would also help consumers discover which producers have real-time data, since there is currently no simple way to know it. It would also allow consumers to be informed when a producer adds real-time. (I'm going to add this to my first message, since it's the other important half of the issue.)

@barbeau
Collaborator

barbeau commented Aug 9, 2018

I agree that this is needed. One example that bit us earlier this year was MBTA moving its GTFS-realtime URLs. MBTA announced this on their Google Group:
https://groups.google.com/forum/#!topic/massdotdevelopers/NH4pDN_HWcM

...but we're archiving data from several different agencies and haven't been manually monitoring all the developer Google Groups for those agencies. So MBTA made a best effort to announce this change, but we missed it. If there were a programmatic way to communicate this change (i.e., this proposal), the approach would scale easily and we could have caught the change.

In the MBTA case, from a consumer's perspective it would have been nice to also have a feed_start_date and feed_end_date, to know ahead of time when the transition was being made.
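For what it's worth, once producers publish this file, the monitoring described above could be a few lines of code. A sketch, assuming two downloaded versions of the same dataset on disk (the paths are hypothetical):

```python
import csv

def realtime_urls(path):
    """Collect the GTFS-realtime URLs declared in a linked_datasets.txt file."""
    with open(path, newline="") as f:
        return {row["url"] for row in csv.DictReader(f)}

# Hypothetical paths to two extracted versions of the same GTFS dataset.
previous = realtime_urls("mbta_old/linked_datasets.txt")
current = realtime_urls("mbta_new/linked_datasets.txt")
for url in sorted(current - previous):
    print("realtime URL added:", url)
for url in sorted(previous - current):
    print("realtime URL removed:", url)
```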

@abyrd

abyrd commented Aug 9, 2018 via email

@skinkie
Contributor

skinkie commented Aug 10, 2018

What would happen if multiple URLs for the same type of data are supplied (tripUpdates/...)? Should the consumer choose, or should it merge?

@LeoFrachet
Contributor Author

@skinkie

AFAIK this is not supported by the current specification. It would add a new feature to the current GTFS behavior.

I'm not against it (in fact, I'm even in favor of it), but that's outside of the scope of this proposal IMHO. It could be the next step once this proposal has been adopted.

(But, otherwise, I agree, we could be scoping by agency_id for example).

@googlebot
Collaborator

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

@skinkie
Contributor

skinkie commented Aug 10, 2018

@LeoFrachet if that is not in the scope then I fail to see why an extra file is needed, as opposed to adding extra columns to feed_info.txt

@LeoFrachet
Contributor Author

@skinkie: Because according to the current specification, agencies can choose either to provide one single URL serving the three feed types in the same protobuf, or one URL per type (i.e. a total of 3 URLs).

@barbeau
Collaborator

barbeau commented Aug 10, 2018

I think @skinkie is trying to say that this format works when you have a single agency in the dataset:

| feed_publisher_name | feed_publisher_url | feed_lang | feed_start_date | feed_end_date | feed_version | trip_updates_url | vehicle_positions_url | service_alerts_url |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| 511 SF Bay | http://511.org/developer-resources_transit-data-feed.asp | en | 20130311 | 20190519 | 20180522 | http://511.org/trip-update | http://511.org/vehicle-position | http://511.org/service-alert |

...but it wouldn't work if you have more than one agency in the feed, each with its own GTFS-realtime feed. (@skinkie is that right?)

I'd say if we have a producer that's willing to represent the mapping of agencies to GTFS-realtime feeds then we can include agency_id in Leo's proposal. If not, Leo's proposal still gives us the flexibility to adopt agency_id in the future without changing the structure/file.

I do agree that there is some overlap with fields in feed_info.txt (in particular, the ones I mentioned - start_date and end_date), although I don't think we can always assume that the GTFS feed_info.txt fields apply to GTFS-realtime feeds.

@drewda

drewda commented Aug 10, 2018

+1 to the overall goal of adding GTFS-RT URLs into static GTFS feeds. This would be useful for @transitland to associate existing static and real-time feeds, to discover new real-time feeds, and to identify real-time feeds with changed URLs.

@skinkie
Contributor

skinkie commented Aug 10, 2018

@barbeau what @LeoFrachet is saying: "The spec is currently not drafted that multiple realtime feeds are allowed, and could be merged on the client." This means to me that per GTFS there can be only one GTFS-RT feed of a single type. A per-agency split is not supported. Thus I wonder: why not have the realtime URLs in feed_info.txt as three columns (one per type), since only one feed (per type) is supported anyway?

The Dutch situation is as follows: we have three train GTFS-RT feeds and three GTFS-RT feeds for the rest of public transport. Hence it would be in our own interest to have the option to specify multiple GTFS-RT feeds for a single GTFS file. The current proposal says nothing about an integrated feed combining the three feeds as one, and nothing about merging. So in its current state it would just make things more difficult instead of explicit.

@skinkie
Contributor

skinkie commented Aug 10, 2018

I would like to add something else that may be interesting for Google/Bing people. Software like OpenTripPlanner et al. can build graphs in hours, and other software (no spam) in minutes, but at Google Transit, Bing Maps and others it takes days. In this period tripIds might differ from the operator's standpoint. Synchronising the timetable version of the GTFS-RT-producing application with the GTFS version active on the client suggests that feed_info.txt might be a good place to set up multiple GTFS-RT URLs for specific usage. Hence if Google Transit had version "Monday" loaded and the producer's newest timetable was "Wednesday", feed_info.txt might specify which older versions of the GTFS-RT URLs are still supported.

@barbeau
Collaborator

barbeau commented Aug 10, 2018

The Dutch situation is as follows: we have (three) train GTFS-RT and a (three) GTFS-RT for the rest of public transport. Hence it would be in our own interest to have the option to specify multiple GTFS-RT feeds for a single GTFS file.

Are more than one of these GTFS-RT feeds for the same agency_id in the single GTFS zip file? Or is each GTFS-RT feed tied to a single agency_id from GTFS?

@LeoFrachet is saying "The spec is currently not drafted that multiple realtime feeds are allowed, and could be merged on the client." This means to me that per GTFS there can be only one GTFS-RT feed of a single type. A per agency split is not supported.

The current proposal is silent on how more than one TripUpdate feed in linked_datasets.txt should be handled, but I don't see why we couldn't add text saying that if more than one URL for the same entity type is provided, the client should consume all of the entities from all of the URLs (for active periods, if we define start and end dates) - assuming you're willing to produce this data and this addresses your use case.

@skinkie
Contributor

skinkie commented Aug 10, 2018

Are more than one of these GTFS-RT feeds for the same agency_id in the single GTFS zip file? Or is each GTFS-RT feed tied to a single agency_id from GTFS?

Both GTFS-RT tripUpdate.pb files (train vs non-train), contain multiple agencies.

@LeoFrachet
Contributor Author

Thanks @skinkie and @barbeau for your thoughts!

If I try to list the concerns you have with the current proposal, I count four of them:

  1. Need to be allowed to specify multiple Trip Updates feeds (respectively Vehicle Positions feeds, Service Alerts feeds) for one GTFS dataset. (@skinkie)
  2. Need to synchronize between the « static » GTFS dataset and the real-time feeds. (@skinkie)
  3. Need to scope the real-time URLs by adding a start date and an end date. (@barbeau).
  4. Why not just add the URLs in feed_info.txt instead of adding an extra file?

Number 4 (adding the real-time URLs to feed_info.txt) is feasible, but IMHO it would prevent any future improvement, since it would bind the URLs to the GTFS metadata (like the start and end dates).

Putting them in another file keeps the door open to future improvements. And if you both tell me that there is a need today for such improvements, then we can extend the proposal with extra fields to address the needs you've spoken about:

Concern 3: Need to scope in time the real-time URLs by adding a start date and an end date

It’s pretty straightforward, as you said earlier we can add in linked_datasets.txt the fields:

  • start_date & end_date (Optional): Defines the date range in which the URL should be used. If the field is empty or the column is missing, start_date will fall back to feed_info.feed_start_date, and end_date will fall back to feed_info.feed_end_date. Please note that "dates" here are service dates, not calendar dates, just like any other date in GTFS. (A sketch of this fallback logic follows below.)

=> @barbeau, would you have a potential producer for this?
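A sketch of the fallback logic just described. Keep in mind these fields were only proposed in this comment and never adopted; the helper names are mine, and GTFS dates are YYYYMMDD strings:

```python
from datetime import date

def parse_gtfs_date(value):
    """Convert a GTFS YYYYMMDD string to a date object."""
    return date(int(value[:4]), int(value[4:6]), int(value[6:8]))

def url_active_on(row, feed_info, service_date):
    """True if the row's URL should be used on the given service date."""
    # Empty field or missing column: fall back to the feed_info.txt dates.
    start = row.get("start_date") or feed_info["feed_start_date"]
    end = row.get("end_date") or feed_info["feed_end_date"]
    return parse_gtfs_date(start) <= service_date <= parse_gtfs_date(end)
```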

Concern 1: Need to be allowed to specify multiple Trip Updates feeds (respectively Vehicle Positions feeds, Service Alerts feeds) for one GTFS dataset

The simplest solution would be to add in linked_datasets.txt the field:

  • agency_id (Optional): Defines the agency, and therefore the set of routes and trips, for which this URL should be used. If the field is empty or if the column is missing, the URL will be used for the whole dataset.

The two limitations I see are:

  1. If a URL applies to multiple agencies, then we have to list them one by one, creating redundant information, up to (number of agencies × 3) rows. The two possible improvements to address this limitation are:
  • We factorize this into a relationship table, with fields agency_id and dataset_id: this reduces the redundancy but adds complexity to the format.
  • We keep the redundancy, to keep the format simple and limit the spreading of information.

=> So far, in the GTFS format, simplicity has always prevailed over reduction of redundancy. I would therefore advocate for just adding an agency_id in the proposed linked_datasets.txt

  2. If the routes that require different URLs belong to the same agency, what do we do?
  • We could add an optional route_id column in linked_datasets.txt, but this could make the table huge, since the number of routes is often one or two orders of magnitude bigger than the number of agencies.
  • We could define another type of grouping for the routes.
    => I would just not handle this case for now. If it needs to be done, it can be hacked by splitting the agency (yeah, it's a hack, I agree).

=> @skinkie, could OpenOV be potentially a producer for this?

Concern 2: Need to synchronize between the « static » GTFS dataset and the real-time feeds

I agree this is a huge need. I would like to see it solved, but we need a unique identifier for the GTFS dataset for that. Therefore it is a distinct issue from this one, IMHO.

@skinkie
Contributor

skinkie commented Aug 13, 2018

@skinkie, could OpenOV be potentially a producer for this?

What we now provide to Google are six GTFS-RT URLs, and they know they have to mix them.

This doesn't work for OpenTripPlanner, hence two GTFS files are exported, and the 3 GTFS-RT files are each matched to a single dataset. Why the latter is not a sustainable solution for me: more and more features in GTFS, such as pathways.txt, expect GTFS to integrate an entire network, at least geographically.

Given that we have 'by coincidence' a mode split, others may have an agency split. I am not against enforcing the option to only have a single GTFS feed paired with tripUpdates / serviceAlerts / vehicleUpdates. But given that you agree that this is a huge need, I would rather have the semantics available to use something side by side without specifying how it is split.

@LeoFrachet
Contributor Author

Indeed.

So for now, we could stick to the current proposal, and just add the following warning:

Warning: Multiple URLs can provide the same type of data (e.g. two URLs can both provide data for Trip Updates). They will be merged by the data consumer and used as if they were a single feed. The merging is done simply by using the entities of the different feeds, without selection or pruning.
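As a sketch of those merge semantics, using the gtfs-realtime-bindings Python package (the list of URLs is assumed to come from linked_datasets.txt rows whose trip_updates field is 1; the function name is mine):

```python
import urllib.request

from google.transit import gtfs_realtime_pb2  # pip install gtfs-realtime-bindings

def merge_trip_updates(urls):
    """Merge several Trip Updates feeds by concatenating their entities,
    without selection or pruning, as the warning above describes."""
    merged = gtfs_realtime_pb2.FeedMessage()
    for url in urls:
        feed = gtfs_realtime_pb2.FeedMessage()
        with urllib.request.urlopen(url) as response:
            feed.ParseFromString(response.read())
        merged.entity.extend(feed.entity)
    return merged
```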

@barbeau
Collaborator

barbeau commented Aug 14, 2018

start_date & end_date (Optional): Defines the date range in which the URL should be used. If the field is empty or if the column is missing, the start_date will fallback on the feed_info.feed_start_date, and the end_date will fallback on the feed_info.feed_end_date. Please note that « dates » are defined as service date, not calendar date, just like for any other date in the GTFS.
=> @barbeau, would you have a potential producer for this?

This one might be tricky to get a producer and consumer for - we can add it to our USF campus shuttle feed, but the fields would just be blank as we don't anticipate changing the URL any time soon. So for this to be meaningfully tested with a real feed we'd need to find someone who is planning to change their GTFS-realtime URLs.

I'm fine with this proposal continuing without the start and end dates - we could add these in a future change when a producer needs to change their GTFS-realtime URLs. The existing proposal would still support an immediate changeover - the producer would just need to make sure that the old and new GTFS-realtime URLs worked in parallel for a while during the transition.


| Field Name | Required | Details |
| ------ | ------ | ------ |
| url | **Required** | The **url** field contains the URL of the linked dataset. The value must be a fully qualified URL that includes **http**:// or **https**://, and any special characters in the URL must be correctly escaped. See http://www.w3.org/Addressing/URL/4_URI_Recommentations.html for a description of how to create fully qualified URL values. |
Collaborator

We probably shouldn't limit to http and https - so it should read: "The value must be a fully qualified URL that includes a scheme such as **http**:// or **https**://,...".

I would also add: "The value must be the exact URL a client could use to make a request that returns GTFS-realtime data in the protocol buffer format."
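A small sketch of how a validator might implement that check (my own reading of "fully qualified": a scheme plus a network location):

```python
from urllib.parse import urlparse

def is_fully_qualified(url):
    """True if the URL carries a scheme (http, https, ...) and a host."""
    parsed = urlparse(url)
    return bool(parsed.scheme) and bool(parsed.netloc)

assert is_fully_qualified("http://developer.trimet.org/ws/V1/TripUpdate/")
assert not is_fully_qualified("developer.trimet.org/ws/V1/TripUpdate/")
```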

Contributor

Please include websocket as well?

Contributor Author

Agreed.

@barbeau & @skinkie: Do we want to exhaustively define the protocols which are allowed (e.g. HTTP GET and WebSockets), or do we just want to say "Should be a fully qualified URL" without specifying a protocol?

Collaborator

I don't think we should exhaustively list protocols here - the purpose of this sentence and including "http/https" is just to give an example that we expect a fully qualified URL.

To my knowledge all websocket implementations of GTFS-realtime are currently for DIFFERENTIAL feeds, which aren't currently officially supported in GTFS-realtime (see #84), so I'd suggest omitting ws:// examples to avoid confusion. When DIFFERENTIAL feeds are officially supported we can add more text here that says something like "such as http for FULL_DATASET and ws for DIFFERENTIAL feeds".

@LeoFrachet
Contributor Author

As requested, I've added the following example in the first post of this thread.

Example

For example, the linked_datasets.txt file for Madison Metro Transit would be:

url,trip_updates,vehicle_positions,service_alerts
http://transitdata.cityofmadison.com/TripUpdate/TripUpdates.pb,1,0,0
http://transitdata.cityofmadison.com/Vehicle/VehiclePositions.pb,0,1,0
http://transitdata.cityofmadison.com/Alert/Alerts.pb,0,0,1

@barbeau
Collaborator

barbeau commented Aug 15, 2018

So here's a wrinkle that we haven't discussed - TriMet's GTFS-realtime feeds require an API key (i.e., AppID):
https://developer.trimet.org/GTFS.shtml

If you make a request to the URL listed in TriMet's GTFS data:
http://developer.trimet.org/ws/V1/TripUpdate/

...without an API key, you get a "403 Forbidden" response. You also get the same response if you use an invalid URL:
http://developer.trimet.org/ws/V1/foo/

To clarify to consumers whether or not a given URL requires an API key for a valid response, perhaps we should add a requires_api_key column with 0 and 1 values?

@skinkie
Contributor

skinkie commented Aug 15, 2018

@LeoFrachet it is good you gave this example. My interpretation was different.

@LeoFrachet
Contributor Author

@barbeau: Very good point.

If the API key can always be passed as a URL argument (which is really common), it would be even better to give the field name, e.g. api_key_field_name, which for TriMet would be appID.

Another option, which is less clean IMHO, would be to put it directly in the URL with a placeholder, like https://developer.trimet.org/ws/V1/arrivals?appID=$ApiKey.

@barbeau
Collaborator

barbeau commented Aug 15, 2018

If the API key can always be passed as a URL argument (which is really common), it would be even better to give the field name, e.g. api_key_field_name, which for TriMet would be appID.

This would be great. It would definitely be useful for the transit-feed-quality-calculator that I've been working with. Right now I just have a bunch of API parameter names and keys hard-coded in URLs in a CSV file.

If an agency required authentication with something other than an API key, like HTTP basic authentication (see CUTR-at-USF/transit-feed-quality-calculator#31 for Metra in Chicago), how would we handle this?

Maybe we need two fields - requires_authentication, api_key_parameter_name?

So TriMet's data would be:

url,trip_updates,alerts,vehicle_positions,requires_authentication,api_key_parameter_name
http://developer.trimet.org/ws/V1/TripUpdate/,1,0,0,1,appID
http://developer.trimet.org/ws/V1/FeedSpecAlerts/,0,1,0,1,appID
http://developer.trimet.org/ws/gtfs/VehiclePositions,0,0,1,1,appID

...and a feed with basic HTTP authentication would be:

url,trip_updates,alerts,vehicle_positions,requires_authentication,api_key_parameter_name
http://my.feed.org/ws/V1/TripUpdate/,1,0,0,1,
http://my.feed.org/ws/V1/FeedSpecAlerts/,0,1,0,1,
http://my.feed.org/ws/gtfs/VehiclePositions,0,0,1,1,

In other words, requires_authentication = 1 and a blank api_key_parameter_name means that some other type of authentication is required.

@paulswartz
Contributor

@mbta should also be able to produce this before the end of the year.

@fpurcell

Hey @LeoFrachet, do you know of anyone who was using the older realtime_feeds.txt file? If Mike makes the change, should realtime_feeds.txt continue to be included in the feed for a period of time? (e.g., is there a policy or best practice on how to deprecate something like realtime_feeds.txt?).

@LeoFrachet
Contributor Author

Good question! realtime_feeds.txt was never an "official" GTFS experiment; it was only TriMet's own extension.

I'm not aware of anybody using it. But the real answer is that I don't know.

You can try to inform your main consumers & the ones with whom you have an official / legal relationship. For the other ones... well... just pull the plug (aka remove it) and wait for anybody to start screaming (🙏 please don't do that around Xmas or NYE though, developers have families too #BeenThere).

@LeoFrachet
Contributor Author

Ping @juanborre & @gcamp. We should soon have at least 3 producers (MBTA, TriMet & Trillium's datasets).

@fpurcell

Thanks @LeoFrachet ... brings up an interesting point, in that we really don't know the users of our feed. Guessing other producers are also in the dark on who (beyond the gang of four: Google, Apple, Bing and Transit App) the consumers are...

@LeoFrachet
Contributor Author

This is out of the scope of this conversation, but it is still a very important conversation in its own right.

Maybe one practical way to do it is to survey your riders. They'll tell you which app(s) they are using. This won't give you the exhaustive list of who's using your data, but at least you'll be able to reach out to the most important ones (rider-wise).

@mgilligan

@LeoFrachet, I'll add it to our GTFS by the end of the day

@LeoFrachet
Contributor Author

LeoFrachet commented Dec 6, 2018

TriMet GTFS contains linked_datasets.txt. 😱🎉

@TransitApp (@gcamp & @juanborre) you can start the engine!

@drewda

drewda commented Feb 4, 2019

We can start to consume this in @transitland as part of our GTFS Realtime cataloging and validation efforts.

We're not sure if storing this information inside static GTFS feeds is the best solution in the long run -- we want to continue a wider discussion about formats for linking static GTFS, GTFS Realtime, GBFS, MDS, etc -- but it's worth experimenting with this along the way.

@drewda

drewda commented Feb 4, 2019

The one type of "ad-hoc" authentication scheme that has been mentioned on this thread is basic auth. There are also some endpoints that include API keys in their URL paths. For example:

  • Long Island Rail Road: https://mnorth.prod.acquia-sites.com/wse/LIRR/gtfsrt/realtime/[PutAPIKeyHere]/proto
  • Metro North: https://mnorth.prod.acquia-sites.com/wse/gtfsrtwebapi/v1/gtfsrt/[PutAPIKeyHere]/getfeed

These are so rare it's probably not necessary to handle -- I just mention it for future readers wondering what "ad-hoc" auth might cover.

@LeoFrachet
Contributor Author

@drewda Sounds good! Please let us know when it will be implemented in TransitLand. Thanks!

@TomGoBravo

Like many people aggregating GTFS feeds I want an easy way to maintain a collection of GTFS schedules and joined GTFS-realtime feeds, the "Current Issue" Leo mentioned when opening this PR. I saw the transitfeeds.com provider as a possible solution and found this discussion when digging a bit more. Similar to what @drewda wrote perhaps we have an easy short term solution to the current issue which doesn't require agencies to maintain a new file: modify the transitfeeds.com API to publish feed URLs by provider.

The common case is a single schedule zip and 0 - 3 realtime feeds referring to it. Some providers have multiple zips and realtime feeds, for example mta. The good news is it looks like mta uses the same GTFS ids across zips, for example MTA Bronx GTFS/100646 and MTA Queens GTFS/100646. We'd need to clean up things like provider NJ Transit which has zips containing conflicting ids because it is unclear which is associated with the realtime feed.

The idea of a global registry of feeds, grouped by providers that use the same GTFS identifier namespace, was discussed quite a bit in the Feed identification and naming thread Leo linked to. Back in 2013 nobody was volunteering to maintain it, but now it looks like Transitfeeds aka OpenMobilityData is already doing it. Perhaps this can be the beginning of https://github.com/transitland/distributed-mobility-feed-registry. I haven't thought about how this can be connected to the transitland feed_id and Onestop ID scheme. Does transitland have a list of realtime feeds per provider/operator? I can't find them in the API.

@barbeau
Collaborator

barbeau commented Mar 2, 2019

@TomGoBravo No GTFS-rt feeds in Transit.land as of today, but @drewda is working on that as part of this project - https://www.interline.io/blog/transportation-research-board-funds-gtfs-realtime/.

@tsherlockcraig

Trillium has started producing this file but it won't appear for many of our real-time-enabled feeds for a few more weeks, pending some decisions on public API keys.

We have published this data in one of our GTFS feeds, for Marin Transit: see

@drewda

drewda commented Apr 12, 2019

Findings from @transitland by @irees:

these feeds contain at least one feed version with linked_datasets.txt or realtime_feeds.txt:

=> ["f-drt3-909983", "f-dr-coachcompany~ma~us", "f-drt0-limoliner~ma~us", "f-9qf-calaveras~ca~us", "f-9qdd-maderaareaexpress~ca~us", "f-dnn3-thecomet~sc~us", "f-9q8yy-missionbaytma~ca~us", "f-9xhg-winterpark~co~us", "f-dhm-keywest~fl~us", "f-9q6-kcapta~ca~us", "f-9zqv-coralvilletransitsystem~ia~us", "f-9zqvh-cambus~ia~us", "f-drmg-marthasvineyard~ma~us", "f-drq-capecodregionaltransitauthorityccrta", "f-9r8-josephinecounty~or~us", "f-9q9p3-emerygoround", "f-dp4j-citybus", "f-dnr-tta~regionalbus~nc~us", "f-9q8zr-tidelinewatertaxi~ca~us", "f-c20-trimet", "f-9rc-cascadeseast~or~us", "f-9xj5s-viamobilityservices~co~us", "f-9q5d-707", "f-9mgve-avalon~ca~us", "f-c20w-rivercitiestransit~wa~us", "f-c0-bcferries~bc~ca", "f-c22-mason~wa~us", "f-drt7-merrimackvalley~ma~us", "f-9xh-cme~co~us", "f-9qc-westcat~ca~us", "f-dre-greenmtncn~vt~us", "f-dnm6-asheville~nc~us", "f-9rbm-corvallis~or~us", "f-9rb-cascadespoint~or~us", "f-9vxz-jatran~ms~us", "f-9r92-basin~or~us", "f-dps-michiganflyer~mi~us", "f-9qh-victorville~ca~us", "f-9px-cooscounty~or~us", "f-9rb-oregonexpressshuttle", "f-drg-addisoncounty~vt~us", "f-dp8m-metrotransit~cityofmadison", "f-9we1-redappletransit~nm~us", "f-dp8d-jts~wi~us", "f-9q5-vctc~ca~us", "f-9qf-laketahoe~ca~us", "f-dnru-chapel~hill~transit~nc~us", "f-9zmzj-767", "f-c3j-757", "f-c25-ctuir~or~us", "f-dr7f-greaterbridgeporttransit", "f-9q-easternsierra~ca~us", "f-9r1-tehama~ca~us", "f-9q7-kerncounty~ca~us", "f-9myr-paloverdevalley~ca~us", "f-drt-mbta", "f-dr5r-path~nj~us", "f-9qbb-marintransit", "f-9qd-yosemite~ca~us", "f-dru-ruralcommunity~vt~us", "f-9qd-stanislaus~ca~us", "f-9r4-lassen~ca~us", "f-djjt-suntran~fl~us", "f-9qh1f-elmontetransit", "f-9qh1-norwalktransitsystem", "f-9r-peoplemover~or~us", "f-djz4-carta~sc~us", "f-9ze3-755", "f-c2qft-udash~mt~us", "f-9qhf-bigbear~ca~us", "f-9q56-thousandoaks~ca~us", "f-c20-tillamook~or~us", "f-9zep-siouxareametro~sd~us", "f-9q9-modesto~ca~us", "f-9rb-pacificcrest~or~us", "f-dp0r-citylink", "f-drq5-hylinecruises~ma~us", "f-9qh0-anaheim~ca~us", "f-9xh3-vailtransit", "f-dnrg-cary~transit~nc~us", "f-9qb-sonomacounty~ca~us", "f-9qd-mercedthebus~ca~us", "f-dp89-beloittransit~wi~us", "f-dpc8-sheboygan~wi~us", "f-dpq-lakecounty~oh~us", "f-djkj-starmetro", "f-9qc0-soltrans~ca~us", "f-9t9p7-universityofaz~cattran~freeshuttleservice", "f-9xh-ecotransit~co~us", "f-drsb-wrta", "f-dq259-ncsu~wolfline~nc~us", "f-9qc-eldoradotransit~ca~us", "f-drt6-lowellregionaltransitauthority", "f-c20dz-washingtonparkshuttle~or~us", "f-c214-sandy~or~us", "f-dq25-capital~area~transit~nc~us", "f-drsf-montachusett~ma~us", "f-9yz-jeffersoncounty~mo~us", "f-drm-blockislandferry~ri~us", "f-dnru-durham~area~transit~authority~nc~us", "f-djj2-pascocountypublictransit~fl~us", "f-drt-baystatecruisecompany~ma~us", "f-c21-hoodriver~or~us", "f-9q9p3-datatrilliumtransitcom", "f-dne-bgcap~ky~us", "f-djq-sunshinebuscompany~fl~us", "f-9q9hy-mountainview", "f-9pp-humboldtcounty~ca~us", "f-9rb-benton~or~us", "f-9xj5sg-universitycoloradoboulder~co~us", "f-dj3-ecat~fl~us", "f-9pz-lincolncounty~or~us", "f-dhvk-manatee~fl~us", "f-9qbc9-petalumatransit~petaluma~ca~us", "f-drq4-thewave~nantucketregionaltransitauthority", "f-9wgzn-741", "f-9r0-redding~ca~us", "f-drtd-capeann~ma~us", "f-drs-crtransit~vt~us", "f-djf8-montgomerytransit~al~us", "f-dre-berkshire~ma~us", "f-dnh0-gwinnettcountytransit", "f-dnrug-duke~nc~us", "f-9vg3-startransit~tx~us", "f-dp9k-waukeshacounty~wi~us", "f-dpc4-oshkosh~wi~us", "f-dp9e-belleurbansystem", "f-9qc-fairfield~ca~us", "f-c20f-cccxpress~or~us", 
"f-9x-bustang~co~us", "f-9qd-madera~ca~us", "f-9qb-mendocino~ca~us", "f-drmm-southeasternregionaltransitauthority", "f-drsh-dvtamoover~vt~us", "f-9rb-hut~or~us", "f-dru-chittendoncounty~vt~us", "f-dj75-baytowntrolley~fl~us", "f-9muq-lagunabeach~ca~us"]

mostly trillium feeds, which appear to now include linked_datasets.txt by default, even if the file is empty (<-- in case you aren't already aware @thomastrillium)

only trimet includes realtime_feeds.txt these days — some feeds had it in earlier versions

@tsherlockcraig

mostly trillium feeds, which appear to now include linked_datasets.txt by default, even if the file is empty (<-- in case you aren't already aware @thomastrillium)

That's right. We're working on including content for a number of other feeds, for which we'll need to include API key information; http://www.marintransit.org/data/google_transit.zip is the one feed that has entries according to the spec. In a week or so, a service alerts feed will be added for Marin as well.

@paulswartz
Contributor

The GTFS for @mbta also has linked_datasets.txt: https://cdn.mbta.com/MBTA_GTFS.zip

@ericouyang
Contributor

For authentication_type, could we also add the following?

  • 3: The authentication requires an HTTP header, which should be passed as the value of the header api_key_parameter_name in the HTTP request.

We (Swiftly) generally prefer for consumers of real-time feeds to use this instead of a URL parameter to help protect the value of the API key.
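A sketch of how a consumer might dispatch on this suggested value alongside the adopted types, using the requests library. The function name is mine, and type 3 is only the suggestion above, not part of the proposal:

```python
import requests

def fetch_feed(url, authentication_type, api_key_parameter_name=None, api_key=None):
    """Fetch a GTFS-realtime feed under the authentication types discussed here."""
    if authentication_type == 3:
        # Suggested above: key sent as an HTTP header named api_key_parameter_name.
        response = requests.get(url, headers={api_key_parameter_name: api_key})
    elif authentication_type == 2:
        # From the proposal: key sent as a URL query parameter.
        response = requests.get(url, params={api_key_parameter_name: api_key})
    else:
        response = requests.get(url)
    response.raise_for_status()
    return response.content
```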

@e-lo

e-lo commented Apr 14, 2021

Checking in on the status of this PR.

  1. It sounds like there are producers (@mgilligan TriMet, @paulswartz MBTA)
  2. Are there consumers?
  3. Is there a reason why this hasn't been put to a vote?

@anomalily

anomalily commented Apr 14, 2021 via email

@e-lo

e-lo commented Apr 14, 2021

...(second set of questions)...

We are exploring if/how to link various GTFS Schedule datasets – specifically for the use case of specifying inter-dataset fare rules and pathways. To the best of my knowledge this is the most relevant, (sorta) current, related discussion on that matter?

cc: TransitApp @gcamp and Cal-ITP colleagues @mcplanner @antrim

@skinkie
Contributor

skinkie commented Apr 14, 2021

We are exploring if/how to link various GTFS Schedule datasets – specifically for the use case of specifying inter-dataset fare rules and pathways. To the best of my knowledge this is the most relevant, (sorta) current, related discussion on that matter?

I disagree. The most important part for inter-dataset stuff is unique, consistent and predictable identifiers. This issue is about how different parts of the same producer can be discovered, iff it is open data.

@e-lo

e-lo commented Apr 14, 2021

@skinkie

  • I think we are in agreement that identifiers (with the attributes you gave) are very important and would be superior (if they exist)
  • While I understand that this PR (as currently drafted) is only intended for relationships between the same producer, it is structured in a way that could theoretically be extended beyond that.

I am certainly not arguing that this is an optimal way to implement relationships between GTFS producers - just asking whether it is the most relevant discussion on references beyond a dataset (if not, I would appreciate a link to the appropriate place).

@stale

stale bot commented Aug 21, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the Status: Stale label on Aug 21, 2021.
@stale

stale bot commented Aug 28, 2021

This pull request has been closed due to inactivity. Pull requests can always be reopened after they have been closed. See the Specification Amendment Process.
