Define "header.incrementality: DIFFERENTIAL" consumer/producer behavior #84

Open
barbeau opened this issue Dec 8, 2017 · 42 comments
Labels
GTFS Realtime Issues and Pull Requests that focus on GTFS Realtime

Comments

@barbeau
Collaborator

barbeau commented Dec 8, 2017

The GTFS-rt spec currently says the following about differential messages:

currently, this mode is unsupported and behavior is unspecified for feeds that use this mode. There are discussions on the GTFS Realtime mailing list around fully specifying the behavior of DIFFERENTIAL mode and the documentation will be updated when those discussions are finalized.

Based on a discussion starting in opentripplanner/OpenTripPlanner#2516 (comment), differential feeds are being used in practice.

I've opened this issue to start to document behavior for deployed feeds using differential messages, with the goal of working towards a proposal/pull request to better define differential producer/consumer behavior in the GTFS-realtime spec.

In opentripplanner/OpenTripPlanner#2516 (comment), @abyrd says:

To summarize, our differential GTFS-RT uses trip-level granularity, and the effects of successive messages about the same trip do not accumulate.

An update about a trip is always cleanly applied to the base trip from the original static feed, with no consideration for any previously received message. The differential effect comes from accumulating changes to different trips across the whole data set.
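
A minimal sketch of the semantics quoted above, in Python. The `TripUpdate` and `RealtimeState` types are hypothetical simplifications rather than the real GTFS-RT protobuf classes or OpenTripPlanner code, and delays are applied per stop without any propagation to later stops. It illustrates only two points: a new message about a trip replaces the previous one, and every message is applied to the static schedule.

```python
from dataclasses import dataclass, field


@dataclass
class TripUpdate:
    """Simplified, hypothetical stand-in for a GTFS-RT TripUpdate."""
    trip_id: str
    stop_delays: dict  # stop_sequence -> delay in seconds


@dataclass
class RealtimeState:
    # Scheduled times from the static feed: trip_id -> {stop_sequence: seconds}
    scheduled: dict
    # Latest message per trip; trips without an entry run as scheduled.
    latest: dict = field(default_factory=dict)

    def apply(self, update):
        # Replace, never merge: a new message for a trip discards the old one.
        self.latest[update.trip_id] = update

    def stop_times(self, trip_id):
        base = self.scheduled[trip_id]
        update = self.latest.get(trip_id)
        if update is None:
            return dict(base)
        # Delays are applied to the static schedule, not to earlier updates.
        return {seq: t + update.stop_delays.get(seq, 0) for seq, t in base.items()}


state = RealtimeState(scheduled={"A": {1: 100, 2: 200}, "B": {1: 300}})
state.apply(TripUpdate("A", {1: 60, 2: 60}))
state.apply(TripUpdate("B", {1: 30}))
state.apply(TripUpdate("A", {2: 120}))  # replaces the first message about trip A
```

After these three messages, trip A reflects only its latest message, while trip B keeps the update it received in between. That accumulation across different trips is the differential effect.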

If anyone else is producing/consuming differential messages, please add comments here for any design documentation for expected consumer/producer behavior.

@barbeau
Collaborator Author

barbeau commented Dec 8, 2017

One item that should be addressed is how propagation behaves across trips in the same block. To my knowledge the spec is currently silent on this issue in general (see CUTR-at-USF/gtfs-realtime-validator#90), so a more general proposal could be created to address this question for FULL_DATASET feeds too.

@TeXitoi

TeXitoi commented Dec 8, 2017

Here, we have https://github.com/CanalTP/kirin, which reads realtime feeds, reconstructs the realtime trip, and then sends the complete trips that changed (with no delay, with timestamps) to https://github.com/CanalTP/navitia using GTFS-RT (with internal ids and extensions to send messages). The messages are pushed using AMQP.

It is exactly the behavior @abyrd is talking about. We took this path after a talk with him. It works nicely, limiting the load on the engine to only relevant data.

We don't support (yet) blocks with realtime, but I think the block problem is not related to DIFFERENTIAL mode.

@barbeau
Collaborator Author

barbeau commented Feb 2, 2018

Here's a proposal for DIFFERENTIAL semantics written a while back:
https://docs.google.com/document/d/19Dy6afltgs1ebbxKQGX4jpzWHh--Iw4AOO_rtX1bQoc/edit#heading=h.giueiiu2ge3y

It came up in another discussion (https://groups.google.com/d/msg/gtfs-realtime/JGITCQLw8Ww/BZePPYp5AgAJ).

@barbeau
Collaborator Author

barbeau commented Sep 11, 2018

Here's an implementation of an MQTT GTFS-RT trip update updater in OpenTripPlanner, which could be used as a reference implementation (RI) for a header.incrementality: DIFFERENTIAL consumer:
opentripplanner/OpenTripPlanner#2516

@abyrd

abyrd commented Sep 11, 2018

@barbeau the GTFS-RT implementation in OpenTripPlanner has followed these same differential incrementality semantics for about 5 years and has been used in production in the Netherlands for most of that time. The GTFS-RT-over-MQTT implementation is using these same established semantics. Around 2013 we were originally planning to use ZeroMQ for message transport, and Brian Ferris asked us to go with Websockets since that was perceived as more standard. So there has been a potential reference implementation of this for quite a while.

@barbeau
Collaborator Author

barbeau commented Sep 11, 2018

From opentripplanner/OpenTripPlanner#2516 (comment), regarding the existing feed that @abyrd mentioned:

The GTFS-RT-feed is published on mqtt.hsl.fi, using the topic gtfsrt/v1/hsl/tu, as specified by the config. The architecture and service documentation unfortunately still refer to the old APIs, but will be updated soon. The public facing ui is located at https://www.reittiopas.fi/, and the OTP API is available at https://api.digitransit.fi/routing/v1/routers/hsl/.

@abyrd

abyrd commented Sep 12, 2018

@barbeau I think there might be a misunderstanding, I didn't mention any existing feed on this ticket.

The information you quoted is from @hannesj about the Helsinki system. It is delivering GTFS-RT messages over an MQTT transport, and used with the Finland static and realtime feeds cited.

The existing OTP RT implementation I referred to is GTFS-RT over websocket transport, which is used with the Netherlands open data from OpenOV / Plannerstack.

The GTFS-RT spec mentions a differential mode, but does not specify how those messages are delivered. The authors may have originally intended something different such as HTTP fetch of differential data sets, but we decided to stream the messages over a persistent connection. ZeroMQ was considered in NL, then websockets were used in OTP, @TeXitoi reports using AMQP, and Finland is using MQTT. The transport layer should actually be mostly or entirely separate from the handling of the streaming realtime messages themselves.

This approach is known to be used in production in at least three large regions, and everyone I've talked to has reported positive results in terms of bandwidth, server load, and immediacy of realtime availability to consumers.

I think we can safely standardize on this: it has at least as much uptake and cumulative experience behind it as many other changes we've made to the spec. I have re-read the document you cited above (https://docs.google.com/document/d/19Dy6afltgs1ebbxKQGX4jpzWHh--Iw4AOO_rtX1bQoc/edit#heading=h.giueiiu2ge3y) which I wrote to explain our Dutch implementation from 2014, and I believe it's still accurate as a proposal. It also contains some useful comments on points that need clarification.

@TeXitoi @hannesj please feel free to add comments to that document explaining how your implementations / interpretations may vary from the Dutch case.

@barbeau
Collaborator Author

barbeau commented Sep 12, 2018

Thanks for clarifying @abyrd! Yes, if @TeXitoi and @hannesj can review your existing proposal as well, and if everyone can agree on a spec that covers the existing implementations, it will give us something to move forward on for header.incrementality: DIFFERENTIAL.

@derhuerst

In derhuerst/gtfs-rt-differential-to-full-dataset#1 (comment), I documented what gtfs-rt-differential-to-full-dataset needs to change to follow the (draft) spec.

@skinkie
Contributor

skinkie commented Sep 14, 2020

@derhuerst Could you please document why the deleting is required? Wouldn't it be better to keep the Canceled in the new dataset?

@abyrd

abyrd commented Sep 16, 2020

The recent comments on this ticket brought it back to my attention. It has been around for years, and re-reading the contents, I think we had already arrived at a pretty clear proposal about two years ago, and confirmed that the proposed semantics were in use in at least three large-scale systems. @barbeau, should we revive this and make a move to standardize?

@barbeau
Collaborator Author

barbeau commented Sep 22, 2020

@abyrd I'm certainly in favor of trying to standardize if we have real-world producers and consumers. I think ideally the proposal would come from one of those producers and consumers in the form of a GitHub pull request so others can review and comment there. Would anyone currently producing or consuming be willing to do this?

@skinkie
Contributor

skinkie commented Sep 22, 2020

@abyrd I'm certainly in favor of trying to standardize if we have real-world producers and consumers. I think ideally the proposal would come from one of those producers and consumers in the form of a GitHub pull request so others can review and comment there. Would anyone currently producing or consuming be willing to do this?

Ok, this part I don't understand. We have standardised DIFFERENTIAL since it has been part of the GTFS-RT since the beginning. What part do you want to have standardised?

@barbeau
Collaborator Author

barbeau commented Sep 22, 2020

@skinkie The GTFS-realtime spec for Incrementality currently says:

DIFFERENTIAL: currently, this mode is unsupported and behavior is unspecified for feeds that use this mode. There are discussions on the GTFS Realtime mailing list around fully specifying the behavior of DIFFERENTIAL mode and the documentation will be updated when those discussions are finalized.

@skinkie
Contributor

skinkie commented Sep 22, 2020

DIFFERENTIAL: currently, this mode is unsupported and behavior is unspecified for feeds that use this mode. There are discussions on the GTFS Realtime mailing list around fully specifying the behavior of DIFFERENTIAL mode and the documentation will be updated when those discussions are finalized.

How can we make this description about behavior rather than implementation? (Maybe @antrim can shed some light on this too.) Do you want to put a statement here along the lines of: "A differential data stream updates parts of entities with the subset that has been sent; it does not replace entire entities."

@abyrd

abyrd commented Sep 23, 2020

@skinkie the flag has been in the spec since the beginning, so in that sense it's standardized (mentioned in the standard), but with undefined semantics. The way differential was/is actually used in the Netherlands is a proposal the consortia created together during the MMRI project, in an effort to fill this gap in the GTFS-RT spec. I then wrote up a document on how we used differential GTFS-RT, to serve as a proposal for an official change.

@barbeau cited this document in an earlier comment on this issue. It's something like seven years old but probably still perfectly valid. In 2018 I posted a comment above summarizing the proposal and linking again to the proposal document (#84 (comment)). It seems like the only thing remaining is to condense that document into an actual pull request updating the spec, but ideally coming from someone producing such a feed.

@TeXitoi confirmed in a comment above that CanalTP is using this approach. Feeds in the Netherlands have followed this convention for years, and I believe @hannesj stated that this approach is used in Finland. It would be very helpful to have commentary from some of those producers to confirm how widespread this approach is, and how similar it is to the proposal (or if it's identical to the proposal).
Then, if none of the producers or consumers are available to make the PR I could probably condense my original proposal doc into a PR.

So anyone producing or consuming this kind of data, please get in touch with me so we can get this finalized. As far as I know this is an example of successful industry collaboration to establish a uniform, well thought-out way of doing things, but on the standardization front it has completely stalled.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Status: Stale Issues and Pull Requests that have remained inactive for 30 calendar days or more. label Nov 22, 2021
@barbeau
Collaborator Author

barbeau commented Nov 22, 2021

Not stale

@github-actions github-actions bot removed the Status: Stale Issues and Pull Requests that have remained inactive for 30 calendar days or more. label Nov 23, 2021
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Status: Stale Issues and Pull Requests that have remained inactive for 30 calendar days or more. label Nov 23, 2022
@github-actions

github-actions bot commented Dec 7, 2022

This issue has been closed due to inactivity. Issues can always be reopened after they have been closed.

@github-actions github-actions bot closed this as completed Dec 7, 2022
@derhuerst

This is still relevant.

@abyrd

abyrd commented Feb 8, 2023

As mentioned in my comment above from September 2020, as far as I know several producers and consumers of voluminous GTFS-RT data have adopted interpretations of the DIFFERENTIAL flag aligned with the semantics established almost ten years ago in the MMRI project in the Netherlands. Documents have been prepared and proposals made since then to standardize this interpretation.

It's strange to see issues being closed as "stale", when they describe essential details which are relied upon and time-tested in some of the larger-scale RT systems in the world, and people have actually sunk the time and effort into clearly documenting them.

@skinkie @barbeau @TeXitoi and @hannesj please take another look at the above comment and let me know your thoughts on getting this standardized.
#84 (comment)

@abyrd

abyrd commented Feb 8, 2023

@bdferris-v2 reading back through the document linked above I realized you might also be interested in this issue since you commented on it extensively at the time.

@skinkie
Contributor

skinkie commented Feb 8, 2023

Sometimes it just feels like if you are not part of MobilityData, change requests don't matter.

@bdferris-v2
Collaborator

Meta comment. I don't mean this to sound harsh; I'm offering this as advice on how to make progress here.

  • At the end of the day, MobilityData does not control the GTFS change process. It's the same as it's always been. Anyone can propose a change. Anyone can call for a vote.
  • Spec changes do not just magically happen. They take time and effort from someone (or multiple someones) to solicit feedback, build consensus, and grow interest around a proposal.
  • MobilityData is not an army of people. It's like two people actively working on spec work. They can't realistically work on everything and they've chosen to prioritize the aspects of GTFS as prioritized by their membership (seems reasonable to me).
  • It is frustrating when bugs like this get marked as stale, because often the issues raised are real and worthy of attention. But the process aspect of whether this bug is open or closed doesn't change the fact that to make real progress, someone has to invest real time. If we would all feel better leaving the bug open, then that's reasonable. But I don't think that it will magically cause progress where there was none before.
  • I for one would be happy to see more spec discussions driven by more members of the community. If we think people aren't engaging because they don't feel welcome, then that's a real problem that I'd be interested in talking more about.
  • But for me, it all comes down to time. I'm so oversubscribed that I haven't even gotten around to closing the vote on Revert some changes made in transfers.txt transfer_type=4 and 5 proposal #363 from like 3 weeks ago and coming back with a scaled-down proposal and that's for a minor change. I've also heard from other folks that there are too many change discussions to follow along with as-is.

So tl;dr - there are useful spec change proposals that will be marked as stale because there are not enough cycles to drive them all at once. It doesn't mean it's not relevant and if the stale-bug-bot is implying otherwise, maybe we should turn it off. But it doesn't change the larger problem.

@abyrd

abyrd commented Dec 8, 2023

This came up over on #403 (comment) today. It's true that this is not part of the official spec, for some of the reasons @bdferris-v2 outlined. But the fact that it's not part of the spec is causing people to neglect differential semantics, and as the RT spec grows it may reach a point where it's no longer possible to apply this pre-existing convenient interpretation because it was ignored when extensions were added.

Given that it's been around for 10 years and stood up quite well to large and continuous data streams, and was / is apparently used in several places, this really feels like it should be part of the spec to me. I would expect differential to become the norm for how GTFS-RT data is distributed some day, and am aware of big metropolitan systems that work this way, some of which as far as I know were directly inspired by the GTFS/OTP way of doing it.

@eliasmbd @tzujenchanmbd @barbeau any chance we can get this ticket re-opened and at least attempt to identify where this is being used in practice? There's a real chance we can just track down some existing system that's been running for years.

Aside: The fact that unresolved issues are closed automatically and hidden from view is odd and often frustrating to me, especially when multiple people are commenting "not stale" and "this is relevant". But anyway I know this has become the norm. I can speculate about where the trend came from but it won't help anything.

@barbeau
Collaborator Author

barbeau commented Dec 8, 2023

Hey all, FYI, I've left USF and no longer work on GTFS related projects.

To get this into the spec it's just going to take a champion to do the research, put a proposal into a pull request, and respond to and address comments.

I'll reopen the issue because I originally opened it, but I'll leave it to the currently active GTFS community to figure out how to move forward. 👋

@barbeau barbeau reopened this Dec 8, 2023
@emmambd emmambd removed the Status: Stale Issues and Pull Requests that have remained inactive for 30 calendar days or more. label Dec 8, 2023
@abyrd

abyrd commented Dec 9, 2023

Hey all, FYI, I've left USF and no longer work on GTFS related projects.

Thanks, Sean. Sorry to bother you, I was just mentioning people who had participated previously because I wasn't sure who might have the permissions to reopen it. I'm following up now with people who might be producing and consuming differential realtime, and will post here when I've got some responses.

@abyrd

abyrd commented Dec 10, 2023

Quoting #403 (comment)

Coming back to this document 10 years later it's actually kind of odd that we mentioned alerts as a special case. I think that was just an expedient choice, based on an assumption that alerts are not very numerous and not updating as often as vehicle positions, and allowing us to focus on handling the high-frequency updates. Looking at it now, I'm not sure it's a good idea to make them different than other message types.

We should rethink whether this exception for alerts still makes sense. I just noted that since the GTFS-RT protobuf definition was added in 2015, every FeedEntity has an ID and an is_deleted field. So it's certainly possible to delete any item including alerts differentially. You have to ensure all connected clients who may have seen the non-deleted version receive the is_deleted version, but many message queue systems offer "deliver exactly once" / "deliver at least once" guarantees. On the other hand, it may still be common for alerts to be handled in polling mode. It would be useful to survey the existing differential RT implementations to check on alert data volumes and best practices.

0ed83c1#diff-a1a14af3b4aed7d10a3b167231b52fdb483cf71b88f1655ed2e28bccd18800ceR83-R87

@skinkie
Contributor

skinkie commented Dec 10, 2023

Would overwriting with a deleted message that is itself not retained do the trick?

@abyrd

abyrd commented Dec 14, 2023

Would overwriting with a deleted message that is itself not retained do the trick?

In the specific case of MQTT something like this might work. However I am not clear on whether MQTT topics drop their retained message when a non-retained message is published, or you need to publish a retained message with an empty payload.

But all this is a description of message broker software (implementation strategy). From the RT specification perspective (the protocol not the implementation), the rule would just be that any subscriber who may have seen the original alert should receive a FeedEntity with the same ID as the original alert but the is_deleted flag set.
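
To illustrate that rule from the consumer side, here is a rough Python sketch. It models a FeedEntity as a plain dict with `id` and `is_deleted` keys rather than using the real protobuf bindings, and the `apply_entity` helper and the alert contents are made up for the example.

```python
def apply_entity(store, entity):
    """Apply one differential FeedEntity (modelled as a dict) to a store keyed by id."""
    if entity.get("is_deleted", False):
        # Deleting an id the subscriber never saw is a harmless no-op.
        store.pop(entity["id"], None)
    else:
        # Any other message replaces whatever was stored under the same id.
        store[entity["id"]] = entity
    return store


alerts = {}
apply_entity(alerts, {"id": "alert-1", "alert": {"header": "Stop closed"}})
apply_entity(alerts, {"id": "alert-2", "alert": {"header": "Detour"}})
apply_entity(alerts, {"id": "alert-1", "is_deleted": True})
```

A subscriber that received all three messages ends up holding only `alert-2`. A subscriber that connected late and saw only the deletion ends up in a consistent state too, which is why the rule can be stated independently of the broker.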

@abyrd

abyrd commented Dec 14, 2023

First Producer/Consumer Example: Helsinki, Finland to OpenTripPlanner

This is not the oldest example (which dates back to 2013) but a more recent example for which we have ample documentation.

Producers: HSL (Helsinki) and Digitransit (Finland)

Helsingin seudun liikenne (HSL) is the Helsinki Regional Transport Authority. Digitransit is the open-source trip planner they began building in 2016 together with Fintraffic (covering the rest of Finland).

Here is a Github repository for the HSL realtime component: https://github.com/HSLdevcom/transitdata
Its readme contains a large diagram of the whole system (HSL System Diagram). The output MQTT brokers are visible at the top edge of the diagram. The one on the left, cmqttdev.cinfra.fi, shows a topic gtfsrt/dev/fi/hsl/vp, which carries VehiclePositions. The one on the right, pred.rt.hsl.fi, shows a family of topics gtfsrt/[dev|v2]/fi/hsl/[tu|sa], which are the trip updates and service alerts. Note that these are HSL-specific brokers and provide information only on HSL services.

More information on the HSL realtime data flow is provided here: https://github.com/HSLdevcom/digitransit/blob/master/Dataflows.md#realtime-data-flow

The Digitransit brokers cover everything "not HSL", with a breakdown into feeds documented at https://digitransit.fi/en/developers/apis/4-realtime-api/vehicle-positions/digitransit-mqtt/#available-cities-and-regions
Documentation should be updated soon, as these now include the national trains as well as a subset of local trains operating in the HSL area.

The division into multiple brokers is apparently for historical reasons. HSL has had its own brokers for over 7 years, and the newer Digitransit MQTT broker is an extension funded by the other participants in the project.

Connecting to Sources

Topics on the HSL brokers do not have a leading slash, and the brokers will close the connection if you add one. The HSL brokers seem to accept only unencrypted mqtt connections on port 1883. Topics on the Digitransit brokers do have a leading slash. Digitransit brokers accept the mqtts (secure) protocol on port 8883, or unencrypted mqtt on port 1883.

To see a feed of TripUpdates for the Helsinki region on the HSL broker:
mqtt subscribe -h pred.rt.hsl.fi -p 1883 -t "gtfsrt/v2/fi/hsl/tu" | xxd -c 32

To see a feed of ServiceAlerts for the Helsinki region on the HSL broker:
mqtt subscribe -h pred.rt.hsl.fi -p 1883 -t "gtfsrt/v2/fi/hsl/sa" | xxd -c 32

To see a feed of VehiclePosition for one of the non-HSL feeds (here, Tampere) on the Digitransit broker:
mqtt subscribe -h mqtt.digitransit.fi -p 1883 -t "/gtfsrt/vp/tampere/#" | xxd -c 32
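
The connection rules above are easy to get wrong, so here is a small Python sketch that encodes them. The `Broker` type and `subscription` helper are hypothetical names. The hosts, ports, and slash conventions are only those observed above, and the helper does not open a connection: it just returns the parameters to hand to whatever MQTT client you use.

```python
from typing import NamedTuple, Optional


class Broker(NamedTuple):
    host: str
    leading_slash: bool       # whether topics on this broker start with "/"
    plain_port: int
    tls_port: Optional[int]   # None if mqtts was not observed to work


# Values as observed above; treat them as assumptions that may change.
HSL = Broker("pred.rt.hsl.fi", leading_slash=False, plain_port=1883, tls_port=None)
DIGITRANSIT = Broker("mqtt.digitransit.fi", leading_slash=True, plain_port=1883, tls_port=8883)


def subscription(broker, topic, secure=False):
    """Return (host, port, topic) using the slash convention the broker expects."""
    bare = topic.lstrip("/")
    full_topic = "/" + bare if broker.leading_slash else bare
    if secure:
        if broker.tls_port is None:
            raise ValueError(f"{broker.host} is not known to accept mqtts")
        return broker.host, broker.tls_port, full_topic
    return broker.host, broker.plain_port, full_topic


hsl_trip_updates = subscription(HSL, "/gtfsrt/v2/fi/hsl/tu")
tampere_positions = subscription(DIGITRANSIT, "gtfsrt/vp/tampere/#", secure=True)
```

The first call strips the leading slash that would make the HSL broker drop the connection. The second adds the slash and selects port 8883 for the Digitransit broker.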

Upstream Use of GTFS-RT

If you trace the data flow diagram back from host cmqttdev.cinfra.fi topic gtfsrt/dev/fi/hsl/vp the data are coming from another MQTT broker mqtt.hsl.fi providing HFP messages. Upstream those are being translated from Matkahuolto GTFS-RT. So these HSL vehicle positions seem to be translated from GTFS-RT to HFP back to GTFS-RT, but using individual message passing for the whole "signal chain".

Consumer: OpenTripPlanner

Digitransit is running a national-scale trip planner which autoscales computation nodes running OTP. Typically there are around 50 instances subscribing to the MQTT broker at once, with three external applications also subscribing. In September 2018 via opentripplanner/OpenTripPlanner#2516 (comment) OpenTripPlanner gained the ability to consume incremental GTFS-RT TripUpdate messages over MQTT as opposed to websockets. The configuration I used for testing is here: https://github.com/HSLdevcom/OpenTripPlanner-data-container/blob/v3/hsl/router-config.json#L180 but I removed the parking and bicycle updaters.

End-to-End Data Rate

I instrumented an OpenTripPlanner instance to count bytes, messages, and updates received and let it run for about 24 minutes. The average data rate was 34kB/sec with 33 messages per second, each containing one TripUpdate.

These match figures supplied by HSL: an average of 25 messages per second over the whole day, reaching 50 messages per second or higher at rush hour. Between 7AM and 7PM the range is typically 30-50 messages per second.

Note that at these rates, the whole unfiltered stream could in theory be carried on EDGE and definitely on 3G.
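
For anyone who wants to check the numbers, the measured figures can be converted to a per-message size and a bit rate as follows. This assumes "kB" means 1000 bytes. No cellular link capacities are included, since none were measured here.

```python
# Measured averages from the OpenTripPlanner instrumentation described above.
bytes_per_second = 34 * 1000      # assumption: 1 kB = 1000 bytes
messages_per_second = 33

avg_message_bytes = bytes_per_second / messages_per_second
kilobits_per_second = bytes_per_second * 8 / 1000

print(f"{avg_message_bytes:.0f} bytes per message on average")
print(f"{kilobits_per_second:.0f} kbit/s for the whole unfiltered stream")
```

So each single-TripUpdate message averages roughly 1 kB, and the full stream is on the order of a few hundred kbit/s.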

Sources:

Interview on OpenTripPlanner developer chat with HSL employee @optionsome on 2023-12-13.
Previous Github comments from @hannesj (who no longer works for HSL).
Digitransit and HSL documentation.

Additional Text

To incorporate into proposal and/or specification:
opentripplanner/OpenTripPlanner#2516 (comment)
https://groups.google.com/g/gtfs-realtime/c/IztqE4IU9_Q/m/ghh8tJ55BAAJ

@abyrd

abyrd commented Dec 14, 2023

Second Producer/Consumer: Kirin and Kraken in Ile-de-France (Paris Region)

Navitia is a public transit passenger information system created by CanalTP, then Kisio Digital, then Hove. Many of its components are open source, and the whole system has been provided as a hosted service by these companies. This service backs various user-facing trip planner applications including ViaNavigo from Paris regional transit authority Ile-de-France Mobilités.

The Navitia system is quite complex. It’s not just the route optimization component, but includes the entire data integration pipeline for schedules and realtime data that feed into the router. Communication of transit data within this pipeline is via “enriched” GTFS and GTFS-RT.

As of this year, development of Navitia is no longer open. Apparently the copyright holders decided to stop licensing the codebase under the Affero GPL, but the previous versions of the source code are still accessible. There is an architecture overview on the main Navitia readme and the project wiki:
https://github.com/hove-io/navitia#architecture-overview
https://github.com/hove-io/navitia/wiki/Architecture

Kirin and Chaos feed GTFS-RT to Kraken

Both of these show two modules called Kirin and Chaos handling realtime and service disruptions data, which pass through a message broker and on to the Kraken public transport routing component. Here are some repositories that demonstrate the use of GTFS and GTFS-RT by the Navitia pipeline including Chaos and Kirin:
https://github.com/hove-io/transit_model
https://github.com/hove-io/chaos-proto

We have a statement in a comment above at #84 (comment) from a CanalTP employee in 2017 stating that Navitia realtime handling exactly corresponds to the GTFS-RT approach I have described, that this approach was adopted after discussing it with me personally, and that this approach is effective in leveling load. In all discussions I’ve had with Navitia developers over the years, my impression is that this system continues to function in the same way, with individual messages being applied at the trip level in a stream against the scheduled trips.

The case of Navitia is a bit ambiguous because the message-oriented, differential use of GTFS-RT is arguably within a single system, but it’s between two separate open source components of that system. It is used at a scale that we would often associate with an agency realtime integration pipeline producing data separately from consumer passenger information apps, but my sense is that both roles have been contracted out to a single entity here.

While the main regional realtime system is SIRI-based, my understanding is that it is similarly incremental message-based, and is interconnected with GTFS-RT based systems upstream and at the final integration and routing stages. Several years back I vouched for the differential message passing approach during the design of this regional system, and I heard from a participant in the design process that this approach was adopted. So this regional SIRI-based system may arguably be inspired by how GTFS-RT was used in the Netherlands and other countries, as well as in Navitia itself.

@paulswartz
Contributor

While I don't have a diagram, I can speak to how MBTA is using DIFFERENTIAL GTFS-RT. At the moment, we're using MQTT and only for VehiclePositions. For each set of GPS updates, we publish a DIFFERENTIAL feed with those updates. Periodically (currently every 60s) we publish a FULL_DATASET feed as the "retained" message for the queue, allowing new clients to receive a full, if slightly stale, dataset. This also handles deletions for us, by removing offline vehicles from the FULL_DATASET feed. The code is available in https://github.com/mbta/concentrate, but the relevant bit is here where we distinguish between a partial_update (only updates the items specified in the feed) and a regular update where we remove the existing state first.
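
As a rough illustration of that consumer-side distinction, in Python rather than the language of the linked repository: feeds are modelled as plain dicts, and `apply_feed` is a hypothetical helper, not code from mbta/concentrate.

```python
def apply_feed(state, feed):
    """Merge one feed into state (vehicle id -> position), honoring incrementality."""
    if feed["incrementality"] == "FULL_DATASET":
        # Regular update: drop existing state first, so absent vehicles disappear.
        state.clear()
    # DIFFERENTIAL (partial update): only the listed items are touched.
    for entity in feed["entities"]:
        state[entity["id"]] = entity["position"]
    return state


vehicles = {}
apply_feed(vehicles, {"incrementality": "FULL_DATASET", "entities": [
    {"id": "v1", "position": (42.35, -71.06)},
    {"id": "v2", "position": (42.36, -71.05)},
]})
apply_feed(vehicles, {"incrementality": "DIFFERENTIAL", "entities": [
    {"id": "v2", "position": (42.37, -71.04)},
    {"id": "v3", "position": (42.40, -71.10)},
]})
after_partial = dict(vehicles)
apply_feed(vehicles, {"incrementality": "FULL_DATASET", "entities": [
    {"id": "v2", "position": (42.37, -71.04)},
    {"id": "v3", "position": (42.41, -71.11)},
]})
```

After the partial update all three vehicles are known. The next full dataset omits `v1`, which is how an offline vehicle gets removed without any explicit delete message.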

@abyrd

abyrd commented Dec 14, 2023

Example 3: OpenOV to OpenTripPlanner (Netherlands)

This last example is the oldest one, dating back ten years to 2013. This is the original streaming message-oriented GTFS-RT system, and perhaps the first ever large scale open realtime data feed.

Producer

The Nationale Database Openbaar Vervoer (NDOV, "National Public Transport Database") is a government-run service in the Netherlands combining timetable, realtime, and fare data for use in trip planning services. Up until 2013 there was only one authorized subscriber which was a commercial trip planning API, and access to the raw data was very expensive. Stichting OpenGeo’s OpenOV project changed this situation by creating a second national-scale travel information service, thereby gaining access to the NDOV raw data in the local BISON Koppelvlak (KV) formats and republishing it in open standard formats. Over time the OpenOV GTFS feeds were recognized as the highest quality feeds for the Netherlands. They have since become the feeds consumed by Google for example.

At that time over 3000 vehicles were already monitored at any given moment, producing positions and arrival time predictions with updates on average every 10 seconds. It took around 2 seconds for a position to propagate from the vehicle down to stop arrival time predictions. The only way to handle this volume of information while preserving the low latency and high update frequency was to stream individual messages.

Consumer

Emergence of this feed is what pushed OpenTripPlanner to originally add support for realtime trip updates. This means that OpenTripPlanner realtime trip update support has always been differential message-oriented. Although OTP had support for realtime alerts as early as 2011, the first mentions of realtime trip updates in the OTP commit log date to the summer of 2012, with commit opentripplanner/OpenTripPlanner@1fdb5a6 (2012-06-21) applying Dutch KV8 data (analogous to GTFS-RT) from a ZeroMQ broker. This was quickly followed by draft GTFS-RT handlers and revisions to allow cleanly updating stoptimes concurrently with active search threads.

A year later, in the summer of 2013, we see commit opentripplanner/OpenTripPlanner@d444bca entitled “basic GTFS-RT over websockets”. What happened in between is the Multimodale Reisinformatie (MMRI, Multimodal Travel Information) project of the Dutch Transportation Ministry's "Beter Benutten" (Better Utilization) program. Three of the four consortia that participated in this pre-commercial research and development program were working with OpenTripPlanner, and one of those (https://github.com/bliksemlabs) created R4 which later inspired R5, and in turn inspired the OpenTripPlanner 2 transit routing engine.

Once the original OTP streaming realtime support had been built and demonstrated, the approach was described in this document: https://docs.google.com/document/d/19Dy6afltgs1ebbxKQGX4jpzWHh--Iw4AOO_rtX1bQoc

Current Status

As far as I know, the MMRI message broker has been running since 2013 and has been used by various applications in the Netherlands. It is currently being overhauled to move to MQTT. More information should be available in the coming months.

@abyrd

abyrd commented Apr 23, 2024

I am still planning to turn this into a proposal for the GTFS-RT spec. The three case studies above would be shared with the community to illustrate adoption of this idea. But I feel like this whole idea of "differential GTFS Realtime" cannot be fully communicated via a data format specification alone. It should be illustrated via practical examples like those above.

The GTFS documentation currently consists of a specification and best practices. Is there a good place to publish "case studies" or "state of practice" sections to support/illustrate an idea in the spec? Not promoting them as "best" practices but just example practices?

Pinging @eliasmbd @tzujenchanmbd @isabelle-dr since you seem to be actively involved in managing/structuring the documentation updates.

@eliasmbd
Collaborator

Pinging @eliasmbd @tzujenchanmbd @isabelle-dr since you seem to be actively involved in managing/structuring the documentation updates.

Thanks for bringing it up. We are actively restructuring the documentation; more information will follow. This insight will help us propose a better way to present the documentation. We believe that visual representation makes the information easier to digest.

@abyrd

abyrd commented May 9, 2024

We believe that visual representation makes the information easier to digest.

To clarify, the visual representations are not the main thing I'd like to include in the documentation. I'm not even sure they would be included, as they are not my own work. It's the text descriptions that I find important to have as case studies, and that are within my power to contribute.

@isabelle-dr
Collaborator

Hey Andrew,

I think the case studies would live in the Resources section of gtfs.org, and they could be linked from other parts of the site (or even the spec itself).
The content in this section currently comes from the awesome-transit-list. What do you think of adding a "case-studies" section in there?

@abyrd

abyrd commented Nov 13, 2024

@skinkie has the Netherlands message broker been migrated to MQTT? Is there a place we can follow the status of this work?

@abyrd

abyrd commented Nov 13, 2024

@isabelle-dr I would ideally like to get all relevant information into the specification itself, rather than related but external information sources. At least some core portion of this information should definitely be in the spec, to supersede the text that has been causing confusion for a long time now: "DIFFERENTIAL: currently, this mode is unsupported and behavior is unspecified for feeds that use this mode."
https://github.com/google/transit/blob/master/gtfs-realtime/spec/en/reference.md#enum-incrementality

The supporting information on how exactly differential mode should be used in practice may also fall within the scope of the specification repo. For aspects of GTFS like this one that are complex, and where practices have already evolved for over a decade, I think it makes sense to actually describe those practices in detail in the "best practices" section of the spec, perhaps in a separate single-topic markdown page. This also gives people a way to identify and contact the community members who have carried out past implementations, which may be the only way to fully grasp how this feature is used.

@skinkie
Contributor

skinkie commented Nov 13, 2024

@skinkie has the Netherlands message broker been migrated to MQTT? Is there a place we can follow the status of this work?

It is a new implementation in a new system that, within the same architecture, also publishes VDV453/VDV454 and SIRI. The primary reason is to decouple the prediction from the publication.
