Define "header.incrementality: DIFFERENTIAL" consumer/producer behavior #84
Comments
One item that should be addressed is how propagation behaves across trips in the same block. To my knowledge the spec is currently silent on this issue in general (see CUTR-at-USF/gtfs-realtime-validator#90), so a more general proposal could be created to address this question for |
Here, we have https://github.com/CanalTP/kirin, which reads realtime feeds, reconstructs the realtime trip, and then sends to https://github.com/CanalTP/navitia, using GTFS-RT (with internal ids and extensions to send messages), the complete trips that changed (with no delay, with timestamps). The messages are pushed using AMQP. It is exactly the behavior @abyrd is talking about. We took this path after a talk with him. It works nicely, limiting the load on the engine to only relevant data. We don't support blocks with realtime (yet), but I think the block problem is not related to DIFFERENTIAL mode. |
Here's a proposal for DIFFERENTIAL semantics written a while back: It came up in another discussion (https://groups.google.com/d/msg/gtfs-realtime/JGITCQLw8Ww/BZePPYp5AgAJ). |
Here's an implementation of a MQTT GTFS-RT trip update updater in OpenTripPlanner, which could be used as an RI for |
@barbeau the GTFS-RT implementation in OpenTripPlanner has followed these same differential incrementality semantics for about 5 years and has been used in production in the Netherlands for most of that time. The GTFS-RT-over-MQTT implementation is using these same established semantics. Around 2013 we were originally planning to use zeroMQ for message transport and Brian Ferris asked us to go with Websockets since that was perceived as more standard. So there has been a potential reference implementation of this for quite a while. |
From opentripplanner/OpenTripPlanner#2516 (comment), regarding the existing feed
|
@barbeau I think there might be a misunderstanding, I didn't mention any existing feed on this ticket. The information you quoted is from @hannesj about the Helsinki system. It is delivering GTFS-RT messages over an MQTT transport, and used with the Finland static and realtime feeds cited. The existing OTP RT implementation I referred to is GTFS-RT over websocket transport, which is used with the Netherlands open data from OpenOV / Plannerstack. The GTFS-RT spec mentions a differential mode, but does not specify how those messages are delivered. The authors may have originally intended something different such as HTTP fetch of differential data sets, but we decided to stream the messages over a persistent connection. zeroMQ was considered in NL, then websockets were used in OTP, @TeXitoi reports using AMQP, and Finland is using MQTT. The transport layer should actually be mostly or entirely separate from the handling of the streaming realtime messages themselves. This approach is known to be used in production in at least three large regions, and everyone I've talked to has reported positive results in terms of bandwidth, server load, and immediacy of realtime availability to consumers. We can safely standardize on this; I think it's got at least as much uptake and cumulative experience behind it as many other changes we've made to the spec. I have re-read the document you cited above (https://docs.google.com/document/d/19Dy6afltgs1ebbxKQGX4jpzWHh--Iw4AOO_rtX1bQoc/edit#heading=h.giueiiu2ge3y) which I wrote to explain our Dutch implementation from 2014, and I believe it's still accurate as a proposal. It also contains some useful comments on points that need clarification. @TeXitoi @hannesj please feel free to add comments to that document explaining how your implementations / interpretations may vary from the Dutch case. |
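To make the point about transport independence concrete, here is a minimal sketch (not taken from any of the implementations mentioned) in which the same handler consumes serialized messages regardless of which transport delivered them. The names `pump` and `MessageHandler` are hypothetical, and plain byte strings stand in for serialized GTFS-RT messages.

```python
from typing import Callable, Iterable, List

# Hypothetical handler type: takes the raw bytes of one serialized message.
MessageHandler = Callable[[bytes], None]


def pump(messages: Iterable[bytes], handler: MessageHandler) -> int:
    """Feed every message from any transport into the same handler; return the count."""
    count = 0
    for raw in messages:
        handler(raw)
        count += 1
    return count


received: List[bytes] = []

# Stand-ins for payloads arriving over two different transports (assumed data).
from_websocket = [b"update-1", b"update-2"]
from_mqtt = iter([b"update-3"])

total = pump(from_websocket, received.append) + pump(from_mqtt, received.append)
print(total, received)
```

In a real consumer the iterables would presumably be replaced by a websocket, AMQP, or MQTT client loop, while the handler stays unchanged.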
Thanks for clarifying @abyrd! Yes, if @TeXitoi, and @hannesj can review your existing proposal as well and if everyone can agree on a spec that covers the existing implementations it will give us something to move forwards on for |
In derhuerst/gtfs-rt-differential-to-full-dataset#1 (comment), i documented what |
@derhuerst Could you please document why the deleting is required? Wouldn't it be better to keep the Canceled in the new dataset? |
The recent comments on this ticket brought it back to my attention. It has been around for years, and re-reading the contents, I think we had already arrived at a pretty clear proposal about two years ago, and confirmed that the proposed semantics were in use in at least three large-scale systems. @barbeau, should we revive this and make a move to standardize? |
@abyrd I'm certainly in favor of trying to standardize if we have real-world producers and consumers. I think ideally the proposal would come from one of those producers and consumers in the form of a GitHub pull request so others can review and comment there. Would anyone currently producing or consuming be willing to do this? |
Ok, this part I don't understand. We have standardised DIFFERENTIAL since it has been part of the GTFS-RT since the beginning. What part do you want to have standardised? |
@skinkie The GTFS-realtime spec for
|
How can we make this description about behavior instead of implementation (maybe @antrim can shine his light on this too)? Do you want to put a statement here along the lines of: "A differential datastream updates part of the entities with the subset that has been sent; it does not replace entire entities." |
@skinkie the flag has been in the spec since the beginning, so in that sense it's standardized (mentioned in the standard), but with undefined semantics. The way differential was/is actually used in the Netherlands is a proposal the consortia created together during the MMRI project, in an effort to fill this gap in the GTFS-RT spec. I then wrote up a document on how we used differential GTFS-RT, to serve as a proposal for an official change. @barbeau cited this document in an earlier comment on this issue. It's something like seven years old but probably still perfectly valid. In 2018 I posted a comment above summarizing the proposal and linking again to the proposal document (#84 (comment)). It seems like the only thing remaining is to condense that document into an actual pull request updating the spec, but ideally coming from someone producing such a feed. @TeXitoi confirmed in a comment above that CanalTP is using this approach. Feeds in the Netherlands have followed this convention for years, and I believe @hannesj stated that this approach is used in Finland. It would be very helpful to have commentary from some of those producers to confirm how widespread this approach is, and how similar it is to the proposal (or if it's identical to the proposal). So anyone producing or consuming this kind of data, please get in touch with me so we can get this finalized. As far as I know this is an example of successful industry collaboration to establish a uniform, well thought-out way of doing things, but on the standardization front it has completely stalled. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Not stale |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been closed due to inactivity. Issues can always be reopened after they have been closed. |
This is still relevant. |
As mentioned in my comment above from September 2020, as far as I know several producers and consumers of voluminous GTFS-RT data have adopted interpretations of the It's strange to see issues being closed as "stale", when they describe essential details which are relied upon and time-tested in some of the larger-scale RT systems in the world, and people have actually sunk the time and effort into clearly documenting them. @skinkie @barbeau @TeXitoi and @hannesj please take another look at the above comment and let me know your thoughts on getting this standardized. |
@bdferris-v2 reading back through the document linked above I realized you might also be interested in this issue since you commented on it extensively at the time. |
Sometimes it just feels like if you are not part of MobilityData, your change requests don't matter. |
Meta comment. I don't mean this to sound harsh, but offering this as advice on how to make progress here.
So tl;dr - there are useful spec change proposals that will be marked as stale because there are not enough cycles to drive them all at once. It doesn't mean they're not relevant, and if the stale-bug-bot is implying otherwise, maybe we should turn it off. But it doesn't change the larger problem. |
This came up over on #403 (comment) today. It's true that this is not part of the official spec, for some of the reasons @bdferris-v2 outlined. But the fact that it's not part of the spec is causing people to neglect differential semantics, and as the RT spec grows it may reach a point where it's no longer possible to apply this pre-existing convenient interpretation because it was ignored when extensions were added. Given that it's been around for 10 years and stood up quite well to large and continuous data streams, and was / is apparently used in several places, this really feels like it should be part of the spec to me. I would expect differential to become the norm for how GTFS-RT data is distributed some day, and am aware of big metropolitan systems that work this way, some of which as far as I know were directly inspired by the GTFS/OTP way of doing it. @eliasmbd @tzujenchanmbd @barbeau any chance we can get this ticket re-opened and at least attempt to identify where this is being used in practice? There's a real chance we can just track down some existing system that's been running for years. Aside: The fact that unresolved issues are closed automatically and hidden from view is odd and often frustrating to me, especially when multiple people are commenting "not stale" and "this is relevant". But anyway I know this has become the norm. I can speculate about where the trend came from but it won't help anything. |
Hey all, FYI, I've left USF and no longer work on GTFS related projects. To get this into this spec it's just going to take a champion to do the research, put a proposal into a pull request, and respond to and address comments. I'll reopen the issue because I originally opened it, but I'll leave it to the currently active GTFS community to figure out how to move forward. 👋 |
Thanks, Sean. Sorry to bother you, I was just mentioning people who had participated previously because I wasn't sure who might have the permissions to reopen it. I'm following up now with people who might be producing and consuming differential realtime, and will post here when I've got some responses. |
Quoting #403 (comment)
We should rethink whether this exception for alerts still makes sense. I just noted that since the GTFS-RT protobuf definition was added in 2015, every FeedEntity has an ID and an is_deleted field. So it's certainly possible to delete any item including alerts differentially. You have to ensure all connected clients who may have seen the non-deleted version receive the is_deleted version, but many message queue systems offer "deliver exactly once" / "deliver at least once" guarantees. On the other hand, it may still be common for alerts to be handled in polling mode. It would be useful to survey the existing differential RT implementations to check on alert data volumes and best practices. 0ed83c1#diff-a1a14af3b4aed7d10a3b167231b52fdb483cf71b88f1655ed2e28bccd18800ceR83-R87 |
Would overwriting it with a deleted message that is itself not retained do the trick? |
In the specific case of MQTT something like this might work. However I am not clear on whether MQTT topics drop their retained message when a non-retained message is published, or you need to publish a retained message with an empty payload. But all this is a description of message broker software (implementation strategy). From the RT specification perspective (the protocol not the implementation), the rule would just be that any subscriber who may have seen the original alert should receive a FeedEntity with the same ID as the original alert but the is_deleted flag set. |
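As a rough illustration of the rule described above, the following sketch shows how a consumer might apply differential entities to a local store keyed by entity ID. The `FeedEntity` dataclass here is a simplified stand-in for the protobuf message (only `id` and `is_deleted` come from the spec; `payload` and `apply_differential` are hypothetical names), and replacing the whole entity on update is an assumption rather than settled spec behavior.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FeedEntity:
    """Simplified stand-in for the GTFS-RT FeedEntity message."""
    id: str
    is_deleted: bool = False
    payload: Optional[str] = None  # hypothetical placeholder for alert / trip_update / vehicle


def apply_differential(store: Dict[str, FeedEntity], entity: FeedEntity) -> None:
    """Apply one differential entity: delete on is_deleted, otherwise insert or replace by id."""
    if entity.is_deleted:
        store.pop(entity.id, None)  # deleting an unknown id is a no-op
    else:
        store[entity.id] = entity  # ASSUMPTION: a new entity replaces the old one wholesale


store: Dict[str, FeedEntity] = {}
apply_differential(store, FeedEntity(id="alert-1", payload="Elevator out of service"))
apply_differential(store, FeedEntity(id="alert-2", payload="Detour on route 5"))
apply_differential(store, FeedEntity(id="alert-1", is_deleted=True))
print(sorted(store))
```

How the deletion message reaches every subscriber (retained messages, delivery guarantees) remains a broker concern and is deliberately left out of this sketch.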
First Producer/Consumer Example: Helsinki, Finland to OpenTripPlanner

This is not the oldest example (which dates back to 2013) but a more recent example for which we have ample documentation.

Producers: HSL (Helsinki) and Digitransit (Finland)

Helsingin seudun liikenne (HSL) is the Helsinki Regional Transport Authority. Digitransit is the open-source trip planner they began building in 2016 together with Fintraffic (covering the rest of Finland).

Here is a GitHub repository for the HSL realtime component: https://github.com/HSLdevcom/transitdata

More information on the HSL realtime data flow is provided here: https://github.com/HSLdevcom/digitransit/blob/master/Dataflows.md#realtime-data-flow

The Digitransit brokers cover everything "not HSL", with a breakdown into feeds documented at https://digitransit.fi/en/developers/apis/4-realtime-api/vehicle-positions/digitransit-mqtt/#available-cities-and-regions

The division into multiple brokers is apparently for historical reasons. HSL has had its own brokers for over 7 years, and the newer Digitransit MQTT broker is an extension funded by the other participants in the project.

Connecting to Sources

Topics on the HSL brokers do not have a leading slash, and the brokers will close the connection if you add one. The HSL brokers seem to accept only unencrypted mqtt connections on port 1883.

Topics on the Digitransit brokers do have a leading slash. Digitransit brokers accept the mqtts (secure) protocol on port 8883, or unencrypted mqtt on port 1883.

To see a feed of TripUpdates for the Helsinki region on the HSL broker:

To see a feed of ServiceAlerts for the Helsinki region on the HSL broker:

To see a feed of VehiclePosition for one of the non-HSL feeds (here, Tampere) on the Digitransit broker:

Upstream Use of GTFS-RT

If you trace the data flow diagram back from host

Consumer: OpenTripPlanner

Digitransit is running a national-scale trip planner which autoscales computation nodes running OTP. Typically there are around 50 instances subscribing to the MQTT broker at once, with three external applications also subscribing.

In September 2018 via opentripplanner/OpenTripPlanner#2516 (comment) OpenTripPlanner gained the ability to consume incremental GTFS-RT TripUpdate messages over MQTT as opposed to websockets. The configuration I used for testing is here: https://github.com/HSLdevcom/OpenTripPlanner-data-container/blob/v3/hsl/router-config.json#L180 but I removed the parking and bicycle updaters.

End-to-End Data Rate

I instrumented an OpenTripPlanner instance to count bytes, messages, and updates received and let it run for about 24 minutes. The average data rate was 34kB/sec with 33 messages per second, each containing one TripUpdate. These match figures supplied by HSL: an average of 25 messages per second over the whole day, reaching 50 messages per second or higher at rush hour. Between 7AM and 7PM the range is typically 30-50 messages per second. Note that at these rates, the whole unfiltered stream could in theory be carried on EDGE and definitely on 3G.

Sources:

Interview on OpenTripPlanner developer chat with HSL employee @optionsome on 2023-12-13.

Additional Text

To incorporate into proposal and/or specification: |
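The data rate figures quoted in the Helsinki example can be unpacked with a little arithmetic. This is only a back-of-the-envelope sketch: it assumes 1 kB means 1000 bytes and treats "about 24 minutes" as exactly 24 minutes; the variable names are illustrative.

```python
# Figures reported in the Helsinki measurement.
bytes_per_second = 34_000      # 34 kB/sec, ASSUMING 1 kB = 1000 bytes
messages_per_second = 33
duration_seconds = 24 * 60     # ASSUMING the run lasted exactly 24 minutes

avg_message_bytes = bytes_per_second / messages_per_second
kilobits_per_second = bytes_per_second * 8 / 1000
total_megabytes = bytes_per_second * duration_seconds / 1_000_000
total_messages = messages_per_second * duration_seconds

print(f"average message size: {avg_message_bytes:.0f} bytes")
print(f"link rate needed: {kilobits_per_second:.0f} kbit/s")
print(f"received over the run: {total_megabytes:.2f} MB in {total_messages} messages")
```

Under these assumptions each single-TripUpdate message averages roughly one kilobyte, and the whole stream needs a few hundred kilobits per second.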
Second Producer/Consumer: Kirin and Kraken in Ile-de-France (Paris Region)

Navitia is a public transit passenger information system created by CanalTP, then Kisio Digital, then Hove. Many of its components are open source, and the whole system has been provided as a hosted service by these companies. This service backs various user-facing trip planner applications including ViaNavigo from Paris regional transit authority Ile-de-France Mobilités.

The Navitia system is quite complex. It’s not just the route optimization component, but includes the entire data integration pipeline for schedules and realtime data that feed into the router. Communication of transit data within this pipeline is via “enriched” GTFS and GTFS-RT.

As of this year, development of Navitia is no longer open. Apparently the copyright holders decided to stop licensing the codebase under the Affero GPL, but the previous versions of the source code are still accessible.

There is an architecture overview on the main Navitia readme and the project wiki:

Both of these show two modules called Kirin and Chaos handling realtime and service disruptions data, which pass through a message broker and on to the Kraken public transport routing component.

Here are some repositories that demonstrate the use of GTFS and GTFS-RT by the Navitia pipeline including Chaos and Kirin:

We have a statement in a comment above at #84 (comment) from a CanalTP employee in 2017 stating that Navitia realtime handling exactly corresponds to the GTFS-RT approach I have described, that this approach was adopted after discussing it with me personally, and that this approach is effective in leveling load. In all discussions I’ve had with Navitia developers over the years, my impression is that this system continues to function in the same way, with individual messages being applied at the trip level in a stream against the scheduled trips.

The case of Navitia is a bit ambiguous because the message-oriented, differential use of GTFS-RT is arguably within a single system, but it’s between two separate open source components of that system. It is used at a scale that we would often associate with an agency realtime integration pipeline producing data separately from consumer passenger information apps, but my sense is that both roles have been contracted out to a single entity here.

While the main regional realtime system is SIRI-based, my understanding is that it is similarly incremental message-based, and is interconnected with GTFS-RT based systems upstream and at the final integration and routing stages. Several years back I vouched for the differential message passing approach during the design of this regional system, and I heard from a participant in the design process that this approach was adopted. So this regional SIRI-based system may arguably be inspired by how GTFS-RT was used in the Netherlands and other countries, as well as in Navitia itself. |
While I don't have a diagram, I can speak to how MBTA is using |
Example 3: OpenOV to OpenTripPlanner (Netherlands)

This last example is the oldest one, dating back ten years to 2013. This is the original streaming message-oriented GTFS-RT system, and perhaps the first ever large scale open realtime data feed.

Producer

The Nationale Database Openbaar Vervoer (NDOV, "National Public Transport Database") is a government-run service in the Netherlands combining timetable, realtime, and fare data for use in trip planning services. Up until 2013 there was only one authorized subscriber which was a commercial trip planning API, and access to the raw data was very expensive. Stichting OpenGeo’s OpenOV project changed this situation by creating a second national-scale travel information service, thereby gaining access to the NDOV raw data in the local BISON Koppelvlak (KV) formats and republishing it in open standard formats. Over time the OpenOV GTFS feeds were recognized as the highest quality feeds for the Netherlands. They have since become the feeds consumed by Google for example.

At that time over 3000 vehicles were already monitored at any given moment, producing positions and arrival time predictions with updates on average every 10 seconds. It took around 2 seconds for a position to propagate from the vehicle down to stop arrival time predictions. The only way to handle this volume of information while preserving the low latency and high update frequency was to stream individual messages.

Consumer

Emergence of this feed is what pushed OpenTripPlanner to originally add support for realtime trip updates. This means that OpenTripPlanner realtime trip update support has always been differential message-oriented. Although OTP had support for realtime alerts as early as 2011, the first mentions of realtime trip updates in the OTP commit log date to the summer of 2012, with commit opentripplanner/OpenTripPlanner@1fdb5a6 (2012-06-21) applying Dutch KV8 data (analogous to GTFS-RT) from a ZeroMQ broker. This was quickly followed by draft GTFS-RT handlers and revisions to allow cleanly updating stoptimes concurrently with active search threads. By a year later in the summer of 2013 we see commit opentripplanner/OpenTripPlanner@d444bca entitled “basic GTFS-RT over websockets”.

What happened in between is the Multimodale Reisinformatie (MMRI, Multimodal Travel Information) project of the Dutch Transportation Ministry's "Beter Benutten" (Better Utilization) program. Three of the four consortia that participated in this pre-commercial research and development program were working with OpenTripPlanner, and one of those (https://github.com/bliksemlabs) created R4 which later inspired R5, and in turn inspired the OpenTripPlanner 2 transit routing engine. Once the original OTP streaming realtime support had been built and demonstrated, the approach was described in this document: https://docs.google.com/document/d/19Dy6afltgs1ebbxKQGX4jpzWHh--Iw4AOO_rtX1bQoc

Current Status

As far as I know the MMRI message broker has been running since 2013 and has been used by various applications in the Netherlands. It is currently undergoing an overhaul to MQTT. More information should be available in the coming months. |
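To give a feel for the volume described in the Netherlands example, here is a small worked calculation. The vehicle count and update interval come from the text; the 30-second polling period is purely a hypothetical comparison point, and the half-interval delay assumes updates arrive uniformly over time.

```python
vehicles = 3000                 # vehicles monitored at any given moment (from the text)
update_interval_s = 10          # average seconds between updates per vehicle (from the text)
poll_interval_s = 30            # ASSUMPTION: hypothetical polling period, not from the text

# Aggregate rate if every vehicle reports once per interval.
updates_per_second = vehicles / update_interval_s

# Updates that would accumulate between two polls of a full dataset.
updates_per_poll = updates_per_second * poll_interval_s

# Average extra staleness introduced by polling, assuming uniformly spread updates.
avg_polling_delay_s = poll_interval_s / 2

print(updates_per_second, updates_per_poll, avg_polling_delay_s)
```

Under these assumptions a polling consumer would add several times the roughly 2-second propagation latency mentioned above, which is consistent with the case made there for streaming individual messages.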
I am still planning to turn this into a proposal for the GTFS-RT spec. The three case studies above would be shared with the community to illustrate adoption of this idea. But I feel like this whole idea of "differential GTFS Realtime" cannot be fully communicated via a data format specification alone. It should be illustrated via practical examples like those above. The GTFS documentation currently consists of a specification and best practices. Is there a good place to publish "case studies" or "state of practice" sections to support/illustrate an idea in the spec? Not promoting them as "best" practices but just example practices? Pinging @eliasmbd @tzujenchanmbd @isabelle-dr since you seem to be actively involved in managing/structuring the documentation updates. |
Thanks for bringing it up. We are actively restructuring the documentation, more information will follow. This insight will help us propose a better solution in the way we present documentation. We believe that visual representation makes the information easier to digest. |
To clarify, the visual representations are not the main thing I'd like to include in the documentation. I'm not even sure if they would be included as they are not my own work. It's the text descriptions that I find important to have as case studies, and are within my power to contribute. |
Hey Andrew, I think the case studies would live in the Resources section of gtfs.org, and they could be linked from other parts of the site (or even the spec itself). |
@skinkie has the Netherlands message broker been migrated to MQTT? Is there a place we can follow the status of this work? |
@isabelle-dr I would ideally like to get all relevant information into the specification itself, rather than related but external information sources. At least some core portion of this information should definitely be in the spec, to supersede the text that has been causing confusion for a long time now: "DIFFERENTIAL: currently, this mode is unsupported and behavior is unspecified for feeds that use this mode." The supporting information on how exactly differential mode should be used in practice may also fall within the scope of the specification repo. For aspects of GTFS like this one that are complex, and where practices have already evolved for over a decade, I think it makes sense to actually describe those practices in detail in the "best practices" section of the spec, perhaps in a single-topic separate markdown page. This also gives people a way to identify and contact the community members who have carried out past implementation, which may be the only way to fully grasp how this feature is used. |
It is a new implementation from a new system that, within that same architecture, also publishes VDV453/VDV454 and SIRI, for the primary reason of decoupling the prediction from the publication. |
The GTFS-rt spec currently says the following about differential messages:
Based on a discussion starting in opentripplanner/OpenTripPlanner#2516 (comment), differential feeds are being used in practice.
I've opened this issue to start to document behavior for deployed feeds using differential messages, with the goal of working towards a proposal/pull request to better define differential producer/consumer behavior in the GTFS-realtime spec.
In opentripplanner/OpenTripPlanner#2516 (comment), @abyrd says:
If anyone else is producing/consuming differential messages, please add comments here for any design documentation for expected consumer/producer behavior.