
Technical Overview

Nikhil VJ edited this page Apr 28, 2018 · 2 revisions
  • The core program is a Python3 script, GTFSManager.py. It launches a simple web server via the Tornado module and waits for asynchronous GET and POST requests.
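
That request loop can be sketched with the stdlib alone (the real script uses Tornado; the endpoint path and reply shape below are illustrative, not the tool's actual API):

```python
# Stdlib stand-in for the GET/POST loop GTFSManager.py runs via Tornado.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. the front-end asking the API for a table's contents
        body = json.dumps({"status": "ok", "path": self.path}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), ApiHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/API/routes"
with urllib.request.urlopen(url) as resp:
    reply = json.loads(resp.read())
server.shutdown()
print(reply["path"])
```

Tornado's `RequestHandler` subclasses play the role of `ApiHandler` here, one per endpoint, registered in a `tornado.web.Application`.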

  • These requests are made by the JavaScript in the front-end HTML files as they load in the browser and the user navigates the program (which behaves like a typical website). The JavaScript makes GET or POST calls to the API and displays the returned data to the user.

  • Most of the data from the GTFS feed files is shown directly in tabular form on the front-end through a JavaScript library named Tabulator.js. While the user is making changes, everything happens on the front-end; the back-end server stays dormant. When the user presses a button to commit changes to the DB, the JS sends the modified data to the back-end via a POST request, accompanied by a password key for authentication. The back-end receives this data, processes it, writes it back to the DB and returns a success/fail message.
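
The commit step amounts to a password check followed by a table write. A toy sketch (function and field names are illustrative, not the tool's actual handler):

```python
# Sketch of the back-end commit: verify the password key sent with the POST,
# then overwrite the table and report success or failure.
def commit_changes(db, payload, expected_pw):
    """Apply a front-end commit: {'pw': ..., 'table': ..., 'rows': [...]}."""
    if payload.get("pw") != expected_pw:
        return {"status": "fail", "message": "invalid password"}
    db[payload["table"]] = payload["rows"]  # whole-table overwrite
    return {"status": "success", "message": f"wrote {len(payload['rows'])} rows"}

db = {"routes": []}
payload = {"pw": "secret", "table": "routes",
           "rows": [{"route_id": "R1", "route_short_name": "1"}]}
print(commit_changes(db, payload, "secret"))            # success path
print(commit_changes(db, {"pw": "wrong"}, "secret"))    # fail path
```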

  • As of v1.4.2, the data is stored in a GTFS/db.json file managed through the tinydb module. Query-based or whole-table read-write operations are performed on this database while the user interacts with the front-end. This system currently becomes very inefficient at larger data sizes (especially when stop_times.txt runs into millions of lines); there is a discussion thread for brainstorming ideas on improving this.
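
tinydb keeps the whole database in one JSON file: a dict of tables, each table a dict of doc-id to record. The sketch below reads such a file with stdlib json only; the exact table layout inside db.json is an assumption for illustration. Note that tinydb's default JSON storage re-serializes the entire file on every write, which is one reason the approach degrades at millions of stop_times rows.

```python
# Reading a tinydb-style db.json with the stdlib; toy stops table.
import io
import json

db_json = """{
  "stops": {
    "1": {"stop_id": "S1", "stop_name": "Central", "stop_lat": 9.97, "stop_lon": 76.28},
    "2": {"stop_id": "S2", "stop_name": "North", "stop_lat": 10.01, "stop_lon": 76.27}
  }
}"""

db = json.load(io.StringIO(db_json))

# a query-based read: all stop names, like a tinydb Query over the stops table
names = [rec["stop_name"] for rec in db["stops"].values()]
print(names)
```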

  • An additional sequence.json file is also maintained, storing a default sequence of stops and chosen shapes per route. This is not part of the official GTFS spec, but is used here to help structure the way new trips are created under a route.
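
A plausible shape for sequence.json (the field names here are assumptions for illustration, keyed per route and direction):

```python
# Hypothetical sequence.json content: per route, an ordered stop list and a
# chosen shape per direction, reused when a new trip is created.
sequence = {
    "R1": {
        "0": {"stops": ["S1", "S2", "S3"], "shape": "shp_R1_up"},
        "1": {"stops": ["S3", "S2", "S1"], "shape": "shp_R1_down"},
    }
}

# creating a new trip under route R1, direction 0, reuses the stored sequence
stops_for_new_trip = sequence["R1"]["0"]["stops"]
print(stops_for_new_trip)
```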

  • Exporting the data generates CSVs that are zipped up to form a gtfs.zip, whose download link is made available to the user. Importing a GTFS feed follows the exact reverse process.
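
The export step can be sketched with stdlib csv and zipfile (table contents below are toy data):

```python
# Render each table to CSV in memory and add it to a gtfs.zip archive.
import csv
import io
import zipfile

tables = {
    "agency.txt": [{"agency_id": "A1", "agency_name": "Metro"}],
    "routes.txt": [{"route_id": "R1", "agency_id": "A1", "route_type": "1"}],
}

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for filename, rows in tables.items():
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
        zf.writestr(filename, out.getvalue())

# verify the archive round-trips
with zipfile.ZipFile(buf) as zf:
    print(zf.namelist())
```

Import is the reverse: unzip, then parse each CSV back into its table.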

  • The tables are heavily inter-linked, hence the tool features a dedicated Maintenance section (found under Misc) for handling renaming and deleting of ids. Deleting an id entails further deletions elsewhere in the database: either deleting the whole row where it was a primary key, or zapping the field where it was a secondary reference.
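
As a sketch of that cascade (table shapes are illustrative): deleting a stop_id drops whole rows where it is the primary key (stops) and blanks the field where it is a secondary reference (stop_times):

```python
# Toy cascade delete for a stop_id across a two-table database.
def delete_stop_id(db, stop_id):
    # primary key: drop the whole row from stops
    db["stops"] = [r for r in db["stops"] if r["stop_id"] != stop_id]
    # secondary reference: zap the field in stop_times
    for row in db["stop_times"]:
        if row.get("stop_id") == stop_id:
            row["stop_id"] = ""

db = {
    "stops": [{"stop_id": "S1"}, {"stop_id": "S2"}],
    "stop_times": [{"trip_id": "T1", "stop_id": "S1"},
                   {"trip_id": "T1", "stop_id": "S2"}],
}
delete_stop_id(db, "S2")
print(db["stops"])
print(db["stop_times"][1])
```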

  • Owing to this interlinking, even in other sections of the program, lists of ids (and in some cases full rows) from different tables need to be loaded up. Example: in the Schedules > Trips table, service_id references the calendar table and shape_id references shapes, and the table itself is loaded by picking a route_id from the list loaded from routes. When provisioning a new trip, these fields have to be populated from existing entries in the other tables; simply entering new values would make the GTFS feed invalid. So the program has to do a lot of structuring and validation when handling the data and the user's changes to it.
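
The kind of referential check that implies can be sketched as a toy validator (not the tool's actual code; field names follow the GTFS spec):

```python
# Check that a new trip's references already exist in the other tables.
def validate_trip(trip, db):
    errors = []
    if trip["route_id"] not in {r["route_id"] for r in db["routes"]}:
        errors.append("unknown route_id")
    if trip["service_id"] not in {c["service_id"] for c in db["calendar"]}:
        errors.append("unknown service_id")
    if trip.get("shape_id") and trip["shape_id"] not in db["shape_ids"]:
        errors.append("unknown shape_id")
    return errors

db = {"routes": [{"route_id": "R1"}],
      "calendar": [{"service_id": "WK"}],
      "shape_ids": {"shp_R1_up"}}

ok = validate_trip({"route_id": "R1", "service_id": "WK",
                    "shape_id": "shp_R1_up"}, db)
bad = validate_trip({"route_id": "R9", "service_id": "WK"}, db)
print(ok)
print(bad)
```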

  • There is a separate XML Import section, created to cater to the specific data format of the project's first client, KMRL, which needs to be converted to GTFS. Many diagnostic functions are at work there to ensure that the uploaded data and the information entered are valid.