
transit_service_analyst documentation

stefancoe edited this page Nov 13, 2023 · 58 revisions

Overview

transit_service_analyst is a Python library that provides access to GTFS files as Pandas DataFrames and GeoPandas GeoDataFrames for a specific date, along with several functions that help bootstrap a wide range of service-related analyses, including geospatial analysis.

Representing Route Level Service

The GTFS specification does not have an explicit way to identify unique service by route. The transit_service_analyst package handles this by finding all unique stop sequences by route and labeling each one with the first trip_id encountered, which serves as a representative trip_id called rep_trip_id. This idea is borrowed from INRO Emme travel modeling software, which PSRC's travel model Soundcast uses for traffic and transit assignment and to create level of service skims. As a result, transit_service_analyst often represents service at a disaggregate, sub-route level based on rep_trip_id.

For example, if the only variation within a route is an inbound and an outbound pattern, then it will have two representative trip ids. Some routes will have more than two, for example routes that have both local and express service or that include skip-stop patterns. If every trip on a route follows the same service pattern (stop sequence), then that route will have just one representative trip (rep_trip_id). Because route_id, direction_id, and route_type are always retained in functions that return DataFrames with rep_trip_id as the unique identifier, subsequent data aggregations are always possible. The tool is designed to represent route level service at its most disaggregate level so that all route service patterns are represented.
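The representative-trip idea can be illustrated with plain Pandas. The following is a minimal sketch on made-up data, not the library's actual implementation: the toy tables, the `stop_pattern` column, and the `>` separator are all assumptions chosen for illustration.

```python
import pandas as pd

# Toy GTFS tables (hypothetical data, not from a real feed).
trips = pd.DataFrame({
    "trip_id": ["t1", "t2", "t3", "t4"],
    "route_id": ["R1", "R1", "R1", "R1"],
    "direction_id": [0, 0, 1, 0],
})
stop_times = pd.DataFrame({
    "trip_id": ["t1", "t1", "t1", "t2", "t2", "t2", "t3", "t3", "t3", "t4", "t4"],
    "stop_id": ["A", "B", "C", "A", "B", "C", "C", "B", "A", "A", "C"],
    "stop_sequence": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2],
})

# Build one stop-pattern string per trip, ordered by stop_sequence.
# The ">" separator is an arbitrary choice that assumes no stop_id contains it.
patterns = (
    stop_times.sort_values(["trip_id", "stop_sequence"])
    .groupby("trip_id")["stop_id"]
    .agg(lambda stops: ">".join(stops))
    .rename("stop_pattern")
    .reset_index()
    .merge(trips, on="trip_id")
)

# Label every trip with the first trip_id seen for its (route, pattern) pair.
patterns["rep_trip_id"] = (
    patterns.groupby(["route_id", "stop_pattern"])["trip_id"].transform("first")
)

print(patterns[["trip_id", "route_id", "direction_id", "stop_pattern", "rep_trip_id"]])
```

In this toy feed, t1 and t2 share a stop sequence and so share a representative trip, while the reverse-direction trip t3 and the skip-stop trip t4 each get their own.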

Please check out this link for some example notebooks.

Installation

Enter the following in a command prompt:
pip install transit-service-analyst
This will install transit_service_analyst in your current Python environment. You can visit the PyPI page here:
https://pypi.org/project/transit-service-analyst/

Example:

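import transit_service_analyst as tsa
# Load the GTFS files in c:/gtfs_folder for the service date 2021-09-14 (YYYYMMDD).
service_tool = tsa.load_gtfs('c:/gtfs_folder', 20210914)
# Show the first two rows of total trips per representative trip.
service_tool.get_total_trips_by_line().head(2)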

rep_trip_id  route_id  direction_id  total_trips
1            001e912   0             9
15           009f09a   1             121
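Because the output is keyed by rep_trip_id but still carries route_id and direction_id, it can be rolled up to route level with an ordinary groupby. The sketch below uses a hand-built DataFrame with the same column names as the example output; the values are invented for illustration and do not come from a real feed.

```python
import pandas as pd

# Hypothetical frame shaped like the example output; values are illustrative only.
trips_by_line = pd.DataFrame({
    "rep_trip_id": ["a1", "a2", "b1"],
    "route_id": ["100", "100", "200"],
    "direction_id": [0, 0, 1],
    "total_trips": [9, 12, 121],
})

# Sum trips over all service patterns that share a route and direction.
route_totals = trips_by_line.groupby(
    ["route_id", "direction_id"], as_index=False
)["total_trips"].sum()

print(route_totals)
```

With real data, the same groupby could be applied directly to the DataFrame returned by get_total_trips_by_line().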

Usage:

Import:
import transit_service_analyst as tsa

Tool Access:
The entry point to the transit_service_analyst library is the load_gtfs function:
tsa.load_gtfs(<gtfs_dir>, <service_date>)

  • <gtfs_dir> String. The location of the GTFS files.

  • <service_date> Integer. The service date of interest, in YYYYMMDD format. Pick a date that is typical of the service you wish to analyze. For example, we use a non-holiday Tuesday in May to represent weekday spring service.
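Before calling load_gtfs, it can be worth checking that the integer date is valid and falls on the intended day of the week. The helper below is a hypothetical convenience written for this page, not part of transit_service_analyst; it relies only on the Python standard library.

```python
from datetime import date, datetime

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def parse_service_date(service_date: int) -> date:
    """Convert a YYYYMMDD integer to a date, raising ValueError if it is invalid."""
    text = str(service_date)
    if len(text) != 8:
        raise ValueError(f"expected 8 digits (YYYYMMDD), got {text!r}")
    return datetime.strptime(text, "%Y%m%d").date()


# The date used in the example earlier on this page.
service_day = parse_service_date(20210914)
weekday_name = WEEKDAYS[service_day.weekday()]
print(service_day.isoformat(), weekday_name)
```

This check does not know about holidays; confirming that the date is a typical service day is still up to the analyst.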

Tool Properties & Methods

  • calendar - GTFS calendar.txt as a Pandas DataFrame.
  • routes - GTFS routes.txt as a Pandas DataFrame. Only records specific to the service date are included.
  • shapes - GTFS shapes.txt as a Pandas DataFrame. Only records specific to the service date are included.
  • stop_times - GTFS stop_times.txt as a Pandas DataFrame. Only records specific to the service date are included.
  • stops - GTFS stops.txt as a Pandas DataFrame. Only records specific to the service date are included.
  • trips - GTFS trips.txt as a Pandas DataFrame. Only records specific to the service date are included.
  • service_ids - A list containing each service_id specific to the service date, start_time & end_time.
  • schedule_pattern_df - A Pandas DataFrame containing a record for each unique rep_trip_id. Other columns are route_id, orig_trip_id, & shape_id.
  • get_lines_gdf() - Returns a GeoPandas GeoDataFrame containing a record for each unique rep_trip_id. The geometry comes from the shape_id used by that trip.
  • get_line_stops_gdf() - Returns a GeoPandas GeoDataFrame containing a record for each stop for each rep_trip_id. The geometry is built from the stop_lat and stop_lon columns in the stops file.
  • get_tph_by_line() - Returns a Pandas DataFrame containing the number of trips by hour for each unique rep_trip_id.
  • get_tph_at_stops() - Returns a Pandas DataFrame containing the number of trips by hour for each unique stop_id.
  • get_service_hours_by_line() - Returns a Pandas DataFrame containing the number of service hours by rep_trip_id.
  • get_total_trips_by_line() - Returns a Pandas DataFrame containing the total number of trips by rep_trip_id.
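To make the trips-by-hour idea concrete, here is a minimal Pandas sketch on made-up departure times. It is not the code behind get_tph_by_line(), and the layout of that method's output is not documented above, so the wide table produced here (one column per hour) is an assumption. The sketch keeps GTFS hours of 24 or more as they are, since GTFS permits such times for service that runs past midnight.

```python
import pandas as pd

# Toy first-stop departures, already labeled with a representative trip
# (hypothetical data, not from a real feed).
departures = pd.DataFrame({
    "rep_trip_id": ["t1", "t1", "t1", "t3"],
    "departure_time": ["06:05:00", "06:40:00", "07:10:00", "25:15:00"],
})

# GTFS times can exceed 24:00:00, so take the hour from the string
# rather than parsing it as a clock time.
departures["hour"] = (
    departures["departure_time"].str.split(":").str[0].astype(int)
)

# Count trips per representative trip and hour, then pivot hours into columns.
tph = departures.groupby(["rep_trip_id", "hour"]).size().unstack(fill_value=0)

print(tph)
```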