First revision (#2)
* wip

* WIP: first starting to export new transfers

* Can now split a trip and export a transfer

* Splitting services; untested logic to duplicate transfers along with trip

* Try a different less headachy representation of transfers?

* Initial commit

* Clarify comments

* More readability improvements

* Think post-midnight works properly

* Insert non-transfer trips for clarity

* Fix bugs around transfer deletion and day shifting

* Fix some mistakes in the config description

* Concept for fixing trip-to-trip transfers: trip clusters and specializations

* Works correctly for non-split case

* WIP (wrong)

* WIP: dag expansion

* Split trips based on cont graph

* WIP: hacky idea for cyclic blocks

* WIP: Handle +24H, handle disambiguation

* WIP: simpler way to signal that trips need fixing

* Fancy bitset for perf

* Simplify connection between block converter and graph simplifier

* Export and most of validation of agency-defined transfers

* This contains all components we need but not in the right order

* Reorganize: extract transfer_type logic from continuation logic

* Add a tool to export nodes along a path

* Cycle detection

* Cycle autobreaker

* Simplify PathEntry

* Simplify linear_exporter

* Simplify graph representation somewhat

* Better cycle detection; start to consider join/split

* Needs a refactor but that should do a decent job wrt vehicle split/join

* Fix some issues with export/linearize

* A bit redundant but this avoids having the rest of the code consider type 1 transfers

* Fix some risky implicit trust of order

* Justify some of why this even works

* Add some nicer docs

* Fix some glitches

* Add pytest runner

* Clean up some cruft

* Provide more material to use in tests

* Always test both linear/non-linear for every test case by default

* Make notes for the test cases we still need to add

* Add GH actions

* Add argument to delete output folder first

* Apply suggestions [thx @JMilot1 & @jsteelz]

* Reformat with yapf

* Remove tests not yet ready

* Remove most monkeypatched fields

* Do not stuff resolved schema from CSV files into classes
npaun authored Mar 14, 2022
1 parent 0feb9d1 commit 5226e00
Showing 34 changed files with 1,939 additions and 350 deletions.
38 changes: 38 additions & 0 deletions .github/workflows/pull-request.yml
@@ -0,0 +1,38 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Build on pull request

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build-and-test:
    runs-on: [self-hosted, linux, ci-transitapp]
    strategy:
      matrix:
        python-version: [pypy-3.7]

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        python -m pip install flake8 pytest
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        python -m pytest .
34 changes: 34 additions & 0 deletions .github/workflows/python-publish.yml
@@ -0,0 +1,34 @@
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries

name: Upload Python package

on:
  push:
    branches: [ master ]

jobs:
  deploy:

    runs-on: [self-hosted, linux, ci-transitapp]

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: '3.7'
    - name: Install dependencies
      run: |
        python3 -m pip install --upgrade pip
        pip install setuptools wheel twine
    - name: Build
      run: |
        python3 setup.py sdist bdist_wheel
    - name: Upload
      run: |
        twine upload dist/*
      env:
        TWINE_USERNAME: transit
        TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD_BUILD_TRANSITAPP_COM }}
        TWINE_REPOSITORY_URL: https://pypi.transitapp.com:443
2 changes: 2 additions & 0 deletions .gitignore
@@ -4,3 +4,5 @@
*.egg-info
.idea/
.*.sw*
.vscode/
tests/.work/
2 changes: 2 additions & 0 deletions .style.yapf
@@ -0,0 +1,2 @@
[style]
based_on_style = google
39 changes: 39 additions & 0 deletions README.md
@@ -0,0 +1,39 @@
# gtfs-blocks-to-transfers

Converts GTFS blocks, defined by setting [trip.block\_id](https://github.com/google/transit/blob/master/gtfs/spec/en/reference.md#example-blocks-and-service-day), into a series of [trip-to-trip transfers (proposal)](https://github.com/google/transit/pull/303). Uses configurable heuristics to predict whether two trips are connected as _in-seat transfers_ or as _vehicle continuations_ only. This tool also validates predefined trip-to-trip transfers in `transfers.txt`.

Usage: `./convert.py <input feed> <directory for output>`


## How it works

Throughout this tool, sets of _service days_ are used to relate trips. They are defined in [service\_days.py](#), and are represented as a bitmap per `service_id`, with bit `n` set to 1 if that service operates on the `n`th day since the beginning of the feed. The term _trip's service days_ refers to the service days for `trip.service_id`. If the first departure of a trip is after `24:00:00`, the service days are stored _as-if_ the trip began the next day between `00:00:00` and `23:59:59`.
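
The bitmap representation described above can be sketched with plain Python integers. This is only an illustration, not the code in `service_days.py`: the helper names `service_days_bitmap` and `shift_to_next_day` and the example dates are assumptions made for this sketch.

```python
from datetime import date, timedelta


def service_days_bitmap(feed_start, active_dates):
    """Return an int whose bit n is 1 if the service runs n days after feed_start."""
    bitmap = 0
    for day in active_dates:
        n = (day - feed_start).days
        if n < 0:
            raise ValueError(f'{day} is before the start of the feed')
        bitmap |= 1 << n
    return bitmap


def shift_to_next_day(bitmap):
    """Store the days as if a trip departing after 24:00:00 began the next day."""
    return bitmap << 1


# Hypothetical feed starting on 2022-03-14, with one service running on
# days 0 to 4 and another running on day 4 only.
feed_start = date(2022, 3, 14)
first_five_days = service_days_bitmap(
    feed_start, [feed_start + timedelta(days=i) for i in range(5)])
fifth_day_only = service_days_bitmap(feed_start,
                                     [feed_start + timedelta(days=4)])

# Days on which both services operate: a single bitwise AND.
common_days = first_five_days & fifth_day_only
```

Storing each set as one integer makes intersection, union and difference of service days single bitwise operations, which is presumably why the representation is convenient when relating many pairs of trips.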

For each block defined in the feed, [`convert_blocks.py`](#) finds the most likely continuations for each trip, starting the search after the final arrival time of the trip. The program searches for a matching continuation for all of the trip's service days, greedily selecting continuation trips in order of wait time. Some days may remain unmatched if a configurable threshold is exceeded (`config.TripToTripTransfers.max_wait_time`). [`classify_transfers.py`](#) uses heuristics to assign `transfer_type=4` (in-seat transfer) or `transfer_type=5` (vehicle continuation only) to each continuation.
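
A minimal sketch of the greedy matching step, assuming service days are held as integer bitmaps and times as seconds since midnight. The function `greedy_continuations` and the tuple layout of `candidates` are hypothetical, and the sketch ignores trips that wrap past midnight, which the real tool handles.

```python
def greedy_continuations(trip_days, last_arrival, candidates, max_wait_time):
    """
    candidates: iterable of (trip_id, first_departure, days_bitmap) for the
    other trips in the same block.
    Returns (matches, unmatched), where matches is a list of
    (trip_id, matched_days_bitmap) and unmatched is the bitmap of service
    days for which no continuation was found.
    """
    unmatched = trip_days
    matches = []

    # Shortest wait first: sorting by departure time is equivalent here
    # because every candidate is compared against the same last_arrival.
    for trip_id, first_departure, days in sorted(candidates,
                                                 key=lambda c: c[1]):
        wait_time = first_departure - last_arrival
        if wait_time < 0:
            continue  # departs before this trip has arrived
        if wait_time > max_wait_time:
            break  # every later candidate waits even longer

        matched = unmatched & days
        if matched:
            matches.append((trip_id, matched))
            unmatched &= ~matched

        if not unmatched:
            break  # every service day has a continuation

    return matches, unmatched


matches, unmatched = greedy_continuations(
    trip_days=0b11111,
    last_arrival=36000,
    candidates=[
        ('C', 36600, 0b11111),
        ('B', 36300, 0b01111),
        ('D', 50000, 0b11111),
    ],
    max_wait_time=3600)
```

In this example trip `B` claims the first four days because it has the shortest wait, trip `C` picks up the remaining day, and trip `D` is never considered.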

Generated transfers are combined with predefined transfers from `transfers.txt` in [`simplify_graph.py`](#). If necessary, this step will split trips such that for any given `from_trip_id`, each potential `to_trip_id` operates on a disjoint set of service days. For example, bus 50 could continue to bus 15 on Monday through Thursday, but continue to bus 20 on Fridays. Both generated and predefined transfers are validated to ensure they are unambiguous and conform to the specification.
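
The bus 50 example can be sketched as follows. This is a simplified illustration rather than the logic of `simplify_graph.py`: it assumes the continuations already cover disjoint days, and the function `split_by_continuation` and the `_split<n>` naming of new trips are invented for the sketch.

```python
def split_by_continuation(trip_id, trip_days, continuations):
    """
    continuations: list of (to_trip_id, days_bitmap).
    Returns a list of (new_trip_id, days_bitmap, to_trip_id) pieces, one per
    continuation, plus a final piece with to_trip_id=None for any service
    days that have no continuation.
    """
    remaining = trip_days
    pieces = []

    for index, (to_trip_id, days) in enumerate(continuations):
        # Each continuation may only claim days the trip operates on and
        # that no earlier continuation has claimed.
        if days & ~remaining:
            raise ValueError(
                f'{to_trip_id} overlaps another continuation or runs on '
                f'days when {trip_id} does not operate')
        pieces.append((f'{trip_id}_split{index}', days, to_trip_id))
        remaining &= ~days

    if remaining:
        pieces.append((f'{trip_id}_split{len(continuations)}', remaining,
                       None))
    return pieces


# Bits 0 to 3 are Monday to Thursday and bit 4 is Friday in this example.
MON_TO_THU = 0b01111
FRIDAY = 0b10000
pieces = split_by_continuation('50', MON_TO_THU | FRIDAY,
                               [('15', MON_TO_THU), ('20', FRIDAY)])
```

After the split, each piece of trip 50 has exactly one outgoing continuation on every day it operates, so the resulting transfers are unambiguous.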

[`simplify_export.py`](#) converts the continuation graph back to a series of transfers, reusing the feed's existing `trip_id`s and `service_id`s when an exact match can be found, or creating new entities if required. This step will preserve trip-to-trip transfers that don't represent vehicle continuations (e.g. [`transfer_type=2`](https://github.com/google/transit/blob/master/gtfs/spec/en/reference.md#transferstxt) used to estimate walk time between two vehicles).

## Heuristics

An in-seat transfer is likely if:

* Riders only need to wait a short time between trips.
* The next trip begins at the same stop as the preceding trip ended, or the two stops are very close to each other.
* The next trip goes to a different destination than the preceding trip, or the two trips serve a loop route.


Riders probably won't be able to, or won't want to, stay on board if:

* The wait time aboard the bus is quite long.
* The next trip is very similar to the preceding trip, but in reverse. We assess similarity by comparing the sequence of stop locations of the two trips using a modified [Hausdorff metric](https://en.wikipedia.org/wiki/Hausdorff_distance).
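
To give a feel for the similarity test, here is the plain (unmodified) Hausdorff distance between two stop sequences. The tool uses a modified variant that is not reproduced here; this sketch also treats stops as points on a flat plane with arbitrary units, whereas real stops are latitude/longitude pairs.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# An outbound trip and its return trip along nearly the same street,
# visiting the stops in the opposite order.
outbound = [(0, 0), (1, 0), (2, 0), (3, 0)]
inbound = [(3, 0.1), (2, 0.1), (1, 0.1), (0, 0.1)]

# A trip that heads off in a different direction from the terminus.
branch = [(3, 0), (3, 1), (3, 2)]

return_distance = hausdorff(outbound, inbound)
branch_distance = hausdorff(outbound, branch)
```

The return trip stays within a small distance of the outbound trip, while the branching trip does not, so a threshold on this distance separates "same route in reverse" from "continues somewhere new". The plain metric ignores the order of stops, so it cannot by itself tell that the second trip runs in reverse.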

You can adjust thresholds or entirely disable a heuristic in [`blocks_to_transfers/config.py`](#).


## Advanced

* `simplify_linear.py`: You probably don't want to enable this option, unless your system happens to have the same constraints described in this section. If enabled, trips will be split so that each trip has at most one incoming continuation, and at most one outgoing continuation. Where cycles exist (e.g. an automated people mover that serves trip 1 -> trip 2 -> trip 1 every day until the end of the feed), back edges are removed. Trips that decouple into multiple vehicles, or that are formed through the coupling of multiple vehicles are preserved as is.
* Test cases can be found in the `tests/` directory.
* This program will run much faster using [PyPy](https://www.pypy.org), a jitted interpreter for Python.
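
The removal of back edges mentioned for `simplify_linear.py` can be illustrated with a standard depth-first search. This is a generic textbook technique shown under the assumption that the continuation graph is a dictionary mapping each trip to its continuations; it is not taken from the tool's own cycle detection, and `remove_back_edges` is a hypothetical name.

```python
def remove_back_edges(graph):
    """
    graph: dict mapping each node to a list of successor nodes. Every
    successor must also appear as a key.
    Returns a copy of the graph without the edges that close a cycle.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}
    kept = {node: [] for node in graph}

    def visit(node):
        state[node] = IN_PROGRESS
        for successor in graph[node]:
            if state[successor] == IN_PROGRESS:
                # The successor is an ancestor on the current path, so this
                # edge would close a cycle: drop it.
                continue
            kept[node].append(successor)
            if state[successor] == UNVISITED:
                visit(successor)
        state[node] = DONE

    for node in graph:
        if state[node] == UNVISITED:
            visit(node)
    return kept


# The people mover example: trip 1 -> trip 2 -> trip 1.
people_mover = {'trip1': ['trip2'], 'trip2': ['trip1']}
acyclic = remove_back_edges(people_mover)
```

Which edge of a cycle is removed depends on where the search starts, so a real implementation would need a deterministic rule for choosing the starting trip. The recursion also limits this sketch to graphs shallower than Python's recursion limit.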
75 changes: 47 additions & 28 deletions blocks_to_transfers/__main__.py
@@ -1,41 +1,60 @@
import argparse
import ctypes
import math
import timeit
import os
import shutil
from . import convert_blocks, editor, service_days, classify_transfers, simplify_graph, simplify_linear, simplify_export

from blocks_to_transfers.shape_similarity import LatLon, hausdorff
from . import editor, augment

def process(in_dir,
out_dir,
use_simplify_linear=False,
remove_existing_files=False):
gtfs = editor.load(in_dir)

def main():
gtfs = editor.load('/Users/np/GTFSs/BCTWK_734/211_cleaned')
gtfs = augment.augment(gtfs)

shape_lats = {shape_id: [LatLon(pt.shape_pt_lat, pt.shape_pt_lon) for pt in pts] for shape_id, pts in gtfs.shapes.items()}
services = service_days.ServiceDays(gtfs)
converted_transfers = convert_blocks.convert(gtfs, services)
classify_transfers.classify(gtfs, converted_transfers)

graph = simplify_graph.simplify(gtfs, services, converted_transfers)

"""
for a_id, a_pt in cleaned_shapes.items():
for b_id, b_pt in cleaned_shapes.items():
print(a_id, b_id, hausdorff(a_pt, b_pt))
"""
if use_simplify_linear:
output_graph = simplify_linear.simplify(graph)
else:
output_graph = graph
simplify_export.export_visit(output_graph)

haus_cache = {}
for block, trips in gtfs.trips_by_block.items():
for i_trip, trip in enumerate(trips):
for trip2 in trips[i_trip+1:]:
if not trip.data.shape_id or not trip2.data.shape_id:
continue
if remove_existing_files:
shutil.rmtree(out_dir, ignore_errors=True)

key = (trip.data.shape_id, trip2.data.shape_id)
rkey = (trip2.data.shape_id, trip.data.shape_id)
if key not in haus_cache and rkey not in haus_cache:
haus_cache[key] = hausdorff(shape_lats[trip.data.shape_id], shape_lats[trip2.data.shape_id])
print('eval', key, haus_cache[key])
editor.patch(gtfs, gtfs_in_dir=in_dir, gtfs_out_dir=out_dir)
print('Done.')


#editor.patch(gtfs, gtfs_in_dir='/Users/np/GTFSs/BCTWK_734/211_cleaned', gtfs_out_dir='mimi')
x = 5
def main():
cmd = argparse.ArgumentParser(
description=
'Predicts trip-to-trip transfers from block_ids in GTFS feeds')
cmd.add_argument('feed', help='Path to a directory containing a GTFS feed')
cmd.add_argument('out_dir', help='Directory to contain the modified feed')
cmd.add_argument('-L',
'--linear',
action='store_true',
help='Apply linear simplification')
cmd.add_argument(
'--remove-existing-files',
action='store_true',
help='Remove all files in the output directory before exporting')
args = cmd.parse_args()

if os.environ.get('VSCODE_DEBUG'):
import debugpy
print('Waiting for VSCode to attach')
debugpy.listen(5678)
debugpy.wait_for_client()

process(args.feed,
args.out_dir,
use_simplify_linear=args.linear,
remove_existing_files=args.remove_existing_files)


if __name__ == '__main__':
84 changes: 0 additions & 84 deletions blocks_to_transfers/augment.py

This file was deleted.

93 changes: 93 additions & 0 deletions blocks_to_transfers/classify_transfers.py
@@ -0,0 +1,93 @@
"""
For each continuation identified by converting blocks, use heuristics to
predict whether a transfer is most likely to be of type:
    4: In-seat transfer
    5: Vehicle continuation only (for operational reasons)
"""
from .editor.schema import DAY_SEC, TransferType
from . import config, shape_similarity


class ShapeMatchState:

    def __init__(self):
        self.shape_ptr_by_trip = {}
        self.shape_ptr_by_shape = {}
        self.similarity_by_shape_ptr = {}


def classify(gtfs, transfers):
    print('Predicting transfer_type for each identified continuation')
    shape_match = ShapeMatchState()
    for transfer in transfers:
        transfer.transfer_type = get_transfer_type(gtfs, shape_match, transfer)

    print(
        f'\tComparison by similarity metric required for {len(shape_match.shape_ptr_by_trip)} trips having {len(shape_match.shape_ptr_by_shape)} distinct stop_times shapes'
    )


def get_transfer_type(gtfs, shape_match, transfer):
    trip = gtfs.trips[transfer.from_trip_id]
    cont_trip = gtfs.trips[transfer.to_trip_id]

    wait_time = cont_trip.first_departure - trip.last_arrival
    if cont_trip.first_departure < trip.last_arrival:
        wait_time += DAY_SEC

    # transfer would require riders to wait for an excessively long time
    if wait_time > config.InSeatTransfers.max_wait_time:
        return TransferType.VEHICLE_CONTINUATION

    # cont_trip resumes too far away from where trip ended (probably involves deadheading)
    if trip.last_point.distance_to(
            cont_trip.first_point
    ) > config.InSeatTransfers.same_location_distance:
        return TransferType.VEHICLE_CONTINUATION

    # trip and cont_trip form a full loop, so riders may want to stay
    # onboard despite similarity in shape.
    if (trip.first_point.distance_to(cont_trip.first_point) <
            config.InSeatTransfers.same_location_distance and
            trip.last_point.distance_to(cont_trip.last_point) <
            config.InSeatTransfers.same_location_distance):
        return TransferType.IN_SEAT

    if config.InSeatTransfers.ignore_return_via_same_route:
        if trip.route_id == cont_trip.route_id and trip.direction_id != cont_trip.direction_id:
            return TransferType.VEHICLE_CONTINUATION

    if config.InSeatTransfers.ignore_return_via_similar_trip:
        if shape_similarity.trip_shapes_similar(
                shape_match.similarity_by_shape_ptr,
                get_shape_ptr(shape_match, trip),
                get_shape_ptr(shape_match, cont_trip)):
            return TransferType.VEHICLE_CONTINUATION

    # We presume that the rider will be able to stay onboard the vehicle
    return TransferType.IN_SEAT


def get_shape_ptr(shape_match, trip):
    """
    For a given trip, we first check if we've already found a representative
    for its shape. If so, we return that pointer.
    For trips not previously encountered, we hash its shape to determine if a
    representative is already set for that shape. If so, we return that
    pointer.
    Otherwise, we use the trip's stop_shape object as the representative and
    later trips sharing the same shape will point to it.
    """

    shape_ptr = shape_match.shape_ptr_by_trip.get(trip.trip_id)

    if shape_ptr:
        return shape_ptr

    shape_ptr = shape_match.shape_ptr_by_shape.setdefault(
        trip.stop_shape, trip.stop_shape)
    shape_match.shape_ptr_by_trip[trip.trip_id] = shape_ptr
    return shape_ptr