First revision #2

Merged
merged 52 commits into from
Mar 14, 2022
Changes from 48 commits
35b73a9
wip
npaun Dec 21, 2021
5b08b98
WIP: first starting to export new transfers
npaun Dec 23, 2021
11244b3
Can now split a trip and export a transfer
npaun Dec 23, 2021
f1fc60e
Splitting services; untested logic to duplicate transfers along with …
npaun Dec 24, 2021
46f7b2f
Try a different less headachy representation of transfers?
npaun Dec 24, 2021
f3809bb
Initial commit
npaun Dec 24, 2021
816f6fe
Merge branch 'master' of ssh://github.com/TransitApp/GTFS-blocks-to-t…
npaun Dec 24, 2021
e716e10
Clarify comments
npaun Dec 28, 2021
d7ebc6b
More readability improvements
npaun Dec 28, 2021
c78f93e
Think post-midnight works properly
npaun Dec 28, 2021
dac3f9d
Insert non-transfer trips for clarity
npaun Dec 28, 2021
5f35e70
Fix bugs around transfer deletion and day shifting
npaun Dec 28, 2021
e3b867e
Fix some mistakes in the config description
npaun Dec 30, 2021
373fc38
Concept for fixing trip-to-trip transfers: trip clusters and speciali…
npaun Dec 31, 2021
c029ee2
Works correctly for non-split case
npaun Jan 4, 2022
0e64a1d
WIP (wrong)
npaun Jan 7, 2022
55f39b9
WIP: dag expansion
npaun Jan 11, 2022
17adae5
Split trips based on cont graph
npaun Jan 12, 2022
128218f
WIP: hacky idea for cyclic blocks
npaun Jan 13, 2022
5fc0ac2
WIP: Handle +24H, handle disambiguation
npaun Feb 13, 2022
e4fea18
WIP: simpler way to signal that trips need fixing
npaun Feb 13, 2022
9691776
Fancy bitset for perf
npaun Feb 14, 2022
c70f8ba
Simplify connection between block converter and graph simplifier
npaun Feb 14, 2022
77cc7a9
Export and most of validation of agency-defined transfers
npaun Feb 14, 2022
6fb656d
This contains all components we need but not in the right order
npaun Feb 16, 2022
75c4ef1
Reorganize: extract transfer_type logic from continuation logic
npaun Feb 16, 2022
761eac4
Add a tool to export nodes along a path
npaun Feb 22, 2022
2a92c4e
Cycle detection
npaun Feb 24, 2022
f662ead
Cycle autobreaker
npaun Feb 24, 2022
52f424d
Simplify PathEntry
npaun Feb 25, 2022
fb115b0
Simplify linear_exporter
npaun Feb 25, 2022
86ae4fd
Simplify graph representation somewhat
npaun Feb 25, 2022
51db0f6
Better cycle detection; start to consider join/split
npaun Feb 28, 2022
2307ed9
Needs a refactor but that should do a decent job wrt vehicle split/join
npaun Feb 28, 2022
4f29347
Fix some issues with export/linearize
npaun Feb 28, 2022
fe9b8a8
A bit redundant but this avoids having the rest of the code consider …
npaun Feb 28, 2022
03e3053
Fix some risky implicit trust of order
npaun Feb 28, 2022
847d6a9
Justify some of why this even works
npaun Mar 1, 2022
b744e8e
Add some nicer docs
npaun Mar 1, 2022
2f5422f
Fix some glitches
npaun Mar 1, 2022
8cce939
Add pytest runner
npaun Mar 2, 2022
cbc7058
Clean up some cruft
npaun Mar 2, 2022
a7f884c
Provide more material to use in tests
npaun Mar 2, 2022
b3e1c83
Always test both linear/non-linear for every test case by default
npaun Mar 3, 2022
ece6b9c
Make notes for the test cases we still need to add
npaun Mar 3, 2022
70a8711
Add GH actions
npaun Mar 8, 2022
d5c3ab6
Add argument to delete output folder first
npaun Mar 8, 2022
a89f640
Apply suggestions [thx @JMilot1 & @jsteelz]
npaun Mar 10, 2022
34395e1
Reformat with yapf
npaun Mar 10, 2022
93d636d
Remove tests not yet ready
npaun Mar 10, 2022
7f2a96a
Remove most monkeypatched fields
npaun Mar 14, 2022
cd7a2c4
Do not stuff resolved schema from CSV files into classes
npaun Mar 14, 2022
38 changes: 38 additions & 0 deletions .github/workflows/pull-request.yml
@@ -0,0 +1,38 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Build on pull request

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build-and-test:
    runs-on: [self-hosted, linux, ci-transitapp]
    strategy:
      matrix:
        python-version: [pypy-3.7]

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        python -m pip install flake8 pytest
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        python -m pytest .
34 changes: 34 additions & 0 deletions .github/workflows/python-publish.yml
@@ -0,0 +1,34 @@
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries

name: Upload Python package

on:
  push:
    branches: [ master ]

jobs:
  deploy:

    runs-on: [self-hosted, linux, ci-transitapp]

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: '3.7'
    - name: Install dependencies
      run: |
        python3 -m pip install --upgrade pip
        pip install setuptools wheel twine
    - name: Build
      run: |
        python3 setup.py sdist bdist_wheel
    - name: Upload
      run: |
        twine upload dist/*
      env:
        TWINE_USERNAME: transit
        TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD_BUILD_TRANSITAPP_COM }}
        TWINE_REPOSITORY_URL: https://pypi.transitapp.com:443
2 changes: 2 additions & 0 deletions .gitignore
@@ -4,3 +4,5 @@
*.egg-info
.idea/
.*.sw*
.vscode/
tests/.work/
39 changes: 39 additions & 0 deletions README.md
@@ -0,0 +1,39 @@
# gtfs-blocks-to-transfers

Converts GTFS blocks, defined by setting [trip.block\_id](https://github.com/google/transit/blob/master/gtfs/spec/en/reference.md#example-blocks-and-service-day), into a series of [trip-to-trip transfers (proposal)](https://github.com/google/transit/pull/303). Uses configurable heuristics to predict whether two trips are connected as _in-seat transfers_ or as _vehicle continuations_ only. This tool also validates predefined trip-to-trip transfers in `transfers.txt`.

Usage: `./convert.py <input feed> <directory for output>`


## How it works

Throughout this tool, sets of _service days_ are used to relate trips. They are defined in [service\_days.py](#), and are represented as a bitmap per `service_id`, with bit `n` set to 1 if that service operates on the `n`th day since the beginning of the feed. The term _trip's service days_ refers to the service days for `trip.service_id`. If the first departure of a trip is after `24:00:00`, the service days are stored _as-if_ the trip began the next day between `00:00:00` and `23:59:59`.
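
As a rough illustration of the bitmap representation described above, a set of operating dates can be packed into a Python integer. This is a sketch only, not the code in `service_days.py`; the helper names `days_to_bitmap` and `shift_to_next_day` and the sample dates are hypothetical.

```python
from datetime import date


def days_to_bitmap(feed_start, operating_dates):
    """Set bit n when the service runs on the n-th day since feed_start."""
    bitmap = 0
    for day in operating_dates:
        offset = (day - feed_start).days
        if offset < 0:
            raise ValueError(f'{day} precedes the start of the feed')
        bitmap |= 1 << offset
    return bitmap


def shift_to_next_day(bitmap):
    """Treat a trip departing after 24:00:00 as if it began on the next day."""
    return bitmap << 1


feed_start = date(2022, 3, 14)
service_a = days_to_bitmap(feed_start, [date(2022, 3, 14), date(2022, 3, 15)])
service_b = days_to_bitmap(feed_start, [date(2022, 3, 18)])

# Two services share an operating day only if their bitmaps intersect
shares_a_day = (service_a & service_b) != 0
```

With this encoding, relating two trips by service day reduces to integer `&` and `|` operations.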

For each block defined in the feed, [`convert_blocks.py`](#) finds the most likely continuations for each trip, starting the search after the final arrival time of the trip. The program searches for a matching continuation for all of the trip's service days, greedily selecting continuation trips in order of wait time. Some days may remain unmatched if a configurable threshold is exceeded (`config.TripToTripTransfers.max_wait_time`). [`classify_transfers.py`](#) uses heuristics to assign `transfer_type=4` (in-seat transfer) or `transfer_type=5` (vehicle continuation only) to each continuation.
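
The greedy search can be sketched as follows, assuming service days are held as integer bitmaps. The function `match_continuations` and the candidate tuples are hypothetical and only approximate what `convert_blocks.py` does; the default of 1200 seconds mirrors `config.TripToTripTransfers.max_wait_time`.

```python
def match_continuations(trip_days, candidates, max_wait_time=1200):
    """Greedily assign a trip's service days to the continuation with the shortest wait.

    trip_days: int bitmap of the days on which the trip operates.
    candidates: iterable of (wait_time_seconds, trip_id, days_bitmap).
    Returns (matches, unmatched), where matches maps trip_id -> bitmap of matched days.
    """
    matches = {}
    unmatched = trip_days
    for wait_time, trip_id, days in sorted(candidates):
        if wait_time > max_wait_time or unmatched == 0:
            break
        shared = unmatched & days
        if shared:
            matches[trip_id] = shared
            unmatched &= ~shared
    return matches, unmatched


# A trip operating on five consecutive days (bits 0 to 4)
trip_days = 0b11111
candidates = [
    (3600, 'bus_99', 0b11111),  # wait exceeds the threshold, never selected
    (420, 'bus_20', 0b10000),   # operates on the fifth day only
    (300, 'bus_15', 0b01111),   # operates on the first four days
]
matches, unmatched = match_continuations(trip_days, candidates)
```

Days left in `unmatched` correspond to the service days for which no continuation was found within the threshold.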

Generated transfers are combined with predefined transfers from `transfers.txt` in [`simplify_graph.py`](#). If necessary, this step will split trips such that for any given `from_trip_id`, each of the potential `to_trip_id`s will operate on a disjoint set of service days. For example, bus 50 could continue to bus 15 on Monday through Thursday, but continue to bus 20 on Fridays. Both generated and predefined transfers are validated to ensure they are unambiguous and conform to the specification.
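
One way to picture the disjointness requirement is a check that no service day is claimed by two continuations of the same trip. The sketch below assumes integer bitmaps of service days; `find_ambiguous_days` is a hypothetical helper, not a function from `simplify_graph.py`.

```python
def find_ambiguous_days(continuations):
    """Return a bitmap of the days claimed by more than one continuation.

    continuations: dict mapping to_trip_id -> bitmap of service days.
    """
    seen = 0
    ambiguous = 0
    for days in continuations.values():
        ambiguous |= seen & days
        seen |= days
    return ambiguous


# Bus 50 continues to bus 15 on the first four days and to bus 20 on the fifth
bus_50 = {'bus_15': 0b01111, 'bus_20': 0b10000}
overlap = find_ambiguous_days(bus_50)

# If bus 20 also claimed the fourth day, that day would be ambiguous
conflicting = find_ambiguous_days({'bus_15': 0b01111, 'bus_20': 0b11000})
```

A result of zero means every service day maps to at most one continuation.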

[`simplify_export.py`](#) converts the continuation graph back to a series of transfers, reusing the feed's existing `trip_id`s and `service_id`s when an exact match can be found, or creating new entities if required. This step will preserve trip-to-trip transfers that don't represent vehicle continuations (e.g. [`transfer_type=2`](https://github.com/google/transit/blob/master/gtfs/spec/en/reference.md#transferstxt) used to estimate walk time between two vehicles).

## Heuristics

An in-seat transfer is likely if:

* Riders only need to wait a short time between trips.
* The next trip begins at the same stop as the preceding trip ended, or the two stops are very close to each other.
* The next trip goes to a different destination than the preceding trip, or the two trips serve a loop route.


Riders probably won't be able to, or won't want to, stay on board if:

* The wait time aboard the bus is quite long.
* The next trip is very similar to the preceding trip, but in reverse. We assess similarity by comparing the sequence of stop locations of the two trips using a modified [Hausdorff metric](https://en.wikipedia.org/wiki/Hausdorff_distance).

You can adjust thresholds or entirely disable a heuristic in [`blocks_to_transfers/config.py`](#).
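
The similarity test can be approximated as follows: check whether a given fraction of one trip's stops lie within a given distance of the other trip's stops. This is a minimal sketch, not the code in `shape_similarity.py`. The function names, the haversine distance, and the choice to apply the check in both directions are assumptions; the defaults of 0.8 and 500 m mirror `similarity_percentile` and `similarity_distance` in `config.py`.

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def fraction_of_stops_nearby(stops, other_stops, max_distance_m):
    """Fraction of stops that have at least one stop of the other trip nearby."""
    near = sum(1 for stop in stops
               if min(haversine_m(stop, other) for other in other_stops) <= max_distance_m)
    return near / len(stops)


def trips_similar(stops_a, stops_b, percentile=0.8, max_distance_m=500):
    """Assumed symmetric variant: both trips must mostly lie near each other."""
    return (fraction_of_stops_nearby(stops_a, stops_b, max_distance_m) >= percentile
            and fraction_of_stops_nearby(stops_b, stops_a, max_distance_m) >= percentile)


outbound = [(45.50, -73.57), (45.51, -73.57), (45.52, -73.57)]
# Same corridor in reverse, with stops on the opposite side of the street
inbound = [(45.52, -73.571), (45.51, -73.571), (45.50, -73.571)]
# A trip that heads off in a different direction
crosstown = [(45.50, -73.50), (45.50, -73.40), (45.50, -73.30)]

is_return_trip = trips_similar(outbound, inbound)
is_unrelated = not trips_similar(outbound, crosstown)
```

Using a percentile instead of the maximum distance makes the comparison tolerant of a few stops that exist in only one direction.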


## Advanced

* `simplify_linear.py`: You probably don't want to enable this option, unless your system happens to have the same constraints described in this section. If enabled, trips will be split so that each trip has at most one incoming continuation, and at most one outgoing continuation. Where cycles exist (e.g. an automated people mover that serves trip 1 -> trip 2 -> trip 1 every day until the end of the feed), back edges are removed. Trips that decouple into multiple vehicles, or that are formed through the coupling of multiple vehicles are preserved as is.
* Test cases can be found in the `tests/` directory.
* This program will run much faster using [PyPy](https://www.pypy.org), a jitted interpreter for Python.
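
The removal of back edges mentioned for `simplify_linear.py` can be illustrated with a depth-first search over a continuation graph. This is a sketch under simplifying assumptions, not the project's implementation: the graph is a plain dict from trip ID to successor trip IDs, every successor also appears as a key, and `remove_back_edges` is a hypothetical name. Which edge of a cycle is treated as the back edge depends on where the search starts.

```python
def remove_back_edges(graph):
    """Return a copy of graph (node -> list of successors) without DFS back edges."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}
    acyclic = {node: [] for node in graph}

    def visit(node):
        state[node] = IN_PROGRESS
        for succ in graph[node]:
            if state[succ] == IN_PROGRESS:
                continue  # back edge: succ is an ancestor on the current path
            acyclic[node].append(succ)
            if state[succ] == UNVISITED:
                visit(succ)
        state[node] = DONE

    for node in graph:
        if state[node] == UNVISITED:
            visit(node)
    return acyclic


# A people mover that alternates trip 1 -> trip 2 -> trip 1
people_mover = {'trip_1': ['trip_2'], 'trip_2': ['trip_1']}
broken_cycle = remove_back_edges(people_mover)
```

The recursive search is fine for small examples; a large feed would likely need an iterative traversal to avoid Python's recursion limit.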
64 changes: 36 additions & 28 deletions blocks_to_transfers/__main__.py
@@ -1,41 +1,49 @@
import argparse
import ctypes
import math
import timeit
import os
import shutil
from . import convert_blocks, editor, service_days, classify_transfers, simplify_graph, simplify_linear, simplify_export

from blocks_to_transfers.shape_similarity import LatLon, hausdorff
from . import editor, augment

def process(in_dir, out_dir, use_simplify_linear=False, remove_existing_files=False):
gtfs = editor.load(in_dir)

def main():
gtfs = editor.load('/Users/np/GTFSs/BCTWK_734/211_cleaned')
gtfs = augment.augment(gtfs)

shape_lats = {shape_id: [LatLon(pt.shape_pt_lat, pt.shape_pt_lon) for pt in pts] for shape_id, pts in gtfs.shapes.items()}
services = service_days.ServiceDays(gtfs)
converted_transfers = convert_blocks.convert(gtfs, services)
classify_transfers.classify(gtfs, converted_transfers)

graph = simplify_graph.simplify(gtfs, services, converted_transfers)

"""
for a_id, a_pt in cleaned_shapes.items():
for b_id, b_pt in cleaned_shapes.items():
print(a_id, b_id, hausdorff(a_pt, b_pt))
"""
if use_simplify_linear:
output_graph = simplify_linear.simplify(graph)
else:
output_graph = graph
simplify_export.export_visit(output_graph)

haus_cache = {}
for block, trips in gtfs.trips_by_block.items():
for i_trip, trip in enumerate(trips):
for trip2 in trips[i_trip+1:]:
if not trip.data.shape_id or not trip2.data.shape_id:
continue
if remove_existing_files:
shutil.rmtree(out_dir, ignore_errors=True)

key = (trip.data.shape_id, trip2.data.shape_id)
rkey = (trip2.data.shape_id, trip.data.shape_id)
if key not in haus_cache and rkey not in haus_cache:
haus_cache[key] = hausdorff(shape_lats[trip.data.shape_id], shape_lats[trip2.data.shape_id])
print('eval', key, haus_cache[key])
editor.patch(gtfs, gtfs_in_dir=in_dir, gtfs_out_dir=out_dir)
print('Done.')


#editor.patch(gtfs, gtfs_in_dir='/Users/np/GTFSs/BCTWK_734/211_cleaned', gtfs_out_dir='mimi')
x = 5
def main():
cmd = argparse.ArgumentParser(description='Predicts trip-to-trip transfers from block_ids in GTFS feeds')
cmd.add_argument('feed', help='Path to a directory containing a GTFS feed')
cmd.add_argument('out_dir', help='Directory to contain the modified feed')
cmd.add_argument('-L','--linear', action='store_true', help='Apply linear simplification')
cmd.add_argument('--remove-existing-files', action='store_true', help='Remove all files in the output directory before exporting')
args = cmd.parse_args()

if os.environ.get('VSCODE_DEBUG'):
import debugpy
print('Waiting for VSCode to attach')
debugpy.listen(5678)
debugpy.wait_for_client()


process(args.feed, args.out_dir,
use_simplify_linear=args.linear,
remove_existing_files=args.remove_existing_files)


if __name__ == '__main__':
84 changes: 0 additions & 84 deletions blocks_to_transfers/augment.py

This file was deleted.

56 changes: 56 additions & 0 deletions blocks_to_transfers/classify_transfers.py
@@ -0,0 +1,56 @@
"""
For each continuation identified by converting blocks, use heuristics to
predict whether a transfer is most likely to be of type:

4: In-seat transfer
5: Vehicle continuation only (for operational reasons)
"""
from .editor.schema import DAY_SEC, TransferType
from . import config, shape_similarity


def classify(gtfs, transfers):
    print('Predicting transfer_type for each identified continuation')
    unique_shapes = {} # Used to merge identical sequences of stop_times from different trips
    shape_similarity_results = {} # Used to cache Hausdorff metric calculations
    for transfer in transfers:
        transfer.transfer_type = get_transfer_type(gtfs, unique_shapes, shape_similarity_results, transfer)

def get_transfer_type(gtfs, unique_shapes, shape_similarity_results, transfer):
    trip = gtfs.trips[transfer.from_trip_id]
    cont_trip = gtfs.trips[transfer.to_trip_id]

    wait_time = cont_trip.first_departure - trip.last_arrival
    if cont_trip.first_departure < trip.last_arrival:
        wait_time += DAY_SEC

    # transfer would require riders to wait for an excessively long time
    if wait_time > config.InSeatTransfers.max_wait_time:
        return TransferType.VEHICLE_CONTINUATION

    # cont_trip resumes too far away from where trip ended (probably involves deadheading)
    if trip.last_point.distance_to(cont_trip.first_point) > config.InSeatTransfers.same_location_distance:
        return TransferType.VEHICLE_CONTINUATION

    # trip and cont_trip form a full loop, therefore riders may want to stay
    # onboard despite similarity in shape.
    if (trip.first_point.distance_to(cont_trip.first_point) < config.InSeatTransfers.same_location_distance
        and trip.last_point.distance_to(cont_trip.last_point) < config.InSeatTransfers.same_location_distance):
        return TransferType.IN_SEAT

    if config.InSeatTransfers.ignore_return_via_same_route:
        if trip.route_id == cont_trip.route_id and trip.direction_id != cont_trip.direction_id:
            return TransferType.VEHICLE_CONTINUATION

    if config.InSeatTransfers.ignore_return_via_similar_trip:
        if not hasattr(trip, 'shape_ref'):
            trip.shape_ref = unique_shapes.setdefault(trip.stop_shape, trip.stop_shape)

        if not hasattr(cont_trip, 'shape_ref'):
            cont_trip.shape_ref = unique_shapes.setdefault(cont_trip.stop_shape, cont_trip.stop_shape)

        if shape_similarity.trip_shapes_similar(shape_similarity_results, trip.shape_ref, cont_trip.shape_ref):
            return TransferType.VEHICLE_CONTINUATION

    # We presume that the rider will be able to stay onboard the vehicle
    return TransferType.IN_SEAT
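
The wait-time calculation at the top of `get_transfer_type` wraps around midnight when the continuation departs earlier in the clock day than the preceding trip arrives. A standalone sketch of that arithmetic follows; it assumes `DAY_SEC` is 86400 (its definition in `editor.schema` is not shown in this diff), and `parse_gtfs_time` and `wait_time` are hypothetical helpers.

```python
DAY_SEC = 86400  # assumed value: seconds in one day


def parse_gtfs_time(value):
    """Convert an HH:MM:SS string to seconds; hours may be 24 or more."""
    hours, minutes, seconds = (int(part) for part in value.split(':'))
    return hours * 3600 + minutes * 60 + seconds


def wait_time(last_arrival, first_departure):
    """Seconds between an arrival and the next departure, wrapping past midnight."""
    wait = first_departure - last_arrival
    if first_departure < last_arrival:
        wait += DAY_SEC
    return wait


# A trip arriving at 23:50:00 followed by a continuation leaving at 00:05:00
overnight_wait = wait_time(parse_gtfs_time('23:50:00'), parse_gtfs_time('00:05:00'))
same_day_wait = wait_time(parse_gtfs_time('08:00:00'), parse_gtfs_time('08:10:00'))
```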
47 changes: 38 additions & 9 deletions blocks_to_transfers/config.py
@@ -1,9 +1,38 @@
config = {
    # Limits to be detected as transfer_type=5 (vehicle continues; passenger may not remain on board)
    'vehicle_cont': {
        'max_wait_time_in_seconds': 900,
        'max_distance_between_end_stops_in_meters': 1000,
        'max_similarity_between_shapes_in_percent': 80,
        'max_distance_between_inner_stops': 100
    }
}
# Controls whether two trips in a block will be interpreted as trip-to-trip transfers, or ignored
class TripToTripTransfers:
    # Maximum layover between a trip and its continuation
    max_wait_time = 1200 # seconds

    # If a block is invalid because it cannot be operated using a single vehicle, because a later trip departs before
    # the previous trip has completed, should the algorithm still attempt to find a plausible continuation trip?
    force_allow_invalid_blocks = False

    # If true, existing trip-to-trip transfers will be overwritten with predicted continuations from the algorithm
Member: this is just for trips that are detected, right? or are we overwriting all transfers with trip_ids defined?

Contributor Author: This feature doesn't really work properly, but the intention is that if you already have both block IDs and trip-to-trip transfers in your feed, it should avoid trashing your existing transfers.

    overwrite_existing = False


# Controls whether an identified continuation is marked as an in-seat transfer, where riders are permitted to stay
# onboard.
class InSeatTransfers:
    # Maximum wait time for riders aboard the vehicle. May be -1 if this agency never allows in-seat transfers.
    max_wait_time = 600 # seconds

    # Determines whether two stops are sufficiently close to be considered 'at the same location'.
    # Used to:
    # - Discard in-seat transfers where the last stop of previous trip and first stop of ensuing trip are further apart than this distance
    # - Calculate whether or not a trip is a return trip of the previous trip
    same_location_distance = 100 # meters

    # If true, ignore all trips serving the same route in the opposite direction
    ignore_return_via_same_route = False

    # If true, ignore all trips which appear to return along a similar path (determined by sequence of stop locations),
    # regardless of whether or not they are served by the same route
    ignore_return_via_similar_trip = True

    # Similarity of trips is predicted using a modified Hausdorff metric: are {similarity_percentile}% of the stops of
    # one trip within {similarity_distance} m of the other trip?
    #
    # The provided constants work best in urban areas, but are far from perfect even there.
    similarity_percentile = .8 # / 1.0
    similarity_distance = 500 # meters