First revision #2

Merged
merged 52 commits into from
Mar 14, 2022

Conversation

npaun
Member

@npaun npaun commented Mar 1, 2022

1600 lines of Python later, here it is! I've copied and pasted the README below as it gives the clearest explanation of how this all works.

Still TODO:

  • Add more tests, and automate them with pytest.
  • Integrate into USD and bgtfs compressor.
  • Overwrite existing transfers doesn't work correctly yet.

gtfs-blocks-to-transfers

Converts GTFS blocks, defined by setting trip.block_id, into a series of trip-to-trip transfers (proposal). Uses configurable heuristics to predict whether two trips are connected as in-seat transfers or as vehicle continuations only. This tool also validates predefined trip-to-trip transfers in transfers.txt.

Usage: ./convert.py <input feed> <directory for output>

How it works

Throughout this tool, sets of service days are used to relate trips. They are defined in service_days.py, and are represented as a bitmap per service_id, with bit n set to 1 if that service operates on the nth day since the beginning of the feed. The term "trip's service days" refers to the service days for trip.service_id. If the first departure of a trip is after 24:00:00, the service days are stored as if the trip began the next day between 00:00:00 and 23:59:59.
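As a rough illustration of the bitmap idea, here is a minimal sketch. It is not the code in service_days.py: the function names and the use of a plain Python int as the bitmap are assumptions made for this example.

```python
from datetime import date

SECONDS_PER_DAY = 24 * 60 * 60


def service_days_bitmap(feed_start: date, active_dates: list[date]) -> int:
    """Return an int whose bit n is 1 if service runs n days after feed_start.

    Hypothetical helper: the real representation may differ.
    """
    bitmap = 0
    for day in active_dates:
        offset = (day - feed_start).days
        if offset < 0:
            raise ValueError(f"{day} is before the start of the feed")
        bitmap |= 1 << offset
    return bitmap


def shift_for_late_departure(bitmap: int, first_departure_seconds: int) -> int:
    """Store the days as if the trip began on the calendar day it really departs.

    A first departure at 25:00:00 (90000 seconds) moves every service day
    forward by one bit position.
    """
    days_later = first_departure_seconds // SECONDS_PER_DAY
    return bitmap << days_later


feed_start = date(2022, 3, 1)
weekday_service = service_days_bitmap(feed_start, [date(2022, 3, 1), date(2022, 3, 3)])
late_night_trip = shift_for_late_departure(weekday_service, 25 * 3600)

# Two sets of service days share a day when their bitwise AND is non-zero.
shares_a_day = (weekday_service & late_night_trip) != 0
```

With this encoding, intersecting, subtracting, or testing overlap between two sets of service days is a single bitwise operation.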

For each block defined in the feed, convert_blocks.py finds the most likely continuations for each trip, starting the search after the final arrival time of the trip. The program searches for a matching continuation for all of the trip's service days, greedily selecting continuation trips in order of wait time. Some days may remain unmatched if a configurable threshold is exceeded (config.TripToTripTransfers.max_wait_time). classify_transfers.py uses heuristics to assign transfer_type=4 (in-seat transfer) or transfer_type=5 (vehicle continuation only) to each continuation.
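The greedy search can be sketched roughly as follows. This is a simplified illustration, not the logic in convert_blocks.py: the Trip class and find_continuations function are hypothetical, times are seconds since midnight, and trips that run past midnight are ignored.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    trip_id: str
    first_departure: int  # seconds since midnight
    last_arrival: int     # seconds since midnight
    service_days: int     # bitmap, bit n = nth day of the feed


def find_continuations(trip: Trip, block_trips: list[Trip], max_wait_time: int):
    """Greedily match each service day of `trip` to the earliest later trip.

    Returns (matches, unmatched), where matches is a list of
    (to_trip_id, days_bitmap) pairs and unmatched is the bitmap of days
    for which no continuation was found within max_wait_time.
    """
    unmatched = trip.service_days
    candidates = sorted(
        (t for t in block_trips
         if t is not trip and t.first_departure >= trip.last_arrival),
        key=lambda t: t.first_departure,
    )
    matches = []
    for candidate in candidates:
        wait = candidate.first_departure - trip.last_arrival
        if unmatched == 0 or wait > max_wait_time:
            break  # every day is matched, or the remaining waits are too long
        common = unmatched & candidate.service_days
        if common:
            matches.append((candidate.trip_id, common))
            unmatched &= ~common  # these days now have a continuation
    return matches, unmatched


a = Trip("A", 32400, 36000, 0b11111)
b = Trip("B", 36300, 39000, 0b01111)
c = Trip("C", 36600, 39000, 0b10000)
d = Trip("D", 36900, 39000, 0b11111)
all_matches, all_unmatched = find_continuations(a, [a, b, c, d], max_wait_time=1800)
```

Because candidates are visited in order of wait time, a day is always claimed by the shortest available wait, and later candidates only receive the days still left over.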

Generated transfers are combined with predefined transfers from transfers.txt in simplify_graph.py. If necessary, this step will split trips so that, for any given from_trip_id, each potential to_trip_id operates on a disjoint set of service days. For example, bus 50 could continue to bus 15 on Monday through Thursday, but continue to bus 20 on Fridays. Both generated and predefined transfers are validated to ensure they are unambiguous and conform to the specification.
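The disjointness requirement can be checked with a few bitwise operations. The sketch below is only an illustration of that check, assuming service days are held as integer bitmaps; find_ambiguous_days is a hypothetical name, not a function from simplify_graph.py.

```python
def find_ambiguous_days(continuations: dict[str, int]) -> int:
    """Return the bitmap of days claimed by more than one to_trip_id.

    `continuations` maps each to_trip_id of a single from_trip_id to the
    bitmap of service days on which that continuation applies.
    A result of 0 means the continuations are unambiguous.
    """
    seen = 0
    overlap = 0
    for days in continuations.values():
        overlap |= seen & days  # days already claimed by an earlier to_trip_id
        seen |= days
    return overlap


# Bits 0-3 are Monday to Thursday, bit 4 is Friday (one-week feed for brevity).
bus_50 = {"bus_15": 0b01111, "bus_20": 0b10000}
ambiguous = find_ambiguous_days(bus_50)

# If bus 20 also claimed Thursday, that day would be ambiguous.
conflicting = find_ambiguous_days({"bus_15": 0b01111, "bus_20": 0b11000})
```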

simplify_export.py converts the continuation graph back to a series of transfers, reusing the feed's existing trip_ids and service_ids when an exact match can be found, or creating new entities if required. This step will preserve trip-to-trip transfers that don't represent vehicle continuations (e.g. transfer_type=2 used to estimate walk time between two vehicles).

Heuristics

An in-seat transfer is likely if:

  • Riders only need to wait a short time between trips.
  • The next trip begins at the same stop as the preceding trip ended, or the two stops are very close to each other.
  • The next trip goes to a different destination than the preceding trip, or the two trips serve a loop route.

Riders probably won't be able to, or won't want to, stay on board if:

  • The wait time aboard the bus is quite long.
  • The next trip is very similar to the preceding trip, but in reverse. We assess similarity by comparing the sequence of stop locations of the two trips using a modified Hausdorff metric.
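For readers unfamiliar with the metric, the sketch below computes the standard symmetric Hausdorff distance between two non-empty sequences of stop locations. It is not the modified version used by this tool, whose details are not described here. It assumes the stops have already been projected to planar coordinates in metres, and the function names are made up for the example.

```python
from math import hypot

Point = tuple[float, float]  # (x, y) in metres, assumed already projected


def directed_hausdorff(a: list[Point], b: list[Point]) -> float:
    """Largest distance from any point in a to its nearest point in b."""
    return max(
        min(hypot(ax - bx, ay - by) for bx, by in b)
        for ax, ay in a
    )


def hausdorff(a: list[Point], b: list[Point]) -> float:
    """Symmetric Hausdorff distance between two non-empty point lists."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


outbound = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]
# The same street served in the opposite direction, stops across the road.
inbound = [(200.0, 10.0), (100.0, 10.0), (0.0, 10.0)]
unrelated = [(0.0, 0.0), (0.0, 500.0)]

reverse_distance = hausdorff(outbound, inbound)
unrelated_distance = hausdorff(outbound, unrelated)
```

The plain metric ignores the order of the points, so a trip and its reverse along the same street score as very close, while a trip heading elsewhere scores as far apart.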

You can adjust thresholds or entirely disable a heuristic in blocks_to_transfers/config.py.

Advanced

  • simplify_linear.py: You probably don't want to enable this option, unless your system happens to have the same constraints described in this section. If enabled, trips will be split so that each trip has at most one incoming continuation, and at most one outgoing continuation. Where cycles exist (e.g. an automated people mover that serves trip 1 -> trip 2 -> trip 1 every day until the end of the feed), back edges are removed. Trips that decouple into multiple vehicles, or that are formed through the coupling of multiple vehicles are preserved as is.
  • Test cases can be found in the tests/ directory.
  • This program will run much faster using PyPy, a Python implementation with a JIT compiler.
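The back-edge removal mentioned for simplify_linear.py can be pictured with a depth-first search. The following is a generic sketch of that technique rather than the tool's implementation: remove_back_edges is a hypothetical name, the graph is a plain dict of trip_id to successor trip_ids, and recursion depth would need attention for very long chains.

```python
def remove_back_edges(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return a copy of `graph` without the edges that close a cycle.

    An edge is a back edge when it points to a node that is still on the
    current depth-first search path.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    state = {node: WHITE for node in nodes}
    acyclic: dict[str, list[str]] = {node: [] for node in graph}

    def visit(node: str) -> None:
        state[node] = GRAY
        for nxt in graph.get(node, []):
            if state[nxt] == GRAY:
                continue  # back edge: following it would close a cycle
            acyclic.setdefault(node, []).append(nxt)
            if state[nxt] == WHITE:
                visit(nxt)
        state[node] = BLACK

    for node in graph:
        if state[node] == WHITE:
            visit(node)
    return acyclic


# A people mover that alternates trip_1 -> trip_2 -> trip_1 indefinitely.
people_mover = {"trip_1": ["trip_2"], "trip_2": ["trip_1"]}
linear = remove_back_edges(people_mover)
```

Which edge of a cycle counts as the back edge depends on where the search starts; here the search begins at trip_1, so the edge from trip_2 back to trip_1 is the one dropped.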

force_allow_invalid_blocks = False

# If true, existing trip-to-trip transfers will be overwritten with predicted continuations from the algorithm
Member

this is just for trips that are detected, right? or are we overwriting all transfers with trip_ids defined?

Member Author

This feature doesn't really work properly, but the intention is that if you already have both block IDs and trip-to-trip transfers in your feed, it should avoid trashing your existing transfers.

@jsteelz
Member

jsteelz commented Mar 1, 2022

Would it be worth having the gtfs schema/loader and shape calculations as separate repos or libraries, since:

  • we have or probably will reuse them elsewhere
  • for shape calculation, maybe agencies have a better idea of what their similarity heuristics should look like?

@npaun
Member Author

npaun commented Mar 1, 2022

  • schema and loader are already in a separate sub-package called 'editor/', so we'll be able to move it into a new package as soon as the need arises.
  • shape calculation: I doubt we'll ever need it elsewhere but I could be wrong.

README.md (resolved)

@JMilot1 JMilot1 left a comment

LGTM overall. The code is clear and understandable.

I noticed that the only abbreviations used are cont_ and dist_. I'd try to avoid them for consistency with the rest (but this is a very minor nitpick).

blocks_to_transfers/classify_transfers.py (resolved)
blocks_to_transfers/classify_transfers.py (outdated, resolved)
blocks_to_transfers/convert_blocks.py (resolved)
blocks_to_transfers/editor/__init__.py (resolved)
blocks_to_transfers/editor/__init__.py (outdated, resolved)
blocks_to_transfers/editor/types.py (outdated, resolved)
blocks_to_transfers/simplify_linear.py (outdated, resolved)

@JMilot1 JMilot1 left a comment

LGTM once comments are addressed

Member

@jsteelz jsteelz left a comment

For added tests, shall I open a separate PR?

@npaun
Member Author

npaun commented Mar 10, 2022

@jsteelz Yes, please do them in a separate PR.

@npaun npaun merged commit 5226e00 into master Mar 14, 2022
@npaun npaun deleted the npaun/work2 branch March 14, 2022 18:45

3 participants