Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix: Preserve complete trips by resolving trip_ids from filters #66

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

invisiblefunnel
Copy link
Contributor

@invisiblefunnel invisiblefunnel commented Nov 17, 2020

Pre-1.0 versions of partridge preserved relationally-complete trips by resolving each filter clause to a set of reachable trip_ids. This behavior was mistakenly changed in the 1.0 refactor. You could be affected if you use partridge to filter by stop_times.txt or an ancestor of stop_times.txt in the dependency graph. This use case seems uncommon in practice.

To reproduce the issue, filter stops.txt by some field and observe that only stops matching the filter are kept, potentially leaving trips with missing stops. See the test case for details.

TODO

  • review
  • merge
  • release

    Unfortunately, the refactor for v1.0 introduced a bug where
    filtering by a file other than trips.txt could cause
    the view to be incomplete.

view = {"stops.txt": {"stop_id": full_feed.stops.stop_id[0]}}
feed = ptg.load_feed(fixture("trimet-vermont-2018-02-06"), view=view)
assert feed.stops.stop_id.shape[0] == 72
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one stop is present without the fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I fully understand, because if the view specifies a single stop_id, wouldn't it be expected that only that stop is present in the output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Stuart! This is a bit confusing. The reason is because partridge should always preserve the referential integrity of trips. If we consider trips the "atomic unit" of GTFS, then filtering by stop_id means "show me all the trips that use this stop". To keep trips whole, the resulting feed will have all stops, stop_times, etc. associated with those trips.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that makes sense. Thanks Danny! :)

@invisiblefunnel
Copy link
Contributor Author

@dget this is also good to merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants