-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug fix: Preserve complete trips by resolving trip_ids from filters #66
base: master
Are you sure you want to change the base?
Conversation
Unfortunately, the refactor for v1.0 introduced a bug where filtering by a file other than trips.txt could cause the view to be incomplete.
|
||
view = {"stops.txt": {"stop_id": full_feed.stops.stop_id[0]}} | ||
feed = ptg.load_feed(fixture("trimet-vermont-2018-02-06"), view=view) | ||
assert feed.stops.stop_id.shape[0] == 72 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one stop is present without the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I fully understand, because if the view
specifies a single stop_id
, wouldn't it be expected that only that stop is present in the output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Stuart! This is a bit confusing. The reason is because partridge should always preserve the referential integrity of trips. If we consider trips the "atomic unit" of GTFS, then filtering by stop_id
means "show me all the trips that use this stop". To keep trips whole, the resulting feed will have all stops, stop_times, etc. associated with those trips.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that makes sense. Thanks Danny! :)
@dget this is also good to merge! |
Pre-1.0 versions of partridge preserved relationally-complete trips by resolving each filter clause to a set of reachable trip_ids. This behavior was mistakenly changed in the 1.0 refactor. You could be affected if you use partridge to filter by stop_times.txt or an ancestor of stop_times.txt in the dependency graph. This use case seems uncommon in practice.
To reproduce the issue, filter stops.txt by some field and observe that only stops matching the filter are kept, potentially leaving trips with missing stops. See the test case for details.
TODO