Bug fix: Preserve complete trips by resolving trip_ids from filters #66

Open
wants to merge 1 commit into
base: master
14 changes: 9 additions & 5 deletions partridge/readers.py
@@ -106,12 +106,16 @@ def finalize() -> None:
 def _load_feed(path: str, view: View, config: nx.DiGraph) -> Feed:
     """Multi-file feed filtering"""
     config_ = remove_node_attributes(config, ["converters", "transformations"])
-    feed_ = Feed(path, view={}, config=config_)
+    trip_ids = set(Feed(path, config=config_).trips.trip_id)
     for filename, column_filters in view.items():
-        config_ = reroot_graph(config_, filename)
-        view_ = {filename: column_filters}
-        feed_ = Feed(feed_, view=view_, config=config_)
-    return Feed(feed_, config=config)
+        trip_ids &= set(
+            Feed(
+                path,
+                view={filename: column_filters},
+                config=reroot_graph(config_, filename),
+            ).trips.trip_id
+        )
+    return Feed(path, view={"trips.txt": {"trip_id": trip_ids}}, config=config)


 def _busiest_date(feed: Feed) -> Tuple[datetime.date, FrozenSet[str]]:
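
The change to `_load_feed` above can be read as "resolve every filter to a set of trip_ids, intersect those sets, then load the feed once with the surviving trip_ids". What follows is a minimal sketch of that idea on toy in-memory tables, not partridge's implementation: the data, `TABLES`, `trips_matching`, and `resolve_trip_ids` are all hypothetical names introduced for illustration, and each table is assumed to carry a `trip_id` column directly.

```python
# Toy stand-ins for trips.txt and stop_times.txt (hypothetical data, not a real feed).
TRIPS = [
    {"trip_id": "t1", "route_id": "r1"},
    {"trip_id": "t2", "route_id": "r1"},
    {"trip_id": "t3", "route_id": "r2"},
]
STOP_TIMES = [
    {"trip_id": "t1", "stop_id": "A"},
    {"trip_id": "t1", "stop_id": "B"},
    {"trip_id": "t2", "stop_id": "B"},
    {"trip_id": "t2", "stop_id": "C"},
    {"trip_id": "t3", "stop_id": "A"},
    {"trip_id": "t3", "stop_id": "D"},
]
TABLES = {"trips.txt": TRIPS, "stop_times.txt": STOP_TIMES}


def trips_matching(filename, column_filters):
    """Return the trip_ids of rows in one table that satisfy every column filter."""
    return {
        row["trip_id"]
        for row in TABLES[filename]
        if all(row[col] in allowed for col, allowed in column_filters.items())
    }


def resolve_trip_ids(view):
    """Intersect the trip_ids selected by each per-file filter."""
    # Start from every trip, then narrow the set one filter at a time.
    trip_ids = {row["trip_id"] for row in TRIPS}
    for filename, column_filters in view.items():
        trip_ids &= trips_matching(filename, column_filters)
    return trip_ids


view = {
    "trips.txt": {"route_id": {"r1"}},
    "stop_times.txt": {"stop_id": {"B", "D"}},
}
selected = resolve_trip_ids(view)
print(sorted(selected))  # ['t1', 't2']
```

In this sketch an empty view leaves the full set of trips untouched, and each additional filter can only shrink the selection, which is the behavior the `&=` in the diff suggests.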
9 changes: 9 additions & 0 deletions tests/test_readers.py
@@ -14,6 +14,15 @@ def test_load_feed():
     assert feed.stop_times.dtypes["arrival_time"] == np.float64


+def test_load_feed_with_view():
+    full_feed = ptg.load_feed(fixture("trimet-vermont-2018-02-06"))
+    assert full_feed.stops.shape[0] == 102
+
+    view = {"stops.txt": {"stop_id": full_feed.stops.stop_id[0]}}
+    feed = ptg.load_feed(fixture("trimet-vermont-2018-02-06"), view=view)
+    assert feed.stops.stop_id.shape[0] == 72
Contributor Author

Only one stop is present without the fix.

Contributor

I don't think I fully understand, because if the view specifies a single stop_id, wouldn't it be expected that only that stop is present in the output?

Contributor Author

Hi Stuart! This is a bit confusing. The reason is that partridge should always preserve the referential integrity of trips. If we consider trips the "atomic unit" of GTFS, then filtering by stop_id means "show me all the trips that use this stop". To keep trips whole, the resulting feed will have all stops, stop_times, etc. associated with those trips.

Contributor

Oh, that makes sense. Thanks Danny! :)
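
The "trips are the atomic unit" point in the thread above can be illustrated with a small sketch. It assumes hypothetical `(trip_id, stop_id)` pairs standing in for stop_times.txt and a hypothetical helper `filter_by_stop`; it is not how partridge is implemented, only a toy version of the semantics described.

```python
# Hypothetical stop_times rows as (trip_id, stop_id) pairs; not taken from the TriMet fixture.
stop_times = [
    ("t1", "A"), ("t1", "B"),
    ("t2", "B"), ("t2", "C"),
    ("t3", "A"), ("t3", "D"),
]


def filter_by_stop(stop_times, stop_id):
    """Keep every stop_time of each trip that serves stop_id."""
    # Step 1: find the trips that use the requested stop.
    trip_ids = {trip for trip, stop in stop_times if stop == stop_id}
    # Step 2: keep those trips whole, including their other stops.
    return [(trip, stop) for trip, stop in stop_times if trip in trip_ids]


kept = filter_by_stop(stop_times, "A")
kept_stops = sorted({stop for _, stop in kept})
print(kept_stops)  # ['A', 'B', 'D']
```

Filtering on stop "A" keeps trips t1 and t3 in full, so stops "B" and "D" remain in the result as well. This is the same reason the test above expects more than one stop after filtering on a single stop_id.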



 def test_load_geo_feed():
     gpd = pytest.importorskip("geopandas")
