Hi there,
for PT networks with hundreds of thousands to millions of links, `quetzal`'s integrity check functions `integrity_test_sequences()` and `integrity_test_circular_lines()` take an extremely long time (I had to interrupt the last test, on 2 million links, after one day). This is why I suggest some faster logic.
The sequence test only checks the length of the trip, so it might overlook situations like 1-->2-->2-->4, but those are unlikely (they do not occur in my GTFS feeds). The circular lines test should account for any case where duplicate stops occur within one trip.
On the other hand, the fix methods are a bit too hasty: they drop all affected trips. I would suggest a more thorough fix that splits up the trip_ids, knowing that this introduces additional interchanges. That does not reflect reality, but it is better than dropping trips when their number is considerable.
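The two quick checks described above could be sketched roughly as follows. This is only a minimal sketch, not quetzal's implementation: the function names `quick_check_sequences` and `quick_check_circular` are hypothetical, and it assumes a links table with the columns `trip_id`, `a`, `b` and `link_sequence`, where `link_sequence` is expected to run from 1 to the number of links in the trip.

```python
import pandas as pd

def quick_check_sequences(links):
    """Return trip_ids whose link count differs from their highest link_sequence."""
    stats = links.groupby('trip_id')['link_sequence'].agg(['size', 'max'])
    return stats.index[stats['size'] != stats['max']].tolist()

def quick_check_circular(links):
    """Return trip_ids in which a stop occurs more than once as 'a' or as 'b'."""
    dup_a = links.duplicated(subset=['trip_id', 'a'], keep=False)
    dup_b = links.duplicated(subset=['trip_id', 'b'], keep=False)
    return sorted(links.loc[dup_a | dup_b, 'trip_id'].unique())

# Toy data (made up for illustration): t2 skips sequence 2, t3 returns to stop 2
links = pd.DataFrame({
    'trip_id':       ['t1', 't1', 't1', 't2', 't2', 't3', 't3', 't3'],
    'a':             [1, 2, 3, 1, 2, 1, 2, 3],
    'b':             [2, 3, 4, 2, 4, 2, 3, 2],
    'link_sequence': [1, 2, 3, 1, 3, 1, 2, 3],
})
broken_trips = quick_check_sequences(links)
circular_trips = quick_check_circular(links)
```

Both checks are single grouped or vectorized passes over the table, so they avoid calling a Python function once per trip.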
A suggestion for trip sequences:

    def fix_sequences(trip):
        if len(trip) > 1:
            trip = trip.sort_values('link_sequence')
            # Check link succession: each link must start where the previous one ends
            ind = list(trip.index)
            for i in range(len(ind) - 1):
                if trip.loc[ind[i], 'b'] != trip.loc[ind[i+1], 'a']:
                    # Broken trip: stop 'b' has no successor link,
                    # so move all following links to a new trip_id
                    trip.loc[ind[i+1]:ind[-1], 'trip_id'] = \
                        trip.loc[ind[i+1]:ind[-1], 'trip_id'] + '_' + str(i)
            # Repair sequences: renumber from 1 within each resulting trip_id
            trip['link_sequence'] = trip.groupby('trip_id').cumcount() + 1
        return trip

    self.links = self.links.groupby('trip_id').apply(fix_sequences).reset_index(level=0, drop=True)

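
Since speed is the concern here, the same splitting idea might also be written without a Python loop per trip. The following is a rough sketch under assumptions, not a tested replacement: `fix_sequences_vectorized` is a hypothetical name, the suffix is the running segment number rather than the link position, and the links table is assumed to have a unique index and no missing stop ids.

```python
import pandas as pd

def fix_sequences_vectorized(links):
    """Split trips at broken link successions and renumber link_sequence."""
    links = links.sort_values(['trip_id', 'link_sequence']).copy()
    # Destination stop of the previous link within the same trip
    prev_b = links.groupby('trip_id')['b'].shift()
    # A link is broken if it does not start where the previous link ended;
    # the first link of each trip has no predecessor and is never broken
    broken = prev_b.notna() & (links['a'] != prev_b)
    # Running number of breaks seen so far within each trip
    segment = broken.astype(int).groupby(links['trip_id']).cumsum()
    split = segment > 0
    links.loc[split, 'trip_id'] = (
        links.loc[split, 'trip_id'].astype(str) + '_' + segment[split].astype(str)
    )
    # Renumber links from 1 within each resulting trip_id
    links['link_sequence'] = links.groupby('trip_id').cumcount() + 1
    return links

# Toy data (made up for illustration): t1 has a gap between stops 3 and 5
links = pd.DataFrame({
    'trip_id':       ['t1', 't1', 't1', 't1', 't2', 't2'],
    'a':             [1, 2, 5, 6, 1, 2],
    'b':             [2, 3, 6, 7, 2, 3],
    'link_sequence': [1, 2, 3, 4, 1, 2],
})
fixed = fix_sequences_vectorized(links)
```

In this toy case the last two links of `t1` end up in a new trip `t1_1` with their sequence restarted at 1, while `t2` is left unchanged.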
My suggestion for circular lines fixes 97% of the circularity issues:

    def fix_circular_split(trip):
        def split_trip(trip, split_by):
            # Positions of all links whose stop in column `split_by` occurs more than once
            split = [trip.index.get_loc(i) for i in trip.loc[trip[split_by].duplicated(keep=False)].index]
            if len(split) >= 1:
                trips = []
                # First stops
                trips.append(trip.iloc[: split[0]+1])
                # Middle stops (links at positions split[1:-1] fall into no segment and are dropped)
                for i in range(1, len(split)):
                    t = trip.iloc[split[i-1]+1 : split[i]].copy()
                    t['trip_id'] = t['trip_id'] + '_' + str(i) + str(split_by)
                    t['link_sequence'] = list(range(1, len(t)+1))
                    trips.append(t)
                # Last stops
                t = trip.iloc[split[-1] :].copy()
                t['trip_id'] = t['trip_id'] + '_n' + str(split_by)
                t['link_sequence'] = list(range(1, len(t)+1))
                trips.append(t)
                return pd.concat(trips)
            else:
                return trip
        # Split duplicated b stops
        trip = split_trip(trip, 'b')
        # Split duplicated a stops
        trip = trip.groupby('trip_id').apply(split_trip, 'a')
        return trip

    fixed = self.circular_lines.groupby('trip_id').apply(fix_circular_split).reset_index(level='trip_id', drop=True)
    initial_circular = self.circular_lines.copy()
    # test_circular is the circularity check, which is not shown here
    fixed.groupby('trip_id').apply(test_circular).reset_index(level='trip_id', drop=True)
    fixed.drop(self.circular_lines.index, inplace=True)
    # Replace the original circular trips with their split versions
    self.links = self.links.loc[~self.links['trip_id'].isin(initial_circular['trip_id'].unique())]
    # DataFrame.append was removed in pandas 2.0, so use pd.concat
    self.links = pd.concat([self.links, fixed])

It's all tested with the PT network of the whole of Germany. I hope I made no mistakes translating the logic into quetzal function suggestions.
I would suggest keeping the current methods, but including an option for "quick-checks" and "thorough-fixes".
Cheers