We need a way to generate columns of data that represent interactions between chooser and alternative. This could be for distances between locations, for weights that vary depending on the category of chooser, and so on.
I'm proposing an InteractionGenerator() class for storing such relationships and calculating them on demand. This approach provides computational and memory efficiencies when there are very large numbers of choosers and alternatives.
InteractionGenerator() would be a template class. We'll provide a couple of implementations, like DistanceGenerator() for calculating distances, and advanced users can write their own.
Usage example:
    choosers  # pd.DataFrame with index, lat, lng
    alternatives  # pd.DataFrame with index, lat, lng

    dg = DistanceGenerator(choosers, alternatives, type='straight_line')
    print(dg.get_data(chooser_ids=[...], alternative_ids=[...]))

    # include the column in a merged & sampled table
    merged_table = MergedChoiceTable(choosers, alternatives, sample_size=10, interactions=[dg])
Another common use case will be providing an InteractionGenerator() to specify sampling weights.
There is a rough sketch of these classes in my branch of the code: interaction.py#L21-L85
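Independent of that branch, here is a minimal sketch of how the template class and one implementation might look. Everything in it is an assumption made for illustration: the `get_data()` return type (a Series indexed by chooser and alternative id), the pairing of ids element by element, and the use of plain Euclidean distance in coordinate units for `'straight_line'`.

```python
import numpy as np
import pandas as pd


class InteractionGenerator:
    """Hypothetical template: subclasses compute chooser-alternative values on demand."""

    def __init__(self, choosers, alternatives):
        self.choosers = choosers
        self.alternatives = alternatives

    def get_data(self, chooser_ids, alternative_ids):
        raise NotImplementedError


class DistanceGenerator(InteractionGenerator):
    """Distance between each (chooser, alternative) pair, computed when requested."""

    def __init__(self, choosers, alternatives, type='straight_line'):
        super().__init__(choosers, alternatives)
        if type != 'straight_line':
            raise ValueError("only 'straight_line' is sketched here")
        self.type = type

    def get_data(self, chooser_ids, alternative_ids):
        # Assumption: ids are paired element by element, so both lists have equal length.
        c = self.choosers.loc[chooser_ids, ['lat', 'lng']].to_numpy()
        a = self.alternatives.loc[alternative_ids, ['lat', 'lng']].to_numpy()
        dist = np.sqrt(((c - a) ** 2).sum(axis=1))  # Euclidean, in coordinate units
        index = pd.MultiIndex.from_arrays(
            [chooser_ids, alternative_ids], names=['chooser_id', 'alternative_id'])
        return pd.Series(dist, index=index, name='distance')


choosers = pd.DataFrame({'lat': [0.0, 1.0], 'lng': [0.0, 1.0]}, index=[10, 11])
alternatives = pd.DataFrame({'lat': [3.0, 1.0], 'lng': [4.0, 1.0]}, index=[20, 21])

dg = DistanceGenerator(choosers, alternatives, type='straight_line')
distances = dg.get_data(chooser_ids=[10, 11], alternative_ids=[20, 21])
print(distances)
```

Nothing is precomputed in this sketch: the generator holds only the two source tables, and a distance exists only for the pairs passed to `get_data()`.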
Digging into it a bit more, I think the clearest justification for this implementation is in calculating sampling weights.
For J choosers (maybe millions) and K alternatives (maybe millions), we would need to generate J x K sampling weights, but only K of them would need to be in memory at any given time (for passing to np.random.choice).
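To make the memory argument concrete, here is a rough sketch that computes the K weights for one chooser at a time and discards them after sampling. The weight rule (inverse distance) and the function names are hypothetical, and it uses `Generator.choice`, the newer NumPy counterpart of `np.random.choice`.

```python
import numpy as np


def weights_for_chooser(chooser_xy, alternatives_xy):
    """Hypothetical weight rule: closer alternatives get larger weights."""
    dist = np.sqrt(((alternatives_xy - chooser_xy) ** 2).sum(axis=1))
    w = 1.0 / (1.0 + dist)
    return w / w.sum()  # normalize so the K weights sum to 1


def sample_alternatives(choosers_xy, alternatives_xy, sample_size, seed=0):
    """Sample alternative positions for each chooser, holding only K weights at a time."""
    rng = np.random.default_rng(seed)
    n_alts = len(alternatives_xy)
    samples = np.empty((len(choosers_xy), sample_size), dtype=int)
    for j, chooser_xy in enumerate(choosers_xy):
        # Length-K vector, rebuilt for every chooser; the J x K matrix is never stored.
        p = weights_for_chooser(chooser_xy, alternatives_xy)
        samples[j] = rng.choice(n_alts, size=sample_size, replace=False, p=p)
    return samples


choosers_xy = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 1.0]])
alternatives_xy = np.random.default_rng(1).uniform(0, 10, size=(50, 2))

samples = sample_alternatives(choosers_xy, alternatives_xy, sample_size=10)
print(samples.shape)
```

Peak memory for weights is one vector of length K, regardless of J. The per-chooser Python loop is the price for that, so a real implementation would probably want to batch choosers.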
Interaction data columns can be generated after the sampling, which would be easier in most cases than writing a subclass of InteractionGenerator(). For example:
I think that would add the column directly into the object's underlying dataframe, since df is a reference, but we should probably write explicit methods for this.
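As an illustration of the reference behaviour, here is a small sketch with a stand-in class. `TableHolder`, its `to_frame()` method, and the column names are hypothetical; whether the real merged table class returns its dataframe by reference or as a copy is exactly the open question, and this sketch simply assumes a reference.

```python
import numpy as np
import pandas as pd

# Stand-in for a merged, sampled table: one row per (chooser, alternative) pair.
merged = pd.DataFrame({
    'chooser_id': [10, 10, 11, 11],
    'alternative_id': [20, 21, 20, 21],
    'lat_chooser': [0.0, 0.0, 1.0, 1.0],
    'lng_chooser': [0.0, 0.0, 1.0, 1.0],
    'lat_alt': [3.0, 1.0, 3.0, 1.0],
    'lng_alt': [4.0, 1.0, 4.0, 1.0],
}).set_index(['chooser_id', 'alternative_id'])


class TableHolder:
    """Hypothetical wrapper that keeps a DataFrame, as a merged table class might."""

    def __init__(self, df):
        self._df = df

    def to_frame(self):
        return self._df  # assumption: returns the same object, not a copy


table = TableHolder(merged)
df = table.to_frame()

# Generate the interaction column after sampling, only for the sampled pairs.
df['distance'] = np.sqrt((df['lat_chooser'] - df['lat_alt']) ** 2
                         + (df['lng_chooser'] - df['lng_alt']) ** 2)

same_object = df is table.to_frame()
has_column = 'distance' in table.to_frame().columns
print(same_object, has_column)
```

If `to_frame()` returned `self._df.copy()` instead, the new column would not reach the stored table, which is one reason an explicit method for adding columns would be safer than relying on this behaviour.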