Short description
The triggering idea here is an action dictionary/catalog for Vowpal Wabbit.
Feature dictionaries can be loaded as a mapping from an id to parsed features (only the features matter, not the labels). Incoming examples can then reference an id instead of carrying the entire feature string, so VW avoids parsing the same example strings many times.
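A minimal sketch of the "parse once, reference many times" idea, assuming a made-up single-namespace text format of whitespace-separated `name[:value]` tokens (this is not VW's real input format). The names `sketch::feature_dictionary`, `parse_feature_string` and `parse_count` are hypothetical and exist only to make the saving visible: each feature string is parsed once at load time, and every later example that references the id is served by a hash-map lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

namespace sketch
{
// Hypothetical parsed form of one feature string (features only, no label).
struct parsed_features
{
  std::vector<std::string> names;
  std::vector<float> values;
};

// Toy parser: "price:0.5 color_red" -> {price: 0.5, color_red: 1.0}.
parsed_features parse_feature_string(const std::string& line)
{
  parsed_features out;
  std::istringstream in(line);
  std::string token;
  while (in >> token)
  {
    const auto colon = token.find(':');
    out.names.push_back(token.substr(0, colon));
    // A token without ":value" gets the default value 1.
    out.values.push_back(colon == std::string::npos ? 1.f : std::stof(token.substr(colon + 1)));
  }
  return out;
}

// Hypothetical dictionary: parses each feature string exactly once, at load time.
class feature_dictionary
{
public:
  void load(const std::string& id, const std::string& feature_string)
  {
    assert(!id.empty());  // ids are user-defined; an empty id is treated as a bug here
    _entries[id] = parse_feature_string(feature_string);
    ++_parse_count;
  }

  // Examples that carry only an id get the pre-parsed features by lookup.
  const parsed_features& lookup(const std::string& id) const { return _entries.at(id); }

  std::size_t parse_count() const { return _parse_count; }

private:
  std::unordered_map<std::string, parsed_features> _entries;
  std::size_t _parse_count = 0;
};
}  // namespace sketch
```

However many examples reference `"action_1"`, `parse_count()` stays at 1 after a single `load`.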
Possible solution/implementation details
The idea is to:
- expose an API in VW that allows loading pre-parsed `VW::example_feature`* objects, each referenced by a unique, user-defined id
- extend examples so that they can reference an id instead of holding the full feature string
- add a reduction that:
  - holds a reference to the loaded dictionary (loading and accessing it should be thread safe, in case an external thread, possibly the parser, decides to reload the dictionary)
  - checks each incoming example for an id and, if the id exists, swaps the example's features with the features from the dictionary (swapping them back on the way out of the reduction)
- leave loading the dictionary to the API caller when VW is used as a library, since VW expects a map from id to `VW::example_feature`* objects
- TBD: for command-line usage, decide on a file format that VW can parse and load during setup
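The reduction described in the list above can be sketched as follows. Everything here is hypothetical and simplified (`sketch::example`, `sketch::dictionary_holder`, `learn_with_dictionary`, a flat `example_feature`); it is not VW's real reduction API. Thread safety for reloads is handled by publishing a fresh map behind a mutex-protected `std::shared_ptr`, so a reload never frees a map that an in-flight example is still using. A scope guard swaps the dictionary features in and guarantees they are swapped back, even if the base learner throws. An unknown id throws, matching the failure mode stated in this issue. The sketch assumes a single learning thread and that no two examples in flight reference the same entry at once, because the swap temporarily moves the entry's features out of the dictionary. Requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

namespace sketch
{
// Toy stand-in for parsed features (hypothetical; not VW's real type).
struct example_feature
{
  std::vector<uint64_t> indices;
  std::vector<float> values;
};

// Toy stand-in for an incoming example that may reference a dictionary id.
struct example
{
  std::optional<std::string> dictionary_id;
  example_feature feats;
};

using dictionary = std::unordered_map<std::string, example_feature>;

// Owns the current dictionary. reload() may run on another thread; it publishes
// a new map rather than mutating the old one, and readers hold a snapshot.
class dictionary_holder
{
public:
  void reload(dictionary d)
  {
    auto fresh = std::make_shared<dictionary>(std::move(d));
    std::lock_guard<std::mutex> lock(_mutex);
    _current = std::move(fresh);
  }

  std::shared_ptr<dictionary> snapshot() const
  {
    std::lock_guard<std::mutex> lock(_mutex);
    return _current;
  }

private:
  mutable std::mutex _mutex;
  std::shared_ptr<dictionary> _current = std::make_shared<dictionary>();
};

// Swaps two feature sets on construction and swaps them back on destruction.
class swap_guard
{
public:
  swap_guard(example_feature& a, example_feature& b) : _a(a), _b(b) { std::swap(_a, _b); }
  ~swap_guard() { std::swap(_a, _b); }
  swap_guard(const swap_guard&) = delete;
  swap_guard& operator=(const swap_guard&) = delete;

private:
  example_feature& _a;
  example_feature& _b;
};

// Reduction body: swap dictionary features in, call the base learner, swap back.
template <typename BaseLearner>
void learn_with_dictionary(const dictionary_holder& holder, example& ex, BaseLearner&& base)
{
  if (!ex.dictionary_id) { base(ex); return; }  // no id: pass through untouched

  // Declared before the guard so the map outlives the swap-back.
  std::shared_ptr<dictionary> dict = holder.snapshot();
  assert(dict != nullptr);
  auto it = dict->find(*ex.dictionary_id);
  if (it == dict->end()) { throw std::runtime_error("unknown dictionary id: " + *ex.dictionary_id); }

  swap_guard guard(ex.feats, it->second);
  base(ex);
}
}  // namespace sketch
```

Publishing a new map on reload, instead of editing the live one, keeps the lock scope tiny: the learner only holds the mutex long enough to copy a `shared_ptr`.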
*`VW::example_feature` is a new struct that holds `VW::v_array<namespace_index> indices` and `std::array<features, NUM_NAMESPACES> feature_space`, i.e. the full information about an example's features, plus potentially other information needed for feature counting.
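A simplified, self-contained sketch of the struct's shape. `namespace_index`, `features` and `NUM_NAMESPACES` below are hypothetical stand-ins for VW's internal types (a `std::vector` replaces `VW::v_array`), and `num_features` is only an assumed example of the "other information needed for feature counting"; none of this is VW's actual definition.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch
{
using namespace_index = unsigned char;
constexpr std::size_t NUM_NAMESPACES = 256;  // one slot per possible namespace byte

// One namespace's features: parallel arrays of hashed indices and values.
struct features
{
  std::vector<uint64_t> indices;
  std::vector<float> values;
  std::size_t size() const { return indices.size(); }
};

// Mirrors the proposed VW::example_feature: the list of active namespaces plus
// the feature data for every possible namespace.
struct example_feature
{
  std::vector<namespace_index> indices;                // stands in for VW::v_array<namespace_index>
  std::array<features, NUM_NAMESPACES> feature_space;  // indexed by namespace byte
  std::size_t num_features = 0;                        // assumed feature-counting info

  void add(namespace_index ns, uint64_t index, float value)
  {
    // Record a namespace as active the first time it receives a feature.
    if (feature_space[ns].size() == 0) { indices.push_back(ns); }
    feature_space[ns].indices.push_back(index);
    feature_space[ns].values.push_back(value);
    ++num_features;
  }
};
}  // namespace sketch
```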
Other things to consider
- All parsers that want to support this feature need to accept a reference id (JSON already supports this).
- Cache: the cache needs to be extended to hold the example id, and when the cache is used (as with any other parser) the dictionary must be available.
- If someone sets an example id and also provides features in that example, what do we do? Options:
  - the parser processing that example rejects it, OR
  - the extra features are ignored, OR
  - the extra features are added to the dictionary example after the swap (and removed from the dictionary example before exiting the reduction)
- If the features loaded into the dictionary are populated with audit information, audit output is complete; otherwise it is incomplete. This is up to the caller of the API.
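The three options for an example that carries both an id and its own features can be sketched side by side. `conflict_policy`, `merge_after_swap` and `undo_merge` are hypothetical names, and the flat `example_feature` is a toy type. The sketch assumes the dictionary features have already been swapped into the example (`active`); for the merge option it appends the inline features and returns how many were appended, so the reduction can truncate them off again before swapping back. The reject branch is shown here only for comparison; per the list above, rejection would more naturally happen in the parser.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

namespace sketch
{
struct example_feature
{
  std::vector<uint64_t> indices;
  std::vector<float> values;
};

// Hypothetical switch between the three options listed in the issue.
enum class conflict_policy { reject, ignore_extra, merge };

// Called after the swap: `active` holds the dictionary features, `inline_feats`
// holds the features the example arrived with. Returns the number of features
// appended to `active`, to be passed to undo_merge() before the swap-back.
std::size_t merge_after_swap(conflict_policy policy, example_feature& active, const example_feature& inline_feats)
{
  if (inline_feats.indices.empty()) { return 0; }  // no conflict
  switch (policy)
  {
    case conflict_policy::reject:
      throw std::invalid_argument("example has both a dictionary id and features");
    case conflict_policy::ignore_extra:
      return 0;
    case conflict_policy::merge:
      active.indices.insert(active.indices.end(), inline_feats.indices.begin(), inline_feats.indices.end());
      active.values.insert(active.values.end(), inline_feats.values.begin(), inline_feats.values.end());
      return inline_feats.indices.size();
  }
  return 0;
}

// Removes the appended tail so the dictionary entry is restored unchanged.
void undo_merge(example_feature& active, std::size_t appended)
{
  assert(appended <= active.indices.size());
  active.indices.resize(active.indices.size() - appended);
  active.values.resize(active.values.size() - appended);
}
}  // namespace sketch
```

Appending and truncating the tail keeps the merge cheap and reversible, because the dictionary's own features are never reordered.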
Failure mode:
- If an example references a nonexistent id, VW throws; this is a non-recoverable error.