# Explore Eval

What: Evaluate exploration algorithms
The goal of explore eval is to evaluate different exploration algorithms using the data from a logged policy. The eval policy does not learn from all of the logged examples; instead, rejection sampling is used to perform a counterfactual simulation:

- for each example in the logged data:
  - get the pmf of the prediction of the policy being evaluated (eval policy)
  - for the action that was logged (which has a probability of `p_log`), find the eval policy's probability for that action, `p_eval`
  - calculate a threshold `p_eval / p_log`
  - flip a biased coin (using the threshold)
  - depending on the outcome, either have the eval policy learn from this example or skip to the next example
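The loop above can be sketched in Python. This is a minimal illustration, not VW's implementation; `eval_policy.pmf` and `eval_policy.learn` are hypothetical stand-ins for the policy being evaluated, and each logged example is assumed to be a `(context, action, p_log, cost)` tuple:

```python
import random

def explore_eval(logged_examples, eval_policy, multiplier=1.0, seed=0):
    """Counterfactual simulation via rejection sampling (sketch)."""
    rng = random.Random(seed)
    update_count = 0
    for context, action, p_log, cost in logged_examples:
        pmf = eval_policy.pmf(context)        # action probabilities under the eval policy
        p_eval = pmf[action]                  # probability of the logged action
        threshold = (p_eval / p_log) * multiplier
        if rng.random() < threshold:          # biased coin flip
            eval_policy.learn(context, action, cost, p_eval)
            update_count += 1
    return update_count
```

Note that when `p_eval >= p_log` the threshold is at least 1 and the example is always accepted.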
The `--multiplier` cli argument can be provided; it is applied to the threshold (`threshold *= multiplier`) and affects the rejection rate of the sampling.
For all examples, the average loss is calculated using the IPS technique (`logged_cost * (p_eval / p_log)`) and reported at the end.
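As a small illustration of the reported loss, here is the IPS average over a list of `(cost, p_log)` pairs and the matching eval-policy probabilities (the function name and argument layout are assumptions for this sketch):

```python
def ips_loss(logged, eval_probs):
    """Average IPS loss: mean of cost * (p_eval / p_log) over all examples."""
    total = sum(cost * (p_eval / p_log)
                for (cost, p_log), p_eval in zip(logged, eval_probs))
    return total / len(logged)
```

When the eval policy matches the logged policy exactly (`p_eval == p_log`), this reduces to the plain average of the logged costs.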
Once run, explore eval gives us some information about the sampling rate:

```
update count = <N>
violation count = <V>
final multiplier = <M>
```

where:

- `update count` is the number of examples that were used to update the policy being evaluated
- `violation count` is the number of examples that had a `threshold > 1`, which means the eval policy had a larger probability for the logged action than the logged probability, and therefore the example is always used to update the eval policy
- `final multiplier` is the final multiplier used
We can see that for eval policies that are similar to the logged policy, the rejection rate will be lower than for eval policies that are very different from the logged policy. This can result in very different confidence intervals for different eval policies.
One way to tackle this is to tune the multiplier for each policy. Another way is to use the `--block_size` cli argument. The examples will be processed in blocks of `block_size`. If an example in a block is used for an update, no other examples in that block will be used to update the policy. If no example in a block is used, the quota rolls over and the next block can update more than one example. This keeps the acceptance count from exceeding the example count that we want, while at the same time sampling evenly from the entire example set (not just the first N examples).
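The block quota logic can be sketched as follows. This is an illustrative model, not VW's code: `flags[i]` stands for the outcome of the biased coin flip for example `i`, and the function returns the indices of examples that actually update the policy:

```python
def block_sample(flags, block_size):
    """Accept at most one update per block of examples; unused quota rolls over."""
    quota = 0
    used = []
    for i, accept in enumerate(flags):
        if i % block_size == 0:
            quota += 1            # each new block grants one update; unspent quota carries over
        if accept and quota > 0:
            used.append(i)
            quota -= 1
    return used
```

With this scheme, a stretch of rejected examples banks quota, so later blocks may accept several examples in a row while the overall acceptance count stays bounded by the number of blocks.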
The best way to use this argument is to first do a run of `explore_eval` (without `block_size` set) on all of the exploration policies being evaluated. Then find the smallest `update count` and use it to set the `block_size` by computing `num_of_logged_examples / smallest_update_count`. This way, all policies should be evaluated with a similar rejection rate.
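For example, the block size calculation above might look like this (a trivial sketch; the function name is an assumption, and integer division is used since the cli argument is a whole number):

```python
def choose_block_size(num_logged_examples, smallest_update_count):
    """Common block size so all policies see a similar rejection rate."""
    return num_logged_examples // smallest_update_count
```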