Skip to content

Contextual Bandits with Large Action Spaces (LAS)

olgavrou edited this page Sep 19, 2022 · 9 revisions

What: Contextual Bandits (CB) with many actions to explore over (> 50). Combine with epsilon-greedy or squarecb exploration.

Publication: https://proceedings.mlr.press/v162/zhu22b.html

Large action spaces is an algorithm that lets exploration happen efficiently when there are multiple actions in a contextual bandit dataset. The main idea behind it is to eliminate actions that are similar and explore only over the most diverse actions in the dataset.

Once the redundant actions are eliminated, the remaining actions are processed by either epsilon-greedy or squarecb and the final result will be a probability mass function over the chosen actions. The redundant actions will have zero probabilities.

The way actions are eliminated is based on two main methods:

  1. Randomized truncated singular value decomposition over the sparse matrix of the actions representations, in order to reduce the dimensionality of the actions representation (https://arxiv.org/abs/0909.4061)
  2. Apply c-approximate barycentric spanner on the left singular vectors produced after the SVD (Awerbuch & Kleinberg STOC'04: https://www.cs.cornell.edu/~rdk/papers/OLSP.pdf)

Intuition of LAS algorithm

Lets say we have a very small example consisting of 5 actions

 | math:2.0 music:0.5 sports:0.1 science:4.0 computers:0.1 literature:0 social_media:0 dancing:0 news:0
 | math:0 music:1.5 sports:0 science:0 computers:0 literature:2.0 social_media:0 dancing:2.0 news:0
 | math:0 music:1.5 sports:4.0 science:0 computers:0 literature:2.0 social_media:4.0 dancing:2.0 news:4.0
 | math:0 music:1.5 sports:0 science:0 computers:0 literature:2.0 social_media:4.0 dancing:2.0 news:0
 | math:0 music:1.5 sports:0 science:0 computers:0 literature:2.0 social_media:4.0 dancing:2.0 news:0

Each action is described by some characteristics, some are stronger that others in specific actions. If we replace each categorical feature with an index we can get VW's actions representation matrix in a dense format representation from VW

math -> 0, music -> 1 , sports -> 2, etc

We can represent this as a matrix where each row is an action

math music sports science computers literature social_media dancing news
2.0 0.5 0.1 4.0 0.1 0 0 0 0
0 1.5 0 0 0 2.0 0 2.0 0
0 1.5 4.0 0 0 2.0 4.0 2.0 4.0
0 1.5 0 0 0 2.0 4.0 2.0 0
0 1.5 0 0 0 2.0 4.0 2.0 0

Just by looking at the actions represented this way we can see that the first action is quite distinct and that the following 4 actions have stronger similarities (the last two actions are duplicates). We can also see that from the 4 similar actions, action 3 is the one that stands out next.

If we perfom truncated SVD on A, setting the reduced dimenstion to d = 2 (--max_actions 2) we can examing the singular values and the left singular vectors. We expect the left singular vectors to hold the "essence" of the original matrix but in a reduced dimension space

dimension 1 dimension 2
-0.638989 0.0765502
0.0988239 0.51184
0.562224 0.576712
-0.364576 0.446969
-0.364576 0.446969

and singular values: 19.2866, 5.39217, 1.17257e-07, 4.79044e-08, 0

We can see that we have two large and distinct singular values and then that the remaining three are quite small and similar. This indicates that the original matrix has two dominant components, and that we can also state that it makes sense to select two actions since any more that we select are not going to add much information since any more actions choosen after the first two are going to be similar

The resulting left singular vector matrix is will then be fed to the spanner algorithm. The purpose of the spanner algorithm is to select the d actions that are more diverse and thus increase the volume of the final matrix (volume of final matrix defined by the abs value of the matrix norm). In the above case the two (d = 2) actions that do that are the first and the third actions

Spanner result:

dimension 1 dimension 2
-0.638989 0.0765502
0.562224 0.576712

The final pmf that VW will return will be spread over the resulting two actions. If we save the above input to a file and run VW with large action spaces (--cb_explore_adf --large_action_space --max_actions 2 --full_predictions --noconstant --predictions predfile.txt)

cat predfile.txt

0:0.5,2:0.5,1:0,3:0,4:0

we can see that the first and third actions (indexed starting at zero) are given a probability and that the rest of the actions are given a zero probability.

If --full_predictions is omited then the actions with zero probabilities are removed from the final prediction.

Clone this wiki locally