
Truncated gradient descent example

lihongli edited this page Dec 8, 2011 · 4 revisions

Truncated Gradient Descent Example for VW

VW has an efficient (approximate) implementation of the truncated gradient algorithm for online L1 regularization. This page illustrates its use with the RCV1 data set. Online L2 regularization, which VW implements exactly, works the same way: replace the --l1 option below with --l2.
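The truncated gradient idea can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not VW's (approximate) implementation: it uses squared loss, a dense weight list, and hypothetical parameter names (eta for the learning rate, l1 for the penalty, K for the truncation period, theta for the truncation threshold), and its l1 scale need not match the value passed to --l1. After every K ordinary SGD updates, each weight in [-theta, theta] is moved toward zero by K * eta * l1, and a weight that would cross zero is set to exactly zero, which is what makes the model sparse.

```python
# Illustrative sketch only (not VW's implementation): squared loss, dense weights,
# and the parameter names eta, l1, K, theta are assumptions.
import math
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Dict[int, float], float]  # (sparse features {index: value}, label)


def truncate(w: float, alpha: float, theta: float = math.inf) -> float:
    """Shrink a weight in [-theta, theta] toward zero by alpha, never crossing zero."""
    if 0.0 <= w <= theta:
        return max(0.0, w - alpha)
    if -theta <= w < 0.0:
        return min(0.0, w + alpha)
    return w  # weights with |w| > theta are left untouched


def truncated_gradient_sgd(
    examples: Sequence[Example],
    n_features: int,
    eta: float = 0.1,
    l1: float = 0.01,
    K: int = 1,
    theta: float = math.inf,
    passes: int = 1,
) -> List[float]:
    """Online linear regression (squared loss) with truncation applied every K updates."""
    w = [0.0] * n_features
    step = 0
    for _ in range(passes):
        for x, y in examples:
            pred = sum(w[i] * v for i, v in x.items())
            g = pred - y  # derivative of 0.5 * (pred - y) ** 2 with respect to pred
            for i, v in x.items():
                w[i] -= eta * g * v  # ordinary SGD step
            step += 1
            if step % K == 0:
                # Truncate all weights, with the shrinkage accumulated over K steps.
                alpha = K * eta * l1
                w = [truncate(wi, alpha, theta) for wi in w]
    return w
```

With K = 1 and theta set to infinity, the update reduces to soft-thresholding every weight after each gradient step. Truncating every weight at each step costs time proportional to the number of features; for sparse data, an efficient implementation would instead apply the accumulated shrinkage lazily, when a feature next appears.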

We use the training and test data prepared in the RCV1 example; the corresponding cache files are cache_train and cache_test. Evaluating the classifier also needs a file of test labels, which is obtained by taking the first field of each test example and mapping the label -1 to 0:

zcat rcv1.test.dat.gz | cut -d ' ' -f 1 | sed -e 's/^-1/0/' > test_labels

The following four commands perform (1) training, (2) testing, (3) ROC evaluation, and (4) model-size measurement, respectively:

vw --cache_file cache_train --final_regressor r_temp --passes 3 --readable_model r_temp.txt --l1 lambda
vw --testonly --initial_regressor r_temp --cache_file cache_test --predictions p_out
perf -ROC -files test_labels p_out
grep -c '^[0-9]' r_temp.txt


  1. lambda is the L1 regularization strength passed to --l1 during online learning.
  2. r_temp.txt is the human-readable model file, used in step (4) to count the nonzero weights of the learned regressor.
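If perf is not available, steps (3) and (4) can be approximated in Python. The sketch below assumes that test_labels holds one 0/1 label per line and that the first field of each p_out line is the prediction; the helper names are hypothetical. It computes the ROC area with the rank-sum (Mann-Whitney) formula, giving tied scores their average rank, so its result may differ slightly from perf's, and it counts model lines the same way the grep in step (4) does.

```python
# Hypothetical helpers approximating `perf -ROC` and the grep count; not part of VW or perf.
import re
from typing import Iterable, List


def first_column(lines: Iterable[str]) -> List[float]:
    """Parse the first whitespace-separated field of each non-empty line as a float."""
    return [float(line.split()[0]) for line in lines if line.strip()]


def roc_area(labels: List[float], scores: List[float]) -> float:
    """Area under the ROC curve via the rank-sum statistic; tied scores share their average rank."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group i..j
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y > 0.5)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y > 0.5)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def count_weight_lines(lines: Iterable[str]) -> int:
    """Mirror `grep -c '^[0-9]'`: count lines that start with a digit."""
    starts_with_digit = re.compile(r"[0-9]")
    return sum(1 for line in lines if starts_with_digit.match(line))


# Example usage with the files produced by the commands above:
# with open("test_labels") as f: labels = first_column(f)
# with open("p_out") as f: scores = first_column(f)
# with open("r_temp.txt") as f: size = count_weight_lines(f)
# print(roc_area(labels, scores), size)
```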

Varying lambda shows the effect of L1 regularization on prediction performance (ROC area) and on model size (number of nonzero weights):

lambda    ROC area    Model size (nonzero weights)
0         0.98346     41409
5e-8      0.98345     39985
1e-7      0.98345     38822
5e-7      0.98345     31899
1e-6      0.98345     26559
5e-6      0.98319     12564
1e-5      0.98288     7647
5e-5      0.98068     1860
1e-4      0.97804     921
1e-3      0.92469     53
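The trade-off in the table can be read off more directly as ratios. The Python snippet below re-uses only the numbers reported above (relative_tradeoff is a hypothetical helper) and computes, for each lambda, the fraction of weights kept and the ROC drop relative to lambda = 0. For example, lambda = 1e-5 keeps about 18.5% of the weights (7647 of 41409) at an ROC cost of 0.00058, and lambda = 1e-4 keeps about 2.2% at a cost of 0.00542.

```python
# (lambda, ROC area, model size) rows copied from the table above.
RESULTS = [
    (0.0, 0.98346, 41409),
    (5e-8, 0.98345, 39985),
    (1e-7, 0.98345, 38822),
    (5e-7, 0.98345, 31899),
    (1e-6, 0.98345, 26559),
    (5e-6, 0.98319, 12564),
    (1e-5, 0.98288, 7647),
    (5e-5, 0.98068, 1860),
    (1e-4, 0.97804, 921),
    (1e-3, 0.92469, 53),
]


def relative_tradeoff(rows):
    """For each lambda, return (lambda, fraction of weights kept, ROC drop) relative to the first row."""
    _, base_roc, base_size = rows[0]
    return [(lam, size / base_size, base_roc - roc) for lam, roc, size in rows]


for lam, kept, drop in relative_tradeoff(RESULTS):
    print(f"lambda={lam:g}  weights kept={kept:7.2%}  ROC drop={drop:.5f}")
```

Up to lambda = 1e-6, about 36% of the weights are removed while the ROC area changes only in the fifth decimal place.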