
Working locally


As discussed in Anatomy of a ramp-kit, the directory submissions/ of the RAMP kit should contain one directory per submission.

To test a submission locally, use ramp_test_submission. Suppose your RAMP kit contains two folders, submissions/starting_kit and submissions/random_forest_10_10. You can test these submissions locally with:

ramp_test_submission --submission starting_kit

and

ramp_test_submission --submission random_forest_10_10

The model will be trained and the different metrics will be displayed for the training, validation, and test data. It is often useful to compare the performance of different submissions. To do that, save the predictions and the metrics using the option --save-y-preds:

ramp_test_submission --submission <name> --save-y-preds

e.g.,

ramp_test_submission --submission starting_kit --save-y-preds

and

ramp_test_submission --submission random_forest_10_10 --save-y-preds
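If you have many submissions, a small shell loop can test and save all of them in one pass. This is only a sketch: it assumes the commands are run from the root of the RAMP kit and that every subdirectory of submissions/ is a valid submission.

# test every submission folder and save its predictions and scores
for submission in submissions/*/; do
    ramp_test_submission --submission "$(basename "$submission")" --save-y-preds
done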

Now that the predictions and scores are saved, you can use ramp_leaderboard to compare them. Typing ramp_leaderboard in the RAMP kit folder displays something like this:

> ramp_leaderboard

+----+---------------------+--------------+--------------+--------------+
|    | submission          | train_acc    | valid_acc    | test_acc     |
+====+=====================+==============+==============+==============+
|  1 | random_forest_10_10 | 1.00 ± 0.000 | 0.95 ± 0.000 | 0.90 ± 0.021 |
+----+---------------------+--------------+--------------+--------------+
|  0 | starting_kit        | 0.61 ± 0.026 | 0.65 ± 0.000 | 0.62 ± 0.083 |
+----+---------------------+--------------+--------------+--------------+

Each row of the table corresponds to a submission, and each column corresponds to a metric computed on a split of the data (train, validation, or test). Here acc stands for accuracy. The list of available metrics depends on the RAMP kit and is specified in problem.py.

By default, only one metric is displayed, but you can select a different one with --metric:

> ramp_leaderboard --metric=nll
+----+---------------------+--------------+--------------+--------------+
|    | submission          | train_nll    | valid_nll    | test_nll     |
+====+=====================+==============+==============+==============+
|  0 | starting_kit        | 0.98 ± 0.197 | 0.59 ± 0.069 | 0.76 ± 0.041 |
+----+---------------------+--------------+--------------+--------------+
|  1 | random_forest_10_10 | 0.02 ± 0.007 | 0.12 ± 0.008 | 0.20 ± 0.019 |
+----+---------------------+--------------+--------------+--------------+

To get the list of available metrics, you can use the following:

ramp_leaderboard --help-metrics

It is also possible to specify exactly which columns of the table are displayed, by listing metric and split (train, valid, or test) pairs with --cols:

> ramp_leaderboard --cols=train_acc,train_nll,valid_nll

+----+---------------------+--------------+--------------+--------------+
|    | submission          | train_acc    | train_nll    | valid_nll    |
+====+=====================+==============+==============+==============+
|  1 | random_forest_10_10 | 1.00 ± 0.000 | 0.02 ± 0.007 | 0.12 ± 0.008 |
+----+---------------------+--------------+--------------+--------------+
|  0 | starting_kit        | 0.61 ± 0.026 | 0.98 ± 0.197 | 0.59 ± 0.069 |
+----+---------------------+--------------+--------------+--------------+

It is also possible to specify which metric (and split) to sort by:

ramp_leaderboard --metric=nll --sort_by=valid_nll,test_nll --asc

This sorts first by valid_nll, then by test_nll in case of ties, in ascending order (by default, when --asc is not given, the sort is descending).
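These options can be combined, for example to display only the validation and test negative log-likelihood and sort by the validation score. The sketch below only combines the flags shown above; the exact column names depend on the metrics defined in your kit's problem.py.

ramp_leaderboard --cols=valid_nll,test_nll --sort_by=valid_nll --asc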

For more information:

ramp_leaderboard --help