Working locally
As discussed in Anatomy of a ramp-kit, the directory submissions/ of the RAMP kit should contain one directory per submission.
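For example, a kit with the two submissions used below might be laid out as follows (only a sketch; the files inside each submission folder depend on the kit):
<ramp_kit>/
    problem.py
    submissions/
        starting_kit/
        random_forest_10_10/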
To test your submission locally, use ramp_test_submission. Suppose you have two folders, submissions/starting_kit and submissions/random_forest_10_10, in your RAMP kit. You can test these submissions locally with:
ramp_test_submission --submission starting_kit
and
ramp_test_submission --submission random_forest_10_10
The model will be trained and the metrics will be displayed for the training, validation, and test data. It is often necessary to compare the performance of different submissions. To do that, save the predictions and the metrics with the --save-y-preds option:
ramp_test_submission --submission <name> --save-y-preds
e.g.,
ramp_test_submission --submission starting_kit --save-y-preds
and
ramp_test_submission --submission random_forest_10_10 --save-y-preds
Now that the predictions and scores are saved, you can use ramp_leaderboard to compare them. Running ramp_leaderboard in the RAMP kit folder displays something like this:
> ramp_leaderboard
+----+---------------------+--------------+--------------+--------------+
| | submission | train_acc | valid_acc | test_acc |
+====+=====================+==============+==============+==============+
| 1 | random_forest_10_10 | 1.00 ± 0.000 | 0.95 ± 0.000 | 0.90 ± 0.021 |
+----+---------------------+--------------+--------------+--------------+
| 0 | starting_kit | 0.61 ± 0.026 | 0.65 ± 0.000 | 0.62 ± 0.083 |
+----+---------------------+--------------+--------------+--------------+
Each row of the table corresponds to a submission, and each column corresponds to a metric computed on a split of the data (train, validation, or test); acc here stands for accuracy. The list of available metrics depends on the RAMP kit and is specified in problem.py.
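As a sketch (assuming a classification kit built with the rampwf library; the exact score types and their names depend on the kit), problem.py might declare its metrics like this:
import rampwf as rw

# The score types define the metrics reported by ramp_test_submission and
# ramp_leaderboard; their names become the column suffixes (train_acc, valid_nll, ...).
score_types = [
    rw.score_types.Accuracy(name='acc'),
    rw.score_types.NegativeLogLikelihood(name='nll'),
]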
By default, only the kit's default metric is displayed, but this can be changed with the --metric option:
> ramp_leaderboard --metric=nll
+----+---------------------+--------------+--------------+--------------+
| | submission | train_nll | valid_nll | test_nll |
+====+=====================+==============+==============+==============+
| 0 | starting_kit | 0.98 ± 0.197 | 0.59 ± 0.069 | 0.76 ± 0.041 |
+----+---------------------+--------------+--------------+--------------+
| 1 | random_forest_10_10 | 0.02 ± 0.007 | 0.12 ± 0.008 | 0.20 ± 0.019 |
+----+---------------------+--------------+--------------+--------------+
To get the list of available metrics, you can use the following:
ramp_leaderboard --help-metrics
It is also possible to specify exactly which columns are displayed by listing <split>_<metric> pairs, where the split is train, valid, or test:
> ramp_leaderboard --cols=train_acc,train_nll,valid_nll
+----+---------------------+--------------+--------------+--------------+
| | submission | train_acc | train_nll | valid_nll |
+====+=====================+==============+==============+==============+
| 1 | random_forest_10_10 | 1.00 ± 0.000 | 0.02 ± 0.007 | 0.12 ± 0.008 |
+----+---------------------+--------------+--------------+--------------+
| 0 | starting_kit | 0.61 ± 0.026 | 0.98 ± 0.197 | 0.59 ± 0.069 |
+----+---------------------+--------------+--------------+--------------+
It is also possible to specify which metric (and split) to sort by:
ramp_leaderboard --metric=nll --sort_by=valid_nll,test_nll --asc
This sorts first by valid_nll, then by test_nll in case of ties, in ascending order (if --asc is not given, the sort is descending).
For more information:
ramp_leaderboard --help