Added random number generator to shuffling of values #41

Merged
wschin merged 1 commit into cjlin1:master from mstfbl:addedRNG
May 12, 2020

mstfbl commented May 11, 2020

This PR adds an explicit random number generator to the shuffling of values in the random map generation process. To pass that RNG in, it replaces void random_shuffle(_RanIt _First, _RanIt _Last) with void shuffle(_RanIt _First, _RanIt _Last, _Urng&& _Func).

The function void random_shuffle(_RanIt _First, _RanIt _Last) is implemented differently on different platforms: the C++ standard leaves its source of randomness implementation-defined, so seeding rand() is not guaranteed to affect it. I found this problem while working on ML.NET, where one of our matrix factorization unit tests produced varying results when run consecutively on macOS. I traced the variance to the following functions, which seed rand() through srand(0) before shuffling:

libmf/mf.cpp

Lines 1115 to 1123 in e70b9a3

vector<mf_int> Utility::gen_random_map(mf_int size)
{
    srand(0);
    vector<mf_int> map(size, 0);
    for(mf_int i = 0; i < size; ++i)
        map[i] = i;
    random_shuffle(map.begin(), map.end());
    return map;
}

libmf/mf.cpp

Lines 4070 to 4123 in e70b9a3

mf_double CrossValidatorBase::do_cross_validation()
{
    vector<mf_int> cv_blocks;
    srand(0);
    for(mf_int block = 0; block < nr_bins*nr_bins; ++block)
        cv_blocks.push_back(block);
    random_shuffle(cv_blocks.begin(), cv_blocks.end());

    if(!quiet)
    {
        cout.width(4);
        cout << "fold";
        cout.width(10);
        cout << util.get_error_legend();
        cout << endl;
    }

    cv_error = 0;

    for(mf_int fold = 0; fold < nr_folds; ++fold)
    {
        mf_int begin = fold*nr_blocks_per_fold;
        mf_int end = min((fold+1)*nr_blocks_per_fold, nr_bins*nr_bins);
        vector<mf_int> hidden_blocks(cv_blocks.begin()+begin,
                                     cv_blocks.begin()+end);

        mf_double err = do_cv1(hidden_blocks);
        cv_error += err;

        if(!quiet)
        {
            cout.width(4);
            cout << fold;
            cout.width(10);
            cout << fixed << setprecision(4) << err;
            cout << endl;
        }
    }

    if(!quiet)
    {
        cout.width(14);
        cout.fill('=');
        cout << "" << endl;
        cout.fill(' ');
        cout.width(4);
        cout << "avg";
        cout.width(10);
        cout << fixed << setprecision(4) << cv_error/nr_folds;
        cout << endl;
    }

    return cv_error/nr_folds;
}

Calling srand(0) before void random_shuffle(_RanIt _First, _RanIt _Last) does not appear to produce reproducible results across consecutive runs on macOS. Using void shuffle(_RanIt _First, _RanIt _Last, _Urng&& _Func) with an explicitly seeded random number generator gives consistent results from run to run on each of the builds.
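
As a rough illustration of the approach described above (not the actual diff in this PR), gen_random_map could be written along these lines. The name gen_random_map_seeded, the seed parameter, the choice of std::mt19937, and the local mf_int typedef are all assumptions made for this sketch.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Stand-in for LIBMF's mf_int type; an assumption for this sketch.
typedef int mf_int;

// Hypothetical variant of Utility::gen_random_map that owns its RNG
// instead of relying on srand()/rand(). The seed parameter is
// illustrative and not part of the original API.
std::vector<mf_int> gen_random_map_seeded(mf_int size, unsigned seed = 0)
{
    std::vector<mf_int> map(size);
    std::iota(map.begin(), map.end(), 0);       // map[i] = i

    // A fresh engine with a fixed seed on every call, so no hidden
    // global state carries over between consecutive calls.
    std::mt19937 rng(seed);
    std::shuffle(map.begin(), map.end(), rng);
    return map;
}
```

With a sketch like this, two consecutive calls with the same size and seed should return the same permutation within a given build. The permutation may still differ between standard library implementations, since the standard does not fix the exact algorithm std::shuffle uses.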
