I use this code to implement paragraph2vec, and I found that training may need to run for several iterations. Suppose we call train several times like this:
for (int i = 0; i < n; i++)
    model.train(sentences);
The first call behaves correctly, but memory usage keeps growing with each subsequent call.
After reviewing word2vec.h, I found that the following code may be the cause:
#pragma omp parallel for
for (size_t i=0; i < n_sentences; ++i) {
    auto sentence = sentences[i].get();
    if (sentence->tokens_.empty())
        continue;
    size_t len = sentence->tokens_.size();
    for (size_t i=0; i<len; ++i) {
        auto it = vocab_.find(sentence->tokens_[i]);
        if (it == vocab_.end()) continue;
        Word *word = it->second.get();
        // subsampling
        if (sample_ > 0) {
            float rnd = (sqrt(word->count_ / (sample_ * total_words) ) + 1) * (sample_ * total_words) / word->count_;
            if (rnd < rng(eng)) continue;
        }
        sentence->words_.emplace_back(it->second.get());
    }
Words are appended to sentence->words_ on every call and the vector is never emptied, so it grows each time train is called again. We can clear the vector first:
#pragma omp parallel for
for (size_t i=0; i < n_sentences; ++i) {
    auto sentence = sentences[i].get();
    // By Largelymfs: reset words_ so repeated calls to train do not accumulate entries.
    // sentence is a raw pointer, so the member vector is cleared through ->.
    sentence->words_.clear();
    if (sentence->tokens_.empty())
        continue;
    size_t len = sentence->tokens_.size();
    for (size_t i=0; i<len; ++i) {
        auto it = vocab_.find(sentence->tokens_[i]);
        if (it == vocab_.end()) continue;
        Word *word = it->second.get();
        // subsampling
        if (sample_ > 0) {
            float rnd = (sqrt(word->count_ / (sample_ * total_words) ) + 1) * (sample_ * total_words) / word->count_;
            if (rnd < rng(eng)) continue;
        }
        sentence->words_.emplace_back(it->second.get());
    }
With this change, the train function can be called in a loop.
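To make the growth easy to reproduce outside the library, here is a minimal standalone sketch. The Word and Sentence structs, build_words, and words_after_passes are simplified, hypothetical stand-ins written for this illustration, not the actual definitions in word2vec.h; only the tokens_ and words_ member names are taken from the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the library's types; only the fields used here.
struct Word { std::size_t count_ = 0; };

struct Sentence {
    std::vector<std::string> tokens_;  // raw tokens, filled once
    std::vector<Word*> words_;         // vocabulary lookups, rebuilt on each pass
};

using Vocab = std::unordered_map<std::string, std::unique_ptr<Word>>;

// Rebuilds words_ from tokens_. With clear_first == false this mimics the
// reported behaviour: every call appends to the previous contents.
void build_words(Sentence& s, const Vocab& vocab, bool clear_first) {
    if (clear_first) s.words_.clear();
    for (const auto& tok : s.tokens_) {
        auto it = vocab.find(tok);
        if (it == vocab.end()) continue;  // skip out-of-vocabulary tokens
        s.words_.push_back(it->second.get());
    }
}

// Runs `passes` build passes over a three-token sentence (one token is
// out of vocabulary) and returns the final size of words_.
std::size_t words_after_passes(int passes, bool clear_first) {
    assert(passes >= 0);
    Vocab vocab;
    vocab["a"] = std::unique_ptr<Word>(new Word());
    vocab["b"] = std::unique_ptr<Word>(new Word());
    Sentence s;
    s.tokens_ = {"a", "b", "zzz"};
    for (int i = 0; i < passes; ++i) build_words(s, vocab, clear_first);
    return s.words_.size();
}
```

In this sketch, words_after_passes(3, false) grows with the number of passes, while words_after_passes(3, true) stays at the number of in-vocabulary tokens, which is the behaviour the proposed clear() is meant to restore.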
Thanks a lot.
Hi, I want to see the results of training on text8, but I ran into a lot of problems. I entered the command g++ main.cc, and it produced many errors. What should I do? Thank you in advance!
Hi!
The code is great!