Fix NN ensemble training and learning on one-document corpus #506
Training or learning NN ensemble projects on a corpus consisting of only one document has been failing with an `UnboundLocalError` from Keras.

On the Annif side, the cause was an error in calculating the number of batches in `LMDBSequence`, which was based on a wrong `self._counter` value: when loading the LMDB, the counter was set to the index of the last element, which made it too small by one (for a corpus of one document, `self._counter = 0`, and thus `len(seq) = 0`). As a result, Keras found no batches to iterate over and raised the (spurious) error.

This PR fixes the off-by-one error, enabling training and, more importantly, learning `nn_ensemble` projects on a single document (learn is typically called on one document at a time).
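The off-by-one effect described above can be illustrated with a minimal sketch. `ToySequence` and its `buggy` flag are hypothetical stand-ins, not Annif's actual `LMDBSequence`; the sketch only assumes that samples are indexed from 0 and that the number of batches is the sample count divided by the batch size, rounded up.

```python
import math


class ToySequence:
    """Hypothetical stand-in for a batched sequence (not Annif's LMDBSequence)."""

    def __init__(self, n_samples, batch_size, buggy=False):
        # Samples are indexed from 0, so the last index is n_samples - 1.
        last_index = n_samples - 1
        # Buggy variant: use the last index as the count (too small by one).
        # Fixed variant: the count is the last index plus one.
        self._counter = last_index if buggy else last_index + 1
        self._batch_size = batch_size

    def __len__(self):
        # Number of batches: sample count divided by batch size, rounded up.
        return math.ceil(self._counter / self._batch_size)


# With one document, the buggy count yields zero batches.
one_doc_buggy = len(ToySequence(1, batch_size=32, buggy=True))
one_doc_fixed = len(ToySequence(1, batch_size=32))
```

With the buggy counter a one-document corpus reports zero batches, so a training loop over it has nothing to iterate; with the corrected counter it reports one.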
It also raises `NotSupportedException` for attempts to train or learn with an empty corpus.

Closes #504 and #505.
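The empty-corpus check mentioned above might look roughly like the following sketch. The `NotSupportedException` class here is a local stand-in defined only so the example is self-contained, and `require_nonempty` is a hypothetical helper name; neither reflects the actual code changed in this PR.

```python
class NotSupportedException(Exception):
    """Local stand-in for the exception named in this PR (assumption)."""


def require_nonempty(documents):
    """Hypothetical guard: return the documents as a list, or raise if empty."""
    docs = list(documents)
    if not docs:
        raise NotSupportedException(
            "training or learning on an empty corpus is not supported"
        )
    return docs


# A one-document corpus passes the guard unchanged.
checked = require_nonempty(["a single document"])
```

Failing early with an explicit exception gives the user a clearer message than an unrelated error surfacing later from the training loop.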