Fix NN ensemble training and learning on one-document corpus #506

Merged
merged 3 commits into from
Jul 12, 2021

juhoinkinen
Member

Training or learning NN ensemble projects on a corpus consisting of only one document has been failing with an UnboundLocalError raised by Keras.

On the Annif side, the cause was an error in calculating the number of batches in LMDBSequence, which relied on a wrong self._counter value: when loading the LMDB, the counter was set to the index of the last element, which made it too small by one (for a corpus of one document, self._counter = 0 and thus len(seq) = 0). As a result, Keras found no batches to iterate over and raised the (spurious) error.
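To illustrate the off-by-one, here is a minimal sketch. It is not Annif's code: `ToySequence`, its `buggy` flag, and the batch size of 32 are hypothetical, a plain list stands in for the LMDB, and rounding the batch count up is an assumption about how the length is derived from the counter.

```python
import math


class ToySequence:
    """Hypothetical stand-in for a batch sequence; not Annif's LMDBSequence."""

    def __init__(self, items, batch_size, buggy=False):
        self._items = list(items)
        self._batch_size = batch_size
        if buggy:
            # Bug: the index of the last element is used as the item count.
            self._counter = max(len(self._items) - 1, 0)
        else:
            # Fix: the item count is one more than the last index.
            self._counter = len(self._items)

    def __len__(self):
        # Number of batches needed to cover all counted items (assumed rounding up).
        return math.ceil(self._counter / self._batch_size)


one_doc = ["only document"]
buggy_len = len(ToySequence(one_doc, batch_size=32, buggy=True))
fixed_len = len(ToySequence(one_doc, batch_size=32))
print(buggy_len, fixed_len)  # 0 1
```

With the buggy counter, a one-document corpus yields zero batches, so a consumer iterating over `range(len(seq))` never runs its loop body.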

This PR fixes the off-by-one error, which enables training and, more importantly, learning nn_ensemble projects on a single document (learn is typically called on one document at a time).

It also raises NotSupportedException on attempts to train or learn with an empty corpus.
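The empty-corpus check could look roughly like the sketch below. The helper `require_documents`, its `operation` parameter, and the message text are hypothetical, and the exception class is redefined locally only so that the example is self-contained; the actual check in the PR may be placed and worded differently.

```python
class NotSupportedException(Exception):
    """Local stand-in for the exception named in the PR description."""


def require_documents(documents, operation):
    """Return the documents as a list, or raise if there are none.

    Hypothetical helper; `operation` is only used in the error message.
    """
    docs = list(documents)
    if not docs:
        raise NotSupportedException(
            f"Cannot {operation} nn_ensemble project with no documents"
        )
    return docs


try:
    require_documents([], "train")
except NotSupportedException as err:
    message = str(err)
    print(message)
```

Failing early this way gives the caller an explicit error instead of letting an empty sequence reach the Keras fitting step.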

Closes #504 and #505.

@juhoinkinen juhoinkinen added this to the 0.54 milestone Jul 12, 2021
@codecov

codecov bot commented Jul 12, 2021

Codecov Report

Merging #506 (2dcb990) into master (a1348aa) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #506   +/-   ##
=======================================
  Coverage   99.49%   99.49%           
=======================================
  Files          78       78           
  Lines        5687     5695    +8     
=======================================
+ Hits         5658     5666    +8     
  Misses         29       29           
Impacted Files Coverage Δ
annif/backend/nn_ensemble.py 100.00% <100.00%> (ø)
tests/test_backend_nn_ensemble.py 100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@juhoinkinen juhoinkinen linked an issue Jul 12, 2021 that may be closed by this pull request
@sonarcloud

sonarcloud bot commented Jul 12, 2021

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: no coverage information
Duplication: 0.0%

@juhoinkinen juhoinkinen changed the title from "Fix fitting NN ensemble model on one document corpus" to "Fix NN ensemble training and learning on one-document corpus" Jul 12, 2021
@juhoinkinen juhoinkinen merged commit f6c6f11 into master Jul 12, 2021
@juhoinkinen juhoinkinen deleted the fix-nn-ensemble-with-one-document-corpus branch July 13, 2021 13:35
@juhoinkinen juhoinkinen modified the milestones: 0.54, 0.53 Aug 23, 2021