
Improved handling of source weights in NN ensemble #458

Merged: 2 commits merged into master from nn-ensemble-tweaks2 on Jan 8, 2021

Conversation

@osma (Member) commented Dec 9, 2020

This PR contains fixes that improve how the NN ensemble handles sources that have been assigned non-default weights.

  1. The weights assigned to source backends are now always normalized so that they add up to 1 (L1 normalization).
  2. The NN ensemble multiplies incoming score vectors by the assigned weights. Since the normalized weights are now generally smaller than they used to be (e.g. 0.33:0.33:0.33 instead of 1:1:1), the weighted scores are multiplied, outside the neural network, by the number of sources to compensate for this change.
  3. While researching this, I discovered that the NN ensemble performs better when the incoming scores are slightly larger numerically. Taking the square root of the scores (which increases them, since sqrt(x) > x when 0 < x < 1) before feeding them to the neural network seems to improve F1@5 scores by an additional 1-2 percentage points! The three steps are sketched in code after this list.
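A minimal sketch of the three steps, assuming a (n_subjects, n_sources) score matrix; names, shapes and the exact placement of the sqrt step are illustrative, not the actual code in annif/backend/nn_ensemble.py:

```python
import numpy as np

def normalize_weights(weights):
    """L1-normalize source weights so that they sum to 1 (step 1)."""
    weights = np.asarray(weights, dtype=float)
    return weights / weights.sum()

def prepare_nn_input(score_vectors, weights):
    """Weight and rescale per-source score vectors before the neural network.

    score_vectors: array of shape (n_subjects, n_sources), scores in [0, 1]
    weights: raw per-source weights, e.g. [1, 1, 1]
    """
    norm_weights = normalize_weights(weights)    # e.g. [1, 1, 1] -> [0.33, 0.33, 0.33]
    weighted = score_vectors * norm_weights      # step 2: apply the normalized weights
    weighted = weighted * len(norm_weights)      # step 2: compensate for the smaller weights
    return np.sqrt(weighted)                     # step 3: sqrt(x) > x for 0 < x < 1

# Example: three equally weighted sources
scores = np.array([[0.8, 0.6, 0.9],
                   [0.1, 0.2, 0.0]])
print(prepare_nn_input(scores, [1, 1, 1]))
```

With equal weights the normalization and the compensation cancel out, so only the sqrt step changes the input to the network.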

Fixes #457

@osma added this to the 0.51 milestone Dec 9, 2020
@osma self-assigned this Dec 9, 2020
sonarcloud bot commented Dec 9, 2020

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

No coverage information
0.0% duplication

codecov bot commented Dec 9, 2020

Codecov Report

Merging #458 (90f3f76) into master (98bde23) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master     #458   +/-   ##
=======================================
  Coverage   99.41%   99.41%           
=======================================
  Files          65       65           
  Lines        4627     4630    +3     
=======================================
+ Hits         4600     4603    +3     
  Misses         27       27           
Impacted Files Coverage Δ
annif/backend/nn_ensemble.py 100.00% <100.00%> (ø)
annif/util.py 97.72% <100.00%> (+0.10%) ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@osma (Member, Author) commented Dec 9, 2020

There's also the question of what happens when an old model, trained with the current/old NN ensemble code, is used with the new code. I tested that with the example model from #457 that originally achieved an average F1@5 score of 0.4418; when evaluated with the new code, the score was 0.4330, which is a bit worse, but not catastrophically so - and in fact for some of the individual data sets the scores had improved.

I think it's enough to state in the release notes that retraining NN ensemble models is recommended, but not absolutely necessary.

@osma requested a review from juhoinkinen December 9, 2020 13:56
@osma (Member, Author) commented Dec 10, 2020

I verified that this PR doesn't change the results of a simple ensemble. (The annif.util.parse_sources function modified in this PR is used by all ensembles; see the sketch below.)

Still need to double-check that PAV isn't affected.
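For illustration, here is a hedged sketch of what an L1-normalizing parse_sources-style helper can look like; the real annif.util.parse_sources may differ in signature and return type:

```python
def parse_sources(sourcedef):
    """Parse a "proj1:1,proj2:2"-style source definition into
    (project_id, weight) pairs with weights normalized to sum to 1.

    Illustrative sketch only, not the actual annif.util.parse_sources.
    """
    sources = []
    for source in sourcedef.split(','):
        parts = source.strip().split(':')
        project_id = parts[0]
        weight = float(parts[1]) if len(parts) > 1 else 1.0
        sources.append((project_id, weight))
    total = sum(weight for _, weight in sources)
    return [(project_id, weight / total) for project_id, weight in sources]

# Equal weights normalize to equal shares, so the relative weighting used by a
# simple ensemble is unchanged:
print(parse_sources("tfidf,fasttext,mllm"))        # each weight becomes 1/3
print(parse_sources("tfidf:1,fasttext:2,mllm:1"))  # weights 0.25, 0.5, 0.25
```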

@osma (Member, Author) commented Jan 8, 2021

I verified this once more with the Annif tutorial data sets. The results improved significantly for the STW data set, but declined slightly for the YSO one (though this depends on the metric).

Double-checked that PAV is unaffected. Merging this now.

@osma merged commit 9fd8779 into master Jan 8, 2021
@osma deleted the nn-ensemble-tweaks2 branch January 8, 2021 09:00
Development

Successfully merging this pull request may close these issues.

NN ensemble gives poor results with weighted sources