Store vocabs in AnnifRegistry so they are shared between projects #610

osma · 2022-08-17T12:55:45Z

Fixes #603

This PR makes the handling of vocabularies more efficient by making it possible to use a single AnnifVocabulary instance shared by multiple projects. The vocabularies are now stored in AnnifRegistry, similar to how projects are stored.
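To illustrate the idea, here is a minimal sketch of a registry that caches vocabulary objects so that every project asking for the same vocabulary ID gets the same instance. The class and method names (`SketchVocabulary`, `SketchRegistry`, `SketchProject`, `get_vocab`) are hypothetical stand-ins chosen for this example and are not the actual Annif code from this PR.

```python
class SketchVocabulary:
    """Hypothetical stand-in for a vocabulary that is expensive to load."""

    def __init__(self, vocab_id):
        self.vocab_id = vocab_id


class SketchRegistry:
    """Hypothetical registry that holds one vocabulary instance per ID."""

    def __init__(self):
        self._vocabs = {}
        self.load_count = 0  # number of vocabulary objects actually created

    def get_vocab(self, vocab_id):
        # Create the vocabulary on first request, then reuse the cached one.
        if vocab_id not in self._vocabs:
            self._vocabs[vocab_id] = SketchVocabulary(vocab_id)
            self.load_count += 1
        return self._vocabs[vocab_id]


class SketchProject:
    """Hypothetical project that looks up its vocabulary via the registry."""

    def __init__(self, project_id, vocab_id, registry):
        self.project_id = project_id
        self._vocab_id = vocab_id
        self._registry = registry

    @property
    def vocab(self):
        # The project does not own the vocabulary; it asks the registry.
        return self._registry.get_vocab(self._vocab_id)
```

With this arrangement, four projects that all name `yso` as their vocabulary trigger a single load, which is the effect the benchmark below measures.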

I did a little benchmarking. I set up TFIDF, MLLM and Parabel projects with YSO as the vocabulary, as well as an ensemble using these three as sources. I trained them using the archaeology test corpora from the test suite. Then I measured the user time and maximum RSS of two commands targeting the ensemble project: a simple annif suggest and a parallelized annif eval -j4 command against the fulltext documents in the test suite, before and after this PR:

	user time before (s)	user time after (s)	max RSS before (KB)	max RSS after (KB)
suggest	6.81	5.51	438072	305828
eval -j 4	24.35	23.24	418492	304464

This PR avoids loading the YSO vocabulary four times (once per project) and instead loads it only once. The result is a 1.1-1.3 second reduction in CPU time and a 110-130 MB reduction in RAM usage.
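The savings quoted above can be re-derived from the table. The short sketch below does the arithmetic, assuming the RSS figures are in kilobytes (as reported by GNU `time`); the variable and function names are only illustrative.

```python
# Figures copied from the benchmark table as (before, after) pairs.
user_time_s = {"suggest": (6.81, 5.51), "eval -j 4": (24.35, 23.24)}
# Assumption: maximum RSS values are in kilobytes.
max_rss_kb = {"suggest": (438072, 305828), "eval -j 4": (418492, 304464)}


def savings(pairs):
    """Return before minus after for each benchmarked command."""
    return {name: before - after for name, (before, after) in pairs.items()}


# Seconds of user time saved, rounded to two decimals.
time_saved_s = {name: round(v, 2) for name, v in savings(user_time_s).items()}
# Memory saved, converted from KB to MB and rounded to one decimal.
rss_saved_mb = {name: round(v / 1024, 1) for name, v in savings(max_rss_kb).items()}

print(time_saved_s)
print(rss_saved_mb)
```

Both commands land inside the ranges stated in the comment: roughly 1.1 to 1.3 seconds and 110 to 130 MB.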

sonarqubecloud · 2022-08-17T12:56:21Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.0% Duplication

codecov · 2022-08-17T12:59:23Z

Codecov Report

Merging #610 (e48a2fc) into master (3fd2202) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #610   +/-   ##
=======================================
  Coverage   99.58%   99.58%           
=======================================
  Files          87       87           
  Lines        5834     5840    +6     
=======================================
+ Hits         5810     5816    +6     
  Misses         24       24

Impacted Files	Coverage Δ
annif/vocab.py	`95.89% <ø> (-0.50%)`	⬇️
annif/project.py	`99.38% <100.00%> (-0.01%)`	⬇️
annif/registry.py	`100.00% <100.00%> (ø)`
tests/test_vocab.py	`100.00% <100.00%> (ø)`

lgtm-com · 2022-08-17T13:38:31Z

This pull request fixes 1 alert when merging e48a2fc into 3fd2202 - view on LGTM.com

fixed alerts:

1 for Module is imported with 'import' and 'import from'

juhoinkinen

LGTM

Store vocabs in AnnifRegistry so they are shared between projects. Fixes #603

e48a2fc

osma added this to the 0.59 milestone Aug 17, 2022

osma self-assigned this Aug 17, 2022

osma marked this pull request as ready for review August 17, 2022 13:27

osma requested a review from juhoinkinen August 17, 2022 13:27

juhoinkinen approved these changes Aug 17, 2022

osma merged commit c291930 into master Aug 18, 2022

osma deleted the issue603-shared-vocabs branch August 18, 2022 06:48

osma mentioned this pull request Sep 22, 2023

optimization: load a vocabulary only once even if used in different languages #736

Merged
