Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store vocabs in AnnifRegistry so they are shared between projects #610

Merged
merged 1 commit into from
Aug 18, 2022

Conversation

osma
Copy link
Member

@osma osma commented Aug 17, 2022

Fixes #603

This PR makes the handling of vocabularies more efficient by making it possible to use a single AnnifVocabulary instance shared by multiple projects. The vocabularies are now stored in AnnifRegistry, similar to how projects are stored.

I did a little benchmarking. I set up TFIDF, MLLM and Parabel projects with YSO as the vocabulary, as well as an ensemble using these three as sources. I trained them using the archaeology test corpora from the test suite. Then I measured the user time and maximum RSS of two commands targeting the ensemble project: a simple annif suggest and a parallelized annif eval -j4 command against the fulltext documents in the test suite, before and after this PR:

time before time after RSS before RSS after
suggest 6.81 5.51 438072 305828
eval -j 4 24.35 23.24 418492 304464

This PR avoids loading the YSO vocabulary four times (once per project) and instead loads it only once. The result is 1.1-1.3 second reduction in CPU time and 110-130MB reduction in RAM usage.

@osma osma added this to the 0.59 milestone Aug 17, 2022
@osma osma self-assigned this Aug 17, 2022
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@codecov
Copy link

codecov bot commented Aug 17, 2022

Codecov Report

Merging #610 (e48a2fc) into master (3fd2202) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #610   +/-   ##
=======================================
  Coverage   99.58%   99.58%           
=======================================
  Files          87       87           
  Lines        5834     5840    +6     
=======================================
+ Hits         5810     5816    +6     
  Misses         24       24           
Impacted Files Coverage Δ
annif/vocab.py 95.89% <ø> (-0.50%) ⬇️
annif/project.py 99.38% <100.00%> (-0.01%) ⬇️
annif/registry.py 100.00% <100.00%> (ø)
tests/test_vocab.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@osma osma marked this pull request as ready for review August 17, 2022 13:27
@osma osma requested a review from juhoinkinen August 17, 2022 13:27
@lgtm-com
Copy link

lgtm-com bot commented Aug 17, 2022

This pull request fixes 1 alert when merging e48a2fc into 3fd2202 - view on LGTM.com

fixed alerts:

  • 1 for Module is imported with 'import' and 'import from'

Copy link
Member

@juhoinkinen juhoinkinen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@osma osma merged commit c291930 into master Aug 18, 2022
@osma osma deleted the issue603-shared-vocabs branch August 18, 2022 06:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Share vocabulary objects between projects
2 participants