Optimize the optimize command #477

osma · 2021-03-22T14:09:56Z

I noticed that the annif optimize command is extremely slow for ensemble projects. The cause is that ensemble backends return VectorSuggestionResult objects, while regular backends (tfidf, omikuji, stwfsa...) usually return ListSuggestionResult objects. The optimize command filters the results many times (using different limit and threshold values), and filtering is a very slow operation with VectorSuggestionResult.

The fix is to ensure that the results given by the project are first converted to ListSuggestionResult. Conveniently, VectorSuggestionResult.filter already returns a ListSuggestionResult. In any case, it makes sense to pre-filter the results down to at most 15 suggestions, since only the top 15 will be used anyway and keeping the others just creates more work in every later filtering step. However, since it's not guaranteed that VectorSuggestionResult.filter will always keep returning a ListSuggestionResult, I added an extra assert statement to verify this, so that the command fails fast instead of silently running extremely slowly.
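As a rough illustration of the pre-filtering idea, here is a minimal self-contained sketch. It does not use the actual Annif classes: `ToyVectorResult`, `ToyListResult`, `Suggestion`, `prefilter` and `MAX_LIMIT` are hypothetical stand-ins invented for this example, and the real `filter` signatures in Annif may differ.

```python
from dataclasses import dataclass

MAX_LIMIT = 15  # assumption: the largest limit value the optimizer will try


@dataclass(frozen=True)
class Suggestion:
    subject_id: int
    score: float


class ToyListResult:
    """Hypothetical stand-in for a list-based result (only a few suggestions)."""

    def __init__(self, suggestions):
        # Keep suggestions sorted by descending score.
        self.suggestions = sorted(suggestions, key=lambda s: s.score, reverse=True)

    def filter(self, limit=None, threshold=0.0):
        kept = [s for s in self.suggestions if s.score >= threshold]
        return ToyListResult(kept[:limit])


class ToyVectorResult:
    """Hypothetical stand-in for a vector-based result (one score per subject)."""

    def __init__(self, scores):
        self.scores = list(scores)

    def filter(self, limit=None, threshold=0.0):
        # Every call has to rank the whole vocabulary, which is what makes
        # repeated filtering of a vector-based result expensive.
        ranked = sorted(range(len(self.scores)),
                        key=self.scores.__getitem__, reverse=True)
        kept = [Suggestion(i, self.scores[i]) for i in ranked
                if self.scores[i] > 0.0 and self.scores[i] >= threshold]
        return ToyListResult(kept[:limit])


def prefilter(result):
    """Cut a result down to the top MAX_LIMIT suggestions, as a list result."""
    short = result.filter(limit=MAX_LIMIT)
    # Fail fast if filter() ever stops returning the cheap list-based type.
    assert isinstance(short, ToyListResult), "expected a list-based result"
    return short


if __name__ == "__main__":
    vector_result = ToyVectorResult([i / 1000 for i in range(1000)])
    short = prefilter(vector_result)  # done once per document
    # The many limit/threshold combinations now touch at most 15 items each.
    for limit in range(1, MAX_LIMIT + 1):
        for threshold in (0.0, 0.25, 0.5, 0.75):
            short.filter(limit=limit, threshold=threshold)
    print(len(short.suggestions))
```

The point of the sketch is only the shape of the change: pay for the full-vocabulary ranking once, then run the repeated limit and threshold filtering against a short list.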

I tested this using STW thesaurus based projects from the Annif tutorial. I defined a tfidf project and an omikuji project and trained them with the stw-econbiz-small corpus. Then I defined an ensemble combining both. Here are some benchmark results for the optimize command (targeting the test corpus):

| Backend | User time before | User time after | RAM before | RAM after |
|---|---|---|---|---|
| tfidf | 445s | 377s | 559544 | 557428 |
| omikuji | 452s | 398s | 630988 | 626408 |
| ensemble | 3129s | 426s | 733684 | 691048 |

This reduces user time by 12-15% for the regular projects and by a whopping 86% for the ensemble project. RAM use is practically unchanged except for a 6% reduction for the ensemble case.
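The percentages can be re-derived from the benchmark table with a few lines of Python. This is just a sanity-check sketch; the `benchmarks` dictionary and the `reduction_pct` helper are names made up for this example, and the numbers are copied from the table above.

```python
# (user time before, user time after, RAM before, RAM after), from the table
benchmarks = {
    "tfidf": (445, 377, 559544, 557428),
    "omikuji": (452, 398, 630988, 626408),
    "ensemble": (3129, 426, 733684, 691048),
}


def reduction_pct(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for backend, (t_before, t_after, ram_before, ram_after) in benchmarks.items():
        print(f"{backend}: user time -{reduction_pct(t_before, t_after):.1f}%, "
              f"RAM -{reduction_pct(ram_before, ram_after):.1f}%")
```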

sonarqubecloud · 2021-03-22T14:11:59Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.0% Duplication

juhoinkinen

LGTM

Optimize the optimize command by pre-filtering and using ListSuggesti…

7fbcdaa

…onResult.

osma added the bug label Mar 22, 2021

osma added this to the 0.52 milestone Mar 22, 2021

osma requested a review from juhoinkinen March 22, 2021 14:09

juhoinkinen approved these changes Mar 22, 2021

osma merged commit 7f42c96 into master Mar 23, 2021

osma deleted the fix-optimize-optimize-command branch March 23, 2021 07:05
