Annif 0.53
This release adds two new backends, YAKE and SVC. The YAKE backend is a wrapper around the YAKE library, which performs lexical unsupervised keyword extraction and therefore needs no training data. See the YAKE wiki page for more information. Future Annif releases could extend YAKE support so that it can also suggest new terms for a vocabulary (the extracted keywords that are not found in the vocabulary).
The SVC backend implements Linear Support Vector Classification. It is well suited for multiclass (but not multilabel) classification, for example classifying documents with the Dewey Decimal Classification or the 20 Newsgroups classification. It requires relatively little training data, and is suitable for classifications of up to around 10,000 classes. See the SVC wiki page for more information.
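To illustrate the kind of model the SVC backend is built on, here is a minimal sketch of multiclass text classification with scikit-learn's LinearSVC on top of TF-IDF features. It is not Annif's actual implementation: the toy documents, the class labels and the `suggest` helper are assumptions made up for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data (made up for this sketch). Each document has exactly
# one class label, i.e. this is multiclass, not multilabel, classification.
train_texts = [
    "the striker scored a goal in the football match",
    "the goalkeeper saved a penalty in the match",
    "the senate passed the budget bill after a vote",
    "the parliament will vote on the new election law",
    "the telescope observed a distant galaxy and stars",
    "astronomers measured the orbit of the planet",
]
train_labels = [
    "sports", "sports",
    "politics", "politics",
    "astronomy", "astronomy",
]

# TF-IDF features feeding a linear support vector classifier.
classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
classifier.fit(train_texts, train_labels)


def suggest(text, limit=2):
    """Hypothetical helper: return the `limit` best (class, score) pairs."""
    scores = classifier.decision_function([text])[0]
    ranked = sorted(
        zip(classifier.classes_, scores),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:limit]


predicted = classifier.predict(
    ["the goalkeeper scored a goal in the football match"]
)[0]
print(predicted)
print(suggest("the senate will vote on the budget"))
```

The decision function scores are uncalibrated margins, so they are only useful for ranking the classes of a single document, not as probabilities.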
This release also upgrades many dependencies, which enables all Annif backends to run on Python 3.9 (previously the nn_ensemble backend was available only on Python 3.6-3.8). The Docker image now uses Python 3.8 instead of 3.7.
Note that nn_ensemble models are not compatible across Python versions: e.g. a model trained on Python 3.7 can be used only on Python 3.7. Training the nn_ensemble models shows a CustomMaskWarning, but it is harmless (caused by a TensorFlow bug) and can be ignored.
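Because a model trained on one Python version cannot be used on another, it can help to record the Python version next to a trained model and check it before loading. The following is only a sketch of that idea: Annif itself does not provide these helpers, and the marker file name `python-version.json` and both function names are hypothetical.

```python
import json
import sys
from pathlib import Path

# Hypothetical marker file name; not something Annif creates or reads.
MARKER_NAME = "python-version.json"


def python_tag():
    """Return the running interpreter's major.minor version, e.g. '3.8'."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


def write_version_marker(model_dir):
    """Record the Python version used for training next to the model files."""
    marker = Path(model_dir) / MARKER_NAME
    marker.write_text(json.dumps({"python": python_tag()}))
    return marker


def check_version_marker(model_dir, current=None):
    """Raise RuntimeError if the model was trained on another Python version."""
    current = current or python_tag()
    marker = Path(model_dir) / MARKER_NAME
    trained_on = json.loads(marker.read_text())["python"]
    if trained_on != current:
        raise RuntimeError(
            f"model trained on Python {trained_on}, "
            f"but running on Python {current}; retrain the model"
        )
    return trained_on
```

Calling `write_version_marker` right after training and `check_version_marker` before loading turns a confusing load failure into an explicit message.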
Due to the update of scikit-learn, using TFIDF, MLLM or Omikuji models trained on older Annif versions will show warnings about the TfidfVectorizer. To the best of our knowledge, these are harmless and can be ignored. Retraining the models gets rid of the warnings.
This release also includes many minor improvements and bug fixes.
New features:
#486 New SVC (support vector classification) backend using scikit-learn
#439/#461 YAKE backend
#490/#494 Make --version option show Annif version
Improvements:
#488 Add support for ngram setting in omikuji backend
Maintenance:
#499 Update dependencies v0.53
#487 Upgrade scikit-learn to 0.24.2
#498 Update Dockerfile
Bug fixes:
#484/#495 Show error when training MLLM on empty corpus
#489 Add Codecov Action to GH workflow for uploading reports
#491 Raise NotSupportedException for attempt to train YAKE
#497 Remove execute permissions of some files