Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn instead of error in case of multiple subjects per doc in SVC training #509

Merged
merged 3 commits into from
Aug 10, 2021

Conversation

juhoinkinen
Copy link
Member

PR #501 fixed training SVC on full-text corpus, and also added raising NotSupportedException if any training document had multiple uris (as SVC is a multi-class, not multi-label algorithm). However in real-life even in those data-sets meant to have only one subject per doc there can be some docs with many subjects. To allow training SVC on such data sets, this PR changes the error to warning, and lets a random one of the uris to be used as the target uri.

@juhoinkinen juhoinkinen added enhancement classification Relevant for classification use cases labels Aug 10, 2021
@juhoinkinen juhoinkinen added this to the Short term milestone Aug 10, 2021
@codecov
Copy link

codecov bot commented Aug 10, 2021

Codecov Report

Merging #509 (13a3c9e) into master (e349d99) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #509      +/-   ##
==========================================
+ Coverage   99.51%   99.53%   +0.01%     
==========================================
  Files          82       82              
  Lines        5784     5771      -13     
==========================================
- Hits         5756     5744      -12     
+ Misses         28       27       -1     
Impacted Files Coverage Δ
tests/conftest.py 100.00% <ø> (ø)
annif/backend/svc.py 100.00% <100.00%> (ø)
tests/test_backend_svc.py 100.00% <100.00%> (ø)
annif/backend/stwfsa.py 100.00% <0.00%> (+1.56%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e349d99...13a3c9e. Read the comment docs.

annif/backend/svc.py Outdated Show resolved Hide resolved
@osma
Copy link
Member

osma commented Aug 10, 2021

LGTM. Suggested an extremely minor change for you to consider before merging.

@juhoinkinen juhoinkinen force-pushed the warn-on-multiple-subjects-for-svc branch from 3a5adb6 to 13a3c9e Compare August 10, 2021 10:47
@sonarcloud
Copy link

sonarcloud bot commented Aug 10, 2021

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@juhoinkinen juhoinkinen merged commit 02111ca into master Aug 10, 2021
@juhoinkinen juhoinkinen deleted the warn-on-multiple-subjects-for-svc branch August 10, 2021 10:54
juhoinkinen added a commit that referenced this pull request Aug 10, 2021
…aining (#509)

* Warn instead of error in case of multiple subjects per doc for SVC

* Remove now unnecessary single-subject-document-corpus fixture

* Avoid creating list of uris
@osma osma modified the milestones: Short term, 0.54 Aug 17, 2021
@juhoinkinen juhoinkinen modified the milestones: 0.54, 0.53 Aug 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
classification Relevant for classification use cases enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants