Multiprocessing in eval command #418
Codecov Report
@@ Coverage Diff @@
## master #418 +/- ##
==========================================
- Coverage 99.39% 99.29% -0.11%
==========================================
Files 60 61 +1
Lines 4309 4371 +62
==========================================
+ Hits 4283 4340 +57
- Misses 26 31 +5
…es a SubjectIndex
… takes a SubjectIndex
20b5e64
to
b70191b
Rebased on current |
ListSuggestionResult, so less data to serialize/deserialize during multiprocessing
After some testing, I'm fairly confident that this feature works with at least most backend types (tested tfidf, omikuji, fasttext, maui and a simple ensemble), but the speedup is not very big. The parallelization overhead is quite significant, and model initialization and postprocessing of results, which cannot be parallelized, take up significant chunks of time too. In practice, with two parallel jobs the evaluation takes around the same time as with one job; to get any improvement in overall evaluation time, you need to use jobs=4 or more. I changed the default to jobs=1 so that parallel evaluation is only performed when the user requests it. I will open another issue on implementing a parallel optimize command. Some more refactoring is still needed, then I think this can be merged.
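To illustrate the trade-off, here is a minimal sketch of an evaluation loop with a `jobs` parameter. It is not the code in this PR: `suggest_batch`, `chunked`, `evaluate` and `batch_size` are hypothetical names, and a word count stands in for a real backend. With `jobs=1` no worker pool is created at all, so the default path pays no serialization or process start-up cost.

```python
import multiprocessing
from itertools import islice


def suggest_batch(texts):
    """Hypothetical stand-in for a backend: 'scores' each text by word count."""
    return [len(text.split()) for text in texts]


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def evaluate(texts, jobs=1, batch_size=2):
    """Process texts in batches, sequentially or in `jobs` worker processes."""
    batches = list(chunked(texts, batch_size))
    if jobs == 1:
        # Default: plain sequential processing, no pool overhead.
        results = [suggest_batch(batch) for batch in batches]
    else:
        # Each batch is pickled, sent to a worker, and its result pickled back.
        with multiprocessing.Pool(jobs) as pool:
            results = pool.map(suggest_batch, batches)
    # Flatten per-batch results; Pool.map preserves input order.
    return [score for batch in results for score in batch]
```

When calling `evaluate(docs, jobs=4)` from a script, it is safest to do so under an `if __name__ == "__main__":` guard, since some platforms start workers by re-importing the main module.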
Kudos, SonarCloud Quality Gate passed! 0 Bugs. No Coverage information.
This pull request fixes 1 alert when merging 495f4cc into 321bf0a - view on LGTM.com fixed alerts:
This PR makes it possible to use multiprocessing to speed up the `eval` command.

The majority of the changes are actually refactorings to decouple the subject index from the SuggestionResult classes. After the changes, SuggestionResult instances no longer keep a reference to SubjectIndex. Instead a SubjectIndex is passed as a parameter to individual SuggestionResult methods as necessary. Since properties cannot take parameters, the `hits` property has been changed to the `as_list` method and the `vector` property has been changed to the `as_vector` method.

The actual multiprocessing implementation is still very rough and further changes are needed:

… `--jobs` CLI argument)

Fixes #65
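
The decoupling described above can be sketched roughly as follows. This is a simplified illustration, not the project's actual classes: the `Hit` type, the `by_uri` lookup and the plain-list vector are assumptions made for the example. The point is that the result object holds only its own hits, and the subject index is supplied at call time, so a result can be pickled for multiprocessing without dragging the whole index along.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical hit record: a subject URI and its score."""
    uri: str
    score: float


class SubjectIndex:
    """Minimal stand-in for a subject index: maps URIs to positions."""

    def __init__(self, uris):
        self._uris = list(uris)
        self._positions = {uri: i for i, uri in enumerate(self._uris)}

    def __len__(self):
        return len(self._uris)

    def by_uri(self, uri):
        """Return the position of `uri`, or None if it is unknown."""
        return self._positions.get(uri)


class ListSuggestionResult:
    """Holds only the hits; no reference to a SubjectIndex is stored."""

    def __init__(self, hits):
        self._hits = list(hits)

    def as_list(self, subject_index):
        """Hits whose subject is present in the given index."""
        return [h for h in self._hits
                if subject_index.by_uri(h.uri) is not None]

    def as_vector(self, subject_index):
        """Dense score vector with one slot per subject in the index."""
        vector = [0.0] * len(subject_index)
        for hit in self._hits:
            position = subject_index.by_uri(hit.uri)
            if position is not None:
                vector[position] = hit.score
        return vector
```

A caller would write `result.as_vector(index)` where it previously wrote `result.vector`; the index argument is the reason these can no longer be properties.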