Improve LISI multiprocessing specification #301
Overall this looks great, I like the simplification of the API. The chunk calculation is now no longer done... any reason why?
From what I understood from the code, the number of splits is basically the same as the number of CPU cores - 1, so I simplified it as such.
True, just got this as well. And you don't want to use the whole node if not specified otherwise?
Looks good!
Only one thing: check if you still need to import multiprocessing if you removed the call to find how many cpus there are on the machine.
I realised that we actually had a bug in the C code. When Unfortunately, the last step failed on GitHub Actions, likely because the g++ version I was using was not compatible with the one on GitHub Actions. For that, I also included the lisi code compilation into the @LuckyMD Would be great if I could have another review on this decision.
I find this difficult to review as I didn't write the .cpp code. Maybe @mbuttner could take a look at this to okay it?
Additional tests look good
I actually like the compilation in setup.py. There is no check for whether g++ is installed though, so it might create some strange errors.
I think that the LISI code looks much nicer now, thanks for adding the tests and cleaning up the multiprocessing part. Please add an error report when compilation does not work. Well done!
@@ -201,6 +191,9 @@ def lisi_graph_py(
    By default, perplexity is chosen as 1/3 * number of nearest neighbours in the knn-graph.
    """

    # use no more than the available cores
    n_cores = max(1, min(n_cores, mp.cpu_count()))
In the previous implementation of the multiprocessing, I took all but one CPU for the process to be a little less greedy. But when I checked my previous implementation, it apparently does not do what the comment said and takes all CPUs instead. So please ignore.
Oh right, I was adapting it to have an upper bound and initially used n_cpu-1, before realising that the code would then only run on a single core on GitHub Actions, since the machine only has 2 cores. So the lower bound is redundant, will remove.
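For readers following this thread, here is a minimal sketch of the clamping discussed above. The helper name `resolve_n_cores` and its `available` parameter are hypothetical and only there to make the behaviour testable; the PR itself applies the expression inline, as shown in the diff.

```python
import multiprocessing as mp


def resolve_n_cores(n_cores, available=None):
    """Clamp a requested core count to the range [1, available].

    `resolve_n_cores` and `available` are illustrative names, not part of scib.
    """
    if available is None:
        available = mp.cpu_count()
    # upper bound: never request more cores than the machine has
    # lower bound: always run on at least one core
    return max(1, min(n_cores, available))


# On a 2-core machine, an upper bound of `available - 1` would force
# single-core execution, which is the situation described for the CI runner.
two_core_all = resolve_n_cores(4, available=2)
two_core_minus_one = max(1, min(4, 2 - 1))
```

With `available=2`, the first call resolves to 2 cores, while the `n_cpu - 1` variant resolves to 1.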
Thanks @mbuttner for the review! I updated the compilation call so that any error message will be printed, but the installation will continue. The output is only visible with Once everything runs through I'll merge and update scib to a new version.
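A rough sketch of what "print the error, but continue the installation" could look like, including the g++ availability check raised earlier in the review. This is not the code from the PR: the function name `try_compile` and the compiler flags are assumptions made for illustration.

```python
import shutil
import subprocess
import sys


def try_compile(source, output, compiler="g++"):
    """Try to compile `source`; report problems on stderr without raising.

    Hypothetical helper: name, signature and flags are illustrative only.
    """
    # check that the compiler is on PATH before calling it
    if shutil.which(compiler) is None:
        print(f"{compiler} not found, skipping compilation of {source}", file=sys.stderr)
        return False

    result = subprocess.run(
        [compiler, "-std=c++11", "-O3", source, "-o", output],  # assumed flags
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # surface the compiler's own error message, but do not abort
        print(f"Compilation of {source} failed:\n{result.stderr}", file=sys.stderr)
        return False
    return True
```

Returning a boolean instead of raising lets the caller decide whether a failed build should stop the installation or only produce a warning.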
LISI supports multiprocessing, but how to specify the number of cores and whether to use multiprocessing at all is not well defined.
Here I address this by:
- removing the `multiprocessing` key from the lisi functions
- renaming `nodes` to `n_cores`
- using `n_cores=1` for running without parallelisation
- setting `n_cores=1` as the default for the metrics wrapper functions
Bug fixes included:
- `n_chunks` for `knn_graph.o` now actually reflects the number of chunks (#chunks) created, NOT the index of the largest chunk (chunks-1)
- the `knn_graph.cpp` file is compiled on install with `setup.py`
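To illustrate the `n_chunks` fix, here is a small sketch of the difference between the number of chunks and the index of the last chunk. The splitting function is a hypothetical stand-in and does not reproduce how `knn_graph.o` partitions the graph.

```python
def split_into_chunks(n_items, n_chunks):
    """Split range(n_items) into n_chunks contiguous, near-equal index ranges.

    Illustrative helper only, not taken from the scib code base.
    """
    base, extra = divmod(n_items, n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # the first `extra` chunks receive one additional item
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


chunks = split_into_chunks(10, 3)
n_chunks = len(chunks)        # number of chunks created
largest_index = n_chunks - 1  # index of the last chunk, off by one from the count
```

Passing `largest_index` where a count is expected would leave the last chunk unprocessed, which is the kind of off-by-one the bug fix describes.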