Improve LISI multiprocessing specification #301

mumichae · 2022-04-26T15:56:06Z

LISI supports multiprocessing, but how to specify the number of cores, and whether multiprocessing is used at all, is not well defined.
This PR addresses that with the following changes:

remove multiprocessing key from lisi functions
rename nodes to n_cores
use n_cores=1 for running without parallelisation
use n_cores=1 as default for metrics wrapper functions
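
As a rough illustration of the convention listed above, where a single n_cores argument replaces a separate multiprocessing switch, here is a minimal sketch. The names run_chunks and square are hypothetical and not part of scib; the sketch only shows n_cores=1 taking a plain serial path while larger values go through a multiprocessing.Pool.

```python
import multiprocessing as mp


def square(x):
    """Toy worker standing in for a per-chunk computation (hypothetical)."""
    return x * x


def run_chunks(func, chunks, n_cores=1):
    """Apply func to every chunk; n_cores=1 means no parallelisation."""
    if n_cores < 1:
        raise ValueError("n_cores must be a positive integer")
    if n_cores == 1:
        # serial path: no worker processes are started
        return [func(chunk) for chunk in chunks]
    # parallel path: a pool with n_cores worker processes
    with mp.Pool(processes=n_cores) as pool:
        return pool.map(func, chunks)
```

With this shape, a wrapper that defaults to n_cores=1 never starts worker processes unless the caller asks for them, so no separate on/off flag is needed.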

Bug fix included:

changed the expected input of n_chunks for knn_graph.o so that it reflects the number of chunks created (#chunks), NOT the index of the largest chunk (#chunks - 1)
compile knn_graph.cpp file on install with setup.py

LuckyMD

Overall this looks great, I like the simplification of the API. The chunk calculation is now no longer done... any reason why?

mumichae · 2022-04-26T16:33:04Z

From what I understood from the code, the number of splits is basically the same as the number of cpu cores - 1, so I simplified it as such.
To be exact, n_chunks == n_processes and n_splits == n_chunks - 1
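
The relation stated above can be checked with a small sketch. It is not the splitting logic of knn_graph.cpp; split_points and chunk_ranges are hypothetical helpers that only illustrate why cutting a range into n_chunks parts takes n_chunks - 1 split points.

```python
def split_points(n_items, n_chunks):
    """Interior boundaries that cut range(n_items) into n_chunks parts."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be a positive integer")
    # one boundary between each pair of neighbouring chunks
    return [i * n_items // n_chunks for i in range(1, n_chunks)]


def chunk_ranges(n_items, n_chunks):
    """Index ranges of the chunks, one per process."""
    bounds = [0] + split_points(n_items, n_chunks) + [n_items]
    return [range(bounds[i], bounds[i + 1]) for i in range(n_chunks)]
```

For 10 items and 4 processes this gives 3 split points and 4 chunks; with a single process there are no split points and one chunk covering everything.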

LuckyMD · 2022-04-26T21:34:43Z

> To be exact, n_chunks == n_processes and n_splits == n_chunks - 1

True, just got this as well. And you don't want to use the whole node if not specified otherwise?

LuckyMD

Looks good!

Only one thing: check if you still need to import multiprocessing if you removed the call to find how many cpus there are on the machine.

mumichae · 2022-04-28T13:45:44Z

I realised that we actually had a bug in the C code. When lisi_graph_py is called with 2 cores, the script will set the max chunk index to 0, although it should be 1. I fixed this and had to recompile the code.

Unfortunately, the last step failed on GitHub Actions, likely because the g++ version I used was not compatible with the one on GitHub Actions. To work around that, I also included the LISI code compilation in setup.py. It might be a bit hacky, but it is the only thing I managed to get working.

@LuckyMD Would be great if I could have another review on this decision.

LuckyMD

I find this difficult to review as I didn't write the .cpp code. Maybe @mbuttner could take a look at this to okay it?

Additional tests look good

scib/knn_graph/.gitignore

scib/knn_graph/knn_graph.cpp

LuckyMD · 2022-04-28T14:26:00Z

I actually like the compilation in setup.py. No testing whether g++ is installed though, so it might create some strange errors.

setup.py

mbuttner

I think that the LISI code looks much nicer now, thanks for adding the tests and cleaning up the multiprocessing part. Please add an error report when compilation does not work. Well done!

scib/knn_graph/knn_graph.cpp

mbuttner · 2022-04-29T06:46:22Z

scib/metrics/lisi.py

@@ -201,6 +191,9 @@ def lisi_graph_py(
    By default, perplexity is chosen as 1/3 * number of nearest neighbours in the knn-graph.
    """

+    # use no more than the available cores
+    n_cores = max(1, min(n_cores, mp.cpu_count()))


In the previous implementation of the multiprocessing, I took all but one CPU for the process to be a little less greedy. But when I checked that implementation, it apparently does not do what the comment said and takes all CPUs instead. So please ignore.

mumichae

Oh right, I was adapting it to have an upper bound and initially used n_cpu-1, before realising that the code would then only run on a single core on GitHub Actions, since the machine has only 2 cores. So the lower bound is redundant, will remove.
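
The bound discussed in this thread can be sketched as a small standalone function. It mirrors the line added in the diff above; the function name clamp_n_cores and the available parameter are assumptions introduced here only so the behaviour can be tried without depending on the local machine.

```python
import multiprocessing as mp


def clamp_n_cores(n_cores, available=None):
    """Bound the requested number of cores to the range [1, available]."""
    if available is None:
        # fall back to the number of CPUs reported for this machine
        available = mp.cpu_count()
    return max(1, min(n_cores, available))
```

On a 2-core machine a request for 4 cores is reduced to 2. Had the upper bound been available - 1, the same machine would always end up with a single core, which is the situation described for the CI runner.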

setup.py

mumichae · 2022-04-29T09:38:55Z

Thanks @mbuttner for the review!

I updated the compilation call so that any error message will be printed, but the installation will continue. The output is only visible with pip install scib -v, but that's a pip thing.

Once everything runs through I'll merge and update scib to a new version.
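
A compilation step that checks for g++, prints any error, and lets the installation continue could look roughly like the sketch below. This is not the code from setup.py in this PR: the function name compile_knn_graph, the compiler flags, and the messages are all assumptions made for illustration.

```python
import shutil
import subprocess
import sys


def compile_knn_graph(source, output, compiler="g++"):
    """Try to compile source; report problems and return False instead of raising."""
    if shutil.which(compiler) is None:
        print(f"{compiler} not found, skipping compilation of {source}", file=sys.stderr)
        return False
    # the flags below are an assumption, not taken from the PR
    cmd = [compiler, "-std=c++11", "-O3", source, "-o", output]
    try:
        subprocess.run(cmd, check=True, capture_output=True, text=True)
    except (subprocess.CalledProcessError, OSError) as err:
        # print the compiler output, but do not abort the surrounding install
        stderr = getattr(err, "stderr", None) or ""
        print(f"Compilation failed: {err}\n{stderr}", file=sys.stderr)
        return False
    return True
```

Returning a boolean rather than raising keeps a failed compile from aborting the install, at the cost that the failure is only noticed later if the message goes unread.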

mumichae added 8 commits April 21, 2022 12:13

move tempdir removal to end of function

c46a959

turn off multiprocessing for LISI scores

12e307d

move tempdir removal to end of function

03afb57

turn off multiprocessing for LISI scores

36b4e3c

Merge branch 'pipeline-ci-fix' of github.com:theislab/scib into pipel…

fc712dc

…ine-ci-fix

use None for multiprocessing

f1949e7

remove multiprocessing parameter

5d76d7b

simplified lisi estimate code

ec5382f

mumichae self-assigned this Apr 26, 2022

mumichae requested a review from LuckyMD April 26, 2022 15:56

add n_cores parameter description to metrics docstring

bff2a3a

LuckyMD approved these changes Apr 26, 2022

mumichae added 5 commits April 27, 2022 15:57

Change n_chunks from 0-based to 1-based & fix error when using 2 cores

a896605

remove redundant character

d80c06e

rename variables for readability

9dbf698

compile knn_graph during installation

9a14f36

implement parallel lisi calls

13b9d70

mumichae added the bug Something isn't working label Apr 28, 2022

include max core check

a9ca039

mumichae requested a review from LuckyMD April 28, 2022 13:56

LuckyMD reviewed Apr 28, 2022

LuckyMD requested a review from mbuttner April 28, 2022 14:25

add check for g++

b4f2794

LuckyMD reviewed Apr 28, 2022

mbuttner approved these changes Apr 29, 2022

catch error when compiling C code

721e51e

mumichae merged commit ee04da8 into main Apr 29, 2022

mumichae deleted the pipeline-ci-fix branch April 29, 2022 09:51

mumichae restored the pipeline-ci-fix branch May 2, 2022 16:07

mumichae deleted the pipeline-ci-fix branch May 13, 2022 17:49
