Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve LISI multiprocessing specification #301

Merged
merged 17 commits into from
Apr 29, 2022
Merged

Improve LISI multiprocessing specification #301

merged 17 commits into from
Apr 29, 2022

Conversation

mumichae
Copy link
Collaborator

@mumichae mumichae commented Apr 26, 2022

LISI supports multiprocessing, however specifying the number of cores and whether to use multiprocessing is not well defined.
Here I address

  1. remove multiprocessing key from lisi functions
  2. rename nodes to n_cores
  3. use n_cores=1 for running without parallelisation
  4. use n_cores=1 as default for metrics wrapper functions

Bug fix included:

  1. changed expected input of n_chunks for knn_graph.o to actually reflect the number of chunks (#chunks) created, NOT the index of the largest chunk (chunks-1)
  2. compile knn_graph.cpp file on install with setup.py

@mumichae mumichae self-assigned this Apr 26, 2022
@mumichae mumichae requested a review from LuckyMD April 26, 2022 15:56
Copy link
Collaborator

@LuckyMD LuckyMD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks great, I like the simplification of the API. The chunk calculation is now no longer done... any reason why?

@mumichae
Copy link
Collaborator Author

mumichae commented Apr 26, 2022

From what I understood from the code, the number of splits is basically the same as the number of cpu cores - 1, so I simplified it as such.
To be exact, n_chunks == n_processes and n_splits == n_chunks - 1

@LuckyMD
Copy link
Collaborator

LuckyMD commented Apr 26, 2022

To be exact, n_chunks == n_processes and n_splits == n_chunks - 1

True, just got this as well. And you don't want to use the whole node if not specified otherwise?

Copy link
Collaborator

@LuckyMD LuckyMD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Only one thing: check if you still need to import multiprocessing if you removed the call to find how many cpus there are on the machine.

@mumichae mumichae added the bug Something isn't working label Apr 28, 2022
@mumichae
Copy link
Collaborator Author

I realised that we actually had a bug in the C code. When lisi_graph_py is called with 2 cores, the script will set the max chunk index to 0, although it should be 1. I fixed this and had to recompile the code.

Unfortunately, the last step failed on github actions likely because the g++ version I was using was not compatible with the one on github actions. For that, I also included the lisi code compilation into the setup.py. Might be a bit hacky, but that's the only thing that I managed to get to work.

@LuckyMD Would be great if I could have another review on this decision.

@mumichae mumichae requested a review from LuckyMD April 28, 2022 13:56
Copy link
Collaborator

@LuckyMD LuckyMD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this difficult to review as I didn't write the .cpp code. Maybe @mbuttner could take a look at this to okay it?

Additional tests look good

scib/knn_graph/.gitignore Show resolved Hide resolved
scib/knn_graph/knn_graph.cpp Show resolved Hide resolved
@LuckyMD LuckyMD requested a review from mbuttner April 28, 2022 14:25
@LuckyMD
Copy link
Collaborator

LuckyMD commented Apr 28, 2022

I actually like the compilation in setup.py. No testing whether g++ is installed though, so it might create some strange errors.

setup.py Outdated Show resolved Hide resolved
Copy link
Contributor

@mbuttner mbuttner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the LISI code looks much nicer now, thanks for adding the tests and cleaning up the multiprocessing part. Please add an error report when compilation does not work. Well done!

scib/knn_graph/knn_graph.cpp Show resolved Hide resolved
@@ -201,6 +191,9 @@ def lisi_graph_py(
By default, perplexity is chosen as 1/3 * number of nearest neighbours in the knn-graph.
"""

# use no more than the available cores
n_cores = max(1, min(n_cores, mp.cpu_count()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the previous implementation of the multiprocessing, I have taken all but one CPU for the process to be a little less greedy. But when I checked my previous implementation, it is apparently not doing what the comment said but taking all CPUs. So please ignore.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I was adapting it to have an upper bound and initially used n_cpu-1, before realising that the code will only run in single core on github action, since the machine only has 2 cores. So the lower bound is redundant, will remove.

setup.py Outdated Show resolved Hide resolved
@mumichae
Copy link
Collaborator Author

Thanks @mbuttner for the review!

I updated the compilation call so that any error message will be printed, but the installation will continue. The output is only visible with pip install scib -v, but that's a pip thing.

Once everything runs through I'll merge and update scib to a new version.

@mumichae mumichae merged commit ee04da8 into main Apr 29, 2022
@mumichae mumichae deleted the pipeline-ci-fix branch April 29, 2022 09:51
@mumichae mumichae restored the pipeline-ci-fix branch May 2, 2022 16:07
@mumichae mumichae deleted the pipeline-ci-fix branch May 13, 2022 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants