Questions regarding the batch size on memory issue #804
Comments
Hi @tracy666, Are you using the If you want to incorporate the spatial information of the mouse embryos, as we did in the paper and the tutorial, you would use the If you want to reduce the memory, you would have to set the
Dear @MUCDK,

I sincerely appreciate your continuous support and guidance. Sorry for the lack of clarity in my previous post. At this stage, I do not need to incorporate spatial information, so I am using

Upon reviewing your previous advice, I realized that I may have misunderstood your suggestion. Initially, I assumed that adjusting either

To address this, I have now conducted a new experiment in which I systematically lower the rank value from 100 to 5 (I selected these values based on my basic understanding of the paper "Low-Rank Sinkhorn Factorization"; if this range is inappropriate, please let me know). For each rank value, I also gradually reduce the batch_size, starting from the dataset size and halving it until it reaches a threshold of 50 (if a lower batch size is advisable, I would greatly appreciate your input). Despite these adjustments, I am still unable to find a configuration that runs the file successfully.

Could you kindly review my approach and let me know if there is anything I might be overlooking, or any further modifications I should try? For your reference, I have attached my full code and the logger file detailing the execution process. Thank you once again for your patience and invaluable assistance!

My code (without the .txt suffix, this is exactly the .py file I use): timepoint_mapping_python_version_X_pca_with_logger_with_rank.py.txt

The logger file it created:
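The sweep described in this comment (decreasing rank values, and for each one a batch size that is halved down to a floor of 50) could be organized roughly as in the sketch below. It is only an illustration and not the attached script: `halving_schedule`, `sweep`, and `stand_in_solver` are hypothetical names, the intermediate rank values are assumed, and the solver is assumed to raise a `RuntimeError` when it runs out of memory. In a real run the stand-in would be replaced by a function that calls the actual solver.

```python
from itertools import product


def halving_schedule(start, floor):
    """Return start, start // 2, ... for as long as the value stays >= floor."""
    sizes = []
    size = start
    while size >= floor:
        sizes.append(size)
        size //= 2
    return sizes


def sweep(ranks, batch_sizes, try_solve):
    """Try every (rank, batch_size) pair in order; return the first that succeeds.

    `try_solve` is a hypothetical callable taking `rank` and `batch_size`
    keyword arguments. It is assumed to raise RuntimeError on out-of-memory.
    """
    for rank, batch_size in product(ranks, batch_sizes):
        try:
            try_solve(rank=rank, batch_size=batch_size)
        except RuntimeError as err:
            print(f"rank={rank}, batch_size={batch_size} failed: {err}")
            continue
        return rank, batch_size
    return None  # no configuration succeeded


def stand_in_solver(rank, batch_size):
    """Placeholder for the real solve call; fails for 'large' settings."""
    if rank > 10 or batch_size > 100:
        raise RuntimeError("simulated out of memory")


ranks = [100, 50, 10, 5]                   # assumed values between 100 and 5
batch_sizes = halving_schedule(800, 50)    # 800 is an arbitrary starting size
print(sweep(ranks, batch_sizes, stand_in_solver))
```

If every pair fails with the same requested allocation size, that would suggest the failing allocation does not depend on either parameter, and the sweep itself is unlikely to help.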
Dear contributors and dear @MUCDK,

Following your suggestions last time, I've tried setting `batch_size` to lower the memory consumption. I am experimenting with the MOSTA dataset, which is built into moscot. However, the program still reports a memory issue even when I test with a very small `batch_size` value (15). I also notice that, no matter what value I set, the error message is the same: "E0226 11:52:40.852305 2249403 pjrt_stream_executor_client.cc:3045] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 2335208976 bytes."

The number of bytes requested does not decrease when I use a smaller `batch_size` value. Theoretically, since I can run the pancreas dataset with more than 20,000 cells, I assume a batch size higher than 10,000 should be fine for MOSTA. Therefore I am wondering whether I am using this parameter incorrectly. Any suggestion is highly appreciated!
The relevant code for reproduction:
Part of the result from Terminal:
Thank you very much for your attention to this matter!
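One way to read the allocation size in the error message is to check whether it matches a dense square array. The sketch below does that arithmetic under one assumption: that the array holds 4-byte float32 values (JAX's default floating-point type unless 64-bit mode is enabled). Under that assumption the 2335208976 bytes correspond exactly to a 24162 x 24162 array, which would explain why the request does not shrink with `batch_size`. Which array this actually is (for example, a full pairwise matrix over cells) cannot be determined from the message alone.

```python
import math

reported_bytes = 2_335_208_976   # taken from the RESOURCE_EXHAUSTED message
bytes_per_element = 4            # assumption: float32 elements

# How many elements would the failing allocation hold?
n_elements, remainder = divmod(reported_bytes, bytes_per_element)

# Is that element count a perfect square, i.e. a dense n x n array?
side = math.isqrt(n_elements)
is_square = remainder == 0 and side * side == n_elements

print(f"elements: {n_elements}")
print(f"integer square root: {side}")
print(f"dense square array under the float32 assumption: {is_square}")
print(f"size: {reported_bytes / 2**30:.2f} GiB")
```

Comparing the resulting side length with the number of cells in the time points being mapped would be a quick way to test this reading.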