🐛 ANTs doesn't respect memory constraints #1404
I don't know enough to know how much memory is reasonable, but I hear
|
I'm not sure I quite understand:
[{
"name": "resting_preproc_sub-A00013809_ses-DS2.nuisance_regressor_0_0.aCompCor_cosine_filter",
"time": 1605723308.305644,
"rss_GiB": 36.45994949316406,
"cpus": 103.0,
"vms_GiB": 40.09559631347656,
"interface": "Function",
"params": "_scan_rest_acq-645__selector_WM-2mm-M_CSF-2mm-M_tC-5PCT2-PC5_aC-CSF+WM-2mm-PC5_G-M_M-SDB_P-2_BP-B0.01-T0.1",
"mapnode": 0
}, {
"name": "resting_preproc_sub-A00013809_ses-DS2.nuisance_regressor_0_0.aCompCor_cosine_filter",
"time": 1605723285.300852,
"rss_GiB": 28.016590118164064,
"cpus": 832.8,
"vms_GiB": 40.09559631347656,
"interface": "Function",
"params": "_scan_rest_acq-645__selector_WM-2mm-M_CSF-2mm-M_tC-5PCT2-PC5_aC-CSF+WM-2mm-PC5_G-M_M-SDB_P-2_BP-B0.01-T0.1",
"mapnode": 0
}, {
"name": "resting_preproc_sub-A00013809_ses-DS2.nuisance_regressor_0_0.aCompCor_cosine_filter",
"time": 1605723284.15847,
"rss_GiB": 27.693824767578125,
"cpus": 2742.9,
"vms_GiB": 31.022430419921875,
"interface": "Function",
"params": "_scan_rest_acq-645__selector_WM-2mm-M_CSF-2mm-M_tC-5PCT2-PC5_aC-CSF+WM-2mm-PC5_G-M_M-SDB_P-2_BP-B0.01-T0.1",
"mapnode": 0
}] |
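The JSON entries above come from a resource-monitor callback log. A minimal, stdlib-only sketch of summarizing such entries (assuming samples shaped like the ones above, abbreviated here) to find the peak resident memory per node:

```python
import json

# Entries shaped like the callback-log samples above (assumed format:
# one dict per sample with "name", "time", "rss_GiB", "cpus", ...).
samples = json.loads("""[
  {"name": "aCompCor_cosine_filter", "time": 1605723308.3, "rss_GiB": 36.46, "cpus": 103.0},
  {"name": "aCompCor_cosine_filter", "time": 1605723285.3, "rss_GiB": 28.02, "cpus": 832.8},
  {"name": "aCompCor_cosine_filter", "time": 1605723284.2, "rss_GiB": 27.69, "cpus": 2742.9}
]""")

def peak_rss(entries):
    """Peak resident-set size (GiB) observed per node name."""
    peaks = {}
    for e in entries:
        peaks[e["name"]] = max(peaks.get(e["name"], 0.0), e["rss_GiB"])
    return peaks

print(peak_rss(samples))  # the largest sample per node wins
```

Comparing these peaks against each node's estimate is one way to spot nodes whose estimates need raising.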
Another clue, interacting with the graphs on brainlife, I see it's actually
|
Those brief spikes are huge!!! Do you also mean that it's probably run.py, not ANTs? |
Not sure. I think it's probably how C-PAC is allocating memory for ANTs |
After @ccraddock explained to me that
I started to set estimates. There are still some spikes, but they're much less dramatic. I'm iterating now: after each fresh run, I set more estimates based on the logged memory usage. |
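Setting per-node memory estimates helps because a MultiProc-style scheduler (as in nipype) only starts a node when its estimate fits in the remaining memory budget. A simplified, stdlib-only sketch of that decision; the node names and numbers here are hypothetical, and the real scheduler is more involved:

```python
def schedulable(pending, running_est, total_mem_gb):
    """Greedy check in the spirit of a MultiProc-style scheduler:
    start each pending node only if its mem_gb estimate still fits
    within the total budget after accounting for running nodes."""
    free = total_mem_gb - sum(running_est)
    started = []
    for name, est in pending:
        if est <= free:
            started.append(name)
            free -= est
    return started

# Hypothetical nodes: (name, estimated mem_gb)
pending = [("antsRegistration", 12.0), ("aCompCor", 30.0), ("smoothing", 2.0)]
print(schedulable(pending, running_est=[8.0], total_mem_gb=40.0))
# aCompCor must wait: its 30 GiB estimate exceeds the 20 GiB left
```

An underestimate lets too many nodes start at once, which is exactly how brief spikes blow past the overall limit.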
This is looking good. I think that there is an issue with the number of threads being estimated by the callback, or the gantt chart creation script is pulling in the wrong numbers. Some of the nodes are reporting using 210 threads! As for your earlier comment on run.py, I think that since this is the parent process, it 'owns' all of the memory used by the child threads. So the amount of memory attributed to it should be the cumulative amount of memory used by all of the nodes that are currently executing. |
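The "parent owns the children's memory" point can be illustrated with a toy model (not the actual profiler): the memory attributed to the parent process at any instant is the sum of the samples of every node executing at that instant.

```python
def memory_attributed_to_parent(node_samples, t):
    """Toy model of the attribution described above: the parent
    process 'owns' the memory of every node executing at time t,
    so its apparent usage is the sum of their RSS values."""
    return sum(
        rss for (start, end, rss) in node_samples
        if start <= t <= end
    )

# Hypothetical (start_time, end_time, rss_GiB) intervals for three nodes
nodes = [(0, 50, 27.7), (10, 60, 28.0), (40, 90, 36.5)]
print(memory_attributed_to_parent(nodes, t=45))  # all three overlap at t=45
```

This is why run.py can appear to use far more memory than any single node: its number is cumulative over everything running under it.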
I thought maybe… I see the profile uses… I'll try updating the callback to get the actual number of threads, run a few C-PAC runs, and see how it looks. |
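One possible reading of the inflated thread counts (a guess, since the callback's actual field semantics aren't shown here): if the profiler's `cpus` value is a cumulative CPU-utilization percentage (100% per fully busy core, as `psutil`-style profilers report), then dividing by 100 gives an effective-core estimate, not a thread count.

```python
import math

def effective_cores(cpu_percent):
    """If 'cpus' is a cumulative utilization percentage (100% per
    fully busy core), the effective number of busy cores is
    percentage / 100, rounded up."""
    return math.ceil(cpu_percent / 100)

# Values taken from the callback-log samples earlier in the thread
for pct in (103.0, 832.8, 2742.9):
    print(pct, "->", effective_cores(pct))
```

Under this interpretation, a reported "2742.9" means roughly 28 busy cores rather than thousands of threads, which would explain the implausible 210-thread readings.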
Awesome @shnizzedy !! |
This looks like it's working for memory. I'm going through now, adding estimates and adding to the developer docs as I go |
Describe the bug
This is particularly a problem on clusters that have hard memory limits.
Here's an example command (command.txt) that uses too much memory (same thing, wrapped for ease of reading):
To Reproduce
Steps to reproduce the behavior:
1. Run with `--preconfig fmriprep-options`, setting `n_cpus` and `mem_gb`
2. `mem_gb` is exceeded by ANTs

Expected behavior
ANTs uses no more than `mem_gb` memory at a time.

Versions
Additional context
These issues are almost certainly a result of this issue: `montage_a` & `montage_s` #1338
Possibly related: #1054, nipy/nipype#2776
— "My registration fails with an error: Memory errors", ANTs wiki

There seems to be no specific memory limitation in the `antsRegistration` command apart from `--float`; `antsJointLabelFusion.sh` relies on `sbatch`, `rev` and `cut` to limit memory.

Other leads: maybe use `LegacyMultiProc` (nipreps/fmriprep#836, nipreps/fmriprep#839, nipreps/fmriprep#854, nipy/nipype#2284; `maxtasksperchild 1`? https://nipype.readthedocs.io/en/latest/api/generated/nipype.pipeline.plugins.legacymultiproc.html), nipy/nipype#2548, nipy/nipype#2773