FIX: Retry renaming pkl(z) files on fail #3404

Merged
effigies merged 1 commit into nipy:master from fix/rename_fail
Feb 19, 2022

effigies
Member

@effigies effigies commented Nov 2, 2021

@shashankbansal6 ran into an issue on TACC:

exception calling callback for <Future at 0x2ab460e40490 state=finished raised FileNotFoundError>
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/work/06850/sbansal6/frontera/miniconda3/envs/fitlins38_2/lib/python3.8/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
    result["result"] = node.run(updatehash=updatehash)
  File "/work/06850/sbansal6/frontera/miniconda3/envs/fitlins38_2/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 512, in run
    savepkl(op.join(outdir, "_node.pklz"), self)
  File "/work/06850/sbansal6/frontera/miniconda3/envs/fitlins38_2/lib/python3.8/site-packages/nipype/utils/filemanip.py", line 721, in savepkl
    os.rename(tmpfile, filename)
FileNotFoundError: [Errno 2] No such file or directory: '/scratch1/06850/sbansal6/ds001734-workdir/nistats_smooth/fitlins_wf/loader/_node.pklz.tmp' -> '/scratch1/06850/sbansal6/ds001734-workdir/nistats_smooth/fitlins_wf/loader/_node.pklz'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/work/06850/sbansal6/frontera/miniconda3/envs/fitlins38_2/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/work/06850/sbansal6/frontera/miniconda3/envs/fitlins38_2/lib/python3.8/site-packages/nipype/pipeline/plugins/multiproc.py", line 70, in run_node
    result["result"] = node.result
  File "/work/06850/sbansal6/frontera/miniconda3/envs/fitlins38_2/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 216, in result
    return _load_resultfile(
  File "/work/06850/sbansal6/frontera/miniconda3/envs/fitlins38_2/lib/python3.8/site-packages/nipype/pipeline/engine/utils.py", line 291, in load_resultfile
    raise FileNotFoundError(results_file)
FileNotFoundError: /scratch1/06850/sbansal6/ds001734-workdir/nistats_smooth/fitlins_wf/loader/result_loader.pklz
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/work/06850/sbansal6/frontera/miniconda3/envs/fitlins38_2/lib/python3.8/concurrent/futures/_base.py", line 328, in _invoke_callbacks
    callback(self)
  File "/work/06850/sbansal6/frontera/miniconda3/envs/fitlins38_2/lib/python3.8/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
    result = args.result()
  File "/work/06850/sbansal6/frontera/miniconda3/envs/fitlins38_2/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/work/06850/sbansal6/frontera/miniconda3/envs/fitlins38_2/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
FileNotFoundError: /scratch1/06850/sbansal6/ds001734-workdir/nistats_smooth/fitlins_wf/loader/result_loader.pklz

In the full log, this error appears 7 times, which makes me worry that we're somehow hitting a race condition where each worker is actually trying to run the same node, which is not mapped. Either way, this patch should at least catch the failure if it's a filesystem issue.
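
For readers following along, here is a minimal sketch of the retry idea named in the PR title. It is not the merged patch: the helper name `rename_with_retry`, the retry count, and the delay are all assumptions chosen for illustration. The premise is that on some shared filesystems a freshly written temporary file may not be visible to `os.rename` right away, so a short wait and another attempt might succeed.

```python
import os
import time


def rename_with_retry(src, dst, retries=5, delay=2.0):
    """Rename src to dst, retrying when the source is not (yet) found.

    Hypothetical helper for illustration only; the name, retry count,
    and delay are assumptions, not values taken from the nipype patch.
    Returns the number of failed attempts that preceded success.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")

    last_error = None
    for attempt in range(retries):
        try:
            os.rename(src, dst)
            return attempt
        except FileNotFoundError as err:
            # Possibly a filesystem that has not caught up yet:
            # remember the error and wait before the next attempt.
            last_error = err
            if attempt < retries - 1:
                time.sleep(delay)

    # Every attempt failed, so surface the original error.
    raise last_error
```

In `savepkl`, a call such as `rename_with_retry(tmpfile, filename)` would stand in for the bare `os.rename(tmpfile, filename)` seen in the traceback. A retry like this only papers over transient filesystem delays; if several workers really are running the same node, the underlying race would remain.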


codecov bot commented Nov 2, 2021

Codecov Report

Merging #3404 (6230371) into master (eab4b2a) will increase coverage by 0.24%.
The diff coverage is 37.50%.

@@            Coverage Diff             @@
##           master    #3404      +/-   ##
==========================================
+ Coverage   65.20%   65.45%   +0.24%     
==========================================
  Files         307      307              
  Lines       40457    41191     +734     
  Branches     5350     5637     +287     
==========================================
+ Hits        26379    26960     +581     
- Misses      13003    13126     +123     
- Partials     1075     1105      +30     
| Flag | Coverage Δ |
|---|---|
| unittests | 64.92% <37.50%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| nipype/utils/filemanip.py | 72.45% <37.50%> (-0.73%) ⬇️ |
| nipype/pipeline/engine/nodes.py | 79.89% <0.00%> (+0.95%) ⬆️ |
| nipype/interfaces/base/core.py | 89.08% <0.00%> (+1.13%) ⬆️ |
| nipype/utils/subprocess.py | 88.75% <0.00%> (+2.39%) ⬆️ |
| nipype/interfaces/spm/preprocess.py | 57.01% <0.00%> (+6.77%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update eab4b2a...6230371.

@effigies
Member Author

effigies commented Dec 8, 2021

According to @shashankbansal6, the issue is resolved when using this branch.

Anybody care to review?

@effigies effigies merged commit b3b3bf3 into nipy:master Feb 19, 2022
@effigies effigies deleted the fix/rename_fail branch February 19, 2022 02:37
@effigies effigies added this to the 1.7.1 milestone Apr 3, 2022