Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: load_resultfile crashes if open resultsfile from crashed job #3182

Merged
merged 13 commits into from
Mar 9, 2020

Conversation

daniel-ge
Copy link
Contributor

Problem

If a job crashes on computing nodes a result-file is written containing a dictionary (with traceback, etc). Then the 'controller' job reads the dictionary in order to produce the pickeled crash file. During reading of the results-file an exception arises in load_resultfile with AttributeError: 'dict' object has no attribute 'outputs'.

Details

In #2985 loadpkl was replaced by load_resultfile. But result_data could be a dictionary as assumed in the following source snippet:

else:
results_file = glob(os.path.join(node_dir, "result_*.pklz"))[0]
result_data = load_resultfile(results_file)
result_out = dict(result=None, traceback=None)
if isinstance(result_data, dict):
result_out["result"] = result_data["result"]

But result_data will never be a dictionary, because in load_resultfile it has to have an attribute outputs:
result = loadpkl(results_file)
if resolve and result.outputs:

Solution

Just check if result has an attribute outputs before accessing it.

Copy link
Member

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This seems to be #3085.

It would be good to have a regression test so that we can be sure this issue doesn't recur. Do you have any thoughts here? You can make an interface that reliably crashes like so:

def crasher(x):
    raise ValueError(x)

crash_if = Function(function=crasher)

Judging by the snippet that you shared, I guess this was happening in an SGE context. I don't know if we have a test plugin for exercising that code.

nipype/pipeline/engine/utils.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Mar 4, 2020

Codecov Report

Merging #3182 into master will increase coverage by 0.07%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3182      +/-   ##
==========================================
+ Coverage   64.88%   64.95%   +0.07%     
==========================================
  Files         299      299              
  Lines       39506    39506              
  Branches     5219     5219              
==========================================
+ Hits        25632    25663      +31     
+ Misses      12824    12784      -40     
- Partials     1050     1059       +9     
Flag Coverage Δ
#unittests 64.95% <100.00%> (+0.07%) ⬆️
Impacted Files Coverage Δ
nipype/pipeline/engine/utils.py 71.60% <100.00%> (+0.45%) ⬆️
nipype/pipeline/plugins/tools.py 80.26% <0.00%> (+1.31%) ⬆️
nipype/utils/config.py 62.22% <0.00%> (+3.88%) ⬆️
nipype/pipeline/plugins/base.py 64.10% <0.00%> (+5.20%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b75a7ce...4a6d6b8. Read the comment docs.

@effigies
Copy link
Member

effigies commented Mar 6, 2020

Hi @daniel-ge, any chance of a regression test? Let us know if you need help.

@daniel-ge
Copy link
Contributor Author

That's a first attempt for a regression test. I am pretty sure there is much room for improvements. I tried to make it independent of existing/available installations of SLUM, SGE etc by running the batch script directly.

Copy link
Member

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test! I tested with and without your fix to verify that it should catch a regression.

I made some suggestions to reduce boilerplate...

nipype/pipeline/plugins/tests/test_sgelike.py Outdated Show resolved Hide resolved
nipype/pipeline/plugins/tests/test_sgelike.py Outdated Show resolved Hide resolved
nipype/pipeline/plugins/tests/test_sgelike.py Outdated Show resolved Hide resolved
nipype/pipeline/plugins/tests/test_sgelike.py Outdated Show resolved Hide resolved
nipype/pipeline/plugins/tests/test_sgelike.py Outdated Show resolved Hide resolved
nipype/pipeline/plugins/tests/test_sgelike.py Outdated Show resolved Hide resolved
nipype/pipeline/plugins/tests/test_sgelike.py Outdated Show resolved Hide resolved
@effigies effigies added this to the 1.5.0 milestone Mar 8, 2020
@daniel-ge
Copy link
Contributor Author

Thanks for your improvements. I have used black for better code style.

@effigies
Copy link
Member

effigies commented Mar 9, 2020

@daniel-ge I removed some unused imports. Apologies for pushing to your master branch.

@effigies effigies merged commit 347311c into nipy:master Mar 9, 2020
@oesteban
Copy link
Contributor

oesteban commented Mar 15, 2020

I suspect this PR might have messed up with nodes parameterized by iterables. Will open an issue if confirmed.

EDIT: For now, we should put #3174 on hold.

@oesteban
Copy link
Contributor

Okay, I just replicated with 1.4.2 so the problem is elsewhere. Sorry for the noise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants