FIX: load_resultfile crashes if open resultsfile from crashed job #3182
Thanks. This seems to be #3085.
It would be good to have a regression test so that we can be sure this issue doesn't recur. Do you have any thoughts here? You can make an interface that reliably crashes like so:
from nipype.interfaces.utility import Function

def crasher(x):
    raise ValueError(x)

crash_if = Function(function=crasher)
Judging by the snippet that you shared, I guess this was happening in an SGE context. I don't know if we have a test plugin for exercising that code.
Codecov Report
@@            Coverage Diff             @@
##           master    #3182      +/-   ##
==========================================
+ Coverage   64.88%   64.95%   +0.07%
==========================================
  Files         299      299
  Lines       39506    39506
  Branches     5219     5219
==========================================
+ Hits        25632    25663      +31
+ Misses      12824    12784      -40
- Partials     1050     1059       +9
Co-Authored-By: Chris Markiewicz <[email protected]>
Hi @daniel-ge, any chance of a regression test? Let us know if you need help.
That's a first attempt at a regression test. I am pretty sure there is much room for improvement. I tried to make it independent of existing/available installations of SLURM, SGE, etc. by running the batch script directly.
Nice test! I tested with and without your fix to verify that it should catch a regression.
I made some suggestions to reduce boilerplate...
Thanks for your improvements. I have used |
@daniel-ge I removed some unused imports. Apologies for pushing to your |
I suspect this PR might have broken nodes parameterized by iterables. Will open an issue if confirmed. EDIT: For now, we should put #3174 on hold.
Okay, I just replicated this with 1.4.2, so the problem is elsewhere. Sorry for the noise.
Problem
If a job crashes on a compute node, a result file is written that contains a dictionary (with the traceback, etc.). The 'controller' job then reads this dictionary in order to produce the pickled crash file. While the result file is being read, load_resultfile raises an exception: AttributeError: 'dict' object has no attribute 'outputs'.
Details
In #2985, loadpkl was replaced by load_resultfile. But result_data could be a dictionary, as assumed in the following source snippet:
nipype/nipype/pipeline/plugins/base.py, lines 531 to 536 in b75a7ce
But result_data will never be a dictionary, because in load_resultfile it has to have an attribute outputs:
nipype/nipype/pipeline/engine/utils.py, lines 293 to 294 in b75a7ce
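The failure mode can be reproduced without nipype. The following is a minimal sketch, assuming the result file of a crashed job is a pickled dictionary; read_outputs is a hypothetical stand-in for the unconditional attribute access described above, not the actual nipype code.

```python
import pickle


def read_outputs(result):
    # Hypothetical stand-in: accesses .outputs without checking that it exists.
    return result.outputs


# A crashed job stores a plain dictionary (traceback etc.), not a result object.
crash_payload = pickle.dumps({"traceback": ["ValueError: boom"]})
result = pickle.loads(crash_payload)

try:
    read_outputs(result)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

Because the loaded object is a plain dict, the attribute access fails before the caller gets a chance to handle the dictionary case.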
Solution
Just check whether result has an attribute outputs before accessing it.
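The idea behind the fix can be sketched as follows. This is an illustration only, not the code changed in this PR: outputs_or_none is a hypothetical helper, and SimpleNamespace stands in for a regular result object.

```python
from types import SimpleNamespace


def outputs_or_none(result):
    """Return result.outputs if present, else None (e.g. for a crash dictionary)."""
    # Guard the attribute access instead of assuming a result object.
    if hasattr(result, "outputs"):
        return result.outputs
    return None


normal_result = SimpleNamespace(outputs={"out": 1})  # stand-in for a result object
crash_result = {"traceback": ["ValueError: boom"]}   # stand-in for a crashed job

normal_outputs = outputs_or_none(normal_result)
crash_outputs = outputs_or_none(crash_result)
```

With the guard in place, a crash dictionary passes through untouched, so the caller can still turn it into a crash file.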