Updated to MPAS-Analysis v0.6.5 #24
Conversation
This is not ready to be merged: I still need to figure out a couple of things about setting parallelTasks for MPAS-Analysis, and I need to test in batch mode. For now, I have tested successfully on the login node on Edison (and it was so nice to see how much faster this is than its predecessors!).
```bash
echo
echo "Failed ocean/ice diagnostics"
exit 1
fi
```
This did not help. One of my tests failed in streamfunctionMOC and this did not pick it up. So, that means that, if one of the MPAS-Analysis tasks fails, it will not be picked up by this exit status.
@milenaveneziani, could you tell me more about the error that you saw where this was not triggered? Was it in setup_and_check() of a task? If setup_and_check() fails, it doesn't cause MPAS-Analysis to fail because we consider this to potentially be caused by a run that simply isn't configured for a given task, e.g. a melt-rate task in a run without ice-shelf cavities.
According to these lines, we should be exiting with error code 1 if any tasks failed during the run_task() stage:
https://github.com/MPAS-Dev/MPAS-Analysis/blob/v0.6/run_mpas_analysis#L351-L355
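In other words, the a-prime driver only needs to test that exit status after the call. A minimal sketch of what the check looks like from the wrapper side (the invocation and config-file name here are assumptions for illustration, not the actual a-prime code):

```bash
# Sketch: invoke MPAS-Analysis and abort the wrapper if it exits nonzero.
# The config-file name is hypothetical.
./run_mpas_analysis mpas_analysis.cfg
if [ $? -ne 0 ]; then
    echo
    echo "Failed ocean/ice diagnostics"
    exit 1
fi
```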
I don't think it was the setup_and_check stage, but I may be wrong. Here is the error:
```
Plotting streamfunction of Meridional Overturning Circulation (MOC)...
*** MOC Analysis Member is on ***
...but not yet supported. Using offline MOC
Reading region and transect mask for Atlantic...
Compute and/or plot post-processed MOC climatological streamfunction...
Read previously computed MOC streamfunction from file...
Compute and/or plot post-processed Atlantic MOC time series...
Load data...
Read in previously computed MOC time series
analysis task streamfunctionMOC failed during run
Traceback (most recent call last):
  File "/global/u2/m/milena/a-prime/python/MPAS-Analysis/mpas_analysis/shared/analysis_task.py", line 323, in run
    self.run_task()
  File "/global/u2/m/milena/a-prime/python/MPAS-Analysis/mpas_analysis/ocean/streamfunction_moc.py", line 171, in run_task
    dsMOCTimeSeries = self._compute_moc_time_series_postprocess()
  File "/global/u2/m/milena/a-prime/python/MPAS-Analysis/mpas_analysis/ocean/streamfunction_moc.py", line 495, in _compute_moc_time_series_postprocess
    dsMOCIn.year[inIndex].values == years,
  File "/global/homes/m/milena/proj_acme/milena/miniconda2.7/lib/python2.7/site-packages/xarray/core/common.py", line 168, in __getattr__
    (type(self).__name__, name))
AttributeError: 'Dataset' object has no attribute 'year'
```
It was due to a previous abort of the analysis: the MOC time series file had been created, but it had incorrect information in it. The problem went away once I purged that file.
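For anyone hitting the same failure, the workaround is simply to delete the stale cached file so the next run regenerates it; a sketch, with a hypothetical path and filename:

```bash
# Workaround sketch: purge the stale cached MOC time series so it is
# recomputed on the next run. Path and filename are assumptions.
rm -f "$analysis_output_dir"/timeseries/mocTimeSeries*.nc
```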
@milenaveneziani, were you running in serial mode or in parallel? If in parallel, what does the log file for the parallel job look like? That's where we would potentially see one of the exit statements I linked above.
No, it was in serial mode.
I can try to generate cases with problems and see how this behaves.
Clearly something to fix on the MPAS-Analysis side and make a quick patch for.
Thanks for looking into this, @xylar.
This issue has (hopefully) been addressed in MPAS-Dev/MPAS-Analysis#283
Your changes here look good. I think what you had before would have worked but this is cleaner and clearer.
Yes, I agree with both statements. Since I got confused identifying the right exit status during testing, I thought I would make it cleaner.
I have just submitted a batch job with this branch on Edison.
@xylar: I have realized I still need to keep the variable
Yep, that makes sense to do. That's fine.
I can look into this. Perhaps when a test fails we're not exiting with an error from MPAS-Analysis for some reason?
Batch-mode test also passed on Edison. Results here:
@salilmahajan or @kevans32: could you do a couple of tests at OLCF with this? Perhaps a serial one on Rhea and a batch-mode test on Titan?
@milenaveneziani, I tested on Edison, both on a login node and in batch mode, using a beta2 run. Everything seems to have worked great, and here are the results (from the login-node test, which overwrote the results of the batch-mode test): One question: I left the default
xylar left a comment:
There are a lot of really good small changes here. Everything looks good to me.
I can test for sure, but I am in flight until this afternoon and Salil is at an all-day thing at ORNL today. I should be able to do it this afternoon once I hit the hotel.
Okay @milenaveneziani, I just created MPAS-Analysis release v0.6.5: If you update this branch to point to that release, it should fix the exit-status issue you saw (at least for errors during the run phase of a given MPAS-Analysis task).
Great, thanks. I will update this in the next hour.
Done. @vanroekel: you may want to update your branch with this latest commit, which includes an update of the submodule (Luke is working on adding anvil support here). @salilmahajan: I am thinking we can solve the error-handling issue on the atm side too by doing something similar. We could add lines like these right after each python call in the bash atm scripts (see the sketch below). What do you think?
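For concreteness, a sketch of the pattern being proposed, mirroring the ocean/ice check above (the python script name and message text are invented for illustration):

```bash
# In each atm bash script, right after its python call (names hypothetical):
python plot_atm_climatologies.py "$config_file"
if [ $? -ne 0 ]; then
    echo
    echo "Failed atm diagnostics"
    exit 1
fi
```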
@milenaveneziani, I'm re-running my tests right now.
What you suggest seems like exactly what would be needed. Presumably there isn't anything weird and fancy going on on the atm side, as there is in MPAS-Analysis, that would mean errors get caught and don't end up leading to a non-zero exit status.
@milenaveneziani, anvil support should be working now. It has been rebased for the 0.6.5 update and pushed. A question that came up when putting it in: in the script-submission section of run_aprime, you have one line in the atmosphere section that is and in the ocean section it is. Is there a reason for the change? Pinging @salilmahajan as well.
Great, thanks @vanroekel. Did you have a chance to test this on anvil, either on the login node or in batch?
Ah, that's a mistake! I should have changed both to 'titan'.
Tested on the login node and in batch.
Force-pushed from 0cae836 to 349c2a1.
I have tested successfully on Titan, in both batch mode and on the login node. @xylar: note that I have added some small changes to. @salilmahajan: can you please check the exit-status part of the atm diagnostics bash scripts? Especially the exit messages: I'm not sure they are helpful enough. I may want to squash a couple of commits before merging.
@milenaveneziani, looking at the code, I see that the MOC links would still exist even if I didn't include the MOC, because there's a template for the HTML for that whole section, not for individual tasks. That's fine as long as a-prime users understand what's going on (i.e. it's the expected behavior). Obviously, this will change if you link to the MPAS-Analysis-generated website instead. Speaking of which, I think that can wait for another PR. I think you would just need to turn on HTML generation (currently turned off in the config file) and give a subfolder in the desired HTML output folder as the destination. Then, you could replace the whole ocn/ice part of the HTML with a link to the website. Should be pretty easy (fingers crossed).
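A rough sketch of that last step, assuming hypothetical paths (none of this is actual a-prime code, and the real config option for HTML generation should be checked against MPAS-Analysis itself):

```bash
# Sketch: expose the MPAS-Analysis-generated website as a subfolder of the
# a-prime output and link to it from the main page. All paths hypothetical.
www_dir="$HOME/www/aprime_results"
mkdir -p "$www_dir/mpas_analysis"   # destination for MPAS-Analysis HTML output
cat >> "$www_dir/index.html" << 'EOF'
<p><a href="mpas_analysis/index.html">Ocean/sea-ice results (MPAS-Analysis)</a></p>
EOF
```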
Hi @xylar: no, if you look closely, there is an extra. But I agree that those would be the steps to make the MPAS-Analysis page, except perhaps that the HTML generation is probably already on by default. So it might be even easier in the end.
@milenaveneziani, ah, I see. I was only looking at the parts you changed, but now I see that there's already code to skip MOC. Sorry for the confusion. Looks great!
It looks like you set html generation to be off by default, which makes sense if we're not using it. Indeed, I looked for it and didn't see any directory with the MPAS-Analysis html output, so I think it's working as expected right now.
You are right; now I remember what I did.
@salilmahajan: any chance you can just take a look at the changes I made for the atm scripts and approve or make some changes there?
@salilmahajan, can you move a case over to OLCF for testing of this? @kevans32, I did not copy any ocean output to my space on Lustre when I was testing for the ENSO diags earlier. I just initiated a transfer of the beta3rc10 archive directory using globus. It will be available here when it's done: /lustre/atlas1/cli115/proj-shared/salil/archive/20171122.beta3rc10_1850.ne30_oECv3_ICG.edison
@milenaveneziani: Thanks for taking this on! Yes, the changes to the onscreen output and error handling look good to me.
Thanks @salilmahajan for checking! @kevans32: you can still use beta2 data for testing this on Rhea (it worked well for me on Titan). Can you please just use the settings I have in my run_aprime (info in my last comment)? Thanks.
Also, globus has started copying the latest beta3rc10 run over to OLCF. The files should be here soon:
On Rhea, some plots were missing. Now running again with atm turned on to get a full accounting of what works and what does not.
Also make sure that
Everything from this branch (including MOC) ran successfully on Rhea, as far as I could see, and I looked at a variety of plots. Yay!
Yay! Thanks @kevans32! I will merge now.
Force-pushed from ee760b6 to 9cd8b2a.
Integrate MPAS-Analysis v0.6.5, which 1) changes parallel tasks to use a single node, 2) computes all ocean/ice climatologies and processes data for all time series as two prerequisite tasks before computing any diagnostics, 3) loads data for time series using ncrcat instead of xarray, and 4) adds Python 3 support to MPAS-Analysis tasks.
Also addresses some miscellaneous issues, such as #16, #13, and hopefully most of #11.
Finally, adds anvil support, addressing #2.