Updated to MPAS-Analysis v0.6.5 #24

milenaveneziani · 2017-12-05T20:07:30Z

Integrate MPAS-Analysis v0.6.5, which 1) changes parallel tasks to use one single node, 2) computes all ocean/ice climatologies and process data for all timeseries as two prerequisites tasks before computing any diagnostics, 3) loads data for timeseries using ncrcat instead of xarray, and 4) adds python 3 support to MPAS-Analysis tasks.

Also, addresses some misc issues such as #16, #13 and hopefully most of #11.

Finally, adds anvil support, addressing #2.

milenaveneziani · 2017-12-05T20:12:44Z

This is not ready to be merged, I still need to figure out a couple of things about setting parallelTasks for mpas-analysis. And I need to test in batch mode.

For now, I have tested successfully on the login node on edison (and it was so nice to see how much faster this is with respect to its predecessors!).

milenaveneziani · 2017-12-05T20:16:57Z

bash_scripts/aprime_ocnice_diags.bash

+  echo
+  echo "Failed ocean/ice diagnostics"
+  exit 1
+fi


This did not help. One of my tests failed in the streamfunctionMOC and this did not pick it up. So, that means that, if one of the mpas-analysis tasks fail, it will not picked up by this exit status.

@milenaveneziani, could you tell me more about the error that you saw where this was not triggered? Was it in setup_and_check() of a task? If setup_and_check() fails, it doesn't cause MPAS-Analysis to fail because we consider this to potentially be caused by a run that simply isn't configured for a given task, e.g. a melt-rate task in a run without ice-shelf cavities.

According to these lines, we should be exiting with error code 1 if any tasks failed during the run_task() stage:
https://github.com/MPAS-Dev/MPAS-Analysis/blob/v0.6/run_mpas_analysis#L351-L355

I don't think it was the setup_and_check stage, but I may be wrong. Here is the error:

Plotting streamfunction of Meridional Overturning Circulation (MOC)... *** MOC Analysis Member is on *** ...but not yet supported. Using offline MOC Reading region and transect mask for Atlantic... Compute and/or plot post-processed MOC climatological streamfunction... Read previously computed MOC streamfunction from file... Compute and/or plot post-processed Atlantic MOC time series... Load data... Read in previously computed MOC time series analysis task streamfunctionMOC failed during run Traceback (most recent call last): File "/global/u2/m/milena/a-prime/python/MPAS-Analysis/mpas_analysis/shared/analysis_task.py", line 323, in run self.run_task() File "/global/u2/m/milena/a-prime/python/MPAS-Analysis/mpas_analysis/ocean/streamfunction_moc.py", line 171, in run_task dsMOCTimeSeries = self._compute_moc_time_series_postprocess() File "/global/u2/m/milena/a-prime/python/MPAS-Analysis/mpas_analysis/ocean/streamfunction_moc.py", line 495, in _compute_moc_time_series_postprocess dsMOCIn.year[inIndex].values == years, File "/global/homes/m/milena/proj_acme/milena/miniconda2.7/lib/python2.7/site-packages/xarray/core/common.py", line 168, in __getattr__ (type(self).__name__, name)) AttributeError: 'Dataset' object has no attribute 'year'

it was due to a previous abort of the analysis, so the MOCtimeseries file was created but it had incorrect information in it. The problem went away once I purged that file.

@milenaveneziani, were you running in serial mode or in parallel? If in parallel, what does the log file for the parallel job look like? That's where we would potentially see one of the exit statements I linked above.

no, it was in serial mode.

I can try to generate cases with problems and see how this behaves.

Clearly something to fix on the MPAS-Analysis side and make a quick patch for.

thanks for looking into this @xylar.

This issue has (hopefully) been addressed in MPAS-Dev/MPAS-Analysis#283

Your changes here look good. I think what you had before would have worked but this is cleaner and clearer.

yes, I agree with both statements. Since I got confused with identifying the right exit status during testing, I thought of making it cleaner.

milenaveneziani · 2017-12-05T21:10:01Z

I have just submitted a batch job with this branch on edison.

milenaveneziani · 2017-12-05T21:12:46Z

@xylar: I have realized I still need to keep the variable mpas_analysis_tasks in run_aprime, so that we can change the default parallelTask config option when we are on a compute node.

xylar · 2017-12-05T21:14:20Z

I still need to keep the variable mpas_analysis_tasks in run_aprime, so that we can change the default parallelTask config option when we are on a compute node.

Yep, that makes sense to do. That's fine.

xylar · 2017-12-05T21:15:25Z

One of my tests failed in the streamfunctionMOC and this did not pick it up.

I can look into this. Perhaps when a test fails we're not exiting with an error from MPAS-Analysis for some reason?

milenaveneziani · 2017-12-05T21:59:51Z

Batch-mode test also passed on edison. Results here:
http://portal.nersc.gov/project/acme/milena/20171122.beta3rc10_1850.ne30_oECv3_ICG.edison_years1-70_vs_obs/

milenaveneziani · 2017-12-05T22:06:36Z

@salilmahajan or @kevans32: could you do a couple of tests at OLCF with this? perhaps a serial one on rhea and a batch-mode test on titan?

xylar · 2017-12-06T12:21:45Z

@milenaveneziani, I tested on Edison on both a login node and in batch mode using a beta2 run. Everything seems to have worked great and here are the results (from the login-node test, which overwrote the results of the batch mode test):
http://portal.nersc.gov/project/acme/xylar/20170915.beta2.A_WCYCL1850S.ne30_oECv3_ICG.edison_years31-40_vs_obs/

One question: I left the default export generate_moc=0. Is it a problem that the generated webpage still has links for the MOC that are just dead? Or do you want to exclude the MOC from the webpage as well if it's not being generated?

xylar

There are a lot of really good small changes here. Everything looks good to me.

kevans32 · 2017-12-06T12:54:43Z

I can test for sure, but I am in flight until this aft and Salil is at an all day thing at ORNL today. But I should be able to do this afternoon once I hit the hotel.

xylar · 2017-12-06T15:14:08Z

Okay @milenaveneziani, I just created MPAS-Analysis release v0.6.5:
https://github.com/MPAS-Dev/MPAS-Analysis/releases/tag/v0.6.5

If you update this branch to point to that release, it should fix the exit status issue you saw (at least for error during the run phase of a given MPAS-Analysis task).

milenaveneziani · 2017-12-06T15:51:50Z

great, thanks. I will update this in the next hour.

milenaveneziani · 2017-12-06T16:40:32Z

done.

@vanroekel: you may want to update your branch with this latest commit, which includes an update of the submodule (Luke is working on adding anvil support here).

@salilmahajan: I am thinking we can solve the error handling issue on the atm side too, by doing something similar. We could add lines like these:

exstatus=$?
if $exstatus -ne 0; then
  echo
  echo "Failed $doing-atm-task"
  exit 1
fi

right after a python call in each of the bash atm scripts. What do you think?
I can do that within this PR also.

xylar · 2017-12-06T16:47:09Z

@milenaveneziani, I'm re-running my tests right now.

I am thinking we can solve the error handling issue on the atm side too, by doing something similar.

What you suggest seems like exactly what would be needed. Presumably there's not anything weird and fancy going on on the atm side like in MPAS-Analysis that would mean errors get caught and don't end up leading to a non-zero exit status.

vanroekel · 2017-12-06T18:38:06Z

@milenaveneziani anvil support should be working now. It has been rebased for the 0.6.5 update and pushed. A question that came up when putting it in. In the script submission section of run_aprime you have one line in the atmosphere section that is

elif [ $machname == "titan" ];

and in the ocean section it is

elif [ $machname == "olcf" ];

Is there a reason for the change? Pinging @salilmahajan as well.

milenaveneziani · 2017-12-06T18:42:42Z

Great, thanks @vanroekel.
Did you have a chance to test this on anvil, either on the login node or in batch?

milenaveneziani · 2017-12-06T18:44:01Z

Ah, that's a mistake! I should have changed both to 'titan'..
I will change it.

vanroekel · 2017-12-06T18:46:30Z

Tested in login node and batch. From: Milena Veneziani <[email protected]> Reply-To: ACME-Climate/a-prime <[email protected]> Date: Wednesday, December 6, 2017 at 11:42 AM To: ACME-Climate/a-prime <[email protected]> Cc: "Van Roekel, Luke" <[email protected]>, Mention <[email protected]> Subject: Re: [ACME-Climate/a-prime] Updated to MPAS-Analysis v0.6.5 (#24) Great, thanks @vanroekel<https://github.com/vanroekel>. Did you have a chance to test this on anvil, either on the login node or in batch? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#24 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/ANB5UWjIkcgZYmzx3lmr3YNM7hwuM2xQks5s9uAngaJpZM4Q24wS>.

milenaveneziani · 2017-12-07T06:01:56Z

I have tested successfully on titan, in both batch mode and on the login node.

@xylar: note that I have added some small changes to generate_html so that, when some ocean/ice diagnostics is not generated, the links do not appear. I am thinking to add a link to the MPAS-Analysis webpage, but that is the only thing missing at this point. We may also leave it for the next PR.

@salilmahajan: can you please check the exit status part for the atm diagnostics bash scripts? Especially the exit messages: not sure if they are helpful enough.

I may want to squash a couple of commits before merging.

xylar · 2017-12-07T10:34:01Z

@milenaveneziani, looking at the code, I see that the MOC links would still exist even if I didn't include the MOC because there's a template for the HTML for that whole section, not for individual tasks. That's fine as long as a-prime users understand what's going on (i.e. it's the expected behavior). Obviously, this will change if you link to the MPAS-Analysis-generated website instead.

Speaking of which, I think that can wait for another PR. I think you would just need to turn on html generation (currently turned off in the config file) and give a subfolder in the desired HTML output folder as the destination. Then, you could replace the whole ocn/ice part of the HTML with a link to the website. Should be pretty easy (fingers crossed).

milenaveneziani · 2017-12-07T14:16:17Z

Hi @xylar: no, if you look closely there is an extra if statement that allows for the skipping links, MOC included. See an example here:
http://projects.olcf.ornl.gov/acme/milena/20170915.beta2.A_WCYCL1850S.ne30_oECv3_ICG.edison_years1-10_vs_obs/

But I agree that those would be the steps to make the MPAS-Analysis page, except perhaps that the html generation is probably already on by default. So it might be even easier in the end.

xylar · 2017-12-07T14:40:11Z

@milenaveneziani, ah, I see. I was only looking at the parts you changed but now I see that there's already code to skip MOC. Sorry for the confusion. Looks great!

xylar · 2017-12-07T14:43:15Z

except perhaps that the html generation is probably already on by default. So it might be even easier in the end.

https://github.com/ACME-Climate/a-prime/blob/master/python/setup_ocnice_config.py#L42-L43

It looks like you set html generation to be off by default, which makes sense if we're not using it. Indeed, I looked for it and didn't see any directory with the MPAS-Analysis html output, so I think it's working as expected right now.

milenaveneziani · 2017-12-07T14:44:58Z

It looks like you set html generation to be off by default

you are right, now I remember what I did.

milenaveneziani · 2017-12-08T15:24:02Z

Hi @kevans32: did you run your test with the branch in this PR? (wasn't sure from your comment on #15)

In terms of configuration, please use the run information in $proj_cli115/milena/a-prime/run_aprime_mv_titan.bash. Thanks.

milenaveneziani · 2017-12-08T15:25:22Z

@salilmahajan: any chance you can just take a look at the changes I made for the atm scripts and approve or make some changes there?
In terms of testing we should be good, especially after @kevans32 completes hers.
Thanks.

kevans32 · 2017-12-08T15:39:39Z

@salilmahajan can you move a case over to OLCF for testing of this?

@kevans32, I did not copy any ocean output on my space on Lustre when I was testing for the enso diags earlier. I just initiated a transfer of the beta3rc10 archive directory using globus. It will be available here when its done: /lustre/atlas1/cli115/proj-shared/salil/archive/20171122.beta3rc10_1850.ne30_oECv3_ICG.edison

salilmahajan · 2017-12-08T15:40:27Z

@milenaveneziani: Thanks for taking this on! Yes, the changes to the onscreen output and error handling look good to me.

milenaveneziani · 2017-12-08T15:41:39Z

Thanks @salilmahajan for checking!

@kevans32: you can still use beta2 data for testing this on rhea (worked well for me on titan). Can you please just use the setting I have on my run_aprime (info in my last comment)? thanks.

salilmahajan · 2017-12-08T15:46:39Z

Also, globus has started copying the latest beta3rc10 run over to OLCF. The files should be here soon:
/lustre/atlas1/cli115/proj-shared/salil/archive/20171122.beta3rc10_1850.ne30_oECv3_ICG.edison/archive

kevans32 · 2017-12-08T16:29:23Z

On rhea, some plots were missing. Now running again with atm turned on to get a full accounting of what works and what does not.

milenaveneziani · 2017-12-08T16:33:44Z

On rhea, some plots were missing

also make sure that generate_moc is turned on (we turn that off by default).

kevans32 · 2017-12-08T18:59:08Z

Everything from this branch (including MOC ran successfully on rhea that I could see- and I looked at a variety of plots yay!

milenaveneziani · 2017-12-08T19:00:27Z

yay! Thanks @kevans32!

I will merge now.

milenaveneziani added Do not merge (yet) enhancement labels Dec 5, 2017

milenaveneziani requested review from salilmahajan and xylar December 5, 2017 20:10

milenaveneziani self-assigned this Dec 5, 2017

milenaveneziani commented Dec 5, 2017

View reviewed changes

xylar mentioned this pull request Dec 5, 2017

Fix exit status when errors occur in tasks in serial mode MPAS-Dev/MPAS-Analysis#283

Merged

xylar approved these changes Dec 6, 2017

View reviewed changes

xylar mentioned this pull request Dec 6, 2017

"Submitting jobs" syntax is confusing #16

Closed

xylar changed the title ~~Updated to MPAS-Analysis v0.6~~ Updated to MPAS-Analysis v0.6.5 Dec 6, 2017

milenaveneziani force-pushed the add_mpasAnalysis_v0.6 branch 2 times, most recently from 0cae836 to 349c2a1 Compare December 6, 2017 23:10

milenaveneziani removed the Do not merge (yet) label Dec 7, 2017

xylar mentioned this pull request Dec 8, 2017

support for python 3 #15

Closed

milenaveneziani and others added 3 commits December 8, 2017 11:02

Updated to MPAS-Analysis v0.6.5

772e898

Adds anvil support

8217cde

Updated atm bash scripts to catch exit status of python commands

9cd8b2a

milenaveneziani force-pushed the add_mpasAnalysis_v0.6 branch from ee760b6 to 9cd8b2a Compare December 8, 2017 19:03

milenaveneziani merged commit 9cd8b2a into master Dec 8, 2017

This was referenced Dec 8, 2017

MPAS-Analysis v0.6 #14

Closed

Additional data needed? #13

Closed

When MPAS throws an exception or atm tasks fail, status is set to success #11

Closed

Create and test script to run on Anvil #2

Closed

milenaveneziani deleted the add_mpasAnalysis_v0.6 branch December 8, 2017 19:34

Updated to MPAS-Analysis v0.6.5 #24

Updated to MPAS-Analysis v0.6.5 #24

Uh oh!

Conversation

milenaveneziani commented Dec 5, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

milenaveneziani commented Dec 5, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

milenaveneziani commented Dec 5, 2017

Uh oh!

milenaveneziani commented Dec 5, 2017

Uh oh!

xylar commented Dec 5, 2017

Uh oh!

xylar commented Dec 5, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

milenaveneziani commented Dec 5, 2017

Uh oh!

milenaveneziani commented Dec 5, 2017

Uh oh!

xylar commented Dec 6, 2017

Uh oh!

xylar left a comment

Choose a reason for hiding this comment

Uh oh!

kevans32 commented Dec 6, 2017

Uh oh!

xylar commented Dec 6, 2017

Uh oh!

milenaveneziani commented Dec 6, 2017

Uh oh!

milenaveneziani commented Dec 6, 2017

Uh oh!

xylar commented Dec 6, 2017

Uh oh!

vanroekel commented Dec 6, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

milenaveneziani commented Dec 6, 2017

Uh oh!

milenaveneziani commented Dec 6, 2017

Uh oh!

vanroekel commented Dec 6, 2017 via email

Uh oh!

milenaveneziani commented Dec 7, 2017

Uh oh!

xylar commented Dec 7, 2017

Uh oh!

milenaveneziani commented Dec 7, 2017

Uh oh!

xylar commented Dec 7, 2017

Uh oh!

xylar commented Dec 7, 2017

Uh oh!

milenaveneziani commented Dec 7, 2017

Uh oh!

milenaveneziani commented Dec 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

milenaveneziani commented Dec 5, 2017 •

edited

Loading

xylar commented Dec 5, 2017 •

edited

Loading

vanroekel commented Dec 6, 2017 •

edited

Loading

milenaveneziani commented Dec 8, 2017 •

edited

Loading