Skip to content

Conversation

@milenaveneziani
Copy link
Contributor

@milenaveneziani milenaveneziani commented Dec 5, 2017

Integrate MPAS-Analysis v0.6.5, which 1) changes parallel tasks to use one single node, 2) computes all ocean/ice climatologies and process data for all timeseries as two prerequisites tasks before computing any diagnostics, 3) loads data for timeseries using ncrcat instead of xarray, and 4) adds python 3 support to MPAS-Analysis tasks.

Also, addresses some misc issues such as #16, #13 and hopefully most of #11.

Finally, adds anvil support, addressing #2.

@milenaveneziani
Copy link
Contributor Author

This is not ready to be merged, I still need to figure out a couple of things about setting parallelTasks for mpas-analysis. And I need to test in batch mode.

For now, I have tested successfully on the login node on edison (and it was so nice to see how much faster this is with respect to its predecessors!).

echo
echo "Failed ocean/ice diagnostics"
exit 1
fi
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This did not help. One of my tests failed in the streamfunctionMOC and this did not pick it up. So, that means that, if one of the mpas-analysis tasks fail, it will not picked up by this exit status.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@milenaveneziani, could you tell me more about the error that you saw where this was not triggered? Was it in setup_and_check() of a task? If setup_and_check() fails, it doesn't cause MPAS-Analysis to fail because we consider this to potentially be caused by a run that simply isn't configured for a given task, e.g. a melt-rate task in a run without ice-shelf cavities.

According to these lines, we should be exiting with error code 1 if any tasks failed during the run_task() stage:
https://github.com/MPAS-Dev/MPAS-Analysis/blob/v0.6/run_mpas_analysis#L351-L355

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it was the setup_and_check stage, but I may be wrong. Here is the error:

Plotting streamfunction of Meridional Overturning Circulation (MOC)...
*** MOC Analysis Member is on ***
...but not yet supported. Using offline MOC

  Reading region and transect mask for Atlantic...

  Compute and/or plot post-processed MOC climatological streamfunction...
   Read previously computed MOC streamfunction from file...

  Compute and/or plot post-processed Atlantic MOC time series...
   Load data...
   Read in previously computed MOC time series
analysis task streamfunctionMOC failed during run 
Traceback (most recent call last):
  File "/global/u2/m/milena/a-prime/python/MPAS-Analysis/mpas_analysis/shared/analysis_task.py", line 323, in run
    self.run_task()
  File "/global/u2/m/milena/a-prime/python/MPAS-Analysis/mpas_analysis/ocean/streamfunction_moc.py", line 171, in run_task
    dsMOCTimeSeries = self._compute_moc_time_series_postprocess()
  File "/global/u2/m/milena/a-prime/python/MPAS-Analysis/mpas_analysis/ocean/streamfunction_moc.py", line 495, in _compute_moc_time_series_postprocess
    dsMOCIn.year[inIndex].values == years,
  File "/global/homes/m/milena/proj_acme/milena/miniconda2.7/lib/python2.7/site-packages/xarray/core/common.py", line 168, in __getattr__
    (type(self).__name__, name))
AttributeError: 'Dataset' object has no attribute 'year'

it was due to a previous abort of the analysis, so the MOCtimeseries file was created but it had incorrect information in it. The problem went away once I purged that file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@milenaveneziani, were you running in serial mode or in parallel? If in parallel, what does the log file for the parallel job look like? That's where we would potentially see one of the exit statements I linked above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it was in serial mode.

I can try to generate cases with problems and see how this behaves.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearly something to fix on the MPAS-Analysis side and make a quick patch for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for looking into this @xylar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue has (hopefully) been addressed in MPAS-Dev/MPAS-Analysis#283

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your changes here look good. I think what you had before would have worked but this is cleaner and clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I agree with both statements. Since I got confused with identifying the right exit status during testing, I thought of making it cleaner.

@milenaveneziani
Copy link
Contributor Author

I have just submitted a batch job with this branch on edison.

@milenaveneziani
Copy link
Contributor Author

@xylar: I have realized I still need to keep the variable mpas_analysis_tasks in run_aprime, so that we can change the default parallelTask config option when we are on a compute node.

@xylar
Copy link
Contributor

xylar commented Dec 5, 2017

I still need to keep the variable mpas_analysis_tasks in run_aprime, so that we can change the default parallelTask config option when we are on a compute node.

Yep, that makes sense to do. That's fine.

@xylar
Copy link
Contributor

xylar commented Dec 5, 2017

One of my tests failed in the streamfunctionMOC and this did not pick it up.

I can look into this. Perhaps when a test fails we're not exiting with an error from MPAS-Analysis for some reason?

@milenaveneziani
Copy link
Contributor Author

@milenaveneziani
Copy link
Contributor Author

@salilmahajan or @kevans32: could you do a couple of tests at OLCF with this? perhaps a serial one on rhea and a batch-mode test on titan?

@xylar
Copy link
Contributor

xylar commented Dec 6, 2017

@milenaveneziani, I tested on Edison on both a login node and in batch mode using a beta2 run. Everything seems to have worked great and here are the results (from the login-node test, which overwrote the results of the batch mode test):
http://portal.nersc.gov/project/acme/xylar/20170915.beta2.A_WCYCL1850S.ne30_oECv3_ICG.edison_years31-40_vs_obs/

One question: I left the default export generate_moc=0. Is it a problem that the generated webpage still has links for the MOC that are just dead? Or do you want to exclude the MOC from the webpage as well if it's not being generated?

Copy link
Contributor

@xylar xylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of really good small changes here. Everything looks good to me.

@kevans32
Copy link
Contributor

kevans32 commented Dec 6, 2017

I can test for sure, but I am in flight until this aft and Salil is at an all day thing at ORNL today. But I should be able to do this afternoon once I hit the hotel.

@xylar
Copy link
Contributor

xylar commented Dec 6, 2017

Okay @milenaveneziani, I just created MPAS-Analysis release v0.6.5:
https://github.com/MPAS-Dev/MPAS-Analysis/releases/tag/v0.6.5

If you update this branch to point to that release, it should fix the exit status issue you saw (at least for error during the run phase of a given MPAS-Analysis task).

@xylar xylar changed the title Updated to MPAS-Analysis v0.6 Updated to MPAS-Analysis v0.6.5 Dec 6, 2017
@milenaveneziani
Copy link
Contributor Author

great, thanks. I will update this in the next hour.

@milenaveneziani
Copy link
Contributor Author

done.

@vanroekel: you may want to update your branch with this latest commit, which includes an update of the submodule (Luke is working on adding anvil support here).

@salilmahajan: I am thinking we can solve the error handling issue on the atm side too, by doing something similar. We could add lines like these:

exstatus=$?
if $exstatus -ne 0; then
  echo
  echo "Failed $doing-atm-task"
  exit 1
fi

right after a python call in each of the bash atm scripts. What do you think?
I can do that within this PR also.

@xylar
Copy link
Contributor

xylar commented Dec 6, 2017

@milenaveneziani, I'm re-running my tests right now.

I am thinking we can solve the error handling issue on the atm side too, by doing something similar.

What you suggest seems like exactly what would be needed. Presumably there's not anything weird and fancy going on on the atm side like in MPAS-Analysis that would mean errors get caught and don't end up leading to a non-zero exit status.

@vanroekel
Copy link
Contributor

vanroekel commented Dec 6, 2017

@milenaveneziani anvil support should be working now. It has been rebased for the 0.6.5 update and pushed. A question that came up when putting it in. In the script submission section of run_aprime you have one line in the atmosphere section that is

elif [ $machname == "titan" ];

and in the ocean section it is

elif [ $machname == "olcf" ];

Is there a reason for the change? Pinging @salilmahajan as well.

@milenaveneziani
Copy link
Contributor Author

Great, thanks @vanroekel.
Did you have a chance to test this on anvil, either on the login node or in batch?

@milenaveneziani
Copy link
Contributor Author

Ah, that's a mistake! I should have changed both to 'titan'..
I will change it.

@vanroekel
Copy link
Contributor

vanroekel commented Dec 6, 2017 via email

@milenaveneziani milenaveneziani force-pushed the add_mpasAnalysis_v0.6 branch 2 times, most recently from 0cae836 to 349c2a1 Compare December 6, 2017 23:10
@milenaveneziani
Copy link
Contributor Author

I have tested successfully on titan, in both batch mode and on the login node.

@xylar: note that I have added some small changes to generate_html so that, when some ocean/ice diagnostics is not generated, the links do not appear. I am thinking to add a link to the MPAS-Analysis webpage, but that is the only thing missing at this point. We may also leave it for the next PR.

@salilmahajan: can you please check the exit status part for the atm diagnostics bash scripts? Especially the exit messages: not sure if they are helpful enough.

I may want to squash a couple of commits before merging.

@xylar
Copy link
Contributor

xylar commented Dec 7, 2017

@milenaveneziani, looking at the code, I see that the MOC links would still exist even if I didn't include the MOC because there's a template for the HTML for that whole section, not for individual tasks. That's fine as long as a-prime users understand what's going on (i.e. it's the expected behavior). Obviously, this will change if you link to the MPAS-Analysis-generated website instead.

Speaking of which, I think that can wait for another PR. I think you would just need to turn on html generation (currently turned off in the config file) and give a subfolder in the desired HTML output folder as the destination. Then, you could replace the whole ocn/ice part of the HTML with a link to the website. Should be pretty easy (fingers crossed).

@milenaveneziani
Copy link
Contributor Author

Hi @xylar: no, if you look closely there is an extra if statement that allows for the skipping links, MOC included. See an example here:
http://projects.olcf.ornl.gov/acme/milena/20170915.beta2.A_WCYCL1850S.ne30_oECv3_ICG.edison_years1-10_vs_obs/

But I agree that those would be the steps to make the MPAS-Analysis page, except perhaps that the html generation is probably already on by default. So it might be even easier in the end.

@xylar
Copy link
Contributor

xylar commented Dec 7, 2017

@milenaveneziani, ah, I see. I was only looking at the parts you changed but now I see that there's already code to skip MOC. Sorry for the confusion. Looks great!

@xylar
Copy link
Contributor

xylar commented Dec 7, 2017

except perhaps that the html generation is probably already on by default. So it might be even easier in the end.

https://github.com/ACME-Climate/a-prime/blob/master/python/setup_ocnice_config.py#L42-L43

It looks like you set html generation to be off by default, which makes sense if we're not using it. Indeed, I looked for it and didn't see any directory with the MPAS-Analysis html output, so I think it's working as expected right now.

@milenaveneziani
Copy link
Contributor Author

It looks like you set html generation to be off by default

you are right, now I remember what I did.

@milenaveneziani
Copy link
Contributor Author

milenaveneziani commented Dec 8, 2017

Hi @kevans32: did you run your test with the branch in this PR? (wasn't sure from your comment on #15)

In terms of configuration, please use the run information in $proj_cli115/milena/a-prime/run_aprime_mv_titan.bash. Thanks.

@milenaveneziani
Copy link
Contributor Author

@salilmahajan: any chance you can just take a look at the changes I made for the atm scripts and approve or make some changes there?
In terms of testing we should be good, especially after @kevans32 completes hers.
Thanks.

@xylar xylar mentioned this pull request Dec 8, 2017
@kevans32
Copy link
Contributor

kevans32 commented Dec 8, 2017

@salilmahajan can you move a case over to OLCF for testing of this?

@kevans32, I did not copy any ocean output on my space on Lustre when I was testing for the enso diags earlier. I just initiated a transfer of the beta3rc10 archive directory using globus. It will be available here when its done: /lustre/atlas1/cli115/proj-shared/salil/archive/20171122.beta3rc10_1850.ne30_oECv3_ICG.edison

@salilmahajan
Copy link
Contributor

@milenaveneziani: Thanks for taking this on! Yes, the changes to the onscreen output and error handling look good to me.

@milenaveneziani
Copy link
Contributor Author

Thanks @salilmahajan for checking!

@kevans32: you can still use beta2 data for testing this on rhea (worked well for me on titan). Can you please just use the setting I have on my run_aprime (info in my last comment)? thanks.

@salilmahajan
Copy link
Contributor

Also, globus has started copying the latest beta3rc10 run over to OLCF. The files should be here soon:
/lustre/atlas1/cli115/proj-shared/salil/archive/20171122.beta3rc10_1850.ne30_oECv3_ICG.edison/archive

@kevans32
Copy link
Contributor

kevans32 commented Dec 8, 2017

On rhea, some plots were missing. Now running again with atm turned on to get a full accounting of what works and what does not.

@milenaveneziani
Copy link
Contributor Author

On rhea, some plots were missing

also make sure that generate_moc is turned on (we turn that off by default).

@kevans32
Copy link
Contributor

kevans32 commented Dec 8, 2017

Everything from this branch (including MOC ran successfully on rhea that I could see- and I looked at a variety of plots yay!

@milenaveneziani
Copy link
Contributor Author

yay! Thanks @kevans32!

I will merge now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants