Conversation

@xylar xylar commented Nov 21, 2017

This is accomplished through a new task, MpasTimeSeriesTask, which allows other tasks to add variables and then extracts the requested variables with a single call to ncrcat (per component). All time-series tasks except MOC have been updated to use this functionality.

For the MOC time series, monthly mean files are being opened one by one as separate data sets (saving the trouble of creating a combined data set only to break it back up into time slices).

All MPAS data sets are now opened with open_mpas_dataset, which parses time from xtime or xtime_startMonthly and xtime_endMonthly but doesn't try to combine multiple files. This seems to spare us from many of the problems we've run into with dask in the past.
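For illustration, the time parsing might look roughly like this minimal sketch (the helper name and signature are assumptions; the real open_mpas_dataset also handles netCDF character arrays, averaged start/end times, and MPAS calendars):

```python
from datetime import datetime

def parse_xtime(xtime_strings):
    """Parse MPAS xtime strings such as '0001-01-01_00:00:00' into datetimes.

    Hypothetical helper sketching the kind of parsing open_mpas_dataset
    performs; illustrative only, not the actual MPAS-Analysis code.
    """
    times = []
    for entry in xtime_strings:
        # xtime entries are fixed-width and may be padded with spaces
        date_part, time_part = entry.strip().split('_')
        year, month, day = (int(v) for v in date_part.split('-'))
        hour, minute, second = (int(v) for v in time_part.split(':'))
        times.append(datetime(year, month, day, hour, minute, second))
    return times
```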

@xylar xylar self-assigned this Nov 21, 2017
xylar commented Nov 21, 2017

@milenaveneziani, this should be the last PR for v0.6. (Unless, of course, there are bug fixes.) I will do some testing today and post the results along with a suggestion for a test you can run as a sanity check.

xylar commented Nov 21, 2017

Should address #225

xylar commented Nov 21, 2017

I discovered a bug in testing this branch, #273. While the bug isn't directly related to this PR, I think it should be fixed before we continue with testing, since it's hard to tell if this branch is behaving properly until we fix the other bug.

xylar commented Nov 21, 2017

Testing

Ubuntu

Runs successfully

Edison

Batch mode: memory error during MOC

Anvil

Batch mode and login node: memory error during MOC

Grizzly

Batch mode: memory error during MOC

Theta

Batch mode: run did not finish in allotted hour

xylar commented Nov 21, 2017

MOC Error:

> cat /scratch1/scratchdirs/xylar/analysis/beta1/ncrcat/logs/streamfunctionMOC.log

Plotting streamfunction of Meridional Overturning Circulation (MOC)...

  Reading region and transect mask for Atlantic...

  Compute and/or plot post-processed MOC climatological streamfunction...
   Load data...
   Compute Atlantic MOC...
    Compute transport through region southern transect...
   Compute Global MOC...
    Compute transport through region southern transect...
   Save global and regional MOC to file...

  Compute and/or plot post-processed Atlantic MOC time series...
   Load data...
     date: 0001-01
/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/dask/core.py:306: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  elif type_arg is type(key) and arg == key:
/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/dask/core.py:306: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  elif type_arg is type(key) and arg == key:
analysis task streamfunctionMOC failed during run 
Traceback (most recent call last):
  File "/global/u2/x/xylar/mpas_work/analysis/switch_to_ncrcat/mpas_analysis/shared/analysis_task.py", line 315, in run
    self.run_task()
  File "/global/u2/x/xylar/mpas_work/analysis/switch_to_ncrcat/mpas_analysis/ocean/streamfunction_moc.py", line 178, in run_task
    dsMOCTimeSeries = self._compute_moc_time_series_postprocess()
  File "/global/u2/x/xylar/mpas_work/analysis/switch_to_ncrcat/mpas_analysis/ocean/streamfunction_moc.py", line 507, in _compute_moc_time_series_postprocess
    dvEdge, refLayerThickness, latAtlantic, regionCellMask)
  File "/global/u2/x/xylar/mpas_work/analysis/switch_to_ncrcat/mpas_analysis/ocean/streamfunction_moc.py", line 529, in _compute_moc_time_series
    horizontalVel = dsLocal.timeMonthly_avg_normalVelocity.values
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/xarray/core/dataarray.py", line 403, in values
    return self.variable.values
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/xarray/core/variable.py", line 329, in values
    return _as_array_or_item(self._data)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/xarray/core/variable.py", line 205, in _as_array_or_item
    data = np.asarray(data)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/numpy/core/numeric.py", line 531, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/dask/array/core.py", line 1092, in __array__
    x = self.compute()
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/dask/base.py", line 99, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/dask/base.py", line 206, in compute
    results = get(dsk, keys, **kwargs)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/dask/threaded.py", line 75, in get
    pack_exception=pack_exception, **kwargs)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/dask/local.py", line 521, in get_async
    raise_exception(exc, tb)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/dask/local.py", line 290, in execute_task
    result = _execute_task(task, data)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/dask/local.py", line 270, in _execute_task
    args2 = [_execute_task(a, cache) for a in args]
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/dask/local.py", line 270, in _execute_task
    args2 = [_execute_task(a, cache) for a in args]
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/dask/local.py", line 271, in _execute_task
    return func(*args2)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/dask/array/core.py", line 56, in getter
    c = np.asarray(c)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/numpy/core/numeric.py", line 531, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/xarray/core/indexing.py", line 408, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/numpy/core/numeric.py", line 531, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/xarray/core/indexing.py", line 375, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/numpy/core/numeric.py", line 531, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/xarray/core/indexing.py", line 375, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison_acme_unified_2017.9.26/lib/python2.7/site-packages/xarray/backends/netCDF4_.py", line 60, in __getitem__
    data = getitem(self.get_array(), key)
  File "netCDF4/_netCDF4.pyx", line 3804, in netCDF4._netCDF4.Variable.__getitem__
MemoryError

xylar commented Nov 21, 2017

I'm going to see if switching from open_multifile_dataset to open_mpas_dataset makes a difference. Hopefully this avoids dask and things will work.

@milenaveneziani

If that doesn't work either, we may have to load the previously computed annual averages, instead of the monthly means from the timeSeriesStats files. That would do for a post-processed MOC timeseries, I think. And hopefully it will be less likely that we run into memory errors.

xylar commented Nov 21, 2017

Okay, so it's technically working but it seems like a giant waste of time to first copy GB upon GB of data (164 GB for the example on Edison I'm working with, and that's only 22 years of MOC at EC60to30 res.) before the processing can even begin. It would probably make more sense to read directly from each monthly file in the particular case of the MOC time series. I'll look into it...
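The per-file idea can be sketched like this (hypothetical helper names, not the actual MPAS-Analysis code), keeping only one month's data in memory at a time:

```python
def compute_time_series_per_file(monthly_files, open_dataset, compute_point):
    """Compute one time-series value per monthly file.

    Each file is opened and processed separately, so memory use stays
    bounded by a single month rather than a full concatenated series.
    Illustrative sketch only.
    """
    points = []
    for path in sorted(monthly_files):
        ds = open_dataset(path)      # open only this month's file
        points.append(compute_point(ds))
    return points
```

For the MOC, compute_point would evaluate the streamfunction for that month and the dataset could be closed before the next file is opened.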

xylar commented Nov 21, 2017

For what it's worth, it worked...

@xylar xylar removed the in progress label Nov 21, 2017
xylar commented Nov 21, 2017

Testing Update

Ubuntu

Runs successfully

Edison

Batch mode (all) and login node (MOC only): runs successfully

Anvil

Batch mode: runs successfully

Grizzly

Batch mode: runs successfully

Theta

Batch mode: errors in several tasks. However, Theta seems to be having issues or to be down for maintenance, so I am not convinced the errors are related to this PR.

xylar commented Nov 21, 2017

@milenaveneziani, if you want to run a test of your own, please do so. I think this is finally ready to merge.

xylar commented Nov 21, 2017

Oops, some more debugging to do. If I rerun without the --purge flag I get an error because it tries to run ncrcat with no input files. Should be easy to fix...
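A guard along these lines would avoid invoking ncrcat when no new input files remain (an illustrative sketch, with assumed names and arguments, not the actual fix):

```python
import subprocess

def extract_variables(input_files, variables, output_file):
    """Concatenate the requested variables from input_files with ncrcat.

    Skips the call entirely when input_files is empty; running ncrcat
    with no inputs (e.g. on a rerun without --purge, when everything
    has already been extracted) is an error. Hypothetical sketch only.
    """
    if not input_files:
        return False  # nothing new to extract
    command = ['ncrcat', '-v', ','.join(variables)] + \
        list(input_files) + [output_file]
    subprocess.check_call(command)
    return True
```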

xylar commented Nov 21, 2017

I think I have the error fixed but some more testing is needed. Probably will have to wait until tomorrow.

@milenaveneziani

I will do some testing of this tonight. Thanks @xylar.

xylar commented Nov 21, 2017

Okay, that would be good. I have a feeling we are likely to expose some more bugs by doing further testing. Better sooner than later.

xylar commented Nov 22, 2017

The theta run worked in the end, though it required 1 hour as a batch script (the maximum allowed on a single node) followed by a run on the login node to finish up.

All my tests have now passed, including re-running an analysis job without the --purge flag.

@milenaveneziani

I have just started a test on titan.

Quick question: how many years did you ask for when computing time series, in your various tests?

xylar commented Nov 22, 2017

I tested time series of the following lengths:

  • Laptop: 5 years
  • Anvil: 22 years
  • Edison: 22 years
  • Grizzly: 10 years
  • Theta: 26 years

@milenaveneziani

@xylar: unfortunately the two subsequent tests I did today did not work as desired. Here is a summary.

I used the same beta2 run as before, but saved output in a different directory and computed only the timeseries tasks (in my previous tests I was computing everything). My first test had year_start, year_end = 1, 10 and all went well as expected. Then, I did two more tests without purging:

  1. chose year_start, year_end = 5, 15
    Output started with this:
Computing MPAS time series from files:
    mpaso.hist.am.timeSeriesStatsMonthly.0005-01-01.nc through
    mpaso.hist.am.timeSeriesStatsMonthly.0015-12-01.nc
Execution time: 0:00:00.03

and I already knew something was wrong because it did not go through ncrcat.
Then I got an error at the OHC task:

Plotting OHC time series and T, S, and OHC vertical trends...
  Read in depth and compute specific depth indexes...
  Load ocean data...
analysis task timeSeriesOHC failed during run 
Traceback (most recent call last):
  File "/global/u2/m/milena/MPAS-git-repositories/MPAS-Analysis/mpas_analysis/shared/analysis_task.py", line 315, in run
    self.run_task()
  File "/global/u2/m/milena/MPAS-git-repositories/MPAS-Analysis/mpas_analysis/ocean/time_series_ohc.py", line 239, in run_task
    dsFirstYear.rename(renameDict, inplace=True)
  File "/global/homes/m/milena/proj_acme/milena/miniconda2.7/lib/python2.7/site-packages/xarray/core/dataset.py", line 1530, in rename
    "variable or dimension in this dataset" % k)
ValueError: cannot rename 'timeMonthly_avg_avgValueWithinOceanLayerRegion_avgLayerThickness' because it is not a variable or dimension in this dataset

I didn't get the same error for the SST timeseries, but I believe nothing happened there (the timeseries was not updated to go from year 5 to 15). After this, I canceled the test.

  2. asked for year_start, year_end = 1, 15
    In this case, I did not get any error in OHC, but none of the timeseries plots were updated from the previous years=1-10 run, probably because ncrcat was not called again in this case either.
    The only timeseries that are indeed updated are the MOC timeseries, I guess because they are unrelated to the file created by ncrcat.

I am not sure what causes the OHC error in 1), but I am thinking these issues are all related to the fact that, for some reason, ncrcat is not called again after the very first run.

I wonder what happens if I choose a period that does not overlap at all with the first one. I will try this case now.

@milenaveneziani

ok, so, this last test was for year_start, year_end = 11, 20, i.e. no overlap with previous run.
Also this time, in the timeseries task, it appears that the ncrcat command is not invoked, and then I get an error in all mpas-o and mpas-cice time series tasks (OHC, SST, etc.) except the MOC timeseries, saying that:
ValueError: The data set contains no Time entries between dates 0011-01-01 00:00:00 and 0020-12-31 23:59:59.
so, clearly, this has to do with ncrcat not being invoked after the first time around.

xylar added 10 commits November 23, 2017 07:46
This is accomplished through a new task, MpasTimeSeriesTask,
which allows other tasks to add variables and then extracts
requested variables with a single call to ncrcat (per component).

All time-series tasks except MOC have been updated to use this
functionality.  It may also make sense to use this functionality
for MOC but, since it requires substantially more data than other
tasks, this has been left out for the time being.
This is because we don't want to slow down other tasks with the
potentially very large time series that needs to be extracted
for MOC.
This is an alternative that opens a single data file but still
parses the MPAS time variable as before and can subset the Time
dimension using a start and end date.
Some observational and pre-processed datasets still use the old
generalized_reader to open multiple files and process the xtime
variable.
Instead, opening each monthly file one at a time to compute each
data point in the MOC time series.
Fix a bug in how the final date is extracted
xylar commented Nov 23, 2017

Thanks @milenaveneziani, I will try all of these things myself and see if I can make some progress.

xylar commented Nov 23, 2017

Enjoy your holiday!

The first year is now always included in the time series, allowing
for anomalies to be computed.

A bug in computing which dates were already present in the
time-series file has been fixed (adding lists behaves very
differently from adding numpy arrays!).
This merge fixes a bug in renaming variables in the first-year
data set if one is created.  Previously, not all variables in the
renameDict were included in the data set (since not all are used)
but now all are kept for simplicity.
xylar commented Nov 23, 2017

@milenaveneziani, I don't expect you to work on this at all over the long weekend. I just wanted to let you know that I think I have fixed the 3 issues your testing showed. Here is a summary:

  • The OHC computation had a bug when year 0001 was not the start year of the time series. It created a data set for year 0001 but it didn't rename the variables correctly. This is unrelated to the rest of this PR but has been fixed here because it seems like overkill to have a separate PR for a very simple fix.
  • Year 0001 was not always being included in the time-series data. This would mess up the OHC and any other computation based on anomalies for obvious reasons. This has now been fixed.
  • A dumb bug (adding lists instead of numpy.arrays) prevented the code for finding missing dates in the time series data from working properly. This has now been fixed.
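The list-versus-array pitfall in the last item is easy to reproduce: `+` concatenates Python lists but adds numpy arrays elementwise, so the same expression computes two very different things.

```python
import numpy as np

# '+' on lists concatenates them...
listResult = [1, 2] + [10, 20]

# ...while '+' on numpy arrays adds them elementwise
arrayResult = np.array([1, 2]) + np.array([10, 20])
```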

I'm sorry for not doing a better job of testing my own code and for making you discover errors that I should have uncovered myself. Thanks again for your help.

xylar commented Nov 23, 2017

By the way, because of the way ncrcat works, you can change the range of dates for time series to include more dates at the end (and cut off dates you previously used at the beginning) but you can't start with later dates and move to earlier dates. This is because we can append new times but we can't insert missing times somewhere in the middle of the data set. I don't think that's an issue, since we wouldn't typically be skipping around like that (unless we were purging) but I just wanted to make sure that was clear.
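In other words, the cached series can only be extended at the end. A check along these lines captures when the cache is reusable (a hypothetical sketch with comparable date values, not the actual code):

```python
def cache_is_reusable(cached_dates, requested_dates):
    """Return True if the requested dates can be served by the cached
    time series, possibly appending new records at the end.

    ncrcat can append later times but cannot insert missing times in
    the middle of the data set, so any requested date absent from the
    cache must come after all cached dates. Illustrative only.
    """
    if not cached_dates:
        return True  # nothing cached yet; compute from scratch
    missing = [d for d in requested_dates if d not in cached_dates]
    return not missing or min(missing) > max(cached_dates)
```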

@milenaveneziani

@xylar: I re-did the 3 tests listed above and this time all went well. Great!

Only one question left for me: the MOC time series do not remember previous runs at all, even if I ask for the exact same number of years. Is that to avoid memory problems with ncrcat?
But before, it could at least check whether the MOC timeseries file existed, and if it did, it would be very fast.

One other thing: I haven't tested in batch mode at all. Let me know if you want me to do one such test on some machine.

xylar commented Nov 24, 2017

@milenaveneziani, thanks for re-testing!

Ah, I'm glad you noticed that problem with the MOC. That should be easy to fix. When I switched to using ncrcat for the MOC data I must have messed up the check for the stored time-series data.

I think it's perfectly fine to check in batch mode. Presumably we want to do both login-node and batch-mode testing from time to time but I don't think it's necessary for every PR.

Cache the MOC time series to an output file and only compute
time entries that are not already cached.

Also, some very minor PEP8 clean-up in an unrelated file.
xylar commented Nov 24, 2017

@milenaveneziani, could you run a test where you look at MOC time-series caching again? You can just run with --generate=streamfunctionMOC from the login node to save time. The test I did was:

  • run the MOC with time series covering years 1-3
  • rerun the MOC (without purge) to cover years 1-5. Only years 4 and 5 were computed in the time series.
  • rerun the MOC (without purge) to cover years 1-5 once again. No new values were computed. I didn't check but the cache file should also not get rewritten.

Hopefully, you see the same kind of behavior.

@milenaveneziani

@xylar: thanks for the latest changes.
I did a similar test to yours with streamfunctionMOC and everything worked as expected.
I also tested a batch job on edison. All good.

I'd say you can merge this.

xylar commented Nov 25, 2017

@milenaveneziani, thank you so much for doing this testing over the holiday!

@xylar xylar merged commit 6fe2d28 into MPAS-Dev:develop Nov 25, 2017
@xylar xylar deleted the switch_to_ncrcat branch November 25, 2017 08:37
xylar added a commit that referenced this pull request Jan 1, 2018
Clean up the nino34 index task

This clean-up is needed to generalize the task in preparation for supporting comparison with a reference run (which will replace one of the panels).

A new task has been added to extract time series needed by climate indices (which may have different time bounds than time series). Support for separate time bounds was eliminated, perhaps accidentally, in #271.

Nino34 spectra are now packed into dictionaries for easier iteration and panels are plotted in loops.

The function for determining the maximum value of all spectra (to determine the bounds of the y axes of these plots) has been cleaned up so it no longer relies on a plot already having been performed.

All warnings previously produced with warnings.warn have been switched to just using print('Warning:..') because this produces much more intuitive output without a (nearly always useless) stack trace.