CF conventions for time doesn't support years #1467

Open
mangecoeur opened this issue Jun 27, 2017 · 10 comments

Comments

@mangecoeur
Contributor

xarray's CF conventions time-decoding code supports the units {'microseconds': 'us', 'milliseconds': 'ms', 'seconds': 's', 'minutes': 'm', 'hours': 'h', 'days': 'D'}, but not 'years'. See the example file: https://www.dropbox.com/s/34dcpliko928yaj/histsoc_population_0.5deg_1861-2005.nc4?dl=0

@fmaussion
Member

I am not sure I understand what you are asking us to do here. The problem with "years" is that their use is not recommended by the CF conventions.

Very often (and I think your file means it this way), users would like years to be simple "calendar years", i.e. 1901-01-01, 1902-01-01, but this is not what the unit "years" means in the CF conventions; see http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#time-coordinate

@mangecoeur
Contributor Author

I think I do mean 'years' in the CF convention sense, in this case the time dimension is:

double time(time=145);
      :standard_name = "time";
      :units = "years since 1860-1-1 12:00:00";
      :calendar = "proleptic_gregorian";

This is correctly interpreted by the NASA Panoply NetCDF file viewer. From glancing at the xarray code, it seems it depends on the pandas Timedelta object which in turn doesn't support years as delta objects (although date ranges can be generated at year intervals so it should be possible to implement).
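A minimal sketch of the calendar-year interpretation, using pandas date offsets instead of `Timedelta`. The helper name `add_calendar_years` and the sample offsets are assumptions for illustration; they are not part of xarray, and the actual values stored in the file are not shown in this thread:

```python
import pandas as pd


def add_calendar_years(reference, year_offsets):
    """Shift `reference` by whole calendar years (hypothetical helper)."""
    ref = pd.Timestamp(reference)
    # DateOffset(years=n) moves the calendar year and keeps month, day and time.
    return pd.DatetimeIndex([ref + pd.DateOffset(years=int(n)) for n in year_offsets])


# Reference date taken from the units attribute above; the offsets are made-up samples.
times = add_calendar_years("1860-01-01 12:00:00", [1, 2, 145])
print(times)
```

This only handles whole-year offsets and the proleptic Gregorian calendar that pandas uses.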

@fmaussion
Member

> I think I do mean 'years' in the CF convention sense

Can you point to the relevant part of the CF conventions? From the link I read that a year is exactly 365.242198781 days, which would lead to highly unlikely calendar dates.

I agree however that interpreting "years" as being "calendar years" is the only way that makes sense.
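To illustrate the point about unlikely dates, here is a small sketch that adds one fixed-length year of 365.242198781 days to the reference date from the file's units attribute. The helper name `add_cf_years` is hypothetical and pandas is used only for the date arithmetic:

```python
import pandas as pd

# Year length quoted in the comment above.
CF_YEAR_IN_DAYS = 365.242198781


def add_cf_years(reference, n_years):
    """Add n_years fixed-length years to `reference` (hypothetical helper)."""
    return pd.Timestamp(reference) + pd.Timedelta(days=n_years * CF_YEAR_IN_DAYS)


one_year_later = add_cf_years("1860-01-01 12:00:00", 1)
print(one_year_later)
```

Because 1860 is a leap year, the result falls on 31 December 1860 in the late afternoon rather than on 1 January 1861 at 12:00, which is not what someone counting calendar years would expect.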

For the record, netCDF4 also doesn't like "years":

import netCDF4
ds = netCDF4.Dataset('/home/mowglie/Downloads/histsoc_population_0.5deg_1861-2005.nc4')
time = ds.variables['time']
netCDF4.num2date(time[:], units=time.units)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-b38f64c7bce4> in <module>()
      2 ds = netCDF4.Dataset('histsoc_population_0.5deg_1861-2005.nc4')
      3 time = ds.variables['time']
----> 4 netCDF4.num2date(time[:], units=time.units)

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.num2date (netCDF4/_netCDF4.c:66463)()

ValueError: unsupported time units

@benbovy
Member

benbovy commented Jun 28, 2017

Although I'm not a specialist in CF conventions, this issue may be related to Unidata/cftime#5. The forthcoming NetCDFTimeIndex (#1252) uses the netcdftime package. That issue is rather about using common_year with the noleap calendar, though.

@jhamman
Member

jhamman commented Jun 28, 2017

I would think that this sort of feature belongs in netcdftime, not xarray. There are obvious issues with defining what a year (or a month) is, but I imagine we can sort those out.

@matthiasdemuzere

I actually have a similar issue with respect to 'months'. I want to write out my xarray DataArray as a netCDF file, with months as time intervals (one value per month; it doesn't matter which day of the month is used as a reference). As with the 'years' described above, this does not seem to work in the current framework?

@matthiasdemuzere

To construct a netCDF file with a 2D field at monthly resolution (for X number of years), I currently use the code below. Since I do not care about the type of calendar, I just use 360_day, in which each month of the year has 30 days. Perhaps this can be useful for others. If a better solution is available, please let me know!

import numpy as np
import pandas as pd
import xarray as xr

# 51 years, saving first day of each month.
mmhours = np.arange(0,(51*360*24),30*24)
attrs = {'units': 'Hours since 1955-01-01T12:00:00', 'calendar' : '360_day'}

target = np.random.rand(len(mmhours),10,10)
lat = np.arange(50,51,0.1)
lon = np.arange(3,4,0.1)

target_xr = xr.Dataset({'test': (['time', 'lat', 'lon'],  target)},
                       coords={'time': ('time', mmhours, attrs) ,'lat': lat, 'lon': lon})

target_xr.to_netcdf('test.nc', encoding={'test': {'zlib': True}})

@fmaussion
Member

fmaussion commented Aug 1, 2017

Hi Matthias, I think your solution is fine. The best approach is simply to avoid "months" as units altogether.

If one has a "real" calendar one can also let pandas and xarray do the job:

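import numpy as np
import pandas as pd
import xarray as xr

# Monthly time axis on a real calendar: first day of each month.
t = pd.date_range(start='1980-01', end='2010-12', freq='MS')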
target = np.random.rand(len(t), 10, 10)
lat = np.arange(50, 51, 0.1)
lon = np.arange(3, 4, 0.1)
target_xr = xr.Dataset({'test': (['time', 'lat', 'lon'],  target)},
                       coords={'time': ('time', t),
                               'lat': lat, 'lon': lon}
                      )
target_xr.to_netcdf('test_2.nc')

which creates the following time units automatically:

	int64 time(time) ;
		time:units = "days since 1980-01-01 00:00:00" ;
		time:calendar = "proleptic_gregorian" ;

@rabernat
Contributor

rabernat commented Oct 4, 2018

Month unit support in cftime is being discussed in Unidata/cftime#69

Perhaps xarray folks would like to weigh in.

@jbusecke
Contributor

I have run into this problem multiple times. The latest example I found was some [CORE ocean model runs](https://rda.ucar.edu/datasets/ds262.0/index.html#!description).
The time dimension of some of these files (they mix units) is given as

netcdf MRI-A_sigma0_monthly {
dimensions:
	level = 51 ;
	latitude = 368 ;
	longitude = 364 ;
	time = UNLIMITED ; // (720 currently)
	time_bnds = 2 ;
variables:
	double latitude(latitude) ;
		latitude:units = "degrees_north                                                   " ;
		latitude:axis = "Y" ;
	double longitude(longitude) ;
		longitude:units = "degrees_east                                                    " ;
		longitude:axis = "X" ;
	double level(level) ;
		level:units = "m                                                               " ;
		level:axis = "Z" ;
	double time(time) ;
		time:units = "years since 1948-1-1 00:00:00                                   " ;
		time:axis = "T" ;
		time:bounds = "time_bnds" ;
		time:calendar = "noleap" ;
	double time_bnds(time, time_bnds) ;
	float sigma0(time, level, latitude, longitude) ;
		sigma0:units = "kg/m^3          " ;
		sigma0:long_name = "Monthly-mean potential density (sigma-0)                                                                                        " ;
		sigma0:missing_value = -9.99e+33f ;
}

I understand that 'fully' supporting the decoding of this unit is hard and should probably be addressed upstream.

But I think it might be useful to have a utility function that converts a dataset with these units into something quickly usable with xarray.
E.g. one could load the dataset with ds = xr.open_dataset(..., decode_times=False) and then maybe call xr.decode_funky_units(ds, units='calendaryears', ...), which could default to the first day of a year (or the first day of a month for units of "months since").

This way the user is aware that something is not decoded exactly, but can work with the data.
Is this something that people would find useful here? I'd be happy to give an implementation a shot if there is interest.
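A rough sketch of what such a utility might look like, operating on the raw numeric time values one would get with decode_times=False. Everything here is an assumption for illustration: the function name `decode_calendar_years` is hypothetical (it is not an xarray function), fractional years are assumed to mean twelfths of a calendar year, and each value is snapped to the first day of its month. The result uses pandas' proleptic Gregorian timestamps and ignores the file's calendar attribute:

```python
import re

import numpy as np
import pandas as pd


def decode_calendar_years(values, units):
    """Approximate 'years since ...' values as first-of-month timestamps.

    Hypothetical helper: treats 1/12 of a year as one calendar month.
    """
    match = re.search(
        r"years since\s+(\d+)-(\d+)-(\d+)(?:[ T](\d+):(\d+):(\d+))?", units
    )
    if match is None:
        raise ValueError(f"not a 'years since' unit string: {units!r}")
    year, month, day, hour, minute, second = (int(g or 0) for g in match.groups())
    reference = pd.Timestamp(
        year=year, month=month, day=day, hour=hour, minute=minute, second=second
    )
    # Whole number of months elapsed; the small tolerance guards against float error.
    months = np.floor(np.asarray(values, dtype=float) * 12 + 1e-6).astype(int)
    return pd.DatetimeIndex([reference + pd.DateOffset(months=int(m)) for m in months])


# Units string as in the ncdump output above (trailing padding included);
# the numeric values are made-up samples.
decoded = decode_calendar_years(
    [0.0, 0.5, 1.0, 59.0], "years since 1948-1-1 00:00:00      "
)
print(decoded)
```

With a dataset opened via decode_times=False, the returned index could then be assigned back to the time coordinate, so the user knows the times are approximate but can still work with the data.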

7 participants