Add a rule to enforce restrictions on time co-ordinates #15

Open
pont-us opened this issue Jan 10, 2025 · 3 comments

pont-us commented Jan 10, 2025

Is your feature request related to a problem? Please describe.

Currently there are few checks on time co-ordinates. For instance, a plain int64 array with attributes of {"units": "marshmallows"} passes all current checks.

Describe the solution you'd like

First we need to decide what time specifications should be allowed and disallowed.

  • One obvious restriction is that time specifiers should be timezone-aware; it is hard to come up with a plausible use case for a timezone-naive time co-ordinate, and it tends to cause errors when indexing.
  • datetime64[ns] is a special case; it's the standard time type used by xarray, it doesn't contain any explicit timezone information, and using it as input to the pandas.Timestamp constructor gives a timezone-naive timestamp. However, a close reading of the NumPy documentation confirms that datetime64 is defined in terms of the Unix epoch, and therefore has an implicit UTC timezone.
  • When reading a dataset, xarray will automatically decode CF-compliant time specifiers to datetime64[ns] (if possible) or cftime. Should we also support "raw" time co-ordinates with undecoded CF attributes?
  • What should we support outside of standard xarray practice and CF conventions? e.g. the compliant dataset in the xrlint example notebook has an int64 time co-ordinate with the annotation {"units": "years"}, which follows neither xarray nor CF conventions.
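The datetime64[ns] behaviour described in the list above can be checked directly. This is a minimal sketch, assuming NumPy and pandas are installed; the timestamp value is arbitrary and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# An arbitrary example instant, stored the way xarray usually stores time.
t = np.datetime64("2025-01-10T12:00:00", "ns")

# Converting to a pandas.Timestamp yields a timezone-naive object.
ts = pd.Timestamp(t)
is_naive = ts.tzinfo is None

# The underlying integer is a count of nanoseconds since 1970-01-01T00:00:00.
ns_since_epoch = int(t.astype("int64"))

# Reading the naive timestamp as UTC reproduces exactly the same count,
# which is the sense in which the value can be treated as implicitly UTC.
ts_utc = ts.tz_localize("UTC")
same_instant = ts_utc.value == ns_since_epoch

print(is_naive, same_instant)
```

A rule could therefore accept datetime64[ns] co-ordinates as implicitly UTC while still rejecting other timezone-naive representations.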

Describe alternatives you've considered

One alternative would be to decide that time co-ordinates are simply too variable and complex to be checked reliably, and to refrain from writing any rules for them.

Additional context

@pont-us pont-us added the enhancement New feature or request label Jan 10, 2025
@pont-us pont-us self-assigned this Jan 10, 2025

forman commented Jan 10, 2025

@pont-us thanks for the issue :)

> e.g. the compliant dataset in the xrlint example notebook has an int64 time co-ordinate with the annotation {"units": "years"}, which follows neither xarray nor CF conventions.

While it is perfectly valid to have a dimension with years as units (e.g. for climatologies), I actually don't know what the correct unit is. Therefore you could write a rule that checks for "correct" time units! In my case, maybe years since 0000-00-00 00:00:00.0000 is correct?

Please consider:

  • rules can be of type "suggestion" which relaxes their use.
  • we can define multiple rules for different use cases. Plugins can provide predefined configurations, hence we can write dedicated time rules and include them in only some predefined configurations, or in none at all.

forman commented Jan 10, 2025

And consider:

  • You can write rules that are applied only if datasets are opened with decode_cf=False!
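To make the difference concrete, here is a rough sketch of what a rule would see in each case. It builds a small in-memory dataset rather than opening a file, and uses xarray.decode_cf to stand in for the decoding that normally happens on open; the co-ordinate values and the units string are made up for illustration.

```python
import numpy as np
import xarray as xr

# A "raw" time co-ordinate, as it would appear with decode_cf=False:
# plain integers plus a CF-style units attribute (illustrative values).
raw = xr.Dataset(
    coords={
        "time": (
            "time",
            np.array([0, 1, 2], dtype="int64"),
            {"units": "days since 2025-01-01 00:00:00"},
        )
    }
)

# Apply CF decoding explicitly to get the "decoded" view of the same data.
decoded = xr.decode_cf(raw)

# Raw view: integer dtype, units still available as an attribute.
print(raw["time"].dtype, raw["time"].attrs.get("units"))

# Decoded view: datetime dtype, units moved from attrs into encoding.
print(decoded["time"].dtype, decoded["time"].encoding.get("units"))
```

A rule that inspects the units attribute would therefore need to look in attrs for undecoded datasets and in encoding for decoded ones.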

forman commented Jan 10, 2025

I vote for

  • rule 1: that checks if the time coord var was properly decoded and whether its unit attr follows the CF pattern "<unit> since <reference time>",
  • rule 2: that checks that time specifiers are timezone-aware
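The units check in rule 1 could be sketched roughly as follows. This is not written against the xrlint rule API; `looks_like_cf_time_units` and the regular expression are hypothetical names and assumptions introduced here, and the pattern only tests the overall shape of the string.

```python
import re

# Assumed shape of a CF time units string: "<unit> since <reference time>".
# Only the overall shape is tested; neither the unit name nor the
# reference time is validated.
CF_TIME_UNITS = re.compile(
    r"^\s*(?P<unit>[A-Za-z_]+)\s+since\s+(?P<reference>\S.*?)\s*$"
)


def looks_like_cf_time_units(units):
    """Return True if `units` has the form '<unit> since <reference time>'."""
    return isinstance(units, str) and CF_TIME_UNITS.match(units) is not None


# Units strings mentioned in this thread, plus a conventional CF example.
for candidate in ["days since 1970-01-01 00:00:00", "marshmallows", "years", None]:
    print(repr(candidate), looks_like_cf_time_units(candidate))
```

Because only the shape is tested, a string with an implausible reference time would still pass; validating the unit name and the reference time would need further checks.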
