Remove date limits imposed by Python's datetime module to support BC and far-future dates #26

Closed
jmurty opened this issue Apr 13, 2018 · 1 comment

jmurty commented Apr 13, 2018

We currently rely on Python's built-in datetime module, and its datetime and date classes, to represent and return date/time information, but this module only supports years from 1 to 9999 AD. Python's built-in date/time libraries are simply not designed for representing dates in the distant past, such as any BC dates, or dates in the far future.

Currently we cannot represent "extreme" dates that datetime is unable to handle; instead we bound recognised dates to the datetime.MINYEAR (1) and datetime.MAXYEAR (9999) boundary constants.
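The limit is easy to reproduce with the standard library alone. This is a minimal sketch: the `try_date` and `clamp_year` helpers are hypothetical and only illustrate the kind of bounding described above; they are not python-edtf's actual code.

```python
import datetime

# The year range supported by the standard library.
print(datetime.MINYEAR, datetime.MAXYEAR)  # 1 9999


def try_date(year):
    """Return 1 January of `year` as a date, or None if datetime rejects it."""
    try:
        return datetime.date(year, 1, 1)
    except ValueError:
        return None


def clamp_year(year):
    """Hypothetical helper: bound a year to the range datetime accepts."""
    return max(datetime.MINYEAR, min(datetime.MAXYEAR, year))


bc_date = try_date(-500)              # None, year -500 is out of range
clamped = try_date(clamp_year(-500))  # a valid date, but the real year is lost
```

Clamping keeps the code running but silently replaces the real year with a boundary value, which is the data loss this issue sets out to remove.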

We will update the date parsing and processing in python-edtf to avoid this limitation by removing every use of the datetime module, which imposes the limits. Values that were previously returned as date or datetime objects will instead be returned using Python's much more flexible struct_time representation.

While this change will make it more awkward to work with some of the API, it will allow us to better represent the real data (rather than bounding the supported date ranges) and should be relatively easy for users to convert into richer Python objects like datetimes when possible (1 AD <= year <= 9999 AD) or something like astropy.time for dates outside datetime's supported range.
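As a rough illustration of the conversion mentioned above, a caller could turn a `struct_time` back into a `date` whenever the year allows it. This is a sketch under assumptions: `struct_time_to_date` is a hypothetical helper, not part of python-edtf, and the trailing `0, 0, -1` values are placeholder `tm_wday`, `tm_yday` and `tm_isdst` fields chosen for the example.

```python
import datetime
import time


def struct_time_to_date(st):
    """Convert a struct_time to datetime.date, or return None when the
    year falls outside the range datetime supports."""
    if datetime.MINYEAR <= st.tm_year <= datetime.MAXYEAR:
        return datetime.date(st.tm_year, st.tm_mon, st.tm_mday)
    return None


# struct_time does not validate its fields, so any integer year is accepted.
in_range = time.struct_time((1912, 4, 15, 0, 0, 0, 0, 0, -1))
out_of_range = time.struct_time((-500, 1, 1, 0, 0, 0, 0, 0, -1))

print(struct_time_to_date(in_range))      # 1912-04-15
print(struct_time_to_date(out_of_range))  # None
```

Returning `None` for out-of-range years is only one possible design; a caller might prefer to raise an error or hand the value to a library such as astropy.time.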

WARNING: This is a breaking API change for the following methods of EDTF classes, which will return struct_time objects instead of date or datetime objects:

  • lower_strict()
  • upper_strict()
  • lower_fuzzy()
  • upper_fuzzy()
  • or indeed any features that rely on the underlying _strict_date() methods
@jmurty jmurty self-assigned this Apr 13, 2018
@jmurty jmurty changed the title from "Remove date limits imposed by Python's datetime module" to "Remove date limits imposed by Python's datetime module to support BC and far-future dates" Apr 13, 2018
jmurty added a commit that referenced this issue Apr 13, 2018
Permit processing and representation of years
before 1 AD and after 9999 AD by removing use of
Python's `datetime` and related modules, which are
limited to these years.

This changes the public API so the following
methods return `struct_time` objects instead of
`date` or `datetime` objects: `lower_strict()`,
`upper_strict()`, `lower_fuzzy()`, `upper_fuzzy()`

Details:

- stop using `datetime` modules internally when
  parsing and processing EDTF syntax
- change `_strict_date()` and all dependent
  methods to return `struct_time` instead of
  objects from the `datetime` module.
  This affects public API methods listed above
- remove deliberate coercion of out-of-range year
  values to `date.min` and `date.max` boundaries
- update tests to exercise broader date ranges
jmurty added a commit that referenced this issue Apr 13, 2018
The `struct_time` fields tm_wday, tm_yday, tm_isdst
are not supported by this library because we
cannot reliably generate them for years outside
1 AD to 9999 AD, therefore we ignore these fields
in comparison methods when a `struct_time`
object is provided.
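One way to picture a comparison that ignores those three fields is to compare only the first six elements of each `struct_time`. This is a sketch of the idea, with hypothetical helper names, and not the library's own comparison code.

```python
import time


def comparable_fields(st):
    """Keep year..second; drop tm_wday, tm_yday and tm_isdst."""
    return tuple(st[:6])


def struct_time_equal(a, b):
    """Compare two struct_time values on their date/time fields only."""
    return comparable_fields(a) == comparable_fields(b)


# Same instant, different weekday / day-of-year / DST placeholders.
a = time.struct_time((-44, 3, 15, 0, 0, 0, 0, 0, -1))
b = time.struct_time((-44, 3, 15, 0, 0, 0, 4, 74, 0))

print(a == b)                   # False, plain tuple comparison sees all 9 fields
print(struct_time_equal(a, b))  # True
```

Because the kept fields run from most to least significant, the same six-element tuples can also be used for ordering with `<` and `>`.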
jmurty added a commit that referenced this issue Apr 13, 2018
- explain the switch from `datetime` module objects
  to using `time.struct_time` as responses to
  upper/lower strict/fuzzy methods
- update example code to reflect the change
- remove reference to `date.min` and `date.max`
  restrictions.
jmurty added a commit that referenced this issue Apr 13, 2018
jmurty added a commit that referenced this issue Apr 13, 2018
The change to returning `struct_time` objects from
the lower/upper strict/fuzzy methods broke the
`EDTFField` implementation.

This change fixes it so it can write to derived
`DateField`s again, though this is still subject
to the 1 to 9999 AD year restrictions of Python's
`datetime` module so a better fix would be to
support or require numeric target fields instead.
jmurty added a commit that referenced this issue Apr 16, 2018
jmurty added a commit that referenced this issue Apr 16, 2018
Fixed bug where target field was not properly
looked-up, and therefore not recognised correctly.
jmurty added a commit that referenced this issue Apr 17, 2018
Add the python module `jdutil` with many useful
date conversions, including conversion from
arbitrary dates/times to a numerical
floating-point representation that can be easily
stored in databases and other systems where we
need easy and reliable sorting and comparison of
arbitrary dates.

Note that I chose this module over other library
options available for Python because in my testing
it did the best job of keeping accuracy for
extreme dates (like year +-999,999,999) while also
generating a number that is easier to store
than the two-float representation often used for
Julian dates.

Source: https://gist.github.com/jiffyclub/1294443
Credit to: Matt Davis, https://github.com/jiffyclub
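To show what a single-float Julian Date looks like, here is a standalone sketch of the widely published Meeus conversion from a calendar date to a Julian Date. It is written for illustration and is not copied from `jdutil`; treat the function name and the details as assumptions about how such a conversion can be done.

```python
import math


def date_to_jd(year, month, day):
    """Convert a calendar date to a Julian Date float (Meeus' formula).

    `day` may carry a fraction, e.g. 1.5 means noon on the 1st.
    """
    # Treat January and February as months 13 and 14 of the previous year.
    if month in (1, 2):
        yearp, monthp = year - 1, month + 12
    else:
        yearp, monthp = year, month

    # Gregorian correction applies only from 15 October 1582 onwards.
    if (year, month, day) < (1582, 10, 15):
        b = 0
    else:
        a = math.trunc(yearp / 100.0)
        b = 2 - a + math.trunc(a / 4.0)

    # Negative years need an offset so truncation rounds the right way.
    if yearp < 0:
        c = math.trunc(365.25 * yearp - 0.75)
    else:
        c = math.trunc(365.25 * yearp)

    d = math.trunc(30.6001 * (monthp + 1))
    return b + c + d + day + 1720994.5


print(date_to_jd(2000, 1, 1.5))  # 2451545.0
```

Because every date maps to one number, BC, AD and far-future dates sort correctly with an ordinary numeric comparison.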
jmurty added a commit that referenced this issue Apr 17, 2018
Update `EDTFField` to store derived upper/lower
strict/fuzzy dates as float values when these
fields are defined as `DoubleField` in the target
model.

This change permits representation of arbitrary
dates not limited to 1 to 9999 AD like the
alternative `DateField` target fields.

Note that the stored value is not completely
accurate and, in my testing, may drift by a few
years for negative years in the thousands or
millions – which seems like a reasonable
limitation.
jmurty added a commit that referenced this issue Apr 19, 2018
Unit test time conversion utilities, and fix a
couple of bugs surfaced by tests.
jmurty added a commit that referenced this issue Apr 19, 2018
Use a better name `edtf.convert` instead of
`edtf.utils`
jmurty added a commit that referenced this issue Apr 19, 2018
Add `struct_time_to_jd` and `jd_to_struct_time`
functions to `edtf.convert` module to handle
conversion to and from Julian Date numerical
float values.

These functions greatly simplify Julian Date (JD)
conversions, especially for converting from JD
values where fiddly handling may be required to
convert negative month/day/hour/minute/second
values returned by the `jdutil` module.

jmurty commented May 31, 2018

This work is now merged to master as release 4.0.0
