diff --git a/CHANGELOG.md b/CHANGELOG.md
index 43faadc..33bb8bf 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -21,6 +21,11 @@
 # Unreleased
 
 * Added Python 3.9 support
+* Switched to using a C implementation of `timezone` objects.
+  * Much faster parse times for timestamps with timezone information
+    * ~3x faster on Python 2.7, 25% faster on Python 3.7
+  * Thanks to [`pendulum`](https://github.com/sdispater/pendulum) and @sdispater for the code.
+  * Python 2.7 users no longer need to install the `pytz` dependency :smiley:
 
 # 2.x.x
 
diff --git a/MANIFEST.in b/MANIFEST.in
index 3ce73c4..1edc9c6 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -1,3 +1,4 @@
 include LICENSE
 include README.rst
 include CHANGELOG.md
+include timezone.h
diff --git a/README.rst b/README.rst
index d38a942..5a0d3ff 100644
--- a/README.rst
+++ b/README.rst
@@ -76,7 +76,7 @@ Parsing a timestamp with no time zone information (ex. ``2014-01-09T21:48:00``):
 
 ..
 
-.. table::
+.. table::
 
     +---------------+----------+----------+----------+----------+----------+-------------------------------+-----------------------------------------------+
     |    Module     |Python 3.8|Python 3.7|Python 3.6|Python 3.5|Python 3.4|          Python 2.7           |Relative Slowdown (versus ciso8601, Python 3.8)|
@@ -118,7 +118,7 @@ Parsing a timestamp with time zone information (ex. ``2014-01-09T21:48:00-05:30`
 
 ..
 
-.. table::
+.. table::
 
     +---------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+----------+-------------------------------+-----------------------------------------------+
     |    Module     |          Python 3.8           |          Python 3.7           |          Python 3.6           |          Python 3.5           |Python 3.4|          Python 2.7           |Relative Slowdown (versus ciso8601, Python 3.8)|
@@ -185,29 +185,6 @@ For full benchmarking details (or to run the benchmark yourself), see `benchmark
 
 .. _`benchmarking/README.rst`: https://github.com/closeio/ciso8601/blob/master/benchmarking/README.rst
 
-Dependency on pytz (Python 2)
------------------------------
-
-In Python 2, ``ciso8601`` uses the `pytz`_ library while parsing timestamps with time zone information. This means that if you wish to parse such timestamps, you must first install ``pytz``:
-
-.. _pytz: http://pytz.sourceforge.net/
-
-.. code:: python
-
-    pip install pytz
-
-Otherwise, ``ciso8601`` will raise an exception when you try to parse a timestamp with time zone information:
-
-.. code:: python
-
-    In [2]: ciso8601.parse_datetime('2014-12-05T12:30:45.123456-05:30')
-    Out[2]: ImportError: Cannot parse a timestamp with time zone information without the pytz dependency. Install it with `pip install pytz`.
-
-``pytz`` is intentionally not an explicit dependency of ``ciso8601``. This is because many users use ``ciso8601`` to parse only naive timestamps, and therefore don't need this extra dependency.
-In Python 3, ``ciso8601`` makes use of the built-in `datetime.timezone`_ class instead, so ``pytz`` is not necessary.
-
-.. _datetime.timezone: https://docs.python.org/3/library/datetime.html#timezone-objects
-
 Supported Subset of ISO 8601
 ----------------------------
 
@@ -227,11 +204,11 @@ The following date formats are supported:
    ``YYYY-MM-DD``                ``2018-04-29`` ✅
    ``YYYY-MM``                   ``2018-04``    ✅
    ``YYYYMMDD``                  ``2018-04``    ✅
-   ``--MM-DD`` (omitted year)    ``--04-29``    ❌
+   ``--MM-DD`` (omitted year)    ``--04-29``    ❌
    ``--MMDD`` (omitted year)     ``--0429``     ❌
-   ``±YYYYY-MM`` (>4 digit year) ``+10000-04``  ❌
-   ``+YYYY-MM`` (leading +)      ``+2018-04``   ❌
-   ``-YYYY-MM`` (negative -)     ``-2018-04``   ❌
+   ``±YYYYY-MM`` (>4 digit year) ``+10000-04``  ❌
+   ``+YYYY-MM`` (leading +)      ``+2018-04``   ❌
+   ``-YYYY-MM`` (negative -)     ``-2018-04``   ❌
    ============================= ============== ==================
 
 Week dates or ordinal dates are not currently supported.
@@ -247,7 +224,7 @@ Week dates or ordinal dates are not currently supported.
    ``YYYY-Www-D`` (week date)    ``2009-W01-1`` ❌
    ``YYYYWwwD`` (week date)      ``2009-W01-1`` ❌
    ``YYYY-DDD`` (ordinal date)   ``1981-095``   ❌
-   ``YYYYDDD`` (ordinal date)    ``1981095``    ❌
+   ``YYYYDDD`` (ordinal date)    ``1981095``    ❌
    ============================= ============== ==================
 
 Time Formats
@@ -264,22 +241,22 @@ The following time formats are supported:
 .. table::
    :widths: auto
 
-   =================================== =================== ==============
-   Format                              Example             Supported
-   =================================== =================== ==============
-   ``hh``                              ``11``              ✅
-   ``hhmm``                            ``1130``            ✅
-   ``hh:mm``                           ``11:30``           ✅
-   ``hhmmss``                          ``113059``          ✅
-   ``hh:mm:ss``                        ``11:30:59``        ✅
-   ``hhmmss.ssssss``                   ``113059.123456``   ✅
-   ``hh:mm:ss.ssssss``                 ``11:30:59.123456`` ✅
-   ``hhmmss,ssssss``                   ``113059,123456``   ✅
-   ``hh:mm:ss,ssssss``                 ``11:30:59,123456`` ✅
-   Midnight (special case)             ``24:00:00``        ✅
-   ``hh.hhh`` (fractional hours)       ``11.5``            ❌
-   ``hh:mm.mmm`` (fractional minutes)  ``11:30.5``         ❌
-   =================================== =================== ==============
+   =================================== =================== ==============
+   Format                              Example             Supported
+   =================================== =================== ==============
+   ``hh``                              ``11``              ✅
+   ``hhmm``                            ``1130``            ✅
+   ``hh:mm``                           ``11:30``           ✅
+   ``hhmmss``                          ``113059``          ✅
+   ``hh:mm:ss``                        ``11:30:59``        ✅
+   ``hhmmss.ssssss``                   ``113059.123456``   ✅
+   ``hh:mm:ss.ssssss``                 ``11:30:59.123456`` ✅
+   ``hhmmss,ssssss``                   ``113059,123456``   ✅
+   ``hh:mm:ss,ssssss``                 ``11:30:59,123456`` ✅
+   Midnight (special case)             ``24:00:00``        ✅
+   ``hh.hhh`` (fractional hours)       ``11.5``            ❌
+   ``hh:mm.mmm`` (fractional minutes)  ``11:30.5``         ❌
+   =================================== =================== ==============
 
 **Note:** Python datetime objects only have microsecond precision (6 digits). Any additional precision will be truncated.
 
@@ -291,9 +268,9 @@ Time zone information may be provided in one of the following formats:
 .. table::
    :widths: auto
 
-   ========== ========== ===========
-   Format     Example    Supported
-   ========== ========== ===========
+   ========== ========== ===========
+   Format     Example    Supported
+   ========== ========== ===========
    ``Z``      ``Z``      ✅
    ``z``      ``z``      ✅
    ``±hh``    ``+11``    ✅
diff --git a/module.c b/module.c
index c9f1677..775d778 100644
--- a/module.c
+++ b/module.c
@@ -1,6 +1,7 @@
 #include
 #include
 #include
+#include "timezone.h"
 
 #define STRINGIZE(x) #x
 #define EXPAND_AND_STRINGIZE(x) STRINGIZE(x)
@@ -11,12 +12,6 @@
     ((PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION >= 3) || PY_MAJOR_VERSION > 3)
 #define PY_VERSION_AT_LEAST_36 \
     ((PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION >= 6) || PY_MAJOR_VERSION > 3)
-#define PY_VERSION_AT_LEAST_37 \
-    ((PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION >= 7) || PY_MAJOR_VERSION > 3)
-
-#if !PY_VERSION_AT_LEAST_37
-static PyObject *fixed_offset;
-#endif
 
 static PyObject *utc;
 
@@ -427,32 +422,34 @@ _parse(PyObject *self, PyObject *args, int parse_any_tzinfo, int rfc3339_only)
             tzminute += 60 * tzhour;
             tzminute *= tzsign;
 
-#if !PY_VERSION_AT_LEAST_32
-            if (fixed_offset == NULL || utc == NULL) {
-                PyErr_SetString(PyExc_ImportError,
-                                "Cannot parse a timestamp with time zone "
-                                "information without the pytz dependency. "
-                                "Install it with `pip install pytz`.");
-                return NULL;
-            }
-#endif
-
             if (tzminute == 0) {
                 tzinfo = utc;
             }
-            else {
-#if PY_VERSION_AT_LEAST_37
-                delta = PyDelta_FromDSU(0, 60 * tzminute, 0);
-                tzinfo = PyTimeZone_FromOffset(delta);
+            else if (abs(tzminute) >= 1440) {
+                /* Format the error message as if we were still using pytz
+                 * for Python 2 and datetime.timezone for Python 3.
+                 * This is done to maintain complete backwards
+                 * compatibility with ciso8601 2.0.x. Perhaps change to a
+                 * simpler message in ciso8601 v3.0.0.
+                 */
+#if PY_MAJOR_VERSION >= 3
+                delta = PyDelta_FromDSU(0, tzminute * 60, 0);
+                PyErr_Format(PyExc_ValueError,
+                             "offset must be a timedelta"
+                             " strictly between -timedelta(hours=24) and"
+                             " timedelta(hours=24),"
+                             " not %R.",
+                             delta);
                 Py_DECREF(delta);
-#elif PY_VERSION_AT_LEAST_32
-                tzinfo = PyObject_CallFunction(
-                    fixed_offset, "N",
-                    PyDelta_FromDSU(0, 60 * tzminute, 0));
 #else
-                tzinfo =
-                    PyObject_CallFunction(fixed_offset, "i", tzminute);
+                PyErr_Format(PyExc_ValueError,
+                             "('absolute offset is too large', %d)",
+                             tzminute);
 #endif
+                return NULL;
+            }
+            else {
+                tzinfo = new_fixed_offset(60 * tzminute);
                 if (tzinfo == NULL) /* ie. PyErr_Occurred() */
                     return NULL;
             }
@@ -542,12 +539,6 @@ PyInit_ciso8601(void)
 initciso8601(void)
 #endif
 {
-#if !PY_VERSION_AT_LEAST_32
-    PyObject *pytz;
-#elif !PY_VERSION_AT_LEAST_37
-    PyObject *datetime;
-#endif
-
 #if PY_MAJOR_VERSION >= 3
     PyObject *module = PyModule_Create(&moduledef);
 #else
@@ -558,28 +549,18 @@ initciso8601(void)
                                EXPAND_AND_STRINGIZE(CISO8601_VERSION));
 
     PyDateTime_IMPORT;
-#if PY_VERSION_AT_LEAST_37
-    utc = PyDateTime_TimeZone_UTC;
-#elif PY_VERSION_AT_LEAST_32
-    datetime = PyImport_ImportModule("datetime");
-    if (datetime == NULL)
-        return NULL;
-    fixed_offset = PyObject_GetAttrString(datetime, "timezone");
-    if (fixed_offset == NULL)
-        return NULL;
-    utc = PyObject_GetAttrString(fixed_offset, "utc");
-    if (utc == NULL)
+
+// PyMODINIT_FUNC returns void in Python 2, PyObject* in Python 3
+#if PY_MAJOR_VERSION >= 3
+    if (initialize_timezone_code(module) < 0)
         return NULL;
 #else
-    pytz = PyImport_ImportModule("pytz");
-    if (pytz == NULL) {
-        PyErr_Clear();
-    }
-    else {
-        fixed_offset = PyObject_GetAttrString(pytz, "FixedOffset");
-        utc = PyObject_GetAttrString(pytz, "UTC");
-    }
+    initialize_timezone_code(module);
 #endif
+
+    utc = new_fixed_offset(0);
+
+// PyMODINIT_FUNC returns void in Python 2, PyObject* in Python 3
 #if PY_MAJOR_VERSION >= 3
     return module;
 #endif
diff --git a/setup.py b/setup.py
index d4818f8..875220a 100644
--- a/setup.py
+++ b/setup.py
@@ -40,7 +40,7 @@
     url="https://github.com/closeio/ciso8601",
     license="MIT",
     ext_modules=[Extension("ciso8601",
-                           sources=["module.c"],
+                           sources=["module.c", "timezone.c"],
                            define_macros=[("CISO8601_VERSION", VERSION)]
                            )],
     packages=["ciso8601"],
diff --git a/tests.py b/tests.py
index bdab5c7..6969fab 100644
--- a/tests.py
+++ b/tests.py
@@ -1,9 +1,12 @@
 # -*- coding: utf-8 -*-
 
+import copy
 import datetime
+import pickle
+import re
 import sys
 
-from ciso8601 import parse_datetime, parse_datetime_as_naive, parse_rfc3339
+from ciso8601 import FixedOffset, parse_datetime, parse_datetime_as_naive, parse_rfc3339
 from generate_test_timestamps import generate_valid_timestamp_and_datetime, generate_invalid_timestamp
 
 if sys.version_info.major == 2:
@@ -275,12 +278,21 @@ def test_invalid_tz_minute(self):
         )
 
     def test_invalid_tz_offsets_too_large(self):
-        # The Python interpreter crashes if you give the datetime constructor a TZ offset with an absolute value >= 1440
-        # TODO: Determine whether these are valid ISO 8601 values and therefore whether ciso8601 should support them.
+        # TZ offsets with an absolute value >= 1440 minutes are not supported by the tzinfo spec.
+        # See https://docs.python.org/3/library/datetime.html#datetime.tzinfo.utcoffset
+
+        # The error message depends on whether we were using pytz or datetime.timezone
+        # (and on the Python version: Python 3.7 changed the repr() of timedelta).
+        # Of course we no longer use either, but for backwards compatibility
+        # with v2.0.x, we did not change the error messages.
+        if sys.version_info.major >= 3:
+            expected_error_message = re.escape("offset must be a timedelta strictly between -timedelta(hours=24) and timedelta(hours=24), not {0}.".format(repr(datetime.timedelta(minutes=-5940))))
+        else:
+            expected_error_message = r"\('absolute offset is too large', -5940\)"
+
         self.assertRaisesRegex(
             ValueError,
-            # Error message differs whether or not we are using pytz or datetime.timezone
-            r"^offset must be a timedelta strictly between" if sys.version_info.major >= 3 else r"\('absolute offset is too large', -5940\)",
+            expected_error_message,
             parse_datetime,
             '2018-01-01T00:00:00.00-99',
         )
@@ -358,6 +370,23 @@ def test_invalid_rfc3339_timestamps(self):
                 parse_rfc3339(timestamp)
 
 
+class PicklingTestCase(unittest.TestCase):
+    # Found as a result of https://github.com/movermeyer/backports.datetime_fromisoformat/issues/12
+    def test_basic_pickle_and_copy(self):
+        dt = parse_datetime('2018-11-01 20:42:09')
+        dt2 = pickle.loads(pickle.dumps(dt))
+        self.assertEqual(dt, dt2)
+        dt3 = copy.deepcopy(dt)
+        self.assertEqual(dt, dt3)
+
+        # FixedOffset
+        dt = parse_datetime('2018-11-01 20:42:09+01:30')
+        dt2 = pickle.loads(pickle.dumps(dt))
+        self.assertEqual(dt, dt2)
+        dt3 = copy.deepcopy(dt)
+        self.assertEqual(dt, dt3)
+
+
 class GithubIssueRegressionTestCase(unittest.TestCase):
     # These are test cases that were provided in GitHub issues submitted to ciso8601.
     # They are kept here as regression tests.
diff --git a/timezone.c b/timezone.c
new file mode 100644
index 0000000..9b7856a
--- /dev/null
+++ b/timezone.c
@@ -0,0 +1,228 @@
+/* This code was originally copied from Pendulum
+(https://github.com/sdispater/pendulum/blob/13ff4a0250177f77e4ff2e7bd1f442d954e66b22/pendulum/parsing/_iso8601.c#L176)
+Pendulum (like ciso8601) is MIT licensed, so we have included a copy of its
+license here.
+*/
+
+/*
+Copyright (c) 2015 Sébastien Eustace
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+*/
+
+#include "timezone.h"
+
+#include
+#include
+#include
+
+#define SECS_PER_MIN 60
+#define SECS_PER_HOUR (60 * SECS_PER_MIN)
+#define TWENTY_FOUR_HOURS_IN_SECONDS 86400
+
+/*
+ * class FixedOffset(tzinfo):
+ */
+typedef struct {
+    // Seconds offset from UTC.
+    // Must be in range (-86400, 86400) seconds exclusive.
+    // i.e. (-1440, 1440) minutes exclusive.
+    PyObject_HEAD int offset;
+} FixedOffset;
+
+/*
+ * def __init__(self, offset):
+ *     self.offset = offset
+ */
+static int
+FixedOffset_init(FixedOffset *self, PyObject *args, PyObject *kwargs)
+{
+    int offset;
+    if (!PyArg_ParseTuple(args, "i", &offset))
+        return -1;
+
+    if (abs(offset) >= TWENTY_FOUR_HOURS_IN_SECONDS) {
+        PyErr_Format(PyExc_ValueError,
+                     "offset must be an integer in the range (-86400, 86400), "
+                     "exclusive");
+        return -1;
+    }
+
+    self->offset = offset;
+    return 0;
+}
+
+/*
+ * def utcoffset(self, dt):
+ *     return timedelta(seconds=self.offset)
+ */
+static PyObject *
+FixedOffset_utcoffset(FixedOffset *self, PyObject *args)
+{
+    return PyDelta_FromDSU(0, self->offset, 0);
+}
+
+/*
+ * def dst(self, dt):
+ *     return timedelta(seconds=self.offset)
+ */
+static PyObject *
+FixedOffset_dst(FixedOffset *self, PyObject *args)
+{
+    return PyDelta_FromDSU(0, self->offset, 0);
+}
+
+/*
+ * def tzname(self, dt):
+ *     sign = '+'
+ *     if self.offset < 0:
+ *         sign = '-'
+ *     return "%s%02d:%02d" % (sign, abs(self.offset) // 3600, abs(self.offset) // 60 % 60)
+ */
+static PyObject *
+FixedOffset_tzname(FixedOffset *self, PyObject *args)
+{
+    char result_tzname[7] = {0};
+    char sign = '+';
+    int offset = self->offset;
+
+    if (offset < 0) {
+        sign = '-';
+        offset *= -1;
+    }
+
+    snprintf(result_tzname, 7, "%c%02u:%02u", sign,
+             (offset / SECS_PER_HOUR) & 31,
+             offset / SECS_PER_MIN % SECS_PER_MIN);
+
+    return PyUnicode_FromString(result_tzname);
+}
+
+/*
+ * def __repr__(self):
+ *     return self.tzname()
+ */
+static PyObject *
+FixedOffset_repr(FixedOffset *self)
+{
+    return FixedOffset_tzname(self, NULL);
+}
+
+/*
+ * def __getinitargs__(self):
+ *     return (self.offset,)
+ */
+static PyObject *
+FixedOffset_getinitargs(FixedOffset *self)
+{
+    PyObject *args = PyTuple_Pack(1, PyLong_FromLong(self->offset));
+    return args;
+}
+
+/*
+ * Class member / class attributes
+ */
+static PyMemberDef FixedOffset_members[] = {
+    {"offset", T_INT, offsetof(FixedOffset, offset), 0, "UTC offset"}, {NULL}};
+
+/*
+ * Class methods
+ */
+static PyMethodDef FixedOffset_methods[] = {
+    {"utcoffset", (PyCFunction)FixedOffset_utcoffset, METH_VARARGS, ""},
+    {"dst", (PyCFunction)FixedOffset_dst, METH_VARARGS, ""},
+    {"tzname", (PyCFunction)FixedOffset_tzname, METH_VARARGS, ""},
+    {"__getinitargs__", (PyCFunction)FixedOffset_getinitargs, METH_VARARGS,
+     ""},
+    {NULL}};
+
+static PyTypeObject FixedOffset_type = {
+    PyVarObject_HEAD_INIT(NULL, 0) "ciso8601.FixedOffset", /* tp_name */
+    sizeof(FixedOffset),                                   /* tp_basicsize */
+    0,                                                     /* tp_itemsize */
+    0,                                                     /* tp_dealloc */
+    0,                                                     /* tp_print */
+    0,                                                     /* tp_getattr */
+    0,                                                     /* tp_setattr */
+    0,                                                     /* tp_as_async */
+    (reprfunc)FixedOffset_repr,                            /* tp_repr */
+    0,                                                     /* tp_as_number */
+    0,                                                     /* tp_as_sequence */
+    0,                                                     /* tp_as_mapping */
+    0,                                                     /* tp_hash */
+    0,                                                     /* tp_call */
+    (reprfunc)FixedOffset_repr,                            /* tp_str */
+    0,                                                     /* tp_getattro */
+    0,                                                     /* tp_setattro */
+    0,                                                     /* tp_as_buffer */
+    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,              /* tp_flags */
+    "TZInfo with fixed offset",                            /* tp_doc */
+};
+
+/*
+ * Instantiate new FixedOffset_type object
+ * Skip overhead of calling PyObject_New and PyObject_Init.
+ * Directly allocate object.
+ * Note that this also doesn't do any validation of the offset parameter.
+ * Callers must ensure that offset is within
+ * the range (-86400, 86400), exclusive.
+ */
+PyObject *
+new_fixed_offset_ex(int offset, PyTypeObject *type)
+{
+    FixedOffset *self = (FixedOffset *)(type->tp_alloc(type, 0));
+
+    if (self != NULL)
+        self->offset = offset;
+
+    return (PyObject *)self;
+}
+
+PyObject *
+new_fixed_offset(int offset)
+{
+    return new_fixed_offset_ex(offset, &FixedOffset_type);
+}
+
+/* ------------------------------------------------------------- */
+
+int
+initialize_timezone_code(PyObject *module)
+{
+    PyDateTime_IMPORT;
+    FixedOffset_type.tp_new = PyType_GenericNew;
+    FixedOffset_type.tp_base = PyDateTimeAPI->TZInfoType;
+    FixedOffset_type.tp_methods = FixedOffset_methods;
+    FixedOffset_type.tp_members = FixedOffset_members;
+    FixedOffset_type.tp_init = (initproc)FixedOffset_init;
+
+    if (PyType_Ready(&FixedOffset_type) < 0)
+        return -1;
+
+    Py_INCREF(&FixedOffset_type);
+    if (PyModule_AddObject(module, "FixedOffset",
+                           (PyObject *)&FixedOffset_type) < 0) {
+        Py_DECREF(module);
+        Py_DECREF(&FixedOffset_type);
+        return -1;
+    }
+
+    return 0;
+}
diff --git a/timezone.h b/timezone.h
new file mode 100644
index 0000000..dd0d829
--- /dev/null
+++ b/timezone.h
@@ -0,0 +1,12 @@
+#ifndef CISO_TZINFO_H
+#define CISO_TZINFO_H
+
+#include
+
+PyObject *
+new_fixed_offset(int offset);
+
+int
+initialize_timezone_code(PyObject *module);
+
+#endif
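To make the `tzinfo` contract behind the C `FixedOffset` type in timezone.c easier to follow, here is a minimal pure-Python model of it. This is a sketch under stated assumptions, not ciso8601 code, and the class name `PyFixedOffset` is hypothetical. Like the C type, it stores the offset in seconds, rejects values outside (-86400, 86400) exclusive, formats `tzname()` as `±HH:MM`, and defines `__getinitargs__`. `tzinfo.__reduce__` uses `__getinitargs__` to rebuild the object, and that is the mechanism the new `PicklingTestCase` exercises through `pickle` and `copy.deepcopy`. One deliberate difference: `FixedOffset_dst` in the diff returns the full offset, while this sketch returns `timedelta(0)`, which is what the `tzinfo.dst()` documentation asks for when DST is not in effect.

```python
import copy
import pickle
from datetime import datetime, timedelta, tzinfo

# Same constants as the #defines in timezone.c.
SECS_PER_MIN = 60
SECS_PER_HOUR = 60 * SECS_PER_MIN
TWENTY_FOUR_HOURS_IN_SECONDS = 86400


class PyFixedOffset(tzinfo):
    """Hypothetical pure-Python model of the C FixedOffset type.

    Illustration only; this class is not part of ciso8601's API.
    """

    def __init__(self, offset):
        # Offset is stored in seconds and must lie strictly inside
        # (-86400, 86400), mirroring the check in FixedOffset_init.
        if abs(offset) >= TWENTY_FOUR_HOURS_IN_SECONDS:
            raise ValueError(
                "offset must be an integer in the range (-86400, 86400), exclusive"
            )
        self.offset = offset

    def utcoffset(self, dt):
        return timedelta(seconds=self.offset)

    def dst(self, dt):
        # Assumption: a fixed offset never observes DST, so report no adjustment.
        # (FixedOffset_dst in timezone.c returns the full offset instead.)
        return timedelta(0)

    def tzname(self, dt):
        # "+HH:MM" / "-HH:MM"; leftover seconds are dropped, as in the C code.
        sign = "-" if self.offset < 0 else "+"
        magnitude = abs(self.offset)
        hours = magnitude // SECS_PER_HOUR
        minutes = magnitude // SECS_PER_MIN % 60
        return "%s%02d:%02d" % (sign, hours, minutes)

    def __repr__(self):
        return self.tzname(None)

    def __getinitargs__(self):
        # tzinfo.__reduce__ passes these arguments back to the constructor,
        # which is what lets pickle and copy.deepcopy rebuild the object.
        return (self.offset,)


if __name__ == "__main__":
    tz = PyFixedOffset(90 * SECS_PER_MIN)  # +01:30
    dt = datetime(2018, 11, 1, 20, 42, 9, tzinfo=tz)
    print(dt.isoformat())  # 2018-11-01T20:42:09+01:30
    print(pickle.loads(pickle.dumps(dt)) == dt, copy.deepcopy(dt) == dt)  # True True
```

Because `__getinitargs__` alone supplies the constructor arguments during unpickling, removing it from this model would make `pickle.loads` call `PyFixedOffset()` with no arguments and fail; the C type's `__init__` likewise requires exactly one integer.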