Add parse_rfc3339 function (#70)
* Add parse_rfc3339 function

Resolves #67

* Comma is not a valid RFC 3339 fractional seconds separator.

* Added section to README describing the new method.

* Added typing information for the new method.
ExplodingCabbage authored and movermeyer committed Jul 23, 2018
1 parent 0108875 commit 5cfd346
Showing 4 changed files with 124 additions and 6 deletions.
18 changes: 16 additions & 2 deletions README.rst
@@ -12,10 +12,12 @@ ciso8601
:target: https://pypi.org/project/ciso8601/

``ciso8601`` converts `ISO 8601`_ date time strings into Python datetime objects.

Since it's written as a C module, it is much faster than other Python libraries.
Tested with Python 2.7, 3.4, 3.5, 3.6, 3.7b.

.. _ISO 8601: https://en.wikipedia.org/wiki/ISO_8601
.. _RFC 3339: https://tools.ietf.org/html/rfc3339

(Interested in working on projects like this? `Close.io`_ is looking for `great engineers`_ to join our team)

@@ -208,9 +210,9 @@ Time Formats

Times are optional and are separated from the date by the letter ``T``.

Consistent with `RFC 3339`_, ``ciso860`` also allows either a space character, or a lower-case ``t``, to be used instead of a ``T``.
Consistent with `RFC 3339`__, ``ciso860`` also allows either a space character, or a lower-case ``t``, to be used instead of a ``T``.

.. _RFC 3339: https://stackoverflow.com/questions/522251/whats-the-difference-between-iso-8601-and-rfc-3339-date-formats)
__ https://stackoverflow.com/questions/522251/whats-the-difference-between-iso-8601-and-rfc-3339-date-formats

The following time formats are supported:

@@ -258,6 +260,18 @@ While the ISO 8601 specification allows the use of MINUS SIGN (U+2212) in the ti

Consistent with `RFC 3339`_, ``ciso860`` also allows a lower-case ``z`` to be used instead of a ``Z``.

Strict RFC 3339 Parsing
-----------------------

``ciso8601`` parses ISO 8601 datetimes, which can be thought of as a superset of `RFC 3339`_ (`roughly`_). In cases where you might want strict RFC 3339 parsing, ``ciso8601`` offers a ``parse_rfc3339`` method, which behaves in a similar manner to ``parse_datetime``:

.. _roughly: https://stackoverflow.com/questions/522251/whats-the-difference-between-iso-8601-and-rfc-3339-date-formats

``parse_rfc3339(dt: String): datetime`` is a function that takes a string and either:

* Returns a properly parsed Python datetime, **if and only if** the **entire** string conforms to RFC 3339.
* Raises a ``ValueError`` with a description of the reason why the string doesn't conform to RFC 3339.
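
The rules that ``parse_rfc3339`` enforces can be approximated in pure Python. The sketch below is illustrative only and is not the C implementation from ``module.c``: the name ``parse_rfc3339_sketch`` and the regular expression are hypothetical, and truncating fractional seconds to microsecond precision is an assumption of this sketch.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical approximation of the strict checks; not ciso8601's own code.
_RFC3339 = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"         # date, '-' separators mandatory
    r"[Tt ]"                                    # 'T', 't', or a space
    r"([0-9]{2}):([0-9]{2}):([0-9]{2})"         # time, ':' separators mandatory
    r"(?:\.([0-9]+))?"                          # optional fraction, '.' only
    r"(?:([Zz])|([+-])([0-9]{2}):([0-9]{2}))"   # UTC offset is mandatory
)


def parse_rfc3339_sketch(datetime_string):
    # fullmatch ensures the entire string conforms, not just a prefix.
    m = _RFC3339.fullmatch(datetime_string)
    if m is None:
        raise ValueError("Datetime string not in RFC 3339 format.")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    if hour == 24:
        raise ValueError("An hour value of 24 is forbidden by RFC 3339.")
    # Assumption: keep at most six fractional digits (microseconds).
    fraction = m.group(7) or ""
    usecond = int(fraction[:6].ljust(6, "0")) if fraction else 0
    if m.group(8):
        tzinfo = timezone.utc
    else:
        sign = -1 if m.group(9) == "-" else 1
        offset = timedelta(hours=int(m.group(10)), minutes=int(m.group(11)))
        tzinfo = timezone(sign * offset)
    return datetime(year, month, day, hour, minute, second, usecond,
                    tzinfo=tzinfo)
```

With this sketch, ``parse_rfc3339_sketch("2018-01-02T03:04:05Z")`` returns an aware datetime, while inputs such as ``"2018-01-02"`` or ``"2018-01-02T03:04:05,12345Z"`` raise a ``ValueError`` that mentions RFC 3339.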

Ignoring Timezone Information While Parsing
-------------------------------------------

1 change: 1 addition & 0 deletions ciso8601/__init__.pyi
@@ -1,4 +1,5 @@
from datetime import datetime

def parse_datetime(datetime_string: str) -> datetime: ...
def parse_rfc3339(datetime_string: str) -> datetime: ...
def parse_datetime_as_naive(datetime_string: str) -> datetime: ...
66 changes: 62 additions & 4 deletions module.c
@@ -80,10 +80,10 @@ format_unexpected_character_exception(char *field_name, char c, size_t index,
#define IS_TIME_SEPARATOR (*c == ':')
#define IS_TIME_ZONE_SEPARATOR \
(*c == 'Z' || *c == '-' || *c == '+' || *c == 'z')
#define IS_FRACTIONAL_SEPARATOR (*c == '.' || *c == ',')
#define IS_FRACTIONAL_SEPARATOR (*c == '.' || (*c == ',' && !rfc3339_only))

static PyObject *
_parse(PyObject *self, PyObject *args, int parse_any_tzinfo)
_parse(PyObject *self, PyObject *args, int parse_any_tzinfo, int rfc3339_only)
{
PyObject *obj;
PyObject *tzinfo = Py_None;
@@ -131,10 +131,20 @@ _parse(PyObject *self, PyObject *args, int parse_any_tzinfo)
/* Day */
PARSE_INTEGER(day, 2, "day")
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"Datetime string not in RFC 3339 format.");
return NULL;
}
else {
day = 1;
}
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"Datetime string not in RFC 3339 format.");
return NULL;
}
else { /* Non-separated Month and Day (ie. MMDD) */
/* Month */
PARSE_INTEGER(month, 2, "month")
@@ -234,6 +244,18 @@ _parse(PyObject *self, PyObject *args, int parse_any_tzinfo)
PARSE_FRACTIONAL_SECOND()
}
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"RFC 3339 requires the second to be "
"specified.");
return NULL;
}
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"Colons separating time components are "
"mandatory in RFC 3339.");
return NULL;
}
else { /* Non-separated Minute and Second (ie. mmss) */
/* Minute */
@@ -251,11 +273,23 @@ _parse(PyObject *self, PyObject *args, int parse_any_tzinfo)
}
}
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"Minute and second are mandatory in RFC 3339");
return NULL;
}

if (hour == 24 && minute == 0 && second == 0 && usecond == 0) {
/* Special case of 24:00:00, that is allowed in ISO 8601. It is
* equivalent to 00:00:00 the following day
*/
if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"An hour value of 24, while sometimes legal "
"in ISO 8601, is explicitly forbidden by RFC "
"3339.");
return NULL;
}
hour = 0, minute = 0, second = 0, usecond = 0;
time_is_midnight = 1;
}
@@ -298,6 +332,12 @@ _parse(PyObject *self, PyObject *args, int parse_any_tzinfo)
/* tz minute */
PARSE_INTEGER(tzminute, 2, "tz minute")
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"Separator between hour and minute in UTC "
"offset is mandatory in RFC 3339");
return NULL;
}
else if (*c != '\0') { /* Optional tz minute */
PARSE_INTEGER(tzminute, 2, "tz minute")
}
@@ -348,6 +388,16 @@ _parse(PyObject *self, PyObject *args, int parse_any_tzinfo)
}
}
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"UTC offset is mandatory in RFC 3339 format.");
return NULL;
}
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"Time is mandatory in RFC 3339 format.");
return NULL;
}

/* Make sure that there is no more to parse. */
@@ -377,20 +427,28 @@ _parse(PyObject *self, PyObject *args, int parse_any_tzinfo)
static PyObject *
parse_datetime_as_naive(PyObject *self, PyObject *args)
{
return _parse(self, args, 0);
return _parse(self, args, 0, 0);
}

static PyObject *
parse_datetime(PyObject *self, PyObject *args)
{
return _parse(self, args, 1);
return _parse(self, args, 1, 0);
}

static PyObject *
parse_rfc3339(PyObject *self, PyObject *args)
{
return _parse(self, args, 1, 1);
}

static PyMethodDef CISO8601Methods[] = {
{"parse_datetime", parse_datetime, METH_VARARGS,
"Parse a ISO8601 date time string."},
{"parse_datetime_as_naive", parse_datetime_as_naive, METH_VARARGS,
"Parse a ISO8601 date time string, ignoring the time zone component."},
{"parse_rfc3339", parse_rfc3339, METH_VARARGS,
"Parse an RFC 3339 date time string."},
{NULL, NULL, 0, NULL}};

#if PY_MAJOR_VERSION >= 3
45 changes: 45 additions & 0 deletions tests.py
@@ -264,6 +264,51 @@ def test_invalid_tz_offsets_too_large(self):
)


class Rfc3339TestCase(unittest.TestCase):
def test_valid_rfc3339_timestamps(self):
"""
Validate that valid RFC 3339 datetimes are parseable by parse_rfc3339
and produce the same result as parse_datetime.
"""
for string in [
'2018-01-02T03:04:05Z',
'2018-01-02t03:04:05z',
'2018-01-02 03:04:05z',
'2018-01-02T03:04:05+00:00',
'2018-01-02T03:04:05-00:00',
'2018-01-02T03:04:05.12345Z',
'2018-01-02T03:04:05+01:23',
'2018-01-02T03:04:05-12:34',
'2018-01-02T03:04:05-12:34',
]:
self.assertEqual(ciso8601.parse_datetime(string),
ciso8601.parse_rfc3339(string))

def test_invalid_rfc3339_timestamps(self):
"""
Validate that datetime strings that are valid ISO 8601 but invalid RFC
3339 trigger a ValueError when passed to RFC 3339, and that this
ValueError explicitly mentions RFC 3339.
"""
for timestamp in [
"2018-01-02", # Missing mandatory time
"2018-01-02T03", # Missing mandatory minute and second
"2018-01-02T03Z", # Missing mandatory minute and second
"2018-01-02T03:04", # Missing mandatory minute and second
"2018-01-02T03:04Z", # Missing mandatory minute and second
"2018-01-02T03:04:01+04", # Missing mandatory offset minute
"2018-01-02T03:04:05", # Missing mandatory offset
"2018-01-02T03:04:05.12345", # Missing mandatory offset
"2018-01-02T24:00:00Z", # 24:00:00 is not valid in RFC 3339
'20180102T03:04:05-12:34', # Missing mandatory date separators
'2018-01-02T030405-12:34', # Missing mandatory time separators
'2018-01-02T03:04:05-1234', # Missing mandatory offset separator
'2018-01-02T03:04:05,12345Z' # Invalid comma fractional second separator
]:
with self.assertRaisesRegex(ValueError, r"RFC 3339", msg="Timestamp '{0}' was supposed to be invalid, but parsing it didn't raise ValueError.".format(timestamp)):
ciso8601.parse_rfc3339(timestamp)


class GithubIssueRegressionTestCase(unittest.TestCase):
# These are test cases that were provided in GitHub issues submitted to ciso8601.
# They are kept here as regression tests.
