Add parse_rfc3339 function #70

Merged Jul 23, 2018 (6 commits)
18 changes: 16 additions & 2 deletions README.rst
@@ -12,10 +12,12 @@ ciso8601
:target: https://pypi.org/project/ciso8601/

``ciso8601`` converts `ISO 8601`_ date time strings into Python datetime objects.

Since it's written as a C module, it is much faster than other Python libraries.
Tested with Python 2.7, 3.4, 3.5, 3.6, 3.7b.

.. _ISO 8601: https://en.wikipedia.org/wiki/ISO_8601
.. _RFC 3339: https://tools.ietf.org/html/rfc3339

(Interested in working on projects like this? `Close.io`_ is looking for `great engineers`_ to join our team)

@@ -208,9 +210,9 @@ Time Formats

Times are optional and are separated from the date by the letter ``T``.

Consistent with `RFC 3339`_, ``ciso8601`` also allows either a space character, or a lower-case ``t``, to be used instead of a ``T``.
Consistent with `RFC 3339`__, ``ciso8601`` also allows either a space character, or a lower-case ``t``, to be used instead of a ``T``.

.. _RFC 3339: https://stackoverflow.com/questions/522251/whats-the-difference-between-iso-8601-and-rfc-3339-date-formats)
__ https://stackoverflow.com/questions/522251/whats-the-difference-between-iso-8601-and-rfc-3339-date-formats

The following time formats are supported:

@@ -258,6 +260,18 @@ While the ISO 8601 specification allows the use of MINUS SIGN (U+2212) in the ti

Consistent with `RFC 3339`_, ``ciso8601`` also allows a lower-case ``z`` to be used instead of a ``Z``.

Strict RFC 3339 Parsing
-----------------------

``ciso8601`` parses ISO 8601 datetimes, which can be thought of as a superset of `RFC 3339`_ (`roughly`_). When you need strict RFC 3339 parsing, ``ciso8601`` offers a ``parse_rfc3339`` function, which behaves similarly to ``parse_datetime``:

.. _roughly: https://stackoverflow.com/questions/522251/whats-the-difference-between-iso-8601-and-rfc-3339-date-formats

``parse_rfc3339(datetime_string: str) -> datetime`` is a function that takes a string and either:

* Returns a properly parsed Python datetime, **if and only if** the **entire** string conforms to RFC 3339.
* Raises a ``ValueError`` with a description of the reason why the string doesn't conform to RFC 3339.
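
To make the strictness concrete, here is a minimal pure-Python sketch of the *shape* that the strict mode in this change enforces. It is not how ``ciso8601`` implements ``parse_rfc3339`` (that is a hand-written C parser); ``RFC3339_SHAPE`` and ``looks_like_rfc3339`` are hypothetical names used only for illustration. The sketch does not validate month, day, minute, or second ranges.

```python
import re

# Hypothetical helper, not part of ciso8601: the RFC 3339 "date-time" shape.
RFC3339_SHAPE = re.compile(
    r"\d{4}-\d{2}-\d{2}"         # full date, hyphens are mandatory
    r"[Tt ]"                     # 'T', 't', or a space between date and time
    r"\d{2}:\d{2}:\d{2}"         # hour, minute, second; colons are mandatory
    r"(?:\.\d+)?"                # optional fraction, '.' only (no comma)
    r"(?:[Zz]|[+-]\d{2}:\d{2})", # mandatory 'Z'/'z' or +hh:mm / -hh:mm offset
    re.ASCII,
)


def looks_like_rfc3339(text):
    """Return True if the entire string has the RFC 3339 date-time shape."""
    if RFC3339_SHAPE.fullmatch(text) is None:
        return False
    # Hour 24 is sometimes legal in ISO 8601 but is rejected in strict mode.
    return int(text[11:13]) <= 23


print(looks_like_rfc3339("2018-01-02T03:04:05Z"))        # accepted
print(looks_like_rfc3339("2018-01-02T03:04:05,12345Z"))  # rejected: comma
print(looks_like_rfc3339("2018-01-02T03:04"))            # rejected: no second, no offset
```

A regular expression is used here only because it states the grammar compactly. Unlike ``parse_rfc3339``, it cannot explain *why* a string was rejected.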

Ignoring Timezone Information While Parsing
-------------------------------------------

1 change: 1 addition & 0 deletions ciso8601/__init__.pyi
@@ -1,4 +1,5 @@
from datetime import datetime

def parse_datetime(datetime_string: str) -> datetime: ...
def parse_rfc3339(datetime_string: str) -> datetime: ...
def parse_datetime_as_naive(datetime_string: str) -> datetime: ...
66 changes: 62 additions & 4 deletions module.c
@@ -80,10 +80,10 @@ format_unexpected_character_exception(char *field_name, char c, size_t index,
#define IS_TIME_SEPARATOR (*c == ':')
#define IS_TIME_ZONE_SEPARATOR \
(*c == 'Z' || *c == '-' || *c == '+' || *c == 'z')
#define IS_FRACTIONAL_SEPARATOR (*c == '.' || *c == ',')
#define IS_FRACTIONAL_SEPARATOR (*c == '.' || (*c == ',' && !rfc3339_only))

static PyObject *
_parse(PyObject *self, PyObject *args, int parse_any_tzinfo)
_parse(PyObject *self, PyObject *args, int parse_any_tzinfo, int rfc3339_only)
{
PyObject *obj;
PyObject *tzinfo = Py_None;
@@ -131,10 +131,20 @@ _parse(PyObject *self, PyObject *args, int parse_any_tzinfo)
/* Day */
PARSE_INTEGER(day, 2, "day")
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"Datetime string not in RFC 3339 format.");
return NULL;
}
else {
day = 1;
}
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"Datetime string not in RFC 3339 format.");
return NULL;
}
else { /* Non-separated Month and Day (ie. MMDD) */
/* Month */
PARSE_INTEGER(month, 2, "month")
@@ -234,6 +244,18 @@ _parse(PyObject *self, PyObject *args, int parse_any_tzinfo)
PARSE_FRACTIONAL_SECOND()
}
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"RFC 3339 requires the second to be "
"specified.");
return NULL;
}
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"Colons separating time components are "
"mandatory in RFC 3339.");
return NULL;
}
else { /* Non-separated Minute and Second (ie. mmss) */
/* Minute */
@@ -251,11 +273,23 @@ _parse(PyObject *self, PyObject *args, int parse_any_tzinfo)
}
}
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"Minute and second are mandatory in RFC 3339");
return NULL;
}

if (hour == 24 && minute == 0 && second == 0 && usecond == 0) {
/* Special case of 24:00:00, that is allowed in ISO 8601. It is
* equivalent to 00:00:00 the following day
*/
if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"An hour value of 24, while sometimes legal "
"in ISO 8601, is explicitly forbidden by RFC "
"3339.");
return NULL;
}
hour = 0, minute = 0, second = 0, usecond = 0;
time_is_midnight = 1;
}
@@ -298,6 +332,12 @@ _parse(PyObject *self, PyObject *args, int parse_any_tzinfo)
/* tz minute */
PARSE_INTEGER(tzminute, 2, "tz minute")
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"Separator between hour and minute in UTC "
"offset is mandatory in RFC 3339");
return NULL;
}
else if (*c != '\0') { /* Optional tz minute */
PARSE_INTEGER(tzminute, 2, "tz minute")
}
@@ -348,6 +388,16 @@ _parse(PyObject *self, PyObject *args, int parse_any_tzinfo)
}
}
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"UTC offset is mandatory in RFC 3339 format.");
return NULL;
}
}
else if (rfc3339_only) {
PyErr_SetString(PyExc_ValueError,
"Time is mandatory in RFC 3339 format.");
return NULL;
}

/* Make sure that there is no more to parse. */
@@ -377,20 +427,28 @@ _parse(PyObject *self, PyObject *args, int parse_any_tzinfo)
static PyObject *
parse_datetime_as_naive(PyObject *self, PyObject *args)
{
return _parse(self, args, 0);
return _parse(self, args, 0, 0);
}

static PyObject *
parse_datetime(PyObject *self, PyObject *args)
{
return _parse(self, args, 1);
return _parse(self, args, 1, 0);
}

static PyObject *
parse_rfc3339(PyObject *self, PyObject *args)
{
return _parse(self, args, 1, 1);
}

static PyMethodDef CISO8601Methods[] = {
{"parse_datetime", parse_datetime, METH_VARARGS,
"Parse a ISO8601 date time string."},
{"parse_datetime_as_naive", parse_datetime_as_naive, METH_VARARGS,
"Parse a ISO8601 date time string, ignoring the time zone component."},
{"parse_rfc3339", parse_rfc3339, METH_VARARGS,
"Parse an RFC 3339 date time string."},
{NULL, NULL, 0, NULL}};

#if PY_MAJOR_VERSION >= 3
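
The hour-24 branch in `_parse` relies on the ISO 8601 convention that `24:00:00` denotes `00:00:00` on the following day. The code that consumes `time_is_midnight` is outside the hunks shown here, so the following is only a sketch of that convention in Python, using a hypothetical helper named `resolve_hour_24`. In strict RFC 3339 mode this mapping never happens, because the parser raises `ValueError` first.

```python
from datetime import datetime, timedelta


def resolve_hour_24(year, month, day):
    """Map ISO 8601 'YYYY-MM-DDT24:00:00' to midnight of the following day.

    Hypothetical helper for illustration; not part of ciso8601.
    """
    return datetime(year, month, day) + timedelta(days=1)


# The rollover carries across month and year boundaries.
print(resolve_hour_24(2018, 1, 2))
print(resolve_hour_24(2018, 12, 31))
```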
45 changes: 45 additions & 0 deletions tests.py
@@ -264,6 +264,51 @@ def test_invalid_tz_offsets_too_large(self):
)


class Rfc3339TestCase(unittest.TestCase):
    def test_valid_rfc3339_timestamps(self):
        """
        Validate that valid RFC 3339 datetimes are parseable by parse_rfc3339
        and produce the same result as parse_datetime.
        """
        for string in [
                '2018-01-02T03:04:05Z',
                '2018-01-02t03:04:05z',
                '2018-01-02 03:04:05z',
                '2018-01-02T03:04:05+00:00',
                '2018-01-02T03:04:05-00:00',
                '2018-01-02T03:04:05.12345Z',
                '2018-01-02T03:04:05+01:23',
                '2018-01-02T03:04:05-12:34',
                '2018-01-02T03:04:05-12:34',
        ]:
            self.assertEqual(ciso8601.parse_datetime(string),
                             ciso8601.parse_rfc3339(string))

    def test_invalid_rfc3339_timestamps(self):
        """
        Validate that datetime strings that are valid ISO 8601 but invalid RFC
        3339 trigger a ValueError when passed to parse_rfc3339, and that this
        ValueError explicitly mentions RFC 3339.
        """
        for timestamp in [
                "2018-01-02",  # Missing mandatory time
                "2018-01-02T03",  # Missing mandatory minute and second
                "2018-01-02T03Z",  # Missing mandatory minute and second
                "2018-01-02T03:04",  # Missing mandatory second
                "2018-01-02T03:04Z",  # Missing mandatory second
                "2018-01-02T03:04:01+04",  # Missing mandatory offset minute
                "2018-01-02T03:04:05",  # Missing mandatory offset
                "2018-01-02T03:04:05.12345",  # Missing mandatory offset
                "2018-01-02T24:00:00Z",  # 24:00:00 is not valid in RFC 3339
                '20180102T03:04:05-12:34',  # Missing mandatory date separators
                '2018-01-02T030405-12:34',  # Missing mandatory time separators
                '2018-01-02T03:04:05-1234',  # Missing mandatory offset separator
                '2018-01-02T03:04:05,12345Z'  # Invalid comma fractional second separator
        ]:
            with self.assertRaisesRegex(
                    ValueError, r"RFC 3339",
                    msg="Timestamp '{0}' was supposed to be invalid, but parsing it didn't raise ValueError.".format(timestamp)):
                ciso8601.parse_rfc3339(timestamp)
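
The valid-timestamp test only asserts that `parse_rfc3339` and `parse_datetime` agree. It does not spell out the expected values. As a reading aid, the sketch below builds the instants one would expect for a few of those strings using only the standard library (Python 3). The variable names are illustrative, and the `tzinfo` class that `ciso8601` actually returns may differ. Aware datetimes compare equal when they denote the same instant, regardless of `tzinfo` class.

```python
from datetime import datetime, timedelta, timezone

# '2018-01-02T03:04:05+01:23': local time 1 hour 23 minutes ahead of UTC.
plus_0123 = datetime(2018, 1, 2, 3, 4, 5,
                     tzinfo=timezone(timedelta(hours=1, minutes=23)))

# '2018-01-02T03:04:05-12:34': local time 12 hours 34 minutes behind UTC.
minus_1234 = datetime(2018, 1, 2, 3, 4, 5,
                      tzinfo=timezone(-timedelta(hours=12, minutes=34)))

# 'Z', 'z', and '+00:00' all carry a zero offset.
zero_offset = datetime(2018, 1, 2, 3, 4, 5, tzinfo=timezone.utc)

# '.12345' is a fraction of a second: 0.12345 s = 123450 microseconds.
with_fraction = datetime(2018, 1, 2, 3, 4, 5, 123450, tzinfo=timezone.utc)

print(plus_0123.isoformat())
print(minus_1234.isoformat())
print(with_fraction.isoformat())
```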


class GithubIssueRegressionTestCase(unittest.TestCase):
# These are test cases that were provided in GitHub issues submitted to ciso8601.
# They are kept here as regression tests.