
to_datetime() ns precision inconsistencies between s and ns #15817

Closed
mchwalisz opened this issue Mar 27, 2017 · 6 comments

@mchwalisz
Contributor

Code Sample, a copy-pastable example if possible

import pandas as pd
pd.options.display.float_format = '{:.20f}'.format
with open('example.csv', 'w') as f:
    f.write('''
full,ts,nano
1490195805.433502912,1490195805,433502912
1490195805.933358907,1490195805,933358907
1490195806.433445930,1490195806,433445930
1490195806.933351039,1490195806,933351039
''')
df = pd.read_csv('example.csv')
df['from two ints'] = pd.to_datetime(df['ts'] * 10**9 + df['nano'], unit='ns')
df['from float to s'] = pd.to_datetime(df['full'], unit='s')
df['from float^9 to ns'] = pd.to_datetime(df['full'] * 10**9, unit='ns')

print('Types:')
print(df.dtypes)
print('Data:')
print(df.iloc[0])

Output

Types:
full                         float64
ts                             int64
nano                           int64
from two ints         datetime64[ns]
from float to s       datetime64[ns]
from float^9 to ns    datetime64[ns]
dtype: object
Data:
full                 1490195805.43350267410278320312
ts                                        1490195805
nano                                       433502912
from two ints          2017-03-22 15:16:45.433502912
from float to s           2017-03-22 15:16:45.433503
from float^9 to ns     2017-03-22 15:16:45.433502720
Name: 0, dtype: object

Problem description

I would expect all of the following methods to give the same result. The documentation says that epoch times will be rounded to the nearest nanosecond. I don't understand why conversion from a float using the seconds (s) unit gives different rounding than the nanoseconds (ns) unit. Not to mention the strange float parsing coming from the csv.

I expect the whole issue to be connected to #7307.

Expected Output

Types:
full                         float64
ts                             int64
nano                           int64
from two ints         datetime64[ns]
from float to s       datetime64[ns]
from float^9 to ns    datetime64[ns]
dtype: object
Data:
full                            1490195805.433502912
ts                                        1490195805
nano                                       433502912
from two ints          2017-03-22 15:16:45.433502912
from float to s        2017-03-22 15:16:45.433502912
from float^9 to ns     2017-03-22 15:16:45.433502912
Name: 0, dtype: object

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.10-040910-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 34.3.2
Cython: 0.25.2
numpy: 1.10.4
scipy: 0.19.0
statsmodels: None
xarray: None
IPython: 5.3.0
sphinx: 1.4.8
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.2
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
httplib2: 0.9.1
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
boto: None
pandas_datareader: None

@jorisvandenbossche
Member

This is not really related to #7307, but rather to the (limited) precision of floats.

The float "1490195805.433502912" has too many significant digits to be represented exactly as a 64-bit floating-point number. Therefore, when converting this float to a datetime64 (whose representation is based on integers), you lose precision.

In [11]: 1490195805.433502912
Out[11]: 1490195805.433503

In [12]: 1490195805433502912
Out[12]: 1490195805433502912

In [14]: pd.to_datetime(1490195805.433502912, unit='s') 
Out[14]: Timestamp('2017-03-22 15:16:45.433503') 

In [15]: pd.to_datetime(1490195805433502912, unit='ns')  
Out[15]: Timestamp('2017-03-22 15:16:45.433502912')  
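
The precision limit can be checked in plain Python, without pandas (a minimal sketch using the values from this issue):

```python
# float64 has a 53-bit mantissa, about 15-16 significant decimal digits;
# the epoch value above needs 19, so the literal is rounded when parsed.
value = 1490195805.433502912
print(repr(value))                  # 1490195805.433503 -- the nearest double
print(value == 1490195805.433503)  # True: both literals round to the same double

# An int64 can hold the full nanosecond count exactly.
ns = 1490195805433502912
print(ns)                           # 1490195805433502912
```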

@jorisvandenbossche jorisvandenbossche added this to the No action milestone Mar 27, 2017
@mchwalisz
Contributor Author

How do you explain the following then?

df['from float^9 to ns'] = pd.to_datetime(df['full'] * 10**9, unit='ns')

from float^9 to ns     2017-03-22 15:16:45.433502720

@jorisvandenbossche
Member

A string first needs to be parsed into an actual floating-point number, and here too differences can occur depending on which parser implementation is used:

In [25]: 1490195805.433502912
Out[25]: 1490195805.433503

In [26]: float('1490195805.433502912')
Out[26]: 1490195805.433503

In [27]: pd.read_csv(StringIO("a\n1490195805.433502912")).iloc[0,0]
Out[27]: 1490195805.4335027

In [28]: pd.read_csv(StringIO("a\n1490195805.433502912"), float_precision='high').iloc[0,0]
Out[28]: 1490195805.4335029

Additionally, floating-point arithmetic itself introduces small differences: a float multiplied by 10^9 does not necessarily result in the same number as combining both separate parts as integers.
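
That arithmetic difference can be seen directly, without pandas (a sketch using the values from this issue):

```python
# Combining the two integer parts is exact integer arithmetic:
ts, nano = 1490195805, 433502912
exact = ts * 10**9 + nano
print(exact)                                    # 1490195805433502912

# Going through a float rounds twice: once when parsing the decimal literal
# to a double, and once when rounding the product back to a double.
via_float = int(1490195805.433502912 * 10**9)
print(via_float, exact - via_float)             # the two values differ
```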

@jreback
Contributor

jreback commented Mar 27, 2017

@mchwalisz

Furthermore, the decimal literal you write down has unbounded precision, while it is cast to a float64, which has high (though limited) precision, about 15-16 significant digits. Rounding is unavoidable. The only way to have exact precision is to use a fixed-width exact type (e.g. an int64). This is why we store the datetimes as int64: you can round-trip them exactly this way, even via text.

In [6]: pd.to_datetime(1490195805.4335027 * 10**9, unit='ns').value
Out[6]: 1490195805433502720

In [7]: pd.to_datetime(1490195805.4335029 * 10**9, unit='ns').value
Out[7]: 1490195805433502976

In [8]: 1490195805.4335027 * 10**9
Out[8]: 1.4901958054335027e+18

In [9]: 1490195805.4335029 * 10**9
Out[9]: 1.490195805433503e+18
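
The round-trip claim can be sketched in plain Python: an int64 nanosecond count survives a trip through text unchanged, while going through a float64 loses the low bits.

```python
ns = 1490195805433502912       # fits comfortably in int64 (max ~9.22e18)

# int -> str -> int is exact
assert int(str(ns)) == ns

# int -> float -> int is not: near 1.49e18 the spacing between adjacent
# doubles (the ulp) is 256, so the value is rounded to a multiple of 256.
print(int(float(ns)))          # 1490195805433502976
print(int(float(ns)) - ns)     # 64
```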

@jreback
Contributor

jreback commented Mar 27, 2017

@mchwalisz happy to have a short section in timeseries.rst about this if you want, and possibly in the .to_datetime docstring as well. Not sure where else docs about this would fit.

@jorisvandenbossche
Member


This is one of the reasons the timestamps are stored as integers.

If you want to read more about this topic, see http://floating-point-gui.de/

@jreback jreback added Docs Datetime Datetime data dtype labels Mar 27, 2017
mchwalisz added a commit to mchwalisz/pandas that referenced this issue Apr 6, 2017
mchwalisz added a commit to mchwalisz/pandas that referenced this issue Apr 7, 2017
@jreback jreback closed this as completed in 860d555 Apr 8, 2017
@jreback jreback modified the milestones: 0.20.0, No action Apr 8, 2017