BUG: pd.TimeStamp & pd.Timedelta not always adhering to unit parameter in pandas-2.0 #52286

Closed
2 of 3 tasks
galipremsagar opened this issue Mar 29, 2023 · 11 comments · Fixed by #52311
Labels
Bug; Docs; Non-Nano (datetime64/timedelta64 with non-nanosecond resolution); Timedelta (Timedelta data type); Timestamp (pd.Timestamp and associated methods)

galipremsagar commented Mar 29, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> x = 340.3333333333333
>>> pd.Timestamp(x, unit='us')
Timestamp('1970-01-01 00:00:00.000340333')
>>> pd.Timestamp(x, unit='us').unit
'ns'
>>> pd.Timestamp(x, unit='us').as_unit('us')
Timestamp('1970-01-01 00:00:00.000340')
>>> pd.Timestamp(x, unit='us').as_unit('us').unit
'us'

Issue Description

When creating a pd.Timestamp or pd.Timedelta with a specific unit, the resulting object always seems to have unit ns. We have to call as_unit('us') to get the desired result.

Expected Behavior

>>> pd.Timestamp(x, unit='us')
Timestamp('1970-01-01 00:00:00.000340')

Installed Versions

INSTALLED VERSIONS

commit : c2a7f1a
python : 3.10.10.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-76-generic
Version : #86-Ubuntu SMP Fri Jan 17 17:24:28 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.0.0rc1
numpy : 1.23.5
pytz : 2023.2
dateutil : 2.8.2
setuptools : 67.6.0
pip : 23.0.1
Cython : 0.29.33
pytest : 7.2.2
hypothesis : 6.70.1
sphinx : 5.3.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.11.0
pandas_datareader: None
bs4 : 4.12.0
bottleneck : None
brotli :
fastparquet : None
fsspec : 2023.3.0
gcsfs : None
matplotlib : None
numba : 0.56.4
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2023.3.0
scipy : 1.10.1
snappy :
sqlalchemy : 1.4.46
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None

galipremsagar added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels on Mar 29, 2023
galipremsagar changed the title from "BUG: pd.TimeStamp not always adhering to unit parameter in pandas-2.0" to "BUG: pd.TimeStamp & pd.Timedelta not always adhering to unit parameter in pandas-2.0" on Mar 29, 2023
@jbrockmendel
Member

The "unit" keyword describes how the input is interpreted, not how the result is stored internally (which the .unit attribute represents). The overlapping naming is unfortunate.
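To make the distinction concrete, here is a minimal sketch of the two meanings of "unit", assuming pandas 2.0 or later (where `as_unit` and the `.unit` attribute exist). The value of `x` comes from the report above; what `.unit` returns directly after construction may differ between pandas versions, so the sketch only relies on the result of `as_unit`.

```python
import pandas as pd

x = 340.3333333333333

# unit="us" tells pandas how to interpret x (as microseconds since the epoch).
ts = pd.Timestamp(x, unit="us")
print(ts.unit)  # storage resolution; reported as 'ns' on pandas 2.0.0rc1 in this issue

# as_unit converts the storage resolution explicitly.
ts_us = ts.as_unit("us")
print(ts_us, ts_us.unit)

# The same pattern applies to Timedelta.
td_us = pd.Timedelta(x, unit="us").as_unit("us")
print(td_us, td_us.unit)
```

Converting to microsecond storage discards the sub-microsecond part of the input, which matches the `as_unit('us')` output shown in the reproducible example.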

cc @MarcoGorelli

@mroeschke
Member

I forgot if this was discussed, but is it possible if a user passes a numeric + unit to automatically also store the representation in that unit?

@jbrockmendel
Member

> is it possible if a user passes a numeric + unit to automatically also store the representation in that unit?

For units that we support, yes. For minute and lower we'd default back to seconds. I'd be on board with this.
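The mapping proposed here can be sketched as follows. This is only an illustration: `proposed_storage_unit` is a hypothetical helper, not a pandas function, and it assumes the four storage resolutions pandas 2.0 supports (`s`, `ms`, `us`, `ns`), with any other input unit falling back to seconds.

```python
# Hypothetical helper for illustration; not part of the pandas API.
SUPPORTED_RESOLUTIONS = ("s", "ms", "us", "ns")


def proposed_storage_unit(input_unit: str) -> str:
    """Return the storage resolution the proposal would pick for an input unit."""
    if input_unit in SUPPORTED_RESOLUTIONS:
        # Supported resolutions would be stored as given.
        return input_unit
    # Units with no matching storage resolution (minutes, hours, days, ...)
    # would fall back to seconds.
    return "s"


for unit in ("D", "h", "m", "s", "ms", "us", "ns"):
    print(unit, "->", proposed_storage_unit(unit))
```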

@mroeschke
Member

> For units that we support, yes. For minute and lower we'd default back to seconds. I'd be on board with this.

Likewise, makes sense to me

@jbrockmendel
Member

OK with putting this in 2.0?

@mroeschke
Member

Yes definitely

mroeschke added this to the 2.0 milestone on Mar 30, 2023
MarcoGorelli added the Blocker (blocking issue or pull request for an upcoming release) label on Mar 30, 2023
@MarcoGorelli
Member

> OK with putting this in 2.0?

totally - marking as blocker then if that's OK?

@jbrockmendel
Member

Implementing this, I now remember why we didn't do it already: it introduces possibly-unwanted rounding in float cases, i.e. Timestamp(1.5, unit="s") drops the half-second. This breaks 6 of our TestTimestamp::test_unit tests.
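The rounding concern comes down to simple arithmetic. The sketch below models it in plain Python; it is not pandas' internal conversion code, and `to_stored_int` is a hypothetical helper that truncates a float number of seconds to an integer count at a chosen resolution.

```python
# Illustrative arithmetic only; not pandas' internal conversion code.
NANOS_PER_SECOND = 1_000_000_000


def to_stored_int(seconds: float, units_per_second: int) -> int:
    """Truncate a float number of seconds to an integer count at the given resolution."""
    return int(seconds * units_per_second)


seconds_input = 1.5

# Stored at second resolution, the fractional half-second cannot be represented.
as_seconds = to_stored_int(seconds_input, 1)
print(as_seconds)  # 1

# Stored at nanosecond resolution, the half-second is preserved.
as_nanos = to_stored_int(seconds_input, NANOS_PER_SECOND)
print(as_nanos)  # 1500000000
```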

DeaMariaLeon added the Timedelta (Timedelta data type) and Timestamp (pd.Timestamp and associated methods) labels and removed the Needs Triage (issue that has not been reviewed by a pandas team member) label on Mar 30, 2023
@mroeschke
Member

> it introduces possibly-unwanted rounding in float cases

Ah okay. Yeah, I don't think specifying unit should potentially round the input. Maybe for this particular case, then, we should document that the unit keyword does not set the resulting .unit, and that users should call as_unit afterwards.

@mroeschke
Member

> totally - marking as blocker then if that's OK?

IMO this shouldn't be a strict blocker for 2.0, but it would be nice to fix or document.

@MarcoGorelli
Member

sure, sounds good!

Separate issue, but float inputs to to_datetime feel like something we should consider deprecating anyway. Removing the 'blocker' label.

MarcoGorelli added the Docs and Non-Nano (datetime64/timedelta64 with non-nanosecond resolution) labels and removed the Blocker (blocking issue or pull request for an upcoming release) label on Mar 30, 2023