-
-
Notifications
You must be signed in to change notification settings - Fork 19.3k
BUG: Fix #13149 and ENH: 'copy' param in Index.astype() #13209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
tests pls. |
cb58d56 to
ade4e95
Compare
|
Added tests. (I hope I'm doing this right.) |
pandas/tests/indexes/test_numeric.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add tests for datetimelikes as well to cover the bases; I think your change might be changing behavior there as well (which is not correct)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean this conversion:
pd.DatetimeIndex([1,2,np.nan]).astype(int)
Out[16]: array([ 1, 2, -9223372036854775808])
Its behaviour is unchanged. I didn't touch other indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes - just want to make sure that is tested as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
(BTW, DatetimeIndex.astype(int), TimedeltaIndex.astype(int) return a numpy.array. Isn't it a bug? PeriodIndex and Float64Index return Int64Index.)
4d310e9 to
07e27e1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is wrong ; see if u can fix so these are returned as index types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've fixed this, I guess.
doc/source/whatsnew/v0.18.2.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you can list the second point as a separate bug fix issue and use the PR number as the issue number
7243bb6 to
3d3a5b3
Compare
pandas/tseries/index.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, better to use is_integer_dtype(dtype) more generic her
(and change dtype == np.object_ to is_object_dtype(dtype))
and dtype == _NS_DTYPE to is_int64_dtype(dtype)
and dtype == str to is_string_dtype(dtype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and
dtype == _NS_DTYPEtois_int64_dtype(dtype)
I guess it's is_datetime64_ns_dtype(dtype).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep - typing fast !
|
looks pretty good. a couple of syntax changes. ping when green. |
doc/source/whatsnew/v0.18.2.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Int64Index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I ran into an issue with the null Period Period(NaT, 'D") not being seen as null by pd.isnull(). It's also impossible to directly check array or index equivalence if they contain null Periods.
I can elaborate on this and possible fixes (the fixes I tried are rather beyond the scope of this commit). Unless we don't want Period(NaT, 'D") to be treated as null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #12759 this will be addressed, so for now use Period('NaT', 'D') until this is fixed.
You can add a TODO / comment with this issue number and can be updated later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Added TODO.
doc/source/whatsnew/v0.18.2.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove this; this is an internal function
|
@pijucha looks pretty good. There are some existing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, Python2 returns an Index with mixed type: Date '2016-05-16' is a string, and NaT's are unicode. Is it all right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in py2 these should all be strings not Unicode
never should be mixed
can u show an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example from tests (nb. it failed assert_index_equal with pyhton2.7):
In [2]: idx = DatetimeIndex(['2016-05-16', 'NaT', NaT, np.NaN])
In [3]: idx
Out[3]: DatetimeIndex(['2016-05-16', 'NaT', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)
In [4]: idx2 = idx.astype(str)
In [5]: idx2
Out[5]: Index([u'2016-05-16', u'NaT', u'NaT', u'NaT'], dtype='object')
In [7]: type(idx2[0])
Out[7]: str
In [8]: type(idx2[1])
Out[8]: unicodeI think this u('NaT') may be responsible:
# class DatetimeIndex:
def _format_native_types(self, na_rep=u('NaT'),
date_format=None, **kwargs):There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm looks like we are always returning Unicode
we should be returning a string here under py2 (and the display formatters will make them Unicode when displaying)
see if returning as a string breaks anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be ok. It's just passed Python2.7 tests with "not network and not slow and not disabled".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok great. make sure we add this as a test (and comment)
pandas/tseries/index.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this default na_rep. No breaks observed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k thanks
I don't know if you added that test (below) which discovered this bug? (if not pls add with a nice comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The test is among astype tests for DatetimeIndex (just an assert_index_equal(), which previously discovered the Indexes had different inferred types).
I'll comment. Or should I rather make it more explicit and place somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep I saw that. its fine
|
@pijucha looks really good! some test moving around / slight refactoring. ping when green. |
1. Float64Index.astype(int) raises ValueError if a NaN is present.
Previously, it converted NaN's to the smallest negative integer.
2. TimedeltaIndex.astype(int) and DatetimeIndex.astype(int) return
Int64Index, which is consistent with behavior of other Indexes.
Previously, they returned a numpy.array of ints.
3. Added:
- bool parameter 'copy' to Index.astype()
- shared doc string to .astype()
- tests on .astype() (consolidated and added new)
- bool parameter 'copy' to Categorical.astype()
4. Internals:
- Fixed core.common.is_timedelta64_ns_dtype().
- Set a default NaT representation to a string type in a parameter
of DatetimeIndex._format_native_types().
Previously, it produced a unicode u'NaT' in Python2.
|
@jreback It's green. |
|
@pijucha thanks! very nice PR. you really dove deep into some internals! keep em coming! |
|
@jreback Thanks for the guidelines! They helped a lot. |
git diff upstream/master | flake8 --diffUpdate
Previously, it converted NaN's to the smallest negative integer.
Int64Index, which is consistent with behavior of other Indexes.
Previously, they returned a numpy.array of ints.
Previously, it produced a unicode u'NaT' in Python2.