-
-
Notifications
You must be signed in to change notification settings - Fork 19.3k
ENH: allow categoricals in msgpack #12573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
pandas/io/packers.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback - should the Series constructor accept a Categorical with dtype=category? This part felt hacky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it already does.
However in this case want to construct a pd.Categorical as you can pass all of the attributes thru
In [1]: Series(list('aabc'),dtype='category')
Out[1]:
0 a
1 a
2 b
3 c
dtype: category
Categories (3, object): [a, b, c]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but you don't need this here at all. the dtype is not present so its None, the Series will just take a Categorical directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I conclude that no change is required here in scope of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, this needs to be reworked a bit, seem my 2nd comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, that might be a bug as in this case it is ok to specify the dtype (but it should be ignored as this is pretty generic as it doesn't specify anything in particular about the dtype, so its compatible). can you open a separate issue for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to start a new thread on this line of code.
So, if I understood, it shouldn't be necessary to special case the dtype? If I remove the dtype=ctor_dtype change (and leave it set to dtype=np_dtype), then it does not work. I get:
ValueError: cannot specify a dtype with a Categorical unless dtype='category'
I don't currently understand what is being asked for here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to rebase on master, that's the issue that was just merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was after rebasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so that's what I was first going to write. This is not a numpy-dtype and cannot be treated like one. Instead put this code in encode/decode (e.g. check isinstance(obj, Categorical)). Then you won't have this issue at all.
Done. Please let me know if it is to taste. |
doc/source/whatsnew/v0.18.1.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
support for serializing and de-serializing categoricalls with msgpack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
aa487d1 to
6a3cdce
Compare
pandas/io/tests/test_packers.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chris-b1 is this correct? Why is it Series and not Categorical?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wouldn't make a difference here, e.g. df['c'] = Categorical(..) and df['c'] = Series(Categorical(..)) do the same thing
|
ok #12574 has been merged. pls rebase/update. |
6a3cdce to
59cf8fd
Compare
|
Rebased. |
doc/source/whatsnew/v0.18.1.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Support.........categoricals (sp)
d5d077b to
ecab0d4
Compare
|
Wow, I think I managed to clear all of your comments. Everything suddenly made sense in the end. Please let me know if I missed something and if I should squash or reword my commits. |
|
I broke the NDFrame tests. Investigating. |
pandas/io/packers.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you actually need to specify the codes/catgories dtypes. Those will be automatic. IOW, don't use convert here, just same codes : obj.codes and categories : obj.categories. These are recursive encoder/decoders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
84c4b6c to
a2a0693
Compare
pandas/io/packers.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prob use obj.codes here (same but _codes is internal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
a2a0693 to
ae97751
Compare
pandas/io/tests/test_packers.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in a strange position here. I added this test - it passes, but I didn't add the relevant code in convert/unconvert to pass the ordered parameter through. What gives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that check is not used at all! I was just going to tell you to take it out. a Categorical is fully serialized/deserialized via encode/decode the dtype is NEVER category. except when its a series but that is already handled.
5205852 to
330d327
Compare
|
@pwaller can you rebase/update |
116e3d8 to
d1792d9
Compare
|
this looks pretty good code wise. can you squash everything. |
|
cc @kawochen |
fc7e1f1 to
52ea6e4
Compare
|
Squashed. |
pandas/io/packers.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change line 271 to:
elif is_object_dtype(dtype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
52ea6e4 to
e7c4d9d
Compare
|
Rebased and comments addressed. |
|
I'm getting a test failure I'm not sure how to address and I don't have much time to look into it: |
pandas/io/packers.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be 'category'
e7c4d9d to
c257021
Compare
|
Updated again. Thanks for your patience. |
|
To fix that error |
DOC: support for categoricals in read_msgpack Add TestCategorical test cases Add Catecorical ordered=True ndframe test
|
Forced again. |
c257021 to
ff87282
Compare
|
ok looks good. ping when green. |
|
thanks! |
Supercedes #12191.
I've made a best-effort to rebase this against master.
It seems to work here.