Add to_arrow method to pylibcudf core types #19787
rapids-bot[bot] merged 14 commits into rapidsai:branch-25.10 from
mroeschke
left a comment
Curious, is there an immediate use case for this API?
This is really just a quality of life improvement for pylibcudf devs. Easier to type |
OK I would be open to having |
I would prefer not to have two ways to do the arrow conversion. The argument for keeping the interop module approach is that I would also like to isolate our pyarrow dependency as much as possible, so that we could avoid it completely except when a user strictly requests it. I'm not very convinced of the quality-of-life improvement of this PR TBH. Are you finding the current approach cumbersome? My hope is actually that we move away from converting to pyarrow representations as much as possible. Outside of our test suite, my view is that we should treat the arrow data spec as the mode of interop, so that any library that directly supports it will "just work", but we shouldn't be converting device->host too often outside of debugging. Would it help to lazily alias |
Sure, but we have two ways of ingesting arrow objects
Heh, maybe I'm in the minority here 😅, but I do find the current approach cumbersome.
OK yeah that makes sense.
I don't think so. I like this style:
(
    plc.Column.from_arrow(...)
    .do_stuff()
    .to_arrow()
)
|
Matt and I discussed this earlier today and decided to move forward with this PR, making the class methods the source of truth and eventually deprecating the interop functions instead. We can still minimize and isolate our pyarrow interactions; they'll just have to be spread over a few more files.
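The plan described here (methods as the single implementation, the old free functions kept for a while as deprecated wrappers, pyarrow touched only when a conversion is requested) could look roughly like the sketch below. This is not pylibcudf's actual code: `HostColumn` and `interop_to_arrow` are hypothetical stand-ins that hold plain Python lists, and the use of `FutureWarning` is an assumption about how the deprecation might be signalled.

```python
import warnings


class HostColumn:
    """Hypothetical stand-in for a column type; holds a plain Python list."""

    def __init__(self, values):
        self.values = list(values)

    def to_arrow(self):
        # Import pyarrow only when a conversion is actually requested, so
        # merely importing this module does not require pyarrow.
        import pyarrow as pa

        return pa.array(self.values)


def interop_to_arrow(obj):
    """Hypothetical legacy free function, kept only as a thin wrapper."""
    warnings.warn(
        "interop_to_arrow is deprecated; call obj.to_arrow() instead",
        FutureWarning,
        stacklevel=2,
    )
    # Delegate so the method remains the single implementation.
    return obj.to_arrow()
```

With this shape, existing callers of the free function keep working but see a warning, while new code chains `.to_arrow()` directly on the object.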
vyasr
left a comment
Looks fine to me aside from a couple of tiny notes. Could you open a follow-up issue to migrate all calls in cudf-polars and cudf classic to use the new APIs, and then to deprecate the old interop module's functions?
# TODO: Once the arrow C device interface registers more
# types that it supports, we can call pa.array(self) if
# no metadata is passed.
What is this comment referring to? I don't understand I'm afraid.
There's a typo; that should be pa.table.
What I mean is that once arrow can consume more kinds of device arrays, we can call pa.table(self) directly. The following example fails today:
In [1]: import pylibcudf as plc
In [2]: import pyarrow as pa
In [3]: pa.table(plc.Table([plc.Column.from_iterable_of_py([1,2])]))
---------------------------------------------------------------------------
ArrowKeyError Traceback (most recent call last)
Cell In[3], line 1
----> 1 pa.table(plc.Table([plc.Column.from_iterable_of_py([1,2])]))
File ~/.conda/envs/rapids/lib/python3.13/site-packages/pyarrow/table.pxi:6235, in pyarrow.lib.table()
File ~/.conda/envs/rapids/lib/python3.13/site-packages/pyarrow/table.pxi:6054, in pyarrow.lib.record_batch()
File ~/.conda/envs/rapids/lib/python3.13/site-packages/pyarrow/table.pxi:4044, in pyarrow.lib.RecordBatch._import_from_c_device_capsule()
File ~/.conda/envs/rapids/lib/python3.13/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()
File ~/.conda/envs/rapids/lib/python3.13/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()
ArrowKeyError: Device type 2is not registered
Oh I see. Yeah... I'm not sure that'll ever be supported, but if it is then great. The best bet might be backing pyarrow with pylibcudf for GPU bits!
/merge
Description
Adds to_arrow methods to pylibcudf core types. Not a priority, but a quality-of-life improvement for pylibcudf developers. E.g., call .to_arrow instead of plc.interop.to_arrow.
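As a rough illustration of the two spellings, the sketch below round-trips a small column both ways. It assumes a CUDA-capable GPU and a pylibcudf build that includes this PR; `roundtrip_both_ways` is a made-up helper name, and the imports sit inside the function so that merely defining it needs neither library.

```python
def roundtrip_both_ways(values):
    """Convert values to a device column and back using both spellings."""
    # Deferred imports: both packages must be installed, and pylibcudf
    # needs a GPU at call time.
    import pyarrow as pa
    import pylibcudf as plc

    col = plc.Column.from_arrow(pa.array(values))

    via_function = plc.interop.to_arrow(col)  # existing interop function
    via_method = col.to_arrow()  # method added by this PR

    return via_function, via_method
```

On a suitable machine, the two returned pyarrow arrays would be expected to compare equal, since the PR describes the method as a more convenient spelling of the same conversion.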