feat(r): Support matrix objects as fixed-size-list arrays #692
Do you foresee enhancing the C(++) side of things as well? Or maybe enough is in place -- I have not tried constructing or consuming '+w:$N' objects yet.
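For context on the `'+w:$N'` notation: in the Arrow C data interface, a fixed-size list is identified by the format string `+w:N`, where `N` is the list size. Below is a minimal sketch of reading `N` out of such a string. The helper name `parse_fixed_size_list_format` is hypothetical and is not part of nanoarrow, which has its own schema-parsing routines.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not a nanoarrow function): extract N from an Arrow
// C data interface format string of the form "+w:N".
// Returns 0 on success, -1 if the string is not a fixed-size-list format.
int parse_fixed_size_list_format(const char* format, int32_t* list_size_out) {
  if (format == NULL || strncmp(format, "+w:", 3) != 0) {
    return -1;
  }

  const char* digits = format + 3;

  // strtol() would accept leading whitespace or a sign, so require a digit
  if (digits[0] < '0' || digits[0] > '9') {
    return -1;
  }

  char* end = NULL;
  long value = strtol(digits, &end, 10);

  // Reject trailing characters and values that do not fit in an int32
  if (*end != '\0' || value > INT32_MAX) {
    return -1;
  }

  *list_size_out = (int32_t)value;
  return 0;
}
```

For the `matrix_col` column in the example that follows, the format string would be `+w:2` and the helper would report a list size of 2.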
This is nice. When running with numeric vectors it is even closer to zero-copy, as it looks like we retain the buffer:
> library(nanoarrow)
> df <- data.frame(x = 1:10)
> df$matrix_col <- matrix(1:20, ncol = 2, byrow = TRUE)
> array <- as_nanoarrow_array(df)
> array
<nanoarrow_array struct[10]>
$ length : int 10
$ null_count: int 0
$ offset : int 0
$ buffers :List of 1
..$ :<nanoarrow_buffer validity<bool>[null] ``
$ children :List of 2
..$ x :<nanoarrow_array int32[10]>
.. ..$ length : int 10
.. ..$ null_count: int 0
.. ..$ offset : int 0
.. ..$ buffers :List of 2
.. .. ..$ :<nanoarrow_buffer validity<bool>[null] ``
.. .. ..$ :<nanoarrow_buffer data<int32>[10][40 b]> `1 2 3 4 5 6 7 8 9 10`
.. ..$ dictionary: NULL
.. ..$ children : list()
..$ matrix_col:<nanoarrow_array fixed_size_list(2)[10]>
.. ..$ length : int 10
.. ..$ null_count: int 0
.. ..$ offset : int 0
.. ..$ buffers :List of 1
.. .. ..$ :<nanoarrow_buffer validity<bool>[null] ``
.. ..$ children :List of 1
.. .. ..$ item:<nanoarrow_array int32[20]>
.. .. .. ..$ length : int 20
.. .. .. ..$ null_count: int 0
.. .. .. ..$ offset : int 0
.. .. .. ..$ buffers :List of 2
.. .. .. .. ..$ :<nanoarrow_buffer validity<bool>[null] ``
.. .. .. .. ..$ :<nanoarrow_buffer data<int32>[20][80 b]> `1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20`
.. .. .. ..$ dictionary: NULL
.. .. .. ..$ children : list()
.. ..$ dictionary: NULL
$ dictionary: NULL
>
> convert_array(array, tibble::tibble(x = integer(), matrix_col = matrix(integer(), ncol = 2)))
# A tibble: 10 × 2
x matrix_col[,1] [,2]
<int> <int> <int>
1 1 1 2
2 2 3 4
3 3 5 6
4 4 7 8
5 5 9 10
6 6 11 12
7 7 13 14
8 8 15 16
9 9 17 18
10 10 19 20
>
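The output above shows the `item` child buffer holding `1 2 3 ... 20` in order, whereas R stores `matrix(1:20, ncol = 2, byrow = TRUE)` column-major (all of column 1, then all of column 2). Getting from one layout to the other requires a reordering copy. Here is a minimal sketch of that reordering in plain C; the function name is hypothetical and this is not how the PR itself implements the conversion.

```c
#include <stdint.h>

// Copy an nrow x ncol matrix stored column-major (R's layout) into row-major
// order, which is the element order of the child array of a
// fixed_size_list(ncol) array with nrow elements.
// Hypothetical helper for illustration only.
void column_major_to_row_major(const int32_t* src, int64_t nrow, int64_t ncol,
                               int32_t* dst) {
  for (int64_t i = 0; i < nrow; i++) {
    for (int64_t j = 0; j < ncol; j++) {
      // Element (i, j) lives at j * nrow + i in R and at i * ncol + j in Arrow
      dst[i * ncol + j] = src[j * nrow + i];
    }
  }
}
```

With `nrow = 10` and `ncol = 2`, the R storage `1 3 5 ... 19 2 4 ... 20` is rewritten as `1 2 3 ... 20`, matching the child buffer printed above.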
Yes, it's so close! If R did row-major matrices it would be. I need one more iteration here to get this down to a single
Getting the nested offsets is still pretty tricky. I'd like to provide something like:
int ArrowArrayViewGetListElement(ArrowArrayView* view, int64_t offset, ArrowArrayView** view_out, int64_t* offset_out, int64_t* length_out);
...which I think will be rather helpful for people to ingest the wild combination of Arrow storage that could possibly give you a list as an element of an array (e.g., listview, dictionary of list, run-end encoded of list, etc.).
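To illustrate the offset arithmetic such a helper would have to hide, here is a rough sketch covering only the two simplest cases (list and fixed-size list). Everything in it is an assumption made for illustration: `SketchListArray` and `sketch_list_element` are hypothetical names, not nanoarrow types or functions, and the sketch ignores validity bitmaps, any offset on the child array, and the listview, dictionary, and run-end encoded cases mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified description of a list-like array. This is NOT
// nanoarrow's ArrowArrayView; it only carries what the arithmetic needs.
enum SketchListKind { SKETCH_LIST, SKETCH_FIXED_SIZE_LIST };

struct SketchListArray {
  enum SketchListKind kind;
  int64_t offset;          // logical offset of the parent array
  int64_t length;          // number of list elements
  const int32_t* offsets;  // offsets buffer for SKETCH_LIST, NULL otherwise
  int32_t fixed_size;      // N for SKETCH_FIXED_SIZE_LIST, unused otherwise
};

// Resolve element i to the range [child_offset, child_offset + child_length)
// of the child array. Returns 0 on success, -1 if i is out of bounds.
int sketch_list_element(const struct SketchListArray* array, int64_t i,
                        int64_t* child_offset_out, int64_t* child_length_out) {
  if (i < 0 || i >= array->length) {
    return -1;
  }

  // The parent's offset applies before any child lookup
  int64_t physical = array->offset + i;

  switch (array->kind) {
    case SKETCH_LIST:
      // Variable-size lists: consecutive entries of the offsets buffer
      *child_offset_out = array->offsets[physical];
      *child_length_out = array->offsets[physical + 1] - array->offsets[physical];
      return 0;
    case SKETCH_FIXED_SIZE_LIST:
      // Fixed-size lists: no offsets buffer, just a stride of N
      *child_offset_out = physical * array->fixed_size;
      *child_length_out = array->fixed_size;
      return 0;
  }

  return -1;
}
```

For the `fixed_size_list(2)[10]` array printed earlier (offset 0), element 3 resolves to child offset 6 and length 2, i.e. the values `7 8`.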
I think the meson-build CI failure is due to an Ubuntu runner upgrade.
I did some high-level manual testing and this is working as expected for me.
Thank you for taking a look!
This still needs some testing on the stream case and is unfortunately not very zero-copy; however, it gets the job done (and I think it fixes some cases where we would otherwise have silently handled a matrix as its storage type).
Inspired by #691!
Created on 2024-12-12 with reprex v2.1.1