RecordBatch get_array_memory_size returns incorrect size if underlying buffers are shared
#5969
Labels
enhancement
Any new improvement worthy of an entry in the changelog
Describe the bug
The implementation of get_array_memory_size is incorrect according to its documentation, which states that it "Returns the total number of bytes of memory occupied physically by this batch." If the underlying buffers are shared within the record batch, this function overreports the size, because a shared buffer is counted once for every array that references it. This can happen, for example, when you write a batch to the Arrow IPC format and read it back, since all of the data is then contiguous in one buffer.
https://docs.rs/arrow-array/52.0.0/src/arrow_array/record_batch.rs.html#472
To Reproduce
Expected behavior
I'd expect the sizing to be the actual total size across the unique buffers in the record batch.
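To make the expectation concrete, here is a minimal, std-only sketch of the difference between summing per column and summing per unique buffer. It does not use the arrow-rs API: `ColumnView`, `naive_size`, and `unique_size` are hypothetical names invented for illustration, and deduplicating by allocation pointer is just one possible approach, not necessarily how a fix would be implemented.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for a column: a view into a (possibly shared) allocation.
struct ColumnView {
    buffer: Arc<Vec<u8>>,
    offset: usize,
    len: usize,
}

/// Naive sizing: adds the full backing allocation once per column,
/// so a buffer shared by N columns is counted N times.
fn naive_size(columns: &[ColumnView]) -> usize {
    columns.iter().map(|c| c.buffer.len()).sum()
}

/// Deduplicated sizing: counts each backing allocation once,
/// using the allocation's pointer as its identity.
fn unique_size(columns: &[ColumnView]) -> usize {
    let mut seen = HashSet::new();
    columns
        .iter()
        .filter(|c| seen.insert(Arc::as_ptr(&c.buffer)))
        .map(|c| c.buffer.len())
        .sum()
}

fn main() {
    // One 1024-byte allocation shared by two columns, as might happen
    // after reading a batch back from a single contiguous buffer.
    let shared = Arc::new(vec![0u8; 1024]);
    let columns = vec![
        ColumnView { buffer: Arc::clone(&shared), offset: 0, len: 512 },
        ColumnView { buffer: Arc::clone(&shared), offset: 512, len: 512 },
    ];

    // Each view must stay inside its backing allocation.
    for c in &columns {
        assert!(c.offset + c.len <= c.buffer.len());
    }

    println!("naive:  {}", naive_size(&columns));
    println!("unique: {}", unique_size(&columns));
}
```

In this sketch the naive sum is twice the unique sum, which mirrors the overreporting described above. Note that identity is taken from the allocation itself rather than from each view's start address, since two views into the same allocation can begin at different offsets.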
Additional context