-
Notifications
You must be signed in to change notification settings - Fork 786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Easy way to zero-copy IPC buffers. #5165
Conversation
Rather than duplicating the reader perhaps we could extract a Something like
|
As written this is now a breaking API change... I'll have a play over tomorrow and see what I can come up with |
I have a way that goes: /// Arrow File reader
pub struct FileReader<R: Read + Seek> {
/// Buffered file reader that supports reading and seeking
reader: BufReader<R>,
/// Optional Buffer for when we want to map the underlying data without copying
buffer: Option<BufferReader>,
......
}
impl FileReader<Cursor<BufferReader>> {
pub fn try_new_map(buffer: Buffer, projection: Option<Vec<usize>>) -> Result<Self, ArrowError> {
let buffer = BufferReader { inter: buffer };
let cursor = Cursor::new(buffer.clone());
Self::try_new_inter(cursor, projection, Some(buffer))
}
}
impl<R: Read + Seek> FileReader<R> {
pub fn try_new(reader: R, projection: Option<Vec<usize>>) -> Result<Self, ArrowError> {
Self::try_new_inter(reader, projection, None)
}
fn try_new_inter(
reader: R,
projection: Option<Vec<usize>>,
buffer: Option<BufferReader>,
) -> Result<Self, ArrowError> {
let mut reader = BufReader::new(reader);
.......
}
} It then checks if the optional buffer is there during the dictionary loop and It's here and I can merge it with the current state if you want to go this direction. |
AsRef<[u8]> instead of a breaking trait.
FYI I've started work on a slightly different approach to this, starting with - #5179 |
Which issue does this PR close?
Closes #5153
Rationale for this change
There are ways to memmap buffers, if you know the header information, but no easy way to use the existing IPC file format in a zero-copy way.
What changes are included in this PR?
Reused the IPC reader code to read the header and footer information from an IPC file, but swapped out the read/copy of the buffer for the Buffer creation with
Buffer::from_custom_allocation
. Minor savings can be had by not allocating the arbitrary-sized footer and message read, but the easy way to do this requires a nightly API in the Cursor object.This duplicated some code, but it's messy to deduplicate because most of it is boilerplate around reading the flatbuffer messages. The better way to deduplicate is to change the
FileReader
to optionally accept a buffer and conditionally use that in the 2 locations where buffers are actually read. Perfectly happy to do it this way as well.Are there any user-facing changes?
There's a new struct that mirrors the current IPC reader struct that passes back zero-copy buffers.