Use PyBuffer_FillInfo for simple buffers & simplify Python buffer cleanup #436
When using the Python Buffer Protocol with simple memory buffers like `PyWholeMemoryUniqueID`, where the data is just bytes with a length, `PyBuffer_FillInfo` can fill out the `Py_buffer` object for us simply and easily. It also handles any validity checks as well as the different provided flags. This streamlines the code and simplifies maintenance in these cases.
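For readers who want to see what `PyBuffer_FillInfo` fills in, here is a minimal sketch that calls it from plain Python through `ctypes`. This is not the Cython code in this PR: `PyBufferStruct`, `describe`, and the 16-byte `ctypes` array are hypothetical stand-ins, and the struct mirror assumes the standard CPython `Py_buffer` layout.

```python
import ctypes

# Flag values as defined in CPython's public headers.
PyBUF_SIMPLE = 0x0
PyBUF_WRITABLE = 0x1
PyBUF_FORMAT = 0x4
PyBUF_ND = 0x8


class PyBufferStruct(ctypes.Structure):
    """Hypothetical ctypes mirror of the C-level Py_buffer struct."""
    _fields_ = [
        ("buf", ctypes.c_void_p),
        ("obj", ctypes.py_object),
        ("len", ctypes.c_ssize_t),
        ("itemsize", ctypes.c_ssize_t),
        ("readonly", ctypes.c_int),
        ("ndim", ctypes.c_int),
        ("format", ctypes.c_char_p),
        ("shape", ctypes.POINTER(ctypes.c_ssize_t)),
        ("strides", ctypes.POINTER(ctypes.c_ssize_t)),
        ("suboffsets", ctypes.POINTER(ctypes.c_ssize_t)),
        ("internal", ctypes.c_void_p),
    ]


fill_info = ctypes.pythonapi.PyBuffer_FillInfo
fill_info.argtypes = [
    ctypes.POINTER(PyBufferStruct), ctypes.py_object, ctypes.c_void_p,
    ctypes.c_ssize_t, ctypes.c_int, ctypes.c_int,
]
fill_info.restype = ctypes.c_int

release = ctypes.pythonapi.PyBuffer_Release
release.argtypes = [ctypes.POINTER(PyBufferStruct)]
release.restype = None


def describe(exporter, readonly, flags):
    """Fill a view over a ctypes array and report what FillInfo set."""
    view = PyBufferStruct()
    # Raises BufferError if a writable view of read-only memory is requested.
    fill_info(ctypes.byref(view), exporter, ctypes.addressof(exporter),
              ctypes.sizeof(exporter), readonly, flags)
    try:
        return {
            "len": view.len,
            "itemsize": view.itemsize,
            "ndim": view.ndim,
            "format": view.format,  # None when PyBUF_FORMAT was not requested
            "shape0": view.shape[0] if view.shape else None,
        }
    finally:
        # Drops the reference that FillInfo took on the exporter.
        release(ctypes.byref(view))


data = (ctypes.c_ubyte * 16)(*range(16))
info = describe(data, 1, PyBUF_FORMAT | PyBUF_ND)
```

The same call with `PyBUF_SIMPLE` leaves `format` and `shape` as NULL, which is the flag handling that no longer has to be written by hand.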
Also, the `__releasebuffer__` method exists to handle any extra memory cleanup (for example, if we needed to allocate memory for `shape` or `strides`) or to relax any imposed constraints (like restricting resizing while views are held). However, none of this really applies here. The `Py_buffer` struct itself is being cleaned up during this process, so there is no need to do that. Plus, some values (like `obj`) need to persist past this method so they can be `decref`'d by Python itself. Removing them can actually result in dangling references, which could cause issues with memory cleanup. So instead simply `pass` in these methods and let the normal cleanup process continue.
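As an illustration of the kind of constraint a release hook normally lifts, the built-in `bytearray` refuses to resize while a view is exported and allows it again once the view is released. This is a sketch using only built-in types; `resize_is_blocked` is a hypothetical helper, and nothing here comes from the code in this PR, which holds fixed-size memory and so has no such constraint to relax.

```python
def resize_is_blocked(buf):
    """Return True if growing the bytearray is refused because of an export."""
    try:
        buf.extend(b"!")
    except BufferError:
        return True
    return False


data = bytearray(b"abc")
view = memoryview(data)                          # acquires a buffer export

blocked_while_exported = resize_is_blocked(data)  # export is still held

view.release()                                   # triggers the release hook
blocked_after_release = resize_is_blocked(data)   # constraint is lifted
```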
Greptile Summary
This PR simplifies the Python buffer protocol implementation for
Confidence Score: 5/5
Sequence Diagram
sequenceDiagram
participant Consumer as Buffer Consumer
participant Cython as PyWholeMemoryUniqueID
participant CPython as CPython Runtime
Consumer->>Cython: __getbuffer__(buffer, flags)
Cython->>CPython: PyBuffer_FillInfo(buffer, self, ptr, len, False, flags)
CPython-->>Cython: fills buf/len/format/shape/strides/obj, Py_INCREFs self
Cython-->>Consumer: buffer ready
Consumer->>Cython: __releasebuffer__(buffer)
Note over Cython: pass (no-op)
Cython-->>Consumer: return
Consumer->>CPython: Py_DECREF(buffer.obj)
CPython-->>Consumer: self ref released safely
Last reviewed commit: "Merge branch 'main' ..."
def __getbuffer__(self, Py_buffer *buffer, int flags):
    buffer.buf = &self.wholememory_unique_id.internal[0]
    buffer.format = 'c'
Usually the default is `NULL` or `b"B"`, which is `uint8_t` (or `unsigned char`). This is what `PyBuffer_FillInfo` uses. It is also how `bytes` and `bytearray` work.
IIUC `b"c"` is basically equivalent, but is seldom used. So switching to `b"B"` should still work in similar cases and work better in cases where `b"B"` is expected.
Though please let me know if there is additional context we should consider here.
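A quick way to see the difference between the two format codes from Python, as a sketch using only built-in types rather than the class touched in this PR: both codes describe one-byte items, but `"B"` yields integers while `"c"` yields length-1 `bytes` objects.

```python
import struct

as_B = memoryview(b"abc")   # bytes exports with format "B"
as_c = as_B.cast("c")       # same memory, reinterpreted as format "c"

# Both codes describe a single byte per item.
size_B = struct.calcsize("B")
size_c = struct.calcsize("c")

first_as_B = as_B[0]        # an int
first_as_c = as_c[0]        # a bytes object of length 1
```

Consumers that index into the view and expect integers are one case where `b"B"` is the safer choice.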
def __getbuffer__(self, Py_buffer *buffer, int flags):
    buffer.buf = self.c_ptr
    buffer.format = 'c'
    buffer.internal = NULL
    buffer.itemsize = self.itemsize
    buffer.len = self.shape[0]
    buffer.ndim = 1
    buffer.obj = self
    buffer.readonly = 0
    buffer.shape = self.shape
    buffer.strides = self.strides
    buffer.suboffsets = NULL
Had looked at doing the same in this case. However, noticed there is an `itemsize` that may not be 1, though this also uses the `b"c"` format. Reading the code, the `shape` and `itemsize` appear to be important here. This doesn't really work with `PyBuffer_FillInfo`, which expects `b"B"` with an `itemsize` of 1. So skipped this case.
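To illustrate why `itemsize` and `shape` cannot simply be dropped when items are wider than one byte, here is a sketch with a built-in `array` standing in for the class above. The `"i"` typecode and the values are assumptions chosen for the example only.

```python
import array

values = array.array("i", [10, 20, 30, 40])

typed = memoryview(values)   # keeps the exporter's itemsize and shape
flat = typed.cast("B")       # byte-level view, as FillInfo would describe it

item_count = typed.shape[0]  # number of items
byte_count = flat.shape[0]   # number of bytes
```

The typed view reports four items while the flat view reports `4 * values.itemsize` bytes, so a consumer given only the flat description would lose the item boundaries.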
Recently there were some fixes made to CI in PR: #425. Have pulled in the latest changes from
@linhu-nv please review when you have a chance
Thanks Alex! 🙏 Looks like CI is passing now too 🎉
linhu-nv left a comment
Sorry for the late review. This looks good to me. Thanks! @jakirkham @alexbarghi-nv
/merge
Thanks Lin! 🙏