Skip to content

Use PyBuffer_FillInfo for simple buffers & simplify Python buffer cleanup#436

Merged
rapids-bot[bot] merged 3 commits intorapidsai:mainfrom
jakirkham:use_pybuf_fill_info
Mar 24, 2026
Merged

Use PyBuffer_FillInfo for simple buffers & simplify Python buffer cleanup#436
rapids-bot[bot] merged 3 commits intorapidsai:mainfrom
jakirkham:use_pybuf_fill_info

Conversation

@jakirkham
Copy link
Copy Markdown
Member

When using the Python Buffer Protocol with simple memory buffers like PyWholeMemoryUniqueID where it just bytes with a length, the PyBuffer_FillInfo function can fill out the Py_buffer object for us simply and easily. It also handles any validity checks as well different provided flags. This streamlines the code and simplifies maintenance in these cases.

Also the __releasebuffer__ method is used to handle any extra memory cleanup (like if we needed to allocate memory for shape or strides) or relax any imposed constraints (like restricting resizing while views are held). However none of this really applies here. The Py_buffer struct itself is being cleaned up during this process so there is no need to do that. Plus some values (like obj) need to persist pass this method so they can be decref'd by Python itself. Removing them can actually result in dangling references, which could causes issues with memory cleanup. So instead simply pass in these methods and let the normal cleanup process continue.

When using the Python Buffer Protocol with simple memory buffers like
`PyWholeMemoryUniqueID` where it just bytes with a length,
`PyBuffer_FillInfo` can fill out the `Py_buffer` object for us simply
and easily. It also handles any validity checks as well different
provided flags. This streamlines the code and simplifies maintenance in
these cases.
The goal of this method is to handle any extra memory cleanup (like if
we needed to allocate memory for `shape` or `strides`) or relax any
imposed constraints (like restricting resizing while views are held).
However none of this really applies here. The `Py_buffer` struct itself
is being cleaned up during this process so there is no need to do that.
Plus some values (like `obj`) need to persist pass this method so they
can be `decref`'d by Python itself. Removing them can actually result in
dangling references, which could causes issues with memory cleanup. So
instead simply `pass` in these methods and let the normal cleanup
process continue.
@jakirkham jakirkham requested a review from a team as a code owner March 18, 2026 22:38
@jakirkham jakirkham added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Mar 18, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 18, 2026

Greptile Summary

This PR simplifies the Python buffer protocol implementation for PyWholeMemoryUniqueID and PyWholeMemoryFlattenDlpack in two ways:

  • PyWholeMemoryUniqueID.__getbuffer__: Replaces nine manual Py_buffer field assignments with a single PyBuffer_FillInfo call. This is appropriate for a simple contiguous byte buffer and delegates flag validation and struct population to CPython. Note that the format string changes from 'c' (char) to "B" (unsigned byte) when PyBUF_FORMAT is requested — a semantically minor difference for an opaque ID buffer that is unlikely to affect any consumer.
  • Both __releasebuffer__ methods: Replaced with pass. The old code was actively unsafe: setting buffer.obj = None decremented the exporter's reference count before CPython performs its own Py_DECREF on buffer.obj, risking a double-decref and dangling pointer. The pass approach is correct — __releasebuffer__ is only needed to undo resource acquisitions made in __getbuffer__, and none were made here that need manual teardown.

Confidence Score: 5/5

  • Safe to merge — the changes are correct, and the releasebuffer simplification fixes a latent double-decref bug.
  • The use of PyBuffer_FillInfo is appropriate and idiomatic for a simple flat byte buffer. The releasebuffer change from field-nulling to pass is a correctness improvement, not just a style choice, as the old code could cause premature reference count decrements on buffer.obj. No new logic is introduced that could regress behaviour.
  • No files require special attention.

Important Files Changed

Filename Overview
python/pylibwholegraph/pylibwholegraph/binding/wholememory_binding.pyx Replaces manual Py_buffer field population with PyBuffer_FillInfo in PyWholeMemoryUniqueID.getbuffer, and simplifies both releasebuffer methods to pass — fixing a latent dangling-reference bug caused by prematurely nulling buffer.obj.

Sequence Diagram

sequenceDiagram
    participant Consumer as Buffer Consumer
    participant Cython as PyWholeMemoryUniqueID
    participant CPython as CPython Runtime

    Consumer->>Cython: __getbuffer__(buffer, flags)
    Cython->>CPython: PyBuffer_FillInfo(buffer, self, ptr, len, False, flags)
    CPython-->>Cython: fills buf/len/format/shape/strides/obj, Py_INCREFs self
    Cython-->>Consumer: buffer ready

    Consumer->>Cython: __releasebuffer__(buffer)
    Note over Cython: pass (no-op)
    Cython-->>Consumer: return
    Consumer->>CPython: Py_DECREF(buffer.obj)
    CPython-->>Consumer: self ref released safely
Loading

Last reviewed commit: "Merge branch 'main' ..."


def __getbuffer__(self, Py_buffer *buffer, int flags):
buffer.buf = &self.wholememory_unique_id.internal[0]
buffer.format = 'c'
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually the default is NULL or b"B", which is uint8_t (or unsigned char). This is what PyBuffer_FillInfo uses. Also this is how bytes and bytearray work

IIUC b"c" is basically equivalent, but is seldom used. So switching to b"B" should still work in similar cases and work better in cases where b"B" is expected

Though please let me know if there is additional context we should consider here

Comment on lines 1197 to 1208
def __getbuffer__(self, Py_buffer *buffer, int flags):
buffer.buf = self.c_ptr
buffer.format = 'c'
buffer.internal = NULL
buffer.itemsize = self.itemsize
buffer.len = self.shape[0]
buffer.ndim = 1
buffer.obj = self
buffer.readonly = 0
buffer.shape = self.shape
buffer.strides = self.strides
buffer.suboffsets = NULL
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had looked at doing the same in this case. However noticed there is an itemsize that may not be 1. Though this is also b"c" format. Reading the code the shape and itemsize appear to be important here. This doesn't really work with PyBuffer_FillInfo, which expects b"B" with itemsize of 1. So skipped this case

@jakirkham
Copy link
Copy Markdown
Member Author

Recently there were some fixes made to CI in PR: #425

Have pulled in the latest changes from main to retest on CI

@alexbarghi-nv
Copy link
Copy Markdown
Member

@linhu-nv please review when you have a chance

@jakirkham
Copy link
Copy Markdown
Member Author

Thanks Alex! 🙏

Looks like CI is passing now too 🎉

Copy link
Copy Markdown
Contributor

@linhu-nv linhu-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review. This looks good to me. Thanks! @jakirkham @alexbarghi-nv

@jakirkham
Copy link
Copy Markdown
Member Author

/merge

@rapids-bot rapids-bot bot merged commit dbb33ad into rapidsai:main Mar 24, 2026
54 checks passed
@jakirkham jakirkham deleted the use_pybuf_fill_info branch March 24, 2026 18:54
@jakirkham
Copy link
Copy Markdown
Member Author

Thanks Lin! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants