Skip to content

Fix append unique#423

Merged
rapids-bot[bot] merged 1 commit intorapidsai:mainfrom
linhu-nv:fix-append-unique
Apr 1, 2026
Merged

Fix append unique#423
rapids-bot[bot] merged 1 commit intorapidsai:mainfrom
linhu-nv:fix-append-unique

Conversation

@linhu-nv
Copy link
Copy Markdown
Contributor

@linhu-nv linhu-nv commented Mar 6, 2026

No description provided.

@linhu-nv linhu-nv requested review from a team as code owners March 6, 2026 05:32
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Mar 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@linhu-nv linhu-nv requested a review from alexbarghi-nv March 6, 2026 05:32
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 6, 2026

Greptile Summary

Replaces manual CPython API calls (PyTuple_New, PyTuple_SetItem, PyObject_CallObject, etc.) in all six Cython callback wrappers with idiomatic Cython syntax, and changes the malloc callback interface to pass raw Python primitives (shape: tuple, dtype_int: int, malloc_type_int: int) instead of PyWholeMemoryTensorDescription/PyMemoryAllocType wrapper objects.\n\nKey changes:\n- All six python_cb_wrapper_* functions now use natural Cython fn(args) calls, eliminating the error-prone manual reference counting around argument tuples.\n- torch_malloc_env_fn updated to accept the new primitive arguments; enum comparisons updated to int(wmb.WholeMemoryMemoryAllocType.*) and dtype reconstruction uses wmb.WholeMemoryDataType(dtype_int).\n- strides and storage_offset from wholememory_tensor_description_t are no longer forwarded to the Python malloc callback — the current torch_malloc_env_fn doesn't need them (torch.empty only uses shape/dtype), but custom implementations would lose access to that information.

Confidence Score: 5/5

Safe to merge; the refcount simplification is correct and the only finding is a P2 API narrowing that doesn't affect any current caller.

The Cython simplification is semantically equivalent to the old manual PyAPI code and the reference counting for all callbacks is correct. The sole finding — strides/storage_offset being dropped from the malloc callback API — has no impact on the existing torch implementation since torch.empty only needs shape and dtype. No regressions are introduced.

No files require special attention.

Important Files Changed

Filename Overview
python/pylibwholegraph/pylibwholegraph/binding/wholememory_binding.pyx Replaced manual PyAPI tuple construction with idiomatic Cython calls across all six callbacks; malloc callbacks now pass raw primitives (shape tuple, dtype int, malloc_type int) instead of PyWholeMemoryTensorDescription/PyMemoryAllocType wrappers, dropping strides and storage_offset from the callback interface.
python/pylibwholegraph/pylibwholegraph/torch/wholegraph_env.py torch_malloc_env_fn signature updated to match new primitive-based interface; comparison logic correctly converts enum values to int for comparison; torch.empty call is functionally identical to before since only shape and dtype were ever used.

Sequence Diagram

sequenceDiagram
    participant C as C_Runtime
    participant CY as Cython_Callbacks
    participant PY as Python_env_fns
    C->>CY: create_context
    CY->>PY: fn(ctx) returns TorchMemoryContext
    C->>CY: malloc with tensor_desc and malloc_type
    CY->>PY: fn(shape_tuple, dtype_int, malloc_type_int, mem_ctx, ctx)
    PY-->>CY: int64 pointer
    CY-->>C: void pointer
    C->>CY: free with memory_context
    CY->>PY: fn(mem_ctx, ctx)
    C->>CY: destroy_context with memory_context
    CY->>PY: fn(mem_ctx, ctx)
Loading

Reviews (4): Last reviewed commit: "Replace manual PyTuple_New/PyTuple_SetIt..." | Re-trigger Greptile

Comment on lines +427 to +429
if global_context == NULL:
(<void **> memory_context)[0] = NULL
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memory_context is dereferenced at line 428 without first checking for null. When global_context == NULL, the code writes to memory_context[0] without validating memory_context itself is non-null. Every other callback in this PR (destroy, malloc, free, output_malloc, output_free) guards against null memory_context before dereferencing. This should be consistent:

Suggested change
if global_context == NULL:
(<void **> memory_context)[0] = NULL
return
if global_context == NULL:
if memory_context != NULL:
(<void **> memory_context)[0] = NULL
return

@gforsyth
Copy link
Copy Markdown
Contributor

gforsyth commented Mar 6, 2026

/ok to test 64d68c3

@gforsyth gforsyth requested a review from a team as a code owner March 6, 2026 18:40
@gforsyth gforsyth requested a review from bdice March 6, 2026 18:40
@gforsyth
Copy link
Copy Markdown
Contributor

gforsyth commented Mar 6, 2026

/ok to test 18054d8

@gforsyth
Copy link
Copy Markdown
Contributor

gforsyth commented Mar 6, 2026

Looks like this did fix the first round of segfaults. I added a 3.14 test job for pylibwholegraph so we can see the failure bubble up in this PR.

Currently:

 
Current thread's C stack trace (most recent call first):
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at _Py_DumpStack+0x45 [0x77a03eb73805]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at +0x312c6c [0x77a03eb89c6c]
  Binary file "/lib/x86_64-linux-gnu/libc.so.6", at +0x45330 [0x77a03e6a3330]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at PyTuple_Size+0x8 [0x77a03ea193f8]
  Binary file "/pyenv/versions/3.14.3/lib/python3.14/site-packages/pylibwholegraph/binding/wholememory_binding.abi3.so", at +0x420ad [0x77a03b4d80ad]
  Binary file "/pyenv/versions/3.14.3/lib/python3.14/site-packages/pylibwholegraph/binding/wholememory_binding.abi3.so", at +0x27c19 [0x77a03b4bdc19]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at _PyObject_MakeTpCall+0xc0 [0x77a03e97e080]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at +0x10b0fa [0x77a03e9820fa]
  Binary file "/pyenv/versions/3.14.3/lib/python3.14/site-packages/pylibwholegraph/binding/wholememory_binding.abi3.so", at +0x3be3a [0x77a03b4d1e3a]
  Binary file "/pyenv/versions/3.14.3/lib/python3.14/site-packages/libwholegraph/lib64/libwholegraph.so", at _ZN14wholegraph_ops55wholegraph_csr_weighted_sample_without_replacement_funcIilfEEv18wholememory_gref_t31wholememory_array_description_tS1_S2_S1_S2_PvS2_iS3_S2_S3_S3_S3_yP22wholememory_env_func_tP11CUstream_st+0x137 [0x779fd7aa1717]
  Binary file "/pyenv/versions/3.14.3/lib/python3.14/site-packages/libwholegraph/lib64/libwholegraph.so", at _ZN14wholegraph_ops57wholegraph_csr_weighted_sample_without_replacement_mappedE18wholememory_gref_t31wholememory_array_description_tS0_S1_S0_S1_PvS1_iS2_S1_S2_S2_S2_yP22wholememory_env_func_tP11CUstream_st+0x19b [0x779fd7a6d22b]
  Binary file "/pyenv/versions/3.14.3/lib/python3.14/site-packages/libwholegraph/lib64/libwholegraph.so", at wholegraph_csr_weighted_sample_without_replacement+0x116d [0x779fd7a5d54d]
  Binary file "/pyenv/versions/3.14.3/lib/python3.14/site-packages/pylibwholegraph/binding/wholememory_binding.abi3.so", at +0x5c595 [0x77a03b4f2595]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at _PyObject_MakeTpCall+0xc0 [0x77a03e97e080]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at _PyEval_EvalFrameDefault+0x3f0f [0x77a03e914c1f]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at +0x26b3ba [0x77a03eae23ba]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at +0x109129 [0x77a03e980129]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at +0x39281c [0x77a03ec0981c]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at _PyObject_MakeTpCall+0xc0 [0x77a03e97e080]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at PyObject_Vectorcall+0x53 [0x77a03e97e2f3]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at _PyEval_EvalFrameDefault+0x3f0f [0x77a03e914c1f]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at +0x26b3ba [0x77a03eae23ba]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at +0x108e6f [0x77a03e97fe6f]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at +0x109060 [0x77a03e980060]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at +0x1b9de4 [0x77a03ea30de4]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at _PyObject_MakeTpCall+0xc0 [0x77a03e97e080]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at _PyEval_EvalFrameDefault+0x4e56 [0x77a03e915b66]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at +0x26b3ba [0x77a03eae23ba]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at +0x108e6f [0x77a03e97fe6f]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at +0x109060 [0x77a03e980060]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at +0x1b9de4 [0x77a03ea30de4]
  Binary file "/pyenv/versions/3.14.3/lib/libpython3.14.so.1.0", at PyObject_Call+0xa4 [0x77a03e980384]
  <truncated rest of calls>

with Cython-native function call syntax in all env callback wrappers,
and pass plain Python objects (tuple, int) instead of cdef class
instances across the C→Python boundary.
@linhu-nv linhu-nv force-pushed the fix-append-unique branch from 18054d8 to 81d049a Compare March 27, 2026 09:08
@alexbarghi-nv
Copy link
Copy Markdown
Member

/ok to test 81d049a

@alexbarghi-nv alexbarghi-nv added bug Something isn't working non-breaking Introduces a non-breaking change labels Apr 1, 2026
Copy link
Copy Markdown
Member

@alexbarghi-nv alexbarghi-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for getting this fix in.

@alexbarghi-nv alexbarghi-nv removed the request for review from a team April 1, 2026 07:15
@alexbarghi-nv
Copy link
Copy Markdown
Member

/merge

@rapids-bot rapids-bot bot merged commit fbea7cb into rapidsai:main Apr 1, 2026
56 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants