Description
Bug report
This came up in the context of nanobind. Nanobind implements (in a test) a traverse function:

    int funcwrapper_tp_traverse(PyObject *self, visitproc visit, void *arg) {
        FuncWrapper *w = nb::inst_ptr<FuncWrapper>(self);
        nb::object f = nb::cast(w->f, nb::rv_policy::none);
        Py_VISIT(f.ptr());
        return 0;
    };

The nb::object smart pointer is reference counted: it owns a reference to the object and releases it when it goes out of scope. In other words, the above is roughly equivalent to:

    int funcwrapper_tp_traverse(PyObject *self, visitproc visit, void *arg) {
        PyObject *f = self->w->f;
        Py_INCREF(f);
        Py_VISIT(f);
        Py_DECREF(f);
        return 0;
    };

This leads to a leak in the free-threaded GC for subtle reasons. When determining resurrected objects, the free-threaded GC uses ob_ref_local to hold the refcount minus the number of incoming references, a value that may be (temporarily) negative. In this case, Py_INCREF() adds 1 to the refcount, but by the time Py_DECREF() is called, the local refcount is -1, which makes the object appear immortal, so the decrement is skipped and the extra reference is never released.
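The mechanism can be illustrated with a small standalone model. This is only a sketch and not CPython code: the `Toy*` names, the two-field object layout, and the exact order of operations inside the pass are assumptions chosen to reproduce the temporarily negative value described above. The one detail taken from the description is that a local count of -1 (all bits set in an unsigned field) is read as "immortal", which turns refcount operations into no-ops.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel: a local count of -1 (as unsigned) means "immortal". */
#define TOY_IMMORTAL_LOCAL UINT32_MAX

typedef struct {
    uint32_t ref_local;  /* scratch field reused by the toy GC pass */
    int64_t ref_shared;  /* stands in for the object's real refcount */
} ToyObject;

/* Refcount operations are skipped for objects that look immortal. */
static void toy_incref(ToyObject *op) {
    if (op->ref_local == TOY_IMMORTAL_LOCAL) return;
    op->ref_shared += 1;
}

static void toy_decref(ToyObject *op) {
    if (op->ref_local == TOY_IMMORTAL_LOCAL) return;
    op->ref_shared -= 1;
}

/* Visit callback of the toy GC: subtract one incoming reference.
 * Unsigned arithmetic wraps, so 0 - 1 becomes UINT32_MAX. */
static void toy_visit_decref(ToyObject *op) {
    op->ref_local -= 1;
}

/* Traverse that temporarily owns a reference, like the nb::object version. */
static void toy_traverse_with_refcounting(ToyObject *f) {
    toy_incref(f);
    toy_visit_decref(f);
    toy_decref(f);  /* skipped if f now looks immortal */
}

/* Traverse that only visits a borrowed pointer. */
static void toy_traverse_borrowed(ToyObject *f) {
    toy_visit_decref(f);
}

/* Single pass in which the wrapper is traversed before f's own refcount
 * has been added to f's scratch field. Returns f's refcount afterwards. */
static int64_t toy_single_pass(void (*traverse)(ToyObject *)) {
    ToyObject f = { .ref_local = 0, .ref_shared = 1 };
    traverse(&f);                            /* wrapper handled first */
    f.ref_local += (uint32_t)f.ref_shared;   /* then f itself */
    return f.ref_shared;
}
```

In this model, `toy_single_pass(toy_traverse_with_refcounting)` leaves the refcount at 2 instead of 1, because the decrement runs while the scratch field holds the sentinel value. The borrowed-pointer variant leaves it at 1.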
There are a number of limitations on the implementations of traverse functions, which are not well documented. For example, it's not safe to allocate, free, track, or untrack Python objects. It's unclear to me whether there are other issues with calling refcounting functions in traverse callbacks.
I think we can make handle_resurrected_objects more robust to this by splitting the first pass over state->unreachable into two passes.
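One plausible reading of the two-pass idea is sketched below on a toy model. The split shown here (first store every object's refcount in the scratch field, then run all traversals) is an assumption about what the two passes would do, not the actual patch, and all `toy_*` names and the object layout are hypothetical. The traverse function deliberately takes and drops a temporary reference, as in the nanobind example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IMMORTAL_LOCAL UINT32_MAX  /* hypothetical "immortal" sentinel */

typedef struct ToyObject {
    uint32_t ref_local;          /* scratch field used by the toy GC */
    int64_t ref_shared;          /* stands in for the real refcount */
    struct ToyObject *referent;  /* single outgoing reference, or NULL */
} ToyObject;

static void toy_incref(ToyObject *op) {
    if (op->ref_local == TOY_IMMORTAL_LOCAL) return;  /* looks immortal */
    op->ref_shared += 1;
}

static void toy_decref(ToyObject *op) {
    if (op->ref_local == TOY_IMMORTAL_LOCAL) return;  /* looks immortal */
    op->ref_shared -= 1;
}

/* Traverse that holds a temporary reference while visiting its referent. */
static void toy_traverse(ToyObject *self) {
    ToyObject *f = self->referent;
    if (f == NULL) return;
    toy_incref(f);
    f->ref_local -= 1;  /* visit: subtract one incoming reference */
    toy_decref(f);
}

/* Combined pass: add the refcount and traverse in the same loop, so a
 * referent can be visited before its own refcount has been added. */
static void toy_one_pass(ToyObject **unreachable, size_t n) {
    for (size_t i = 0; i < n; i++) {
        unreachable[i]->ref_local += (uint32_t)unreachable[i]->ref_shared;
        toy_traverse(unreachable[i]);
    }
}

/* Split pass: initialize every scratch field first, then traverse.
 * No scratch field drops below zero while a traverse function runs,
 * as long as every incoming reference is backed by a real refcount. */
static void toy_two_pass(ToyObject **unreachable, size_t n) {
    for (size_t i = 0; i < n; i++) {
        unreachable[i]->ref_local = (uint32_t)unreachable[i]->ref_shared;
    }
    for (size_t i = 0; i < n; i++) {
        toy_traverse(unreachable[i]);
    }
}
```

For a two-object cycle (a wrapper and its referent, each with refcount 1 and listed wrapper first), the combined pass leaves the referent with refcount 2 and a nonzero scratch value, while the split pass leaves both objects with refcount 1 and a scratch value of 0.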
See also: PyO3/pyo3#3165, which was not related to the free-threaded build.