
GC in free-threaded build has problems with Py_INCREF()/Py_DECREF() in tp_traverse handlers #123241

@colesbury


Bug report

This came up in the context of nanobind. A nanobind test implements the following traverse function:

```cpp
int funcwrapper_tp_traverse(PyObject *self, visitproc visit, void *arg) {
    FuncWrapper *w = nb::inst_ptr<FuncWrapper>(self);

    nb::object f = nb::cast(w->f, nb::rv_policy::none);
    Py_VISIT(f.ptr());

    return 0;
}
```

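nb::object is a reference-counting smart pointer: obtaining it takes a strong reference, and destroying it at the end of the scope releases that reference. In other words, the function above is roughly equivalent to: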

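```cpp
int funcwrapper_tp_traverse(PyObject *self, visitproc visit, void *arg) {
    FuncWrapper *w = nb::inst_ptr<FuncWrapper>(self);
    PyObject *f = w->f;  // schematic: the Python object wrapped by w->f
    Py_INCREF(f);        // what nb::object does when it acquires f
    Py_VISIT(f);
    Py_DECREF(f);        // what nb::object does when it goes out of scope
    return 0;
}
```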

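This leads to a leak in the free-threaded GC for a subtle reason. When determining which objects were resurrected, the free-threaded GC uses each object's ob_ref_local field to compute its refcount minus incoming references, and that value may be temporarily negative. In this case, Py_INCREF() adds 1 to f's refcount, but by the time Py_DECREF() is called, f's local refcount is -1. Because ob_ref_local is unsigned and the free-threaded build uses UINT32_MAX (that is, -1) in that field to mark immortal objects, f appears immortal, Py_DECREF() does nothing, and the extra reference is never released.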

Traverse functions are subject to a number of restrictions that are not well documented. For example, it's not safe to allocate, free, track, or untrack Python objects in them. It's unclear to me whether calling refcounting functions in traverse callbacks causes other problems as well.

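I think we can make handle_resurrected_objects more robust to this by splitting the first pass over state->unreachable into two passes: one that adds each object's refcount to its ob_ref_local, and a second that calls tp_traverse to subtract incoming references. That way every object's own refcount is already included before any traverse function visits it.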

See also: PyO3/pyo3#3165, which was not related to the free-threaded build.

Labels: 3.13 (bugs and security fixes), 3.14 (bugs and security fixes), topic-free-threading, type-bug (An unexpected behavior, bug, or error)
