Fix leak and other small improvements #2
Thank you for the PR! Sorry for the delay in reviewing; I've been on vacation for most of this past week.

I'm a bit concerned about teardown order here. Are there any official guarantees about the point in interpreter finalization where the interpreter state dict is cleared, especially relative to the point where module globals are cleared / the last GC run is performed? If not, then we might delete the registry while some bound types still exist, and the future destruction of those types could cause a use-after-free when their frameworks try to unregister them with pymetabind. I'm all for fixing the leak, but I think it might require some deeper changes.

If it is actually guaranteed that the interpreter state dict isn't cleared until very late in finalization, then I wonder if e.g. nanobind would want to do its leak check at that point, rather than in a Py_AtExit handler. (cc @wjakob) If not, I guess we could use Py_AtExit for the pymetabind destructor too.
Thinking some more:
So I think we need at least one of:
I'm leaning towards the reference-counted solution, since we can't control the order in which our atexit handler runs relative to our frameworks'. But I'm curious to hear your thoughts.
This is indeed an excellent point and you have very good reasons to be worried, since finalization is a particularly nightmarish area. Unfortunately there are no official guarantees and the order changes slightly between versions (in particular, pre-3.8 was so bad that there were substantial situations where the interpreter either leaked or broke some of the GC guarantees). Petr, Victor, I, and others have worked hard to ensure that at least we don't leak, but the area is so complicated and tricky (especially with daemon threads) that we are not ready to document the order or offer guarantees, in case we need to change it. But of course we should do something: the fact that it is not official doesn't mean this is not a problem frameworks need to solve. So here is a summary of the order as I understand it currently (3.14; older versions may be missing steps or have slight reordering, and very old versions do several GC passes in a desperate attempt to not leak, but we have fixed that):
I think this is a good idea in principle (see below). The bigger problem is syncing this call with all frameworks, so we are called before (or after, depending on the ownership model of the frameworks and the registration) they clean things up. Indeed, for example, nanobind already has some registered, so some sync here is needed because we probably cannot register after/before nanobind (or any other framework that uses it). Enforcing this across all frameworks can be a bit tricky.
If I understand you correctly, that means we want to destroy the registry before any framework destroys itself, or otherwise the cleanup code may run into UB, no?
The refcount version can be slightly tricky because we may lose control over exactly how this happens (if for whatever reason we end up in a reference cycle, for example). Of course we can put infrastructure in place to ensure this cannot happen (or is very unlikely to happen), but in my experience refcount-bounded cleanup in bindings always comes back for revenge. Another alternative to consider (not sure if it is correct): since the plan is to integrate this into frameworks anyway, we can have another API that the framework must call to unregister itself, and when no more frameworks are left we destroy our state. The advantage of this approach is that we don't need to make assumptions about how frameworks clean themselves up, and we can reason on a per-framework basis. The only challenge is ensuring the semantics are sound. We could also maybe register a Py_AtExit handler that warns if the registry was not destroyed properly.

What do you think?
Thank you for the finalization details; that's very helpful to have as a reference when thinking about this!
No, I meant after. When a framework destroys itself, it's required to unlink itself from the registry's list of frameworks, per the comments in pymetabind.h.

OTOH, if the registry is destroyed while some frameworks are still alive, we would need a way for those frameworks to learn that the registry is being destroyed, which doesn't currently exist. The only safe way for them to respond to this destruction would be to disable all use of foreign bindings: even if some bindings are still alive, with the registry gone they have no way to tell how long that will remain true. While it's possible to implement this, it's extra infrastructure that I don't think we need.
To clarify, by "refcounted" I mean the following:
In the expected case, there is one reference to the capsule from the interpreter state dict and one from each registered framework, which means the registry is deallocated after both (the interpreter state dict has been cleared) and (each registered framework has destroyed itself). In the unexpected case, where someone goes snooping in the interpreter state dict and holds their own reference to the capsule, we still expect that reference to be dropped by finalization activities that occur before the interpreter state dict is cleared. Maybe I'm missing something, but I don't see how a cycle can occur here, since the capsule destructor doesn't need to access any Python objects and thus doesn't hold any references to Python objects.
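To make the lifetime story concrete, here is a minimal sketch of that scheme from a framework's point of view, assuming the registry lives in a capsule that each framework increfs while it is registered; the function and variable names below are illustrative, not part of the pymetabind API:

```c
#include <Python.h>

/* Hypothetical per-framework strong reference to the registry capsule. */
static PyObject *my_registry_capsule = NULL;

/* On registration: keep the capsule (and the registry it owns) alive for as
 * long as this framework stays registered. */
void my_framework_register(PyObject *capsule) {
    Py_INCREF(capsule);
    my_registry_capsule = capsule;
    /* ... register this framework with pymetabind here ... */
}

/* On framework teardown: unregister first, then drop the reference. Once the
 * interpreter state dict entry is also gone, the capsule destructor runs and
 * frees the registry. */
void my_framework_unregister(void) {
    /* ... unregister this framework from pymetabind here ... */
    Py_CLEAR(my_registry_capsule);
}
```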
I think this is the same as the refcounted solution except that we maintain the use count privately instead of using the capsule object's refcount as a use count. I'm happy to go this route if there's some situation where it would work better, but as explained above I think the capsule object's refcount works fine for our purposes, and using it is easier than managing our own counter IMO.
I think it would actually be difficult to tell whether the registry was destroyed properly by the time Py_AtExit handlers are called. Globals are not really global across DSO boundaries depending on how the DSO is linked, and C doesn't have an equivalent to C++'s inline variables. I'm inclined to skip the atexit warning since it's difficult; if the pymb registry is being leaked, it's probably because some framework forgot to unregister itself.
Agreed, that would be unnecessarily complex.
Perfect, this makes much more sense now. I was missing the
You're right that direct cycles are unlikely. The concern I was raising is more about GC ordering during finalization, since we were discussing the ordering of clearing this relative to other teardown: if the interpreter state dict ends up in cyclic references, these get cleared in a forced GC run that happens during interpreter teardown, and the order of destruction within unreachable cycles is undefined. This could theoretically mean the registry gets destroyed before some other teardown logic that relies on cross-framework interop, but this is probably being overly pedantic. I'm just highlighting it for completeness, since we were talking about the exact timing during finalization and whether the destruction happens before or after teardown logic, and this case alters the semantics slightly.
What I meant here is that at the

I'll update the PR to implement the refcounted solution:
Am I missing anything? I will also rebase since there are some conflicts now.
Thank you! That sounds good to me.
Thanks for digging into that question. I agree that we don't need to worry about the cyclic case as a separate case.
I actually think this is fine. The registry won't be destroyed until all frameworks are destroyed. A framework can't be destroyed until all its bindings are destroyed. A binding can't be destroyed until the Python type object it binds is destroyed. Thus, by the time the registry is destroyed, there are no types left that could be used cross-framework, so losing that capability won't impact functionality.
At Py_AtExit, there is no longer an interpreter state dict, so how do we find the registry in order to find the frameworks that are still active in it? We would need a global cache of the registry pointer, which is difficult with a header-only pure-C library.
Ah, now I know what you meant about global state across DSO boundaries. Yeah, this makes sense, thanks for clarifying!

Btw, this makes me wonder... have you considered proposing this for CPython itself (maybe not now, but eventually)? The interoperability problem seems fundamental enough that it might benefit from being part of the core runtime. The fact that every major binding framework (pybind11, nanobind, Cython, etc.) essentially has to solve the same problem and would need to include the header (while dealing with potentially different versions of it) suggests this could be a natural fit for standardization at the language level.

I am asking because having this as part of CPython could provide some compelling advantages. It would give the interoperability protocol official backing, which might accelerate adoption; frameworks wouldn't need to vendor the header, and we could put the library ABI behind CPython's ABI guarantees. More importantly, it could make cross-framework interop a first-class feature of Python rather than an opt-in library concern. The lifecycle would also be easier to manage, since there would then be a true global (the one in the interpreter) and we could be more in control of when things happen. We could simplify the implementation and avoid having to use the interpreter as a proxy for global state.

That said, I realize the header-only approach gives you much more flexibility for iteration and getting all the frameworks on board. You can experiment with the API, gather real-world usage data, and refine the design without being constrained by CPython's release cycle. Once the design proves itself and reaches maturity, it might make for a strong PEP proposal. I also understand if you don't want to deal with the PEP cycle, but I am happy to help if you wish. The process can be time-consuming and politically complex, but having a working implementation and backing from some framework maintainers would put this proposal in a very strong position. In any case, no pressure! Just wanted to throw out the idea in case you find it interesting :)
I absolutely agree with you about the coordination problems involved in doing this outside of CPython. I would love for CPython to eventually subsume the functionality of pymetabind, but I personally don't have the capacity to manage that process in anything like the near future. And it would probably be much more compelling as a PEP if there is a strong history of implementation experience/adoption, which is only available if we start outside of CPython. (I also have some doubt that framework authors would be interested in spending a lot of effort and/or review bandwidth now on something that will only benefit their users in several years.)

Unfortunately, if a PEP is in fact later written and accepted, and assuming the PEP process doesn't just sign off on precisely the current ABI (which seems like a bit too much to hope for), the delayed upstreaming means that this interoperability mechanism is likely to see a non-interoperable ABI break just when it's hitting its stride. The pain of that can be reduced somewhat by conditional compilation (in pymetabind.h or in its users) that uses the future built-in mechanism on new Python versions where it exists vs the current implementation on older versions of Python, although that wouldn't help stable-ABI extension modules. It may also be possible to present a current-pymetabind-ABI-compatible view on the native registry using some sort of shim layer that isn't itself built into CPython. After five years, when all supported Pythons have the native registry, frameworks could move from using the shim layer for their stable-ABI builds to using the native feature.

It would be great if we could somehow get a preview of the changes to the current ABI that would probably be needed as part of the PEP process, so we can make them before the ABI is really set in stone. But I'm not sure how to do that without doing the PEP now, and I don't have the sense that a PEP before widespread use would be accepted. You're much closer to that process than I am, though; maybe I'm misreading the situation?
I think your concerns about the PEP process are understandable and I don't want to push for this if you don't feel comfortable with it, but allow me to offer my view on the matter.

Adoption is actually not a problem, because as you said you can offer the library separately for older versions and people can conditionally include it if the version is old enough. Obviously the inner workings will be different, so there is some challenge in offering the same API, but it is still an interesting idea. The key is that frameworks adopting this will just have to add the fallback for old versions and eventually remove it. These kinds of backports are regularly done and we have many examples that keep updating to new upstream features.

The ABI compatibility concerns you mention are exactly why we should do it now, not later. The "non-interoperable ABI break" you're worried about is more likely if we wait, but if we standardize early, we can design the CPython integration to be ABI-compatible from day one. Waiting until there's widespread adoption would make that compatibility nightmare much worse.

I know the process can be a bit heavy and annoying, but honestly, the adoption threshold for acceptance isn't as high as you might think. In this case what matters more is having the right technical design and buy-in from the key consumers. For this specific problem, those are the major binding framework maintainers (pybind11, nanobind, PyO3), which you need to convince anyway for adoption. The PEP process essentially formalizes the consensus-building you're already doing.

On the cost of the process, there isn't much I can say other than that I'm happy to help drive the PEP process if you don't have the bandwidth. The technical design work you're doing now would translate directly into the PEP content, and having a working implementation puts us in a much stronger position than most PEPs start with.

My read is that this is actually the optimal moment. You have a clear technical solution to a well-understood problem, demonstrated implementation experience, interest from framework maintainers, and you are in the process of putting this out to the community already. Additionally, I can ask other core devs what they think, to test the waters and surface any early concerns before doing any investment. We could even start by getting informal feedback from the framework maintainers on whether they'd support a PEP process in parallel with the current library development. If there's enthusiasm from the framework side, that significantly de-risks the standardization path.

In any case I don't want to throw extra work on you, and I know that even answering this PR can be a nontrivial time investment, so please feel free to decline. I just wanted to highlight the possibility, as there is an interesting opportunity here and I didn't want to let it pass unnoticed.
Thanks for that perspective on the merits of going for it now. It sounds like the standardization route is probably less painful and more likely to pay off than I was thinking, and I will reevaluate my resistance to it. :-)
I would definitely appreciate that. If there's a Discourse thread or something that I can follow, please let me know; I'm happy to answer questions that come up. Do you have a (very rough, I imagine) estimate for how long a PEP process for this shape of feature would probably take? (Like start-to-finish calendar time, as opposed to actively-spent engineering time.)
I am glad it helped :)
Next week is the core dev sprint in Cambridge and a lot of us will be there, so it is a good opportunity for gathering some early feedback. With your permission, I can either ask some people in the C API working group, or alternatively I can host a short team-wide discussion to check what people think. Whatever you think is best.
It very much depends on how the discussion goes, but given that this will be very specialized (as opposed to new syntax, a new library, etc.) I would say around 1-3 months, maybe less if everyone agrees and we just need to deal with small nits. There are some strategies to shorten it by getting pre-alignment with the framework maintainers to minimize round-tripping. The major risk is that the discussion enters a loop or new people keep appearing, but I would say that risk is low in this case. In any case, this is just my very rough estimate, as I have seen basically the whole spectrum. I would certainly recommend getting early feedback from maintainers and core devs before putting the PEP out, for sure.
I'm fine with whatever you think best there! Many thanks.
oremanj left a comment
Thanks! This looks pretty good, just a couple potential edge-case issues I noticed.
On C/Clang (not C++), the current `inline` default yields external-inline semantics that do not emit a strong definition, which can lead to runtime errors like `dyld: symbol not found: _pymb_get_registry`. Switch the header-only default to `static inline` for **C** builds while keeping `inline` for **C++**.
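Roughly the shape of the dispatch this implies; the macro name `PYMB_FUNC` below is made up for illustration and may not match what pymetabind.h actually uses:

```c
#if defined(__cplusplus)
/* C++: an `inline` function defined in a header gets a weak/COMDAT definition
 * in every translation unit, so the symbol always resolves at load time. */
#  define PYMB_FUNC inline
#else
/* C: plain `inline` is not guaranteed to emit an out-of-line definition in
 * any translation unit, which can leave symbols like _pymb_get_registry
 * unresolved at load time. `static inline` gives each TU its own
 * internal-linkage definition instead. */
#  define PYMB_FUNC static inline
#endif

/* Toy header-only function using the macro, just to show the pattern. */
PYMB_FUNC int pymb_example_add(int a, int b) { return a + b; }
```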
pymb_get_registry() creates a heap-allocated registry and stores it in a capsule with no destructor, so the registry itself is leaked at interpreter shutdown. Give the capsule a destructor that frees the registry. No Python C API is called from the destructor except PyCapsule_GetPointer, and any error from that is cleared.
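A minimal sketch of what such a destructor looks like, assuming the registry is a plain malloc'd struct; the capsule name `"pymetabind_registry"` is a placeholder, not necessarily the name the header uses:

```c
#include <Python.h>
#include <stdlib.h>

struct pymb_registry;  /* opaque here; the real definition lives in pymetabind.h */

/* Frees the registry when the last reference to the capsule is dropped. */
static void registry_capsule_destructor(PyObject *capsule) {
    void *registry = PyCapsule_GetPointer(capsule, "pymetabind_registry");
    if (!registry) {
        /* PyCapsule_GetPointer sets an exception on failure; a capsule
         * destructor must not leave an error pending, so clear it. */
        PyErr_Clear();
        return;
    }
    free(registry);
}

/* Creation side: pass the destructor instead of NULL so shutdown frees it. */
static PyObject *make_registry_capsule(struct pymb_registry *registry) {
    return PyCapsule_New(registry, "pymetabind_registry",
                         registry_capsule_destructor);
}
```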
In `pymb_add_framework`, fast-path pointer equality before calling strcmp while interning `abi_extra`. This is a tiny win in the common case where multiple frameworks share the already interned pointer.
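For reference, the fast path is roughly this shape (identifiers below are illustrative, not the actual pymetabind code):

```c
#include <string.h>

/* Returns nonzero if two abi_extra strings match, trying pointer equality
 * before falling back to strcmp. NULL means "no abi_extra". */
static int abi_extra_matches(const char *interned, const char *candidate) {
    if (interned == candidate)       /* same interned pointer (or both NULL) */
        return 1;
    if (!interned || !candidate)     /* only one is NULL */
        return 0;
    return strcmp(interned, candidate) == 0;
}
```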
If a framework allocates binding/framework structs with malloc (not zeroed), the hook may contain garbage. Ensure the hook is NULLed in `pymb_add_framework`/`pymb_add_binding` before we touch list pointers, to prevent crashes in `pymb_list_unlink()`.
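A sketch of the defensive zeroing, with made-up type and field names standing in for pymetabind's intrusive list:

```c
#include <string.h>

/* Stand-ins for pymetabind's intrusive list node and framework struct. */
struct list_hook { struct list_hook *prev, *next; };
struct framework { struct list_hook hook; /* ... other fields ... */ };

void add_framework_example(struct framework *fw) {
    /* If the caller obtained `fw` from malloc (not calloc), `hook` holds
     * whatever garbage was already in that memory. Zero it before any list
     * manipulation so unlinking a node that never got fully linked does not
     * chase wild prev/next pointers. */
    memset(&fw->hook, 0, sizeof fw->hook);
    /* ... link fw->hook into the registry's framework list ... */
}
```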
Rebased on top of the latest main.
Implement a reference-counted solution to fix the registry memory leak while ensuring proper teardown order. The registry capsule is now ref-counted by frameworks, preventing premature destruction during finalization.
@pablogsal What was the outcome of the discussion at the core sprints?
This PR fixes a bug I noticed while reviewing the code the first time, but never had time to confirm back then: the registry capsule was created without a destructor, so the `pymb_registry` allocation leaked at interpreter shutdown. I also added a couple of small defensive tweaks and other improvements I spotted in that first pass (using `static inline` in C to avoid linkage surprises, zeroing list hooks before first use, and a pointer-equality fast path when interning `abi_extra`). The only real issue is the leak; please feel free to just take whatever bits you find useful (read the commits for explanations).

I confirmed the leak fix with valgrind using a small extension:
Before
After