
Conversation

@pablogsal
Contributor

@pablogsal pablogsal commented Sep 1, 2025

This PR fixes a bug I noticed while reviewing the code the first time, but never had time to confirm back then: the registry capsule was created without a destructor, so the pymb_registry allocation leaked at interpreter shutdown. I also added a couple of small defensive tweaks and other improvements I spotted in that first pass (using static inline in C to avoid linkage surprises, zeroing list hooks before first use, and a pointer-equality fast path when interning abi_extra). The only real issue is the leak — please feel free to just take whatever bits you find useful (read the commits for explanations).

I confirmed the leak fix with valgrind using a small extension:

Before

PYTHONMALLOC=malloc valgrind --leak-check=full --show-leak-kinds=definite     python3 repro.py
==5100== Memcheck, a memory error detector
==5100== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==5100== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==5100== Command: python3 repro.py
==5100==
registry ptr: 0x4f92610
dropped capsule; exiting process (check valgrind for leak)
==5100==
==5100== HEAP SUMMARY:
==5100==     in use at exit: 41,468 bytes in 5 blocks
==5100==   total heap usage: 20,792 allocs, 20,787 frees, 2,790,954 bytes allocated
==5100==
==5100== 40,000 bytes in 1 blocks are definitely lost in loss record 5 of 5
==5100==    at 0x4889F94: calloc (vg_replace_malloc.c:1328)
==5100==    by 0x5000D97: pymb_get_registry (pymetabind.h:559)
==5100==    by 0x5000D97: touch (leaky.c:12)
==5100==    by 0x49C7FB: ??? (in /usr/bin/python3.11)
==5100==    by 0x4BB053: PyObject_Vectorcall (in /usr/bin/python3.11)
==5100==    by 0x4AA3EF: _PyEval_EvalFrameDefault (in /usr/bin/python3.11)
==5100==    by 0x4A0C7F: PyEval_EvalCode (in /usr/bin/python3.11)
==5100==    by 0x5F9757: ??? (in /usr/bin/python3.11)
==5100==    by 0x5F644F: ??? (in /usr/bin/python3.11)
==5100==    by 0x606E6F: ??? (in /usr/bin/python3.11)
==5100==    by 0x606A17: _PyRun_SimpleFileObject (in /usr/bin/python3.11)
==5100==    by 0x60677F: _PyRun_AnyFileObject (in /usr/bin/python3.11)
==5100==    by 0x604A0B: Py_RunMain (in /usr/bin/python3.11)
==5100==
==5100== LEAK SUMMARY:
==5100==    definitely lost: 40,000 bytes in 1 blocks
==5100==    indirectly lost: 0 bytes in 0 blocks
==5100==      possibly lost: 0 bytes in 0 blocks
==5100==    still reachable: 1,468 bytes in 4 blocks
==5100==         suppressed: 0 bytes in 0 blocks
==5100== Reachable blocks (those to which a pointer was found) are not shown.
==5100== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5100==
==5100== For lists of detected and suppressed errors, rerun with: -s
==5100== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

After

root@8ab8d8f42755:/src# PYTHONMALLOC=malloc valgrind --leak-check=full --show-leak-kinds=definite     python3 repro.py
==5118== Memcheck, a memory error detector
==5118== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==5118== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==5118== Command: python3 repro.py
==5118==
registry ptr: 0x4f92690
dropped capsule; exiting process (check valgrind for leak)
==5118==
==5118== HEAP SUMMARY:
==5118==     in use at exit: 1,468 bytes in 4 blocks
==5118==   total heap usage: 20,793 allocs, 20,789 frees, 2,751,048 bytes allocated
==5118==
==5118== LEAK SUMMARY:
==5118==    definitely lost: 0 bytes in 0 blocks
==5118==    indirectly lost: 0 bytes in 0 blocks
==5118==      possibly lost: 0 bytes in 0 blocks
==5118==    still reachable: 1,468 bytes in 4 blocks
==5118==         suppressed: 0 bytes in 0 blocks
==5118== Reachable blocks (those to which a pointer was found) are not shown.
==5118== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5118==
==5118== For lists of detected and suppressed errors, rerun with: -s
==5118== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@pablogsal pablogsal force-pushed the fixes branch 3 times, most recently from 83d673f to ac0b4fb Compare September 1, 2025 14:50
@oremanj
Collaborator

oremanj commented Sep 8, 2025

Thank you for the PR! Sorry for the delay in reviewing; I've been on vacation for most of this past week.

I'm a bit concerned about teardown order here. Are there any official guarantees about the point in interpreter finalization where the interpreter state dict is cleared, especially relative to the point where module globals are cleared / the last GC run is performed? If not, then we might delete the registry while some bound types still exist, and the future destruction of those types could cause a use-after-free when their frameworks try to unregister them with pymetabind. I'm all for fixing the leak, but I think it might require some deeper changes.

If it is actually guaranteed that the interpreter state dict isn't cleared until very late in finalization, then I wonder if e.g. nanobind would want to do its leak check at that point, rather than in a Py_AtExit handler. (cc @wjakob) If not, I guess we could use Py_AtExit for the pymetabind destructor too.

@oremanj
Collaborator

oremanj commented Sep 8, 2025

Thinking some more:

  • We want to destroy the registry as late as possible, because arbitrary Python code can run at finalization time, and we don't want to break any user teardown logic that happens to rely on cross-framework interop.
  • There's no valid workaround where we destroy the registry but leave cross-framework interop working, because without the registry, framework A has no way to learn that a pymb_binding provided by framework B was destroyed. It's therefore unsafe for any framework to try to make use of foreign bindings after the registry has been destroyed.

So I think we need at least one of:

  • Destroy the registry in Py_AtExit, rather than in a capsule destructor.
  • Have each framework hold a strong reference to the registry capsule, thus preventing the registry from being destroyed until all frameworks have been destroyed. If any frameworks leak themselves, the registry will too. This would require adding a pymb_remove_framework function.

I'm leaning towards the reference-counted solution, since we can't control the order in which our atexit handler runs relative to our frameworks'. But curious for your thoughts.

@pablogsal
Contributor Author

pablogsal commented Sep 8, 2025

I'm a bit concerned about teardown order here. Are there any official guarantees about the point in interpreter finalization where the interpreter state dict is cleared, especially relative to the point where module globals are cleared / the last GC run is performed?

This is indeed an excellent point, and you have very good reasons to be worried, since finalization is a particularly nightmarish area.

Unfortunately there are no official guarantees, and the order changes slightly between versions (in particular, pre-3.8 was so bad that there were substantial situations where the interpreter either leaked or broke some of the GC guarantees). Petr, Victor, others and I have worked hard to ensure that at least we don't leak, but the area is so complicated and tricky (especially with daemon threads) that we are not ready to document the order or offer guarantees, in case we need to change it. But the fact that it's not official doesn't mean this isn't a problem frameworks need to solve, so here is a summary of the order as I understand it currently (3.14; older versions may be missing steps or have slight reordering, and very old versions do several GC passes in a desperate attempt not to leak, but we have fixed that):

  1. Check if runtime is initialized, get final thread state
  2. Wait for threading module shutdown of non-daemon threads
  3. Execute Python atexit functions (registered via atexit.register())
  4. Process any remaining pending calls
  5. Mark interpreter and runtime as finalizing
  6. Stop all threads globally, remove non-current thread states
  7. Set runtime.initialized = 0, preventing new Python operations
  8. First flush of sys.stdout and sys.stderr
  9. Disable signal handling system
  10. Initial garbage collection while interpreter is intact
  11. Disable JIT, stop builtin dict watching
  12. Delete special sys attributes (path, argv, last_exc, etc.)
  13. Clear all sys.modules entries to None, create weakref tracking
  14. Restore original builtins dict, intermediate garbage collection
  15. Clear remaining live module dictionaries in reverse import order
  16. Final sys dict and builtins dict clearing
  17. Import system cleanup (indices, modules directory)
  18. Warn about and attempt to clean up remaining subinterpreters
    -- Only main interpreter should remain at this point --
  19. Finalize evaluation state, second stdout/stderr flush
  20. Disable tracemalloc (after all Python objects destroyed)
  21. Import system core finalization, fault handler cleanup
  22. Hash system statistics and cleanup
  23. Cross-interpreter data, exception types, generic types cleanup
  24. Clear interpreter state, audit hooks (main interpreter only) <--- interpreter dict cleared here
  25. Main interpreter specific cleanup: hash randomization, arg parsing, filesystem encoding
  26. Type system finalization
  27. Delayed memory freeing, mimalloc heap cleanup (if mimalloc active)
  28. Print final reference counts (if Py_REF_DEBUG enabled)
  29. Misc pymalloc cleanup stuff
  30. Execute C-level atexit functions (registered via Py_AtExit()) <---------------------
  31. Final fflush(stdout/stderr), complete runtime finalization
  32. End the party and go home

If it is actually guaranteed that the interpreter state dict isn't cleared until very late in finalization, then I wonder if e.g. nanobind would want to do its leak check at that point, rather than in a Py_AtExit handler. (cc @wjakob) If not, I guess we could use Py_AtExit for the pymetabind destructor too.

I think this is a good idea in principle (see below), since Py_AtExit uses a different system than the normal atexit functions registered via the atexit module; it is the last thing to be called before the runtime state is destroyed. On the CPython side the only drawback of this approach is that registering the hook can fail, since there is a limit on the number of handlers (IIRC it's 32). It's very unlikely this will happen, and it's reasonable to leak in that case (maybe printing a warning).

The bigger problem is syncing this call with all frameworks so that we are called before (or after, depending on the frameworks' ownership and registration model) they clean things up. For example, nanobind already registers a Py_AtExit handler:

https://github.com/wjakob/nanobind/blob/30bbda1acb943d5a7f3b5d079da7eaf60d2a469c/src/nb_internals.cpp#L468

so some synchronization is needed here, because we probably cannot control whether we register before or after nanobind (or any other framework that uses it). Enforcing this across all frameworks could be a bit tricky.

We want to destroy the registry as late as possible, because arbitrary Python code can run at finalization time, and we don't want to break any user teardown logic that happens to rely on cross-framework interop.

If I understand you correctly, that means we want to destroy the registry before any framework destroys itself, since otherwise the cleanup code may hit UB, no?

I'm leaning towards the reference-counted solution, since we can't control the order in which our atexit handler runs relative to our frameworks'. But curious for your thoughts.

The refcount version can be slightly tricky because we may lose control over exactly when this happens (if, for whatever reason, we end up in a reference cycle, for example). Of course we can add infrastructure to ensure this cannot happen (or is very unlikely to happen), but in my experience refcount-bounded cleanup in bindings always comes back for revenge.

Another alternative to consider (not sure if it's correct): since the plan is to integrate this into frameworks anyway, we could add another API that each framework must call to unregister itself, and when no frameworks are left we destroy our state. The advantage of this approach is that we don't need to make assumptions about how frameworks clean themselves up, and we can reason on a per-framework basis. The only challenge is ensuring the semantics are sound. We could also register a Py_AtExit call that warns, the same way nanobind does, if a framework doesn't unregister itself.

What do you think?

@oremanj
Collaborator

oremanj commented Sep 8, 2025

Thank you for the finalization details, that's very helpful to have as a reference when thinking about this!

If I understand you correctly that means we want to destroy the registry before any framework destroys itself or otherwise the cleanup code may be in UB no?

No, I meant after. When a framework destroys itself, it's required to unlink itself from the registry's list of frameworks, per the comments on struct pymb_framework. It's also only allowed to do so if it has no bindings left, since any still-alive bindings could be cached in other frameworks and they could try to make calls into the deleted framework object in order to work with those bindings. As long as the documented constraints are followed, there's no problem with a framework being destroyed before the registry is.

OTOH, if the registry is destroyed while some frameworks are still alive, we would need a way for those frameworks to learn that the registry is being destroyed, which doesn't currently exist. The only safe way for them to respond to this destruction would be to disable all use of foreign bindings: even if some bindings are still alive, with the registry gone they have no way to tell how long that will remain true. While it's possible to implement this, it's extra infrastructure that I don't think we need.

The refcount version can be slighly tricky because we may lose control over exactly how this happens (if for whatever reason we go into a cycle for example). Of course we can put infrastructure to ensure this cannot happen (or is very unlikely to happen) but in my experience refcount-bounded cleanup in bindings always comes back for revenge.

To clarify, by "refcounted" I mean the following:

  • pymb_registry contains a borrowed back-reference to the capsule that points to it.
  • pymb_add_framework increfs the capsule (which it locates using said borrowed reference).
  • We add a pymb_remove_framework which decrefs the capsule, and require that to be used rather than the existing unlink-and-free method.
  • The capsule destructor frees the registry, as in this PR.

In the expected case, there is one reference to the capsule from the interpreter state dict and one from each registered framework, which means the registry is deallocated only once the interpreter state dict has been cleared and every registered framework has destroyed itself. In the unexpected case, where someone goes snooping in the interpreter state dict and holds their own reference to the capsule, we still expect that reference to be dropped by finalization activities that occur before the interpreter state dict is cleared. Maybe I'm missing something, but I don't see how a cycle can occur here, since the capsule destructor doesn't need to access any Python objects and thus doesn't hold any references to Python objects.
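To make the lifetime argument concrete, here is a minimal pure-C sketch of the counting scheme described above. A plain integer stands in for the capsule's Python reference count, and the names (`toy_registry`, `toy_add_framework`, `toy_remove_framework`, `toy_capsule_decref`) are illustrative stand-ins, not the pymetabind API:

```c
#include <assert.h>
#include <stdlib.h>

/* A plain integer models the capsule's Python refcount; destroyed_flag
 * lets a caller observe when the "capsule destructor" has run. */
typedef struct toy_registry {
    int capsule_refs;    /* 1 for the interpreter state dict's reference */
    int frameworks;      /* currently registered frameworks */
    int *destroyed_flag; /* set to 1 when the destructor fires */
} toy_registry;

static toy_registry *toy_registry_new(int *destroyed_flag) {
    toy_registry *r = (toy_registry *)calloc(1, sizeof *r);
    r->capsule_refs = 1;            /* held by the interpreter state dict */
    r->destroyed_flag = destroyed_flag;
    return r;
}

/* Models Py_DECREF on the capsule: the destructor fires at zero. */
static void toy_capsule_decref(toy_registry *r) {
    if (--r->capsule_refs == 0) {
        *r->destroyed_flag = 1;
        free(r);                    /* capsule destructor frees the registry */
    }
}

/* Models pymb_add_framework: the framework keeps the capsule alive. */
static void toy_add_framework(toy_registry *r) {
    r->frameworks++;
    r->capsule_refs++;
}

/* Models the proposed pymb_remove_framework: drops the framework's ref. */
static void toy_remove_framework(toy_registry *r) {
    r->frameworks--;
    toy_capsule_decref(r);
}
```

In this model the registry outlives both the state dict entry and every framework, whichever goes away last, which is exactly the ordering property the refcounted design is after.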

Another alternative to consider (not sure if is correct) is that since the plan is to integrate this into frameworks anyway we can have another API that the framework must call that does the unregistered of the framework and when no more frameworks are left then we destroy our state. The advantage of this approach is that we don't need to make assumptions on how frameworks cleanup themselves and we can reason in a per-framework basis. The only challenge is to ensure the semantics are sound.

I think this is the same as the refcounted solution except that we maintain the use count privately instead of using the capsule object's refcount as a use count. I'm happy to go this route if there's some situation where it would work better, but as explained above I think the capsule object's refcount works fine for our purposes, and using it is easier than managing our own counter IMO.

We could also maybe register a Py_AtExit call that warns the same as nanobind does if a framework doesn't unregister itself.

I think it would actually be difficult to tell whether the registry was destroyed properly by the time Py_AtExit handlers are called. Globals are not really global across DSO boundaries, depending on how the DSO is linked, and C doesn't have an equivalent to C++'s inline variables. I'm inclined to skip the atexit warning since it's difficult; if the pymb registry is being leaked, it's probably because some framework forgot to unregister itself.

@pablogsal
Contributor Author

OTOH, if the registry is destroyed while some frameworks are still alive, we would need a way for those frameworks to learn that the registry is being destroyed, which doesn't currently exist. The only safe way for them to respond to this destruction would be to disable all use of foreign bindings: even if some bindings are still alive, with the registry gone they have no way to tell how long that will remain true. While it's possible to implement this, it's extra infrastructure that I don't think we need.

Agreed, that would be unnecessarily complex.

To clarify, by "refcounted" I mean the following: pymb_registry contains a borrowed back-reference to the capsule that points to it. pymb_add_framework increfs the capsule... We add a pymb_remove_framework which decrefs the capsule, and require that to be used rather than the existing unlink-and-free method.

Perfect, this makes much more sense now. I was missing the pymb_remove_framework API part. Using the capsule's refcount as the use count is indeed cleaner than managing our own counter.

Maybe I'm missing something, but I don't see how a cycle can occur here, since the capsule destructor doesn't need to access any Python objects and thus doesn't hold any references to Python objects.

You're right that direct cycles are unlikely. The concern I was raising is more about GC ordering during finalization, since we were discussing how clearing this relates to other teardown: if the interpreter state dict ends up in cyclic references, those get cleared in a forced GC run during interpreter teardown, and the order of destruction within unreachable cycles is undefined. This could theoretically mean the registry gets destroyed before some other teardown logic that relies on cross-framework interop, but this is probably overly pedantic. I'm just highlighting it for completeness, since we were talking about exactly when during finalization destruction happens relative to teardown logic, and this case alters the semantics slightly.

We could also maybe register a Py_AtExit call that warns the same as nanobind does if a framework doesn't unregister itself.

What I meant here is that at the Py_AtExit stage we could check who has called pymb_remove_framework and who hasn't if we choose to delay destruction until then, and warn about frameworks that forgot to unregister themselves. Just pointing it out as a potential nice-to-have debugging aid, but unclear if it's worth the complexity.


I'll update the PR to implement the refcounted solution:

  1. Add a pymb_remove_framework() function that frameworks must call during their cleanup
  2. Have pymb_add_framework() incref the registry capsule
  3. Have pymb_remove_framework() decref the registry capsule
  4. Keep the capsule destructor to free the registry when the last reference is dropped

Am I missing anything? I will also rebase since there are some conflicts now.

@oremanj
Collaborator

oremanj commented Sep 8, 2025

I'll update the PR to implement the refcounted solution: [...] missing anything?

Thank you! That sounds good to me.

The concern I was raising is more about GC ordering during finalization since we were discussing the ordering of clearing this and other teardown: if the interpreter state dict ends up in cyclic references, these get cleared in a forced GC run that happens during interpreter teardown, and the order of destruction within unreachable cycles is undefined.

Thanks for digging into that question. I agree that we don't need to worry about the cyclic case as a separate case.

This could theoretically mean the registry gets destroyed before some other teardown logic that relies on cross-framework interop

I actually think this is fine. The registry won't be destroyed until all frameworks are destroyed. A framework can't be destroyed until all its bindings are destroyed. A binding can't be destroyed until the Python type object it binds is destroyed. Thus, by the time the registry is destroyed, there are no types left that could be used cross-framework, so losing that capability won't impact functionality.

What I meant here is that at the Py_AtExit stage we could check who has called pymb_remove_framework and who hasn't if we choose to delay destruction until then, and warn about frameworks that forgot to unregister themselves.

At Py_AtExit, there is no longer an interpreter state dict, so how do we find the registry in order to find the frameworks that are still active in it? We would need a global cache of the registry pointer, which is difficult with a header-only pure-C library.

@pablogsal
Contributor Author

pablogsal commented Sep 8, 2025

At Py_AtExit, there is no longer an interpreter state dict, so how do we find the registry in order to find the frameworks that are still active in it? We would need a global cache of the registry pointer, which is difficult with a header-only pure-C library.

Ah, now I see what you meant about global state across DSO boundaries. Yeah, this makes sense; thanks for clarifying!


Btw, this makes me wonder... have you considered proposing this for CPython itself (maybe not now, but eventually)? The interoperability problem seems fundamental enough that it might benefit from being part of the core runtime. The fact that every major binding framework (pybind11, nanobind, Cython, etc.) essentially has to solve the same problem and would need to include the header (while dealing with potentially different versions of it) suggests this could be a natural fit for standardization at the language level.

I am asking because having this as part of CPython could provide some compelling advantages. It would give the interoperability protocol official backing, which might accelerate adoption; frameworks wouldn't need to vendor the header; and we could put the library ABI behind CPython's ABI guarantees. More importantly, it could make cross-framework interop a first-class feature of Python rather than an opt-in library concern. The lifecycle would also be easier to manage, since there would be a true global (the one in the interpreter) and we would have more control over when things happen, simplifying the implementation and avoiding the use of the interpreter state as a proxy for global state.

That said, I realize the header-only approach gives you much more flexibility for iteration and getting all the frameworks on board. You can experiment with the API, gather real-world usage data, and refine the design without being constrained by CPython's release cycle. Once the design proves itself and reaches maturity, it might make for a strong PEP proposal. I also understand if you don't want to deal with the PEP cycle, but I am happy to help if you wish. The process can be time-consuming and politically complex but having a working implementation and backing from some framework maintainers would put this proposal in a very strong position.

In any case no pressure! Just want to throw the idea if you find it interesting :)

@oremanj
Collaborator

oremanj commented Sep 8, 2025

I absolutely agree with you about the coordination problems involved in doing this outside of CPython. I would love for CPython to eventually subsume the functionality of pymetabind, but I personally don't have the capacity to manage that process in anything like the near future. And it would probably be much more compelling as a PEP if there is a strong history of implementation experience/adoption, which is only available if we start outside of CPython. (I also have some doubt that framework authors would be interested in spending a lot of effort and/or review bandwidth now on something that will only benefit their users in several years.)

Unfortunately, if a PEP is in fact later written and accepted, and assuming the PEP process doesn't just sign off on precisely the current ABI (which seems like a bit too much to hope for), the delayed upstreaming means that this interoperability mechanism is likely to see a non-interoperable ABI break just when it's hitting its stride. The pain of that can be reduced somewhat by conditional compilation (in pymetabind.h or in its users) that uses the future built-in mechanism on new Python versions where it exists vs the current implementation on older versions of Python, although that wouldn't help stable-ABI extension modules. It may also be possible to present a current-pymetabind-ABI-compatible view on the native registry using some sort of shim layer that isn't itself built into CPython. After five years, when all supported Pythons have the native registry, frameworks could move from using the shim layer for their stable-ABI builds to using the native feature.

It would be great if we could somehow get a preview of the changes to the current ABI that would probably be needed as part of the PEP process, so we can make them before the ABI is really set in stone. But I'm not sure how to do that without doing the PEP now, and I don't have the sense that a PEP before widespread use would be accepted. You're much closer to that process than I am, though; maybe I'm misreading the situation?

@pablogsal
Contributor Author

pablogsal commented Sep 8, 2025

I think your concerns about the PEP process are understandable, and I don't want to push for this if you don't feel comfortable with it, but allow me to offer my view on the matter.

Adoption is actually not a problem because, as you said, you can offer the library separately for older versions and people can conditionally include it if the version is old enough. Obviously the inner workings will be different, so there is some challenge in offering the same API, but it's still an interesting idea. The key is that the frameworks adopting this just have to add the fallback for old versions and eventually remove it. This kind of backport is done regularly and we have many examples that track new upstream features. The mock library maintained by Michael Foord is a perfect example (obviously that's Python code, which is fundamentally different, but the point still stands).

The ABI compatibility concerns you mention are exactly why we should do it now, not later. The “non-interoperable ABI break” you’re worried about is more likely if we wait, but if we standardize early, we can design the CPython integration to be ABI-compatible from day one. Waiting until there’s widespread adoption would make that compatibility nightmare much worse.

I know the process can be a bit heavy and annoying but honestly, the adoption threshold for acceptance isn’t as high as you might think. In this case what matters more is having the right technical design and buy-in from the key consumers. For this specific problem, those are the major binding framework maintainers (pybind11, nanobind, PyO3) which you need to convince anyway for adoption. The PEP process essentially formalizes the consensus-building you’re already doing.

On the cost of the process, there isn't much I can say other than that I'm happy to help drive the PEP process if you don't have the bandwidth. The technical design work you're doing now would translate directly into the PEP content, and having a working implementation puts us in a much stronger position than most PEPs start with.

My read is that this is actually the optimal moment. You have a clear technical solution to a well-understood problem, demonstrated implementation experience, interest from framework maintainers, and you are already in the process of putting this out to the community.

Additionally, I can ask other core devs what they think, to test the waters and surface any early concerns before making any investment.

We could even start by getting informal feedback from the framework maintainers on whether they’d support a PEP process in parallel with the current library development. If there’s enthusiasm from the framework side, that significantly de-risks the standardization path.

In any case, I don't want to throw extra work on you, and I know that even answering this PR can be a nontrivial time investment, so please feel free to decline. I just wanted to highlight the possibility, as there is an interesting opportunity here and I didn't want to let it pass unnoticed.

@oremanj
Collaborator

oremanj commented Sep 8, 2025

Thanks for that perspective on the merits of going for it now. It sounds like the standardization route is probably less painful and more likely to pay off than I was thinking, and I will reevaluate my resistance to it. :-)

Additionally, I can ask other core devs what they think, to test the waters and surface any early concerns before making any investment.

I would definitely appreciate that. If there's a Discourse thread or something that I can follow, please let me know; I'm happy to answer questions that come up.

Do you have a (very rough I imagine) estimate for how long a PEP process for this shape of feature would probably take? (Like, start-to-finish calendar time, as opposed to actively-spent engineering time.)

@pablogsal
Contributor Author

pablogsal commented Sep 8, 2025

Thanks for that perspective on the merits of going for it now. It sounds like the standardization route is probably less painful and more likely to pay off than I was thinking, and I will reevaluate my resistance to it. :-)

I am glad it helped :)

I would definitely appreciate that. If there's a Discourse thread or something that I can follow, please let me know; I'm happy to answer questions that come up.

Next week is the core dev sprint in Cambridge and a lot of us will be there, so it's a good opportunity to gather some early feedback. With your permission, I can either ask some people in the C API working group or host a short team-wide discussion to see what people think. Whatever you think is best.

Do you have a (very rough I imagine) estimate for how long a PEP process for this shape of feature would probably take? (Like, start-to-finish calendar time, as opposed to actively-spent engineering time.)

It very much depends on how the discussion goes, but given that this will be very specialized (as opposed to new syntax, a new library, etc.) I would say around 1 to 3 months, maybe less if everyone agrees and we just need to deal with small nits. There are strategies to shorten it, such as getting pre-alignment with the framework maintainers to minimize round-tripping. The major risk is that the discussion enters a loop or new people keep appearing, but I would say that's low risk in this case.

In any case, this is just a very rough estimate, as I have seen basically the whole spectrum. I would certainly recommend getting early feedback from maintainers and core devs before putting the PEP out.

@oremanj
Collaborator

oremanj commented Sep 8, 2025

With your permission, I can either ask some people in the C API working group or host a short team-wide discussion to see what people think. Whatever you think is best.

I'm fine with whatever you think best there! Many thanks.

Collaborator

@oremanj oremanj left a comment


Thanks! This looks pretty good, just a couple potential edge-case issues I noticed.

pablogsal and others added 5 commits September 10, 2025 01:53
In C (e.g. with Clang, but not C++) the current `inline` default yields
external-inline semantics that do not emit a strong out-of-line definition,
which can lead to load-time errors like:

  dyld: symbol not found: _pymb_get_registry

Switch the header-only default to `static inline` for **C** builds while
keeping `inline` for **C++**.
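The commit's language guard can be sketched as follows; `PYMB_FUNC` is a hypothetical macro name used here for illustration, not necessarily what the header defines:

```c
#include <assert.h>

/* In C, a plain `inline` function may never get a strong out-of-line
 * definition, so a call the compiler chooses not to inline can fail to
 * resolve at link/load time. `static inline` sidesteps this: every
 * translation unit that includes the header gets its own internal-linkage
 * definition, so the linker always finds one. C++ keeps plain `inline`,
 * where the ODR guarantees a single merged definition. */
#if defined(__cplusplus)
#  define PYMB_FUNC inline
#else
#  define PYMB_FUNC static inline
#endif

/* Example header-only function using the guard. */
PYMB_FUNC int pymb_example_answer(void) { return 42; }
```

The trade-off of `static inline` in C is a possible duplicate copy per translation unit, which is acceptable for small header-only helpers.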
pymb_get_registry() creates a heap-allocated registry and stores it in a
capsule with no destructor, so the registry itself is leaked at interpreter
shutdown.  No Python C-API is called from the destructor except
PyCapsule_GetPointer, and any error from that is cleared.
In `pymb_add_framework`, check pointer equality as a fast path before calling
strcmp while interning `abi_extra`. This is a tiny win in the common case
where multiple frameworks share the already-interned pointer.
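The shape of that fast path can be sketched like this; the table and the function name `intern_abi_extra` are illustrative, not the pymetabind internals:

```c
#include <assert.h>
#include <string.h>

/* Toy intern table: returns a canonical pointer for each distinct string. */
static const char *interned[8];
static int interned_count = 0;

static const char *intern_abi_extra(const char *s) {
    for (int i = 0; i < interned_count; i++) {
        /* Fast path: identical pointers imply identical strings, so the
         * strcmp is skipped entirely in the common shared-pointer case. */
        if (interned[i] == s || strcmp(interned[i], s) == 0)
            return interned[i];
    }
    interned[interned_count++] = s;
    return s;
}
```

Because `||` short-circuits, the strcmp only runs when the pointers differ, which is exactly the "tiny win" the commit describes.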
If a framework allocates binding/framework structs with malloc (not zeroed),
the hook may contain garbage. Ensure the hook is NULLed in
add_framework/add_binding before we touch the list pointers, to prevent
crashes in pymb_list_unlink().
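A minimal sketch of why this matters, using an illustrative intrusive-list node (loosely modeled on the list hooks, not the actual pymetabind structs): an unlink that guards on NULL is a safe no-op for a never-linked node, but only if the hooks were zeroed first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative intrusive doubly-linked list hook. */
typedef struct node { struct node *prev, *next; } node;

/* What add_framework/add_binding now do before touching list pointers:
 * zero the hooks so a later unlink can't chase malloc garbage. */
static void node_init(node *n) { n->prev = n->next = NULL; }

static void list_unlink(node *n) {
    if (n->prev) n->prev->next = n->next;  /* would be a wild write if prev */
    if (n->next) n->next->prev = n->prev;  /* or next held garbage          */
    n->prev = n->next = NULL;
}
```

Without the `node_init` step, a malloc'd node carries whatever bytes happened to be in the allocation, and `list_unlink` would dereference those as pointers.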
@pablogsal
Contributor Author

Rebased on top of the latest main

@pablogsal pablogsal force-pushed the fixes branch 2 times, most recently from aed44fb to 11c9bff Compare September 10, 2025 00:55
Implement reference-counted solution to fix registry memory leak while
ensuring proper teardown order. Registry capsule is now ref-counted by
frameworks, preventing premature destruction during finalization.
@oremanj
Collaborator

oremanj commented Dec 23, 2025

@pablogsal What was the outcome of the discussion at the core sprints?
