Fix leak and other small improvements #2
Thank you for the PR! Sorry for the delay in reviewing; I've been on vacation for most of this past week.

I'm a bit concerned about teardown order here. Are there any official guarantees about the point in interpreter finalization where the interpreter state dict is cleared, especially relative to the point where module globals are cleared / the last GC run is performed? If not, then we might delete the registry while some bound types still exist, and the future destruction of those types could cause a use-after-free when their frameworks try to unregister them with pymetabind. I'm all for fixing the leak, but I think it might require some deeper changes.

If it is actually guaranteed that the interpreter state dict isn't cleared until very late in finalization, then I wonder if e.g. nanobind would want to do its leak check at that point, rather than in a Py_AtExit handler. (cc @wjakob) If not, I guess we could use Py_AtExit for the pymetabind destructor too.
Thinking some more:
So I think we need at least one of:
I'm leaning towards the reference-counted solution, since we can't control the order in which our atexit handler runs relative to our frameworks'. But I'm curious to hear your thoughts.
This is indeed an excellent point and you have very good reasons to be worried, since finalization is a particularly nightmarish area. Unfortunately there are no official guarantees and the order changes slightly between versions (in particular, pre-3.8 was so bad that there were substantial situations where the interpreter either leaked or broke some of the GC guarantees). Petr, Victor, I, and others have worked hard to ensure that at least we don't leak, but the area is so complicated and tricky (especially with daemon threads) that we are not ready to document the order or offer guarantees, in case we need to change it. But of course we should do something: the fact that it is not official doesn't mean this is not a problem frameworks need to solve. So here is a summary of the order as I understand it currently (3.14; older versions may be missing steps or have slight reordering, and very old versions do several GC passes in a desperate attempt to not leak, but we have fixed that):
I think this is a good idea in principle (see below). The bigger problem is syncing this call with all frameworks, so we are called before (or after, depending on the ownership model of the frameworks and the registration) they clean things up. Indeed, for example, nanobind already has some registered, so some sync here is needed because we probably cannot register after/before nanobind (or any other framework that uses it). Enforcing this across all frameworks can be a bit tricky.
If I understand you correctly, that means we want to destroy the registry before any framework destroys itself, or otherwise the cleanup code may run into UB, no?
The refcount version can be slightly tricky because we may lose control over exactly how this happens (if for whatever reason we end up in a reference cycle, for example). Of course we can put infrastructure in place to ensure this cannot happen (or is very unlikely to happen), but in my experience refcount-bounded cleanup in bindings always comes back for revenge. Another alternative to consider (not sure if it is correct): since the plan is to integrate this into frameworks anyway, we can have another API that the framework must call to unregister itself, and when no more frameworks are left we destroy our state. The advantage of this approach is that we don't need to make assumptions about how frameworks clean themselves up, and we can reason on a per-framework basis. The only challenge is ensuring the semantics are sound. We could also maybe register a Py_AtExit handler that warns if the registry was not destroyed properly.

What do you think?
Thank you for the finalization details; that's very helpful to have as a reference when thinking about this!
No, I meant after. When a framework destroys itself, it's required to unlink itself from the registry's list of frameworks, per the comments in pymetabind.h.

OTOH, if the registry is destroyed while some frameworks are still alive, we would need a way for those frameworks to learn that the registry is being destroyed, which doesn't currently exist. The only safe way for them to respond to this destruction would be to disable all use of foreign bindings: even if some bindings are still alive, with the registry gone they have no way to tell how long that will remain true. While it's possible to implement this, it's extra infrastructure that I don't think we need.
To clarify, by "refcounted" I mean the following:
In the expected case, there is one reference to the capsule from the interpreter state dict and one from each registered framework, which means the registry is deallocated after both (the interpreter state dict has been cleared) and (each registered framework has destroyed itself). In the unexpected case, where someone goes snooping in the interpreter state dict and holds their own reference to the capsule, we still expect that reference to be dropped by finalization activities that occur before the interpreter state dict is cleared. Maybe I'm missing something, but I don't see how a cycle can occur here, since the capsule destructor doesn't need to access any Python objects and thus doesn't hold any references to Python objects.
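To make the lifetime story concrete, here is a minimal sketch of that scheme from a framework's point of view, assuming the registry lives in a capsule that each framework increfs while it is registered; the function and variable names below are illustrative, not part of the pymetabind API:

```c
#include <Python.h>

/* Hypothetical per-framework strong reference to the registry capsule. */
static PyObject *my_registry_capsule = NULL;

/* On registration: keep the capsule (and the registry it owns) alive for as
 * long as this framework stays registered. */
void my_framework_register(PyObject *capsule) {
    Py_INCREF(capsule);
    my_registry_capsule = capsule;
    /* ... register this framework with pymetabind here ... */
}

/* On framework teardown: unregister first, then drop the reference. Once the
 * interpreter state dict entry is also gone, the capsule destructor runs and
 * frees the registry. */
void my_framework_unregister(void) {
    /* ... unregister this framework from pymetabind here ... */
    Py_CLEAR(my_registry_capsule);
}
```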
I think this is the same as the refcounted solution except that we maintain the use count privately instead of using the capsule object's refcount as a use count. I'm happy to go this route if there's some situation where it would work better, but as explained above I think the capsule object's refcount works fine for our purposes, and using it is easier than managing our own counter IMO.
I think it would actually be difficult to tell whether the registry was destroyed properly by the time Py_AtExit handlers are called. Globals are not really global across DSO boundaries depending on how the DSO is linked, and C doesn't have an equivalent to C++'s inline variables. I'm inclined to skip the atexit warning since it's difficult; if the pymb registry is being leaked, it's probably because some framework forgot to unregister itself.
Agreed, that would be unnecessarily complex.
Perfect, this makes much more sense now. I was missing the
You're right that direct cycles are unlikely. The concern I was raising is more about GC ordering during finalization, since we were discussing the ordering of clearing this relative to other teardown: if the interpreter state dict ends up in cyclic references, these get cleared in a forced GC run that happens during interpreter teardown, and the order of destruction within unreachable cycles is undefined. This could theoretically mean the registry gets destroyed before some other teardown logic that relies on cross-framework interop, but this is probably being overly pedantic. I'm just highlighting it for completeness, since we were talking about the exact timing during finalization and whether the destruction happens before or after teardown logic, and this case alters the semantics slightly.
What I meant here is that at the

I'll update the PR to implement the refcounted solution:
Am I missing anything? I will also rebase since there are some conflicts now.
Thank you! That sounds good to me.
Thanks for digging into that question. I agree that we don't need to worry about the cyclic case as a separate case.
I actually think this is fine. The registry won't be destroyed until all frameworks are destroyed. A framework can't be destroyed until all its bindings are destroyed. A binding can't be destroyed until the Python type object it binds is destroyed. Thus, by the time the registry is destroyed, there are no types left that could be used cross-framework, so losing that capability won't impact functionality.
At Py_AtExit, there is no longer an interpreter state dict, so how do we find the registry in order to find the frameworks that are still active in it? We would need a global cache of the registry pointer, which is difficult with a header-only pure-C library.
Ah, now I know what you meant about global state across DSO boundaries. Yeah, this makes sense, thanks for clarifying!

Btw, this makes me wonder... have you considered proposing this for CPython itself (maybe not now, but eventually)? The interoperability problem seems fundamental enough that it might benefit from being part of the core runtime. The fact that every major binding framework (pybind11, nanobind, Cython, etc.) essentially has to solve the same problem and would need to include the header (while dealing with potentially different versions of it) suggests this could be a natural fit for standardization at the language level.

I am asking because having this as part of CPython could provide some compelling advantages. It would give the interoperability protocol official backing, which might accelerate adoption; frameworks wouldn't need to vendor the header, and we could put the library ABI behind CPython's ABI guarantees. More importantly, it could make cross-framework interop a first-class feature of Python rather than an opt-in library concern. The lifecycle would also be easier to manage, since there would then be a true global (the one in the interpreter) and we could be more in control of when things happen. We could simplify the implementation and avoid having to use the interpreter as a proxy for global state.

That said, I realize the header-only approach gives you much more flexibility for iteration and getting all the frameworks on board. You can experiment with the API, gather real-world usage data, and refine the design without being constrained by CPython's release cycle. Once the design proves itself and reaches maturity, it might make for a strong PEP proposal. I also understand if you don't want to deal with the PEP cycle, but I am happy to help if you wish. The process can be time-consuming and politically complex, but having a working implementation and backing from some framework maintainers would put this proposal in a very strong position. In any case, no pressure! Just wanted to throw out the idea in case you find it interesting :)
I absolutely agree with you about the coordination problems involved in doing this outside of CPython. I would love for CPython to eventually subsume the functionality of pymetabind, but I personally don't have the capacity to manage that process in anything like the near future. And it would probably be much more compelling as a PEP if there is a strong history of implementation experience/adoption, which is only available if we start outside of CPython. (I also have some doubt that framework authors would be interested in spending a lot of effort and/or review bandwidth now on something that will only benefit their users in several years.)

Unfortunately, if a PEP is in fact later written and accepted, and assuming the PEP process doesn't just sign off on precisely the current ABI (which seems like a bit too much to hope for), the delayed upstreaming means that this interoperability mechanism is likely to see a non-interoperable ABI break just when it's hitting its stride. The pain of that can be reduced somewhat by conditional compilation (in pymetabind.h or in its users) that uses the future built-in mechanism on new Python versions where it exists vs the current implementation on older versions of Python, although that wouldn't help stable-ABI extension modules. It may also be possible to present a current-pymetabind-ABI-compatible view on the native registry using some sort of shim layer that isn't itself built into CPython. After five years, when all supported Pythons have the native registry, frameworks could move from using the shim layer for their stable-ABI builds to using the native feature.

It would be great if we could somehow get a preview of the changes to the current ABI that would probably be needed as part of the PEP process, so we can make them before the ABI is really set in stone. But I'm not sure how to do that without doing the PEP now, and I don't have the sense that a PEP before widespread use would be accepted. You're much closer to that process than I am, though; maybe I'm misreading the situation?
I think your concerns about the PEP process are understandable and I don't want to push for this if you don't feel comfortable with it, but allow me to offer my view on the matter.

Adoption is actually not a problem, because as you said you can offer the library separately for older versions and people can conditionally include it if the version is old enough. Obviously the inner workings will be different, so there is some challenge in offering the same API, but it is still an interesting idea. The key is that frameworks adopting this will just have to add the fallback for old versions and eventually remove it. These kinds of backports are regularly done and we have many examples that keep updating to new upstream features.

The ABI compatibility concerns you mention are exactly why we should do it now, not later. The "non-interoperable ABI break" you're worried about is more likely if we wait, but if we standardize early, we can design the CPython integration to be ABI-compatible from day one. Waiting until there's widespread adoption would make that compatibility nightmare much worse.

I know the process can be a bit heavy and annoying, but honestly, the adoption threshold for acceptance isn't as high as you might think. In this case what matters more is having the right technical design and buy-in from the key consumers. For this specific problem, those are the major binding framework maintainers (pybind11, nanobind, PyO3), which you need to convince anyway for adoption. The PEP process essentially formalizes the consensus-building you're already doing.

On the cost of the process, there isn't much I can say other than that I'm happy to help drive the PEP process if you don't have the bandwidth. The technical design work you're doing now would translate directly into the PEP content, and having a working implementation puts us in a much stronger position than most PEPs start with.

My read is that this is actually the optimal moment. You have a clear technical solution to a well-understood problem, demonstrated implementation experience, interest from framework maintainers, and you are in the process of putting this out to the community already. Additionally, I can ask other core devs what they think, to test the waters and surface any early concerns before doing any investment. We could even start by getting informal feedback from the framework maintainers on whether they'd support a PEP process in parallel with the current library development. If there's enthusiasm from the framework side, that significantly de-risks the standardization path.

In any case I don't want to throw extra work on you, and I know that even answering this PR can be a nontrivial time investment, so please feel free to decline. I just wanted to highlight the possibility, as there is an interesting opportunity here and I didn't want to let it pass unnoticed.
Thanks for that perspective on the merits of going for it now. It sounds like the standardization route is probably less painful and more likely to pay off than I was thinking, and I will reevaluate my resistance to it. :-)
I would definitely appreciate that. If there's a Discourse thread or something that I can follow, please let me know; I'm happy to answer questions that come up. Do you have a (very rough, I imagine) estimate for how long a PEP process for this shape of feature would probably take? (Like start-to-finish calendar time, as opposed to actively-spent engineering time.)
I am glad it helped :)
Next week is the core dev sprint in Cambridge and a lot of us will be there, so it is a good opportunity for gathering some early feedback. With your permission, I can either ask some people in the C API working group, or alternatively I can host a short team-wide discussion to check what people think. Whatever you think is best.
It very much depends on how the discussion goes, but given that this will be very specialized (as opposed to new syntax, a new library, etc.) I would say around 1-3 months, maybe less if everyone agrees and we just need to deal with small nits. There are some strategies to shorten it by getting pre-alignment with the framework maintainers to minimize round-tripping. The major risk is that the discussion enters a loop or new people keep appearing, but I would say that risk is low in this case. In any case, this is just my very rough estimate, as I have seen basically the whole spectrum. I would certainly recommend getting early feedback from maintainers and core devs before putting the PEP out, for sure.
I'm fine with whatever you think best there! Many thanks.
oremanj left a comment
Thanks! This looks pretty good, just a couple potential edge-case issues I noticed.
On C/Clang (not C++), the current `inline` default yields external-inline semantics that do not emit a strong definition, which can lead to runtime errors like `dyld: symbol not found: _pymb_get_registry`. Switch the header-only default to `static inline` for **C** builds while keeping `inline` for **C++**.
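Roughly the shape of the dispatch this implies; the macro name `PYMB_FUNC` below is made up for illustration and may not match what pymetabind.h actually uses:

```c
#if defined(__cplusplus)
/* C++: an `inline` function defined in a header gets a weak/COMDAT definition
 * in every translation unit, so the symbol always resolves at load time. */
#  define PYMB_FUNC inline
#else
/* C: plain `inline` is not guaranteed to emit an out-of-line definition in
 * any translation unit, which can leave symbols like _pymb_get_registry
 * unresolved at load time. `static inline` gives each TU its own
 * internal-linkage definition instead. */
#  define PYMB_FUNC static inline
#endif

/* Toy header-only function using the macro, just to show the pattern. */
PYMB_FUNC int pymb_example_add(int a, int b) { return a + b; }
```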
pymb_get_registry() creates a heap-allocated registry and stores it in a capsule with no destructor, so the registry itself is leaked at interpreter shutdown. Give the capsule a destructor that frees the registry. No Python C API is called from the destructor except PyCapsule_GetPointer, and any error from that is cleared.
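A minimal sketch of what such a destructor looks like, assuming the registry is a plain malloc'd struct; the capsule name `"pymetabind_registry"` is a placeholder, not necessarily the name the header uses:

```c
#include <Python.h>
#include <stdlib.h>

struct pymb_registry;  /* opaque here; the real definition lives in pymetabind.h */

/* Frees the registry when the last reference to the capsule is dropped. */
static void registry_capsule_destructor(PyObject *capsule) {
    void *registry = PyCapsule_GetPointer(capsule, "pymetabind_registry");
    if (!registry) {
        /* PyCapsule_GetPointer sets an exception on failure; a capsule
         * destructor must not leave an error pending, so clear it. */
        PyErr_Clear();
        return;
    }
    free(registry);
}

/* Creation side: pass the destructor instead of NULL so shutdown frees it. */
static PyObject *make_registry_capsule(struct pymb_registry *registry) {
    return PyCapsule_New(registry, "pymetabind_registry",
                         registry_capsule_destructor);
}
```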
In `pymb_add_framework`, fast-path pointer equality before calling strcmp while interning `abi_extra`. This is a tiny win in the common case where multiple frameworks share the already interned pointer.
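For reference, the fast path is roughly this shape (identifiers below are illustrative, not the actual pymetabind code):

```c
#include <string.h>

/* Returns nonzero if two abi_extra strings match, trying pointer equality
 * before falling back to strcmp. NULL means "no abi_extra". */
static int abi_extra_matches(const char *interned, const char *candidate) {
    if (interned == candidate)       /* same interned pointer (or both NULL) */
        return 1;
    if (!interned || !candidate)     /* only one is NULL */
        return 0;
    return strcmp(interned, candidate) == 0;
}
```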
If a framework allocates binding/framework structs with malloc (not zeroed), the hook may contain garbage. Ensure the hook is NULLed in `pymb_add_framework`/`pymb_add_binding` before we touch list pointers, to prevent crashes in `pymb_list_unlink()`.
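A sketch of the defensive zeroing, with made-up type and field names standing in for pymetabind's intrusive list:

```c
#include <string.h>

/* Stand-ins for pymetabind's intrusive list node and framework struct. */
struct list_hook { struct list_hook *prev, *next; };
struct framework { struct list_hook hook; /* ... other fields ... */ };

void add_framework_example(struct framework *fw) {
    /* If the caller obtained `fw` from malloc (not calloc), `hook` holds
     * whatever garbage was already in that memory. Zero it before any list
     * manipulation so unlinking a node that never got fully linked does not
     * chase wild prev/next pointers. */
    memset(&fw->hook, 0, sizeof fw->hook);
    /* ... link fw->hook into the registry's framework list ... */
}
```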
Rebased on top of the latest main.
Implement a reference-counted solution to fix the registry memory leak while ensuring proper teardown order. The registry capsule is now ref-counted by frameworks, preventing premature destruction during finalization.
@pablogsal What was the outcome of the discussion at the core sprints?
This PR fixes a bug I noticed while reviewing the code the first time, but never had time to confirm back then: the registry capsule was created without a destructor, so the `pymb_registry` allocation leaked at interpreter shutdown. I also added a couple of small defensive tweaks and other improvements I spotted in that first pass (using `static inline` in C to avoid linkage surprises, zeroing list hooks before first use, and a pointer-equality fast path when interning `abi_extra`). The only real issue is the leak; please feel free to just take whatever bits you find useful (read the commits for explanations).

I confirmed the leak fix with valgrind using a small extension:
Before
After