Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated memory management #543

Open
wants to merge 54 commits into
base: main
Choose a base branch
from

Conversation

samschott
Copy link
Member

@samschott samschott commented Nov 19, 2024

This PR changes the memory management model of Rubicon. Previously, we would release objects on Python __del__ calls only if we created them ourselves and own them from alloc and similar calls.

In this PR, we always ensure that we own objects when we create a Python wrapper, by explicitly calling retain if we did not get them from alloc etc and always calling autorelease when the Python wrapper is garbage collected. This has a few advantages:

  1. Users will no longer need to manually retain objects when receiving them from non-alloc methods and to release them before the Python object goes out of scope to avoid memory leaks.
  2. Unless users manually release an object, Rubicon guarantees that a Python wrapper always points to an existing Objective-C object -- the one that it was created for.

This change should be backward compatible for most users because existing manual retain and release calls don't cause any issues if balanced and would have already caused segfaults if there are more releases than retains.

TODO:

  • Update docs.
  • Test with toga.
  • Figure out a decent transition plan for clients.

PR Checklist:

  • All new features have been tested
  • All new features have been documented
  • I have read the CONTRIBUTING.md file
  • I will abide by the code of conduct

@samschott
Copy link
Member Author

@mhsmith, care to have first look? Please note the TODOs still listed above.

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This turned out to be a lot less invasive than I thought it would be.

I've done a check of the toga-chart issue that triggered this change; and it seems to resolve that problem.

I also did a quick check with Toga's testbed suite on macOS; that code has a bunch of manual retains and autorelease/releases. I thought they should all be balanced though - worst case, objects would be over-retained - so I was a little surprised that it the testbed segfaults almost immediately (and the stack trace doesn't give any obvious pointers what is causing the issue).

If I remove all the retains and releases, the testbed segfaults; but on inspection, some of the uses are for objects that are created in Python, then handed to ObjC to manage (e.g., Toolbar items created here), or the copyWithZone handler here). But I guess those uses of memory handling make sense - and they're a lot closer aligned to the "spirit" of ObjC memory handling, Plus, in at least the ToolbarItem case, it could be avoided by keeping the toolbar instance in the cache of items.

@freakboy3742
Copy link
Member

Related - if we land this, I suspect a version bump to 0.5 might be called for. This is just backwards incompatible enough that I think it's worth flagging the significance of the change.

@samschott
Copy link
Member Author

samschott commented Nov 20, 2024

Thanks for the thorough checks! I've updated the PR description to give a better summary of the change and also discuss why this should be non-breaking for most users.

I'll have a closer look at the segfaults that you encountered, later. They might be caused by the usage of release instead of autorelease in __del__ which does not give Objective-C a chance to take over ownership.

Maybe there are also ways to prevent users from shooting themselves in the foot, e.g., raise an exception on manual release calls if there is only a single reference left.

@mhsmith
Copy link
Member

mhsmith commented Nov 20, 2024

Thanks, this looks great. I'm busy today, but I'll take a look at this as soon as I can.

@samschott samschott marked this pull request as ready for review November 21, 2024 01:12
@samschott
Copy link
Member Author

samschott commented Nov 23, 2024

I've had a closer look now at the segfaults and could identify two cases where they happen:

  1. The object is released by Rubicon but still needed in ObjC, for example because it is being assigned to a property. Replacing release by autorelease in __del__ fixes this by giving ObjC a chance to take over ownership.
  2. Toga has a few "stray" release or autorelease calls sprinkled throughout the codebase to manually clean up memory, for example because of point (1). This relies on Rubicon internally setting _needs_release = False after those calls and disabling its own cleanup logic. This no longer works and results in one too many release calls now.

beeware/toga#2978 contains all the changes that I found to (1) prevent segfaults and (2) remove now unneeded manual memory management.

Copy link
Member

@mhsmith mhsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added #539 and #48 to the Fixes list in the top comment.

docs/how-to/memory-management.rst Show resolved Hide resolved
src/rubicon/objc/api.py Outdated Show resolved Hide resolved
changes/256.bugfix.rst Outdated Show resolved Hide resolved
src/rubicon/objc/api.py Outdated Show resolved Hide resolved
src/rubicon/objc/api.py Outdated Show resolved Hide resolved
tests/test_core.py Outdated Show resolved Hide resolved
changes/256.removal.rst Outdated Show resolved Hide resolved
src/rubicon/objc/api.py Outdated Show resolved Hide resolved
changes/256.bugfix.rst Outdated Show resolved Hide resolved
tests/test_core.py Outdated Show resolved Hide resolved
tests/test_core.py Outdated Show resolved Hide resolved
tests/test_core.py Outdated Show resolved Hide resolved
@samschott
Copy link
Member Author

Something that just occurred to me in reviewing beeware/toga#2978 in the context of this change - does this also resolve the NSImage "init fail" issue?

If we do things right, i.e., the way that ARC would handle it, it would indeed remove the need for special handling on the NSImage "init failed" issue.

but that mirrors the underlying ObjC behavior, so it strikes me as the sort of thing that can be documented.

Agreed, I think we would represent different objects by different Python wrappers instead of swapping out the pointer under the hood.

@samschott
Copy link
Member Author

I've looked through ARC docs and found the relevant places to explain the init behavior: https://clang.llvm.org/docs/AutomaticReferenceCounting.html#semantics-of-method-families.

Basically, all methods in the alloc, copy, mutableCopy, new and init families return a retained object.

init is special because it consumes its self parameter. I.e., its takes ownership of it. This is what allows it to either (1) pass on ownership to the caller of init when returning or (2) release the object and return a different retained object.

Getting those semantics wrong seems to me the root cause of beeware/toga#2978.

@freakboy3742
Copy link
Member

I've looked through ARC docs and found the relevant places to explain the init behavior: https://clang.llvm.org/docs/AutomaticReferenceCounting.html#semantics-of-method-families.

Basically, all methods in the alloc, copy, mutableCopy, new and init families return a retained object.

init is special because it consumes its self parameter. I.e., its takes ownership of it. This is what allows it to either (1) pass on ownership to the caller of init when returning or (2) release the object and return a different retained object.

Nice find - that's a helpful reference. Probably worth adding a link to that in a docstring somewhere as well.

If I'm reading that right (and especially the "Rationale" block"):

  1. We need to strip leading underscores before the check against _OWNERSHIP_METHOD_PREFIXES
  2. There's an edge case where we should be looking for the objc_method_family attribute to determine if a method is in one of these special families
  3. We are safe to assume "starts with init, takes an id, returns an id" is a method that needs to do an extra release

(1) seems like an edge case, but it's also not hard to accomodate. (2) also seems like an edge case, but one that's a little harder to implement, so if that was bumped to a "known limitation" with documentation and open feature ticket, I'd be OK with that.

@freakboy3742
Copy link
Member

  1. Is there a particular reason you suggest patching the old instance, rather than creating a new instance?

Speaking with @mhsmith in person, he pointed out that if we create a new Python object for the init, the alloc Python object will continue to be a valid Python reference, but one that will be explosive if used. That's completely consistent with the ObjC behavior, but would be at surprising behavior to a Python user. Replacing the pointer reference on the existing object avoids this.

This is all true; my question is how much we care, balanced against the complexity and efficiency of the implementation.

Rubicon is a Python wrapper around ObjC; I don't think it's unreasonable to expect our users to need a basic understanding of ObjC, especially if we document these edge cases. "Don't use the alloc ptr once it's been init'd" is one of those edge cases. And the use case for keeping a pointer to the alloc independent of the init is niche at best.

The only thing that gives me pause is the performance aspect. If every .alloc().init() causes 2 Python objects to be created, that is going to have a direct performance overhead. If it will only be 2 Python objects for objects that change pointer, that's a lot less of a concern.

If the fix to rewrite the pointer for those cases is <10 lines of code with no real other risks, then we might as well take the win. But if it's more complex than that, or there's any question over whether there are edge cases in getting the caching right, then the simple solution of "just create 2 objects and document the limitation" is something I can live with.

@samschott
Copy link
Member Author

samschott commented Nov 28, 2024

Speaking with @mhsmith in person, he pointed out that if we create a new Python object for the init, the alloc Python object will continue to be a valid Python reference, but one that will be explosive if used.

That's a good point. Sending messages to an uninitialized object will segfault for most selectors (with a few useful exceptions such as init). But I don't think this pitfall can be avoided completely. If the user chooses to do a standalone alloc() instead of alloc().init(), they can still run use an uninitialized object.

Another option could be explicitly track / flag uninitialized objects.

The only thing that gives me pause is the performance aspect. If every .alloc().init() causes 2 Python objects to be created, that is going to have a direct performance overhead. If it will only be 2 Python objects for objects that change pointer, that's a lot less of a concern.

I am trying out a solution at the moment that would indeed only create 2 Python objects if the pointer changes, keeping with what ObjCInstances are supposed to represent. I am however still running into test failures in Python 3.12 + macOS 15 which are a bit concerning.

If I'm reading that right (and especially the "Rationale" block") [...]

I think you are reading this correctly. Though my first stab is certainly less sophisticated than that.

@mhsmith
Copy link
Member

mhsmith commented Nov 28, 2024

AIUI, your proposal is extra handling in ObjCBountMethod.__call__ that will be invoked if self.method.name.startswith("init")

I'm not familiar enough with Rubicon to suggest the best way to implement this, but yes, this is generally what I had in mind.

this logic will:

  1. check that self.receiver (the object on which the method is invoked) is the same value as the return value of the method; 2. If it isn't the same pointer, __call__ will modify the ObjCInstance instance itself, and the ObjCInstance cache to reflect the new value, releasing the "old" version from the alloc

Nothing should be released, because the old pointer was already "consumed" by the init method.

Are there "init" methods that don't start with init?

It looks like this is answered by the specification Sam found, along the converse question of "are there methods that do start with init but shouldn't have special behavior".

does this also resolve the NSImage "init fail" issue? The return value from init will be different to the alloc'd object... so the alloc'd object should be released; the only catch is that we don't need to create the ObjCInstance because it's a None object.

Again, we shouldn't call release, because init has already done that. But we should set the ObjCInstance's internal pointer to NULL, and make sure we handle that NULL pointer in the future as follows:

  • Attempting to call a method on the object, or use its pointer in any other way, should throw an exception.
  • When the object's __del__ method is called, it should not be autoreleased.

@mhsmith
Copy link
Member

mhsmith commented Nov 28, 2024

  1. We need to strip leading underscores before the check against _OWNERSHIP_METHOD_PREFIXES

It's a little more complicated than that:

A selector is in a certain selector family if, ignoring any leading underscores, the first component of the selector either consists entirely of the name of the method family or it begins with that name followed by a character other than a lowercase letter. For example, _perform:with: and performWith: would fall into the perform family (if we recognized one), but performing:with would not.

Or for a more realistic example, any method beginning with the string initialize is not in the init family.

The spec also gives a few other conditions for deciding whether a method is in a family, but I don't know whether it's possible to check them at runtime, or how likely they are to matter in practice.

@samschott
Copy link
Member Author

samschott commented Nov 28, 2024

The spec also gives a few other conditions for deciding whether a method is in a family, but I don't know whether it's possible to check them at runtime, or how likely they are to matter in practice.

I don't have enough experience with Objective-C to know this either, so I've implemented the name-base checks for now. Name-based rules seem sufficient for selectors families, which is required but not sufficient for the method family.

I've found what I believe is a relatively lightweight solution now that does not change ObjCInstance pointers under the hood and keeps the current semantics of "if we have Python wrapper, we Obj-C object must still exist". Admittedly, this object is much less useful when not yet initialized, but that's a problem that I don't want to solve for in this PR.

Why I like the current current solution:

  1. It upholds clear promises in the API: A ObjCInstance instance always refers to the same wrapped Objective-C object and this object is never deallocated while we still have a Python wrapper. This will hopefully make it easier to reason about future issues.
  2. It treats all methods that return a retained object ({"init", "alloc", "new", "copy", "mutableCopy"}) equally when putting in them cache.
  3. It has special handling for init consuming its self argument.

if self.superclass.methods_ptr is None:
with self.superclass.cache_lock:
self.superclass._load_methods()
# Traverse superclasses and load methods.
Copy link
Member Author

@samschott samschott Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure why a full traversal of superclasses is required now but wasn't previously for tests on Python 3.12 + macOS 15 to pass. But I do believe that a class hierarchy traversal makes sense regardless.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed this is weird; I can only assume we're hitting some weird cache initialisation order thing (e.g., the test was previously initializing the super class before the class that was causing a problem). However, the solution here makes sense.

Copy link
Member Author

@samschott samschott Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running the toga testbed suite reveals a similar issue for TogaSlider.setValue(value, animated=True) where the superclass method cannot be found. This is despite the fix here and could be related to the TogaSlider being a subclass defined in Python.

Copy link
Member Author

@samschott samschott Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the direct method call TogaSlider.setValue_animated_(value, True) succeeds: https://github.com/beeware/toga/actions/runs/12089448592/job/33714837506?pr=2978

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've managed to fix that by allowing direct method lookup if there is no cached partial method, similar to what we already do for the old style syntax.

This is very likely a race condition somewhere, but the code around ObjCClass._load_methods, ObjCClass._cache_method and ObjCPartialMethod quite complex and hard to follow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But all those changes to method lookup are making this PR a bit unwieldy. I've reverted some of the unneeded commits and am happy to split this off into an entirely different PR if that makes it easier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the method lookup isn't strictly related to the memory retention issue, but I'm OK with the level of complexity it adds to this PR in the interest of addressing some issues that we know exist when this PR is used in Toga.

Copy link
Member Author

@samschott samschott Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, something is still very fishy about my solution in this PR. _load_methods should have already been recursive, with the main difference that is was hard-stopping the recursion if one of the superclasses had already called _load_methods. This makes assumptions of the superclass chain not changing during init.

The new solution forces recursion to continue, but in a horribly hacky way. I do still want to find a more elegant solution here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have had another look at all those recursive calls and I think that (1) I do understand better now what is going on and (2) have a more elegant solution than this PR to method loading fixes (https://github.com/samschott/rubicon-objc/tree/method-loading).

But this is indeed better reviewed in isolation, so I am happy to proceed with this PR as is and send out an independent PR for stylistic improvements and hopefully clearer code logic.

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good to me. A couple of comments inline, mostly about the comments :-)

src/rubicon/objc/api.py Outdated Show resolved Hide resolved
src/rubicon/objc/api.py Show resolved Hide resolved
if self.superclass.methods_ptr is None:
with self.superclass.cache_lock:
self.superclass._load_methods()
# Traverse superclasses and load methods.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed this is weird; I can only assume we're hitting some weird cache initialisation order thing (e.g., the test was previously initializing the super class before the class that was causing a problem). However, the solution here makes sense.

tests/test_core.py Outdated Show resolved Hide resolved
tests/test_core.py Show resolved Hide resolved
changes/256.feature.rst Outdated Show resolved Hide resolved
@freakboy3742
Copy link
Member

I've found what I believe is a relatively lightweight solution now that does not change ObjCInstance pointers under the hood and keeps the current semantics of "if we have Python wrapper, we Obj-C object must still exist". Admittedly, this object is much less useful when not yet initialized, but that's a problem that I don't want to solve for in this PR.

Why I like the current current solution:

Agreed - I like the solution you've presented here. It may still have some interesting behavior if you try to use an alloc'd object... but then, so will ObjC, so it shouldn't be surprising.

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this PR as it stands now; in addition to solving the originally reported problem (beeware/toga-chart#191), it solves the bigger problem of memory management (#256) problem; it addresses the remaining iOS leakage issues (see beeware/toga#2853); and apparently also fixes a method caching issue that has been unreported to date.

I've left one comment about the argument used when constructing arguments, but that's a minor cleanup suggestion, and one that I'll gladly back down from if you disagrees.

Before we merge, I'd also like @mhsmith's final review in case he can think of any edge cases or other issues I might have missed.

src/rubicon/objc/api.py Outdated Show resolved Hide resolved
if self.superclass.methods_ptr is None:
with self.superclass.cache_lock:
self.superclass._load_methods()
# Traverse superclasses and load methods.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the method lookup isn't strictly related to the memory retention issue, but I'm OK with the level of complexity it adds to this PR in the interest of addressing some issues that we know exist when this PR is used in Toga.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants